Do vertical loopfiltering in parallel
This patch follows the "Add filter_selectively_vert_row2 to enable parallel loopfiltering" commit and adds an x86 SSE2 optimization that filters 16 pixels in parallel. For the other optimizations (neon and dspr2), the current 16-pixel functions call the 8-pixel functions twice; real 16-pixel functions can be added later.

Decoder speedup:
tulip clip: 2% speed gain
old_town_cross: 1.2% speed gain
bus: 2% speed gain

Change-Id: I4818a0c72f84b34f5fe678e496cf4a10238574b7
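The "call the 8-pixel function twice" fallback described for neon and dspr2 can be sketched roughly as below. This is a minimal illustration, not the code from this patch: the names `toy_filter_vertical_8` and `toy_filter_vertical_16` are hypothetical, and the per-row smoothing is a stand-in for the real VP9 filter. It only shows how a 16-row entry point can be built from two 8-row calls on adjacent halves of the edge.

```c
#include <stdint.h>

/* Hypothetical stand-in for an 8-row vertical-edge filter. This is NOT the
 * VP9 loop filter; it only blends the two pixels on either side of the
 * vertical edge at column s[0], for 8 consecutive rows. */
void toy_filter_vertical_8(uint8_t *s, int pitch) {
  for (int row = 0; row < 8; ++row) {
    uint8_t *p = s + row * pitch;
    const int left = p[-1];  /* pixel just left of the edge */
    const int right = p[0];  /* pixel just right of the edge */
    p[-1] = (uint8_t)((3 * left + right + 2) >> 2);
    p[0] = (uint8_t)((left + 3 * right + 2) >> 2);
  }
}

/* Hypothetical 16-row wrapper: filter rows 0..7, then rows 8..15, by
 * calling the 8-row function twice. A real 16-pixel implementation would
 * instead process both halves in a single pass. */
void toy_filter_vertical_16(uint8_t *s, int pitch) {
  toy_filter_vertical_8(s, pitch);
  toy_filter_vertical_8(s + 8 * pitch, pitch);
}
```

The wrapper keeps a single 16-row interface for callers on every platform, so an architecture-specific version can replace the two calls later without changing any call site.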
Showing
- vp9/common/arm/neon/vp9_loopfilter_16_neon.c (31 additions, 0 deletions)
- vp9/common/mips/dspr2/vp9_loopfilter_filters_dspr2.c (55 additions, 0 deletions)
- vp9/common/vp9_loopfilter.c (12 additions, 19 deletions)
- vp9/common/vp9_loopfilter_filters.c (81 additions, 0 deletions)
- vp9/common/vp9_rtcd_defs.sh (11 additions, 2 deletions)
- vp9/common/x86/vp9_loopfilter_intrin_sse2.c (125 additions, 57 deletions)