Full search SAD function optimization in SSE4.1
Use mpsadbw to calculate 8 SADs at once. Function list:
vp8_sad16x16x8_sse4
vp8_sad16x8x8_sse4
vp8_sad8x16x8_sse4
vp8_sad8x8x8_sse4
vp8_sad4x4x8_sse4

(test clip: tulip)
For best quality mode, this gave the encoder a 5% performance boost.
For good quality mode with speed=1, this gave the encoder a 3% performance boost.

Change-Id: I083b5a39d39144f88dcbccbef95da6498e490134
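To illustrate what "8 SADs at once" means, here is a minimal scalar sketch in C. It is not the code from this commit: the names sad4_x8_model and sad_block_x8_model are hypothetical, and the inner helper only models the arithmetic of a single mpsadbw step (a 4-byte source group compared against 8 consecutive reference offsets). It assumes the 8 candidates are horizontally adjacent positions in the reference frame and that the block width is a multiple of 4.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar model of one mpsadbw-style step: for each of 8
 * consecutive reference offsets, add the sum of |src[k] - ref[offset + k]|
 * over a 4-byte group to that offset's accumulator. */
static void sad4_x8_model(const uint8_t *src, const uint8_t *ref,
                          uint16_t acc[8])
{
    for (int offset = 0; offset < 8; ++offset) {
        unsigned sum = 0;
        for (int k = 0; k < 4; ++k)
            sum += (unsigned)abs((int)src[k] - (int)ref[offset + k]);
        acc[offset] = (uint16_t)(acc[offset] + sum);
    }
}

/* Hypothetical reference routine: SADs of one width x height source block
 * against 8 horizontally adjacent candidate positions in the reference.
 * Assumes width is a multiple of 4 and the block is at most 16x16, so each
 * total (at most 16 * 16 * 255 = 65280) fits in a uint16_t. */
static void sad_block_x8_model(const uint8_t *src, int src_stride,
                               const uint8_t *ref, int ref_stride,
                               int width, int height, uint16_t sad_array[8])
{
    assert(width % 4 == 0 && width <= 16 && height <= 16);

    for (int i = 0; i < 8; ++i)
        sad_array[i] = 0;

    for (int row = 0; row < height; ++row) {
        /* Each 4-byte column group corresponds to one mpsadbw-like step. */
        for (int col = 0; col < width; col += 4)
            sad4_x8_model(src + col, ref + col, sad_array);
        src += src_stride;
        ref += ref_stride;
    }
}
```

Under these assumptions, a 16x16 block needs four such steps per row, and a full search can evaluate 8 neighbouring candidates per call instead of one. Each reference row must have at least width + 7 readable bytes.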
Changed files:
- build/make/configure.sh: 1 addition, 0 deletions
- configure: 1 addition, 0 deletions
- vp8/encoder/generic/csystemdependent.c: 6 additions, 0 deletions
- vp8/encoder/mcomp.c: 153 additions, 1 deletion
- vp8/encoder/mcomp.h: 1 addition, 0 deletions
- vp8/encoder/onyx_if.c: 5 additions, 0 deletions
- vp8/encoder/sad_c.c: 90 additions, 0 deletions
- vp8/encoder/variance.h: 43 additions, 0 deletions
- vp8/encoder/x86/mcomp_x86.h: 9 additions, 0 deletions
- vp8/encoder/x86/sad_sse4.asm: 353 additions, 0 deletions
- vp8/encoder/x86/variance_x86.h: 27 additions, 0 deletions
- vp8/encoder/x86/x86_csystemdependent.c: 16 additions, 7 deletions
- vp8/vp8cx.mk: 1 addition, 0 deletions
- vpx_ports/x86.h: 3 additions, 0 deletions