Optimize 8x8 idct function
Add SSE2 versions of vp9_short_idct8x8 and vp9_short_idct10_8x8. The SSE2 versions are 2x faster than the C versions. The decoder test showed no noticeable gain, since the 8x8 IDCT accounts for only a small share of decoding time (less than 1% in my test).

Change-Id: I56313e18cd481700b3b52c4eda5ca204ca6365f3
Showing 5 changed files
- vp9/common/vp9_idct.h: 3 additions, 0 deletions
- vp9/common/vp9_rtcd_defs.sh: 2 additions, 2 deletions
- vp9/common/x86/vp9_idct_x86.c: 399 additions, 0 deletions
- vp9/decoder/vp9_dequantize.c: 2 additions, 2 deletions
- vp9/encoder/x86/vp9_dct_sse2_intrinsics.c: 0 additions, 3 deletions