Implemented DST 16x16 SSE2 intrinsics optimization
- Implemented fdst16_sse2() and fdst16_8col() against the C version, fdst16().
- Turned on 7 DST-related hybrid txfm types in vp10_fht16x16_sse2().
- Replaced vp10_fht16x16_c() with vp10_fht16x16_sse2() in fwd_txfm_16x16().
- Added a vp10_fht16x16_sse2() unit test against the C version, vp10_fht16x16_c() (--gtest_filter=*VP10Trans16x16*).
- Unit test passed.
- Speed improvement: 2.4%, 3.2%, and 3.2% for city_cif.y4m, garden_sif.y4m, and mobile_cif.y4m, respectively.

Change-Id: Ib30a67ce5d5964bef143d588d0f8fa438be8901f
Showing
- test/test.mk: 1 addition, 0 deletions
- test/vp10_fht16x16_test.cc: 124 additions, 0 deletions
- vp10/common/vp10_rtcd_defs.pl: 1 addition, 1 deletion
- vp10/encoder/hybrid_fwd_txfm.c: 1 addition, 4 deletions
- vp10/encoder/x86/dct_sse2.c: 402 additions, 0 deletions
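
The commit validates its SSE2 code by comparing it with the C version. The sketch below illustrates that general pattern on a much smaller problem; it is not code from this change and it is not the DST itself. All names (butterfly_c, butterfly_sse2, butterfly_matches_reference) are hypothetical, and the add/subtract "butterfly" on eight 16-bit lanes is only a stand-in for a single transform stage. It assumes a compiler that defines __SSE2__ when SSE2 is available and falls back to the scalar path otherwise.

```c
#include <stdint.h>
#include <stdlib.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

#define LANES 8 /* eight 16-bit values fit in one 128-bit register */

/* Hypothetical scalar reference: sum[i] = a[i] + b[i], diff[i] = a[i] - b[i]. */
void butterfly_c(const int16_t *a, const int16_t *b,
                 int16_t *sum, int16_t *diff) {
  for (int i = 0; i < LANES; ++i) {
    sum[i] = (int16_t)(a[i] + b[i]);
    diff[i] = (int16_t)(a[i] - b[i]);
  }
}

/* Hypothetical SSE2 version: the same arithmetic on all eight lanes at once. */
void butterfly_sse2(const int16_t *a, const int16_t *b,
                    int16_t *sum, int16_t *diff) {
#if defined(__SSE2__)
  const __m128i va = _mm_loadu_si128((const __m128i *)a);
  const __m128i vb = _mm_loadu_si128((const __m128i *)b);
  _mm_storeu_si128((__m128i *)sum, _mm_add_epi16(va, vb));
  _mm_storeu_si128((__m128i *)diff, _mm_sub_epi16(va, vb));
#else
  butterfly_c(a, b, sum, diff); /* no SSE2: reuse the scalar path */
#endif
}

/* Runs both versions on pseudo-random inputs; returns 1 if every output
 * matches and 0 on the first mismatch. Inputs stay in [-1024, 1023] so
 * that no 16-bit sum or difference can overflow. */
int butterfly_matches_reference(int trials) {
  int16_t a[LANES], b[LANES];
  int16_t sum_c[LANES], diff_c[LANES], sum_v[LANES], diff_v[LANES];
  srand(1); /* fixed seed keeps the check reproducible */
  for (int t = 0; t < trials; ++t) {
    for (int i = 0; i < LANES; ++i) {
      a[i] = (int16_t)(rand() % 2048 - 1024);
      b[i] = (int16_t)(rand() % 2048 - 1024);
    }
    butterfly_c(a, b, sum_c, diff_c);
    butterfly_sse2(a, b, sum_v, diff_v);
    for (int i = 0; i < LANES; ++i) {
      if (sum_c[i] != sum_v[i] || diff_c[i] != diff_v[i]) return 0;
    }
  }
  return 1;
}
```

A caller would treat a return value of 0 from butterfly_matches_reference() as a test failure. Keeping the scalar function as the single source of truth means the optimized path can be rewritten freely as long as the outputs stay identical.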