Implement sse2 and ssse3 versions for all sub_pixel_variance sizes.
Overall speedup around 5% (bus @ 1500kbps, first 50 frames:
4min10 -> 3min58).

Specific changes to timings for each function compared to original
assembly-optimized versions (or just new version timings if no previous
assembly-optimized version was available):

sse2 4x4: 99 -> 82 cycles
sse2 4x8: 128 cycles
sse2 8x4: 121 cycles
sse2 8x8: 149 -> 129 cycles
sse2 8x16: 235 -> 245 cycles (?)
sse2 16x8: 269 -> 203 cycles
sse2 16x16: 441 -> 349 cycles
sse2 16x32: 641 cycles
sse2 32x16: 643 cycles
sse2 32x32: 1733 -> 1154 cycles
sse2 32x64: 2247 cycles
sse2 64x32: 2323 cycles
sse2 64x64: 6984 -> 4442 cycles

ssse3 4x4: 100 cycles (?)
ssse3 4x8: 103 cycles
ssse3 8x4: 71 cycles
ssse3 8x8: 147 cycles
ssse3 8x16: 158 cycles
ssse3 16x8: 188 -> 162 cycles
ssse3 16x16: 316 -> 273 cycles
ssse3 16x32: 535 cycles
ssse3 32x16: 564 cycles
ssse3 32x32: 973 cycles
ssse3 32x64: 1930 cycles
ssse3 64x32: 1922 cycles
ssse3 64x64: 3760 cycles

Change-Id: I81ff6fe51daf35a40d19785167004664d7e0c59d
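
For readers unfamiliar with the operation being optimized, the following is a minimal scalar sketch of what a sub-pixel variance function computes: a bilinear interpolation of the source block at a fractional offset, followed by the variance of the difference against a reference block. It is not the code from this commit. The function name `subpel_variance_ref`, the 1/16-pel offset units, and the 2-tap weights `(16 - off, off)` with a rounding shift of 4 are all assumptions chosen for illustration; the actual filter table and rounding in libvpx may differ.

```c
#include <stdint.h>

/* Hypothetical scalar reference for a w x h sub-pixel variance.
 * Assumptions (not taken from the commit): xoff and yoff are in
 * 1/16-pel units (0..15) and the filter is a plain 2-tap bilinear.
 * The source buffer must hold at least (w + 1) x (h + 1) pixels,
 * because each output pixel reads its right and lower neighbours. */
static unsigned int subpel_variance_ref(const uint8_t *src, int src_stride,
                                        int xoff, int yoff,
                                        const uint8_t *ref, int ref_stride,
                                        int w, int h, unsigned int *sse) {
  int64_t sum = 0;  /* sum of differences */
  uint64_t sq = 0;  /* sum of squared differences */

  for (int y = 0; y < h; ++y) {
    for (int x = 0; x < w; ++x) {
      const uint8_t *p = src + y * src_stride + x;
      /* Horizontal interpolation on the current row and the row below. */
      int a = ((16 - xoff) * p[0] + xoff * p[1] + 8) >> 4;
      int b = ((16 - xoff) * p[src_stride] + xoff * p[src_stride + 1] + 8) >> 4;
      /* Vertical interpolation between the two intermediate values. */
      int pred = ((16 - yoff) * a + yoff * b + 8) >> 4;
      int d = pred - ref[y * ref_stride + x];
      sum += d;
      sq += (uint64_t)(d * d);
    }
  }

  *sse = (unsigned int)sq;
  /* variance * N = SSE - sum^2 / N, with N = w * h pixels */
  return (unsigned int)(sq - (uint64_t)(sum * sum) / (uint64_t)(w * h));
}
```

A SIMD version performs the same arithmetic on many pixels per instruction, which is where per-block-size cycle counts like those listed above come from.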
Changed files:
- test/variance_test.cc: 327 additions, 40 deletions
- vp9/common/vp9_rtcd_defs.sh: 19 additions, 23 deletions
- vp9/encoder/x86/vp9_subpel_variance.asm: 1061 additions, 0 deletions
- vp9/encoder/x86/vp9_subpel_variance_impl_sse2.asm: 0 additions, 308 deletions
- vp9/encoder/x86/vp9_variance_impl_mmx.asm: 0 additions, 341 deletions
- vp9/encoder/x86/vp9_variance_impl_sse2.asm: 0 additions, 27 deletions
- vp9/encoder/x86/vp9_variance_impl_ssse3.asm: 0 additions, 372 deletions
- vp9/encoder/x86/vp9_variance_mmx.c: 0 additions, 235 deletions
- vp9/encoder/x86/vp9_variance_sse2.c: 82 additions, 372 deletions
- vp9/encoder/x86/vp9_variance_ssse3.c: 0 additions, 142 deletions
- vp9/vp9cx.mk: 1 addition, 2 deletions