AVX2 SubPixel AVG Variance Optimization
Optimize two functions to process 32 elements in parallel instead of 16:
1. vp9_sub_pixel_avg_variance64x64
2. vp9_sub_pixel_avg_variance32x32

Both functions previously called vp9_sub_pixel_avg_variance16xh_ssse3. They now call vp9_sub_pixel_avg_variance32xh_avx2, which is written with AVX2 and processes 32 elements in parallel.

This optimization gave an 80% function-level gain and a 2% user-level gain.

Change-Id: Iea694654e1b7612dc6ed11e2626208c2179502c8
Showing
- vp9/common/vp9_rtcd_defs.sh: 2 additions, 2 deletions
- vp9/encoder/x86/vp9_subpel_variance_impl_intrin_avx2.c: 451 additions, 553 deletions
- vp9/encoder/x86/vp9_variance_avx2.c: 59 additions, 0 deletions
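
For readers unfamiliar with the pattern the commit message describes, the following is a rough, portable sketch of how a 64x64 wrapper can be built from a 32-wide helper by running it over the left and right halves of the block and combining the results. It is not the code from this change: the function names `avg_sum_sse_32xh` and `avg_variance64x64_sketch` are hypothetical, the helper is plain scalar C rather than AVX2, and the sub-pixel filtering step is left out entirely, so only the averaging with the second predictor and the sum / SSE accumulation are shown.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar stand-in for a 32-wide helper. For each of `height`
 * rows it averages 32 prediction pixels with the second predictor (rounding
 * up), subtracts the reference, and accumulates the sum of differences and
 * the sum of squared differences. Sub-pixel filtering is omitted. */
static int avg_sum_sse_32xh(const uint8_t *pred, int pred_stride,
                            const uint8_t *ref, int ref_stride,
                            const uint8_t *sec, int sec_stride,
                            int height, unsigned int *sse) {
  int sum = 0;
  unsigned int sq = 0;
  assert(height > 0);
  for (int r = 0; r < height; ++r) {
    for (int c = 0; c < 32; ++c) {
      const int avg = (pred[c] + sec[c] + 1) >> 1; /* rounded average */
      const int diff = avg - ref[c];
      sum += diff;
      sq += (unsigned int)(diff * diff);
    }
    pred += pred_stride;
    ref += ref_stride;
    sec += sec_stride;
  }
  *sse = sq;
  return sum;
}

/* Hypothetical 64x64 wrapper: two passes of the 32-wide helper, one per
 * half of the block, then variance = SSE - sum^2 / (64 * 64). */
static unsigned int avg_variance64x64_sketch(const uint8_t *pred,
                                             int pred_stride,
                                             const uint8_t *ref,
                                             int ref_stride,
                                             const uint8_t *sec,
                                             unsigned int *sse) {
  unsigned int sse_left, sse_right;
  /* Left 32 columns. */
  int sum = avg_sum_sse_32xh(pred, pred_stride, ref, ref_stride,
                             sec, 64, 64, &sse_left);
  /* Right 32 columns. */
  sum += avg_sum_sse_32xh(pred + 32, pred_stride, ref + 32, ref_stride,
                          sec + 32, 64, 64, &sse_right);
  const unsigned int total = sse_left + sse_right;
  *sse = total;
  /* 64 * 64 = 4096 pixels, so dividing by the pixel count is a shift by 12. */
  return total - (unsigned int)(((int64_t)sum * sum) >> 12);
}
```

In this sketch the second predictor is assumed to be stored contiguously with a stride of 64. A 32x32 wrapper would follow the same shape with a single helper call over 32 rows and a shift by 10 instead of 12.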