AVX2 Convolve Optimization
Two convolve functions were optimized for AVX2:
1. vp9_filter_block1d16_h8
2. vp9_filter_block1d16_v8

vp9_filter_block1d16_h8 was optimized for AVX2 by halving the number of loop iterations: two rows (strides) are processed in parallel per iteration.

vp9_filter_block1d16_v8 was optimized in the same way. In addition, some of the loads are now done outside the loop, which avoids redundant loads.

This optimization gives a 43% function-level gain and a 1.3% user-level gain.

The code can now also be compiled on Windows.

Change-Id: I2714124cfb0c14a77d7a0ce126a20db92ffbf92c
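The "two rows per iteration" idea described in the commit message can be illustrated with a minimal sketch. This is not the code from this commit: the function names (filter_h8_scalar, filter_h8_avx2, filter_h8, load_two_rows) are hypothetical, the block is 8 pixels wide rather than 16, and the sums are accumulated in 32 bits to keep the arithmetic easy to check. It assumes an x86 target and GCC or Clang (for the target attribute and __builtin_cpu_supports). Each 256-bit register holds one row in its low 128-bit lane and the next row in its high lane, so the row loop runs half as many times.

```c
#include <assert.h>
#include <stdint.h>
#include <immintrin.h>

/* Scalar reference: 8-tap horizontal filter on an 8-pixel-wide block.
 * Taps are 7-bit fixed point (they sum to 128); src points at output pixel 0,
 * so pixels src[-3] .. src[11] of each row are read. */
static void filter_h8_scalar(const uint8_t *src, int src_stride, uint8_t *dst,
                             int dst_stride, int rows, const int16_t taps[8]) {
  for (int r = 0; r < rows; ++r) {
    for (int x = 0; x < 8; ++x) {
      int sum = 64; /* rounding offset for the shift by 7 */
      for (int k = 0; k < 8; ++k)
        sum += src[r * src_stride + x + k - 3] * taps[k];
      sum >>= 7;
      dst[r * dst_stride + x] = (uint8_t)(sum < 0 ? 0 : sum > 255 ? 255 : sum);
    }
  }
}

/* Load 8 pixels from each of two rows and widen them to 16 bits:
 * row a lands in the low 128-bit lane, row b in the high lane. */
__attribute__((target("avx2")))
static inline __m256i load_two_rows(const uint8_t *a, const uint8_t *b) {
  const __m128i both = _mm_unpacklo_epi64(_mm_loadl_epi64((const __m128i *)a),
                                          _mm_loadl_epi64((const __m128i *)b));
  return _mm256_cvtepu8_epi16(both);
}

/* AVX2 sketch: every loop iteration produces two output rows. */
__attribute__((target("avx2")))
static void filter_h8_avx2(const uint8_t *src, int src_stride, uint8_t *dst,
                           int dst_stride, int rows, const int16_t taps[8]) {
  assert(rows % 2 == 0); /* rows are consumed in pairs */
  for (int r = 0; r < rows; r += 2) {
    const uint8_t *s0 = src + r * src_stride - 3;
    const uint8_t *s1 = s0 + src_stride;
    __m256i acc_lo = _mm256_set1_epi32(64); /* pixels 0..3 of both rows */
    __m256i acc_hi = _mm256_set1_epi32(64); /* pixels 4..7 of both rows */
    for (int k = 0; k < 8; k += 2) {
      /* Pack taps k and k+1 into one 32-bit element for madd. */
      const __m256i tk = _mm256_set1_epi32((int)(
          (uint32_t)(uint16_t)taps[k] | ((uint32_t)(uint16_t)taps[k + 1] << 16)));
      const __m256i x = load_two_rows(s0 + k, s1 + k);
      const __m256i y = load_two_rows(s0 + k + 1, s1 + k + 1);
      /* Interleave so each 32-bit element holds (x_i, y_i); madd then
       * computes x_i * taps[k] + y_i * taps[k + 1] in 32 bits. */
      acc_lo = _mm256_add_epi32(
          acc_lo, _mm256_madd_epi16(_mm256_unpacklo_epi16(x, y), tk));
      acc_hi = _mm256_add_epi32(
          acc_hi, _mm256_madd_epi16(_mm256_unpackhi_epi16(x, y), tk));
    }
    acc_lo = _mm256_srai_epi32(acc_lo, 7);
    acc_hi = _mm256_srai_epi32(acc_hi, 7);
    /* Packs work per lane, so each lane ends up with one row's 8 pixels. */
    const __m256i out16 = _mm256_packs_epi32(acc_lo, acc_hi);
    const __m256i out8 = _mm256_packus_epi16(out16, out16); /* clamp 0..255 */
    _mm_storel_epi64((__m128i *)(dst + r * dst_stride),
                     _mm256_castsi256_si128(out8));
    _mm_storel_epi64((__m128i *)(dst + (r + 1) * dst_stride),
                     _mm256_extracti128_si256(out8, 1));
  }
}

/* Runtime dispatch: use the AVX2 path only when the CPU supports it. */
static void filter_h8(const uint8_t *src, int src_stride, uint8_t *dst,
                      int dst_stride, int rows, const int16_t taps[8]) {
  if (rows % 2 == 0 && __builtin_cpu_supports("avx2"))
    filter_h8_avx2(src, src_stride, dst, dst_stride, rows, taps);
  else
    filter_h8_scalar(src, src_stride, dst, dst_stride, rows, taps);
}
```

A vertical filter can pair rows in the same way. There, the rows above the first output row can be loaded once before the loop and carried from one iteration to the next, which is the kind of redundant load the commit message says was removed.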
Showing 4 changed files:
- vp9/common/vp9_rtcd_defs.sh: 3 additions, 3 deletions
- vp9/common/x86/vp9_asm_stubs.c: 42 additions, 0 deletions
- vp9/common/x86/vp9_subpixel_8t_intrin_avx2.c: 542 additions, 0 deletions
- vp9/vp9_common.mk: 1 addition, 0 deletions