Add AVX vectorized vp9_diamond_search_sad
This function now has an AVX intrinsics version that is about 80% faster than the C implementation. This gives a 2-4% total encoder speed-up, depending on the encoding parameters.

The function relies on three properties of the cost lookup tables constructed in 'cal_nmvjointsadcost' and 'cal_nmvsadcosts':

For the joint cost:
- mvjointsadcost[1] == mvjointsadcost[2] == mvjointsadcost[3]

For the component costs:
- For all i: mvsadcost[0][i] == mvsadcost[1][i] (equal per-component cost)
- For all i: mvsadcost[0][i] == mvsadcost[0][-i] (the cost function is even)

These properties must hold; otherwise the AVX version of the function cannot be used.

Change-Id: I6c2791d43022822a9e6ab43cd124a773946d0bdc
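One plausible way to see why these properties help a vectorized implementation: they let the motion-vector cost collapse to a single zero/non-zero joint test plus one table lookup per component at the absolute difference, instead of a four-way joint class and two separate signed lookups. The sketch below is a hypothetical C illustration, not code from this commit. `HYP_MV_MAX`, `sad_cost_tables_ok`, `general_cost`, `simplified_cost` and `count_cost_mismatches` are illustrative names, the small table range is an assumption, and the joint-class numbering (0 = both zero, 1 = only column non-zero, 2 = only row non-zero, 3 = both non-zero) is assumed for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical range of motion-vector differences, for illustration only;
 * the real tables cover a much wider range. */
#define HYP_MV_MAX 8

/* Returns 1 if the tables satisfy the three properties, 0 otherwise.
 * comp0/comp1 (hypothetical row/column cost tables) point at the centre
 * element of arrays indexed -HYP_MV_MAX..HYP_MV_MAX. */
static int sad_cost_tables_ok(const int joint[4], const int *comp0,
                              const int *comp1) {
  int i;
  /* Joint cost: all non-zero joint classes cost the same. */
  if (joint[1] != joint[2] || joint[1] != joint[3]) return 0;
  for (i = -HYP_MV_MAX; i <= HYP_MV_MAX; ++i) {
    if (comp0[i] != comp1[i]) return 0;  /* equal per-component cost */
    if (comp0[i] != comp0[-i]) return 0; /* cost function is even */
  }
  return 1;
}

/* General form: assumed joint-class index plus separate signed lookups. */
static int general_cost(const int joint[4], const int *comp0,
                        const int *comp1, int dr, int dc) {
  assert(abs(dr) <= HYP_MV_MAX && abs(dc) <= HYP_MV_MAX);
  return joint[((dr != 0) << 1) | (dc != 0)] + comp0[dr] + comp1[dc];
}

/* Simplified form, valid only when sad_cost_tables_ok() returns 1:
 * one zero/non-zero joint test and one table indexed by |d|. */
static int simplified_cost(const int joint[4], const int *comp0, int dr,
                           int dc) {
  assert(abs(dr) <= HYP_MV_MAX && abs(dc) <= HYP_MV_MAX);
  return joint[(dr != 0 || dc != 0) ? 1 : 0] + comp0[abs(dr)] +
         comp0[abs(dc)];
}

/* Counts (dr, dc) pairs on which the two formulations disagree. */
static int count_cost_mismatches(const int joint[4], const int *comp0,
                                 const int *comp1) {
  int dr, dc, mismatches = 0;
  for (dr = -HYP_MV_MAX; dr <= HYP_MV_MAX; ++dr)
    for (dc = -HYP_MV_MAX; dc <= HYP_MV_MAX; ++dc)
      if (general_cost(joint, comp0, comp1, dr, dc) !=
          simplified_cost(joint, comp0, dr, dc))
        ++mismatches;
  return mismatches;
}
```

For example, filling both component tables with an even function such as 3*|i| + 1 and using a joint table like {2, 7, 7, 7} makes `count_cost_mismatches()` return 0; changing any one of joint[1..3] or breaking the symmetry of a component table makes the check fail and the simplified form disagree with the general one.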
- vp9/common/vp9_rtcd_defs.pl: 1 addition, 1 deletion
- vp9/encoder/vp9_encoder.c: 30 additions, 0 deletions
- vp9/encoder/vp9_mcomp.c: 10 additions, 16 deletions
- vp9/encoder/vp9_mcomp.h: 3 additions, 3 deletions
- vp9/encoder/x86/vp9_diamond_search_sad_avx.c: 322 additions, 0 deletions
- vp9/vp9cx.mk: 1 addition, 0 deletions