Rupert Swarbrick authored
For large blocks this is about 8x the speed of the C version. The code
needs SSE 4.1 for the PMULLD instruction that we use to do SIMD 32-bit
multiplies.

The patch uses av1_convolve_scale_test (written already to test the
low bit depth path) to make sure the optimised code matches the C
version.

Change-Id: I9304d6bb3d2cb31390de93ed08ff1a852e3ace86
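As a rough illustration of why SSE 4.1 is needed, the sketch below multiplies 32-bit lanes with `_mm_mullo_epi32` (the intrinsic that maps to PMULLD) and checks the result against a scalar C loop, in the spirit of comparing optimised code with the C version. This is not the code from this patch: the names `mul32_c`, `mul32_sse4_1`, `mul32` and `mul32_matches_c` are hypothetical, and the sketch assumes GCC or Clang for the `target` attribute and `__builtin_cpu_supports`. On other compilers or architectures it falls back to the scalar path.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <immintrin.h>
#define SKETCH_HAVE_SSE4_1 1
#endif

/* Scalar reference: out[i] = low 32 bits of a[i] * b[i].
 * The multiply is done on unsigned values so wraparound is well defined. */
static void mul32_c(const int32_t *a, const int32_t *b, int32_t *out,
                    size_t n) {
  for (size_t i = 0; i < n; ++i)
    out[i] = (int32_t)((uint32_t)a[i] * (uint32_t)b[i]);
}

#ifdef SKETCH_HAVE_SSE4_1
/* SIMD version: four 32-bit multiplies per PMULLD instruction. */
__attribute__((target("sse4.1")))
static void mul32_sse4_1(const int32_t *a, const int32_t *b, int32_t *out,
                         size_t n) {
  size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    const __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
    const __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
    _mm_storeu_si128((__m128i *)(out + i), _mm_mullo_epi32(va, vb));
  }
  /* Handle the remaining 0..3 elements with the scalar loop. */
  mul32_c(a + i, b + i, out + i, n - i);
}
#endif

/* Dispatch: use the SIMD path only when the CPU reports SSE 4.1. */
static void mul32(const int32_t *a, const int32_t *b, int32_t *out,
                  size_t n) {
#ifdef SKETCH_HAVE_SSE4_1
  if (__builtin_cpu_supports("sse4.1")) {
    mul32_sse4_1(a, b, out, n);
    return;
  }
#endif
  mul32_c(a, b, out, n);
}

/* Returns 1 if the dispatched version matches the scalar reference. */
static int mul32_matches_c(void) {
  enum { N = 11 }; /* not a multiple of 4, so the tail loop is exercised */
  int32_t a[N], b[N], ref[N], got[N];
  for (int i = 0; i < N; ++i) {
    a[i] = 1000 * i - 3000;
    b[i] = 7 - i * 123457; /* some products overflow 32 bits on purpose */
  }
  mul32_c(a, b, ref, N);
  mul32(a, b, got, N);
  return memcmp(ref, got, sizeof(ref)) == 0;
}
```

Calling `mul32_matches_c()` from a test should return 1 on any machine, since CPUs without SSE 4.1 simply take the scalar path.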
724d31eb