Rupert Swarbrick authored
For large blocks this is almost 8x the speed of the C version. The code needs SSE 4.1 for the PMULLD instruction that we use to do SIMD 32-bit multiplies.

This patch also makes av1_convolve_scale_test actually test something, making sure the optimised code matches the C version. The slightly excessive generality in the test (all the templating) is because of a following patch, which is for the high bit depth path and can then use most of the same test code.

Change-Id: I6732bc6b2378ffaadae5aa6441100cf660f7ee11
98dc22b8
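
The check described in the message (optimised output must match the C version) can be illustrated with a minimal, portable sketch. This is not the code from the patch or from av1_convolve_scale_test. All names here (scale_ref, scale_lanes4, count_mismatches) are hypothetical, and the four-at-a-time loop only stands in for a real SSE 4.1 kernel, where the four 32-bit multiplies would be a single PMULLD (exposed in C as the _mm_mullo_epi32 intrinsic). Input ranges are assumed small enough that the products never overflow 32 bits.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_LEN 1024

/* Hypothetical reference path: one 32-bit multiply per element. */
static void scale_ref(const int32_t *src, int32_t coeff, int32_t *dst, int n) {
  for (int i = 0; i < n; ++i) dst[i] = src[i] * coeff;
}

/* Hypothetical stand-in for the optimised path: four lanes per iteration,
 * mirroring the four 32-bit products that one PMULLD would compute,
 * followed by a scalar tail for lengths that are not a multiple of 4. */
static void scale_lanes4(const int32_t *src, int32_t coeff, int32_t *dst,
                         int n) {
  int i = 0;
  for (; i + 4 <= n; i += 4) {
    dst[i + 0] = src[i + 0] * coeff;
    dst[i + 1] = src[i + 1] * coeff;
    dst[i + 2] = src[i + 2] * coeff;
    dst[i + 3] = src[i + 3] * coeff;
  }
  for (; i < n; ++i) dst[i] = src[i] * coeff;
}

/* Fill a buffer with pseudo-random input, run both paths, and return the
 * number of elements where the two outputs differ (0 means they agree). */
int count_mismatches(unsigned seed, int n) {
  static int32_t src[MAX_LEN], ref[MAX_LEN], opt[MAX_LEN];
  assert(n >= 0 && n <= MAX_LEN);

  srand(seed);
  /* Assumed ranges: a tap-sized coefficient and 16-bit-sized samples,
   * so every product fits comfortably in an int32_t. */
  const int32_t coeff = (int32_t)(rand() % 256) - 128;
  for (int i = 0; i < n; ++i) src[i] = (int32_t)(rand() % 65536) - 32768;

  scale_ref(src, coeff, ref, n);
  scale_lanes4(src, coeff, opt, n);

  int mismatches = 0;
  for (int i = 0; i < n; ++i) mismatches += (ref[i] != opt[i]);
  return mismatches;
}
```

Calling count_mismatches with several seeds and with lengths that are not multiples of four exercises both the main loop and the tail. A real test would compare every output sample of the optimised function against the C function in the same way.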