av1/av1_common.mk · abd945105282e72a9409bb0104441847a5f71c87 · Xiph.Org / aom-rav1e

Add an SSE4.1 implementation of av1_convolve_2d_scale · 98dc22b8

Rupert Swarbrick authored Oct 04, 2017

For large blocks this is almost 8x as fast as the C version. The
code needs SSE4.1 for the PMULLD instruction, which we use to do SIMD
32-bit multiplies.
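The following is a minimal sketch, not code from this patch, of what PMULLD computes and why it needs SSE4.1. PMULLD is exposed as the _mm_mullo_epi32 intrinsic in <smmintrin.h>. It multiplies four signed 32-bit lanes and keeps the low 32 bits of each product. SSE2 only has PMULUDQ (_mm_mul_epu32), which multiplies two unsigned lanes into 64-bit results. The helper names mullo_epi32_ref, mullo_epi32_sse41 and mullo_epi32 are hypothetical. The SIMD path is only compiled when the build enables SSE4.1 (for example with -msse4.1). Otherwise the scalar model is used.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE4_1__)
#include <smmintrin.h> /* _mm_mullo_epi32 (PMULLD) */
#endif

/* Hypothetical scalar model of PMULLD: multiply four 32-bit lanes and
 * keep the low 32 bits of each product. */
static void mullo_epi32_ref(const int32_t a[4], const int32_t b[4],
                            int32_t out[4]) {
  for (int i = 0; i < 4; ++i) {
    /* Unsigned multiplication wraps modulo 2^32, giving the low half of
     * the product without signed-overflow undefined behaviour. */
    uint32_t lo = (uint32_t)a[i] * (uint32_t)b[i];
    /* Reinterpret as two's complement (implementation-defined in C,
     * but this is what GCC and Clang do). */
    out[i] = (int32_t)lo;
  }
}

#if defined(__SSE4_1__)
/* Hypothetical SIMD version: one PMULLD handles all four lanes. */
static void mullo_epi32_sse41(const int32_t a[4], const int32_t b[4],
                              int32_t out[4]) {
  const __m128i va = _mm_loadu_si128((const __m128i *)a);
  const __m128i vb = _mm_loadu_si128((const __m128i *)b);
  _mm_storeu_si128((__m128i *)out, _mm_mullo_epi32(va, vb));
}
#endif

/* Use the SSE4.1 path when the compiler targets SSE4.1, otherwise fall
 * back to the scalar model. */
static void mullo_epi32(const int32_t a[4], const int32_t b[4],
                        int32_t out[4]) {
#if defined(__SSE4_1__)
  mullo_epi32_sse41(a, b, out);
#else
  mullo_epi32_ref(a, b, out);
#endif
}
```

For example, with lanes {3, -7, 65536, 100000} times {5, 6, 65536, 100000} both paths give {15, -42, 0, 1410065408}. The last two lanes show the wrap to the low 32 bits.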

This patch also makes av1_convolve_scale_test actually test something,
making sure the optimised code matches the C version. The slightly
excessive generality in the test (all the templating) is there for a
following patch, which targets the high bit depth path and can then
reuse most of the same test code.
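The kind of check described above can be sketched in C as follows. Everything here is hypothetical: filter_fn, toy_filter_c, toy_filter_unrolled, lcg_next and compare_impls are illustrative names. They are not the real av1_convolve_scale_test, which is C++ and uses templates. The sketch runs a reference and an "optimised" implementation on the same pseudo-random input and counts mismatching outputs. The bit depth is a parameter that masks the random samples, so one harness covers 8-bit and high bit depth input. That is roughly the reuse the templating is meant to enable.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical filter signature; not the real libaom one. */
typedef void (*filter_fn)(const uint16_t *src, int32_t *dst, int n, int bd);

/* Toy reference: 4-tap filter with taps {1, 2, 2, 1}. */
static void toy_filter_c(const uint16_t *src, int32_t *dst, int n, int bd) {
  (void)bd;
  for (int i = 0; i < n; ++i)
    dst[i] = src[i] + 2 * src[i + 1] + 2 * src[i + 2] + src[i + 3];
}

/* Toy "optimised" version: same maths, unrolled by two. */
static void toy_filter_unrolled(const uint16_t *src, int32_t *dst, int n,
                                int bd) {
  (void)bd;
  int i = 0;
  for (; i + 1 < n; i += 2) {
    dst[i] = src[i] + 2 * (src[i + 1] + src[i + 2]) + src[i + 3];
    dst[i + 1] = src[i + 1] + 2 * (src[i + 2] + src[i + 3]) + src[i + 4];
  }
  for (; i < n; ++i) /* odd tail */
    dst[i] = src[i] + 2 * (src[i + 1] + src[i + 2]) + src[i + 3];
}

/* Deterministic LCG so that any failure is reproducible from the seed. */
static uint32_t lcg_next(uint32_t *state) {
  *state = *state * 1664525u + 1013904223u;
  return *state >> 8;
}

#define MAX_N 64
#define TAPS 4

/* Runs `iters` random trials at bit depth `bd` (1..16) and returns the
 * number of outputs where `opt` disagrees with `ref`. */
static int compare_impls(filter_fn ref, filter_fn opt, int bd, int iters,
                         uint32_t seed) {
  uint16_t src[MAX_N + TAPS - 1];
  int32_t out_ref[MAX_N], out_opt[MAX_N];
  const uint16_t mask = (uint16_t)((1u << bd) - 1);
  int mismatches = 0;
  for (int t = 0; t < iters; ++t) {
    const int n = 1 + (int)(lcg_next(&seed) % MAX_N);
    for (int i = 0; i < n + TAPS - 1; ++i)
      src[i] = (uint16_t)(lcg_next(&seed) & mask);
    ref(src, out_ref, n, bd);
    opt(src, out_opt, n, bd);
    for (int i = 0; i < n; ++i) mismatches += out_ref[i] != out_opt[i];
  }
  return mismatches;
}
```

The same compare_impls call would be made once per bit depth (for example 8, 10 and 12). The only change would be swapping in the high bit depth pair of implementations.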

Change-Id: I6732bc6b2378ffaadae5aa6441100cf660f7ee11
