    Add an SSE4.1 implementation of av1_convolve_2d_scale · 98dc22b8
    Rupert Swarbrick authored
    For large blocks this is almost 8x the speed of the C version. The
    code needs SSE4.1 for the PMULLD instruction, which we use to do SIMD
    32-bit multiplies.
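
As a rough illustration of the SIMD 32-bit multiply mentioned above (not code from this patch), the sketch below multiplies two arrays lane by lane. `_mm_mullo_epi32` is the intrinsic that compiles to PMULLD; the function name `mul32_lanes` and the scalar fallback are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE4_1__)
#include <smmintrin.h> /* SSE4.1 intrinsics, including _mm_mullo_epi32 */
#endif

/* Hypothetical helper: out[i] = low 32 bits of a[i] * b[i]. */
static void mul32_lanes(const int32_t *a, const int32_t *b, int32_t *out,
                        int n) {
  int i = 0;
  assert(a && b && out && n >= 0);
#if defined(__SSE4_1__)
  /* Four 32-bit products per iteration; PMULLD keeps the low 32 bits. */
  for (; i + 4 <= n; i += 4) {
    __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
    __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
    _mm_storeu_si128((__m128i *)(out + i), _mm_mullo_epi32(va, vb));
  }
#endif
  /* Scalar tail, and the whole loop when SSE4.1 is not enabled. */
  for (; i < n; ++i)
    out[i] = (int32_t)((uint32_t)a[i] * (uint32_t)b[i]);
}
```

Building with `-msse4.1` (GCC or Clang) enables the vector path; without it the same function falls back to the scalar loop and gives identical results.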
    
    This patch also makes av1_convolve_scale_test actually test something,
    making sure the optimised code matches the C version. The test is
    templated more generally than this patch needs because a following
    patch, which is for the high bit depth path, can then use most of the
    same test code.
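
The idea of checking optimised code against a reference can be sketched as below. This is a minimal stand-in, not the actual av1_convolve_scale_test: `filter_ref`, `filter_opt`, `count_mismatches`, and the 3-tap filter are all hypothetical names and choices made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_LEN 64

typedef void (*filter_fn)(const uint8_t *src, int32_t *dst, int n);

/* Hypothetical reference: plain loop, 3-tap filter with taps (1, 2, 1). */
static void filter_ref(const uint8_t *src, int32_t *dst, int n) {
  for (int i = 0; i < n; ++i)
    dst[i] = src[i] + 2 * src[i + 1] + src[i + 2];
}

/* Hypothetical "optimised" version: same filter, unrolled by two. */
static void filter_opt(const uint8_t *src, int32_t *dst, int n) {
  int i = 0;
  for (; i + 2 <= n; i += 2) {
    dst[i] = src[i] + 2 * src[i + 1] + src[i + 2];
    dst[i + 1] = src[i + 1] + 2 * src[i + 2] + src[i + 3];
  }
  for (; i < n; ++i)
    dst[i] = src[i] + 2 * src[i + 1] + src[i + 2];
}

/* Small deterministic generator so that failures are reproducible. */
static uint32_t next_rand(uint32_t *state) {
  *state = *state * 1664525u + 1013904223u;
  return *state >> 24; /* value in 0..255 */
}

/* Runs both versions on random inputs; returns the number of trials
   in which their outputs differ. */
static int count_mismatches(filter_fn ref, filter_fn opt, int trials) {
  uint8_t src[MAX_LEN + 2];
  int32_t want[MAX_LEN], got[MAX_LEN];
  uint32_t state = 12345u;
  int bad = 0;
  assert(ref && opt && trials >= 0);
  for (int t = 0; t < trials; ++t) {
    int n = 1 + (int)(next_rand(&state) % MAX_LEN);
    for (int i = 0; i < n + 2; ++i) src[i] = (uint8_t)next_rand(&state);
    ref(src, want, n);
    opt(src, got, n);
    if (memcmp(want, got, (size_t)n * sizeof(want[0])) != 0) ++bad;
  }
  return bad;
}
```

A test would then require `count_mismatches(filter_ref, filter_opt, 1000) == 0`. Varying the block length on each trial exercises both the unrolled body and the tail of the optimised loop.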
    
    Change-Id: I6732bc6b2378ffaadae5aa6441100cf660f7ee11