    Add an SSE4.1 implementation of av1_highbd_convolve_2d_scale · 724d31eb
    Rupert Swarbrick authored
    For large blocks this is about 8x as fast as the C version. The code
    needs SSE4.1 for the PMULLD instruction, which we use to do SIMD 32-bit
    multiplies.
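
The snippet below is a minimal sketch of what PMULLD provides, not code from this patch. It assumes an x86 target and GCC or Clang (for the `target("sse4.1")` attribute and `__builtin_cpu_supports`). The helper names `dot_c` and `dot_sse41` are hypothetical. It computes a filter-style dot product of 32-bit values, once in scalar C and once with `_mm_mullo_epi32`, the intrinsic for PMULLD, which does four 32x32-bit multiplies per instruction and keeps the low 32 bits of each product. SSE2 has no packed 32-bit low multiply, which is why this path needs SSE4.1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <smmintrin.h> /* SSE4.1 intrinsics, including _mm_mullo_epi32 (PMULLD) */

/* Hypothetical scalar reference: sum of a[i] * b[i].
 * Inputs are assumed small enough that nothing overflows int32_t. */
static int32_t dot_c(const int32_t *a, const int32_t *b, size_t n) {
  int32_t sum = 0;
  for (size_t i = 0; i < n; ++i) sum += a[i] * b[i];
  return sum;
}

/* Hypothetical SSE4.1 version; n must be a multiple of 4.
 * The target attribute lets GCC/Clang compile this without -msse4.1;
 * only call it on CPUs that actually support SSE4.1. */
__attribute__((target("sse4.1")))
static int32_t dot_sse41(const int32_t *a, const int32_t *b, size_t n) {
  assert(n % 4 == 0);
  __m128i acc = _mm_setzero_si128();
  for (size_t i = 0; i < n; i += 4) {
    __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
    __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
    /* PMULLD: four 32x32-bit multiplies, keeping the low 32 bits of each */
    acc = _mm_add_epi32(acc, _mm_mullo_epi32(va, vb));
  }
  /* Horizontal sum: swap 64-bit halves and add, then swap adjacent lanes and add. */
  acc = _mm_add_epi32(acc, _mm_shuffle_epi32(acc, _MM_SHUFFLE(1, 0, 3, 2)));
  acc = _mm_add_epi32(acc, _mm_shuffle_epi32(acc, _MM_SHUFFLE(2, 3, 0, 1)));
  return _mm_cvtsi128_si32(acc);
}
```

A caller would normally guard the SIMD path with a runtime check such as `__builtin_cpu_supports("sse4.1")` and fall back to `dot_c` otherwise. The same idea underlies checking an optimised function against its C version.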
    
    The patch uses av1_convolve_scale_test (which was already written to
    test the low bit depth path) to check that the optimised code matches
    the C version.
    
    Change-Id: I9304d6bb3d2cb31390de93ed08ff1a852e3ace86