    AVX2 implementation of the Wiener filter · aab6aee3
    Imdad Sardharwalla authored
    Added an AVX2 version of the Wiener filter, along with associated tests. Speed
    tests have been added for all implementations of the Wiener filter.
    
    Speed Test results
    ==================
    
    GCC
    ---
    
    Low bit-depth filter:
    - SSE2 vs C: SSE2 takes ~92% less time
    - AVX2 vs C: AVX2 takes ~96% less time
    - SSE2 vs AVX2: AVX2 takes ~43% less time (~74% faster)
    
    High bit-depth filter:
    - SSSE3 vs C: SSSE3 takes ~92% less time
    - AVX2  vs C: AVX2  takes ~96% less time
    - SSSE3 vs AVX2: AVX2 takes ~46% less time (~84% faster)
    
    CLANG
    -----
    
    Low bit-depth filter:
    - SSE2 vs C: SSE2 takes ~84% less time
    - AVX2 vs C: AVX2 takes ~88% less time
    - SSE2 vs AVX2: AVX2 takes ~27% less time (~36% faster)
    
    High bit-depth filter:
    - SSSE3 vs C: SSSE3 takes ~85% less time
    - AVX2  vs C: AVX2  takes ~89% less time
    - SSSE3 vs AVX2: AVX2 takes ~24% less time (~31% faster)
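
The two figures on each "vs AVX2" line describe the same measurement in two ways: "X% less time" is a reduction in run time, while "Y% faster" is the corresponding gain in speed. A minimal sketch of the conversion follows. The function name is hypothetical and is not part of the commit. Feeding in the rounded percentages quoted above gives results within about one point of the quoted "faster" figures (for example 43% less time works out to roughly 75% faster), presumably because the quoted values were derived from unrounded timings.

```c
/* Hypothetical helper, not part of the commit: converts a
 * "takes p% less time" figure into the equivalent "q% faster" figure.
 *
 * If the new run time is (1 - p/100) of the old run time, the speed
 * ratio is 1 / (1 - p/100), so the gain in speed is
 *   q = 100 * (1 / (1 - p/100) - 1).
 * Only meaningful for 0 <= p < 100. */
static double pct_faster_from_pct_less_time(double pct_less_time) {
  const double time_ratio = 1.0 - pct_less_time / 100.0;
  return 100.0 * (1.0 / time_ratio - 1.0);
}
```

For instance, halving the run time (50% less time) corresponds to a 100% speed-up, which is why the "faster" percentage is always the larger of the two.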
    
    Change-Id: Ide22d7c09c0be61483e9682caf17a39438e4a208
aom_highbd_convolve_hip_avx2.c 11.6 KB