1. 02 Feb, 2018 1 commit
    • AVX2 implementation of the Wiener filter · aab6aee3
      Imdad Sardharwalla authored
      Added an AVX2 version of the Wiener filter, along with associated tests. Speed
      tests have been added for all implementations of the Wiener filter.
      
      Speed Test results
      ==================
      
      GCC
      ---
      
      Low bit-depth filter:
      - SSE2 vs C: SSE2 takes ~92% less time
      - AVX2 vs C: AVX2 takes ~96% less time
      - SSE2 vs AVX2: AVX2 takes ~43% less time (~74% faster)
      
      High bit-depth filter:
      - SSSE3 vs C: SSSE3 takes ~92% less time
      - AVX2  vs C: AVX2  takes ~96% less time
      - SSSE3 vs AVX2: AVX2 takes ~46% less time (~84% faster)
      
      CLANG
      -----
      
      Low bit-depth filter:
      - SSE2 vs C: SSE2 takes ~84% less time
      - AVX2 vs C: AVX2 takes ~88% less time
      - SSE2 vs AVX2: AVX2 takes ~27% less time (~36% faster)
      
      High bit-depth filter:
      - SSSE3 vs C: SSSE3 takes ~85% less time
      - AVX2  vs C: AVX2  takes ~89% less time
      - SSSE3 vs AVX2: AVX2 takes ~24% less time (~31% faster)
      
      Change-Id: Ide22d7c09c0be61483e9682caf17a39438e4a208
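
The speed results above quote each SSE2/SSSE3-versus-AVX2 comparison two ways: "takes X% less time" and "Y% faster". A minimal sketch of how the two figures relate, assuming "faster" means the gain in throughput (the helper name `less_time_to_faster` is hypothetical and not part of the commit):

```python
def less_time_to_faster(less_time_pct: float) -> float:
    """Convert 'takes X% less time' into 'is Y% faster' (throughput gain)."""
    if not 0 <= less_time_pct < 100:
        raise ValueError("less_time_pct must be in [0, 100)")
    # Fraction of the old runtime that the new code still needs.
    remaining = 1.0 - less_time_pct / 100.0
    # Throughput scales with 1 / runtime, so the gain is 1/remaining - 1.
    return (1.0 / remaining - 1.0) * 100.0


# Rounded "less time" figures taken from the SSE2/SSSE3 vs AVX2 lines above.
for pct in (43, 46, 27, 24):
    print(f"~{pct}% less time -> ~{less_time_to_faster(pct):.0f}% faster")
```

Because the inputs here are already rounded, the results land within about two percentage points of the "faster" figures quoted in the commit message, which were presumably derived from the unrounded timings.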
  2. 27 Dec, 2017 1 commit
  3. 23 May, 2017 1 commit
    • Vectorize high-precision convolve filter · 5d34e6a7
      David Barker authored
      Add SSE2 lowbd and SSSE3 highbd versions of the filters
      introduced in https://aomedia-review.googlesource.com/c/11962/ .
      
      These filters are equivalent in speed to the SSE2 implementations
      of the regular convolve filter. The average time to filter a
      64x64 block is:
      
      lowbd C: 52us
      lowbd SSE2: 5.6us
      highbd C: 53us
      highbd SSSE3: 5.8us
      
      Also add a correctness test based on the warp filter tests.
      
      Change-Id: Ia0d81100e8a414bbfc2b5f664d751cf24765299e
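
The per-block timings above can be turned into a speedup factor and a "less time" percentage. This is only a sketch using the four averages quoted in the commit message; the function name `compare` is hypothetical:

```python
def compare(baseline_us: float, optimized_us: float) -> tuple[float, float]:
    """Return (speedup factor, percent less time) for two average timings."""
    if baseline_us <= 0 or optimized_us <= 0:
        raise ValueError("timings must be positive")
    speedup = baseline_us / optimized_us
    less_time_pct = (1.0 - optimized_us / baseline_us) * 100.0
    return speedup, less_time_pct


# Average time in microseconds to filter a 64x64 block: (C, SIMD).
timings = {
    "lowbd  (C vs SSE2)": (52.0, 5.6),
    "highbd (C vs SSSE3)": (53.0, 5.8),
}

for name, (c_us, simd_us) in timings.items():
    speedup, less = compare(c_us, simd_us)
    print(f"{name}: {speedup:.1f}x speedup, {less:.0f}% less time")
```

Both vectorized versions come out at roughly a 9x speedup over the C code.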