  1. 16 Jan, 2018 1 commit
    • [CFL] SSSE3/AVX2 versions of cfl_build_prediction_hbd · c363ab76
      David Michael Barr authored
      Includes unit tests for conformance and speed.
      
      SSSE3/CFLPredictHBDTest:
      4x4: C time = 1436 us, SIMD time = 358 us (~4x)
      8x8: C time = 4821 us, SIMD time = 598 us (~8.1x)
      16x16: C time = 18528 us, SIMD time = 1793 us (~10x)
      32x32: C time = 72998 us, SIMD time = 6400 us (~11x)
      
      AVX2/CFLPredictHBDTest:
      4x4: C time = 1436 us, SIMD time = 398 us (~3.6x)
      8x8: C time = 4924 us, SIMD time = 644 us (~7.6x)
      16x16: C time = 18624 us, SIMD time = 1617 us (~12x)
      32x32: C time = 73509 us, SIMD time = 3635 us (~20x)
      
      Change-Id: Icbcfefbf165facdbd77c9b3861af2bbf464254a0
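The "~Nx" figures in these messages are consistent with dividing the C time by the SIMD time. A minimal sketch of that arithmetic follows; the helper name `speedup` is hypothetical and does not come from the test suite.

```c
#include <assert.h>

/* Speedup of a SIMD kernel over the C reference, computed from two elapsed
 * times given in the same unit (microseconds in the commit messages).
 * Hypothetical helper, for illustration only. */
static double speedup(double c_time_us, double simd_time_us) {
  assert(simd_time_us > 0.0);
  return c_time_us / simd_time_us;
}
```

For the AVX2 32x32 row above, `speedup(73509, 3635)` is a little over 20, which matches the reported "~20x".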
  2. 15 Jan, 2018 1 commit
    • Change to use an unaligned store · 533ac34d
      Yaowu Xu authored
      This fixes a segmentation fault in unit test:
      AVX2/CFLPredictTest.PredictTest/7
      
      Change-Id: I173340965f465a82019167e0964b9901683b60a8
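An aligned SIMD store can fault when the destination address is not suitably aligned, and an unaligned store removes that requirement. The sketch below illustrates the difference with 128-bit SSE2 intrinsics so that it builds without extra compiler flags; the AVX2 counterparts are `_mm256_store_si256` and `_mm256_storeu_si256`. The function name `store_8x_i16` is hypothetical, and this is not the code from the commit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Copy eight 16-bit values to dst, which may have any alignment.
 * Hypothetical helper, for illustration only. */
static void store_8x_i16(int16_t *dst, const int16_t *src) {
  assert(dst != NULL && src != NULL);
#if defined(__SSE2__)
  const __m128i v = _mm_loadu_si128((const __m128i *)src);
  /* _mm_store_si128 requires a 16-byte aligned address and may fault
   * otherwise; the "u" variant has no alignment requirement. */
  _mm_storeu_si128((__m128i *)dst, v);
#else
  memcpy(dst, src, 8 * sizeof(*dst)); /* portable fallback without SSE2 */
#endif
}
```

Calling it with a destination such as `buf + 1`, where `buf` is an `int16_t` array, exercises an address that is only guaranteed to be 2-byte aligned.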
  3. 11 Jan, 2018 1 commit
    • [CFL] SSSE3/AVX2 versions of cfl_build_prediction_lbd · 16f38c2c
      David Michael Barr authored
      Includes unit tests for conformance and speed.
      
      SSSE3/CFLPredictTest:
      4x4: C time = 2063 us, SIMD time = 313 us (~6.6x)
      8x8: C time = 6656 us, SIMD time = 493 us (~14x)
      16x16: C time = 24970 us, SIMD time = 1327 us (~19x)
      32x32: C time = 59020 us, SIMD time = 5178 us (~11x)
      
      AVX2/CFLPredictTest:
      4x4: C time = 2052 us, SIMD time = 333 us (~6.2x)
      8x8: C time = 6712 us, SIMD time = 513 us (~13x)
      16x16: C time = 25292 us, SIMD time = 1023 us (~25x)
      32x32: C time = 58994 us, SIMD time = 2828 us (~21x)
      
      Change-Id: I08690a548be981ff10e184de468b9e0e691ee812
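For context, a scalar version of the 8-bit chroma-from-luma prediction step might look like the sketch below. It assumes the destination already holds the DC prediction, that the zero-mean luma values and the scaling factor are both in Q3 fixed point, and that the luma buffer is packed with a stride equal to the block width. All names are hypothetical; this is not the libaom source.

```c
#include <assert.h>
#include <stdint.h>

/* Scale a Q3 luma AC value by a Q3 alpha, giving a Q6 product, then round
 * to the nearest integer symmetrically around zero. */
static int scale_luma_q0(int alpha_q3, int ac_q3) {
  const int scaled_q6 = alpha_q3 * ac_q3;
  return scaled_q6 < 0 ? -((-scaled_q6 + 32) >> 6) : (scaled_q6 + 32) >> 6;
}

/* Clamp to the 8-bit pixel range. */
static uint8_t clip_pixel8(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* dst holds the DC prediction on entry and the CfL prediction on exit.
 * Assumption: ac_q3 is packed with a stride equal to width. */
static void cfl_predict_lbd_sketch(const int16_t *ac_q3, uint8_t *dst,
                                   int dst_stride, int alpha_q3, int width,
                                   int height) {
  assert(width > 0 && height > 0 && dst_stride >= width);
  for (int j = 0; j < height; j++) {
    for (int i = 0; i < width; i++) {
      dst[i] = clip_pixel8(dst[i] + scale_luma_q0(alpha_q3, ac_q3[i]));
    }
    dst += dst_stride;
    ac_q3 += width;
  }
}
```

Each output pixel depends only on its own inputs, which is the kind of loop that maps well onto SSSE3 and AVX2 lanes.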
  4. 08 Jan, 2018 1 commit
    • [CFL] SSSE3/AVX2 versions of luma_subsampling_420_lbd · 9bd42785
      Luc Trudeau authored
      Includes unit tests for conformance and speed.
      
      SSSE3/SubsampleSpeedTest:
      4x4: C time = 868 us, SIMD time = 200 us (~4.3x)
      8x8: C time = 3054 us, SIMD time = 293 us (~10x)
      16x16: C time = 11887 us, SIMD time = 760 us (~16x)
      
      AVX2/SubsampleSpeedTest:
      4x4: C time = 784 us, SIMD time = 205 us (~3.8x)
      8x8: C time = 2774 us, SIMD time = 307 us (~9x)
      16x16: C time = 10978 us, SIMD time = 489 us (~22x)
      
      Change-Id: I7d5958097542599d57d1a9f9a0a1b809c6a345b0
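As a rough illustration of what 4:2:0 luma subsampling computes, the scalar sketch below sums each 2x2 block of 8-bit luma pixels and doubles the result, which equals the block average scaled to Q3 (average times 8). The Q3 output format and packed output layout are assumptions, and the function name is hypothetical; this is not the libaom source.

```c
#include <assert.h>
#include <stdint.h>

/* Subsample an 8-bit luma block by 2 in each direction.
 * Each output is (sum of a 2x2 block) * 2, i.e. the 2x2 average in Q3.
 * Assumption: output_q3 is packed with a stride of width / 2. */
static void subsample_420_lbd_sketch(const uint8_t *input, int input_stride,
                                     int16_t *output_q3, int width,
                                     int height) {
  assert(width > 0 && height > 0 && width % 2 == 0 && height % 2 == 0);
  for (int j = 0; j < height; j += 2) {
    const uint8_t *top = input + j * input_stride;
    const uint8_t *bot = top + input_stride;
    for (int i = 0; i < width; i += 2) {
      const int sum = top[i] + top[i + 1] + bot[i] + bot[i + 1];
      output_q3[i >> 1] = (int16_t)(sum << 1);
    }
    output_q3 += width >> 1;
  }
}
```

Keeping three fractional bits avoids a rounding step here and defers it to the final prediction.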
  5. 21 Dec, 2017 1 commit
    • [CFL] SSE2/AVX2 versions of subtract_average · b4faea73
      Luc Trudeau authored
      Includes unit tests for conformance and speed.
      
      SSE2/CFLAverageSpeedTest:
      4x4: C time = 499 us, SIMD time = 156 us (~3.2x)
      8x8: C time = 1124 us, SIMD time = 221 us (~5.1x)
      16x16: C time = 4228 us, SIMD time = 620 us (~6.8x)
      32x32: C time = 8743 us, SIMD time = 2236 us (~3.9x)
      
      AVX2/CFLAverageSpeedTest:
      4x4: C time = 482 us, SIMD time = 180 us (~2.7x)
      8x8: C time = 1007 us, SIMD time = 227 us (~4.4x)
      16x16: C time = 3471 us, SIMD time = 324 us (~11x)
      32x32: C time = 8758 us, SIMD time = 1443 us (~6.1x)
      
      Change-Id: Id5ae80142a9764f388c0770ebcff4e46fa3a4dad
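A scalar reference for the subtract-average step could be sketched as follows: compute the rounded mean of the block and subtract it from every sample so the block becomes zero-mean. The sketch assumes non-negative inputs in a packed buffer and uses a division; since block dimensions are powers of two, a shift would serve equally well. The function name is hypothetical and this is not the libaom source.

```c
#include <assert.h>
#include <stdint.h>

/* Subtract the rounded block average from every sample, in place.
 * Assumptions: buf_q3 is packed (stride == width) and holds non-negative
 * values on entry. */
static void subtract_average_sketch(int16_t *buf_q3, int width, int height) {
  assert(width > 0 && height > 0);
  const int num_pel = width * height;
  int sum = num_pel / 2; /* rounding offset */
  for (int i = 0; i < num_pel; i++) sum += buf_q3[i];
  const int avg = sum / num_pel;
  for (int i = 0; i < num_pel; i++) buf_q3[i] = (int16_t)(buf_q3[i] - avg);
}
```

The summation is a reduction across the whole block, which may help explain why the measured speedups for this function are lower than for the per-pixel kernels.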