1. 02 Feb, 2018 2 commits
  2. 31 Jan, 2018 1 commit
  3. 29 Jan, 2018 1 commit
  4. 26 Jan, 2018 1 commit
  5. 17 Jan, 2018 1 commit
    • SIMD implementation of horz superres · 454697ca
      Imdad Sardharwalla authored
      SSE4.1 implementations of av1_convolve_horiz_rs and
      av1_highbd_convolve_horiz_rs have been added, along
      with the corresponding speed and correctness tests.
      
      The interp_taps argument was defunct and has now been
      removed and replaced with the UPSCALE_NORMATIVE_TAPS
      macro.
      
      Code associated with values of UPSCALE_NORMATIVE_TAPS
      that are no longer used has been removed.
      
      Change-Id: Ie74d8ca479a70c8d473ac12883cfe4f10b37a66d
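
As a rough illustration of what a horizontal resampling convolution of this kind computes, here is a minimal scalar sketch. It is not the libaom source: the function name `convolve_horiz_rs_sketch`, the parameter list, and all `SK_` constants (8 taps, 7-bit filter precision, Q14 positions, 64 sub-pixel phases) are assumptions chosen for the example.

```c
#include <stdint.h>

#define SK_TAPS 8         /* assumed value of UPSCALE_NORMATIVE_TAPS */
#define SK_FILTER_BITS 7  /* assumed: the taps of each filter sum to 128 */
#define SK_POS_BITS 14    /* assumed fixed-point precision of positions */
#define SK_PHASE_BITS 6   /* assumed: 64 sub-pixel phases */

static uint8_t sk_clip_pixel(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Hypothetical scalar horizontal resampler. x0_qn is the start position and
 * x_step_qn the per-output-pixel step, both in Q14 source-pixel units.
 * The caller must provide SK_TAPS/2 - 1 valid pixels to the left of src and
 * SK_TAPS/2 to the right of the last sample read. */
void convolve_horiz_rs_sketch(const uint8_t *src, int src_stride,
                              uint8_t *dst, int dst_stride, int w, int h,
                              const int16_t *x_filters, int x0_qn,
                              int x_step_qn) {
  src -= SK_TAPS / 2 - 1; /* centre the filter window on the sample */
  for (int y = 0; y < h; ++y) {
    int x_qn = x0_qn;
    for (int x = 0; x < w; ++x) {
      const uint8_t *src_x = &src[x_qn >> SK_POS_BITS];
      /* The top SK_PHASE_BITS of the fractional part select the filter. */
      const int phase = (x_qn & ((1 << SK_POS_BITS) - 1)) >>
                        (SK_POS_BITS - SK_PHASE_BITS);
      const int16_t *filter = &x_filters[phase * SK_TAPS];
      int sum = 0;
      for (int k = 0; k < SK_TAPS; ++k) sum += src_x[k] * filter[k];
      /* Round to nearest, drop the filter precision, clamp to 8 bits. */
      dst[x] = sk_clip_pixel((sum + (1 << (SK_FILTER_BITS - 1))) >>
                             SK_FILTER_BITS);
      x_qn += x_step_qn;
    }
    src += src_stride;
    dst += dst_stride;
  }
}
```

A SIMD version would typically evaluate several output pixels per iteration; the scalar loop above is the kind of reference a correctness test compares against.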
  6. 15 Jan, 2018 1 commit
  7. 12 Jan, 2018 1 commit
    • Added AVX2 implementation of self-guided filter · c6acc531
      Imdad Sardharwalla authored
      The self-guided filter has now been implemented using
      the intrinsics for AVX2. The corresponding speed and
      correctness tests have also been added.
      
      Note: All AVX2 functions are in synonyms_avx2.h, as
      GCC produces 'ABI change' warnings if they are
      included in synonyms.h.
      
      Change-Id: I2a283a4acf8c01ee835d5edc526abc242d87ad9b
  8. 08 Jan, 2018 1 commit
    • [CFL] SSSE3/AVX2 versions of luma_subsampling_420_lbd · 9bd42785
      Luc Trudeau authored
      Includes unit tests for conformance and speed.
      
      SSSE3/SubsampleSpeedTest:
      4x4: C time = 868 us, SIMD time = 200 us (~4.3x)
      8x8: C time = 3054 us, SIMD time = 293 us (~10x)
      16x16: C time = 11887 us, SIMD time = 760 us (~16x)
      
      AVX2/SubsampleSpeedTest:
      4x4: C time = 784 us, SIMD time = 205 us (~3.8x)
      8x8: C time = 2774 us, SIMD time = 307 us (~9x)
      16x16: C time = 10978 us, SIMD time = 489 us (~22x)
      
      Change-Id: I7d5958097542599d57d1a9f9a0a1b809c6a345b0
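
For orientation, the scalar operation being vectorized can be sketched as follows. This is a minimal sketch, not the libaom source: the function name `subsample_420_lbd_sketch`, its parameters, and the Q3 output scaling (the sum of each 2x2 luma block shifted left by one, i.e. the average times 8) are assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical scalar reference for 4:2:0 luma subsampling.
 * Each output sample covers a 2x2 block of 8-bit luma pixels.
 * Assumption: the output is the 2x2 average scaled by 8 (Q3), i.e. sum << 1.
 * width and height are the luma block dimensions and must be even. */
void subsample_420_lbd_sketch(const uint8_t *input, int input_stride,
                              int16_t *output_q3, int output_stride,
                              int width, int height) {
  for (int j = 0; j < height; j += 2) {
    const uint8_t *top = input + j * input_stride;
    const uint8_t *bot = top + input_stride;
    for (int i = 0; i < width; i += 2) {
      const int sum = top[i] + top[i + 1] + bot[i] + bot[i + 1];
      output_q3[(j >> 1) * output_stride + (i >> 1)] = (int16_t)(sum << 1);
    }
  }
}
```

The inner loop reads adjacent pixel pairs from two rows, which is the access pattern that pairwise-add SIMD instructions are well suited to.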
  9. 06 Jan, 2018 1 commit
  10. 03 Jan, 2018 1 commit
    • Sync configure/make and cmake on daala_tx · 14fb1af6
      Yaowu Xu authored
      The two build systems treat inclusion of daala_tx-related source files
      differently; this commit makes them consistent.
      
      This fixes the unused object file warnings in the MSVC build.
      
      Change-Id: Ic7d098bcc580cb021706154ab35e0ec83b25394e
  11. 27 Dec, 2017 2 commits
  12. 23 Dec, 2017 1 commit
  13. 22 Dec, 2017 2 commits
  14. 21 Dec, 2017 2 commits
    • [CFL] SSE2/AVX2 versions of subtract_average · b4faea73
      Luc Trudeau authored
      Includes unit tests for conformance and speed.
      
      SSE2/CFLAverageSpeedTest:
      4x4: C time = 499 us, SIMD time = 156 us (~3.2x)
      8x8: C time = 1124 us, SIMD time = 221 us (~5.1x)
      16x16: C time = 4228 us, SIMD time = 620 us (~6.8x)
      32x32: C time = 8743 us, SIMD time = 2236 us (~3.9x)
      
      AVX2/CFLAverageSpeedTest:
      4x4: C time = 482 us, SIMD time = 180 us (~2.7x)
      8x8: C time = 1007 us, SIMD time = 227 us (~4.4x)
      16x16: C time = 3471 us, SIMD time = 324 us (~11x)
      32x32: C time = 8758 us, SIMD time = 1443 us (~6.1x)
      
      Change-Id: Id5ae80142a9764f388c0770ebcff4e46fa3a4dad
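
The operation named in the title, removing the block average, can be sketched in scalar form as below. This is an illustrative sketch under stated assumptions, not the libaom implementation: the name `subtract_average_sketch`, the contiguous buffer layout, the power-of-two block size, and non-negative input samples are all assumed.

```c
#include <stdint.h>

/* Hypothetical scalar reference: remove the block mean (DC) from a
 * width x height block of 16-bit samples stored contiguously.
 * Assumptions: width * height is a power of two, so the rounded average
 * is a shift by log2(width * height), and all samples are non-negative. */
void subtract_average_sketch(int16_t *buf, int width, int height) {
  const int n = width * height;
  int num_pel_log2 = 0;
  while ((1 << num_pel_log2) < n) ++num_pel_log2;

  int sum = 0;
  for (int i = 0; i < n; ++i) sum += buf[i];

  /* Rounded average: add half the divisor before shifting. */
  const int round_offset = n >> 1;
  const int avg = (sum + round_offset) >> num_pel_log2;

  for (int i = 0; i < n; ++i) buf[i] = (int16_t)(buf[i] - avg);
}
```

Both passes (the sum and the subtraction) are simple element-wise loops, so the cost is dominated by memory traffic on larger blocks.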
    • Remove CDEF_SINGLEPASS defines · 8322ff04
      Steinar Midtskogen authored
      The experiment has been adopted and enabled by default for a while.
      The alternative code path has not been maintained for a long time
      and is now removed.
      
      Change-Id: Iaf22f2969b45b71b2bf67707e131ab4c439b7fa6
  15. 18 Dec, 2017 1 commit
    • JNT_COMP: highbd SIMD implementation · a50f9f57
      Cheng Chen authored
      Add av1/common/x86/highbd_convolve_2d_sse4.c,
      specifically the function av1_highbd_jnt_convolve_2d.
      
      Change-Id: I1bfe0431c793bd5f78c30bc763aa7691e5d74b04
  16. 17 Dec, 2017 1 commit
    • daala_tx: Unify the asym and ortho DST designs. · b2f82ebd
      Nathan E. Egge authored
      This patch refactors the DST transforms so that the orthonormal and
       asymmetric transforms are now nearly identical (up to multiplication
       constants and an extra set of shifts).
      This means that the DST designs are now embeddable for every level
       and should address hardware concerns about gate area.
      
      In addition, minor changes were made to improve transform accuracy:
      
       - all of the transforms now have perfect reconstruction for those
          computations outside the rotations, i.e., all +/- butterfly steps
          are exactly invertible
       - two multiplication constants were reduced below 1.0 (better for
          SIMD and gives slightly improved accuracy)
       - the averaging bias is removed which saves an extra addition for each
          of the averaging steps
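
The claim that +/- butterfly steps are exactly invertible can be illustrated with a generic integer lifting step. The sketch below shows the general lifting technique, not the Daala transform code; the function names are hypothetical, and the shift is applied without any rounding offset.

```c
#include <stdint.h>

/* Forward lifting butterfly: maps (a, b) to an (average-like, difference)
 * pair using integers only. The shift discards one bit, yet the step stays
 * invertible because the inverse recomputes exactly the same shifted term.
 * Note: >> on a negative value is implementation-defined in C; the round
 * trip holds either way since both directions use the same operation. */
void butterfly_fwd_sketch(int32_t *a, int32_t *b) {
  *b = *a - *b;        /* difference */
  *a = *a - (*b >> 1); /* average-like term, no rounding offset added */
}

/* Inverse: undo the two lifting steps in reverse order. */
void butterfly_inv_sketch(int32_t *a, int32_t *b) {
  *a = *a + (*b >> 1);
  *b = *a - *b;
}
```

Because each step only adds or subtracts a value that the inverse can recompute from what it already holds, no information is lost to rounding.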
      
      Additional averaging steps can be removed from the 8-point Type-IV DST,
       giving a 68% reduction in MSE for the 32-point DCT, but this has not
       been done in case we use it in place of the 8-point Type-VII DST.
      
      subset-1:
      
      master-daala_tx@2017-12-10T22:38:19.651Z ->
       new-daala_tx@2017-12-10T22:37:50.844Z
      
        PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      0.0057 | -0.0210 | -0.1821 |   0.0085 | -0.0002 |  0.0147 |    -0.0674
      
      Change-Id: Ib124eebf6f2e4b3c51c078d4e8f229fc5ec26171
  17. 14 Dec, 2017 2 commits
    • Move encoder-only transform code to encoder/ · 2314566a
      Urvang Joshi authored
      Update make files, include paths etc.
      
      Change-Id: I78153b28890c7610d65c846eb72cb9dacd30bc2e
    • round_shift_array: Use SSE4 version everywhere. · 1ac47a7c
      Urvang Joshi authored
      Usage of CPU by round_shift_array goes from 2.01% to 1.04%.
      Overall encoding is slightly faster (~0.05%).
      
      This means some of the intermediate arrays have to be aligned.
      Also, these functions were moved to common header/source files.
      
      BUG=aomedia:1106
      
      Change-Id: I492c9b1f2e7339c6cb83cfe68a61218642654d1b
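
For readers unfamiliar with the operation, a scalar round-shift over an array can be sketched as below. This is a hedged sketch, not the libaom code: the name `round_shift_array_sketch` and the handling of negative `bit` as a left scaling are assumptions for illustration.

```c
#include <stdint.h>

/* Hypothetical scalar reference.
 *   bit > 0 : rounded right shift by bit
 *   bit < 0 : multiply by 2^(-bit)
 *   bit == 0: no-op
 * Note: >> on a negative value is implementation-defined in C; mainstream
 * compilers perform an arithmetic shift. */
void round_shift_array_sketch(int32_t *arr, int size, int bit) {
  if (bit == 0) return;
  if (bit > 0) {
    const int32_t round = (int32_t)1 << (bit - 1);
    for (int i = 0; i < size; ++i) arr[i] = (arr[i] + round) >> bit;
  } else {
    for (int i = 0; i < size; ++i) arr[i] = arr[i] * ((int32_t)1 << (-bit));
  }
}
```

A SIMD version processes several 32-bit lanes per instruction, and aligned loads and stores on such lanes are one reason intermediate buffers may need alignment.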
  18. 13 Dec, 2017 3 commits
  19. 06 Dec, 2017 1 commit
  20. 05 Dec, 2017 1 commit
  21. 04 Dec, 2017 1 commit
    • daala_tx: Add inverse TX SIMD dispatch · 18c803fa
      Timothy B. Terriberry authored
      This just adds a top-level daala_inv_txfm_add_avx2(), but no actual
      SIMD functions yet. It dispatches back to the C version for all TX
      types and sizes for the moment.
      
      Change-Id: I7a578a4af363f989615d01ea67ce031d8ceff977
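
The pattern described here, a SIMD entry point that falls back to C until kernels exist, might look roughly like the sketch below. Every name in it is hypothetical, and the "C path" is a trivial stand-in (add and clamp) rather than an actual inverse transform.

```c
#include <stddef.h>
#include <stdint.h>

enum { SKETCH_TX_SIZES = 4 }; /* hypothetical number of transform sizes */

typedef void (*inv_txfm_add_fn)(const int32_t *coeffs, uint8_t *dst, int n);

/* Stand-in for the C reference path: adds residuals to dst with clamping.
 * It is NOT a transform; it only gives the dispatcher something to call. */
void inv_txfm_add_c_sketch(const int32_t *coeffs, uint8_t *dst, int n) {
  for (int i = 0; i < n; ++i) {
    const int32_t v = dst[i] + coeffs[i];
    dst[i] = (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
  }
}

/* Table of SIMD kernels, one per transform size. All entries are NULL,
 * mirroring a dispatcher that has no SIMD kernels yet. */
static const inv_txfm_add_fn simd_kernels_sketch[SKETCH_TX_SIZES] = {
  NULL, NULL, NULL, NULL
};

/* Top-level entry point: use a SIMD kernel when one is registered for the
 * requested size, otherwise dispatch back to the C version. */
void inv_txfm_add_dispatch_sketch(int tx_size, const int32_t *coeffs,
                                  uint8_t *dst, int n) {
  inv_txfm_add_fn fn = NULL;
  if (tx_size >= 0 && tx_size < SKETCH_TX_SIZES) {
    fn = simd_kernels_sketch[tx_size];
  }
  if (fn == NULL) fn = inv_txfm_add_c_sketch;
  fn(coeffs, dst, n);
}
```

With this shape, later commits can fill in table entries one size at a time without touching callers.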
  22. 28 Nov, 2017 1 commit
  23. 21 Nov, 2017 1 commit
  24. 17 Nov, 2017 2 commits
  25. 14 Nov, 2017 2 commits
    • Simplify Daala inverse TX toplevel for constant shift · 359854fe
      Monty Montgomery authored
      Rather than backing out all the LGT-related shifting matrices
      throughout the existing TX code, separate out and simplify Daala
      inverse TX into a single dedicated entry point.  When DAALA_TX is
      enabled, CONFIG_HIGHBITDEPTH is also forced, and all of Daala TX
      (lowbd and highbd) uses this single TX dispatch.
      
      This patch contains purely non-functional changes.
      
      subset 1:
      monty-TXtesting-fwd-s1@2017-11-12T05:25:09.557Z ->
       monty-TXtesting-inv-s1@2017-11-12T05:25:43.878Z
      
        PSNR | PSNR Cb | PSNR Cr | PSNR HVS |   SSIM | MS SSIM | CIEDE 2000
      0.0000 |  0.0000 |  0.0000 |   0.0000 | 0.0000 |  0.0000 |     0.0000
      
      objective-1-fast:
      monty-TXtesting-fwd-o1f@2017-11-12T05:25:29.386Z ->
       monty-TXtesting-inv-o1f@2017-11-12T05:25:58.897Z
      
        PSNR | PSNR Cb | PSNR Cr | PSNR HVS |   SSIM | MS SSIM | CIEDE 2000
      0.0000 |  0.0000 |  0.0000 |   0.0000 | 0.0000 |  0.0000 |     0.0000
      
      Change-Id: I790e8d7ac08eb214eb712f5441d6e5f76ebddf17
    • JNT_COMP: highbd simd and unit tests · cce312fb
      Cheng Chen authored
      Change-Id: I2c913198b7ad136cdf15d4af86b9b0b9e6850b72
  26. 13 Nov, 2017 1 commit
    • JNT_COMP: SIMD for av1_warp_affine · fbaf5135
      Cheng Chen authored
      Add low bit-depth SIMD function for av1_warp_affine based on
      existing SIMD implementation.
      Unit tests are added.
      
      Change-Id: I1b4033fa75b53a81cb20a4bb5cc60413708b568c
  27. 12 Nov, 2017 1 commit
    • Simplify Daala forward TX toplevel for constant shift · a2d40a39
      Monty Montgomery authored
      Rather than backing out all the LGT-related shifting matrices
      throughout the existing TX code, separate out and simplify Daala
      forward TX into a single dedicated entry point.  When DAALA_TX is
      enabled, CONFIG_HIGHBITDEPTH is also forced, and all of Daala TX
      (lowbd and highbd) uses this single TX dispatch.
      
      At present, this should result in no effective functional change.
      However, rectangular transforms are now always column-first, which
      has minor rounding effects.
      
      subset 1:
      monty-daalaTX-fulltest-DaalaRDO-s1@2017-11-07T00:02:56.282Z ->
       monty-daalaTX-fulltest-fwd-s1@2017-11-07T03:08:55.478Z
      
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -0.0576 |     N/A | -0.2646 |  -0.0125 | -0.0439 | -0.0479 |    -0.1798
      
      objective 1 fast:
      monty-daalaTX-fulltest-DaalaRDO-o1f4@2017-11-07T05:59:50.180Z ->
       monty-daalaTX-fulltest-fwd-o1f4@2017-11-07T06:00:08.500Z
      
        PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      0.0036 |  0.0477 |  0.1132 |   0.0863 | -0.0017 |  0.0209 |     0.0240
      
      Change-Id: I182a5c4388c410cbea8810e2f9e36fd37a4a46e5
  28. 11 Nov, 2017 1 commit
    • Remove experimental flag of CDEF · 1aeee2e9
      Frederic Barbier authored
      This experiment has been adopted, so we can simplify the code
      by dropping the associated preprocessor conditionals.
      
      Change-Id: I17bd46ebad7796d04fb4065fb36da0e1c4eeaf9b
  29. 09 Nov, 2017 1 commit
    • cmake: Silence ranlib warnings when jnt_comp is disabled. · 2ca19f90
      Tom Finegan authored
      Build aom_dsp/x86/variance_ssse3.c and
      av1/common/x86/convolve_2d_sse4.c only when jnt_comp is enabled.
      
      When jnt_comp is disabled the sources define no symbols and
      ranlib complains.
      
      Change-Id: I3de42be6a88bd65459799d3a523e307c40a36a72
  30. 06 Nov, 2017 1 commit
  31. 04 Nov, 2017 1 commit