1. 09 Mar, 2017 2 commits
    • Add SSE4.1 highbitdepth self-guided filter · 4d2af5db
      David Barker authored
      Performance is very similar to the lowbd path (only 4-5% slower)
      
      Change-Id: Ifdb272c3f6c0e6f41e7046cc49497c72b5a796d9
    • Avoid out-of-range memory access · 7e9f59e0
      Yaowu Xu authored
      This commit increases the size of a few heap allocations to ensure that
      later accesses are not out of bounds.
      
      BUG=aomedia:383
      
      Change-Id: Iadb08faa1e55be361dd3d4adaafeb85cecf23bbb
  2. 08 Mar, 2017 2 commits
    • Make encoder use vectorized self-guided filter · 506eb723
      David Barker authored
      By rearranging the code in restoration.c, we can allow the
      encoder to use the SSE4.1 version of the self-guided filter
      while picking the loop-restoration filter.
      
      This also helps us prepare for adding a highbitdepth SSE4.1
      version of the self-guided filter.
      
      No effect on encoder output, but gives an end-to-end speedup
      of 1-2%.
      
      Change-Id: Id17ba4a0963ddce9f70a7cae666e212e138d5f2c
    • Handle non-multiple-of-4 widths in SSE4.1 self-guided filter · 5765fad5
      David Barker authored
      Adjust the vectorized filter so that it can handle tile widths
      which are not a multiple of 4, so we do not have to fall back
      to the C version of the filter.
      
      Negligible speed impact for tiles with widths which are multiples
      of 4, and greatly improves speed on tiles with non-multiple-of-4
      widths.
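The commit message does not say how the SSE4.1 code handles the leftover columns. One common pattern, shown here only as a sketch, is a main loop that processes 4 pixels per iteration (one 4 x 32-bit vector) followed by a short tail loop for the remaining width % 4 pixels; masked loads or padded buffers are other options. The function name add_rows_by4 is hypothetical, and the inner 4-iteration loop stands in for a single vector operation so the example stays portable.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: out[j] = a[j] + b[j] for one row of a tile.
   The main loop covers the largest multiple of 4 columns; the tail loop
   covers the remaining 0-3 columns, so no full fallback to C is needed. */
void add_rows_by4(const int32_t *a, const int32_t *b, int32_t *out,
                  int width) {
  assert(width >= 0);
  int j = 0;
  for (; j + 4 <= width; j += 4) {
    /* In real SIMD code this body would be one 4-lane vector add. */
    for (int k = 0; k < 4; ++k) out[j + k] = a[j + k] + b[j + k];
  }
  /* Tail: leftover pixels when width is not a multiple of 4. */
  for (; j < width; ++j) out[j] = a[j] + b[j];
}
```

For a width of 7, the main loop runs once (columns 0-3) and the tail loop handles columns 4-6.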
      
      Change-Id: Iae9d14f812c52c6f66910d27da1d8e98930df7ba
  3. 07 Mar, 2017 1 commit
  4. 06 Mar, 2017 2 commits
    • Disable sse2 for inverse tx with emulate-hardware · 2f103aad
      Sarah Parker authored
      This fixes compile errors when both rect-tx and emulate-hardware are
      enabled.
      
      Change-Id: I19125916bc90caf348caefe906335f9b765a2487
    • Vectorize self-guided filter · ce110cc5
      David Barker authored
      Add an SSE4.1 lowbd version of the self-guided filter for
      loop-restoration, and apply some optimizations to the C
      version.
      
      Approximate times per 128x128 / 256x256 tile on the machine
      this was developed on:
      Previous C:  620us / 2800us
      Optimized C: 500us / 2200us ( 24% /  27% faster)
      SSE4.1:      147us / 600us  (320% / 370% faster)
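The percentages in the timing table can be reproduced from the raw times, assuming "X% faster" means (old_time / new_time - 1) * 100 with "Previous C" as the baseline for both rows. The sketch below is only a check of that arithmetic; percent_faster is a hypothetical name.

```c
#include <assert.h>

/* Speedup as "percent faster", assuming the commit message means
   (old_time / new_time - 1) * 100. Times may be in any common unit. */
double percent_faster(double old_us, double new_us) {
  assert(old_us > 0.0 && new_us > 0.0);
  return (old_us / new_us - 1.0) * 100.0;
}
```

With these inputs, percent_faster(620, 500) is about 24.0 and percent_faster(2800, 2200) about 27.3, matching the optimized-C row. percent_faster(620, 147) is about 321.8 and percent_faster(2800, 600) about 366.7, which the table rounds to 320% and 370%.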
      
      Change-Id: I23ff5a5482a191aeb06f9d1f767a9f036bb357fe
  5. 03 Mar, 2017 1 commit
  6. 02 Mar, 2017 2 commits
    • Remove ASM_REGISTER_STATE_CHECK when testing v64/v128/v256 intrinsics · c20176e5
      Steinar Midtskogen authored
      Since the tested functions are always forced inline in regular use,
      ASM_REGISTER_STATE_CHECK doesn't make sense on this level (the test
      should rather be applied to unit tests checking functions making use
      of these inlined functions).  The test fails on Win64 because the
      Win64 ABI requires xmm6 to xmm15 to be preserved across function
      calls, but the ABI is only relevant for non-inlined functions.
      
      BUG=aomedia:371
      
      Change-Id: Icb795083f69465cf09ec8f6871899943efaeaab8
    • Use 3-tap spatial filter in FILTER_INTRA experiment · 8d8638a1
      Yue Chen authored
      3-tap recursive intra prediction filters are added.
      The macro USE_3TAP_INTRA_FILTER is set to 1, so 3-tap is used by default.
      Coding gain of the FILTER_INTRA experiment in AWCY (high delay, 150 frames):
      3-tap: 0.51%
      4-tap: 0.68%
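A recursive intra filter predicts each pixel from neighbours that were themselves just predicted, so the prediction propagates across the block. The sketch below is a hypothetical 3-tap version using the left, above and above-left neighbours with integer coefficients at an assumed 10-bit precision; it does not reproduce the experiment's actual taps, coefficients, or any mean subtraction, and a 4-tap variant would add one more neighbour (for example, above-right).

```c
#include <assert.h>
#include <stdint.h>

#define HYP_FILTER_BITS 10 /* hypothetical coefficient precision */
#define HYP_MAX_BS 32      /* hypothetical maximum block size */

static int clip_u8(int v) { return v < 0 ? 0 : (v > 255 ? 255 : v); }

/* Hypothetical 3-tap recursive predictor for a bs x bs block.
   coef[0] weights the left neighbour, coef[1] the above neighbour and
   coef[2] the above-left neighbour. Inside the block, those neighbours are
   previously predicted pixels; on the edges they are reconstructed pixels.
   Right-shifting a negative sum is implementation-defined in C, so filters
   with negative taps would need extra care. */
void recursive_intra_3tap(uint8_t *dst, int stride, int bs,
                          const uint8_t *above, const uint8_t *left,
                          uint8_t above_left, const int coef[3]) {
  assert(bs > 0 && bs <= HYP_MAX_BS);
  int buf[HYP_MAX_BS + 1][HYP_MAX_BS + 1];
  buf[0][0] = above_left;
  for (int i = 0; i < bs; ++i) {
    buf[0][i + 1] = above[i];
    buf[i + 1][0] = left[i];
  }
  for (int r = 1; r <= bs; ++r) {
    for (int c = 1; c <= bs; ++c) {
      const int sum = coef[0] * buf[r][c - 1] + coef[1] * buf[r - 1][c] +
                      coef[2] * buf[r - 1][c - 1];
      buf[r][c] = clip_u8((sum + (1 << (HYP_FILTER_BITS - 1))) >>
                          HYP_FILTER_BITS);
      dst[(r - 1) * stride + (c - 1)] = (uint8_t)buf[r][c];
    }
  }
}
```

If the coefficients sum to 1 << HYP_FILTER_BITS and every edge pixel has the same value, every predicted pixel keeps that value.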
      
      Change-Id: I44192dd08bfd8155f58a9b0b5cf1de88fceb762e
  7. 01 Mar, 2017 1 commit
    • simd_cmp_impl,TestSimd*Arg: break on failure · 8c636c12
      James Zern authored
      Check for googletest failures as well as mismatches. This greatly
      reduces the error output and the time to failure.
      
      BUG=aomedia:371
      
      Change-Id: Ic617905430a8ec39fbee2af9ce6655a8ef6796c0
  8. 27 Feb, 2017 1 commit
  9. 24 Feb, 2017 1 commit
    • Let hbd conv func be flexible · 0a2c0cbc
      Angie Chiang authored
      This CL allows us to easily change the filter coefficients used by the
      SIMD implementations of the high-bitdepth convolution functions.
      
      Change-Id: I454a5c76d3ba9e4454118c6a9d87737b3aa24898
  10. 23 Feb, 2017 1 commit
    • SMOOTH_PRED: Use 12-bit multiplications instead of 18-bit. · 81760810
      Urvang Joshi authored
      Compression performance is roughly neutral:
      
      AWCY:
      -----
                       High Latency     Low Latency
        All Keyframes  0.00             0.00
        Video overall  0.01            -0.01
      
      Google sets:
      ------------
      
      - All Keyframes:
      
        lowres  -0.001
        midres   0.000
        hdres    0.001
      
      - Video overall:
        lowres   0.019
        midres   0.000
        hdres   -0.013
      
      Change-Id: I89be2739203bf3e2848e4ba7ae2988c625f54513
  11. 22 Feb, 2017 3 commits
  12. 18 Feb, 2017 1 commit
  13. 17 Feb, 2017 1 commit
    • InterpFilter type: Create an enum. · a9b174bd
      Urvang Joshi authored
      We use a single enum instead of multiple #defines:
      - Ensures better type checking.
      - Enum values are generated implicitly, so hard-coded #defines are not
        required.
      - We use ATTRIBUTE_PACKED to indicate that the enum should still use the
        smallest integral type.
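A minimal sketch of the pattern described above. The ATTRIBUTE_PACKED definition here is an assumption (GCC/Clang `__attribute__((packed))`, empty elsewhere), and the filter names and the INTERP_FILTERS_COUNT sentinel are illustrative rather than copied from the tree. On GCC and Clang, the packed attribute on an enum selects the smallest integral type that can hold its values.

```c
#include <assert.h>

/* Assumed definition: GCC/Clang support the packed attribute on enums;
   other compilers get an empty macro and the default enum size. */
#if defined(__GNUC__)
#define ATTRIBUTE_PACKED __attribute__((packed))
#else
#define ATTRIBUTE_PACKED
#endif

/* Illustrative filter names. Values 0, 1, 2, ... are assigned implicitly
   instead of being hard-coded with one #define per filter. */
typedef enum ATTRIBUTE_PACKED {
  EIGHTTAP_REGULAR,
  EIGHTTAP_SMOOTH,
  MULTITAP_SHARP,
  BILINEAR,
  INTERP_FILTERS_COUNT /* hypothetical sentinel: number of filters */
} InterpFilter;

/* Compile-time check that the implicit numbering is what we expect. */
static_assert(INTERP_FILTERS_COUNT == 4, "filter values are implicit");
```

With GCC or Clang, sizeof(InterpFilter) is 1 here, so storing the filter type in per-block structures costs a single byte.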
      
      Change-Id: I7532428da31744d3441b363bd932a7f233ee7ab5
  14. 13 Feb, 2017 4 commits
  15. 10 Feb, 2017 2 commits
    • Retune the CLPF kernel · 4f0b3ed8
      Steinar Midtskogen authored
      CLPF performance had degraded by about 0.5% over the past six months,
      which isn't totally surprising since the codec is a moving target.
      About half of that degradation comes from the improved 7-bit filter
      coefficients.  Therefore, CLPF needs to be retuned for the current
      codec.
      
      This patch makes two (normative) changes to the CLPF kernel:
      
      * The clipping function was changed from clamp(x, -s, s) to
            sign(x) * max(0, abs(x) - max(0, abs(x) - s +
                   (abs(x) >> (bitdepth - 3 - log2(s)))))
        This adds a ramp-down to 0 at -32 and 32 for 8-bit content (-128 and
        128 for 10-bit, etc.), so large differences are ignored.
      
      * 8 taps instead of 6 taps:
                     1
          4          3
        13 31  ->  13 31
          4          3
                     1
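The new clipping function from the first bullet can be written directly in C. This sketch assumes the strength s is a power of two (so log2(s) is an integer) and that bitdepth - 3 - log2(s) is non-negative; clpf_constrain is a hypothetical name, not necessarily the function used in the codec.

```c
#include <assert.h>
#include <stdlib.h>

/* Integer log2, assuming v is a power of two (1, 2, 4, ...). */
static int ilog2_pow2(int v) {
  int n = 0;
  while (v > 1) {
    v >>= 1;
    ++n;
  }
  return n;
}

static int imax(int a, int b) { return a > b ? a : b; }

/* sign(x) * max(0, abs(x) - max(0, abs(x) - s +
                 (abs(x) >> (bitdepth - 3 - log2(s))))) */
int clpf_constrain(int x, int s, int bitdepth) {
  assert(s > 0);
  const int ax = abs(x);
  const int shift = bitdepth - 3 - ilog2_pow2(s);
  assert(shift >= 0);
  const int mag = imax(0, ax - imax(0, ax - s + (ax >> shift)));
  return x < 0 ? -mag : mag;
}
```

For 8-bit content with s = 4, small differences pass through unchanged (clpf_constrain(2, 4, 8) == 2), x = 8 gives 3 where the old clamp(x, -4, 4) gave 4, and x = 32 gives 0, which is the ramp-down described above. For 10-bit content the zero point moves to 128.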
      
      AWCY results: low delay  high delay
      PSNR:           -0.40%     -0.47%
      PSNR HVS:        0.00%     -0.11%
      SSIM:           -0.31%     -0.39%
      CIEDE 2000:     -0.22%     -0.31%
      APSNR:          -0.40%     -0.48%
      MS SSIM:         0.01%     -0.12%
      
      About 3/4 of the gains come from the new clipping function.
      
      Change-Id: Idad9dc4004e71a9c7ec81ba62ebd12fb76fb044a
    • Turn on adapt_scan by default · 76ebf7ce
      Angie Chiang authored
      Change-Id: Ibf160e83e7cb1c7dce8b40e7cbead48416440974
  16. 08 Feb, 2017 1 commit
  17. 07 Feb, 2017 2 commits
  18. 04 Feb, 2017 1 commit
  19. 03 Feb, 2017 4 commits
  20. 02 Feb, 2017 3 commits
  21. 01 Feb, 2017 1 commit
    • Fix tests on macosx. · 29ba6756
      Tom Finegan authored
      - Wrap functions hidden by CONFIG_MOTION_VAR properly in test code.
      - Add some missing ampersands.
      
      Change-Id: Ie7c4e1f14cbacec1c157c7ce110b01350b2ed78e
  22. 27 Jan, 2017 2 commits
  23. 26 Jan, 2017 1 commit