1. 14 Jul, 2017 1 commit
    • selfguided_filter_test: Remove unnecessary memset. · d98661cc
      Urvang Joshi authored
      The memset to 0 wasn't required because the temporary variable is
      always written to before it is read in the next function call.
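A minimal sketch of why zeroing can be skipped when a buffer is written before it is read. The function names and the buffer size are hypothetical and are not taken from the test itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical first stage: writes every element of tmp. */
static void fill_stage(int32_t *tmp, size_t n, int32_t seed) {
  assert(tmp != NULL);
  for (size_t i = 0; i < n; ++i) tmp[i] = seed + (int32_t)i;
}

/* Hypothetical second stage: reads only what fill_stage wrote. */
static int64_t sum_stage(const int32_t *tmp, size_t n) {
  int64_t sum = 0;
  for (size_t i = 0; i < n; ++i) sum += tmp[i];
  return sum;
}

/* No memset(tmp, 0, sizeof(tmp)) is needed here: every element is
 * written by fill_stage before sum_stage reads it. */
int64_t run_without_memset(int32_t seed) {
  int32_t tmp[8]; /* deliberately left uninitialized */
  fill_stage(tmp, 8, seed);
  return sum_stage(tmp, 8);
}
```

The argument only holds if the writing stage covers every element that is later read; a partial write would still need the buffer initialized.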
      
      Tested:
      ./test_libaom --gtest_filter=*SelfguidedFilterTest*
      
      Change-Id: Ie1628d43b050744ae97a8be55f551edb602b018b
  2. 12 Apr, 2017 1 commit
  3. 20 Mar, 2017 1 commit
    • Fix two bugs in highbitdepth self-guided filter · 7e08ac3f
      David Barker authored
      This filter was temporarily removed due to test failures.
      This patch reintroduces the filter and fixes two bugs:
      
      * The test cases would occasionally segfault on x86, since
        the highbd filter requires its inputs to be aligned to
        16 bytes. This will always be true when used on real videos,
        so adjust the test cases to match.
      
      * The function calc_block was incorrect for bit_depth > 8,
        due to passing an incorrect argument to _mm_srl_epi32().
        This was the cause of the original test failures.
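The two points above can be illustrated with a small sketch. The commit does not say which argument to _mm_srl_epi32() was wrong, so this only shows how the intrinsic takes its shift count (as a vector, not a scalar) and how a test buffer can be placed on a 16-byte boundary. The helper names are hypothetical, C11 `aligned_alloc` is an assumed choice of allocator, and a scalar fallback is used when SSE2 is unavailable.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Returns 1 if p sits on a 16-byte boundary. */
int is_aligned16(const void *p) { return ((uintptr_t)p & 15u) == 0; }

/* Allocates n 16-bit samples on a 16-byte boundary. The byte size is
 * rounded up to a multiple of 16, as aligned_alloc expects. */
uint16_t *alloc_aligned_samples(size_t n) {
  size_t bytes = (n * sizeof(uint16_t) + 15u) & ~(size_t)15u;
  return (uint16_t *)aligned_alloc(16, bytes);
}

/* Shifts four 32-bit lanes right by the same count. */
void shift_right_lanes(uint32_t v[4], int shift) {
  assert(shift >= 0 && shift < 32);
#if defined(__SSE2__)
  __m128i x = _mm_loadu_si128((const __m128i *)v);
  /* _mm_srl_epi32 reads its count from the low 64 bits of a vector,
   * so the scalar count is first moved into one. */
  x = _mm_srl_epi32(x, _mm_cvtsi32_si128(shift));
  _mm_storeu_si128((__m128i *)v, x);
#else
  for (int i = 0; i < 4; ++i) v[i] >>= shift;
#endif
}
```

A test written this way can assert `is_aligned16()` on its inputs before calling the vectorized filter, so an alignment problem shows up as a failed check rather than a segfault.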
      
      BUG=aomedia:392
      
      Change-Id: Ia06b76c3e6122eebadd0995fb62f32c2fcab8b3e
  4. 13 Mar, 2017 1 commit
  5. 10 Mar, 2017 1 commit
  6. 09 Mar, 2017 2 commits
    • Add SSE4.1 highbitdepth self-guided filter · 4d2af5db
      David Barker authored
      Performance is very similar to the lowbd path (only 4-5% slower).
      
      Change-Id: Ifdb272c3f6c0e6f41e7046cc49497c72b5a796d9
    • Avoid out-of-range memory access · 7e9f59e0
      Yaowu Xu authored
      This commit increases the size of a few heap allocations to make sure
      later accesses are not out of bounds.
      
      BUG=aomedia:383
      
      Change-Id: Iadb08faa1e55be361dd3d4adaafeb85cecf23bbb
  7. 08 Mar, 2017 2 commits
    • Make encoder use vectorized self-guided filter · 506eb723
      David Barker authored
      By rearranging the code in restoration.c, we can allow the
      encoder to use the SSE4.1 version of the self-guided filter
      while picking the loop-restoration filter.
      
      This also helps us prepare for adding a highbitdepth SSE4.1
      version of the self-guided filter.
      
      No effect on encoder output, but gives an end-to-end speedup
      of 1-2%.
      
      Change-Id: Id17ba4a0963ddce9f70a7cae666e212e138d5f2c
    • Handle non-multiple-of-4 widths in SSE4.1 self-guided filter · 5765fad5
      David Barker authored
      Adjust the vectorized filter so that it can handle tile widths
      which are not a multiple of 4, so we do not have to fall back
      to the C version of the filter.
      
      Negligible speed impact for tiles with widths which are multiples
      of 4, and greatly improves speed on tiles with non-multiple-of-4
      widths.
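The commit message does not describe how the odd widths are handled. One common pattern, shown here as a scalar sketch rather than the actual SSE4.1 code, is a main loop over full groups of four followed by a short tail loop. The per-pixel operation `op` and the function `process_row` are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-pixel operation standing in for the real filter. */
static int32_t op(int32_t x) { return 2 * x + 1; }

/* Processes one row of `width` pixels, where width need not be a
 * multiple of 4. */
void process_row(const int32_t *src, int32_t *dst, int width) {
  assert(width >= 0);
  int x = 0;
  /* Main loop: full groups of 4, the shape a 128-bit SIMD loop over
   * 32-bit lanes would take. */
  for (; x + 4 <= width; x += 4) {
    dst[x + 0] = op(src[x + 0]);
    dst[x + 1] = op(src[x + 1]);
    dst[x + 2] = op(src[x + 2]);
    dst[x + 3] = op(src[x + 3]);
  }
  /* Tail: the remaining 0 to 3 pixels, without writing past `width`. */
  for (; x < width; ++x) dst[x] = op(src[x]);
}
```

When the width is a multiple of 4 the tail loop never runs, which is consistent with the negligible cost reported for that case.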
      
      Change-Id: Iae9d14f812c52c6f66910d27da1d8e98930df7ba
  8. 06 Mar, 2017 1 commit
    • Vectorize self-guided filter · ce110cc5
      David Barker authored
      Add an SSE4.1 lowbd version of the self-guided filter for
      loop-restoration, and apply some optimizations to the C
      version.
      
      Approximate times per 128x128 / 256x256 tile on the machine
      this was developed on:
      Previous C:  620us / 2800us
      Optimized C: 500us / 2200us ( 24% /  27% faster)
      SSE4.1:      147us / 600us  (320% / 370% faster)
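The "% faster" figures appear to follow (old time / new time - 1) * 100, rounded. A small sketch of that arithmetic, with `percent_faster` as a hypothetical helper name:

```c
#include <assert.h>

/* Percent by which new_us is faster than old_us:
 * (old / new - 1) * 100. */
double percent_faster(double old_us, double new_us) {
  assert(old_us > 0.0 && new_us > 0.0);
  return (old_us / new_us - 1.0) * 100.0;
}
```

Under this reading, 620us to 500us gives 24%, 2800us to 2200us gives about 27%, and the SSE4.1 times give about 322% and 367%, which match the listed 320% and 370% once rounded.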
      
      Change-Id: I23ff5a5482a191aeb06f9d1f767a9f036bb357fe