1. 15 May, 2017 1 commit
  2. 22 Apr, 2017 1 commit
  3. 20 Apr, 2017 1 commit
  4. 12 Apr, 2017 1 commit
  5. 10 Mar, 2017 2 commits
    • Vectorize new highpass filter for loop-restoration · eed824ef
      David Barker authored
      Change-Id: Ibe5d4933f599456cb496f636de244694bc786a4c
    • Replace one self guided filter with highpass · b7bb0976
      Debargha Mukherjee authored
      Adds an option controlled by a macro to replace one of
      the guided filters in the self-guided tool with a simple
      bandpass filtered version generated with a 3x3 kernel.
      By default the macro USE_HIGHPASS_IN_SGRPROJ is 0 (turned
      off), which keeps the dual self-guided filter.
      When the macro is turned on, the larger radius guided
      filter is replaced by a simpler filter that is much faster.
      
      Results (USE_HIGHPASS_IN_SGRPROJ on vs. off):
      lowres: +0.14% BDRATE (performance drop)
      midres: +0.27% BDRATE (performance drop)
      
      Further experiments on this variation of guided filters are
      pending.
      
      Change-Id: I7bbcfcad7ee266cd49a8dc6d96795a454feb1a94
  6. 09 Mar, 2017 3 commits
  7. 08 Mar, 2017 3 commits
    • Make encoder use vectorized self-guided filter · 506eb723
      David Barker authored
      By rearranging the code in restoration.c, we can allow the
      encoder to use the SSE4.1 version of the self-guided filter
      while picking the loop-restoration filter.
      
      This also helps us prepare for adding a highbitdepth SSE4.1
      version of the self-guided filter.
      
      No effect on encoder output, but gives an end-to-end speedup
      of 1-2%.
      
      Change-Id: Id17ba4a0963ddce9f70a7cae666e212e138d5f2c
    • Fix a bug in the C selfguided filter · cff43bb2
      David Barker authored
      Patch https://aomedia-review.googlesource.com/c/8321/ introduced
      a bug in the C version of the self-guided filter in the case where
      w = 384 and h > 368 or w > 368 and h = 384. This was due to forgetting
      to adjust the offset between A and B in the C code.
      
      This patch sets the offset correctly, resolving this bug.
      
      Change-Id: I6bdf11aa76c37d0ecae02788b262e7a2e0a11a6e
    • loop_restoration: Prevent some wild memory access · 1511ea10
      Alex Converse authored
      On recode frames the encoder will attempt to serialize the bitstream
      before choosing loop filter parameters to get a rough size estimate.
      This can result in wild reads in encode restoration if leftover values
      from the previous frame aren't available.
      
      Even with a realloc instead of freeing and reallocating all the data,
      wild reads are possible on frame size changes.
      
      Change-Id: I9956d9e11c6ed61999563436051c2fe469718538
  8. 06 Mar, 2017 1 commit
    • Vectorize self-guided filter · ce110cc5
      David Barker authored
      Add an SSE4.1 lowbd version of the self-guided filter for
      loop-restoration, and apply some optimizations to the C
      version.
      
      Approximate times per 128x128 / 256x256 tile on the machine
      this was developed on:
      Previous C:  620us / 2800us
      Optimized C: 500us / 2200us ( 24% /  27% faster)
      SSE4.1:      147us / 600us  (320% / 370% faster)
      
      Change-Id: I23ff5a5482a191aeb06f9d1f767a9f036bb357fe
  9. 02 Mar, 2017 1 commit
    • Remove double rounding in selfguided filter · 7dcd7f5e
      David Barker authored
      In av1_selfguided_restoration, the values stored into 'dgd' are
      unnecessarily rounded twice. This patch replaces this by a single
      rounding operation.
      
      Change-Id: I188d283137b74823f5d5447d441250520d6ee294
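To illustrate why rounding twice can differ from rounding once, here is a minimal sketch. The helper names (`round_shift`, `round_twice`, `round_once`) are hypothetical and are not taken from av1_selfguided_restoration; the sketch only shows the general effect on non-negative integers.

```c
#include <assert.h>
#include <stdint.h>

/* Divide a non-negative x by 2^n, rounding to nearest (ties round up). */
int32_t round_shift(int32_t x, int n) {
  assert(x >= 0 && n >= 1 && n < 31);
  return (x + (1 << (n - 1))) >> n;
}

/* Two-step version: rounds after the first shift and again after the second. */
int32_t round_twice(int32_t x, int n1, int n2) {
  return round_shift(round_shift(x, n1), n2);
}

/* One-step version: a single rounding over the full shift amount. */
int32_t round_once(int32_t x, int n1, int n2) {
  return round_shift(x, n1 + n2);
}
```

For example, with x = 5 and two shifts of 1 bit each, `round_twice` gives 2 while `round_once` gives 1, which is the nearest integer to 5/4.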
  10. 27 Feb, 2017 2 commits
    • Remove aom_realloc() · 7f094f10
      Alex Converse authored
      It only handles the realloc constraint (preserving low elements) by
      serendipity, and we don't actually rely on that behavior anyway.
      Meanwhile the calls may do extra copying that gets immediately clobbered
      by the callers.
      
      Cherry-pick from libvpx:
      3063c3760 Remove vpx_realloc()
      
      Change-Id: I8dfa89e4a81084b084889c27bd272fdf85184e8d
    • loop_restoration: Cleanup allocations · 232e3847
      Alex Converse authored
      Change-Id: Id3824c09cbaae814df1d8fb029215f28e8c7a6b1
  11. 22 Feb, 2017 1 commit
    • Rearrange self-guided filter for vectorization · 9198d135
      David Barker authored
      By rearranging the order of operations, we can ensure that all
      intermediate values fit into 32 bits. This will help when we
      vectorize the self-guided filter.
      
      Results in the noise range.
      
      Change-Id: Ic0c73613882bd103c4e8e57a0155b3132672ae04
  12. 17 Feb, 2017 1 commit
    • Replace division in self-guided filter · 4be12628
      Debargha Mukherjee authored
      Replaces division with multiplication in self-guided
      filter.
      
      The guided filter requires computation of:
      n^2.s^2/(n^2.s^2 + n^2.e).
      This is now implemented by computation of n^2.s^2/n^2.e followed
      by using a lookup table for the function f(x) = x/(x+1).
      To compute n^2.s^2/n^2.e, we use an integer multiplication based
      implementation which becomes feasible since n^2.e can only
      take a few values and their corresponding multipliers can be
      pre-computed.
      There is also another division by n, which is likewise integerized.
      
      Change-Id: Id7b81bbafead0b8f04a1853ec69b9dec423bb66a
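The idea described above can be sketched roughly as follows, writing p for n^2.s^2 and d for n^2.e. This is an illustration under stated assumptions, not the code from the patch: the constants `RECIP_BITS`, `TABLE_SIZE` and `OUT_SCALE`, the table layout, and all function names are hypothetical. Division appears only in the setup helpers, which would run once (or be replaced by precomputed constants); the per-pixel path uses a multiply, a shift and a table lookup.

```c
#include <assert.h>
#include <stdint.h>

#define RECIP_BITS 12  /* hypothetical precision of the stored reciprocals */
#define TABLE_SIZE 256 /* hypothetical number of lookup-table entries */
#define OUT_SCALE 256  /* results are scaled so that 1.0 maps to 256 */

static uint16_t x_by_xplus1[TABLE_SIZE];

/* Setup: table[x] = round(OUT_SCALE * x / (x + 1)). Runs once, so the
 * division here does not affect per-pixel cost. */
void init_table(void) {
  for (uint32_t x = 0; x < TABLE_SIZE; ++x)
    x_by_xplus1[x] = (uint16_t)((OUT_SCALE * x + (x + 1) / 2) / (x + 1));
}

/* Setup: multiplier standing in for 1/d, i.e. round(2^RECIP_BITS / d).
 * Feasible to precompute because d takes only a few distinct values. */
uint32_t make_reciprocal(uint32_t d) {
  assert(d > 0);
  return ((1u << RECIP_BITS) + d / 2) / d;
}

/* Per pixel: approximate OUT_SCALE * p / (p + d) with no division.
 * First x ~= p / d via multiply and rounding shift, then f(x) = x / (x + 1)
 * via the table, clamping x to the last table entry. */
uint32_t ratio_no_div(uint32_t p, uint32_t recip_d) {
  uint64_t x =
      ((uint64_t)p * recip_d + (1u << (RECIP_BITS - 1))) >> RECIP_BITS;
  if (x > TABLE_SIZE - 1) x = TABLE_SIZE - 1;
  return x_by_xplus1[x];
}
```

With p = 300 and d = 100, after calling `init_table()`, `ratio_no_div(300, make_reciprocal(100))` yields 192, matching 256 * 300 / 400. Because x is quantized to an integer in this sketch, small ratios lose precision; a real implementation would likely pick scales to suit its value ranges.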
  13. 14 Feb, 2017 1 commit
  14. 12 Feb, 2017 1 commit
    • Fix segfault with loop-restoration on x86. · befcc425
      David Barker authored
      The WienerInfo struct requires a 16-byte alignment on x86,
      since it contains filter coefficients which are loaded using
      SSE aligned load instructions. But on 32-bit x86, the default
      alignment of aom_malloc/aom_realloc is only 8 bytes, leading
      to occasional segfaults.
      
      To fix this, rather than using aom_realloc to resize WienerInfo
      structures, we always free and re-allocate them using aom_memalign.
      
      BUG=aomedia:345
      
      Change-Id: Ib1b2a42d4a2fa215dcc81ea481c51271ab068a37
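The free-and-reallocate pattern described above might look like the following sketch. It uses the C11 standard function `aligned_alloc` as a stand-in for aom_memalign so that the example is self-contained; the `FilterInfo` struct and `resize_filter_info` are hypothetical and do not reproduce the real WienerInfo layout.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a struct whose coefficients are read with
 * aligned SSE loads and therefore need 16-byte alignment. */
typedef struct {
  int16_t vfilter[8];
  int16_t hfilter[8];
} FilterInfo;

/* C11 aligned_alloc wants the size to be a multiple of the alignment. */
_Static_assert(sizeof(FilterInfo) % 16 == 0, "size must be a multiple of 16");

/* Discard the old array and return a fresh 16-byte-aligned one holding
 * `count` entries. Unlike realloc, the old contents are NOT preserved. */
FilterInfo *resize_filter_info(FilterInfo *old, size_t count) {
  assert(count > 0);
  free(old); /* free(NULL) is a no-op, so a first allocation also works */
  return aligned_alloc(16, count * sizeof(FilterInfo));
}
```

The trade-off is that nothing is copied from the old buffer, so this only suits callers that overwrite the data after resizing.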
  15. 27 Jan, 2017 1 commit
  16. 19 Jan, 2017 2 commits
    • Bring highbd loop restoration filters in line with lowbd ones · 0b04e9b8
      David Barker authored
      * Use the same function for domaintxfmrf in both highbd and lowbd
        cases
      * Move an assertion out of a loop in
        apply_selfguided_restoration_highbd, to match the lowbd case
      
      No change to output, but a decoder speed improvement of ~3.5%
      (roughly independent of bitrate) with loop-restoration on a
      10bpp sample.
      
      Change-Id: I970a3bb8f1c6b0ac60aa4a6fe4e7f54d1e6c1452
    • Miscellaneous cleaning up for loop-restoration · 1e8e6b95
      David Barker authored
      * Change Wiener filter storage to match the format expected
        by the convolve functions
      
      Change-Id: I4d1fb08a13cfc31e69e12c1cb4b2e510c6d8ae30
  17. 18 Jan, 2017 1 commit
  18. 11 Jan, 2017 1 commit
  19. 09 Jan, 2017 2 commits
  20. 07 Jan, 2017 2 commits
    • Optimize Wiener filter selection · 33f3bfde
      David Barker authored
      * Change the behaviour of search_wiener at borders to match
        the behaviour of the Wiener filter itself
      * Reorder the calculation in compute_stats, saving ~5% of
        encode time at low bitrates (tested on bus_cif.y4m at 200kbps)
      
      Change-Id: I5f649d77fd66584451aaf37697ce9c9af69524e4
    • Various loop-restoration optimizations · 6928a5d2
      David Barker authored
      * Optimize the self-guided and domaintxfmrf filters
      * Save 576KiB of buffers in the encoder and decoder
      * Disable self-guided filter for videos whose width or
        height is < 5, in order to help simplify the filter.
      
      This results in an overall 30-40% improvement in decoder
      speed with loop-restoration enabled (depending on source
      and bitrate), with no effect on video quality, *except* for
      videos with width or height < 5 pixels.
      
      Change-Id: Ide9181118ec3a63a0335338f316505b08df2d831
  21. 06 Jan, 2017 1 commit
    • Add UV wiener loop restoration · a43a2d98
      Debargha Mukherjee authored
      Enables only Wiener-based loop restoration for the UV
      frames. The selfguided and domaintransform filters do not
      work very well for UV components, hence they are disabled.
      For each UV frame a single set of Wiener parameters is
      sent. It is applied tile-wise, but all tiles use the
      same parameters.
      
      BDRATE (Global PSNR) results:
      -----------------------------
      lowres: -1.266% (up from -0.666%, good improvement)
      midres: -1.815% (up from -1.792%, tiny improvement)
      
      Tiling on UV components will be explored subsequently.
      
      Change-Id: Ib5be93121c4e88e05edf3c36c46488df3cfcd1e2
  22. 04 Jan, 2017 1 commit
    • Simplify buffer management for self-guided restoration filter · 3a0df186
      David Barker authored
      * Remove some unused variables
      * Reduce need for casts by typing intermediate buffers appropriately
      * Avoid copying data which is never modified; use the original data
        instead.
      * Reduce number of intermediate buffers required, saving allocations
        of 576KiB in the decoder and ~1MiB in the encoder
      
      No effect on performance.
      
      Change-Id: I55243904dd8e818fb6d43fa431903736475d23ff
  23. 03 Jan, 2017 1 commit
    • Add new convolve variant for loop-restoration · be6cc07d
      David Barker authored
      The convolve filters generated by loop_wiener_filter_tile
      are not compatible with some existing convolve implementations
      (they can have coefficients >128, sums of (certain subsets of)
      coefficients >128, etc.)
      
      So we implement a new variant, which takes a filter with 128
      subtracted from its central element and which adds an extra copy
      of the source just before clipping to a pixel (reinstating the
      128 we subtracted). This should be easy to adapt from the existing
      convolve functions, and this patch includes SSE2 highbd and
      SSSE3 lowbd implementations.
      
      Change-Id: I0abf4c2915f0665c49d88fe450dbc77b783f69e1
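A scalar sketch of the "subtract 128, add the source back" idea is shown below for a single 1-D pass. It assumes 7-bit filter precision (taps scaled so that 128 represents 1.0), a 7-tap filter, and arithmetic right shift of negative integers; the names `filter_full`, `filter_add_src`, `TAPS` and `CENTER` are hypothetical and are not the functions added by the patch.

```c
#include <assert.h>
#include <stdint.h>

#define FILTER_BITS 7 /* assumed: taps are scaled so that 1.0 == 128 */
#define TAPS 7        /* hypothetical filter length */
#define CENTER (TAPS / 2)

uint8_t clip_pixel(int v) { return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v); }

/* Reference: apply the full filter at position i. The caller guarantees
 * that src[i - CENTER] .. src[i + CENTER] are readable. */
uint8_t filter_full(const uint8_t *src, int i, const int16_t *taps) {
  assert(src && taps);
  int sum = 0;
  for (int k = 0; k < TAPS; ++k) sum += taps[k] * src[i + k - CENTER];
  return clip_pixel((sum + (1 << (FILTER_BITS - 1))) >> FILTER_BITS);
}

/* Variant: taps_m128 has 128 subtracted from its central element, so the
 * taps stay small. The missing 128 * src[i] is reinstated by adding one
 * copy of the source pixel after the rounding shift, just before clipping.
 * Assumes >> on a negative int is an arithmetic shift. */
uint8_t filter_add_src(const uint8_t *src, int i, const int16_t *taps_m128) {
  assert(src && taps_m128);
  int sum = 0;
  for (int k = 0; k < TAPS; ++k) sum += taps_m128[k] * src[i + k - CENTER];
  int rounded = (sum + (1 << (FILTER_BITS - 1))) >> FILTER_BITS;
  return clip_pixel(rounded + src[i]);
}
```

Because 128 * src[i] is an exact multiple of 128, moving it outside the rounding shift does not change the result, so both functions agree whenever the two tap sets differ only by 128 in the central element.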
  24. 16 Dec, 2016 3 commits
  25. 15 Dec, 2016 2 commits
  26. 14 Dec, 2016 3 commits
    • Change Wiener filter in loop-restoration · 025b2545
      David Barker authored
      The Wiener filter now uses the same convolution code as the
      inter predictors.
      
      Change-Id: Ia3bfbc778171eb25c6a0141426d1f69d92c17992
    • Remove feedback between tiles in loop-restoration · 9666e756
      David Barker authored
      This is intended to simplify hardware and multithreaded
      implementations.
      
      Change-Id: I6aa95b67c03b794a0f3d5cf2f65c576d05f2ca7d
    • Disable filtering for Cb and Cr components · 818e42a7
      Debargha Mukherjee authored
      The parameters are optimized only on Y, so disable chrominance
      filtering for now. Later we can extend the syntax to have
      separate parameters for the chrominance, or optimize the
      parameters jointly over luminance and chrominance components.
      
      lowres: -0.676% (from -0.759%) becomes a little worse
      midres: -1.837% (from -1.520%) substantial improvement
      hdres: pending
      
      Change-Id: I98d71f48de98394b05fd9036de259cb43d007614