  1. 10 Sep, 2017 2 commits
    • Refactoring/simplification of buffers used for sgr · 1330dfd1
      Debargha Mukherjee authored
      Includes miscellaneous cleanups, test fixes, and code reorganization
      for loop-restoration components.
      
      Change-Id: I5b2e6419234d945e6f4344b22636119b50df4054
    • Reduce/Eliminate line buffer for loop-restoration. · e168a783
      Debargha Mukherjee authored
      This patch forces the vertical filtering for the top and bottom
      rows of a processing unit for the Wiener filter to not use border
      more than what is set in the WIENER_BORDER_VERT macro.
      This macro is currently set at 0 to eliminate line buffer completely,
      but it could be increased to 1 or 2 to use limited line buffers
      if the coding efficiency is affected too much with a 0 line-buffer.
      
      Also, for the sgr filter we added the option of using overlapping
      windows horizontally and vertically to improve coding efficiency.
      The vertical border is set by the SGRPROJ_BORDER_VERT macro, and
      the horizontal border by the SGRPROJ_BORDER_HORZ macro, which is
      currently 2, the maximum needed. We do not currently recommend
      setting SGRPROJ_BORDER_HORZ below 2.
      
      The overall line buffer requirement for LR is twice the max of
      WIENER_BORDER_VERT and SGRPROJ_BORDER_VERT.
      Currently both are set as 0, eliminating line buffers completely.
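
      The line-buffer rule stated above can be sketched as a small
      calculation. This is only an illustration: the helper name
      lr_line_buffer_rows is hypothetical, and the macro values are the
      ones quoted in this message.

```c
#include <assert.h>

/* Values quoted in the commit message; both are 0 in this patch. */
#define WIENER_BORDER_VERT 0
#define SGRPROJ_BORDER_VERT 0

/* Hypothetical helper (not part of libaom): rows of line buffer needed
 * for loop restoration, taken as twice the larger vertical border. */
static int lr_line_buffer_rows(int wiener_border_vert,
                               int sgrproj_border_vert) {
  assert(wiener_border_vert >= 0 && sgrproj_border_vert >= 0);
  const int max_border = wiener_border_vert > sgrproj_border_vert
                             ? wiener_border_vert
                             : sgrproj_border_vert;
  return 2 * max_border;
}
```

      With the current settings,
      lr_line_buffer_rows(WIENER_BORDER_VERT, SGRPROJ_BORDER_VERT)
      gives 0; raising either border to 2 would give 4.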
      
      Also this patch extends borders consistently before CDEF / LR.
      
      Change-Id: Ie58a98c784a0db547627b9cfcf55f018c30e8e79
  2. 07 Sep, 2017 1 commit
    • Reduce line buffer size for Wiener filter. · 22bbe4cc
      Debargha Mukherjee authored
      This patch forces the vertical filtering for the top and bottom
      rows of a processing unit for the Wiener filter to be 5-tap.
      The 5 taps are derived from the primary 7-tap filter by forcing
      the taps at the end to be zero, and absorbing their weights into
      the other taps to maintain normalization.
      This will effectively reduce the line buffer size for luma Wiener
      filter to 4 (from 6).
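
      A minimal sketch of the tap reduction follows. The message does
      not say which taps absorb the removed weight, so folding each end
      tap into its immediate neighbour is an assumption here, and the
      helper name wiener_7tap_to_5tap is hypothetical.

```c
#include <assert.h>

#define WIENER_TAPS 7

/* Hypothetical helper (not part of libaom): zero the two end taps of a
 * 7-tap kernel and fold each into its neighbour so the tap sum, and
 * hence the normalization, is unchanged. The choice of which taps
 * absorb the weight is an assumption made for this sketch. */
static void wiener_7tap_to_5tap(const int in[WIENER_TAPS],
                                int out[WIENER_TAPS]) {
  int sum_in = 0;
  int sum_out = 0;
  for (int i = 0; i < WIENER_TAPS; ++i) {
    out[i] = in[i];
    sum_in += in[i];
  }
  out[1] += out[0];
  out[5] += out[6];
  out[0] = 0;
  out[6] = 0;
  for (int i = 0; i < WIENER_TAPS; ++i) sum_out += out[i];
  assert(sum_in == sum_out); /* normalization preserved */
}
```

      A 5-tap vertical filter reaches 2 rows on either side of the
      current row instead of 3, which is consistent with the buffer
      dropping from 6 rows to 4.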
      
      Change-Id: I5e21b58369777eabf553a8987387d112f98a5598
  3. 06 Sep, 2017 2 commits
    • Round up subsampled frame size in av1_loop_restoration_corners_in_sb · 7380b25e
      Rupert Swarbrick authored
      The previous code converted a frame_w (say) of 1 to zero for a plane
      where subsampling was enabled, causing a division by zero in
      av1_get_rest_ntiles. This doesn't match the spec, which says
      subsampling rounds up.
      
      The patch adds the rounding, and also adds an assertion to
      av1_get_rest_ntiles to help diagnose any other broken callsites.
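
      The difference between the two behaviours can be sketched as
      follows. Both helper names are hypothetical and the code is an
      illustration, not the libaom implementation.

```c
#include <assert.h>

/* Hypothetical helper: plane dimension after subsampling, rounding up.
 * ss is the subsampling shift (0 = none, 1 = halved). */
static int subsampled_size_round_up(int size, int ss) {
  assert(size > 0 && (ss == 0 || ss == 1));
  return (size + ss) >> ss;
}

/* Truncating version, shown only to illustrate the bug described
 * above: a size of 1 becomes 0 when ss is 1. */
static int subsampled_size_truncate(int size, int ss) {
  return size >> ss;
}
```

      For a frame_w of 1 with subsampling enabled, the truncating
      version yields 0 while the rounding version yields 1, so a later
      division by the plane size stays well defined.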
      
      Change-Id: Ia6c249fa935c3a16d122ba6e7b450fe99f412fde
    • Make loop-restoration use 64x64 processing units · 7a5587a8
      Debargha Mukherjee authored
      Changes loop-restoration to use processing unit size that is
      64x64 for luma; for chroma the processing unit is coupled to
      64x64 support region for luma.
      Thus for chroma the processing unit size is 32x32 for 4:2:0,
      32x64 for 4:2:2 and 64x64 for 4:4:4, etc.
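
      The chroma sizes listed above follow from shifting the 64x64
      luma unit by the subsampling factors. A small sketch, in which
      the UnitSize type and the helper name are hypothetical:

```c
#include <assert.h>

#define LR_LUMA_UNIT_SIZE 64 /* luma processing unit edge, per the text */

/* Hypothetical type for this sketch only. */
typedef struct {
  int width;
  int height;
} UnitSize;

/* Hypothetical helper: chroma processing unit covering the same area
 * as one 64x64 luma unit. ss_x / ss_y are the horizontal / vertical
 * subsampling shifts (0 or 1). */
static UnitSize lr_chroma_unit_size(int ss_x, int ss_y) {
  assert((ss_x == 0 || ss_x == 1) && (ss_y == 0 || ss_y == 1));
  UnitSize u = { LR_LUMA_UNIT_SIZE >> ss_x, LR_LUMA_UNIT_SIZE >> ss_y };
  return u;
}
```

      Here 4:2:0 corresponds to (ss_x, ss_y) = (1, 1), 4:2:2 to (1, 0)
      and 4:4:4 to (0, 0).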
      
      While the Wiener filter output should not change with this patch,
      the sgr filter will change since the boundary pixel handling in
      sgr is internal within the filter.
      
      Change-Id: I65a9e2df88927a19445420ce400acb1fcf7afa93
  4. 03 Sep, 2017 1 commit
    • Move loop restoration coefficients to within the frame · 6c545216
      Rupert Swarbrick authored
      Rather than encoding the loop restoration coefficients at the start of
      the frame header, this patch moves them to occur just after certain
      top-level superblocks.
      
      You might hope that we could just encode coefficients on top-level
      superblocks where the top-left corner of the superblock was also the
      top-left corner of the loop restoration tile. Unfortunately, this
      can't work with the superres experiment, where the loop restoration
      tiles don't necessarily line up with the superblocks. Indeed, in
      general there can be multiple different loop restoration coefficients
      that apply in a given top-level superblock. This patch defines a
      function, av1_loop_restoration_corners_in_sb, which yields the
      rectangle [rrow0, rrow1) x [rcol0, rcol1) of loop restoration tiles
      whose top left corners lie in this top-level superblock.
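
      The idea behind the half-open rectangle can be sketched in one
      dimension. This is a simplified illustration, not the body of
      av1_loop_restoration_corners_in_sb: it assumes equally sized
      tiles starting at coordinate 0 and ignores superres scaling, and
      the type and helper names are hypothetical.

```c
#include <assert.h>

/* Half-open range [begin, end) of tile indices; hypothetical type. */
typedef struct {
  int begin;
  int end;
} TileRange;

static int ceil_div(int a, int b) { return (a + b - 1) / b; }

/* Hypothetical helper: indices of the tiles whose starting coordinate
 * k * tile_size lies in [sb_start, sb_start + sb_size). Assumes tiles
 * of equal size laid out from coordinate 0. */
static TileRange lr_tile_corners_in_span(int sb_start, int sb_size,
                                         int tile_size, int num_tiles) {
  assert(sb_start >= 0 && sb_size > 0 && tile_size > 0 && num_tiles > 0);
  TileRange r;
  r.begin = ceil_div(sb_start, tile_size);
  r.end = ceil_div(sb_start + sb_size, tile_size);
  if (r.begin > num_tiles) r.begin = num_tiles;
  if (r.end > num_tiles) r.end = num_tiles;
  return r;
}
```

      Applying the helper once to rows and once to columns gives a
      rectangle of the same shape as [rrow0, rrow1) x [rcol0, rcol1).
      The range is empty when no tile starts inside the superblock and
      holds more than one index when tiles are smaller than the
      superblock or misaligned with it.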
      
      The total file size should be unchanged by this patch: the bits have
      just been moved from the frame header and spread out among the rest of
      the frame.
      
      Change-Id: Icf43b0560964a63dea0d2cd801313f04139188d7
  5. 16 Aug, 2017 1 commit
    • Use only up to 5x5 windows for sgr in loop-rest. · 76be32df
      Debargha Mukherjee authored
      The tables for guided filter in loop-restoration are changed
      to use a max of 5x5 windows (or radius 2).
      The aim is to reduce the gate count for hardware implementation.
      
      Change-Id: I7178d6ac09e4731a626f9bccf5151467c63e00c3
  6. 13 Jun, 2017 1 commit
    • Make loop-restoration compatible w/ frame_superres · 9cd57cf8
      Fergus Simpson authored
      There were several places where loop_restoration used the encoded width
      and height while superres was active. This patch changes it to use the
      upscaled width and height, since loop_restoration is supposed to occur
      after superres has done its upscaling.
      
      Change-Id: I2b9bbb06b5370618758bf81d8eb63f2eef26af80
  7. 06 Jun, 2017 1 commit
    • Make loop-restoration compatible w/ frame_superres · 2dd982e4
      Debargha Mukherjee authored
      When frame_superres is on, loop-restoration should work
      on the size of the upscaled frame and not on the internal
      width and height in the common structure. This patch
      makes the necessary changes on the encoder and decoder
      side to enable that.
      
      Change-Id: I1d1c024ac6f95944169d90647b4c5a61354a5cc6
  8. 15 May, 2017 1 commit
  9. 22 Apr, 2017 1 commit
  10. 20 Apr, 2017 1 commit
  11. 12 Apr, 2017 1 commit
  12. 10 Mar, 2017 2 commits
    • Vectorize new highpass filter for loop-restoration · eed824ef
      David Barker authored
      Change-Id: Ibe5d4933f599456cb496f636de244694bc786a4c
    • Replace one self guided filter with highpass · b7bb0976
      Debargha Mukherjee authored
      Adds an option controlled by a macro to replace one of
      the guided filters in the self-guided tool with a simple
      bandpass filtered version generated with a 3x3 kernel.
      By default the macro USE_HIGHPASS_IN_SGRPROJ is 0 (turned
      off), which keeps the dual self-guided filter.
      When the macro is turned on, the larger radius guided
      filter is replaced by a simpler filter that is much faster.
      
      Results (if USE_HIGHPASS_IN_SGRPROJ is on vs. off):
      lowres: performance drop by +0.14% (BDRATE)
      midres: performance drop by +0.27% (BDRATE)
      
      Further experiments on this variation of guided filters are
      pending.
      
      Change-Id: I7bbcfcad7ee266cd49a8dc6d96795a454feb1a94
  13. 09 Mar, 2017 3 commits
  14. 08 Mar, 2017 3 commits
    • Make encoder use vectorized self-guided filter · 506eb723
      David Barker authored
      By rearranging the code in restoration.c, we can allow the
      encoder to use the SSE4.1 version of the self-guided filter
      while picking the loop-restoration filter.
      
      This also helps us prepare for adding a highbitdepth SSE4.1
      version of the self-guided filter.
      
      No effect on encoder output, but gives an end-to-end speedup
      of 1-2%.
      
      Change-Id: Id17ba4a0963ddce9f70a7cae666e212e138d5f2c
    • Fix a bug in the C selfguided filter · cff43bb2
      David Barker authored
      Patch https://aomedia-review.googlesource.com/c/8321/ introduced
      a bug in the C version of the self-guided filter in the case where
      w = 384 and h > 368 or w > 368 and h = 384. This was due to forgetting
      to adjust the offset between A and B in the C code.
      
      This patch sets the offset correctly, resolving this bug.
      
      Change-Id: I6bdf11aa76c37d0ecae02788b262e7a2e0a11a6e
    • loop_restoration: Prevent some wild memory access · 1511ea10
      Alex Converse authored
      On recode frames the encoder will attempt to serialize the bitstream
      before choosing loop filter parameters to get a rough size estimate.
      This can result in wild reads in encode restoration if leftover values
      from the previous frame aren't available.
      
      Even with a realloc instead of free-ing and reallocing all the data,
      wild reads are possible on frame size changes.
      
      Change-Id: I9956d9e11c6ed61999563436051c2fe469718538
  15. 06 Mar, 2017 1 commit
    • Vectorize self-guided filter · ce110cc5
      David Barker authored
      Add an SSE4.1 lowbd version of the self-guided filter for
      loop-restoration, and apply some optimizations to the C
      version.
      
      Approximate times per 128x128 / 256x256 tile on the machine
      this was developed on:
      Previous C:  620us / 2800us
      Optimized C: 500us / 2200us ( 24% /  27% faster)
      SSE4.1:      147us / 600us  (320% / 370% faster)
      
      Change-Id: I23ff5a5482a191aeb06f9d1f767a9f036bb357fe
  16. 02 Mar, 2017 1 commit
    • Remove double rounding in selfguided filter · 7dcd7f5e
      David Barker authored
      In av1_selfguided_restoration, the values stored into 'dgd' are
      unnecessarily rounded twice. This patch replaces this by a single
      rounding operation.
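
      Why rounding twice matters can be shown with a generic
      round-to-nearest shift. The helpers below are hypothetical and
      are not taken from av1_selfguided_restoration; they only
      illustrate that two rounding steps can give a different result
      from one.

```c
#include <assert.h>

/* Round-to-nearest right shift of a non-negative value. */
static int round_shift(int x, int n) {
  assert(x >= 0 && n > 0);
  return (x + (1 << (n - 1))) >> n;
}

/* Two-stage rounding: shift by a, then by b. */
static int round_twice(int x, int a, int b) {
  return round_shift(round_shift(x, a), b);
}

/* Single rounding over the combined shift. */
static int round_once(int x, int a, int b) {
  return round_shift(x, a + b);
}
```

      For x = 1 with a = b = 1 the exact value is 0.25. Rounding once
      gives 0, while rounding twice first rounds 0.5 up to 1 and then
      rounds 0.5 up to 1 again.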
      
      Change-Id: I188d283137b74823f5d5447d441250520d6ee294
  17. 27 Feb, 2017 2 commits
    • Remove aom_realloc() · 7f094f10
      Alex Converse authored
      It only handles the realloc constraint (preserving low elements) by
      serendipity, and we don't actually rely on that behavior anyway.
      Meanwhile the calls may do extra copying that gets immediately clobbered
      by the callers.
      
      Cherry-pick from libvpx:
      3063c3760 Remove vpx_realloc()
      
      Change-Id: I8dfa89e4a81084b084889c27bd272fdf85184e8d
    • loop_restoration: Cleanup allocations · 232e3847
      Alex Converse authored
      Change-Id: Id3824c09cbaae814df1d8fb029215f28e8c7a6b1
  18. 22 Feb, 2017 1 commit
    • Rearrange self-guided filter for vectorization · 9198d135
      David Barker authored
      By rearranging the order of operations, we can ensure that all
      intermediate values fit into 32 bits. This will help when we
      vectorize the self-guided filter.
      
      Results in the noise range.
      
      Change-Id: Ic0c73613882bd103c4e8e57a0155b3132672ae04
  19. 17 Feb, 2017 1 commit
    • Replace division in self-guided filter · 4be12628
      Debargha Mukherjee authored
      Replaces division with multiplication in self-guided
      filter.
      
      The guided filter requires computation of:
      n^2.s^2/(n^2.s^2 + n^2.e).
      This is now implemented by computation of n^2.s^2/n^2.e followed
      by using a lookup table for the function f(x) = x/(x+1).
      To compute n^2.s^2/n^2.e, we use an integer multiplication based
      implementation which becomes feasible since n^2.e can only
      take a few values and their corresponding multipliers can be
      pre-computed.
      There is also another division by n, that is also integerized.
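
      Division by a divisor drawn from a small known set can be
      replaced by a multiply and a shift, roughly as sketched below.
      The names, the 12-bit precision and the rounding are assumptions
      made for this illustration and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define RECIP_BITS 12 /* assumed fixed-point precision for this sketch */

/* Hypothetical helper: multiplier approximating 2^RECIP_BITS / d.
 * In a real implementation these values would sit in a precomputed
 * table, one entry per possible divisor; the division here stands in
 * for that table construction. */
static uint32_t recip_multiplier(uint32_t d) {
  assert(d > 0);
  return ((1u << RECIP_BITS) + d / 2) / d;
}

/* Approximates v / d using only a multiply, an add and a shift. */
static uint32_t div_by_multiply(uint32_t v, uint32_t multiplier) {
  const uint64_t product = (uint64_t)v * multiplier;
  return (uint32_t)((product + (1u << (RECIP_BITS - 1))) >> RECIP_BITS);
}
```

      The result is an approximation: it matches rounded division for
      small operands such as 100 / 25, but the error grows with v
      because the multiplier itself is rounded.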
      
      Change-Id: Id7b81bbafead0b8f04a1853ec69b9dec423bb66a
  20. 14 Feb, 2017 1 commit
  21. 12 Feb, 2017 1 commit
    • Fix segfault with loop-restoration on x86. · befcc425
      David Barker authored
      The WienerInfo struct requires a 16-byte alignment on x86,
      since it contains filter coefficients which are loaded using
      SSE aligned load instructions. But on 32-bit x86, the default
      alignment of aom_malloc/aom_realloc is only 8 bytes, leading
      to occasional segfaults.
      
      To fix this, rather than using aom_realloc to resize WienerInfo
      structures, we always free and re-allocate them using aom_memalign.
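
      The free-and-reallocate pattern can be sketched with standard
      C11 calls. Here aligned_alloc stands in for aom_memalign, and the
      WienerCoeffs struct and both helper names are hypothetical
      placeholders rather than the libaom definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for WienerInfo: 32 bytes of coefficients. */
typedef struct {
  int16_t vfilter[8];
  int16_t hfilter[8];
} WienerCoeffs;

/* Allocates count structs on a 16-byte boundary. The total size is a
 * multiple of 16 because sizeof(WienerCoeffs) is 32, which satisfies
 * the size requirement of C11 aligned_alloc. */
static WienerCoeffs *alloc_wiener_aligned(size_t count) {
  assert(count > 0);
  return aligned_alloc(16, count * sizeof(WienerCoeffs));
}

/* Resizes by freeing and allocating afresh instead of calling a
 * realloc-style function, so the alignment guarantee is kept. The old
 * contents are not preserved. */
static WienerCoeffs *resize_wiener_aligned(WienerCoeffs *old,
                                           size_t new_count) {
  free(old);
  return alloc_wiener_aligned(new_count);
}
```

      The trade-off is that old contents are discarded on resize, which
      is acceptable only when the caller rewrites the data afterwards.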
      
      BUG=aomedia:345
      
      Change-Id: Ib1b2a42d4a2fa215dcc81ea481c51271ab068a37
  22. 27 Jan, 2017 1 commit
  23. 19 Jan, 2017 2 commits
    • Bring highbd loop restoration filters in line with lowbd ones · 0b04e9b8
      David Barker authored
      * Use the same function for domaintxfmrf in both highbd and lowbd
        cases
      * Move an assertion out of a loop in
        apply_selfguided_restoration_highbd, to match the lowbd case
      
      No change to output, but a decoder speed improvement of ~3.5%
      (roughly independent of bitrate) with loop-restoration on a
      10bpp sample.
      
      Change-Id: I970a3bb8f1c6b0ac60aa4a6fe4e7f54d1e6c1452
    • Miscellaneous cleaning up for loop-restoration · 1e8e6b95
      David Barker authored
      * Change Wiener filter storage to match the format expected
        by the convolve functions
      
      Change-Id: I4d1fb08a13cfc31e69e12c1cb4b2e510c6d8ae30
  24. 18 Jan, 2017 1 commit
  25. 11 Jan, 2017 1 commit
  26. 09 Jan, 2017 2 commits
  27. 07 Jan, 2017 2 commits
    • Optimize Wiener filter selection · 33f3bfde
      David Barker authored
      * Change the behaviour of search_wiener at borders to match
        the behaviour of the Wiener filter itself
      * Reorder the calculation in compute_stats, saving ~5% of
        encode time at low bitrates (tested on bus_cif.y4m at 200kbps)
      
      Change-Id: I5f649d77fd66584451aaf37697ce9c9af69524e4
    • Various loop-restoration optimizations · 6928a5d2
      David Barker authored
      * Optimize the self-guided and domaintxfmrf filters
      * Save 576KiB of buffers in the encoder and decoder
      * Disable self-guided filter for videos whose width or
        height is < 5, in order to help simplify the filter.
      
      This results in an overall 30-40% improvement in decoder
      speed with loop-restoration enabled (depending on source
      and bitrate), with no effect on video quality, *except* for
      videos with width or height < 5 pixels.
      
      Change-Id: Ide9181118ec3a63a0335338f316505b08df2d831
  28. 06 Jan, 2017 1 commit
    • Add UV wiener loop restoration · a43a2d98
      Debargha Mukherjee authored
      Enables loop restoration for the UV frames, using only the
      Wiener filter. The selfguided and domaintransform filters do not
      work very well for UV components, hence they are disabled.
      For each UV frame a single set of Wiener parameters is
      sent. They are applied tile-wise, but all tiles use the
      same parameters.
      
      BDRATE (Global PSNR) results:
      -----------------------------
      lowres: -1.266% (up from -0.666%, good improvement)
      midres: -1.815% (up from -1.792%, tiny improvement)
      
      Tiling on UV components will be explored subsequently.
      
      Change-Id: Ib5be93121c4e88e05edf3c36c46488df3cfcd1e2
  29. 04 Jan, 2017 1 commit
    • Simplify buffer management for self-guided restoration filter · 3a0df186
      David Barker authored
      * Remove some unused variables
      * Reduce need for casts by typing intermediate buffers appropriately
      * Avoid copying data which is never modified; use the original data
        instead.
      * Reduce number of intermediate buffers required, saving allocations
        of 576KiB in the decoder and ~1MiB in the encoder
      
      No effect on performance
      
      Change-Id: I55243904dd8e818fb6d43fa431903736475d23ff