1. 28 Apr, 2017 1 commit
    • Fix encode/decode mismatch with global/warped motion · b62eef7b
      David Barker authored
      When predicting a 4x4 warp block (either using ZEROMV with
      global-motion, or the WARPED_CAUSAL motion mode with
      warped-motion), the warp filter would previously write
      4 bytes to the right of the block.
      
      This caused encode/decode mismatches when encoding with
      multiple threads and tile_cols > 1, since in that case
      we could end up overwriting already-generated pixels from
      the next tile across.
      
      This patch changes the filter so that we only overwrite the
      intended pixels.
      
      Change-Id: I3664b44e872e85aa5ccc0a5781f0f9ad994a5b80
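
The overwrite described above can be illustrated with a small sketch. It is not the libaom code: the function name `store_rows`, the flat-list frame layout, and the 8-value row width of the filter output are assumptions made for illustration. The point is only that the store is clamped to the block width, so pixels belonging to the neighbouring tile are left alone.

```python
def store_rows(frame, stride, x0, y0, rows, block_w):
    """Copy filtered rows into a flat frame buffer, clamped to the block width.

    `rows` is the filter output, which may be wider than the block
    (8 values per row here, an assumption for illustration).
    """
    for dy, row in enumerate(rows):
        start = (y0 + dy) * stride + x0
        # Write only the first block_w values; the rest of the row is dropped.
        frame[start:start + block_w] = row[:block_w]


# Two 4x4 blocks side by side, as if they sat on either side of a tile edge.
stride, height = 8, 4
frame = [7] * (stride * height)          # 7 marks pixels the next tile already wrote
filtered = [[1] * 8 for _ in range(4)]   # hypothetical 8-wide filter output
store_rows(frame, stride, 0, 0, filtered, block_w=4)
```

After the call, columns 0 to 3 of each row hold the new prediction and columns 4 to 7 still hold the neighbour's pixels.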
  2. 27 Apr, 2017 1 commit
  3. 26 Apr, 2017 2 commits
  4. 24 Apr, 2017 1 commit
  5. 21 Apr, 2017 1 commit
    • Revert "warp_affine_c: Refactor highbd and lowbd versions." · 0d08afdc
      Urvang Joshi authored
      This reverts commit 8cd0e7ef.
      
      Reason for revert:
      This change breaks av1_warp_affine_c when CONFIG_HIGHBITDEPTH is enabled.
      
      In particular, running ./test_libaom --gtest_filter=*Warp* compiled with --enable-warped-motion --enable-highbitdepth shows several test failures, followed by a segmentation fault when it gets up to test SSE2/AV1WarpFilterTest.CheckOutput/4
      
      The tricky part is that the use of the lowbd version of the function depends on a mix of two conditions:
      (1) a compile-time check for CONFIG_HIGHBITDEPTH, and
      (2) a run-time check to see if bit-depth == 8.
      So, it is tricky to refactor.
      
      BUG=aomedia:442
      
      Change-Id: I610c537fb65bde4f357185a13081639f906351de
  6. 20 Apr, 2017 1 commit
  7. 17 Apr, 2017 1 commit
  8. 13 Apr, 2017 1 commit
    • Adds option to use 1/32 subpel precision for gm/wm · 16056f5b
      Debargha Mukherjee authored
      Adds filters for 1/32 subpel precision for warping.
      To use 1/32 subpel precision, set WARPEDPIXEL_PREC_BITS to 5.
      By default, WARPEDPIXEL_PREC_BITS is set to 6 in common/mv.h,
      which uses 1/64 subpel precision.
      
      If 1/32 precision is used, the BDRATE gain shrinks slightly:
      on lowres:
      -1.101% (vs. -1.186% with 1/64) w/warped-motion
      -1.587% (vs. -1.650% with 1/64) w/global-motion
      
      on cam_lowres:
      -2.638% (vs. -2.707% with 1/64) w/warped-motion
      -3.396% (vs. -3.453% with 1/64) w/global-motion
      
      Change-Id: I82fbfddaad9bd9be658fe382401d212833c7ceef
  9. 12 Apr, 2017 1 commit
  10. 11 Apr, 2017 2 commits
  11. 10 Apr, 2017 2 commits
  12. 08 Apr, 2017 1 commit
  13. 07 Apr, 2017 1 commit
  14. 06 Apr, 2017 1 commit
    • Prepare for vectorizing highbd warp filter · 2bcf280e
      David Barker authored
      This applies the same refactorings to highbd_warp_plane
      which were applied to warp_plane a while ago, and lays the
      groundwork for the relevant tests.
      
      Change-Id: Ic4c00bce1accc5a3624bba0c3b4b325e69a42c1a
  15. 05 Apr, 2017 1 commit
  16. 04 Apr, 2017 1 commit
    • Reduce precision in find_affine_int() · f2f3bcd8
      Debargha Mukherjee authored
      Reduces precision in the find_affine_int() function. Lowers the
      maximum allowed mv from 1024 to 512.
      Negligible impact on coding efficiency.
      
      Change-Id: I76d4c6824528e3f940d1275fe0bd22d71015a8d0
  17. 02 Apr, 2017 1 commit
    • Use 1 sample per neighbor for local warping model estimation · 5558e5da
      Yue Chen authored
      Only 1 sample needs to be collected per neighbor. A maximum of
      8 neighbors is used.
      In LS estimation, the projection samples (sx, sy)->(dx, dy) are
      intentionally smoothed by assuming that 3 shifted versions,
      (sx, sy+n)->(dx, dy+n), (sx+n, sy)->(dx+n, dy) and (sx+n,
      sy+n)->(dx+n, dy+n), also contribute to the estimation.
      For example, instead of using A[0] = sx^2, we use the sum of
      squares of source x of the four points, A[0] += 4*sx^2+4*n*sx+2*n^2.
      The smoothing adds little computational overhead. Coding
      gain is mostly the same as the old version. Without smoothing,
      we would lose 0.3% on lowres.
      
      Change-Id: I04be32cffa525f7dc8ee583c0bf211d7bdc6e609
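
The closed form for the smoothed A[0] term can be checked by summing over the four points directly. The sketch below is only an arithmetic check, not the estimation code; the function names are hypothetical. Two of the four points have source x equal to sx and two have sx+n, so the expansion is 2*sx^2 + 2*(sx+n)^2 = 4*sx^2 + 4*n*sx + 2*n^2.

```python
def shifted_sum_of_squares(sx, n):
    """Sum of squared source-x values over the sample and its 3 shifted copies."""
    # (sx, sy), (sx, sy+n), (sx+n, sy), (sx+n, sy+n): only x matters here.
    xs = [sx, sx, sx + n, sx + n]
    return sum(x * x for x in xs)


def closed_form(sx, n):
    """Expanded form of the same sum, as it would be accumulated into A[0]."""
    return 4 * sx * sx + 4 * n * sx + 2 * n * n
```

The other entries of the normal matrix can be expanded in the same way, which is why the smoothing costs a few extra multiply-adds per sample rather than four separate accumulations.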
  18. 31 Mar, 2017 1 commit
  19. 30 Mar, 2017 1 commit
    • A few fixes for global motion · 11f0e40d
      Debargha Mukherjee authored
      Handles a rare division by 0 case.
      Also adds a check on global motion parameters to disable
      if the parameters obtained are outside the range that the
      shear supports. This fixes a rare assert failure.
      Also changes the recode loop threshold somewhat.
      
      Change-Id: I4c6e74b914ac653cd9caa0563d78b0a19a2a8627
  20. 23 Mar, 2017 2 commits
    • Simplify warped motion estimation to use 2d ls · b9370acd
      Debargha Mukherjee authored
      Use a simpler warped motion estimation scheme that uses a 2d
      least squares problem, where the underlying assumption
      applied is that the motion vector computed at the center
      of the current block using the warp model is exactly the same
      as the motion vector transmitted for the block.
      
      The main motivation is to reduce the complexity of the
      estimation process.
      
      Coding efficiency drops by about 0.25% on lowres:
      -1.152% (from -1.396%).
      
      Also removes the code for non-approximate division and bakes
      approximate division in.
      
      Change-Id: Ie4ad8e32593b09f7e1920c70b0b92545236ddc54
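
One way to read the 2d least-squares formulation is sketched below. It is an interpretation, not the libaom routine: it assumes that, once the block centre is pinned to move by exactly the transmitted motion vector, only the 2x2 linear part of the model remains to be fitted, and that samples are expressed relative to the centre. The function name and the use of exact `Fraction` arithmetic are illustrative choices.

```python
from fractions import Fraction


def fit_linear_part(samples):
    """Least-squares 2x2 matrix M with M @ p ~= q for each (p, q) in samples.

    p is a neighbour's source position relative to the block centre;
    q is its projected position relative to (centre + transmitted mv).
    Coordinates are integers. Returns (row_x, row_y) or None if degenerate.
    """
    a00 = a01 = a11 = 0
    bx0 = bx1 = by0 = by1 = 0
    for (px, py), (qx, qy) in samples:
        a00 += px * px
        a01 += px * py
        a11 += py * py
        bx0 += px * qx
        bx1 += py * qx
        by0 += px * qy
        by1 += py * qy
    det = a00 * a11 - a01 * a01
    if det == 0:
        return None  # all samples on one line through the centre

    def solve(b0, b1):
        # Inverse of the symmetric matrix [[a00, a01], [a01, a11]] applied to b.
        return (Fraction(a11 * b0 - a01 * b1, det),
                Fraction(a00 * b1 - a01 * b0, det))

    return solve(bx0, bx1), solve(by0, by1)


# Samples generated from a known matrix [[2, 1], [0, 3]] are recovered exactly.
pts = [(1, 0), (0, 1), (2, 3), (-1, 2)]
samples = [((x, y), (2 * x + y, 3 * y)) for x, y in pts]
model = fit_linear_part(samples)
```

Because the centre constraint removes the translation unknowns, each row needs only a 2-unknown solve, and both rows share the same determinant.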
    • Split current block samples for warp estimation · e8e6cad7
      Debargha Mukherjee authored
      Change-Id: Iebc74024475c7cb88650b65df9f23b1a5e70021c
  21. 17 Mar, 2017 1 commit
    • Replace division in warped motion least squares · 082d4df7
      Debargha Mukherjee authored
      Replaces the int64 and int32 divisions in the least-squares and
      gamma/delta computations with a mechanism that decomposes
      the divisor D such that 1/D ~= y * 2^-k, where y is obtained
      from a lookup table indexed by the 8 highest bits of the difference
      D - 2^floor(log2(D)). The main complexity now comes only from
      computing this decomposition, which is essentially equivalent
      to finding floor(log2(D)) (the position of the highest set
      bit in a 64-bit integer).
      
      Also includes an out of memory bug fix and some cleanups.
      
      Change-Id: I9247fdff5f6b4191175d4b4656357bfff626f02c
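
A minimal sketch of this kind of table-driven reciprocal follows. The 8-bit table index comes from the description above; the 14-bit table precision, the rounding of the index, the names, and the way the table is generated are assumptions made for illustration and may differ from the committed code.

```python
DIV_LUT_BITS = 8        # index width, as in the description above
DIV_LUT_PREC_BITS = 14  # table precision (assumed for this sketch)

# Entry f approximates 2^14 / (1 + f/256), rounded to nearest, for f = 0..256.
_ONE = 1 << (DIV_LUT_PREC_BITS + DIV_LUT_BITS)
DIV_LUT = [(_ONE + (256 + f) // 2) // (256 + f)
           for f in range((1 << DIV_LUT_BITS) + 1)]


def resolve_divisor(d):
    """Return (y, k) such that 1/d is approximately y * 2**-k, for d >= 1."""
    n = d.bit_length() - 1            # floor(log2(d))
    e = d - (1 << n)                  # remainder below the top bit
    if n > DIV_LUT_BITS:
        # Keep the 8 highest bits of e, rounding to nearest.
        f = (e + (1 << (n - DIV_LUT_BITS - 1))) >> (n - DIV_LUT_BITS)
    else:
        f = e << (DIV_LUT_BITS - n)
    return DIV_LUT[f], n + DIV_LUT_PREC_BITS


def approx_div(num, d):
    """Approximate num // d for num >= 0 using one multiply and one shift."""
    y, k = resolve_divisor(d)
    return (num * y) >> k
```

With these parameters the relative error of the reciprocal stays below roughly 0.2%, since the divisor's mantissa is quantized to 8 fractional bits. The only data-dependent work is finding the top set bit, which matches the complexity claim in the message.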
  22. 02 Mar, 2017 1 commit
    • Some optimizations on integer affine estimation · 93105538
      Debargha Mukherjee authored
      1. Adds a limit on the number of candidate samples used for the
      estimation.
      2. Adds a limit on max mv magnitude for use in the least-squares.
      3. Makes some of the internal variables 32-bit.
      
      Impact on coding efficiency is within the noise range.
      
      Change-Id: I8c1c3216368ceb2e3548660a3b8c159df54a8312
  23. 28 Feb, 2017 1 commit
  24. 27 Feb, 2017 1 commit
    • Integerize warped motion computation · e6eb3b53
      Debargha Mukherjee authored
      Integerizes computation of the least squares for warped motion.
      The model is restricted to only Affine. Affine seems easiest
      to compute and integerize since it can be split into two 3-dim
      least squares problems, as opposed to rotation-zoom which needs
      a 4-dim least-squares problem to be solved.
      The current implementation requires only one division per block.
      
      BDRATE impact is minimal. The upgrade to the affine model improves
      coding efficiency, but integerization degrades it a
      little. Overall there is a net gain of about -0.07% BDRATE on
      the lowres set.
      BDRATE lowres: -1.113% with --enable-warped-motion vs. without
      (up from -1.044%).
      
      Change-Id: I6b9216ac0737d76f59054293eabee48e17739ec4
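
The split into two 3-dim least-squares problems can be sketched as follows. This is an illustration under assumptions, not the integerized libaom code: it uses exact `Fraction` arithmetic and Cramer's rule, and all names are hypothetical. The x' row (a, b, c) and the y' row (d, e, f) of the affine model are fitted separately, but both systems share the same 3x3 normal matrix, so this sketch computes a single determinant for both. Whether that is how the commit arrives at one division per block is an assumption.

```python
from fractions import Fraction


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(n, rhs, d):
    """Solve n @ x = rhs by Cramer's rule, reusing the determinant d of n."""
    out = []
    for col in range(3):
        m = [row[:] for row in n]
        for r in range(3):
            m[r][col] = rhs[r]
        out.append(Fraction(det3(m), d))
    return out


def fit_affine(pairs):
    """Fit x' = a*x + b*y + c and y' = d*x + e*y + f to integer point pairs."""
    n = [[0] * 3 for _ in range(3)]
    rx = [0] * 3
    ry = [0] * 3
    for (x, y), (u, v) in pairs:
        basis = (x, y, 1)
        for i in range(3):
            rx[i] += basis[i] * u
            ry[i] += basis[i] * v
            for j in range(3):
                n[i][j] += basis[i] * basis[j]
    d = det3(n)
    if d == 0:
        return None  # collinear or too few points
    return solve3(n, rx, d), solve3(n, ry, d)


# Points generated from a known affine map are recovered exactly.
src = [(0, 0), (1, 0), (0, 1), (2, 3), (4, 1)]
pairs = [((x, y), (2 * x + y + 5, -x + 3 * y + 7)) for x, y in src]
rows = fit_affine(pairs)
```

A rotation-zoom model ties the two rows together (a = e, b = -d), which is why it cannot be split this way and needs a joint 4-unknown solve.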
  25. 17 Feb, 2017 1 commit
    • Support trapezoidal models for global motion · 5dfa9300
      Debargha Mukherjee authored
      Adds functionality for least-squares, RANSAC as well as encoding and
      decoding with new constrained homographies that warp blocks to horizontal
      and/or vertical trapezoids. This is for future experimentation. None
      of the models are actually enabled in the code.
      
      Change-Id: I1936018c6b11587d6fd83c3a2c63548cb641b33f
  26. 14 Feb, 2017 1 commit
  27. 01 Feb, 2017 1 commit
    • Misc global motion changes. · d978cd5e
      Debargha Mukherjee authored
      A few encoder global-motion estimation parameter changes.
      lowres: -0.844% (up by 0.08%)
      
      Change-Id: Ib080125803cf56a91ce7d482d6d1445160105010
  28. 27 Jan, 2017 1 commit
  29. 23 Jan, 2017 1 commit
    • Warp filter improvements · 13797462
      David Barker authored
      * The restriction on the parameter 'delta' was too strict, so we
        loosen it (delta only ever gets multiplied by -4, ... , 4,
        whereas beta gets multiplied by -7, ..., 7)
      * Correct a comment about the border clamping
      * Fix an issue with the test case
      
      Change-Id: I30e55203455ba6e419b5a8b646151a6d1fd5cc3b
  30. 20 Jan, 2017 1 commit
    • Change the warp filter to use real 8-tap · e6044fec
      Debargha Mukherjee authored
      The warp filter for the (0,1) case is changed to use a real
      8-tap filter.
      
      Improves coding efficiency.
      
      BDRATE on lowres:
      -0.772% (up from -0.633%) with --enable-global-motion
      -1.124% (up from -1.001%) with --enable-warped-motion
      
      Change-Id: I296efe36dbc72a7af74773b71b445f19a2aa7205
  31. 19 Jan, 2017 2 commits
    • Add correctness tests for the SSE2 warp filter · 838367db
      David Barker authored
      Also rename warp_affine() to av1_warp_affine()
      
      Change-Id: I945baff6be8a1ea942ce88dfcfa5344af6b3a966
    • Optimize SSE2 warp filter · 1b888f2e
      David Barker authored
      Improve the speed of the warp filter itself by ~30%. This leads
      to an overall decoder speedup of 5-20%, depending on bitrate,
      for the global-motion experiment, and a small speedup for
      warped-motion.
      
      Applies a very minor change to the rounding during filter
      selection (ROUND_POWER_OF_TWO makes slightly more sense here
      than ROUND_POWER_OF_TWO_SIGNED, and is faster)
      
      Change-Id: I3f364221d1ec35a8aac0d2c8b0e427f527d12e43
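
The difference between the two rounding helpers can be seen in a short sketch. The definitions below are Python transcriptions of how these macros are commonly written; treat them as assumed rather than quoted from the tree. Python's `>>` floors on negative integers, which mirrors an arithmetic right shift in C (strictly implementation-defined there).

```python
def round_power_of_two(value, n):
    """Add half of 2**n, then shift: ties round toward +infinity."""
    return (value + ((1 << n) >> 1)) >> n


def round_power_of_two_signed(value, n):
    """Round the magnitude, then restore the sign: ties round away from zero."""
    if value < 0:
        return -round_power_of_two(-value, n)
    return round_power_of_two(value, n)
```

The two agree everywhere except on negative ties: -5/2 becomes -2 with the unsigned form and -3 with the signed form. The unsigned form also avoids a branch, which is consistent with the message's remark that it is faster.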
  32. 12 Jan, 2017 1 commit
    • Add SSE2 vectorized warp filter for lowbd · d5dfa96e
      David Barker authored
      End-to-end speed improvements: (measured on tempete_cif.y4m,
      20 frames for encoder and all 260 frames for decoder)
      
      * GLOBAL_MOTION encoder: ~10% faster
      * GLOBAL_MOTION decoder: 100-200% faster depending on bitrate
      * WARPED_MOTION encoder: ~2.5% faster
      * WARPED_MOTION decoder: ~20-40% faster depending on bitrate
      
      The improvement in the GLOBAL_MOTION decoder is particularly
      large because its runtime is dominated by calls to warp_plane().
      
      This introduces minor changes to the output of the warp filter,
      but these should be rare.
      
      Change-Id: I5813ab9e90311e27587045153c32d400b6b9eb92
  33. 09 Jan, 2017 1 commit
    • Use fast warping algorithm for warped motion mode · 7d2109e5
      Yue Chen authored
      Disable warped motion mode when the model parameters are out of the
      range of the new interpolation algorithm.
      Performance: 1.1% lowres (was 1.2%)
      
      Change-Id: I947ce3fd07e0d574d66333c1a729e85ba0294b4a
  34. 07 Jan, 2017 1 commit
    • Fix new warp filter in the case wmmat[2] == 0 · fa19516f
      David Barker authored
      In this case, calculating the shear parameters fails
      with a divide-by-zero error. So disable the new filter
      in this case.
      
      We also temporarily remove the asserts blocking use
      of the old filter with debugging enabled.
      
      Change-Id: I788ff51c3bc1d841eab1099881cc3b55038ae342
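
The divide-by-zero can be illustrated with a shear decomposition of the 2x2 part of the warp matrix. The sketch below is an assumption-laden illustration, not the libaom function: it works on an unscaled matrix [[a, b], [c, d]] in exact arithmetic, takes `a` to play the role of wmmat[2], and factors the matrix as a vertical shear [[1, 0], [gamma, 1+delta]] times a horizontal shear [[1+alpha, beta], [0, 1]]. The function name and the `None` return convention are hypothetical.

```python
from fractions import Fraction


def shear_params(a, b, c, d):
    """Return (alpha, beta, gamma, delta) for [[a, b], [c, d]], or None.

    None signals that the decomposition does not exist (a == 0), in which
    case a caller would fall back to a filter that does not need it.
    """
    if a == 0:
        return None  # gamma = c / a would divide by zero
    gamma = Fraction(c, a)
    alpha = a - 1
    beta = b
    delta = d - gamma * b - 1
    return alpha, beta, gamma, delta


def recompose(alpha, beta, gamma, delta):
    """Multiply the two shears back together, as a consistency check."""
    return [[1 + alpha, beta],
            [gamma * (1 + alpha), gamma * beta + 1 + delta]]
```

For example, `shear_params(2, 1, 4, 5)` yields parameters that `recompose` turns back into [[2, 1], [4, 5]], while any matrix with a == 0 is rejected up front instead of failing inside the division.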
  35. 21 Dec, 2016 1 commit