1. 06 Sep, 2017 1 commit
    • Adjust chroma position in warp filter · a60dc9d6
      David Barker authored
      When using chroma subsampling, the warp filter currently behaves
      strangely when projecting chroma pixels, especially when the
      subsamplings are not equal along the x and y axes.
      
      For example, when subsampling_x = 1 and subsampling_y = 0, we
      calculate the destination coordinates (dx, dy) from the source
      coordinates (sx, sy) as:
      dx = project(2*sx+0.5, 2*sy+0.5)/2 - 0.5
      dy = project(sx, sy)
      where project() applies the affine warp model.
      
      This patch changes to a simpler and more consistent model,
      where we:
      * Project the chroma sample into luma coordinates, taking
        the chroma sample to be co-located with the top-left luma
        sample in its (2x2, or 2x1, or 1x2) subsampling block
        (this is done for simplicity; we don't expect the exact
         position to make much difference to the output quality)
      * Apply the transformation in luma coordinates
      * Project the resulting luma sample back into chroma coordinates
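      The three steps above can be sketched as follows. This is an
      illustration only: the AffineModel struct, MODEL_PREC_BITS and
      project_chroma() are hypothetical names, not the libaom API, and
      the final division truncates toward zero where a real filter
      would define its rounding precisely.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed-point precision of the model (an assumption). */
#define MODEL_PREC_BITS 16

/* Hypothetical affine model in luma coordinates:
 *   x' = m[2]*x + m[3]*y + m[0]
 *   y' = m[4]*x + m[5]*y + m[1]
 * with every entry scaled by 2^MODEL_PREC_BITS. */
typedef struct {
  int32_t m[6];
} AffineModel;

/* Maps chroma sample (cx, cy) to its destination in chroma coordinates.
 * The outputs stay scaled by 2^MODEL_PREC_BITS to keep sub-pixel detail. */
static void project_chroma(const AffineModel *w, int cx, int cy, int ss_x,
                           int ss_y, int64_t *dx, int64_t *dy) {
  /* Step 1: chroma -> luma, co-located with the top-left luma sample. */
  const int64_t lx = (int64_t)cx * (1 << ss_x);
  const int64_t ly = (int64_t)cy * (1 << ss_y);
  /* Step 2: apply the transformation in luma coordinates. */
  const int64_t px = w->m[2] * lx + w->m[3] * ly + w->m[0];
  const int64_t py = w->m[4] * lx + w->m[5] * ly + w->m[1];
  /* Step 3: luma -> chroma, dividing by the subsampling factor. */
  *dx = px / (1 << ss_x);
  *dy = py / (1 << ss_y);
}
```

      With an identity model, a chroma sample maps back to itself for
      any combination of ss_x and ss_y, which is the consistency
      property the new model aims for.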
      
      Change to software speed is in the noise, but this approach
      should be simpler in hardware, and should slightly improve
      quality for 4:2:2 and 4:4:0 videos.
      
      Change-Id: Idd455fdd3897594ca7d4edff5b85b78961d1638d
  2. 21 Aug, 2017 1 commit
    • Obey do_average flag when doing convolve_round · 07089c68
      Rupert Swarbrick authored
      Doing this means that we don't have to memset temporary buffers to
      zero in reconinter.c, which was taking ~5% of cycles in a short
      encoding test (using perf to attach to a running encode).
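      A minimal sketch of the idea, assuming an intermediate 32-bit
      buffer; store_convolve_result() is a hypothetical helper, not
      the libaom function. When do_average is 0 the destination is
      overwritten, so it never needs to be zeroed first.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: store n intermediate convolution results.
 * do_average == 0: overwrite dst (no prior memset required).
 * do_average == 1: accumulate onto the first prediction already in dst. */
static void store_convolve_result(int32_t *dst, const int32_t *res, int n,
                                  int do_average) {
  assert(n >= 0);
  for (int i = 0; i < n; ++i) {
    if (do_average)
      dst[i] += res[i];
    else
      dst[i] = res[i];
  }
}
```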
      
      Change-Id: Ibb6e31920000b876c6ee99f454d89c8a97e9fb31
  3. 31 Jul, 2017 1 commit
    • Unified warp_affine and warp_affine_post_round · b6a31753
      Peter de Rivaz authored
      This patch removes the need for a separate warp_affine_post_round
      function by adding the functionality to the warp_affine function.
      
      The encoded output should remain unchanged, but the encoder/decoder
      should operate faster because the sse2 and ssse3 warp implementation
      can now be used when post_rounding is being used.
      
      Change-Id: Ide52cae55de59a9da9c27c5793e17390f6d2c03e
  4. 12 Jul, 2017 1 commit
  5. 08 Jul, 2017 1 commit
    • Changed scaling of MVs to use higher precision. · 15836145
      Debargha Mukherjee authored
      This is intended to be a no-op when scaling is not
      enabled, but is expected to result in more accurate
      prediction when references need to be scaled.
      
      However, note that all xs, ys, subpel_x and subpel_y values
      are now at higher than 1/16th precision.
      
      Change-Id: I4b22573ea290a31fc58ead980bb0d5e5a9e89243
  6. 27 Jun, 2017 1 commit
    • Reduce multiplier precision for warp least squares · f053cba2
      Debargha Mukherjee authored
      Includes reordering and other clamping changes, as well as
      changes to reduce multiplier precision.
      
      cam_lowres (60 frames): -0.092% BDRATE improvement in
      --disable-cdef --disable-global-motion --disable-ext-tx
      configuration.
      
      Change-Id: I0660c45b44fcd5a193534d8dadd1aa1ae5c5e27a
  7. 21 Jun, 2017 3 commits
  8. 20 Jun, 2017 2 commits
  9. 06 Jun, 2017 1 commit
    • Fix some UBSan warnings · 185575a7
      David Barker authored
      * Make intermediate arrays in av1(_highbd)_warp_affine_c signed,
        to avoid integer overflow when multiplying an 'unsigned int'
        by a negative 'int' value.
      
      * Pad out arrays in masked_variance_test.cc so that the array
        stride is a multiple of 16 bytes.
        This fixes some UBSan errors in masked_variance_intrin_ssse3.c
        related to unaligned loads of 32-bit values.
      
      BUG=aomedia:572
      
      Change-Id: I0cf786c94870ff128c883bed8e900b0686afc3f7
  10. 05 Jun, 2017 1 commit
    • Early termination for warp error computation · 81f6ecd1
      Sarah Parker authored
      This terminates the computation of the warp error once
      the frame error exceeds the best frame error found
      so far, to avoid unnecessary computation.
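      The early exit can be sketched as below. This assumes a sum of
      squared differences as the error metric and checks the bound
      once per row; frame_error_with_bailout() is a hypothetical name,
      and the actual metric and check granularity may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: accumulate squared error row by row and stop as
 * soon as the running total exceeds best_error. The partial sum that is
 * returned is already enough to reject this candidate. */
static int64_t frame_error_with_bailout(const uint8_t *ref,
                                        const uint8_t *dst, int width,
                                        int height, int stride,
                                        int64_t best_error) {
  int64_t err = 0;
  assert(width <= stride);
  for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x) {
      const int d = ref[y * stride + x] - dst[y * stride + x];
      err += (int64_t)d * d;
    }
    if (err > best_error) return err; /* cannot beat the best so far */
  }
  return err;
}
```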
      
      Change-Id: I094a0b3e13f8b91610e051cb91d20a815879dd80
  11. 01 Jun, 2017 1 commit
    • Fix integer overflow in warp filter · 17c37ceb
      David Barker authored
      Patch https://aomedia-review.googlesource.com/c/12602/ made the
      variable 'sum' in the warp filter unsigned, to indicate that its
      value should always be >= 0. But 'sum' is used to accumulate
      signed values, and it is expected that some of those values
      will be negative.
      
      The issue is that, when running 'x += y', if x is a uint32_t
      and y is an int (and is 32 bits), the C standard says to
      convert y to a uint32_t before doing the addition. If y < 0,
      the converted value is very large and the unsigned addition
      wraps around, which sanitizers such as UBSan can flag as
      unsigned integer overflow.
      
      This is fixed by making 'sum' signed, and by explicitly bounds
      checking against zero at the end of the filter.
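      A minimal sketch of the fix, assuming an 8-bit output pixel;
      filter_and_clamp() and its parameters are hypothetical and do
      not mirror the actual warp filter code. The accumulator is
      signed, and the lower bound is checked explicitly at the end.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: accumulate signed products in a signed sum, then
 * bounds-check against zero and 255 after rounding. */
static uint8_t filter_and_clamp(const uint8_t *src, const int16_t *coef,
                                int taps, int round_bits) {
  int32_t sum = 0; /* signed: individual products may be negative */
  assert(round_bits >= 1);
  for (int k = 0; k < taps; ++k) sum += coef[k] * src[k];
  if (sum < 0) return 0; /* explicit lower bound check */
  sum = (sum + (1 << (round_bits - 1))) >> round_bits;
  return (uint8_t)(sum > 255 ? 255 : sum);
}
```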
      
      BUG=aomedia:572
      
      Change-Id: I1d484b5f5698db0ec9761807610b3b2b35647983
  12. 30 May, 2017 1 commit
    • Tidy up warp filter · facac4f5
      David Barker authored
      * Simplify the C version of the warp filter to make the intent
        of the code clearer
      * Replace saturate_uint() in the C warp filter with an assertion
        that the intermediate values are in-range. This is because they
        should (provably) *never* go out-of-range.
      * Add a comment describing the intended hardware architecture
      * Miscellaneous comment updates
      
      Change-Id: I798736f923ece599f22d573d31c5dfccd18b2d0e
  13. 29 May, 2017 1 commit
  14. 26 May, 2017 1 commit
  15. 16 May, 2017 1 commit
    • Further speedups to warp filter · 58616eb0
      David Barker authored
      * Calculate sx4, sy4 by truncation instead of rounding
      * Move some repeated calculations out of the filter loop
      
      This is expected to have a roughly neutral effect on BDRATE.
      The speedup of each filter (SSE2, lowbd SSSE3, highbd SSSE3) is
      7-10%, for a total speedup of 14-18% when considered together
      with patches f7a5ee53 and 14b8112b.
      
      Change-Id: I692f649202214c7ab53ecf81f81386f1503e2d20
  16. 15 May, 2017 1 commit
  17. 12 May, 2017 2 commits
  18. 11 May, 2017 2 commits
    • Extra rounding to let hw to use narrower integers. · 14b8112b
      Sean Purser-Haskell authored
      Change-Id: I175d6ff03f31a2e0d2fe7cd1c3852210d6e0ddf5
    • More accurate chroma warping · f7a5ee53
      David Barker authored
      Previously, the projected positions of chroma pixels would effectively
      undergo double rounding, since we round both when calculating x4 / y4
      and when calculating the filter index. Further, the two roundings
      were different: x4 / y4 used ROUND_POWER_OF_TWO_SIGNED, whereas
      the filter index uses ROUND_POWER_OF_TWO.
      
      It is slightly more accurate (and faster) to replace the first
      rounding by a shift; this is motivated by the fact that
      ROUND_POWER_OF_TWO(x >> a, b) == ROUND_POWER_OF_TWO(x, a + b)
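      The identity can be checked by brute force. The macro below is a
      common definition of ROUND_POWER_OF_TWO and is assumed here
      rather than copied from the source tree; rounding_identity_holds()
      is a hypothetical helper. Only non-negative x is tested, so the
      result does not depend on how the platform shifts negative values.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed definition: add half of 2^n, then shift right by n. */
#define ROUND_POWER_OF_TWO(value, n) (((value) + ((1 << (n)) >> 1)) >> (n))

/* Returns 1 if ROUND_POWER_OF_TWO(x >> a, b) == ROUND_POWER_OF_TWO(x, a + b)
 * for every x in [0, limit), and 0 otherwise. */
static int rounding_identity_holds(int32_t limit, int a, int b) {
  assert(a >= 0 && b >= 0 && a + b < 30);
  for (int32_t x = 0; x < limit; ++x) {
    if (ROUND_POWER_OF_TWO(x >> a, b) != ROUND_POWER_OF_TWO(x, a + b))
      return 0;
  }
  return 1;
}
```

      Under this definition the identity needs b >= 1: with b == 0 the
      left-hand side is a plain truncating shift, so the two sides can
      differ (for example x = 2, a = 2).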
      
      Change-Id: Ia52b05745168d0aeb05f0af4c75ff33eee791d82
  19. 06 May, 2017 1 commit
  20. 05 May, 2017 2 commits
  21. 04 May, 2017 1 commit
    • Add SSSE3 warp filter + const-ify warp filters · d8a423c6
      David Barker authored
      The SSSE3 filter is very similar to the SSE2 filter, but
      the horizontal pass is sped up by using the 8x8->16
      multiplies added in SSSE3.
      
      Also apply const-correctness to all versions of the filter
      
      The timings of the existing filters are unchanged, and the
      lowbd SSSE3 filter is ~17% faster than the lowbd SSE2 filter.
      
      Timings per 8x8 block:
      lowbd SSE2: 320ns
      lowbd SSSE3: 273ns
      highbd SSSE3: 300ns
      
      Filter output is unchanged.
      
      Change-Id: Ifb428a33b106d900cde1b080794796c0754ae182
  22. 03 May, 2017 4 commits
  23. 02 May, 2017 2 commits
  24. 01 May, 2017 3 commits
    • labs() -> llabs() · 321357ee
      Yaowu Xu authored
      llabs() takes a long long (64-bit) input parameter, and so fixes
      warnings about explicit type conversion from int64_t to long.
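      For illustration, a small wrapper (abs64() is a hypothetical
      name) shows why the wider function matters: on platforms where
      long is 32 bits, passing an int64_t to labs() would narrow the
      argument, whereas llabs() takes a long long, which is at least
      64 bits wide.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical wrapper: absolute value of a 64-bit quantity.
 * llabs() takes long long, so no narrowing conversion occurs. */
static int64_t abs64(int64_t v) {
  assert(v != INT64_MIN); /* the absolute value would not be representable */
  return llabs(v);
}
```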
      
      Change-Id: I2569a5c7e425e3690f5dc7a607bad2539c2324f6
    • Avoid left shift of negative values · cc6bdab7
      Yaowu Xu authored
      Convert shifts of int/int64 into multiplications
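      A sketch of the conversion; scale_by_pow2() is a hypothetical
      helper. In C, left-shifting a negative signed value is undefined
      behaviour, whereas multiplying by a power of two is well defined
      as long as the result fits in the type.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: computes v * 2^n without shifting a negative value.
 * Only the positive constant 1 is shifted. */
static int64_t scale_by_pow2(int64_t v, int n) {
  assert(n >= 0 && n < 62);
  return v * ((int64_t)1 << n);
}
```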
      
      Change-Id: I3d7ef400249096a6c3712c46f59c35c3ddfde5ca
    • Turn off SSE2 version of warping temporarily · 1abf447b
      Debargha Mukherjee authored
      Temporarily force C version until the SSE2 version is fixed
      
      Change-Id: I51450068259f998d178b1c681872e59d056b254b
  25. 28 Apr, 2017 3 commits
    • Revert "Limit to 192 filters for warp, clamp index since in some cases index 192" · 79362e33
      Debargha Mukherjee authored
      This reverts commit 266db85d.
      
      Reason for revert: Reverting to prevent software slowdown. Will be implemented differently in a separate patch.
      
      Change-Id: I386a9661c87d69e22761e5c01507f2f1f968433f
    • Fix test failures and warnings of WARPED_MOTION · f3e1ead3
      Yue Chen authored
      Properly set the number of projection samples for seg skip blocks
      at the encoder side to clear a unit test failure when both the seg
      feature and warped_motion are on.
      Clear 'implicit conversions' warnings.
      
      Change-Id: I29e40ffae75880dae2584dbc8772c81321f6d69e
    • Fix encode/decode mismatch with global/warped motion · b62eef7b
      David Barker authored
      When predicting a 4x4 warp block (either using ZEROMV with
      global-motion, or the WARPED_CAUSAL motion mode with
      warped-motion), the warp filter would previously write
      4 bytes to the right of the block.
      
      This caused encode/decode mismatches when encoding with
      multiple threads and tile_cols > 1, since in that case
      we could end up overwriting already-generated pixels from
      the next tile across.
      
      This patch changes the filter so that we only overwrite the
      intended pixels.
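      One way to picture the fix, assuming the filter computes 8 output
      pixels per row internally: copy only as many pixels as the block
      is wide. store_row() is a hypothetical helper and is not taken
      from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: the filter produces 8 pixels per row, but only
 * block_width of them belong to the current block. Anything to the right
 * may belong to another tile and must be left untouched. */
static void store_row(uint8_t *dst, const uint8_t row[8], int block_width) {
  assert(block_width == 4 || block_width == 8);
  memcpy(dst, row, (size_t)block_width);
}
```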
      
      Change-Id: I3664b44e872e85aa5ccc0a5781f0f9ad994a5b80
  26. 27 Apr, 2017 1 commit