1. 20 Nov, 2017 1 commit
    • JNT_COMP: refactor if statements · 8263f80c
      Cheng Chen authored
      Refactor if statements that use frame_offset == -1 to indicate that
      jnt_comp is not chosen, since distance can no longer be negative.
      Instead, add a variable use_jnt_comp_avg for the same purpose.
      
      Change-Id: Ie6b9c6ab36131b48bc9e066babada17046729cd8
  2. 13 Nov, 2017 1 commit
    • Cleanup useless constant parameters · c33aec54
      Frederic Barbier authored
      Warp motion is disabled when the reference is scaled.
      Thus, the parameters are always constant (SCALE_SUBPEL_SHIFTS).
      
      Change-Id: I6757ea855ef0fd91cf2378f756f92774f9b9e39a
  3. 08 Nov, 2017 1 commit
    • JNT_COMP: fix issues with convolve_round · 3866a142
      Cheng Chen authored
      Fix an issue of sudden PSNR drop on a few frames when convolve_round
      is turned on.
      
      C functions are added.
      
      Corresponding SIMD functions and unit tests will be updated in
      a future CL.
      
      Change-Id: I0126ea4d54c98951e5b1efeaecd5468fdc18724a
  4. 31 Oct, 2017 1 commit
  5. 26 Oct, 2017 1 commit
  6. 23 Oct, 2017 2 commits
  7. 13 Oct, 2017 1 commit
  8. 07 Sep, 2017 1 commit
  9. 06 Sep, 2017 1 commit
    • Adjust chroma position in warp filter · a60dc9d6
      David Barker authored
      When using chroma subsampling, the warp filter currently behaves
      strangely when projecting chroma pixels, especially when the
      subsamplings are not equal along the x and y axes.
      
      For example, when subsampling_x = 1 and subsampling_y = 0, we
      calculate the destination coordinates (dx, dy) from the source
      coordinates (sx, sy) as:
      dx = project(2*sx+0.5, 2*sy+0.5)/2 - 0.5
      dy = project(sx, sy)
      where project() applies the affine warp model.
      
      This patch changes to a simpler and more consistent model,
      where we:
      * Project the chroma sample into luma coordinates, taking
        the chroma sample to be co-located with the top-left luma
        sample in its (2x2, or 2x1, or 1x2) subsampling block
        (this is done for simplicity; we don't expect the exact
         position to make much difference to the output quality)
      * Apply the transformation in luma coordinates
      * Project the resulting luma sample back into chroma coordinates
      
      Change to software speed is in the noise, but this approach
      should be simpler in hardware, and should slightly improve
      quality for 4:2:2 and 4:4:0 videos.
      
      Change-Id: Idd455fdd3897594ca7d4edff5b85b78961d1638d
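The three-step chroma model described in this commit can be sketched roughly as follows. This is a floating-point illustration only; the real filter works in fixed point, and the names AffineModel, project_luma and project_chroma are hypothetical, not taken from the codebase.

```c
/* Hypothetical 2x3 affine model, expressed in luma coordinates:
 *   x' = m[0]*x + m[1]*y + m[2]
 *   y' = m[3]*x + m[4]*y + m[5] */
typedef struct { double m[6]; } AffineModel;
typedef struct { double x, y; } Point;

/* Apply the affine warp to a point given in luma coordinates. */
static Point project_luma(const AffineModel *w, double x, double y) {
  Point p;
  p.x = w->m[0] * x + w->m[1] * y + w->m[2];
  p.y = w->m[3] * x + w->m[4] * y + w->m[5];
  return p;
}

/* Warp one chroma sample (cx, cy) with subsampling shifts ss_x, ss_y. */
static Point project_chroma(const AffineModel *w, int cx, int cy,
                            int ss_x, int ss_y) {
  /* 1. Chroma -> luma: treat the chroma sample as co-located with the
   *    top-left luma sample of its subsampling block. */
  const double lx = (double)(cx * (1 << ss_x));
  const double ly = (double)(cy * (1 << ss_y));
  /* 2. Apply the transformation in luma coordinates. */
  Point p = project_luma(w, lx, ly);
  /* 3. Luma -> chroma: scale back by the subsampling factors. */
  p.x /= (double)(1 << ss_x);
  p.y /= (double)(1 << ss_y);
  return p;
}
```

With subsampling_x = 1 and subsampling_y = 0, both coordinates now go through the same three steps, instead of the asymmetric dx/dy formulas quoted above.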
  10. 21 Aug, 2017 1 commit
    • Obey do_average flag when doing convolve_round · 07089c68
      Rupert Swarbrick authored
      Doing this means that we don't have to memset temporary buffers to
      zero in reconinter.c, which was taking ~5% of cycles in a short
      encoding test (using perf to attach to a running encode).
      
      Change-Id: Ibb6e31920000b876c6ee99f454d89c8a97e9fb31
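One way to read this change: if the first prediction overwrites the intermediate buffer and only the second one accumulates, the buffer never needs to be zeroed first. A minimal sketch under that assumption (store_row is a hypothetical helper, not a function from the codebase):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write one row of filter output into the
 * intermediate buffer, honouring the do_average flag. */
static void store_row(int32_t *dst, const int32_t *src, size_t n,
                      int do_average) {
  for (size_t i = 0; i < n; ++i) {
    if (do_average)
      dst[i] += src[i]; /* second prediction: accumulate onto the first */
    else
      dst[i] = src[i];  /* first prediction: overwrite, so no memset */
  }
}
```

Calling store_row with do_average = 0 on an uninitialised buffer is safe, because every element is assigned before it is read.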
  11. 31 Jul, 2017 1 commit
    • Unified warp_affine and warp_affine_post_round · b6a31753
      Peter de Rivaz authored
      This patch removes the need for a separate warp_affine_post_round
      function by adding the functionality to the warp_affine function.
      
      The encoded output should remain unchanged, but the encoder/decoder
      should operate faster because the sse2 and ssse3 warp implementations
      can now be used when post_rounding is being used.
      
      Change-Id: Ide52cae55de59a9da9c27c5793e17390f6d2c03e
  12. 12 Jul, 2017 1 commit
  13. 08 Jul, 2017 1 commit
    • Changed scaling of MVs to use higher precision. · 15836145
      Debargha Mukherjee authored
      This is intended to be a no-op when scaling is not
      enabled, but is expected to result in more accurate
      prediction when references need to be scaled.
      
      However, note that all xs, ys, subpel_x and subpel_y values
      are now at higher than 1/16th precision.
      
      Change-Id: I4b22573ea290a31fc58ead980bb0d5e5a9e89243
  14. 27 Jun, 2017 1 commit
    • Reduce multiplier precision for warp least squares · f053cba2
      Debargha Mukherjee authored
      Includes reordering and other clamping changes, as well as
      changes to reduce multiplier precision.
      
      cam_lowres (60 frames): -0.092% BDRATE improvement in the
      --disable-cdef --disable-global-motion --disable-ext-tx
      configuration.
      
      Change-Id: I0660c45b44fcd5a193534d8dadd1aa1ae5c5e27a
  15. 21 Jun, 2017 3 commits
  16. 20 Jun, 2017 2 commits
  17. 06 Jun, 2017 1 commit
    • Fix some UBSan warnings · 185575a7
      David Barker authored
      * Make intermediate arrays in av1(_highbd)_warp_affine_c signed,
        to avoid integer overflow when multiplying an 'unsigned int'
        by a negative 'int' value.
      
      * Pad out arrays in masked_variance_test.cc so that the array
        stride is a multiple of 16 bytes.
        This fixes some UBSan errors in masked_variance_intrin_ssse3.c
        related to unaligned loads of 32-bit values.
      
      BUG=aomedia:572
      
      Change-Id: I0cf786c94870ff128c883bed8e900b0686afc3f7
  18. 05 Jun, 2017 1 commit
    • Early termination for warp error computation · 81f6ecd1
      Sarah Parker authored
      This terminates the warp error computation once the frame
      error exceeds the best frame error found so far, to avoid
      unnecessary computation.
      
      Change-Id: I094a0b3e13f8b91610e051cb91d20a815879dd80
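The early-exit idea can be illustrated with a short sketch. It assumes a plain sum of squared differences checked once per row; the actual error measure and check granularity in the codebase may differ, and frame_error_early_exit is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical sketch: accumulate the squared difference between two
 * frames row by row, and stop as soon as the running total exceeds the
 * best error found so far. The returned value is then only a lower
 * bound, which is enough to reject this candidate. */
static int64_t frame_error_early_exit(const uint8_t *ref, const uint8_t *src,
                                      int width, int height, int stride,
                                      int64_t best_error) {
  int64_t err = 0;
  for (int r = 0; r < height; ++r) {
    for (int c = 0; c < width; ++c) {
      const int d = (int)ref[r * stride + c] - (int)src[r * stride + c];
      err += (int64_t)d * d;
    }
    if (err > best_error) return err; /* cannot beat the best so far */
  }
  return err;
}
```

Because the per-pixel terms are non-negative, the running total never decreases, so stopping early cannot change which candidate wins.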
  19. 01 Jun, 2017 1 commit
    • Fix integer overflow in warp filter · 17c37ceb
      David Barker authored
      Patch https://aomedia-review.googlesource.com/c/12602/ made the
      variable 'sum' in the warp filter unsigned, to indicate that its
      value should always be >= 0. But 'sum' is used to accumulate
      signed values, and it is expected that some of those values
      will be negative.
      
      The issue is that, when running 'x += y', if x is a uint32_t
      and y is an int (and is 32 bits), the C standard says to
      convert y to a uint32_t before doing the addition. If y < 0,
      the converted value wraps modulo 2^32. This is well-defined in
      C, but it is the kind of unsigned overflow that sanitizers report.
      
      This is fixed by making 'sum' signed, and by explicitly bounds
      checking against zero at the end of the filter.
      
      BUG=aomedia:572
      
      Change-Id: I1d484b5f5698db0ec9761807610b3b2b35647983
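A small sketch of the conversion described above, assuming a platform where int is 32 bits. In standard C the conversion of a negative int to uint32_t wraps modulo 2^32, which is what an unsigned accumulator ends up storing. Both function names are hypothetical, and the clamp at zero is one reading of "bounds checking against zero".

```c
#include <stdint.h>

/* Unsigned accumulator: a negative term is converted to uint32_t before
 * the addition, so the stored result wraps modulo 2^32. */
static uint32_t add_unsigned(uint32_t sum, int32_t term) {
  sum += term;
  return sum;
}

/* Signed accumulator with an explicit lower bound applied at the end
 * (assumed here to be a clamp at zero). */
static int32_t accumulate_and_clamp(const int32_t *terms, int n) {
  int32_t sum = 0;
  for (int i = 0; i < n; ++i) sum += terms[i];
  return sum < 0 ? 0 : sum;
}
```

For example, add_unsigned(3, -5) yields 4294967294 rather than -2, whereas the signed version keeps intermediate negative values intact until the final bound check.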
  20. 30 May, 2017 1 commit
    • Tidy up warp filter · facac4f5
      David Barker authored
      * Simplify the C version of the warp filter to make the intent
        of the code clearer
      * Replace saturate_uint() in the C warp filter with an assertion
        that the intermediate values are in-range. This is because they
        should (provably) *never* go out-of-range.
      * Add a comment describing the intended hardware architecture
      * Miscellaneous comment updates
      
      Change-Id: I798736f923ece599f22d573d31c5dfccd18b2d0e
  21. 29 May, 2017 1 commit
  22. 26 May, 2017 1 commit
  23. 16 May, 2017 1 commit
    • Further speedups to warp filter · 58616eb0
      David Barker authored
      * Calculate sx4, sy4 by truncation instead of rounding
      * Move some repeated calculations out of the filter loop
      
      This is expected to have a roughly neutral effect on BDRATE.
      The speedup of each filter (SSE2, lowbd SSSE3, highbd SSSE3) is
      7-10%, for a total speedup of 14-18% when considered together
      with patches f7a5ee53 and 14b8112b.
      
      Change-Id: I692f649202214c7ab53ecf81f81386f1503e2d20
  24. 15 May, 2017 1 commit
  25. 12 May, 2017 2 commits
  26. 11 May, 2017 2 commits
    • Extra rounding to let hw use narrower integers. · 14b8112b
      Sean Purser-Haskell authored
      Change-Id: I175d6ff03f31a2e0d2fe7cd1c3852210d6e0ddf5
    • More accurate chroma warping · f7a5ee53
      David Barker authored
      Previously, the projected positions of chroma pixels would effectively
      undergo double rounding, since we round both when calculating x4 / y4
      and when calculating the filter index. Further, the two roundings
      were different: x4 / y4 used ROUND_POWER_OF_TWO_SIGNED, whereas
      the filter index uses ROUND_POWER_OF_TWO.
      
      It is slightly more accurate (and faster) to replace the first
      rounding by a shift; this is motivated by the fact that
      ROUND_POWER_OF_TWO(x >> a, b) == ROUND_POWER_OF_TWO(x, a + b)
      
      Change-Id: Ia52b05745168d0aeb05f0af4c75ff33eee791d82
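The identity quoted in this commit can be checked numerically. The sketch below assumes the usual add-half-then-shift definition of ROUND_POWER_OF_TWO and restricts itself to non-negative inputs; the helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed definition: add half of the divisor, then shift right. */
#define ROUND_POWER_OF_TWO(value, n) (((value) + ((1 << (n)) >> 1)) >> (n))

/* Shift by a, then round by b. */
static int32_t shift_then_round(int32_t x, int a, int b) {
  return ROUND_POWER_OF_TWO(x >> a, b);
}

/* Round by a, then round again by b (the old double rounding). */
static int32_t round_then_round(int32_t x, int a, int b) {
  return ROUND_POWER_OF_TWO(ROUND_POWER_OF_TWO(x, a), b);
}

/* Count x in [0, limit] where shift-then-round differs from a single
 * rounding by a + b. */
static int count_mismatches(int32_t limit, int a, int b) {
  int mismatches = 0;
  for (int32_t x = 0; x <= limit; ++x) {
    if (shift_then_round(x, a, b) != ROUND_POWER_OF_TWO(x, a + b))
      ++mismatches;
  }
  return mismatches;
}
```

Double rounding, by contrast, can disagree with the single rounding: with x = 4, a = 3, b = 1, rounding twice gives 1, while ROUND_POWER_OF_TWO(4, 4) and the shift-then-round form both give 0.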
  27. 06 May, 2017 1 commit
  28. 05 May, 2017 2 commits
  29. 04 May, 2017 1 commit
    • Add SSSE3 warp filter + const-ify warp filters · d8a423c6
      David Barker authored
      The SSSE3 filter is very similar to the SSE2 filter, but
      the horizontal pass is sped up by using the 8x8->16
      multiplies added in SSSE3.
      
      Also apply const-correctness to all versions of the filter.
      
      The timings of the existing filters are unchanged, and the
      lowbd SSSE3 filter is ~17% faster than the lowbd SSE2 filter.
      
      Timings per 8x8 block:
      lowbd SSE2: 320ns
      lowbd SSSE3: 273ns
      highbd SSSE3: 300ns
      
      Filter output is unchanged.
      
      Change-Id: Ifb428a33b106d900cde1b080794796c0754ae182
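The "8x8->16 multiplies" most likely refer to the SSSE3 instruction pmaddubsw (intrinsic _mm_maddubs_epi16), which multiplies unsigned 8-bit values by signed 8-bit values and adds adjacent products into 16-bit lanes with signed saturation. That mapping is an assumption, not something the commit states. A scalar model of one output lane, with the hypothetical name maddubs_lane:

```c
#include <stdint.h>

/* Scalar model of one 16-bit lane of pmaddubsw: two products of an
 * unsigned 8-bit value (e.g. a pixel) and a signed 8-bit value (e.g. a
 * filter tap) are added, and the sum is saturated to int16_t. */
static int16_t maddubs_lane(uint8_t p0, int8_t f0, uint8_t p1, int8_t f1) {
  const int32_t sum = (int32_t)p0 * f0 + (int32_t)p1 * f1;
  if (sum > INT16_MAX) return INT16_MAX;
  if (sum < INT16_MIN) return INT16_MIN;
  return (int16_t)sum;
}
```

A horizontal filter pass built on this primitive handles two taps per multiply-add, which is consistent with the lowbd timings above (320ns versus 273ns per 8x8 block).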
  30. 03 May, 2017 4 commits