1. 20 Nov, 2017 1 commit
    • JNT_COMP: refactor if statements · 8263f80c
      Cheng Chen authored
      Refactor if statements that use frame_offset == -1 to indicate that
      jnt_comp is not chosen, since the distance can no longer be negative.
      Instead, add a variable, use_jnt_comp_avg, that serves the same purpose.
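A minimal C sketch of the pattern this commit describes: an explicit flag, rather than a frame_offset == -1 sentinel, selects between the distance-weighted and the plain compound average. CompParams, compound_pixel, DIST_BITS and the fields fwd_offset / bck_offset are hypothetical names chosen for illustration, and the assumption that the two weights sum to 1 << DIST_BITS is ours, not taken from the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical weight precision: the two weights are assumed to sum to
 * 1 << DIST_BITS. */
#define DIST_BITS 4

/* Hypothetical compound-prediction parameters (illustrative names). */
typedef struct {
  int use_jnt_comp_avg; /* explicit flag instead of a frame_offset == -1 sentinel */
  int fwd_offset;       /* weight for the first predictor */
  int bck_offset;       /* weight for the second predictor */
} CompParams;

/* Combines one pixel from each of the two predictors. */
static uint8_t compound_pixel(uint8_t p0, uint8_t p1, const CompParams *cp) {
  if (cp->use_jnt_comp_avg) {
    assert(cp->fwd_offset + cp->bck_offset == (1 << DIST_BITS));
    /* Distance-weighted average, rounded to nearest. */
    int sum = p0 * cp->fwd_offset + p1 * cp->bck_offset;
    return (uint8_t)((sum + (1 << (DIST_BITS - 1))) >> DIST_BITS);
  }
  /* jnt_comp not chosen: plain rounded average. */
  return (uint8_t)((p0 + p1 + 1) >> 1);
}
```

With use_jnt_comp_avg = 0 the weights are simply ignored, so no out-of-range value of a distance field is needed to signal that jnt_comp is off.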
      
      Change-Id: Ie6b9c6ab36131b48bc9e066babada17046729cd8
  2. 14 Nov, 2017 1 commit
  3. 13 Nov, 2017 1 commit
    • JNT_COMP: SIMD for av1_warp_affine · fbaf5135
      Cheng Chen authored
      Add a low bit-depth SIMD function for av1_warp_affine, based on the
      existing SIMD implementation.
      Unit tests are added.
      
      Change-Id: I1b4033fa75b53a81cb20a4bb5cc60413708b568c
  4. 06 Sep, 2017 1 commit
    • Adjust chroma position in warp filter · a60dc9d6
      David Barker authored
      When using chroma subsampling, the warp filter currently behaves
      strangely when projecting chroma pixels, especially when the
      subsamplings are not equal along the x and y axes.
      
      For example, when subsampling_x = 1 and subsampling_y = 0, we
      calculate the destination coordinates (dx, dy) from the source
      coordinates (sx, sy) as:
      dx = project(2*sx+0.5, 2*sy+0.5)/2 - 0.5
      dy = project(sx, sy)
      where project() applies the affine warp model.
      
      This patch changes to a simpler and more consistent model,
      where we:
      * Project the chroma sample into luma coordinates, taking
        the chroma sample to be co-located with the top-left luma
        sample in its (2x2, or 2x1, or 1x2) subsampling block
        (this is done for simplicity; we don't expect the exact
         position to make much difference to the output quality)
      * Apply the transformation in luma coordinates
      * Project the resulting luma sample back into chroma coordinates
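The three steps above can be sketched in C as follows, assuming a 6-parameter affine model laid out as x' = m[2]*x + m[3]*y + m[0], y' = m[4]*x + m[5]*y + m[1] in luma pixel units. AffineModel and project_chroma are hypothetical names, and the sketch uses floating point for readability, whereas the real warp filter works in fixed point.

```c
#include <assert.h>

/* Hypothetical affine model in luma pixel units:
 *   x' = m[2] * x + m[3] * y + m[0]
 *   y' = m[4] * x + m[5] * y + m[1]              */
typedef struct {
  double m[6];
} AffineModel;

/* Maps chroma sample (cx, cy) to its warped chroma position. */
static void project_chroma(const AffineModel *am, int ss_x, int ss_y,
                           double cx, double cy, double *out_x,
                           double *out_y) {
  assert(ss_x >= 0 && ss_x <= 1 && ss_y >= 0 && ss_y <= 1);
  /* 1. Chroma -> luma, co-locating the chroma sample with the top-left
   *    luma sample of its subsampling block. */
  const double lx = cx * (1 << ss_x);
  const double ly = cy * (1 << ss_y);
  /* 2. Apply the warp in luma coordinates. */
  const double wx = am->m[2] * lx + am->m[3] * ly + am->m[0];
  const double wy = am->m[4] * lx + am->m[5] * ly + am->m[1];
  /* 3. Luma -> chroma. */
  *out_x = wx / (1 << ss_x);
  *out_y = wy / (1 << ss_y);
}
```

For 4:2:2 (ss_x = 1, ss_y = 0), a pure translation of (4, 2) luma pixels moves a chroma sample by (2, 2) chroma pixels, and both axes are handled by the same three steps regardless of the subsampling.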
      
      The change in software speed is within the noise, but this approach
      should be simpler to implement in hardware, and should slightly
      improve quality for 4:2:2 and 4:4:0 videos.
      
      Change-Id: Idd455fdd3897594ca7d4edff5b85b78961d1638d
  5. 21 Aug, 2017 1 commit
    • Obey do_average flag when doing convolve_round · 07089c68
      Rupert Swarbrick authored
      Doing this means that we don't have to memset temporary buffers to
      zero in reconinter.c, which was taking ~5% of cycles in a short
      encoding test (using perf to attach to a running encode).
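A hypothetical scalar sketch of why obeying do_average removes the memset: the first prediction overwrites the destination and only the second accumulates into it, so the buffer never needs to start at zero. convolve_round_sketch and its parameters are illustrative names, and reading do_average as "add into dst" is our assumption based on the commit's mention of zeroed temporary buffers.

```c
#include <assert.h>
#include <stdint.h>

/* Rounds n intermediate values by `bits` and writes or accumulates them. */
static void convolve_round_sketch(const int32_t *src, int32_t *dst, int n,
                                  int bits, int do_average) {
  assert(bits >= 1);
  for (int i = 0; i < n; ++i) {
    const int32_t res = (src[i] + (1 << (bits - 1))) >> bits;
    if (do_average)
      dst[i] += res; /* second predictor: accumulate onto the first */
    else
      dst[i] = res;  /* first predictor: overwrite, so no memset is needed */
  }
}
```

The destination may hold arbitrary data before the first call, because the do_average == 0 pass never reads it.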
      
      Change-Id: Ibb6e31920000b876c6ee99f454d89c8a97e9fb31
  6. 31 Jul, 2017 1 commit
    • Unified warp_affine and warp_affine_post_round · b6a31753
      Peter de Rivaz authored
      This patch removes the need for a separate warp_affine_post_round
      function by adding the functionality to the warp_affine function.
      
      The encoded output should remain unchanged, but the encoder and decoder
      should run faster, because the SSE2 and SSSE3 warp implementations
      can now be used when post_rounding is in use.
      
      Change-Id: Ide52cae55de59a9da9c27c5793e17390f6d2c03e
  7. 20 Jun, 2017 1 commit
  8. 30 May, 2017 1 commit
    • Tidy up warp filter · facac4f5
      David Barker authored
      * Simplify the C version of the warp filter to make the intent
        of the code clearer
      * Replace saturate_uint() in the C warp filter with an assertion
        that the intermediate values are in-range. This is because they
        should (provably) *never* go out-of-range.
      * Add a comment describing the intended hardware architecture
      * Miscellaneous comment updates
      
      Change-Id: I798736f923ece599f22d573d31c5dfccd18b2d0e
  9. 29 May, 2017 1 commit
  10. 16 May, 2017 1 commit
    • Further speedups to warp filter · 58616eb0
      David Barker authored
      * Calculate sx4, sy4 by truncation instead of rounding
      * Move some repeated calculations out of the filter loop
      
      This is expected to have a roughly neutral effect on BDRATE.
      The speedup of each filter (SSE2, lowbd SSSE3, highbd SSSE3) is
      7-10%, for a total speedup of 14-18% when considered together
      with patches f7a5ee53 and 14b8112b.
      
      Change-Id: I692f649202214c7ab53ecf81f81386f1503e2d20
  11. 15 May, 2017 1 commit
  12. 11 May, 2017 2 commits
    • Extra rounding to let hw use narrower integers. · 14b8112b
      Sean Purser-Haskell authored
      Change-Id: I175d6ff03f31a2e0d2fe7cd1c3852210d6e0ddf5
    • More accurate chroma warping · f7a5ee53
      David Barker authored
      Previously, the projected positions of chroma pixels would effectively
      undergo double rounding, since we round both when calculating x4 / y4
      and when calculating the filter index. Further, the two roundings
      were different: x4 / y4 used ROUND_POWER_OF_TWO_SIGNED, whereas
      the filter index uses ROUND_POWER_OF_TWO.
      
      It is slightly more accurate (and faster) to replace the first
      rounding by a shift; this is motivated by the fact that
      ROUND_POWER_OF_TWO(x >> a, b) == ROUND_POWER_OF_TWO(x, a + b)
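The identity can be checked exhaustively over a range of inputs with a short C program. The ROUND_POWER_OF_TWO definition below (add half, then shift right) is assumed to match libaom's; check_shift_then_round is a hypothetical helper. The identity needs b >= 1, and the negative inputs rely on >> being an arithmetic shift for signed int, which C leaves implementation-defined but which holds on mainstream compilers.

```c
#include <assert.h>

/* Assumed to match libaom's definition: add half, then shift right. */
#define ROUND_POWER_OF_TWO(value, n) (((value) + (((1 << (n)) >> 1))) >> (n))

/* Returns 1 if ROUND_POWER_OF_TWO(x >> a, b) == ROUND_POWER_OF_TWO(x, a + b)
 * holds for every x in [lo, hi], else 0. */
static int check_shift_then_round(int lo, int hi, int a, int b) {
  assert(a >= 0 && b >= 1 && a + b < 31);
  for (int x = lo; x <= hi; ++x) {
    if (ROUND_POWER_OF_TWO(x >> a, b) != ROUND_POWER_OF_TWO(x, a + b))
      return 0;
  }
  return 1;
}
```

For example, with x = 10, a = 2, b = 1, rounding twice gives ROUND_POWER_OF_TWO(ROUND_POWER_OF_TWO(10, 2), 1) = 2, whereas shifting first gives ROUND_POWER_OF_TWO(10 >> 2, 1) = 1, which matches the single rounding ROUND_POWER_OF_TWO(10, 3) = 1 (10 / 8 = 1.25).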
      
      Change-Id: Ia52b05745168d0aeb05f0af4c75ff33eee791d82
  13. 05 May, 2017 1 commit
  14. 04 May, 2017 1 commit
    • Add SSSE3 warp filter + const-ify warp filters · d8a423c6
      David Barker authored
      The SSSE3 filter is very similar to the SSE2 filter, but
      the horizontal pass is sped up by using the 8x8->16
      multiplies added in SSSE3.
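The 8x8->16 multiply referred to here is SSSE3's pmaddubsw (intrinsic _mm_maddubs_epi16), which multiplies unsigned 8-bit values by signed 8-bit values and adds adjacent products into saturated signed 16-bit lanes. The scalar model below (maddubs_model and sat16 are our names) shows the arithmetic that one instruction performs; it assumes that the pixels are unsigned 8-bit and the filter taps fit in signed 8 bits, as such a horizontal pass would require.

```c
#include <stdint.h>

/* Saturates a 32-bit sum to the int16 range. */
static int16_t sat16(int32_t v) {
  if (v > INT16_MAX) return INT16_MAX;
  if (v < INT16_MIN) return INT16_MIN;
  return (int16_t)v;
}

/* Scalar model of pmaddubsw: 16 u8 x s8 products, summed in adjacent
 * pairs into 8 saturated int16 results. */
static void maddubs_model(const uint8_t a[16], const int8_t b[16],
                          int16_t out[8]) {
  for (int i = 0; i < 8; ++i) {
    const int32_t p0 = (int32_t)a[2 * i] * b[2 * i];
    const int32_t p1 = (int32_t)a[2 * i + 1] * b[2 * i + 1];
    out[i] = sat16(p0 + p1);
  }
}
```

Each individual product fits in 16 bits (at most 255 * 127 = 32385 in magnitude for positive taps), so saturation can only occur when the pair is summed.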
      
      Also apply const-correctness to all versions of the filter.
      
      The timings of the existing filters are unchanged, and the
      lowbd SSSE3 filter is ~17% faster than the lowbd SSE2 filter.
      
      Timings per 8x8 block:
      lowbd SSE2: 320ns
      lowbd SSSE3: 273ns
      highbd SSSE3: 300ns
      
      Filter output is unchanged.
      
      Change-Id: Ifb428a33b106d900cde1b080794796c0754ae182
  15. 02 May, 2017 2 commits
  16. 28 Apr, 2017 2 commits
    • Revert "Limit to 192 filters for warp, clamp index since in some cases index 192" · 79362e33
      Debargha Mukherjee authored
      This reverts commit 266db85d.
      
      Reason for revert: Reverting to prevent software slowdown. Will be implemented differently in a separate patch.
      
      Change-Id: I386a9661c87d69e22761e5c01507f2f1f968433f
    • Fix encode/decode mismatch with global/warped motion · b62eef7b
      David Barker authored
      When predicting a 4x4 warp block (either using ZEROMV with
      global-motion, or the WARPED_CAUSAL motion mode with
      warped-motion), the warp filter would previously write
      4 bytes past the right edge of the block.
      
      This caused encode/decode mismatches when encoding with
      multiple threads and tile_cols > 1, since in that case
      we could end up overwriting already-generated pixels from
      the next tile across.
      
      This patch changes the filter so that we only overwrite the
      intended pixels.
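A hypothetical scalar sketch of the fix: the filter computes a full 8-pixel row, but the store writes only the block's own width, so pixels to the right of a 4-wide block (possibly in the next tile) are left untouched. store_block_row is an illustrative name; this is not the libaom code, which operates on SIMD registers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stores one row of a warp block whose filter always produces 8 pixels. */
static void store_block_row(uint8_t *dst, const uint8_t computed[8],
                            int width) {
  assert(width == 4 || width == 8);
  /* Copy exactly `width` bytes; an unconditional 8-byte store would
   * overwrite 4 pixels to the right of a 4-wide block. */
  memcpy(dst, computed, (size_t)width);
}
```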
      
      Change-Id: I3664b44e872e85aa5ccc0a5781f0f9ad994a5b80
  17. 27 Apr, 2017 1 commit
  18. 10 Apr, 2017 1 commit
  19. 06 Apr, 2017 1 commit
  20. 05 Apr, 2017 1 commit
  21. 19 Jan, 2017 2 commits
    • Add correctness tests for the SSE2 warp filter · 838367db
      David Barker authored
      Also rename warp_affine() to av1_warp_affine().
      
      Change-Id: I945baff6be8a1ea942ce88dfcfa5344af6b3a966
    • Optimize SSE2 warp filter · 1b888f2e
      David Barker authored
      Improve the speed of the warp filter itself by ~30%. This leads
      to an overall decoder speedup of 5-20%, depending on bitrate,
      for the global-motion experiment, and a small speedup for
      warped-motion.
      
      Applies a very minor change to the rounding during filter
      selection (ROUND_POWER_OF_TWO makes slightly more sense here
      than ROUND_POWER_OF_TWO_SIGNED, and is faster).
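The two macros differ only on negative values that lie exactly halfway between integers: ROUND_POWER_OF_TWO rounds such ties towards +infinity, while ROUND_POWER_OF_TWO_SIGNED rounds them away from zero. The definitions below are assumed to match libaom's aom_dsp_common.h, and count_rounding_differences is a hypothetical helper; negative inputs assume an arithmetic right shift for signed int. One plausible reading of "faster" is that ROUND_POWER_OF_TWO needs no sign test or negation.

```c
#include <assert.h>

/* Definitions assumed to match libaom's aom_dsp_common.h. */
#define ROUND_POWER_OF_TWO(value, n) (((value) + (((1 << (n)) >> 1))) >> (n))
#define ROUND_POWER_OF_TWO_SIGNED(value, n)           \
  (((value) < 0) ? -ROUND_POWER_OF_TWO(-(value), (n)) \
                 : ROUND_POWER_OF_TWO((value), (n)))

/* Counts the x in [lo, hi] that the two macros round differently. */
static int count_rounding_differences(int lo, int hi, int n) {
  assert(n >= 0 && n < 31);
  int diffs = 0;
  for (int x = lo; x <= hi; ++x)
    if (ROUND_POWER_OF_TWO(x, n) != ROUND_POWER_OF_TWO_SIGNED(x, n)) ++diffs;
  return diffs;
}
```

For instance, -3 / 2 = -1.5 becomes -1 under ROUND_POWER_OF_TWO(-3, 1) but -2 under ROUND_POWER_OF_TWO_SIGNED(-3, 1); for non-negative inputs the two always agree.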
      
      Change-Id: I3f364221d1ec35a8aac0d2c8b0e427f527d12e43
  22. 12 Jan, 2017 1 commit
    • Add SSE2 vectorized warp filter for lowbd · d5dfa96e
      David Barker authored
      End-to-end speed improvements: (measured on tempete_cif.y4m,
      20 frames for encoder and all 260 frames for decoder)
      
      * GLOBAL_MOTION encoder: ~10% faster
      * GLOBAL_MOTION decoder: 100-200% faster depending on bitrate
      * WARPED_MOTION encoder: ~2.5% faster
      * WARPED_MOTION decoder: ~20-40% faster depending on bitrate
      
      The improvement in the GLOBAL_MOTION decoder is particularly
      large because its runtime is dominated by calls to warp_plane().
      
      This introduces minor changes to the output of the warp filter,
      but these should be rare.
      
      Change-Id: I5813ab9e90311e27587045153c32d400b6b9eb92