1. 26 May, 2017 1 commit
  2. 16 May, 2017 1 commit
      Further speedups to warp filter · 58616eb0
      David Barker authored
      * Calculate sx4, sy4 by truncation instead of rounding
      * Move some repeated calculations out of the filter loop
      
      This is expected to have a roughly neutral effect on BDRATE.
      The speedup of each filter (SSE2, lowbd SSSE3, highbd SSSE3) is
      7-10%, for a total speedup of 14-18% when considered together
      with patches f7a5ee53 and 14b8112b.
      
      Change-Id: I692f649202214c7ab53ecf81f81386f1503e2d20
  3. 15 May, 2017 1 commit
  4. 12 May, 2017 2 commits
  5. 11 May, 2017 2 commits
      Extra rounding to let hw use narrower integers. · 14b8112b
      Sean Purser-Haskell authored
      Change-Id: I175d6ff03f31a2e0d2fe7cd1c3852210d6e0ddf5
      More accurate chroma warping · f7a5ee53
      David Barker authored
      Previously, the projected positions of chroma pixels would effectively
      undergo double rounding, since we round both when calculating x4 / y4
      and when calculating the filter index. Further, the two roundings
      were different: x4 / y4 used ROUND_POWER_OF_TWO_SIGNED, whereas
      the filter index uses ROUND_POWER_OF_TWO.
      
      It is slightly more accurate (and faster) to replace the first
      rounding by a shift; this is motivated by the fact that
      ROUND_POWER_OF_TWO(x >> a, b) == ROUND_POWER_OF_TWO(x, a + b)
      
      Change-Id: Ia52b05745168d0aeb05f0af4c75ff33eee791d82
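The identity quoted in this commit message can be checked numerically. The sketch below restates ROUND_POWER_OF_TWO locally for illustration (the definition is an assumption, not copied from the tree), and the helper names are hypothetical. It also assumes that `>>` on a negative signed value is an arithmetic shift, which is implementation-defined in C but holds on common compilers.

```c
#include <assert.h>
#include <stdint.h>

/* Rounding right shift, restated here for illustration only (n >= 1). */
#define ROUND_POWER_OF_TWO(value, n) (((value) + (1 << ((n) - 1))) >> (n))

/* Truncating shift by a, followed by one rounding shift by b. */
static int32_t shift_then_round(int32_t x, int a, int b) {
  assert(a >= 0 && b >= 1);
  return ROUND_POWER_OF_TWO(x >> a, b);
}

/* A single rounding shift by a + b. */
static int32_t round_once(int32_t x, int a, int b) {
  assert(a >= 0 && b >= 1);
  return ROUND_POWER_OF_TWO(x, a + b);
}

/* The old behaviour in spirit: round by a, then round again by b. */
static int32_t round_twice(int32_t x, int a, int b) {
  assert(a >= 1 && b >= 1);
  return ROUND_POWER_OF_TWO(ROUND_POWER_OF_TWO(x, a), b);
}

/* Returns 1 if shift_then_round and round_once agree on every x in [lo, hi]. */
static int identity_holds(int32_t lo, int32_t hi, int a, int b) {
  for (int32_t x = lo; x <= hi; ++x) {
    if (shift_then_round(x, a, b) != round_once(x, a, b)) return 0;
  }
  return 1;
}
```

For example, with a = b = 1 and x = 1, `round_twice` yields 1 while `round_once` and `shift_then_round` both yield 0, which is the double-rounding discrepancy the commit removes.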
  6. 06 May, 2017 1 commit
  7. 05 May, 2017 2 commits
  8. 04 May, 2017 1 commit
      Add SSSE3 warp filter + const-ify warp filters · d8a423c6
      David Barker authored
      The SSSE3 filter is very similar to the SSE2 filter, but
      the horizontal pass is sped up by using the 8x8->16
      multiplies added in SSSE3.
      
      Also apply const-correctness to all versions of the filter
      
      The timings of the existing filters are unchanged, and the
      lowbd SSSE3 filter is ~17% faster than the lowbd SSE2 filter.
      
      Timings per 8x8 block:
      lowbd SSE2: 320ns
      lowbd SSSE3: 273ns
      highbd SSSE3: 300ns
      
      Filter output is unchanged.
      
      Change-Id: Ifb428a33b106d900cde1b080794796c0754ae182
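The "8x8->16 multiplies added in SSSE3" most likely refers to the PMADDUBSW instruction (`_mm_maddubs_epi16`); the commit message does not name it, so treat that as an assumption. The following scalar model shows what one 16-bit lane of that instruction computes. The helper names and the filter taps used in the usage note are hypothetical and are not the real warp filter.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a 32-bit value into the signed 16-bit range. */
static int16_t saturate_i16(int32_t v) {
  if (v > INT16_MAX) return INT16_MAX;
  if (v < INT16_MIN) return INT16_MIN;
  return (int16_t)v;
}

/* One lane: two unsigned 8-bit pixels times two signed 8-bit
 * coefficients, with the two products added using signed saturation. */
static int16_t maddubs_lane(uint8_t p0, uint8_t p1, int8_t c0, int8_t c1) {
  return saturate_i16((int32_t)p0 * c0 + (int32_t)p1 * c1);
}

/* An 8-tap horizontal filter expressed as four pairwise lanes.
 * The lane results are accumulated in 32 bits in this model. */
static int32_t filter8(const uint8_t px[8], const int8_t coef[8]) {
  assert(px != 0 && coef != 0);
  int32_t sum = 0;
  for (int i = 0; i < 8; i += 2) {
    sum += maddubs_lane(px[i], px[i + 1], coef[i], coef[i + 1]);
  }
  return sum;
}
```

With eight pixels of value 100 and the illustrative taps {-1, 4, -10, 71, 71, -10, 4, -1} (which sum to 128), `filter8` returns 12800. Working on 8-bit inputs lets each multiply-add consume two pixel/coefficient pairs at once, which is where a speedup over 16-bit multiplies would come from.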
  9. 03 May, 2017 4 commits
  10. 02 May, 2017 2 commits
  11. 01 May, 2017 3 commits
      labs() -> llabs() · 321357ee
      Yaowu Xu authored
      llabs() takes an int64_t input parameter, so it fixes the warnings
      about explicit type conversion from int64_t to long.
      
      Change-Id: I2569a5c7e425e3690f5dc7a607bad2539c2324f6
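A minimal sketch of the difference, using a hypothetical helper name. In standard C, `llabs()` is declared in `<stdlib.h>` and takes a `long long`, which is at least 64 bits wide, so an `int64_t` argument converts without loss. `labs()` takes a `long`, which is only 32 bits on some platforms.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Absolute value of a 64-bit quantity without narrowing to long.
 * INT64_MIN is excluded because its magnitude is not representable. */
static int64_t magnitude64(int64_t v) {
  assert(v != INT64_MIN);
  return (int64_t)llabs((long long)v);
}
```

For instance, `magnitude64(INT64_C(-5000000000))` returns 5000000000, a value that does not fit in a 32-bit `long`.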
      Avoid left shift of negative values · cc6bdab7
      Yaowu Xu authored
      Convert shifts of int/int64 into multiplications
      
      Change-Id: I3d7ef400249096a6c3712c46f59c35c3ddfde5ca
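Left-shifting a negative signed value is undefined behaviour in C, whereas multiplying by the corresponding power of two is well defined as long as the result does not overflow. A small sketch of the rewrite, with a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Scale v by 2^n without shifting a possibly negative value.
 * Only the positive constant 1 is shifted. The caller must still
 * ensure that the product fits in int64_t. */
static int64_t scale_pow2(int64_t v, int n) {
  assert(n >= 0 && n < 62);
  return v * ((int64_t)1 << n);
}
```

`scale_pow2(-3, 4)` returns -48, where `-3 << 4` would have been undefined.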
      Turn off SSE2 version of warping temporarily · 1abf447b
      Debargha Mukherjee authored
      Temporarily force C version until the SSE2 version is fixed
      
      Change-Id: I51450068259f998d178b1c681872e59d056b254b
      1abf447b
  12. 28 Apr, 2017 3 commits
      Revert "Limit to 192 filters for warp, clamp index since in some cases index 192" · 79362e33
      Debargha Mukherjee authored
      This reverts commit 266db85d.
      
      Reason for revert: Reverting to prevent software slowdown. Will be implemented differently in a separate patch.
      
      Change-Id: I386a9661c87d69e22761e5c01507f2f1f968433f
      Fix test failures and warnings of WARPED_MOTION · f3e1ead3
      Yue Chen authored
      Properly set the number of projection samples for seg skip blocks
      at the encoder side to clear a unit test failure when both the seg
      feature and warped_motion are on.
      Also clear 'implicit conversions' warnings.
      
      Change-Id: I29e40ffae75880dae2584dbc8772c81321f6d69e
      Fix encode/decode mismatch with global/warped motion · b62eef7b
      David Barker authored
      When predicting a 4x4 warp block (either using ZEROMV with
      global-motion, or the WARPED_CAUSAL motion mode with
      warped-motion), the warp filter would previously write
      4 bytes to the right of the block.
      
      This caused encode/decode mismatches when encoding with
      multiple threads and tile_cols > 1, since in that case
      we could end up overwriting already-generated pixels from
      the next tile across.
      
      This patch changes the filter so that we only overwrite the
      intended pixels.
      
      Change-Id: I3664b44e872e85aa5ccc0a5781f0f9ad994a5b80
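The commit message does not say how the store was restricted, so the following is only a sketch of the general idea under that assumption: the filter computes a full 8-pixel row, but the store writes only as many pixels as the block is wide. The function name and signature are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Write one filtered row to dst. The row always holds 8 computed
 * pixels, but only the first block_w of them are stored, so a 4-wide
 * block never touches the 4 bytes to its right. */
static void store_row(uint8_t *dst, const uint8_t row8[8], int block_w) {
  assert(block_w == 4 || block_w == 8);
  memcpy(dst, row8, (size_t)block_w);
}
```

With `block_w == 4`, bytes 4..7 of the destination keep their previous contents, which is what protects pixels already produced for a neighbouring tile.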
  13. 27 Apr, 2017 1 commit
  14. 26 Apr, 2017 2 commits
  15. 24 Apr, 2017 1 commit
  16. 21 Apr, 2017 1 commit
      Revert "warp_affine_c: Refactor highbd and lowbd versions." · 0d08afdc
      Urvang Joshi authored
      This reverts commit 8cd0e7ef.
      
      Reason for revert:
      This change breaks av1_warp_affine_c when CONFIG_HIGHBITDEPTH is enabled.
      
      In particular, running ./test_libaom --gtest_filter=*Warp* compiled with --enable-warped-motion --enable-highbitdepth shows several test failures, followed by a segmentation fault when it gets up to test SSE2/AV1WarpFilterTest.CheckOutput/4
      
      The tricky part is that the use of the lowbd version of the function depends on a mix of two conditions:
      (1) Compile time check for CONFIG_HIGHBITDEPTH and
      (2) Run time check to see if bit-depth == 8
      So, it is tricky to refactor.
      
      BUG=aomedia:442
      
      Change-Id: I610c537fb65bde4f357185a13081639f906351de
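The two conditions described in this revert can be pictured with the sketch below. The enum and function are hypothetical and only mirror the description in the message; the default value given to CONFIG_HIGHBITDEPTH exists just to make the example self-contained.

```c
#include <assert.h>

/* Assumed default for this standalone example only. */
#ifndef CONFIG_HIGHBITDEPTH
#define CONFIG_HIGHBITDEPTH 1
#endif

typedef enum { WARP_PATH_LOWBD, WARP_PATH_HIGHBD } WarpPath;

/* Select a code path from (1) the compile-time flag and
 * (2) the run-time bit depth. */
static WarpPath choose_warp_path(int bit_depth) {
  assert(bit_depth == 8 || bit_depth == 10 || bit_depth == 12);
#if CONFIG_HIGHBITDEPTH
  /* High bit depth support is compiled in: decide at run time. */
  return bit_depth == 8 ? WARP_PATH_LOWBD : WARP_PATH_HIGHBD;
#else
  /* High bit depth support is compiled out: only lowbd exists. */
  (void)bit_depth;
  return WARP_PATH_LOWBD;
#endif
}
```

Because the choice is split between the preprocessor and a run-time branch, a refactoring that merges the lowbd and highbd functions has to keep both checks consistent in every build configuration.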
  17. 20 Apr, 2017 1 commit
  18. 17 Apr, 2017 1 commit
  19. 13 Apr, 2017 1 commit
      Adds option to use 1/32 subpel precision for gm/wm · 16056f5b
      Debargha Mukherjee authored
      Adds filters for 1/32 subpel precision for warping.
      To use 1/32 subpel precision, set WARPEDPIXEL_PREC_BITS to 5.
      By default, WARPEDPIXEL_PREC_BITS is set as 6 in common/mv.h,
      which uses 1/64 subpel precision.
      
      If 1/32 precision is used, the BDRATE gain is slightly smaller:
      on lowres:
      -1.101% (vs. -1.186% with 1/64) w/warped-motion
      -1.587% (vs. -1.650% with 1/64) w/global-motion
      
      on cam_lowres:
      -2.638% (vs. -2.707% with 1/64) w/warped-motion
      -3.396% (vs. -3.453% with 1/64) w/global-motion
      
      Change-Id: I82fbfddaad9bd9be658fe382401d212833c7ceef
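To make the relationship between the precision bits and the subpel step concrete, here is a small sketch. The helper names are hypothetical, and the fixed-point layout (position stored with exactly `prec_bits` fractional bits) is an assumption made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Number of subpel positions per pixel: 32 for 5 bits, 64 for 6 bits. */
static int subpel_steps(int prec_bits) {
  assert(prec_bits >= 1 && prec_bits <= 16);
  return 1 << prec_bits;
}

/* Fractional part of a non-negative fixed-point position that carries
 * prec_bits fractional bits. */
static int subpel_fraction(int32_t pos, int prec_bits) {
  assert(pos >= 0);
  return (int)(pos & (subpel_steps(prec_bits) - 1));
}
```

Halving the number of positions from 64 to 32 halves the number of distinct subpel filter phases, which is the trade-off against the small BDRATE difference reported above.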
  20. 12 Apr, 2017 1 commit
  21. 11 Apr, 2017 2 commits
  22. 10 Apr, 2017 2 commits
  23. 08 Apr, 2017 1 commit
  24. 07 Apr, 2017 1 commit
  25. 06 Apr, 2017 1 commit
      Prepare for vectorizing highbd warp filter · 2bcf280e
      David Barker authored
      This applies the same refactorings to highbd_warp_plane
      which were applied to warp_plane a while ago, and lays the
      groundwork for the relevant tests.
      
      Change-Id: Ic4c00bce1accc5a3624bba0c3b4b325e69a42c1a
  26. 05 Apr, 2017 1 commit