1. 13 Jul, 2017 1 commit
    • Speed up convolve_round post-rounding by avx2 · 04cef497
      Yi Luo authored
      - Decoder convolve rounding cycle percentage drops from
        2.75% to 0.91% by using avx2 function on i7-6700.
      
      Change-Id: I34ae48f45c0b4073f8962647d2181365ffe3325b
  2. 07 Jul, 2017 1 commit
    • Signature changes for the LGT experiment · d8b1ddce
      Lester Lu authored
      The input arguments of av1_fht* and av1_iht* functions (and their
      HBD versions) are slightly changed. Input arguments tx_type and
      bd are carried by a struct fwd_txfm_param/inv_txfm_param. This
      struct is meant to later on carry other prediction information,
      such as intra top/left boundaries to the transform level, so
      that the choice of transforms can be more adaptive to the
      prediction mode and local video content.
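
A minimal sketch of the parameter-struct pattern this message describes. Only `tx_type` and `bd` come from the commit message; the struct name, the function name, and the toy clamping logic are hypothetical and are not the actual av1_iht* code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the parameter struct. Only tx_type and bd are
 * taken from the commit message; all other names are made up. */
typedef struct {
  int tx_type; /* which transform to apply */
  int bd;      /* bit depth, e.g. 8, 10 or 12 */
} TxfmParamSketch;

/* Toy "inverse transform and add": adds a residual to the destination and
 * clamps to the range allowed by param->bd. A real transform would also
 * dispatch on param->tx_type. */
static void inv_txfm_add_sketch(const int32_t *residual, uint16_t *dest, int n,
                                const TxfmParamSketch *param) {
  const int32_t max_val = (1 << param->bd) - 1;
  assert(param->bd >= 8 && param->bd <= 12);
  for (int i = 0; i < n; ++i) {
    int32_t v = (int32_t)dest[i] + residual[i];
    if (v < 0) v = 0;
    if (v > max_val) v = max_val;
    dest[i] = (uint16_t)v;
  }
}
```

With this shape, a later field (for example boundary pixels) can be added to the struct without touching the function signature or its callers.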
      
      Change-Id: Ia42544248a51845be64b72855b642ef1fe5910a9
  3. 24 Jun, 2017 1 commit
  4. 20 Jun, 2017 1 commit
  5. 12 Jun, 2017 1 commit
    • Clean up hbd transform code · 30dfa883
      Sarah Parker authored
      Responding to some leftover cosmetic comments from
      2b5cdb1cf87c933331a16cc0221455d0a8c255e1
      
      Change-Id: I42e126593526cedd6675adf35b9c1df78e1ddf54
  6. 09 Jun, 2017 2 commits
    • Vectorize av1_convolve_2d() · 8295c7c7
      David Barker authored
      Includes a test case based on the warp filter tests
      
      Change-Id: I9abea53a088f68bb8a928ebd7cb96b3266a63c13
    • Add 'do_average' to ConvolveParams structure · e64d51a9
      David Barker authored
      The 'ref' member of ConvolveParams currently serves two purposes:
      * To indicate which component of a compound we're currently predicting,
        e.g. for fetching interpolation filters with dual-filter enabled.
      * To determine whether we should average into the destination buffer.
      
      But there are two cases where we want to separate these out:
      * In joint_motion_search, we want to try combining a fixed second
        prediction with various first predictions.
      * When searching masked interinter compounds, we want to predict
        each component separately then try different combinations.
      
      In these cases, we set 'ref' to 0 and use temporary variables to
      make sure we use the correct interpolation filters. But this is
      quite fragile.
      
      This patch separates out the two uses into separate members.
      This allows us to remove some temporary variables, but more
      importantly gives easy fixes to two bugs in
      build_inter_predictors_single_buf (used by rdopt):
      
      * We previously set ref=0 but didn't fix up the interpolation filters
      * For ZERO_ZEROMV modes, the second component would accidentally
        average into the (uninitialized!) second prediction buffer
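
The separation described above can be sketched roughly as follows. The member names `ref` and `do_average` are from the commit message; the struct name, the function, and the rounded-average formula are assumptions for illustration, not the real ConvolveParams code.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch only: 'ref' and 'do_average' are the member names from the commit
 * message; the struct and function names are hypothetical. */
typedef struct {
  int ref;        /* which component of the compound (selects filters) */
  int do_average; /* nonzero: average into dst instead of overwriting */
} ConvolveParamsSketch;

static void store_prediction(const uint8_t *pred, uint8_t *dst, int n,
                             const ConvolveParamsSketch *params) {
  assert(params->ref == 0 || params->ref == 1);
  for (int i = 0; i < n; ++i) {
    if (params->do_average) {
      /* Rounded average with whatever is already in dst. */
      dst[i] = (uint8_t)((dst[i] + pred[i] + 1) >> 1);
    } else {
      /* Plain store: safe even if dst is uninitialized. */
      dst[i] = pred[i];
    }
  }
}
```

Because the two decisions are independent, a caller can predict the second component on its own (`ref = 1`, `do_average = 0`) without reading a destination buffer that has not been written yet.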
      
      BUG=aomedia:577
      BUG=aomedia:584
      BUG=aomedia:595
      
      Change-Id: Ibc31d1ac701a029ea5efaa1197dd402bc4b7af1e
  7. 08 Jun, 2017 1 commit
    • Remove deprecated high-bitdepth functions · 31c66502
      Sarah Parker authored
      This unifies the codepath for high-bitdepth transforms and deletes
      all calls to the old deprecated versions. This required reworking
      the way 1d configurations are combined in order to support rectangular
      transforms.
      
      There is one remaining codepath that calls the deprecated 4x4 hbd
      transform from encoder/encodemb.c. I need to take a closer look
      at what is happening there and will leave that for a followup
      since this change has already gotten so large.
      
      lowres 10 bit: -0.035%
      lowres 12 bit: 0.021%
      
      BUG=aomedia:524
      
      Change-Id: I34cdeaed2461ed7942364147cef10d7d21e3779c
  8. 02 Jun, 2017 1 commit
  9. 30 May, 2017 1 commit
    • Tidy up warp filter · facac4f5
      David Barker authored
      * Simplify the C version of the warp filter to make the intent
        of the code clearer
      * Replace saturate_uint() in the C warp filter with an assertion
        that the intermediate values are in-range. This is because they
        should (provably) *never* go out-of-range.
      * Add a comment describing the intended hardware architecture
      * Miscellaneous comment updates
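
To illustrate the saturate-versus-assert point in the list above, here is a toy 8-tap filter (not the actual warp filter; all names and the tap constraints are assumptions). The taps are non-negative and sum to 1 << 7, so for 8-bit input the accumulated sum provably stays within [0, 255 * 128], and an assertion documents that instead of a runtime clamp.

```c
#include <assert.h>
#include <stdint.h>

#define SK_FILTER_BITS 7
#define SK_TAPS 8

/* Toy 1D filter. With non-negative taps summing to 1 << SK_FILTER_BITS,
 * the sum is bounded by 255 * 128, so no saturation step is needed. */
static uint8_t filter_pixel_sketch(const uint8_t *src, const int16_t *taps) {
  int32_t sum = 0;
  int32_t tap_sum = 0;
  for (int k = 0; k < SK_TAPS; ++k) {
    assert(taps[k] >= 0);
    tap_sum += taps[k];
    sum += taps[k] * src[k];
  }
  assert(tap_sum == (1 << SK_FILTER_BITS));
  /* Check the provable bound instead of clamping. */
  assert(sum >= 0 && sum <= 255 * (1 << SK_FILTER_BITS));
  return (uint8_t)((sum + (1 << (SK_FILTER_BITS - 1))) >> SK_FILTER_BITS);
}
```

The assertions compile away when `NDEBUG` is defined, so release builds pay nothing for the check, while debug builds catch any violation of the bound.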
      
      Change-Id: I798736f923ece599f22d573d31c5dfccd18b2d0e
  10. 29 May, 2017 1 commit
  11. 18 May, 2017 1 commit
    • Refactor hbd txfm configurations to be 1D · eec47e65
      Sarah Parker authored
      The hbd transform configurations were originally written for all possible
      2d transforms. Now that there are many more possible 2d transforms
      due to EXT_TX and RECT_TX, it is simpler to write the cfg for the
      4 1D transform types and compose them to make all new possible transform
      types. This will allow for an easier integration of the identity transform
      for EXT_TX and rectangular transforms for RECT_TX into the current
      hbd transform codepath and facilitate the removal of obsolete transforms.
      This has no impact on performance.
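
A rough sketch of composing a 2D configuration from two 1D configurations. The commit only says there are four 1D transform types; the enum values, struct layouts, and function names below are assumed for illustration.

```c
#include <assert.h>

/* Assumed names for the four 1D transform kinds. */
typedef enum {
  SK_DCT_1D,
  SK_ADST_1D,
  SK_FLIPADST_1D,
  SK_IDTX_1D,
  SK_TX_TYPES_1D
} TxType1DSketch;

typedef struct {
  TxType1DSketch type;
  int size; /* number of points in the 1D transform */
} Txfm1DCfgSketch;

typedef struct {
  Txfm1DCfgSketch col; /* applied down each column */
  Txfm1DCfgSketch row; /* applied along each row */
} Txfm2DCfgSketch;

/* Builds a 2D config for a width x height block from two 1D types.
 * A column has 'height' entries and a row has 'width' entries, so
 * rectangular blocks simply get two different 1D sizes. */
static Txfm2DCfgSketch compose_2d_cfg(TxType1DSketch col_type,
                                      TxType1DSketch row_type, int width,
                                      int height) {
  Txfm2DCfgSketch cfg;
  assert(col_type < SK_TX_TYPES_1D && row_type < SK_TX_TYPES_1D);
  assert(width > 0 && height > 0);
  cfg.col.type = col_type;
  cfg.col.size = height;
  cfg.row.type = row_type;
  cfg.row.size = width;
  return cfg;
}

/* Number of 2D type combinations reachable from the 1D kinds. */
static int count_2d_types(void) { return SK_TX_TYPES_1D * SK_TX_TYPES_1D; }
```

Four 1D descriptions cover sixteen 2D type pairs per block size, which is the saving the message alludes to.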
      
      BUG=aomedia:524
      
      Change-Id: I1e217bcd217fd637b1df94fae62d9c59a0523c1a
  12. 16 May, 2017 2 commits
    • Further speedups to warp filter · 58616eb0
      David Barker authored
      * Calculate sx4, sy4 by truncation instead of rounding
      * Move some repeated calculations out of the filter loop
      
      This is expected to have a roughly neutral effect on BDRATE.
      The speedup of each filter (SSE2, lowbd SSSE3, highbd SSSE3) is
      7-10%, for a total speedup of 14-18% when considered together
      with patches f7a5ee53 and 14b8112b.
      
      Change-Id: I692f649202214c7ab53ecf81f81386f1503e2d20
    • half_btf_avx2: correct fn sig for visual studio · 52b14161
      James Zern authored
      fixes:
      formal parameter with __declspec(align('32')) won't be aligned
      
      this is the same change that was made previously for sse4:
      5bedd5dc idct16x16_sse4_1: correct fn sig for visual studio
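
The message does not show the new signature. A common way to avoid this Visual Studio error is to pass over-aligned values by pointer instead of by value; the sketch below illustrates that with a portable C11 stand-in for a 32-byte-aligned vector type. `Vec8Sketch` and `half_btf_sketch` are hypothetical names, and the arithmetic is a scalar approximation for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-in for a 32-byte-aligned vector type such as __m256i. */
typedef struct {
  _Alignas(32) int32_t lane[8];
} Vec8Sketch;

/* Taking the operands by const pointer means no over-aligned object has
 * to be placed in a by-value parameter slot. */
static void half_btf_sketch(const Vec8Sketch *w0, const Vec8Sketch *n0,
                            const Vec8Sketch *w1, const Vec8Sketch *n1,
                            int bit, Vec8Sketch *out) {
  assert(bit >= 1 && bit < 31);
  for (int i = 0; i < 8; ++i) {
    const int32_t sum = w0->lane[i] * n0->lane[i] + w1->lane[i] * n1->lane[i];
    /* Round-to-nearest shift; inputs are assumed non-negative here. */
    out->lane[i] = (sum + (1 << (bit - 1))) >> bit;
  }
}
```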
      
      Change-Id: Ib520bde439b03f81d5e84a2711ed61215debe862
  13. 15 May, 2017 3 commits
  14. 13 May, 2017 2 commits
  15. 12 May, 2017 1 commit
  16. 11 May, 2017 3 commits
    • Extra rounding to let hw to use narrower integers. · 14b8112b
      Sean Purser-Haskell authored
      Change-Id: I175d6ff03f31a2e0d2fe7cd1c3852210d6e0ddf5
    • More accurate chroma warping · f7a5ee53
      David Barker authored
      Previously, the projected positions of chroma pixels would effectively
      undergo double rounding, since we round both when calculating x4 / y4
      and when calculating the filter index. Further, the two roundings
      were different: x4 / y4 used ROUND_POWER_OF_TWO_SIGNED, whereas
      the filter index uses ROUND_POWER_OF_TWO.
      
      It is slightly more accurate (and faster) to replace the first
      rounding by a shift; this is motivated by the fact that
      ROUND_POWER_OF_TWO(x >> a, b) == ROUND_POWER_OF_TWO(x, a + b)
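
The identity above can be checked by brute force for non-negative x. The macro below is an assumed definition of round-to-nearest shifting, written out so the example is self-contained; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed definition: round-to-nearest right shift for non-negative values. */
#define SK_ROUND_POWER_OF_TWO(value, n) (((value) + (1 << ((n) - 1))) >> (n))

/* Returns 1 if ROUND(x >> a, b) == ROUND(x, a + b) for every x in
 * [0, limit), i.e. truncating first loses nothing. */
static int shift_then_round_matches(int32_t limit, int a, int b) {
  assert(a >= 0 && b >= 1 && a + b < 30);
  for (int32_t x = 0; x < limit; ++x) {
    if (SK_ROUND_POWER_OF_TWO(x >> a, b) != SK_ROUND_POWER_OF_TWO(x, a + b)) {
      return 0;
    }
  }
  return 1;
}

/* For contrast: rounding twice can differ from rounding once. */
static int double_rounding_differs(int32_t x, int a, int b) {
  assert(x >= 0 && a >= 1 && b >= 1 && a + b < 30);
  return SK_ROUND_POWER_OF_TWO(SK_ROUND_POWER_OF_TWO(x, a), b) !=
         SK_ROUND_POWER_OF_TWO(x, a + b);
}
```

For example, with x = 1 and a = b = 1, rounding twice gives 1 while a single rounding by 2 bits gives 0, which is the double-rounding error the message refers to.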
      
      Change-Id: Ia52b05745168d0aeb05f0af4c75ff33eee791d82
    • Partial IDCT 32x32 avx2 · 40f22ef8
      Yi Luo authored
      - Function level improvement (ms):
      Functions       ssse3  avx2   Reduction
      idct32x32_1024  794    374    52.9%
      idct32x32_135   354    169    52.2%
      idct32x32_34    197    142    27.9%
      idct32x32_1     n/a     26    n/a
      
      - Integrating in default scan order.
      
      Change-Id: I84815112b26b8a8cb800281a1cfb1706342af57d
  17. 08 May, 2017 1 commit
    • Partial IDCT 16x16 avx2 · f6176abb
      Yi Luo authored
      - Function level improvement:
      functions      sse2  avx2  reduction
      idct16x16_256  365   226   38%
      idct16x16_38   n/a   136   n/a
      idct16x16_10   171   110   35%
      idct16x16_1     34    26   23%
      
      - Integrated in AV1 for default scan order.
      
      Change-Id: Ieb1a8e730bea9c371ebc0e5f4a748640d8f5e921
  18. 05 May, 2017 2 commits
  19. 04 May, 2017 2 commits
    • Add SSSE3 warp filter + const-ify warp filters · d8a423c6
      David Barker authored
      The SSSE3 filter is very similar to the SSE2 filter, but
      the horizontal pass is sped up by using the 8x8->16
      multiplies added in SSSE3.
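
The 8x8->16 multiply mentioned here is presumably the SSSE3 PMADDUBSW instruction (intrinsic `_mm_maddubs_epi16`); that mapping is an assumption, since the message does not name it. Below is a scalar model of one 16-bit output lane, plus a hypothetical 8-tap helper built from four such lanes.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one 16-bit lane of PMADDUBSW: two unsigned 8-bit values
 * are multiplied by two signed 8-bit values, and the two products are
 * added with signed 16-bit saturation. */
static int16_t maddubs_lane_model(uint8_t a0, int8_t b0, uint8_t a1,
                                  int8_t b1) {
  int32_t sum = (int32_t)a0 * b0 + (int32_t)a1 * b1;
  if (sum > INT16_MAX) sum = INT16_MAX;
  if (sum < INT16_MIN) sum = INT16_MIN;
  return (int16_t)sum;
}

/* Hypothetical 8-tap horizontal filter for one pixel: four lanes, each
 * covering two adjacent taps, summed into a 32-bit result. */
static int32_t filter8_model(const uint8_t *px, const int8_t *taps) {
  int32_t total = 0;
  assert(px && taps);
  for (int k = 0; k < 8; k += 2) {
    total += maddubs_lane_model(px[k], taps[k], px[k + 1], taps[k + 1]);
  }
  return total;
}
```

Because unsigned pixels and signed 8-bit taps are multiplied directly, the pixels do not need to be widened to 16 bits first, which is one plausible source of the horizontal-pass speedup.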
      
      Also apply const-correctness to all versions of the filter
      
      The timings of the existing filters are unchanged, and the
      lowbd SSSE3 filter is ~17% faster than the lowbd SSE2 filter.
      
      Timings per 8x8 block:
      lowbd SSE2: 320ns
      lowbd SSSE3: 273ns
      highbd SSSE3: 300ns
      
      Filter output is unchanged.
      
      Change-Id: Ifb428a33b106d900cde1b080794796c0754ae182
    • Change to use unaligned load · ebaf8094
      Yaowu Xu authored
      BUG=aomedia:496
      
      Change-Id: Ib49a34233b538c7543425acab305e9bc4ffcfea0
  20. 02 May, 2017 2 commits
  21. 28 Apr, 2017 2 commits
    • Revert "Limit to 192 filters for warp, clamp index since in some cases index 192" · 79362e33
      Debargha Mukherjee authored
      This reverts commit 266db85d.
      
      Reason for revert: Reverting to prevent software slowdown. Will be implemented differently in a separate patch.
      
      Change-Id: I386a9661c87d69e22761e5c01507f2f1f968433f
    • Fix encode/decode mismatch with global/warped motion · b62eef7b
      David Barker authored
      When predicting a 4x4 warp block (either using ZEROMV with
      global-motion, or the WARPED_CAUSAL motion mode with
      warped-motion), the warp filter would previously write
      4 bytes to the right of the block.
      
      This caused encode/decode mismatches when encoding with
      multiple threads and tile_cols > 1, since in that case
      we could end up overwriting already-generated pixels from
      the next tile across.
      
      This patch changes the filter so that we only overwrite the
      intended pixels.
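
A minimal sketch of the kind of fix described, assuming the filter computes 8 output pixels per row and the store step is what overran the block. The function name and the 8-wide row buffer are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 'row8' holds 8 computed pixels; only 'block_width' of them belong to
 * the block, so only that many are written to dst. */
static void store_row_sketch(uint8_t *dst, const uint8_t *row8,
                             int block_width) {
  assert(block_width == 4 || block_width == 8);
  memcpy(dst, row8, (size_t)block_width);
}
```

For a 4-wide block the four bytes to the right of the block are left untouched, so pixels owned by a neighbouring tile cannot be clobbered by another thread.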
      
      Change-Id: I3664b44e872e85aa5ccc0a5781f0f9ad994a5b80
  22. 27 Apr, 2017 1 commit
  23. 26 Apr, 2017 1 commit
    • inline -> INLINE · c1b17f3b
      James Zern authored
      inline is undefined in visual studio 2013 for C
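
A sketch of how a portability macro like this is typically defined. How the project actually defines `INLINE` is not shown in this log, so the definition below is an assumption; `clamp_int_sketch` is a hypothetical example function.

```c
#include <assert.h>

/* Assumed definition: older Visual Studio C compilers accept __inline
 * but not the C99 'inline' keyword. */
#if defined(_MSC_VER)
#define INLINE __inline
#else
#define INLINE inline
#endif

static INLINE int clamp_int_sketch(int value, int low, int high) {
  assert(low <= high);
  return value < low ? low : (value > high ? high : value);
}
```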
      
      Change-Id: I85adb3968e4a98e2d7909cc42e955b1447fcfa26
  24. 25 Apr, 2017 1 commit
  25. 24 Apr, 2017 1 commit
  26. 21 Apr, 2017 1 commit
  27. 20 Apr, 2017 1 commit
  28. 14 Apr, 2017 2 commits