1. 12 Feb, 2018 2 commits
    • Add inv txfm2d sse2 for sizes with 4 · 18976fa5
      Peng Bin authored
      Implement av1_lowbd_inv_txfm2d_add_4x4_sse2
      Implement av1_lowbd_inv_txfm2d_add_4x8_sse2
      Implement av1_lowbd_inv_txfm2d_add_8x4_sse2
      Implement av1_lowbd_inv_txfm2d_add_4x16_sse2
      Implement av1_lowbd_inv_txfm2d_add_16x4_sse2
      A brief speed test shows that, with the SSE2 functions completed by
      this CL, the speed1 lowbitdepth encoder is >9% faster and the
      lowbitdepth decoder is >25% faster than the highbitdepth
      implementation in the baseline.
      Change-Id: I0576a2a146c0b1a7b483c9d35c3d21d979e263cd
    • [NORMATIVE] Unify comp ref context design · 4917295b
      Zoe Liu authored
      This patch uses the neighboring ref counts to design the contexts
      for the coding of the first reference frame of a reference pair for
      the compound prediction. This aligns the context design with that
      for the second reference frame of a reference pair for the
      compound prediction.
      The newly designed contexts are much simpler than those in the baseline.
      The number of contexts for each binary symbol is reduced from 5 to
      3. Further, the logic for each context only depends on the collected
      neighboring ref counts, which is straightforward to derive.
      The default CDFs for the first reference frame coding have been
      updated using aom_entropy_optimizer.
      Experimental results demonstrate a small coding gain for Google test
      sets of both lowres and midres, with 30 frames coded for the default
      coding tool setup:
      lowres: avg_psnr -0.077%; ovr_psnr -0.076%; ssim -0.106%
      midres: avg_psnr -0.059%; ovr_psnr -0.066%; ssim -0.037%
      Change-Id: I781abbe4616dc3f3a7213ec663946ff9844eb830
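      A minimal sketch of how a three-valued context could be derived from
      two collected neighboring ref counts. The function name and the
      three-way comparison are assumptions for illustration; the commit
      message only states that each binary symbol uses 3 contexts that
      depend on the neighboring ref counts.

```c
/* Hypothetical helper: maps two neighboring reference counts to one of
 * three contexts. The comparison rule is an illustrative assumption, not
 * taken from the commit itself. */
int ref_count_context(int count_a, int count_b) {
  if (count_a < count_b) return 0; /* second group dominates */
  if (count_a == count_b) return 1; /* no preference among neighbors */
  return 2;                         /* first group dominates */
}
```

      For example, ref_count_context(0, 3) selects context 0, while equal
      counts select context 1.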
  2. 11 Feb, 2018 7 commits
    • [NORMATIVE] Compound mode ref mv construction · e571f523
      Jingning Han authored
      Re-design the compound mode reference motion vector fetch. Use
      a single run to provide all the compound ref mvs. This saves up to
      two additional ref mv search runs on single reference frames.
      Tested on night_720p 50 frames at 800 kbps. The average time cost
      on find_mv_refs calls is reduced by 15% (average 69875 us ->
      60473 us). The overall compression performance change is less than
      Change-Id: I388b9cf36817d10613cd2c9d0bd8865b43324009
    • Refactor the ref_mv code · 7fcd0247
      Yunqing Wang authored
      Continued to refactor the reference MV search code, so that we can
      combine the compound ref mode search later.
      Change-Id: I6227a3142e82caa20f2a17a0c76147035eaa2129
    • [NORMATIVE opt-ref-mv] Rework mv fetch from diff ref frames · ff1a35b9
      Jingning Han authored
      This commit re-designs the reference motion vector fetching from
      spatial neighbors whose reference frames differ from that of the
      current coding block. Instead of re-running the VP9-like reference
      motion vector search, it goes through the nearest top row and left
      column only. This process kicks in if and only if the regular
      reference-frame-match-based mv search didn't find 2 or more mvs.
      The search through neighboring blocks with different reference
      frames stops once 2 mvs are found.
      To decide the reference mvs, it compares the reference frame types
      from the two blocks. If they are on the same side, the motion vector
      is re-used directly. Otherwise, the sign of the motion vector is
      reversed.
      The compression performance change is within the noise range
      (0.03% down).
      Change-Id: Ib698d7c463f2f42c767f6ca008c8a7c84289df60
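      The fallback described above can be sketched roughly as follows. The
      Mv type, the function names, and the encoding of "side" as -1/+1
      (reference before or after the current frame) are assumptions made
      for illustration; the real codec structures are not reproduced here.

```c
#include <stdint.h>

/* Hypothetical motion vector type for this sketch. */
typedef struct {
  int16_t row;
  int16_t col;
} Mv;

/* Re-use the neighbor's mv if both references lie on the same side of the
 * current frame; otherwise reverse its sign. "side" is assumed to be -1 or
 * +1. */
Mv project_neighbor_mv(Mv nb_mv, int nb_side, int cur_side) {
  if (nb_side == cur_side) return nb_mv;
  Mv flipped = { (int16_t)-nb_mv.row, (int16_t)-nb_mv.col };
  return flipped;
}

/* Extends a candidate list that already holds `found` mvs. The scan only
 * runs when fewer than 2 mvs were found, and stops once 2 are collected.
 * Returns the new number of candidates. */
int extend_ref_mv_list(Mv list[2], int found, const Mv *nb_mvs,
                       const int *nb_sides, int nb_count, int cur_side) {
  for (int i = 0; i < nb_count && found < 2; ++i) {
    list[found++] = project_neighbor_mv(nb_mvs[i], nb_sides[i], cur_side);
  }
  return found;
}
```

      In this sketch, nb_mvs would hold the mvs gathered from the nearest
      top row and left column, in scan order.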
    • Add inv txfm2d 64 sse2 · a7ba23f6
      Peng Bin authored
      Implement av1_lowbd_inv_txfm2d_add_32x64_sse2
      Implement av1_lowbd_inv_txfm2d_add_64x32_sse2
      Implement av1_lowbd_inv_txfm2d_add_16x64_sse2
      Implement av1_lowbd_inv_txfm2d_add_64x16_sse2
      Change-Id: I1b27618f153583cc787e7bf6ef1616e7c6932990
    • Add inv txfm2d {8x32,32x8} sse2 · 008c6430
      Peng Bin authored
      Implement av1_lowbd_inv_txfm2d_add_8x32_sse2
      Implement av1_lowbd_inv_txfm2d_add_32x8_sse2
      Change-Id: Ibd5de72e1d2c4dabba5af020a06e8cfac329dc3d
    • Code refactor of lowbd_inv_txfm2d_add sse2 · abd17171
      Peng Bin authored
      1. Reorder functions to align with TX_SIZE define order.
      2. Merge functions for each TX_SIZE which have very similar code
         into a universal function lowbd_inv_txfm2d_add_internal_sse2.
      3. No speed impact was spotted except size 8x8, so the 8x8 version
         stays unchanged.
      Change-Id: Ic896aacd93745906716582af855774807a863231
    • Make av1_decode_tg_tiles_and_wrapup handle highbd · fa455fca
      Jonathan Matthews authored
      Change-Id: Ibfa14836b1f80b54984b9d275f04ff842821cc6c
  3. 10 Feb, 2018 11 commits
    • misc: apply clang-format v5.0.0 · 3c30fb48
      Johann authored
      Change-Id: I4b60db5c43bb443ddd001ccb6601d1a7d825bfa6
    • test: apply clang-format v5.0.0 · f152ff6a
      Johann authored
      Change-Id: Iee91f5f6314c43556791850db19687ccac14c8be
    • av1/[common|decoder]: apply clang-format v5.0.0 · 6b41d4da
      Johann authored
      Change-Id: I86befaf7aa35f3f9b18618db1a27d191c1f7af36
    • av1/encoder: apply clang-format v5.0.0 · b0ef6ff3
      Johann authored
      Change-Id: If88516ac3dcd72b528f4f7e27aab181a5137b285
    • aom_dsp: apply clang-format v5.0.0 · e8c11385
      Johann authored
      Change-Id: I3733c974654712b3ca56f541bb642af9e8cdd504
    • update .clang-format for v5.0.0 · 1597876d
      Johann authored
      Change-Id: If9aababe5b92e8e1f4118cd46fa0eed3f0933175
    • Hook in SSE2 inv txfms {8,16,32} · 2642647c
      Zoe Liu authored
      This CL hooks in the six new SSE2 inv txfm functions implemented by
      A brief speed test shows that, with the new SSE2 code for speed1,
      the lowbd encoder is ~18% faster and the lowbd decoder is ~13%
      faster than the highbd implementation in the baseline.
      Change-Id: I97769c2f44f7bffd86ffdce097cff3ca633b2644
    • Remove unneeded param in single ref context · 50097b7a
      Zoe Liu authored
      Change-Id: I2684121c30c1b8d982f32d55e6897f44c4257334
    • Inv_txfm unittest cosmetic changes · e193a70d
      Peng Bin authored
      Change-Id: I51df5a53bfa97ce69e6820e67df2cece3c0d5be5
    • Reorganize code to test various convolve options · e820b820
      Debargha Mukherjee authored
      Reorganize code to facilitate setting rounding parameters based
      on bit-depth, and to facilitate testing.
      After this patch, this will be the behavior for config flags as far
      as round_0 and round_1 choices are concerned for 8- and 10-bit:
      round_0 = 5, round_1 = None (baseline)
      round_0 = 3, round_1 = None (to test impact of increase in precision
      of intermediate buffer)
      round_0 = 5, round_1 = 4
      round_0 = 5, round_1 = 5
      round_0 = 3, round_1 = 6 (ARM proposal except clipping)
      round_0 = 3, round_1 = 7 (Google variation proposal)
      Change-Id: I615348332f5692135352085ca923662f9d52f696
    • TXMG cosmetic changes · 5d8e28e1
      Angie Chiang authored
      Change-Id: I344118e58acc6835df929cb7f7451cacf157d55b
  4. 09 Feb, 2018 20 commits
    • Add av1_lowbd_inv_txfm2d_add_{16,32}_sse2 · 3285bc46
      Peng Bin authored
      Implement av1_lowbd_inv_txfm2d_add_32x32_sse2
      Implement av1_lowbd_inv_txfm2d_add_16x32_sse2
      Implement av1_lowbd_inv_txfm2d_add_32x16_sse2
      Change-Id: I1b5dc29d0cf75d5d43f4869b729f480f03534ea9
    • [unit-test] Add unit test for intra-edge functions · 1124db22
      Joe Young authored
      Change-Id: I138f275174bd2df21f7480acb629dd85a3a3c44c
    • [NORMATIVE]Constrain mv reference within 64x64 block · 5e19c9da
      Jingning Han authored
      When the coding block is larger than 64x64, only use the
      top-left 64x64 region to derive the reference motion vectors.
      Change-Id: I7a0950168dbc886222697058dee105cf70d1c196
    • [NORMATIVE] mfmv extension border check at 64x64 · d41869a2
      Jingning Han authored
      Only use the mfmv reference within the same 64x64 block region.
      Change-Id: Ia7a60cd81cb9ea0e60ae0edcbe40a43d55ebb0f3
    • [NORMATIVE] Account all the newmv modes for compound mode context · 2d17ec66
      Jingning Han authored
      The compound mode context model depends on the number of 8x8
      blocks coded in the newmv mode under opt-ref-mv. This commit makes
      the codec account for all the newmv modes in both single and
      compound settings for that purpose. It only affects changes under
      It improves the midres coding performance by 0.08%.
      Change-Id: I0899cbb31e0001d958677128bcc94b063b449817
    • Modify reference MV function prototypes · 02efe6a2
      Yunqing Wang authored
      This patch modifies the function prototypes in the reference MV
      search. It prepares for a following change that combines the compound
      reference search and the 2 single reference searches when the
      current block is in compound mode. This patch doesn't cause a
      normative bitstream change.
      Change-Id: I0d645983233753861d940b603d13957576ab51fb
    • [Normative txmg]Use 12-bit cos_bit on inverse txfm · 463bd753
      Angie Chiang authored
      Change-Id: If36663a335d1d6af57faf98ba70755af4c3d56ed
    • Fix unittest on av1_lowbd_inv_txfm2d_add_{16,8}x{16,8}_sse2 · 87fc8f98
      Zoe Liu authored
      This CL is from binpengsmail@gmail.com.
      It addresses the comments for the CL
      (1) Cosmetic changes;
      (2) Fix a unittest incorrect break.
      Change-Id: I6b8d9c26d46117d6c73485157a338226f46f6752
    • Implement av1_lowbd_fwd_txfm2d_32x8_sse2 · 83b10cbe
      Angie Chiang authored
      Change-Id: Iaef8560a0d9862a65216da1de0d3c99a1ac5f40e
    • Fix av1_lowbd_fwd_txfm2d_32x16_sse2 · e783cdcd
      Angie Chiang authored
      Let buf0's length be 32 so that flip_buf_sse2 can work properly.
      This bug is not triggered because there is no flip adst32 txfm.
      Change-Id: Ic6c5195d0fc70c5a8cb280cfb466a29cf39e7733
    • Implement av1_lowbd_fwd_txfm2d_8x32_sse2 · bdfbcba4
      Angie Chiang authored
      Change-Id: I4a8680b2671308385fe027db7e03f68ddd622546
    • [NORMATIVE]Fix has_top_right() for 128x* blocks · ea190906
      Hui Su authored
      Before this fix, have_top_right was disabled for the entire right
      half of the sb128.
      Borg test results don't show any compression changes, probably
      because 128x* blocks are very rarely chosen for intra modes.
      Change-Id: I66a0573c029e7e3d440014842b5d031190d89f89
    • [NORMATIVE jnt_comp] remove double rounding · 53eaf8f2
      Debargha Mukherjee authored
      Change-Id: Ib325e33bee8aa3a8445a7f61c55adfd3fb210792
    • [wedge/compound-segment, normative] Remove more rounding · 7dbb0051
      David Barker authored
      This reduces the overall rounding in the masked blend process -
      the result is now equivalent to having a single round operation
      at the end of the prediction process.
      This increases the range of the intermediate values inside
      aom_blend_a64_d32_mask() by 2 bits, but has no effect on the
      ranges of any values outside that function.
      Change-Id: I1010ed94c7d8db75bb3d8157c864c5527005725b
    • [wedge/compound-segment, normative] Reduce multiple rounding · d3b99738
      David Barker authored
      As described in the linked bug report, the masked blend operation
      contains multiple stages of rounding. This commit replaces one
      intermediate round with a right shift, which should be slightly
      faster and more accurate.
      Change-Id: Ib24ce687e628b05d645fbde5306ee552f7ad876b
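      A small sketch of why replacing an intermediate round with a plain
      right shift can be more accurate. It does not reproduce the actual
      masked blend code; the helper names are hypothetical and the values
      are unsigned with shift amounts of at least 1. For such values,
      truncating first and rounding once at the end matches a single
      rounding over the combined shift, while rounding twice can differ.

```c
#include <stdint.h>

/* Round-to-nearest right shift; bits must be >= 1. */
uint32_t round_shift(uint32_t x, int bits) {
  return (x + (1u << (bits - 1))) >> bits;
}

/* Two rounding stages: the first round can push the value upward before
 * the second round is applied. */
uint32_t double_round(uint32_t x, int a, int b) {
  return round_shift(round_shift(x, a), b);
}

/* Intermediate stage truncates; only the final stage rounds. */
uint32_t shift_then_round(uint32_t x, int a, int b) {
  return round_shift(x >> a, b);
}

/* Reference: one rounding over the combined shift. */
uint32_t single_round(uint32_t x, int a, int b) {
  return round_shift(x, a + b);
}

/* Count inputs below `limit` where double rounding differs from the
 * single-round reference. */
int count_mismatches_double_round(uint32_t limit, int a, int b) {
  int n = 0;
  for (uint32_t x = 0; x < limit; ++x)
    if (double_round(x, a, b) != single_round(x, a, b)) ++n;
  return n;
}

/* Same count for the shift-then-round variant. */
int count_mismatches_shift_then_round(uint32_t limit, int a, int b) {
  int n = 0;
  for (uint32_t x = 0; x < limit; ++x)
    if (shift_then_round(x, a, b) != single_round(x, a, b)) ++n;
  return n;
}
```

      For x = 5 with two 1-bit shifts, double_round gives 2 whereas
      single_round and shift_then_round both give 1 (5/4 = 1.25).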
    • aom_qm_ext: add signaling for separate QM for U/V · f7a12420
      Yaowu Xu authored
      Change-Id: I9879264011f6450bd2eb6648e39e9ad47f13a7d8
    • Skip unnecessary single ref frame motion vector search · dccaf3f2
      Jingning Han authored
      Change the reference motion vector search order to do the compound
      mode first. Only proceed to the single reference frame motion
      vector search when the compound frame type has fewer than 2 motion
      vectors found. Tested on night_720p at 800 kbps for 50 frames, the
      decoding process for the ref mv system is sped up by 2X.
      This is a non-normative change under opt-ref-mv flag.
      Change-Id: I579f81b156a506aa4481cf8ed85d8b1e54d9e481
    • [NORMATIVE-DECODING, intra-edge] Fix bug in is_smooth() · a883e6ea
      David Barker authored
      Because the mbmi pointer passed into is_smooth comes from the above/left
      block, it might be an inter block. If this happens, we correctly deduce
      that the above/left block does not use a smooth intra mode.
      However, inter blocks do not set mbmi->uv_mode, so in the UV case we end up
      reading stale data. This may result in is_smooth() returning the wrong value,
      if (whatever was previously written into) mbmi->uv_mode happens to be a
      smooth intra mode.
      Fix this by including an explicit check for inter blocks.
      Change-Id: I3ec9faef9b6297e22915176067b5704003bc4664
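      The fix can be illustrated with a simplified sketch. The PredMode
      enum and the BlockInfo struct below are hypothetical stand-ins for
      the codec's real types, reduced to the fields this example needs.

```c
/* Hypothetical, reduced set of prediction modes for this sketch. */
typedef enum {
  DC_PRED,
  SMOOTH_PRED,
  SMOOTH_V_PRED,
  SMOOTH_H_PRED,
  NEWMV /* stands in for any inter mode */
} PredMode;

/* Hypothetical stand-in for the above/left block's mode info. */
typedef struct {
  int is_inter;     /* nonzero for inter blocks */
  PredMode mode;    /* luma mode; an inter mode for inter blocks */
  PredMode uv_mode; /* chroma mode; NOT written for inter blocks */
} BlockInfo;

int is_smooth_mode(PredMode m) {
  return m == SMOOTH_PRED || m == SMOOTH_V_PRED || m == SMOOTH_H_PRED;
}

int is_smooth(const BlockInfo *mbmi, int plane) {
  if (plane == 0) {
    /* An inter block's luma mode is never a smooth intra mode. */
    return is_smooth_mode(mbmi->mode);
  }
  /* Explicit inter check: uv_mode may hold stale data for inter blocks. */
  if (mbmi->is_inter) return 0;
  return is_smooth_mode(mbmi->uv_mode);
}
```

      With this check, an inter neighbor whose uv_mode field still holds a
      stale SMOOTH_PRED value is reported as not smooth for the UV planes.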
    • [NORMATIVE-DECODING, txmg]: Adjust IADST4 · 6f81f128
      Dominic Symes authored
      Coefficients used for the VP9 version of IADST4 satisfy the
      mathematical relation sinpi[1]+sinpi[2]==sinpi[4], which is
      a trigonometric property useful for optimizing the four-point
      transform. Unfortunately, the change in shift used in the latest
      code introduces rounding errors, so this identity no longer
      holds. We think the identity should be restored by
      rounding 4964.563 to 4964 rather than 4965. Similarly, the identity
      has been fixed for other bit shifts in the table where it does not
      hold (even though these are not currently used, I think).
      This change also fixes a bug with the range checking for IADST4.
      We see little change on the statistics from a local ACWY run
      on objective-1-fast I-frame only:
      Average  +0.01 +0.01    -0.02 -0.15   +0.01 -0.15    +0.01   +0.01
      Change-Id: Icc09a1c59929e58bbc922d4b8b73de4a14104a8e
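      A minimal check of the identity on integer table entries. Only the
      value 4964.563 and its two roundings come from the message above.
      The neighboring entries 2642 and 7606 used in the usage note are
      assumptions, obtained by applying the same scale to sin(pi/9) and
      sin(4*pi/9); the function name is hypothetical.

```c
#include <stdint.h>

/* Returns 1 if the rounded table entries preserve
 * sinpi[1] + sinpi[2] == sinpi[4], and 0 otherwise. */
int sinpi_identity_holds(int32_t sinpi1, int32_t sinpi2, int32_t sinpi4) {
  return sinpi1 + sinpi2 == sinpi4;
}
```

      Under those assumed entries, sinpi_identity_holds(2642, 4964, 7606)
      holds, while rounding up to 4965 breaks it because 2642 + 4965 is
      7607.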
    • Turn on sse2 simd optimization · 7d8b13ed
      Angie Chiang authored
      Change-Id: Ia72e71a61cc48b97ef9596aaa6526381f9364f1a