1. 12 Feb, 2018 2 commits
    • Add inv txfm2d sse2 for sizes with 4 · 18976fa5
      Peng Bin authored
      Implement av1_lowbd_inv_txfm2d_add_4x4_sse2
      Implement av1_lowbd_inv_txfm2d_add_4x8_sse2
      Implement av1_lowbd_inv_txfm2d_add_8x4_sse2
      Implement av1_lowbd_inv_txfm2d_add_4x16_sse2
      Implement av1_lowbd_inv_txfm2d_add_16x4_sse2
      
      A brief speed test shows that, with the SSE2 functions completed
      by this CL, the speed1 lowbitdepth encoder speeds up by >9% and the
      lowbitdepth decoder by >25%, compared to the highbitdepth
      implementation in the baseline.
      
      Change-Id: I0576a2a146c0b1a7b483c9d35c3d21d979e263cd
    • [NORMATIVE] Unify comp ref context design · 4917295b
      Zoe Liu authored
      This patch uses the neighboring ref counts to design the contexts
      for the coding of the first reference frame of a reference pair for
      the compound prediction. This aligns the context design with that
      for the second reference frame of a reference pair for the
      compound prediction.
      
      The newly designed contexts are much simpler than those in the baseline.
      The number of contexts for each binary symbol is reduced from 5 to
      3. Further, the logic for each context only depends on the collected
      neighboring ref counts, which is straightforward to derive.
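
As a rough illustration of a count-based context with three outcomes, here is a minimal sketch. The function name, the parameter names, and the exact comparison rule are assumptions made for illustration; the commit text only states that each context depends on the collected neighboring ref counts and that there are 3 contexts per binary symbol.

```c
/* Hypothetical helper: derive one of three contexts (0, 1, 2) by comparing
 * two neighboring reference counts. The comparison rule is an assumption. */
int comp_ref_context(int count_a, int count_b) {
  if (count_a == count_b) return 1;   /* neither group dominates */
  return (count_a < count_b) ? 0 : 2; /* one group is seen more often */
}
```

For example, `comp_ref_context(0, 3)` selects context 0, while equal counts select context 1.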
      
      The default CDFs for the first reference frame coding have been
      updated using aom_entropy_optimizer.
      
      Experimental results demonstrate a small coding gain for Google test
      sets of both lowres and midres, with 30 frames coded for the default
      coding tool setup:
      
      lowres: avg_psnr -0.077%; ovr_psnr -0.076%; ssim -0.106%
      midres: avg_psnr -0.059%; ovr_psnr -0.066%; ssim -0.037%
      
      BUG=aomedia:1356
      
      Change-Id: I781abbe4616dc3f3a7213ec663946ff9844eb830
  2. 11 Feb, 2018 7 commits
    • [NORMATIVE] Compound mode ref mv construction · e571f523
      Jingning Han authored
      Re-design the compound mode reference motion vector fetch. Use
      a single run to provide all the compound ref mvs. This saves up to
      two additional ref mv search runs on single reference frames.
      
      Tested on night_720p 50 frames at 800 kbps. The average time cost
      on find_mv_refs calls is reduced by 15% (average 69875 us ->
      60473 us). The overall compression performance change is less than
      0.01%.
      
      BUG=aomedia:1373
      
      Change-Id: I388b9cf36817d10613cd2c9d0bd8865b43324009
    • Refactor the ref_mv code · 7fcd0247
      Yunqing Wang authored
      Continued to refactor the reference MV search code, so that we can
      combine the compound ref mode search later.
      
      Change-Id: I6227a3142e82caa20f2a17a0c76147035eaa2129
    • [NORMATIVE opt-ref-mv] Rework mv fetch from diff ref frames · ff1a35b9
      Jingning Han authored
      This commit re-designs the reference motion vector fetching from
      spatial neighbors whose reference frames differ from that of the
      current coding block. Instead of re-running the VP9-like reference
      motion vector search, it goes through the nearest top row and left
      column only. This process kicks in if and only if the regular
      reference-frame-match-based mv search found fewer than 2 mvs.
      The search through neighboring blocks with different reference
      frames stops once 2 mvs are found.
      
      To decide the reference mvs, it compares the reference frame types
      of the two blocks. If they are on the same side, the neighbor's
      motion vector is re-used directly. Otherwise, its sign is reversed.
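
The re-use rule and the early stop at 2 mvs could be sketched as follows. The `Mv` type, the `is_backward` flags used to represent which side a reference frame is on, and both function names are hypothetical; this is an illustration of the described logic, not the codec's actual code.

```c
#include <stdint.h>

/* Hypothetical motion vector type. */
typedef struct {
  int16_t row;
  int16_t col;
} Mv;

/* Re-use the neighbor's mv if both references are on the same side,
 * otherwise reverse its sign. The "is_backward" flags are an assumed
 * way to encode the side of each reference frame. */
Mv reuse_neighbor_mv(Mv neighbor_mv, int neighbor_is_backward,
                     int current_is_backward) {
  Mv out = neighbor_mv;
  if (neighbor_is_backward != current_is_backward) {
    out.row = (int16_t)(-out.row);
    out.col = (int16_t)(-out.col);
  }
  return out;
}

/* Scan neighbors with different reference frames until 2 mvs are held.
 * "found" is the number of mvs the regular search already produced, so
 * nothing is scanned when it is already 2 or more. Returns the new count. */
int collect_diff_ref_mvs(const Mv *neighbor_mvs,
                         const int *neighbor_is_backward, int num_neighbors,
                         int current_is_backward, Mv *out, int found) {
  for (int i = 0; i < num_neighbors && found < 2; ++i) {
    out[found++] = reuse_neighbor_mv(neighbor_mvs[i], neighbor_is_backward[i],
                                     current_is_backward);
  }
  return found;
}
```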
      
      The compression performance change is within the noise range, at
      0.03% down.
      
      BUG=aomedia:1372
      
      Change-Id: Ib698d7c463f2f42c767f6ca008c8a7c84289df60
    • Add inv txfm2d 64 sse2 · a7ba23f6
      Peng Bin authored
      Implement av1_lowbd_inv_txfm2d_add_32x64_sse2
      Implement av1_lowbd_inv_txfm2d_add_64x32_sse2
      Implement av1_lowbd_inv_txfm2d_add_16x64_sse2
      Implement av1_lowbd_inv_txfm2d_add_64x16_sse2
      
      Change-Id: I1b27618f153583cc787e7bf6ef1616e7c6932990
    • Add inv txfm2d {8x32,32x8} sse2 · 008c6430
      Peng Bin authored
      Implement av1_lowbd_inv_txfm2d_add_8x32_sse2
      Implement av1_lowbd_inv_txfm2d_add_32x8_sse2
      
      Change-Id: Ibd5de72e1d2c4dabba5af020a06e8cfac329dc3d
    • Code refactor of lowbd_inv_txfm2d_add sse2 · abd17171
      Peng Bin authored
      1. Reorder functions to align with TX_SIZE define order.
      2. Merge functions for each TX_SIZE which have very similar code
         into a universal function lowbd_inv_txfm2d_add_internal_sse2.
      3. No speed impact was spotted except size 8x8, so the 8x8 version
         stays unchanged.
      
      Change-Id: Ic896aacd93745906716582af855774807a863231
    • Make av1_decode_tg_tiles_and_wrapup handle highbd · fa455fca
      Jonathan Matthews authored
      BUG=aomedia:1310
      
      Change-Id: Ibfa14836b1f80b54984b9d275f04ff842821cc6c
  3. 10 Feb, 2018 11 commits
    • misc: apply clang-format v5.0.0 · 3c30fb48
      Johann authored
      Change-Id: I4b60db5c43bb443ddd001ccb6601d1a7d825bfa6
    • test: apply clang-format v5.0.0 · f152ff6a
      Johann authored
      Change-Id: Iee91f5f6314c43556791850db19687ccac14c8be
    • av1/[common|decoder]: apply clang-format v5.0.0 · 6b41d4da
      Johann authored
      Change-Id: I86befaf7aa35f3f9b18618db1a27d191c1f7af36
    • av1/encoder: apply clang-format v5.0.0 · b0ef6ff3
      Johann authored
      Change-Id: If88516ac3dcd72b528f4f7e27aab181a5137b285
    • aom_dsp: apply clang-format v5.0.0 · e8c11385
      Johann authored
      Change-Id: I3733c974654712b3ca56f541bb642af9e8cdd504
    • update .clang-format for v5.0.0 · 1597876d
      Johann authored
      Change-Id: If9aababe5b92e8e1f4118cd46fa0eed3f0933175
    • Hook in SSE2 inv txfms {8,16,32} · 2642647c
      Zoe Liu authored
      This CL hooks in the six new SSE2 inv txfm functions implemented by
      binpengsmail@gmail.com.
      
      A brief speed test shows that, using the new SSE2 code for speed1,
      the lowbd encoder speeds up by ~18% and the lowbd decoder by ~13%,
      compared to the highbd implementation in the baseline.
      
      Change-Id: I97769c2f44f7bffd86ffdce097cff3ca633b2644
    • Remove unneeded param in single ref context · 50097b7a
      Zoe Liu authored
      Change-Id: I2684121c30c1b8d982f32d55e6897f44c4257334
    • Inv_txfm unittest cosmetic changes · e193a70d
      Peng Bin authored
      Change-Id: I51df5a53bfa97ce69e6820e67df2cece3c0d5be5
    • Reorganize code to test various convolve options · e820b820
      Debargha Mukherjee authored
      Reorganize code to facilitate setting rounding parameters based
      on bit-depth, and to facilitate testing.
      
      After this patch, this will be the behavior of the config flags as far
      as round_0 and round_1 choices are concerned for 8- and 10-bit:
      
      0. CONFIG_LOWPRECISION_BLEND=0 CONFIG_HIGHPRECISION_INTBUF=0:
      round_0 = 5, round_1 = None (baseline)
      
      1. CONFIG_LOWPRECISION_BLEND=0 CONFIG_HIGHPRECISION_INTBUF=1:
      round_0 = 3, round_1 = None (to test impact of increase in precision
      of intermediate buffer)
      
      2. CONFIG_LOWPRECISION_BLEND=1 CONFIG_HIGHPRECISION_INTBUF=0:
      round_0 = 5, round_1 = 4
      
      3. CONFIG_LOWPRECISION_BLEND=2 CONFIG_HIGHPRECISION_INTBUF=0:
      round_0 = 5, round_1 = 5
      
      4. CONFIG_LOWPRECISION_BLEND=1 CONFIG_HIGHPRECISION_INTBUF=1:
      round_0 = 3, round_1 = 6 (ARM proposal except clipping)
      
      5. CONFIG_LOWPRECISION_BLEND=2 CONFIG_HIGHPRECISION_INTBUF=1:
      round_0 = 3, round_1 = 7 (Google variation proposal)
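
The six combinations listed above can be read as a small lookup table. The sketch below is only a restatement of that list; the `ConvRounding` type, the `ROUND_NONE` marker, and the function name are hypothetical and are not taken from the patch.

```c
/* Rounding choices for one flag combination. */
typedef struct {
  int round_0;
  int round_1;
} ConvRounding;

/* Hypothetical marker standing in for "round_1 = None". */
#define ROUND_NONE (-1)

/* Hypothetical lookup mirroring the list above.
 * lowprecision_blend is 0, 1 or 2; highprecision_intbuf is 0 or 1. */
ConvRounding get_conv_rounding(int lowprecision_blend,
                               int highprecision_intbuf) {
  static const ConvRounding table[3][2] = {
    { { 5, ROUND_NONE }, { 3, ROUND_NONE } }, /* LOWPRECISION_BLEND=0 */
    { { 5, 4 }, { 3, 6 } },                   /* LOWPRECISION_BLEND=1 */
    { { 5, 5 }, { 3, 7 } },                   /* LOWPRECISION_BLEND=2 */
  };
  return table[lowprecision_blend][highprecision_intbuf];
}
```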
      
      Change-Id: I615348332f5692135352085ca923662f9d52f696
    • TXMG cosmetic changes · 5d8e28e1
      Angie Chiang authored
      Change-Id: I344118e58acc6835df929cb7f7451cacf157d55b
  4. 09 Feb, 2018 20 commits
    • Add av1_lowbd_inv_txfm2d_add_{16,32}_sse2 · 3285bc46
      Peng Bin authored
      Implement av1_lowbd_inv_txfm2d_add_32x32_sse2
      Implement av1_lowbd_inv_txfm2d_add_16x32_sse2
      Implement av1_lowbd_inv_txfm2d_add_32x16_sse2
      
      Change-Id: I1b5dc29d0cf75d5d43f4869b729f480f03534ea9
    • [unit-test] Add unit test for intra-edge functions · 1124db22
      Joe Young authored
      av1_upsample_intra_edge_*
      av1_filter_intra_edge_*
      
      BUG=aomedia:1308
      
      Change-Id: I138f275174bd2df21f7480acb629dd85a3a3c44c
    • [NORMATIVE]Constrain mv reference within 64x64 block · 5e19c9da
      Jingning Han authored
      When the coding block size is above 64x64 size, only use the
      top-left 64x64 region to derive the reference motion vectors.
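
One way to picture this constraint is as a clamp on the block dimensions used for the neighbor scan. The helper below is a hypothetical sketch (its name and the use of pixel units are assumptions), not the patch itself.

```c
/* Hypothetical helper: limit a block dimension (in pixels) to 64, so a
 * block larger than 64x64 only scans along its top-left 64x64 region. */
int mv_ref_scan_size(int block_size) {
  return block_size > 64 ? 64 : block_size;
}
```

Under this sketch, a 128x64 block would be scanned as if it were 64x64, while blocks of 64x64 or smaller are unaffected.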
      
      BUG=aomedia:1365
      
      Change-Id: I7a0950168dbc886222697058dee105cf70d1c196
    • [NORMATIVE] mfmv extension border check at 64x64 · d41869a2
      Jingning Han authored
      Only use the mfmv reference within the same 64x64 block region.
      
      BUG=aomedia:1364
      
      Change-Id: Ia7a60cd81cb9ea0e60ae0edcbe40a43d55ebb0f3
    • [NORMATIVE] Account all the newmv modes for compound mode context · 2d17ec66
      Jingning Han authored
      The compound mode context model depends on the number of 8x8
      blocks coded in the newmv mode under opt-ref-mv. This commit makes
      the codec account for all the newmv modes in both single and
      compound settings for that purpose. It only affects behavior under
      opt-ref-mv.
      
      It improves the midres coding performance by 0.08%.
      
      BUG=aomedia:1358
      
      Change-Id: I0899cbb31e0001d958677128bcc94b063b449817
    • Modify reference MV function prototypes · 02efe6a2
      Yunqing Wang authored
      This patch modifies the function prototypes in the reference MV search,
      in preparation for a following change that combines the compound
      reference search and the 2 single reference searches when the
      current block is in compound mode. This patch doesn't cause a
      normative bitstream change.
      
      Change-Id: I0d645983233753861d940b603d13957576ab51fb
    • [Normative txmg]Use 12-bit cos_bit on inverse txfm · 463bd753
      Angie Chiang authored
      BUG=aomedia:1355
      
      Change-Id: If36663a335d1d6af57faf98ba70755af4c3d56ed
    • Fix unittest on av1_lowbd_inv_txfm2d_add_{16,8}x{16,8}_sse2 · 87fc8f98
      Zoe Liu authored
      This CL is from binpengsmail@gmail.com.
      
      It addresses the comments for the CL
      https://aomedia-review.googlesource.com/c/aom/+/45861
      (1) Cosmetic changes;
      (2) Fix a unittest incorrect break.
      
      Change-Id: I6b8d9c26d46117d6c73485157a338226f46f6752
    • Implement av1_lowbd_fwd_txfm2d_32x8_sse2 · 83b10cbe
      Angie Chiang authored
      Change-Id: Iaef8560a0d9862a65216da1de0d3c99a1ac5f40e
    • Fix av1_lowbd_fwd_txfm2d_32x16_sse2 · e783cdcd
      Angie Chiang authored
      Let buf0's length be 32 so that flip_buf_sse2 can work properly.
      This is not currently triggered because there is no flip adst32 txfm.
      
      Change-Id: Ic6c5195d0fc70c5a8cb280cfb466a29cf39e7733
    • Implement av1_lowbd_fwd_txfm2d_8x32_sse2 · bdfbcba4
      Angie Chiang authored
      Change-Id: I4a8680b2671308385fe027db7e03f68ddd622546
    • [NORMATIVE]Fix has_top_right() for 128x* blocks · ea190906
      Hui Su authored
      Before this fix, have_top_right was disabled for the complete right
      half of the sb128.
      
      Borg test results don't show any compression changes. Probably 128x*
      blocks are very rarely chosen for intra modes.
      
      BUG=aomedia:1309
      
      Change-Id: I66a0573c029e7e3d440014842b5d031190d89f89
    • [NORMATIVE jnt_comp] remove double rounding · 53eaf8f2
      Debargha Mukherjee authored
      Change-Id: Ib325e33bee8aa3a8445a7f61c55adfd3fb210792
    • [wedge/compound-segment, normative] Remove more rounding · 7dbb0051
      David Barker authored
      This reduces the overall rounding in the masked blend process -
      the result is now equivalent to having a single round operation
      at the end of the prediction process.
      
      This increases the range of the intermediate values inside
      aom_blend_a64_d32_mask() by 2 bits, but has no effect on the
      ranges of any values outside that function.
      
      Change-Id: I1010ed94c7d8db75bb3d8157c864c5527005725b
    • [wedge/compound-segment, normative] Reduce multiple rounding · d3b99738
      David Barker authored
      As described in the linked bug report, the masked blend operation
      contains multiple stages of rounding. This commit replaces one
      intermediate round with a right shift, which should be slightly
      faster and more accurate.
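
To illustrate why dropping an intermediate round can be more accurate, the sketch below compares three variants on unsigned values: two rounding stages, a plain shift followed by one round, and a single round over the combined shift. The helper names and the bit widths are assumptions chosen for illustration; they do not reproduce the actual blend code.

```c
#include <stdint.h>

/* Round-to-nearest right shift by n bits (n >= 1). */
uint32_t round_shift(uint32_t x, int n) {
  return (x + (1u << (n - 1))) >> n;
}

/* Two rounding stages: round by a bits, then round by b bits. */
uint32_t double_round(uint32_t x, int a, int b) {
  return round_shift(round_shift(x, a), b);
}

/* Plain (truncating) shift by a bits, then one round by b bits. */
uint32_t shift_then_round(uint32_t x, int a, int b) {
  return round_shift(x >> a, b);
}

/* Count inputs below limit where shift_then_round differs from a single
 * round by a + b bits. */
int count_mismatches(uint32_t limit, int a, int b) {
  int mismatches = 0;
  for (uint32_t x = 0; x < limit; ++x) {
    if (shift_then_round(x, a, b) != round_shift(x, a + b)) ++mismatches;
  }
  return mismatches;
}
```

For x = 7 with two 2-bit stages, double rounding gives 1, while both the shift-then-round form and a single 4-bit round give 0. For non-negative inputs the shift-then-round form always matches the single round, so `count_mismatches` returns 0.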
      
      BUG=aomedia:1292
      
      Change-Id: Ib24ce687e628b05d645fbde5306ee552f7ad876b
    • aom_qm_ext: add signaling for separate QM for U/V · f7a12420
      Yaowu Xu authored
      Change-Id: I9879264011f6450bd2eb6648e39e9ad47f13a7d8
    • Skip unnecessary single ref frame motion vector search · dccaf3f2
      Jingning Han authored
      Change the reference motion vector search order to do the compound
      mode first. Only proceed to the single reference frame motion
      vector search when the compound frame type has fewer than 2 motion
      vectors found. Tested on night_720p at 800 kbps for 50 frames, the
      decoding process for the ref mv system is sped up by 2X.
      
      This is a non-normative change under opt-ref-mv flag.
      
      Change-Id: I579f81b156a506aa4481cf8ed85d8b1e54d9e481
    • [NORMATIVE-DECODING, intra-edge] Fix bug in is_smooth() · a883e6ea
      David Barker authored
      Because the mbmi pointer passed into is_smooth comes from the above/left
      block, it might be an inter block. If this happens, we correctly deduce
      that the above/left block does not use a smooth intra mode.
      
      However, inter blocks do not set mbmi->uv_mode, so in the UV case we end up
      reading stale data. This may result in is_smooth() returning the wrong value,
      if (whatever was previously written into) mbmi->uv_mode happens to be a
      smooth intra mode.
      
      Fix this by including an explicit check for inter blocks.
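
A simplified sketch of the fix is shown below. The `BlockInfo` struct and the reduced `PredMode` enum are hypothetical stand-ins for the real mbmi structure and mode enums; only the idea of checking for inter blocks before reading `uv_mode` comes from the description above.

```c
/* Simplified, hypothetical mode list; the real enums are larger. */
typedef enum {
  DC_PRED,
  SMOOTH_PRED,
  SMOOTH_V_PRED,
  SMOOTH_H_PRED,
  OTHER_PRED
} PredMode;

/* Hypothetical stand-in for the above/left block's mode info. */
typedef struct {
  int is_inter;     /* nonzero for inter blocks */
  PredMode mode;    /* luma prediction mode */
  PredMode uv_mode; /* chroma mode; may hold stale data for inter blocks */
} BlockInfo;

int is_smooth_mode(PredMode m) {
  return m == SMOOTH_PRED || m == SMOOTH_V_PRED || m == SMOOTH_H_PRED;
}

/* plane == 0 is luma, anything else is chroma. */
int is_smooth(const BlockInfo *block, int plane) {
  /* Explicit inter check: never read uv_mode from an inter block. */
  if (block->is_inter) return 0;
  return is_smooth_mode(plane == 0 ? block->mode : block->uv_mode);
}
```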
      
      BUG=aomedia:1362
      
      Change-Id: I3ec9faef9b6297e22915176067b5704003bc4664
    • [NORMATIVE-DECODING, txmg]: Adjust IADST4 · 6f81f128
      Dominic Symes authored
      Coefficients used for the VP9 version of IADST4 satisfy the
      mathematical relation sinpi[1]+sinpi[2]==sinpi[4] which is
      a trigonometric property useful for optimizing the four point
      transform. Unfortunately the change in shift used in the latest
      code means that rounding errors are introduced so that this identity
      no longer holds. We think the identity should be restored by
      rounding 4964.563 to 4964 rather than 4965. Similarly the identity
      has been fixed for other bit shifts in the table where it does not
      hold (even though these are not currently used I think).
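
The rounding problem can be reproduced with a short check. The sketch assumes the table entries are round(2*sqrt(2)/3 * sin(k*pi/9) * 2^bits); that scaling is an assumption, chosen because it reproduces the 4964.563 figure quoted above at 13 bits. The function names are hypothetical, and the sine values are hard-coded constants.

```c
/* Assumed scale factor 2*sqrt(2)/3 and sin(k*pi/9) for k = 1, 2, 4. */
static const double kScale = 0.9428090415820634;
static const double kSin1 = 0.3420201433256687;
static const double kSin2 = 0.6427876096865393;
static const double kSin4 = 0.984807753012208;

/* Naively rounded coefficient for k in {1, 2, 4}; returns -1 otherwise. */
int naive_sinpi(int k, int bits) {
  double s;
  if (k == 1) s = kSin1;
  else if (k == 2) s = kSin2;
  else if (k == 4) s = kSin4;
  else return -1;
  return (int)(s * kScale * (double)(1 << bits) + 0.5);
}

/* Does sinpi[1] + sinpi[2] == sinpi[4] hold after naive rounding? */
int identity_holds(int bits) {
  return naive_sinpi(1, bits) + naive_sinpi(2, bits) == naive_sinpi(4, bits);
}
```

Under this assumption the identity holds with naive rounding at 12 bits but fails at 13 bits, where sinpi[2] rounds up to 4965; using 4964 instead restores it.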
      
      This change also fixes a bug with the range checking for IADST4.
      
      We see little change in the statistics from a local AWCY run
      on objective-1-fast, I-frame only:
      
               PSNR  PSNR-HVS SSIM  PSNR-Cr APSNR APSNR-Cr MS-SSIM PSRNR-YUV
      Average  +0.01 +0.01    -0.02 -0.15   +0.01 -0.15    +0.01   +0.01
      
      BUG=aomedia:1360
      
      Change-Id: Icc09a1c59929e58bbc922d4b8b73de4a14104a8e
    • Turn on sse2 simd optimization · 7d8b13ed
      Angie Chiang authored
      Change-Id: Ia72e71a61cc48b97ef9596aaa6526381f9364f1a