1. 08 Dec, 2017 10 commits
    • LPF_SB: select filter level and apply for superblock · ebcee0b6
      Cheng Chen authored
      For each superblock, select the best deblocking filter level and apply
      filtering. The filter level is signaled to decoder using a delta based
      scheme.
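
A delta-based scheme of this kind can be sketched as follows. This is only an illustration: the function names are hypothetical, and coding each delta against the previous superblock's level (starting from a frame-level base) is an assumption, not something the commit message states.

```python
def encode_levels(levels, base_level):
    """Turn absolute per-superblock filter levels into deltas.

    Assumption: each delta is taken against the previous superblock's
    level, and the first one against a frame-level base level.
    """
    deltas = []
    prev = base_level
    for level in levels:
        deltas.append(level - prev)
        prev = level
    return deltas


def decode_levels(deltas, base_level):
    """Rebuild the absolute levels by accumulating the deltas."""
    levels = []
    prev = base_level
    for delta in deltas:
        prev += delta
        levels.append(prev)
    return levels


levels = [12, 14, 14, 9]
deltas = encode_levels(levels, base_level=10)
print(deltas)  # [2, 2, 0, -5]
```

Neighbouring superblocks often pick similar levels, so the deltas tend to be small numbers, which is the usual motivation for coding differences rather than absolute values.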
      
      Change-Id: I53e32589cabac9e2a4e580808fdd39ac878fe8c6
    • Remove bands from new-quant profiles · 6b56e99c
      Sarah Parker authored
      Rather than having a set of parameters for each of the 6
      COEF_BANDS, we have 1 for DC and 1 for AC coefficients.
      No change in performance since all of the bands had the
      same parameters.
      
      Change-Id: I3665e7c1b21f117be776f371d87d64b097715735
    • Avoid memset when possible · 8a3d80eb
      Sebastien Alaiwan authored
      Also, reduce scope of one local.
      
      Change-Id: I41cb53528d4b7bc88eb343d8c943ed241230af82
    • Optimize av1_jnt_convolve_2d_copy function · 3f2b57d8
      Cheng Chen authored
      With shift, convolve copy no longer needs 32-bit multiplication of
      two 8-bit numbers. Thus we can implement it with sse2 instead of
      sse4.
      
      Change-Id: I63e8ba414383a24f820bad4a6c607f222ec40ec2
    • Enable single/comp ref mode for all qualified inter frames · 9ad440f5
      Zoe Liu authored
      Change-Id: I72ae23a60f79256b207753c429c3fecf4db6bd38
    • Misc refactors to support 4:1->2:1->1:1 tx splits · 0fa057f8
      Debargha Mukherjee authored
      Currently 4:1 transforms have max 2 split levels:
      4:1 -> 1:1 -> 0.5:0.5.
      
      This refactor enables split levels:
      4:1 -> 2:1 -> 1:1,
      
      by simply changing the tables in common_data.h.
      
      The actual switch will be made in a subsequent patch.
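
The idea of steering the split chain purely through a lookup table can be sketched like this. The tables below are hypothetical stand-ins keyed by (width, height); they are not the actual contents of common_data.h, and 64x16 is used only as an example of a 4:1 size.

```python
# Hypothetical "next smaller transform" tables, keyed by (width, height).
OLD_SUB_TX = {(64, 16): (16, 16), (16, 16): (8, 8)}    # 4:1 -> 1:1 -> 0.5:0.5
NEW_SUB_TX = {(64, 16): (32, 16), (32, 16): (16, 16)}  # 4:1 -> 2:1 -> 1:1


def split_chain(tx_size, table, max_depth=2):
    """Follow the table for at most max_depth split levels."""
    chain = [tx_size]
    for _ in range(max_depth):
        if chain[-1] not in table:
            break
        chain.append(table[chain[-1]])
    return chain


print(split_chain((64, 16), OLD_SUB_TX))  # [(64, 16), (16, 16), (8, 8)]
print(split_chain((64, 16), NEW_SUB_TX))  # [(64, 16), (32, 16), (16, 16)]
```

The traversal code is identical in both cases; only the table passed in changes, which mirrors the claim that the switch needs nothing more than a table update.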
      
      Change-Id: I33f8d9ca5159ba3e7d02ced449ddf6f804a8f12a
    • daala_tx: New flattened 16-point Type-IV DST. · 37131cfd
      Nathan E. Egge authored
      Change-Id: Ic741f269d0bd5e5e295b55f95bfef05050bc31e5
    • no-frame-context-signaling + q-adapt-probs: Fix interaction · 11eac7bf
      David Barker authored
      Slightly change the way we save and reload frame contexts during
      frame setup. For "normal" frames everything is the same, but for
      error-resilient and/or intra-only frames, we now:
      
      * Reset the frame context using setup_past_independence()
        (+ extra code if q-adapt-probs is enabled), as usual
      * Store this frame context into a special slot in cm->frame_contexts
      * Use that slot to fill in cm->pre_fc
      
      The main difference from before is that (for error-resilient/intra-only
      frames which are not key frames) we used to throw away the frame
      context after setting it up, and would re-use whatever was set up
      at the last keyframe.
      This was fine when q_adapt_probs is disabled, but it caused an
      inconsistency when combined with q_adapt_probs. See the attached
      bug report for more details on that.
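
The new flow for error-resilient/intra-only frames can be modelled roughly as below. Everything here is a simplified stand-in: the class, the slot count, the slot index, and reset_context() are hypothetical, and only the names setup_past_independence(), cm->frame_contexts and cm->pre_fc come from the description above.

```python
NUM_REF_SLOTS = 8                 # hypothetical number of regular slots
INDEPENDENT_SLOT = NUM_REF_SLOTS  # hypothetical index of the special slot


class Common:
    """Stand-in for the codec's common state (`cm` in the text)."""

    def __init__(self):
        self.frame_contexts = [None] * (NUM_REF_SLOTS + 1)
        self.fc = None
        self.pre_fc = None


def reset_context(base_q):
    """Stand-in for setup_past_independence() plus the q-adapt-probs step."""
    return {"probs": "defaults", "base_q": base_q}


def setup_frame_context(cm, independent, base_q, ref_slot=0):
    if independent:
        # Error-resilient and/or intra-only frame: reset, store the result
        # in the special slot, and use that slot to fill in pre_fc.
        cm.fc = reset_context(base_q)
        cm.frame_contexts[INDEPENDENT_SLOT] = dict(cm.fc)
        cm.pre_fc = cm.frame_contexts[INDEPENDENT_SLOT]
    else:
        # "Normal" frame: load a previously saved context, as before.
        cm.fc = dict(cm.frame_contexts[ref_slot])
        cm.pre_fc = cm.frame_contexts[ref_slot]


cm = Common()
cm.frame_contexts[0] = reset_context(base_q=20)       # left by a key frame
setup_frame_context(cm, independent=True, base_q=50)  # later intra-only frame
print(cm.pre_fc["base_q"])  # 50
```

In this model pre_fc reflects the current frame's quantizer (50) rather than the one in force at the last key frame (20), which is the kind of mismatch the message describes when q_adapt_probs is enabled.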
      
      BUG=aomedia:1104
      
      Change-Id: I9532b6b0e8ae29efbb4f059a0c67a73d7c7828ce
    • daala_tx: New flattened 32-point Type-II DCT. · b9e16f2f
      Nathan E. Egge authored
      subset-1:
      
      daala_tx@2017-12-07T22:33:52.954Z -> new_dct32@2017-12-07T22:34:37.310Z
      
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -0.0073 | -0.0284 | -0.1499 |  -0.0561 | -0.0128 | -0.0279 |    -0.0386
      
      Change-Id: Ib01f21aa9fc3c95f9d820331b43e70410de99958
    • Constrain hash table access within tile boundary · 3de5353c
      Jingning Han authored
      Limit the prediction residual hash table access to within the same
      tile. This resolves a data race across tiles in multi-threaded
      encoding that caused unstable encoding results.
      
      BUG=aomedia:1088
      
      Change-Id: Ia4a415a0750bd60ee8ac4e56aa1cd39ec99e19c7
  2. 07 Dec, 2017 14 commits
  3. 06 Dec, 2017 9 commits
    • Simplify warped motion parameter estimation · 763ccd8c
      Yunqing Wang authored
      The purpose of this change is to reduce the cycles needed for warped
      motion parameter estimation.
      
      Method 1:
      If we remove the 2-bit bit-depth reduction (as in patch set 2), the
      downshifting of A, Bx, By is also removed. The borg test result (over
      the baseline) is:
                   avg_psnr ovr_psnr  ssim
      lowres:      0.023     0.020    0.071
      cam_lowres: -0.009    -0.017   -0.031
      
      Method 2:
      In theory, the above change uses 2 more bits for elements of A, Bx,
      By. In patch set 3, we modified LS_STEP to be 8 (1 full pixel), and now
      the lowest 2 bits in A, Bx, By elements are always 0. Namely, the 2-bit
      bit-depth reduction is achieved without extra operations. The borg
      test result (over the baseline) is:
                   avg_psnr ovr_psnr  ssim
      lowres:     -0.004    -0.007   -0.023
      cam_lowres: -0.031    -0.033   -0.045
      This is a little better than the patch set 2 result.
      
      Method 2 is the final choice.
      
      Change-Id: I945aaba412e2ea86b7d67e8a90741fdf395b94cd
    • Remove redundant check on single ref for motion mode · 70539b10
      Zoe Liu authored
      Change-Id: Ia8321afd087f99371cdf07f3a03249580e09964d
    • JNT_COMP: Simplify logic on inter-inter comp modes · 5f11e915
      Zoe Liu authored
      This patch simplifies the checking criteria for the two groups of
      compound modes. It also makes the encoder side cdf update inside the
      RD loop consistent with that in the bitstream.
      
      Experimental results on Google test sets (30 frames of lowres and
      midres) confirm this patch obtains identical coding performance.
      
      Change-Id: I170eea91f7d2be2170df544cfc2c692b09aa82d6
    • Fix the comments on the precisions in quantization · 46ae3de1
      Yushin Cho authored
      Fix the comments on the precision of quantizers and tx coefficients
      during a quantization process for different input depth and tx size.
      
      I think the author really meant "de-quantized/de-coded coefficients" by
      "quantized/coded coefficients", so I made this explicit to avoid any
      possible misunderstanding.
      
      Change-Id: Ib92ac7dcfddcbe58cf3adfb9448497512381c1f5
    • Add another if case for convolve_2d_copy_sse2 · 85c29ddc
      Cheng Chen authored
      Load four 8-bit inputs and process them.
      
      Change-Id: I9b3ba58ea3a03c6a8129379afa37c54a57e04501
    • mvref_common.c: reduce scope of locals · 62cc5859
      Sebastien Alaiwan authored
      Also, make them const when appropriate.
      
      Change-Id: I96d544e2cc9a0bce4d52fd33e44a4eaa40edda3c
    • AVX2 implementation for highbd_convolve_2d · 70e7613a
      Maxym Dmytrychenko authored
      Can be more than 10% faster, with bit-exact results.
      
      Change-Id: I5f169673fd2d5af96f425f00d862f3c989228d2e
    • 16x64 and 64x16 transforms: Reuse scan order, eob · 030cea9b
      Urvang Joshi authored
      16x64 reuses scan order of 16x32
      64x16 reuses scan order of 32x16
      
      max eob is curtailed to 512 (instead of 1024) for both.
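
The reuse can be pictured as a small remapping table. The names below are hypothetical and not taken from the codebase. In this sketch the cap of 512 falls out of the borrowed scan's size, since 16 x 32 = 32 x 16 = 512, whereas the transforms themselves hold 16 x 64 = 1024 coefficients.

```python
# Hypothetical mapping from a transform size to the size whose scan order
# it borrows; sizes are (width, height).
SCAN_REUSE = {(16, 64): (16, 32), (64, 16): (32, 16)}


def scan_size(tx_size):
    """Size of the scan actually used for this transform."""
    return SCAN_REUSE.get(tx_size, tx_size)


def max_eob(tx_size):
    """Largest end-of-block position the borrowed scan can address."""
    width, height = scan_size(tx_size)
    return width * height


print(max_eob((16, 64)))  # 512
print(max_eob((64, 16)))  # 512
print(max_eob((16, 32)))  # 512
```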
      
      Change-Id: Iac2145aa5e3d090009e2a2f5715caa8d84dfb2ee
    • Simplify the code path when jnt-comp is off · 6fa05dcf
      Zoe Liu authored
      Change-Id: I17a82393f1b7230119f499e2f9ed8d0b8fe5ba25
  4. 05 Dec, 2017 7 commits
    • [CFL] Disable CfL for 4:1 and 1:4 Partitions · 4d6ea54e
      Luc Trudeau authored
      Moving CfL to using partition unit DC_PRED requires 4:1 and 1:4 DC_PRED,
      which are not currently implemented. A simple solution is to disable CfL
      for 4:1 and 1:4 partitions.
      
      CfL is also disabled for luma intra partitions < 4x4. This is inherent
      to luma intra prediction partition sizes. We add an assert to enforce
      this.
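
The two rules above can be sketched as a single predicate. The function name and signature are hypothetical, and any other conditions the codec places on CfL are deliberately left out.

```python
def is_cfl_allowed(width, height):
    """Return True if CfL may be used for a partition of this size.

    Only the two rules from the description are modelled here.
    """
    # Luma intra partitions are never smaller than 4x4, so this is an
    # invariant to enforce rather than a case to handle.
    assert width >= 4 and height >= 4, "luma intra partition below 4x4"

    # 4:1 and 1:4 partitions have no DC_PRED available, so CfL is disabled.
    if width == 4 * height or height == 4 * width:
        return False
    return True


print(is_cfl_allowed(16, 4))  # False
print(is_cfl_allowed(4, 16))  # False
print(is_cfl_allowed(16, 8))  # True
```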
      
      This results in the following regression on Subset1:
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |   SSIM | MS SSIM | CIEDE 2000
      -0.0093 |  0.1803 |  0.1519 |  -0.0180 | 0.0256 |  0.0226 |     0.0352
      
      https://two.arewecompressedyet.com/?job=CfL%402017-11-30T19%3A05%3A05.639Z&job=CfL-Disable-4to1%402017-11-30T19%3A04%3A00.761Z
      
      Change-Id: Ie2c8b4d9cb6b6f33a103b540209e1a2fb6df74a7
    • Allow txk_sel to turn off optimize_b in rd loop · daccae3c
      Angie Chiang authored
      This is for speeding up the testing process
      
      Change-Id: I90866fa239794f14e4801675d471dbf50b779d18
    • fh < fl --> fh <= fl in od_ec_encode_q15 · f8bf6bba
      Angie Chiang authored
      When lv_map_multi is on,
      od_ec_encode_q15 is able to handle the situation of fh == fl
      
      Change-Id: I7c837dda561f1d25b0203c018763dadd0cbbc75a
    • Convolve copy function for jnt_comp · 3afe49ed
      Cheng Chen authored
      Added a copy function (C version and SSE2 version) for full-pixel motion
      vectors for the jnt_comp experiment, following the existing
      av1_convolve_2d_copy function.
      
      Change-Id: I20fd2219799f9c1451f591574fbe97364f40e0f0
    • Partially revert "nasm defaults to -Ox" · f38fccee
      Johann authored
      The -Ox check is still useful to avoid the version of nasm distributed
      with Apple Xcode.
      
      This reverts commit 29b0c186.
      
      Change-Id: I9237791802267da708c3be8e5a83ca8d71e74afc
    • Add macro to allow different tx sets for 16x16 · cec7ba10
      Sarah Parker authored
      This allows for the following options:
       Set 0:
              Inter: All 16 txfms
              Intra: Discrete Trig transforms w/o flip (4) + Identity (1) +
                     1D Hor/Ver DCT (2)
       Set 1:
              Inter: Discrete Trig transforms w/ flip (9) + Identity (1) +
                     1D Hor/Ver DCT (2)
              Intra: Discrete Trig transforms w/o flip (4) + Identity (1)
       Set 2:
              Inter: Discrete Trig transforms w/ flip (9) + Identity (1)
              Intra: Discrete Trig transforms w/o flip (4) + Identity (1)
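
The three options can be tabulated to compare how many transform types each one searches. The structure and names below are hypothetical; only the counts in parentheses above are taken from the commit message.

```python
# Counts from the option list above.
TRIG_NO_FLIP = 4    # discrete trig transforms without flip
TRIG_WITH_FLIP = 9  # discrete trig transforms with flip
IDENTITY = 1
DCT_1D = 2          # 1D horizontal / vertical DCT
ALL_TXFMS = 16

# Hypothetical table: tx set index -> number of candidate transform types.
TX_SET_COUNTS_16X16 = {
    0: {"inter": ALL_TXFMS,
        "intra": TRIG_NO_FLIP + IDENTITY + DCT_1D},
    1: {"inter": TRIG_WITH_FLIP + IDENTITY + DCT_1D,
        "intra": TRIG_NO_FLIP + IDENTITY},
    2: {"inter": TRIG_WITH_FLIP + IDENTITY,
        "intra": TRIG_NO_FLIP + IDENTITY},
}


def num_tx_types(tx_set, is_inter):
    """Number of transform types available for a 16x16 block."""
    return TX_SET_COUNTS_16X16[tx_set]["inter" if is_inter else "intra"]


for tx_set in (0, 1, 2):
    print(tx_set, num_tx_types(tx_set, True), num_tx_types(tx_set, False))
# 0 16 7
# 1 12 5
# 2 10 5
```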
      
      Results on lowres 40 frames with
      disable-ext-partition disable-ext-partition-types
      
      Set 0: 0.03%
      Set 1: No change
      Set 2: 0.06%
      
      Change-Id: Iec57d8c8fcfa0891528de4ca88f54753dfcb5284
    • Enable encode/decode of OBU streams without IVF · 6c788834
      Cyril Concolato authored
      Change-Id: Ieed4ecce63a2a3b2a74c40ccddabe91cb9386632