1. 07 Dec, 2017 12 commits
  2. 06 Dec, 2017 9 commits
    • Simplify warped motion parameter estimation · 763ccd8c
      Yunqing Wang authored
      The purpose of this change is to reduce the cycles needed for warped
      motion parameter estimation.
      Method 1:
      If we remove the 2-bit bit-depth reduction (as in patch set 2), the
      downshifting of A, Bx, and By is also removed. The borg test result
      (over the baseline) is:
                   avg_psnr ovr_psnr  ssim
      lowres:      0.023     0.020    0.071
      cam_lowres: -0.009    -0.017   -0.031
      Method 2:
      In theory, the above change uses 2 more bits for the elements of A, Bx,
      and By. In patch set 3, we modified LS_STEP to be 8 (1 full pixel), so
      the lowest 2 bits of the A, Bx, and By elements are always 0. That is,
      the 2-bit bit-depth reduction is achieved without extra operations. The
      borg test result (over the baseline) is:
                   avg_psnr ovr_psnr  ssim
      lowres:     -0.004    -0.007   -0.023
      cam_lowres: -0.031    -0.033   -0.045
      This is slightly better than the patch set 2 result.
      Method 2 is the final choice.
      Change-Id: I945aaba412e2ea86b7d67e8a90741fdf395b94cd
    • Remove redundant check on single ref for motion mode · 70539b10
      Zoe Liu authored
      Change-Id: Ia8321afd087f99371cdf07f3a03249580e09964d
    • JNT_COMP: Simplify logic on inter-inter comp modes · 5f11e915
      Zoe Liu authored
      This patch simplifies the checking criteria for the two groups of
      compound modes. It also makes the encoder side cdf update inside the
      RD loop consistent with that in the bitstream.
      Experimental results on Google test sets (30 frames of lowres and
      midres) confirm this patch obtains identical coding performance.
      Change-Id: I170eea91f7d2be2170df544cfc2c692b09aa82d6
    • Fix the comments on the precisions in quantization · 46ae3de1
      Yushin Cho authored
      Fix the comments on the precision of quantizers and tx coefficients
      during a quantization process for different input depth and tx size.
      I think the author really meant "de-quantized/de-coded coefficients" by
      "quantized/coded coefficients". So, made it clear to avoid any possible
      Change-Id: Ib92ac7dcfddcbe58cf3adfb9448497512381c1f5
    • Add another if case for convolve_2d_copy_sse2 · 85c29ddc
      Cheng Chen authored
      Load four 8-bit inputs and process them.
      Change-Id: I9b3ba58ea3a03c6a8129379afa37c54a57e04501
    • mvref_common.c: reduce scope of locals · 62cc5859
      Sebastien Alaiwan authored
      Also, make them const when appropriate.
      Change-Id: I96d544e2cc9a0bce4d52fd33e44a4eaa40edda3c
    • AVX2 implementation for highbd_convolve_2d · 70e7613a
      Maxym Dmytrychenko authored
      Can be more than 10% faster, with bit-exact results.
      Change-Id: I5f169673fd2d5af96f425f00d862f3c989228d2e
    • 16x64 and 64x16 transforms: Reuse scan order, eob · 030cea9b
      Urvang Joshi authored
      16x64 reuses scan order of 16x32
      64x16 reuses scan order of 32x16
      max eob is curtailed to 512 (instead of 1024) for both.
      Change-Id: Iac2145aa5e3d090009e2a2f5715caa8d84dfb2ee
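      The reuse described above can be sketched as a lookup from a transform
      size to the size whose scan order it borrows. This is only an
      illustration: the enum, the function names, and the idea of deriving
      the eob cap from the borrowed scan's coefficient count are assumptions,
      not the codec's actual identifiers or code.

```c
/* Hypothetical transform-size tags; the real codec defines its own enum. */
typedef enum {
  SKETCH_TX_16X32,
  SKETCH_TX_32X16,
  SKETCH_TX_16X64,
  SKETCH_TX_64X16
} SketchTxSize;

/* Return the transform size whose scan order is used for `tx`. */
SketchTxSize sketch_scan_size(SketchTxSize tx) {
  switch (tx) {
    case SKETCH_TX_16X64: return SKETCH_TX_16X32; /* 16x64 reuses 16x32 */
    case SKETCH_TX_64X16: return SKETCH_TX_32X16; /* 64x16 reuses 32x16 */
    default: return tx;
  }
}

/* Number of coefficients covered by a scan of the given size. */
int sketch_num_coeffs(SketchTxSize tx) {
  switch (tx) {
    case SKETCH_TX_16X32:
    case SKETCH_TX_32X16: return 16 * 32;
    case SKETCH_TX_16X64:
    case SKETCH_TX_64X16: return 16 * 64;
  }
  return 0;
}

/* Maximum end-of-block position: limited by the scan that is actually used. */
int sketch_max_eob(SketchTxSize tx) {
  return sketch_num_coeffs(sketch_scan_size(tx));
}
```

      With this mapping, sketch_max_eob() yields 512 for both 16x64 and
      64x16, matching the cap stated in the commit message.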
    • Simplify the code path when jnt-comp is off · 6fa05dcf
      Zoe Liu authored
      Change-Id: I17a82393f1b7230119f499e2f9ed8d0b8fe5ba25
  3. 05 Dec, 2017 19 commits
    • [CFL] Disable CfL for 4:1 and 1:4 Partitions · 4d6ea54e
      Luc Trudeau authored
      Moving CfL to using partition unit DC_PRED requires 4:1 and 1:4 DC_PRED,
      which are not currently implemented. A simple solution is to disable CfL
      for 4:1 and 1:4 partitions.
      CfL is also disabled for luma intra partitions < 4x4. This is inherent
      to luma intra prediction partition sizes. We add an assert to enforce
      Resulting in the following regression for Subset1
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |   SSIM | MS SSIM | CIEDE 2000
      -0.0093 |  0.1803 |  0.1519 |  -0.0180 | 0.0256 |  0.0226 |     0.0352
      Change-Id: Ie2c8b4d9cb6b6f33a103b540209e1a2fb6df74a7
    • Allow txk_sel to turn off optimize_b in rd loop · daccae3c
      Angie Chiang authored
      This is for speeding up the testing process
      Change-Id: I90866fa239794f14e4801675d471dbf50b779d18
    • fh < fl --> fh <= fl in od_ec_encode_q15 · f8bf6bba
      Angie Chiang authored
      When lv_map_multi is on,
      od_ec_encode_q15 is able to handle the situation of fh == fl
      Change-Id: I7c837dda561f1d25b0203c018763dadd0cbbc75a
    • Convolve copy function for jnt_comp · 3afe49ed
      Cheng Chen authored
      Added a copy function (c version and sse2 version) for full-pixel motion
      vectors for jnt_comp experiment following existing av1_convolve_2d_copy
      Change-Id: I20fd2219799f9c1451f591574fbe97364f40e0f0
    • Partially revert "nasm defaults to -Ox" · f38fccee
      Johann authored
      The -Ox check is still useful to avoid the version of nasm distributed
      with Apple Xcode.
      This reverts commit 29b0c186.
      Change-Id: I9237791802267da708c3be8e5a83ca8d71e74afc
    • Add macro to allow different tx sets for 16x16 · cec7ba10
      Sarah Parker authored
      This allows for the following options:
       Set 0:
              Inter: All 16 txfms
              Intra: Discrete Trig transforms w/o flip (4) + Identity (1) +
                     1D Hor/Ver DCT (2)
       Set 1:
              Inter: Discrete Trig transforms w/ flip (9) + Identity (1) +
                     1D Hor/Ver DCT (2)
              Intra: Discrete Trig transforms w/o flip (4) + Identity (1)
       Set 2:
              Inter: Discrete Trig transforms w/ flip (9) + Identity (1)
              Intra: Discrete Trig transforms w/o flip (4) + Identity (1)
      Results on lowres 40 frames with
      disable-ext-partition disable-ext-partition-types
      Set 0: 0.03%
      Set 1: No change
      Set 2: 0.06%
      Change-Id: Iec57d8c8fcfa0891528de4ca88f54753dfcb5284
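      To make the three options easier to compare, the per-set transform
      counts can be tallied from the component counts quoted above. The
      struct, enum, and function names are hypothetical and exist only for
      this sketch; the selecting macro itself is not shown in the commit
      message and is not reproduced here.

```c
/* Component counts as quoted in the commit message. */
enum {
  SKETCH_ALL_TXFMS = 16,     /* "All 16 txfms" */
  SKETCH_TRIG_NO_FLIP = 4,   /* discrete trig transforms without flip */
  SKETCH_TRIG_WITH_FLIP = 9, /* discrete trig transforms with flip */
  SKETCH_IDENTITY = 1,
  SKETCH_1D_DCT = 2          /* 1D horizontal and vertical DCT */
};

/* Number of transform types available for inter and intra blocks. */
typedef struct {
  int inter;
  int intra;
} SketchTxSetSize;

/* Tally the transform counts for set 0, 1, or 2; other values give {0, 0}. */
SketchTxSetSize sketch_tx_set_size(int set) {
  SketchTxSetSize s = { 0, 0 };
  switch (set) {
    case 0:
      s.inter = SKETCH_ALL_TXFMS;
      s.intra = SKETCH_TRIG_NO_FLIP + SKETCH_IDENTITY + SKETCH_1D_DCT;
      break;
    case 1:
      s.inter = SKETCH_TRIG_WITH_FLIP + SKETCH_IDENTITY + SKETCH_1D_DCT;
      s.intra = SKETCH_TRIG_NO_FLIP + SKETCH_IDENTITY;
      break;
    case 2:
      s.inter = SKETCH_TRIG_WITH_FLIP + SKETCH_IDENTITY;
      s.intra = SKETCH_TRIG_NO_FLIP + SKETCH_IDENTITY;
      break;
    default:
      break;
  }
  return s;
}
```

      Under these counts, set 0 offers 16 inter and 7 intra transforms, set 1
      offers 12 and 5, and set 2 offers 10 and 5.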
    • Enable encode/decode of OBU streams without IVF · 6c788834
      Cyril Concolato authored
      Change-Id: Ieed4ecce63a2a3b2a74c40ccddabe91cb9386632
    • Zero out half of 16x64 and 64x16 transforms · 60586676
      Debargha Mukherjee authored
      Constrain 16x64 transform so that the bottom 16x32 is zero;
      constrain 64x16 transform so that the right 32x16 is zero;
      Also implement 32x64 transform better to reduce intermediate
      coefficient range.
      Change-Id: Ia9050ee741ed1d5b02a42616635b496d637d932f
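      The constraint on the two transform sizes can be illustrated with a
      small helper that clears the stated half of a coefficient block. This
      is a sketch under assumptions: the function name is hypothetical, and a
      row-major width x height layout is assumed, which may differ from how
      the codec stores coefficients.

```c
#include <stdint.h>
#include <string.h>

/* Clear the half of a coefficient block that the constraint forces to zero.
 * `coeff` is assumed to be row-major with `width` columns and `height` rows.
 * Sizes other than 16x64 and 64x16 are left untouched. */
void sketch_zero_out_half(int32_t *coeff, int width, int height) {
  if (width == 16 && height == 64) {
    /* 16x64: keep the top 16x32, zero the bottom 32 rows. */
    memset(coeff + 32 * width, 0, (size_t)32 * width * sizeof(*coeff));
  } else if (width == 64 && height == 16) {
    /* 64x16: keep the left 32x16, zero the right 32 columns of each row. */
    for (int r = 0; r < height; ++r) {
      memset(coeff + r * width + 32, 0, 32 * sizeof(*coeff));
    }
  }
}
```

      In both cases 512 of the 1024 coefficients remain potentially nonzero.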
    • Change comp_group index context and save sending comp_group · 5a88172c
      Cheng Chen authored
      Extend context model for comp_group_idx.
      Save sending comp_group_idx when masked_compound is not allowed.
      Change-Id: Ia7ae53958c9e1c8fe07be4b14a425d9b8648082d
    • JNT_COMP: change COMPOUND_AVERAGE in cdf · 2ef24ea2
      Cheng Chen authored
      Remove COMPOUND_AVERAGE from compound_type_cdfs since it is now grouped
      to compound_idx. However, COMPOUND_AVERAGE is still used elsewhere.
      Change-Id: Ie0d460aabf9252e80eb4130cfef9aaf0efc3969d
    • JNT_COMP: divide compound modes into two groups · 33a13d9f
      Cheng Chen authored
      Divide compound inter prediction modes into two groups:
      Group A: jnt_comp, compound_average
      Group B: interintra, compound_segment, wedge
      Change-Id: I1142da2e3dfadf382d6b8183a87bde95119cf1b7
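      The grouping listed above amounts to a simple classification of
      compound modes. The sketch below uses hypothetical enum and function
      names; it only mirrors the two lists in the commit message and does not
      reflect how the codec signals or stores the group.

```c
/* Hypothetical tags for the compound modes named in the commit message. */
typedef enum {
  SKETCH_JNT_COMP,
  SKETCH_COMPOUND_AVERAGE,
  SKETCH_INTERINTRA,
  SKETCH_COMPOUND_SEGMENT,
  SKETCH_WEDGE
} SketchCompoundMode;

typedef enum { SKETCH_GROUP_A, SKETCH_GROUP_B } SketchCompoundGroup;

/* Group A: jnt_comp, compound_average.
 * Group B: interintra, compound_segment, wedge. */
SketchCompoundGroup sketch_compound_group(SketchCompoundMode mode) {
  switch (mode) {
    case SKETCH_JNT_COMP:
    case SKETCH_COMPOUND_AVERAGE:
      return SKETCH_GROUP_A;
    default:
      return SKETCH_GROUP_B;
  }
}
```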
    • daala_tx: Add SIMD version of the 16-point DCT · b0191d21
      Timothy B. Terriberry authored
      Change-Id: Ie3e599def556a90c474680567c4537508de2e30a
    • daala_tx: New flattened 4-point Type-IV asym DST. · dc857d1b
      Nathan E. Egge authored
      This 4-point Type-IV asymmetric DST uses the same computation graph as
       the 4-point Type-IV DST.
      This change improves the accuracy of the 8-point Type-II DCT:
      Old MSE: 1.8927096972341813413041010372151e-06
      New MSE: 1.7946367518072710517065436117146e-06
      new_dst4@2017-12-04T06:31:41.096Z -> new_dst4a@2017-12-04T06:32:22.698Z
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -0.0143 |  0.0410 | -0.2166 |  -0.0556 | -0.0379 | -0.0461 |    -0.0002
      Change-Id: Ifde11fca987220130c1657306b0df34ec2f3fe25
    • daala_tx: New flattened 4-point Type-IV DST. · 4644a7d0
      Nathan E. Egge authored
      This change slightly improves the 16-point DCT round trip accuracy due
       to changes in the rounding.
      new_dst2@2017-12-04T01:59:57.412Z -> new_dst4@2017-12-04T06:31:41.096Z
        PSNR | PSNR Cb | PSNR Cr | PSNR HVS |   SSIM | MS SSIM | CIEDE 2000
      0.0078 | -0.0001 |  0.0198 |   0.0432 | 0.0408 |  0.0502 |    -0.0057
      Change-Id: I75783ace97834af89e70c9ce3002c6f09176e343
    • daala_tx: New flattened 2-point Type-IV DST. · ef525df6
      Nathan E. Egge authored
      This 2-point Type-IV DST uses the same computation graph as the
       asymmetric 2-point Type-IV DST.
      Because this transform is embedded, it may be possible to remove the
       initial averaging step by splitting the 2-point Type-IV DST into
       separate forward and inverse transforms.
      This change also reduces two multiplication constants (forward and
       inverse transform) so they are less than 1.
      new_dst2a@2017-12-04T01:59:12.884Z -> new_dst2@2017-12-04T01:59:57.412Z
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -0.0126 |  0.0387 |  0.0441 |   0.0554 | -0.0301 |  0.0034 |    -0.0342
      Change-Id: I98568e0c5b97e3a6af27653ddab845ce97d2a53d
    • daala_tx: New flattened 2-point Type-IV asym DST. · 5b69b199
      Nathan E. Egge authored
      This change improves the accuracy of the 4-point Type-II DCT:
      Old MSE: 6.2711279572488185887270981198199e-08
      New MSE: 6.0281623825882593130347914239103e-08
      It also reduces a multiplication constant so it is less than 1.
      daala_tx@2017-12-04T01:58:11.321Z -> new_dst2a@2017-12-04T01:59:12.884Z
        PSNR | PSNR Cb | PSNR Cr | PSNR HVS |   SSIM | MS SSIM | CIEDE 2000
      0.0274 |  0.0255 |  0.0969 |  -0.0024 | 0.0274 |  0.0027 |     0.0110
      Change-Id: I7c9d389af8e98cb39f3bc5923134b5dfe174ba0a
    • daala_tx: Add 8-wide 32-bit, 16-wide 16-bit SIMD · 695ebacf
      Timothy B. Terriberry authored
      We use the former variant for the 8-point row transforms when the
      number of columns exceeds 8, since the scaling can exceed 16 bits.
      We use the latter variant for the 8-point column transforms when
      the number of rows exceeds 8, since it allows us to perform twice
      as many transforms in parallel.
      Change-Id: Ia2595ad827636342f70c3d5b99cf05c278bd1389
    • daala_tx: Undo manual SIMD multiply expansion · 009946c8
      Timothy B. Terriberry authored
      On x86 there is no PMULHRSD for use in the 32-bit transform
      versions, so the fastest approach is to just do a normal 32-bit
      multiply and manually shift and round. This requires keeping the
      constants in their reduced precision instead of always promoting
      them to Q15.
      Change-Id: I76339b5567da3f08f34882a707e0c93122991946
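      The "multiply, then manually shift and round" step can be written out
      in scalar form as below. This is a sketch, not the SIMD code from the
      commit: the function name is hypothetical, and a 64-bit intermediate is
      used so the scalar C stays well defined, whereas a 32-bit SIMD multiply
      has to keep the product within 32 bits (presumably the reason the
      constants stay at reduced precision rather than Q15).

```c
#include <stdint.h>

/* Multiply `x` by a fixed-point constant `c` that has `q` fractional bits
 * (q >= 1), rounding to nearest by adding half of the divisor before the
 * shift. With q == 15 and 16-bit inputs this is the same arithmetic as the
 * 16-bit PMULHRSW instruction: (x * c + 0x4000) >> 15.
 * Note: right-shifting a negative value is implementation-defined in C;
 * mainstream compilers perform an arithmetic shift. */
int32_t sketch_mul_round(int32_t x, int32_t c, int q) {
  int64_t product = (int64_t)x * c;
  int64_t half = (int64_t)1 << (q - 1);
  return (int32_t)((product + half) >> q);
}
```

      For example, multiplying by 0.5 gives the same result whether the
      constant is stored as 16384 with q = 15 or as 2048 with q = 12, which
      is why keeping a constant at lower precision need not change the
      result when the constant is exactly representable.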
    • daala_tx: Make kernels reg- and word-size agnostic · 170c946e
      Timothy B. Terriberry authored
      This creates the mechanism by which we can define multiple versions
      for different instruction sets and word sizes.
      This commit makes no functional changes.
      Change-Id: If49ebfc989247692df9c501bea05eb811944d52a