  1. 31 Jan, 2018 7 commits
    • Conditionally skip transform block partition search · eb8f5e87
      Jingning Han authored
      Speed up recursive transform block partition search. When a
      transform block is coded with all-zero coefficients, skip the
      search over further split partitions.

      With txk-sel enabled, this makes both speed 0 and speed 1 10-15%
      faster in the medium-to-high target bit-rate range. The coding
      performance change is neutral: 0.011% better on the lowres set.
      
      Change-Id: I1247f3d5a33d15bf4bc5f0bcbac2bf1f3e1aca2e
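
The early-termination rule described in this commit can be sketched roughly as follows. This is a minimal illustration, not the libaom implementation: `BlockEval`, `EvalFn`, `search_tx_partition`, and the two toy evaluators are hypothetical names, and the cost model is made up. It only shows the control flow: if the unsplit block is already all zero, the four-way split is never searched.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result of evaluating one transform block without splitting. */
typedef struct {
  int64_t rd_cost; /* rate-distortion cost of coding the block whole */
  int all_zero;    /* 1 if every quantized coefficient is zero */
} BlockEval;

/* Hypothetical callback: evaluates the block at (row, col) of a given size. */
typedef BlockEval (*EvalFn)(int row, int col, int size, void *ctx);

/* Returns the best cost for the block. Recurses into four quadrants unless
 * the unsplit block is all zero or the minimum transform size is reached. */
static int64_t search_tx_partition(int row, int col, int size, int min_size,
                                   EvalFn eval, void *ctx) {
  assert(min_size > 0 && size >= min_size);
  const BlockEval whole = eval(row, col, size, ctx);
  if (whole.all_zero || size == min_size) return whole.rd_cost;

  const int half = size / 2;
  int64_t split_cost = 0;
  for (int i = 0; i < 4; ++i) {
    split_cost += search_tx_partition(row + (i >> 1) * half,
                                      col + (i & 1) * half, half, min_size,
                                      eval, ctx);
  }
  return split_cost < whole.rd_cost ? split_cost : whole.rd_cost;
}

/* Toy evaluator: every block is all zero; ctx counts evaluations. */
static BlockEval eval_all_zero(int row, int col, int size, void *ctx) {
  (void)row; (void)col; (void)size;
  ++*(int *)ctx;
  return (BlockEval){ .rd_cost = 100, .all_zero = 0 + 1 };
}

/* Toy evaluator: no block is all zero; cost equals the block size. */
static BlockEval eval_nonzero(int row, int col, int size, void *ctx) {
  (void)row; (void)col;
  ++*(int *)ctx;
  return (BlockEval){ .rd_cost = size, .all_zero = 0 };
}
```

With the all-zero evaluator, a 64x64 block with a 16x16 minimum is evaluated once; with the non-zero evaluator the same search visits 1 + 4 + 16 blocks, which is the work the early exit avoids.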
    • dependent-horztilegroups: Fix decoder crash · 13025199
      David Barker authored
      The tg_horz_boundary flag should always be 0 for the topmost
      tile row, even when dependent-horztilegroups is enabled.
      Otherwise, we end up trying to fetch data off the top of the
      frame, which results in segfaults.
      
      BUG=aomedia:1252
      
      Change-Id: I7caaa2b38a21c05ffb13b6c72f41f8f6e1982b69
    • Add aom_comp_mask_<upsampled>pred_ssse3 · 33ba1fe5
      Peng Bin authored
      1) Encoder speed: overall ~1% faster with no impact on coding performance.
      2) aom_comp_mask_pred_ssse3 is 3.5x - 6x faster than aom_comp_mask_pred_c.
      3) aom_comp_mask_upsampled_pred_ssse3 is 1.5x - 3x faster than
      aom_comp_mask_upsampled_pred_c. For the special case where subpel_x ==
      subpel_y == 0, the optimized version achieves a 4x - 7x speedup.

      Unit tests for both functions have been added.
      
      Change-Id: Ib498317975e0dbd9cdcf61be327b640dfac9a7e5
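
For readers unfamiliar with masked compound prediction, the scalar operation that such SIMD code accelerates can be sketched as a per-pixel weighted blend of two predictors. The function below is a hypothetical sketch, not the signature of aom_comp_mask_pred_c; it assumes contiguous buffers, mask weights in the range 0..64, and round-to-nearest.

```c
#include <assert.h>
#include <stdint.h>

/* Blend two 8-bit predictions with a per-pixel mask.
 * Assumption: mask weights are in [0, 64]; 64 selects pred0 entirely,
 * 0 selects pred1 entirely. Buffers are contiguous (stride == width). */
static void comp_mask_pred_sketch(uint8_t *dst, const uint8_t *pred0,
                                  const uint8_t *pred1, const uint8_t *mask,
                                  int width, int height) {
  for (int i = 0; i < width * height; ++i) {
    const int m = mask[i];
    assert(m <= 64);
    /* Weighted sum with 6-bit precision; +32 rounds to nearest. */
    dst[i] = (uint8_t)((m * pred0[i] + (64 - m) * pred1[i] + 32) >> 6);
  }
}
```

Each output pixel depends only on the co-located inputs, which is why this kind of loop vectorizes well.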
    • Remove frame counts in decoding coefs area · 1694a4ff
      Yunqing Wang authored
      Continue removing count accumulation in the decoder to speed up decoding.
      
      Change-Id: I9e3b874bfc5f750297070235bdfc4d71526ed665
    • Remove frame counts in decoder · e62feb65
      Yunqing Wang authored
      On the decoder side, frame count accumulation still exists. This
      patch removes part of it; more patches will follow. This should speed
      up the decoder.
      
      This doesn't change the encoder side since the counts are useful in
      some encoder optimizations.
      
      Change-Id: I91a021859f8d35e46618ea9232083e72a06431c8
    • txk-sel: support the fast tx type search feature · 12049df7
      Hui Su authored
      Change-Id: Ib6b07f76dd702c40841c88457ca9d96083157354
    • Fix a command line help comment · bada8230
      Yaowu Xu authored
      BUG=aomedia:1283
      
      Change-Id: I9b200d8cfb3ffcdd2fb1cece6c54a0f600d37a87
  2. 30 Jan, 2018 13 commits
  3. 29 Jan, 2018 6 commits
  4. 28 Jan, 2018 3 commits
    • [CFL] Independent search termination for plane and sign · 2fae28b2
      David Michael Barr authored
      Stop if fewer than half of the iterations give an improvement.
      
      Minor metric changes for a 2.5x speed up of the alpha search.
      
      Results on subset1:
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -0.0038 |  0.0466 |  0.1388 |  -0.0103 | -0.0312 | -0.0220 |     0.0330
      
      Change-Id: Ic25a995eee500ffc4b80b73635baf0a710954dc0
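
The stopping rule named in this commit can be illustrated with a generic candidate scan. This is only a sketch of the rule "stop once fewer than half of the iterations so far improved the best cost"; the function name, the flat cost array, and the way iterations are counted are assumptions, and the actual CFL alpha search in libaom is structured differently.

```c
#include <assert.h>
#include <stdint.h>

/* Scans candidates 1..n-1 against the baseline at index 0 and returns the
 * index of the best cost seen. The scan stops as soon as fewer than half
 * of the iterations performed so far produced an improvement.
 * *evaluated receives the number of iterations actually performed. */
static int search_with_early_stop(const int64_t *costs, int n,
                                  int *evaluated) {
  assert(n > 0);
  int best = 0;
  int improvements = 0;
  int iters = 0;
  for (int i = 1; i < n; ++i) {
    ++iters;
    if (costs[i] < costs[best]) {
      best = i;
      ++improvements;
    }
    if (2 * improvements < iters) break; /* fewer than half improved */
  }
  *evaluated = iters;
  return best;
}
```

The trade-off is visible in the rule itself: a late candidate that would have been better can be missed, which is consistent with the small metric changes the commit reports in exchange for the speedup.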
    • [CFL] allow for 4:1 rects if full tx available · d27f1e61
      David Michael Barr authored
      Disable CFL sub8x8 validation in this case, as it appears to give
      false-negatives for 4:1 blocks. All other tests pass.
      
      The coding gain on subset1 is quite significant.
      
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -0.1270 | -1.1386 | -1.1426 |  -0.1167 | -0.1157 | -0.1264 |    -0.4142
      
      Change-Id: Ic20c9b1a5ff28e0fbd4e6491ed2cd2d1f6b487c9
    • Avoid out of bound array access · 92245c87
      Yaowu Xu authored
      Change-Id: I4066561b769cf2bd4af515c9d351f609c08e3076
  5. 27 Jan, 2018 1 commit
  6. 26 Jan, 2018 10 commits
    • Add CDF_STORAGE_REDUCTION experiment flag. · 5f0c41de
      Thomas Daede authored
      Change-Id: I8ce208e842b738bb729d5732f0f35366c3549063
    • Fix memory overflow in av1_cdef_search() · 4d24a989
      Hui Su authored
      The frame boundary check needs to take the 128x128 superblock size
      into account when ext-partition is on.
      
      BUG=aomedia:1268
      
      Change-Id: I40d2128d5ab46d57ecab9c9ecbef122005fe4b11
    • Speed up SSE4 implementation of 64-point inverse transform · c2368362
      Frank Bossen authored
      Avoid unnecessary computations knowing that only the lower
      frequency 32x32 quadrant has nonzero values.
      
      Runs about 2x faster
      
      Change-Id: Ie86f56ccdce917e30b594253f10e121b4dcb0abc
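
The idea behind this speedup can be shown with a plain matrix-form 1-D inverse transform. The sketch below is not the SSE4 butterfly code from the commit; `inverse_tx_1d` and its integer basis matrix are hypothetical. It only demonstrates that when the trailing input coefficients are known to be zero, the inner loop can stop at the last nonzero column and still produce the same output.

```c
#include <assert.h>
#include <stdint.h>

/* Computes out[r] = sum over c of basis[r * n + c] * coeff[c], visiting only
 * the first `nonzero` columns. The caller guarantees coeff[c] == 0 for
 * every c >= nonzero, so the skipped terms would contribute nothing. */
static void inverse_tx_1d(const int32_t *basis, const int32_t *coeff,
                          int32_t *out, int n, int nonzero) {
  assert(nonzero >= 0 && nonzero <= n);
  for (int r = 0; r < n; ++r) {
    int32_t acc = 0;
    for (int c = 0; c < nonzero; ++c) acc += basis[r * n + c] * coeff[c];
    out[r] = acc;
  }
}
```

For a 64-point transform whose input is confined to the low-frequency 32 coefficients, calling this with `n = 64` and `nonzero = 32` halves the multiply-accumulate count per output, in line with the roughly 2x figure the commit reports for its own implementation.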
    • SSE2 optimizations for _6/_16 lowbd lpf functions · ae6e6bc1
      Maxym Dmytrychenko authored
      Includes vertical and horizontal implementations
      and fixes 5/13 TAPs/Parallel deblocking support.

      Reworks the filter internals for better
      reuse across different sizes.
      
      Tests are enabled.
      
      Performance changes, SSE2 over C:
      Horizontal methods: up to    3-4x
      Vertical   methods: up to 1.5x-2x
      
      Change-Id: I2e36035355d8c23c1d4b0d59d0e23f598e9d0e3f
    • Add get_txw/h_idx functions · 29d2f21e
      Angie Chiang authored
      Change-Id: Ibace8208109068aae1e93275d28ab8bd8e58c529
    • av1_rtcd_defs: fix formatting · f4123630
      Sebastien Alaiwan authored
      Change-Id: Ic4464eab6bedb18451f3506d1d58258f9fa64985
    • Remove DAALA_TX experiment · 5859636f
      Sebastien Alaiwan authored
      This experiment has been abandoned for AV1.
      
      Change-Id: Ief8ed6a51a5e7bac17838ebb7a88d88bbf90a96f
    • Properly reset the skip_mode element in mb_mode_info · 3da65bff
      Jingning Han authored
      The skip_mode element might reuse a prior frame's coding decision
      in the rate-distortion search for the current coding block. Reset
      it to zero for the regular rate-distortion mode search.
      
      This improves the coding performance for ext-skip by 0.07% for
      lowres.
      
      Change-Id: Idbda5b441e3eb844e03ca07bd174b4b7f8a7cb59
    • minor reorder of operations · 30bf8713
      Yaowu Xu authored
      This also fixes several UBSan warnings.
      
      Change-Id: I4ea5f744c42983ea44c7cd6925555eab4938097c
    • Fix loopfilter function usage · 31791278
      Yi Luo authored
      Use the aom_lpf_horizontal_16 function here instead of the
      aom_lpf_horizontal_16_dual function.

      aom_lpf_horizontal_16_dual, which operates on two horizontal
      blocks, is also fixed.
      
      Change-Id: Icc991d3f98bb182fa30497f120021aeb17839d21