1. 31 Jan, 2018 18 commits
    • one level less of tx size search for blocks larger than 64 · 7ed7e1fa
      Hui Su authored
      3~5% encoding speedup for speed 0; no quality loss.
      Change-Id: I0e31755f45253e5e99d8d9eed0d7a6fe6050f49f
    • Cleanup some fragile aspects of rd_pick_partition. · 00c6e6f7
      Urvang Joshi authored
      (1) Explicitly reset RD stats for each partition.
      PARTITION_SPLIT was the only one resetting the RD_STATS in 'sum_rdc'.
      But this was working because:
      - PARTITION_SPLIT was tried before VERT, HORZ, VERT_4 and HORZ_4; and
      - RD cost calculations in VERT, HORZ, VERT_4 and HORZ_4 partitions
      implicitly discarded the existing value in sum_rdc.
      However, that was very fragile; explicitly resetting the stats every
      time is much safer.
      (2) Using a separate variable 'temp_best_rd_cost' was fragile, as someone
      may forget to update it. So, we use best_rdc.rdcost directly.
      Change-Id: Icd75f25c34bb0f1806e691784648bcffce2417e6
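      The two points above can be illustrated with a toy loop. This is a minimal sketch, not the libaom code: the RdStats class, the pick_partition helper, the cost formula and the input numbers are all hypothetical. It only shows the accumulator being reset for every partition type and the comparison going straight to the best cost found so far.

```python
from dataclasses import dataclass

INF = float("inf")


@dataclass
class RdStats:
    """Toy stand-in for an RD stats record (hypothetical, not libaom's struct)."""
    rate: int = 0
    dist: int = 0
    rdcost: float = 0.0


def pick_partition(candidates, rdmult=1.0):
    """Return (best_type, best_stats) over a dict of partition candidates.

    `candidates` maps a partition name to a list of (rate, dist) pairs,
    one pair per sub-block. The cost model here is purely illustrative.
    """
    best_type, best = None, RdStats(rdcost=INF)
    for ptype, sub_blocks in candidates.items():
        # (1) Explicit reset: never rely on an earlier partition type
        # having left the accumulator in a usable state.
        sum_rdc = RdStats()
        for rate, dist in sub_blocks:
            sum_rdc.rate += rate
            sum_rdc.dist += dist
            sum_rdc.rdcost = sum_rdc.rate * rdmult + sum_rdc.dist
            # (2) Compare against the best cost directly instead of
            # keeping a separate copy that could go stale.
            if sum_rdc.rdcost >= best.rdcost:
                break
        else:
            # Loop finished without exceeding the best cost: new winner.
            best_type, best = ptype, sum_rdc
    return best_type, best


example = {
    "NONE": [(10, 100)],
    "SPLIT": [(5, 20), (5, 20), (5, 20), (5, 20)],
    "HORZ": [(8, 60), (8, 70)],
}
best_type, best = pick_partition(example)
```

      Because sum_rdc is rebuilt at the top of each iteration, the order in which partition types are tried no longer affects correctness.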
    • AVX2 optimization of motion compensation functions · c8e0336a
      Deepa K G authored
      AVX2 implementations of av1_convolve_x_sr, av1_convolve_y_sr and
      av1_convolve_2d_sr have been added.
      Improvements have been made to av1_convolve_x_avx2, av1_convolve_y_avx2
      and av1_convolve_2d_avx2.
      Change-Id: I62a699dd9dcf42de94dd72cc2d43affc0dc31404
    • Add information about extra CMake build flags to README.md · aa71f071
      Tom Finegan authored
      Change-Id: If9f944b58f23cdb71f919bd391f6b37e27b271f1
    • Update adst4 range · 5d7c1fcc
      Angie Chiang authored
      Serialize the adst4 operations
      Update stage range accordingly
      Change the cos_bit precision accordingly.
      Correct 4x8/8x4 inv_start_range
      Change-Id: I10bc91585a61d790decdc24cb91659102e043620
    • [jnt-comp, normative] Avoid double-rounding in prediction · 39cf8061
      David Barker authored
      As per the linked bug report, the distance-weighted compound
      prediction has two separate round operations, first by 3
      bits (inside the various convolve functions), then by 10 bits
      (after the convolution functions).
      We can improve on this by right shifting by 3 bits inside the
      convolve functions - this is equivalent to doing a single round
      by 13 bits at the end.
      Note: In the encoder, when doing joint_motion_search(), we do
      things a bit differently: So that we can try modifying the two
      "sides" of the prediction independently, we predict each side as
      if it were a single prediction (including rounding), then blend
      these single predictions together.
      This is already an approximation to the "real" prediction, even
      in the non-jnt-comp case. So we leave that code path as-is.
      Change-Id: I9ad1fbcb3e12db2b5fc3c82b407f0fd9e6b39750
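      The equivalence claimed above can be checked numerically. The sketch below is illustrative only: it assumes the in-convolve shift is a plain truncating (floor) right shift, applies it to a single integer rather than a weighted compound sum, and uses helper names and a sample range that are not taken from the codec.

```python
def round_shift(value: int, bits: int) -> int:
    """Right shift with round-to-nearest (adds half before shifting)."""
    return (value + (1 << (bits - 1))) >> bits


def double_round(value: int) -> int:
    # Old behaviour: round by 3 bits, then round again by 10 bits.
    return round_shift(round_shift(value, 3), 10)


def shift_then_round(value: int) -> int:
    # New behaviour: truncating shift by 3 bits, then one round by 10 bits.
    return round_shift(value >> 3, 10)


def single_round(value: int) -> int:
    # Reference: a single round by 13 bits.
    return round_shift(value, 13)


# Illustrative input range (an assumption, not a codec constraint).
samples = range(-(1 << 16), 1 << 16)

# Truncate-then-round matches the single 13-bit round for every sample.
assert all(shift_then_round(v) == single_round(v) for v in samples)

# Rounding twice does not: these inputs land on the other side of the
# rounding threshold because of the extra half added in the first step.
mismatches = [v for v in samples if double_round(v) != single_round(v)]
```

      The identity behind this is floor((floor(x / 8) + 512) / 1024) = floor((x + 4096) / 8192), which holds for every integer x when the shift rounds toward negative infinity.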
    • BUG FIX: sse2 subpel variance is not PIC compliant · 0cf864fd
      Johann authored
      cherry-picked from libvpx:
        commit cb9f4dc1056b39383595f658cfcd166833bc0097
        Author: Scott LaVarnway <slavarnway@google.com>
        Date:   Sat Jan 13 07:01:04 2018 -0800
      Change-Id: Ie1736ea0787f4dad80204dcf5251fbb02d79541e
    • Added HighBD support for mismatch debugging · 5b084ee1
      Imdad Sardharwalla authored
      Enabling CONFIG_MISMATCH_DEBUG with highbd streams was producing undefined
      behaviour. This patch adds support for highbd frames.
      Change-Id: I36ff4ddbb9b2e884e4a5b76485247a20b1f5db3c
    • Merge in STRIPED_LOOP_RESTORATION flag · 5105f7ac
      Debargha Mukherjee authored
      CONFIG_LOOP_RESTORATION still exists.
      Only CONFIG_STRIPED_LOOP_RESTORATION has been merged into
      CONFIG_LOOP_RESTORATION, as if it were always 1.
      Change-Id: I37d7a1fcd4cbb56e2fc037b1568ae63f72ed6a66
    • Update configuration comment about LOWBITDEPTH · 1e3da463
      Sebastien Alaiwan authored
      The comment was misleading as the codec always supports 8-bit,
      regardless of the value of CONFIG_LOWBITDEPTH.
      This flag just enables the optimized-for-8-bits pipeline,
      without changing the actual YUV output.
      Change-Id: Ic2f041870acf4e2ee435021aa42e8f013ef52b78
    • Reduce scope of ctx derivation · 46475a30
      Frederic Barbier authored
      Change-Id: Ic8050cada6dc9dd14152da98ee21bb37042069e6
    • Conditionally skip transform block partition search · eb8f5e87
      Jingning Han authored
      Speed up recursive transform block partition search. When the best
      choice for a txfm block is all-zero coefficients, skip the search
      over further split partitions.
      Tested with txk-sel on, this makes speed 0 and speed 1 both 10-15%
      faster in the medium-to-high target bit-rate range. The coding
      performance change is neutral: 0.011% better for the lowres set.
      Change-Id: I1247f3d5a33d15bf4bc5f0bcbac2bf1f3e1aca2e
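      The early exit described above can be sketched as a recursive search. This is a toy model under stated assumptions, not the encoder's implementation: blocks are (x, y, size) tuples, and search_tx_partition, split4, make_evaluator and the cost numbers are all hypothetical.

```python
def split4(block):
    """Split a square (x, y, size) block into its four quadrants."""
    x, y, size = block
    half = size // 2
    return [(x, y, half), (x + half, y, half),
            (x, y + half, half), (x + half, y + half, half)]


def search_tx_partition(block, depth, max_depth, evaluate):
    """Return (cost, tree) for the cheapest partition of `block`.

    `evaluate(block)` returns (cost, all_zero) for coding the block
    without splitting. `tree` is either the block itself or a list of
    sub-trees when splitting is cheaper.
    """
    cost, all_zero = evaluate(block)
    # Early exit: an all-zero block is assumed not to benefit from
    # splitting, so the recursive search below it is skipped.
    if all_zero or depth == max_depth:
        return cost, block
    split_cost, sub_trees = 0, []
    for sub in split4(block):
        sub_cost, sub_tree = search_tx_partition(sub, depth + 1, max_depth, evaluate)
        split_cost += sub_cost
        sub_trees.append(sub_tree)
    if split_cost < cost:
        return split_cost, sub_trees
    return cost, block


def make_evaluator(zero_blocks, calls):
    """Build a toy evaluator that records every block it is asked about."""
    def evaluate(block):
        calls.append(block)
        all_zero = block in zero_blocks
        # Illustrative cost: all-zero blocks are nearly free.
        cost = 1 if all_zero else block[2] * block[2]
        return cost, all_zero
    return evaluate


calls = []
cost, tree = search_tx_partition((0, 0, 64), 0, 2,
                                 make_evaluator({(0, 0, 32)}, calls))
```

      In this toy run the all-zero 32x32 quadrant is evaluated once and never split, which is where the search time is saved.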
    • dependent-horztilegroups: Fix decoder crash · 13025199
      David Barker authored
      The tg_horz_boundary flag should always be 0 for the topmost
      tile row, even when dependent-horztilegroups is enabled.
      Otherwise, we end up trying to fetch data off the top of the
      frame, which results in segfaults.
      Change-Id: I7caaa2b38a21c05ffb13b6c72f41f8f6e1982b69
    • Add aom_comp_mask_<upsampled>pred_ssse3 · 33ba1fe5
      Peng Bin authored
      1) Encoder speed: overall ~1% faster with no impact on coding performance.
      2) aom_comp_mask_pred_ssse3 is 3.5x - 6x faster than aom_comp_mask_pred_c.
      3) aom_comp_mask_upsampled_pred_ssse3 is 1.5x - 3x faster than
      aom_comp_mask_upsampled_pred_c. For the special case where subpel_x ==
      subpel_y == 0, the optimized version achieves a 4x - 7x speedup.
      Unit tests for both functions have been added.
      Change-Id: Ib498317975e0dbd9cdcf61be327b640dfac9a7e5
    • Remove frame counts in decoding coefs area · 1694a4ff
      Yunqing Wang authored
      Continue removing count accumulation in the decoder to speed up decoding.
      Change-Id: I9e3b874bfc5f750297070235bdfc4d71526ed665
    • Remove frame counts in decoder · e62feb65
      Yunqing Wang authored
      On the decoder side, frame count accumulation still exists. This
      patch removes part of it. More patches will follow. This should speed up
      the decoder.
      This doesn't change the encoder side since the counts are useful in
      some encoder optimizations.
      Change-Id: I91a021859f8d35e46618ea9232083e72a06431c8
    • txk-sel: support the fast tx type search feature · 12049df7
      Hui Su authored
      Change-Id: Ib6b07f76dd702c40841c88457ca9d96083157354
    • Fix a command line help comment · bada8230
      Yaowu Xu authored
      Change-Id: I9b200d8cfb3ffcdd2fb1cece6c54a0f600d37a87
  2. 30 Jan, 2018 13 commits
  3. 29 Jan, 2018 6 commits
  4. 28 Jan, 2018 3 commits
    • [CFL] Independent search termination for plane and sign · 2fae28b2
      David Michael Barr authored
      Stop if less than half of the iterations give improvement.
      Minor metric changes for a 2.5x speed up of the alpha search.
      Results on subset1:
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -0.0038 |  0.0466 |  0.1388 |  -0.0103 | -0.0312 | -0.0220 |     0.0330
      Change-Id: Ic25a995eee500ffc4b80b73635baf0a710954dc0
    • [CFL] allow for 4:1 rects if full tx available · d27f1e61
      David Michael Barr authored
      Disable CFL sub8x8 validation in this case, as it appears to give
      false negatives for 4:1 blocks. All other tests pass.
      The coding gain on subset1 is quite significant.
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -0.1270 | -1.1386 | -1.1426 |  -0.1167 | -0.1157 | -0.1264 |    -0.4142
      Change-Id: Ic20c9b1a5ff28e0fbd4e6491ed2cd2d1f6b487c9
    • Avoid out of bound array access · 92245c87
      Yaowu Xu authored
      Change-Id: I4066561b769cf2bd4af515c9d351f609c08e3076