1. 10 Oct, 2017 1 commit
    • Add an SSE4.1 implementation of av1_convolve_2d_scale · 98dc22b8
      Rupert Swarbrick authored
      For large blocks this is almost 8x the speed of the C version. The
      code needs SSE 4.1 for the PMULLD instruction that we use to do SIMD
      32-bit multiplies.
      This patch also makes av1_convolve_scale_test actually test something,
      making sure the optimised code matches the C version. The slightly
      excessive generality in the test (all the templating) is because of a
      following patch, which is for the high bit depth path and can then use
      most of the same test code.
      Change-Id: I6732bc6b2378ffaadae5aa6441100cf660f7ee11
  2. 09 Oct, 2017 11 commits
    • Avoid updating non-used transform · ca8016ef
      Angie Chiang authored
      Since 32x32 transforms use DCT only, we can avoid updating the other
      transform types.
      Change-Id: I51dd8ec71975187d249d7e25130e994a48cac5c1
    • Change rectangular vartx recursion depth to 2 · d25ef8c6
      Sarah Parker authored
      0.15% improvement on lowres set
      Change-Id: If16a8e07797c64508f9e2d9b26ae874ac53c57a4
    • Catch invalid block sizes in bitstream · 415c8f1f
      Rupert Swarbrick authored
      There's a bitstream conformance requirement that says that any block
      must subsample to a valid block size with the current subsampling
      mode. For example, this means that BLOCK_4X8 is illegal if there is
      subsampling in only the horizontal direction (since there is no
      This patch checks the bitstream is conformant as it reads partition
      information in decodeframe.c
      Change-Id: I18139aa76d6f965282402edbb0b68959478a46c3
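The conformance rule above can be illustrated with a small sketch. This is not the decodeframe.c code: the size table, `is_valid_size` and `block_subsamples_ok` are hypothetical names invented here, and the table is a toy subset chosen only so that the BLOCK_4X8 example from the commit message comes out as illegal under horizontal-only subsampling.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table of legal block dimensions (width, height) in samples.
 * Illustrative only; the real codec defines its own set of block sizes. */
static const int kValidSizes[][2] = {
  { 4, 4 }, { 4, 8 }, { 8, 4 }, { 8, 8 }, { 8, 16 }, { 16, 8 }, { 16, 16 }
};

/* Returns 1 if (w, h) appears in the table above, 0 otherwise. */
static int is_valid_size(int w, int h) {
  for (size_t i = 0; i < sizeof(kValidSizes) / sizeof(kValidSizes[0]); ++i) {
    if (kValidSizes[i][0] == w && kValidSizes[i][1] == h) return 1;
  }
  return 0;
}

/* Returns 1 if a w x h block subsamples to a size found in the table.
 * ss_x / ss_y are 1 when that direction is subsampled, 0 otherwise. */
static int block_subsamples_ok(int w, int h, int ss_x, int ss_y) {
  assert(ss_x == 0 || ss_x == 1);
  assert(ss_y == 0 || ss_y == 1);
  return is_valid_size(w >> ss_x, h >> ss_y);
}
```

With this toy table, a 4x8 block with `ss_x = 1, ss_y = 0` subsamples to 2x8, which is not listed, so the check fails; an 8x8 block under 4:2:0 subsamples to 4x4 and passes.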
    • Revert wrong uses of TX_SIZE enum. · ab8840eb
      Urvang Joshi authored
      Introduced by: https://aomedia-review.googlesource.com/c/aom/+/25181
      Change-Id: I1f25178d6b273fbeade4c33f153b5f2bac4a8b99
    • Add av1_convolve_scale_test · 1ea7ab4e
      Rupert Swarbrick authored
      This unit test doesn't actually provide any test coverage and merely
      exists to benchmark the C function, av1_convolve_2d_scale_c. The
      following patch will add an SSE version of that function and extend
      this test to check that the SSE code matches the C code.
      Change-Id: Ic942ad8f9fd57d2659fc60f92c5a0b6c9a9f8cac
    • Enable 32x64 and 64x32 transforms · 1a86b013
      Debargha Mukherjee authored
      Change-Id: I73e9d2d327b062828a75bc99fb348441dd32174a
    • Resolve some static analysis warnings · e36a08c4
      Debargha Mukherjee authored
      Change-Id: Iaff923f34100ecdce76d2319fab67cde59d485ae
    • Match braces in VIM for rdopt.c · 1483a714
      Cheng Chen authored
      Change-Id: I23344af711d9a31b819fca35ae3ad3b7edf4852e
    • Define block_signals_txsize function · fcff0b25
      Rupert Swarbrick authored
      The new function returns true if a block signals tx_size in the stream.
      This patch uses it in the bitstream writing code and the decoder.
      Note that we can't quite use it in pack_inter_mode_mvs when
      CONFIG_VAR_TX && !CONFIG_RECT_TX but I've switched the code to using
      it the rest of the time since rect-tx is adopted and eventually the
      other code path should be deleted.
      Also use the helper function in tx_size_cost in rdopt.c, where the
      test was wrong and caused underestimates of block
      costs. (Specifically, the code that subtracts tx_size_cost from
      this_rate_tokenonly in rd_pick_intra_sby_mode ended up subtracting
      zero for a 4x8 block).
      The behaviour of the decoder should be unchanged. The only change in
      the encoder's behaviour should be in tx_size_cost where it should now
      match the rest of the code.
      Change-Id: I97236c9ce444993afe01ac5c6f4a0bb9e5049217
    • Add experiment ext_skip · a3c5b9da
      Zoe Liu authored
      This coding tool introduces a new prediction mode for bi-predictive
      frames that have a forward reference within 2 frames (distance denoted
      as 'fwd_delta') and a backward reference within (3-fwd_delta) frames.
      If this prediction mode, namely 'ext_skip', is set, it is coded
      using compound prediction with the most recent forward and backward
      reference frames as its reference pair, NEARESTMV as its motion mode,
      and the skip flag set for the residue.
      Change-Id: I826034ccf1a956f4b350f0bc2e2dca8ea71b5197
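The distance condition described above can be written as a small predicate. This is only a sketch of the stated rule, not the code from the commit: `ext_skip_allowed` and its parameter `bwd_delta` are hypothetical names, and treating both distances as strictly positive frame counts is an assumption.

```c
#include <assert.h>

/* fwd_delta: distance in frames to the nearest forward reference.
 * bwd_delta: distance in frames to the nearest backward reference.
 * Both are assumed to be positive frame counts (an assumption of this
 * sketch, not something the commit message states). */
static int ext_skip_allowed(int fwd_delta, int bwd_delta) {
  assert(fwd_delta > 0 && bwd_delta > 0);
  if (fwd_delta > 2) return 0;          /* forward ref must be within 2 */
  return bwd_delta <= 3 - fwd_delta;    /* backward ref within 3 - fwd_delta */
}
```

Under these assumptions the only qualifying (fwd_delta, bwd_delta) pairs are (1, 1), (1, 2) and (2, 1).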
    • Add encoder/decoder support to frame_sign_bias · 17af2748
      Zoe Liu authored
      The frame sign bias value is not signaled in the frame header. Instead,
      the sign bias of each reference frame is derived from its corresponding
      frame offset at both encoder and decoder.
      The 'frame_sign_bias' tool depends on 'frame_marker'. Compared
      against baseline, enabling both tools obtains a small coding gain
      of -0.08 ~ -0.11% in BDRate over Google lowres/midres tests.
      Change-Id: I8d85dc427ced0b2152712ccf61be4be6068075b9
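A minimal sketch of deriving a sign bias from frame offsets rather than reading it from the header. The commit message does not give the exact rule, so this assumes the usual convention that a reference displayed after the current frame (a backward reference) gets bias 1 and anything else gets 0. `derive_sign_bias` is a hypothetical name, and offset wraparound is deliberately ignored.

```c
#include <assert.h>

/* Returns 1 for a backward reference (displayed after the current frame),
 * 0 for a forward reference. Assumes non-negative offsets that do not wrap;
 * a real codec would have to handle offsets stored in a limited bit width. */
static int derive_sign_bias(int ref_frame_offset, int cur_frame_offset) {
  assert(ref_frame_offset >= 0 && cur_frame_offset >= 0);
  return ref_frame_offset > cur_frame_offset ? 1 : 0;
}
```

Because encoder and decoder both know the frame offsets, running the same function on each side yields the same bias without spending header bits.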
  3. 08 Oct, 2017 10 commits
  4. 07 Oct, 2017 11 commits
    • [CFL] Support for 4:4:4 High Bit Depth · 69d9e878
      Luc Trudeau authored
      Change-Id: I13ba0dbe57297b540b78512d21a119f05a86a849
    • [CFL] Support for 4:2:0 High Bit Depth · 056d1f40
      Luc Trudeau authored
      high bit depth (_hbd) and low bit depth (_lbd) versions
      of the cfl functions: sum_above_row, sum_left_col,
      cfl_build_prediction, cfl_luma_subsampling_420 (4:4:4 will
      be added in subsequent commit) and cfl_alpha_dist. For
      cfl_alpha_dist, special care is given to scale the SSE
      according to the bit depth.
      Change-Id: I5b72845100d88fb8a438efe665bcae7fe1ba50b8
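The commit message does not say how the SSE is scaled. One common convention, assumed here, is to normalise a high-bit-depth SSE back to the 8-bit range by a rounded right shift of 2 * (bit_depth - 8), since squaring an error doubles the number of extra bits. `scale_sse_to_8bit` is a hypothetical helper, not a function from the CFL code.

```c
#include <assert.h>
#include <stdint.h>

/* Scales a sum of squared errors computed at `bit_depth` bits per sample
 * down to the range it would have at 8 bits, with rounding.
 * Assumption: bit depths of 8, 10 or 12. */
static uint64_t scale_sse_to_8bit(uint64_t sse, int bit_depth) {
  assert(bit_depth >= 8 && bit_depth <= 12);
  const int shift = 2 * (bit_depth - 8);
  if (shift == 0) return sse;                       /* already 8-bit */
  return (sse + ((uint64_t)1 << (shift - 1))) >> shift;
}
```

For example, a 10-bit SSE of 1600 is shifted right by 4 and becomes 100, which is comparable with distortions measured on 8-bit content.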
    • Add a macro SCALE_HORIZONTAL_ONLY. · e58b564d
      Urvang Joshi authored
      When enabled, scaling through resize and superres will occur only in the
      frame's width; the height will not be scaled.
      Macro is off by default.
      Change-Id: I501b2b0b2766aa4a86da5937b57c4d5aee4e34c4
    • inspect.c: Update to include new tx size enums. · b2752174
      Urvang Joshi authored
      Change-Id: I27292b7cdb27cec23754a6f017c5c7c55eb38bb5
    • Add cmake dependency constraint · c5600655
      Debargha Mukherjee authored
      ext-partition-types and supertx are incompatible
      Change-Id: I6c4cce16453cff13b0acbaad93dde7d089891038
    • FRAME_SUPERRES: Rework to use scale factor of 8/D · de71d142
      Urvang Joshi authored
      Earlier, the superres scale was in the form of:
      N/16, where N ranged from 8 to 16.
      We change this to the form:
      8/D, where D ranges from 8 to 16.
      This helps on the decoder side, by making it possible to work on 8x8
      blocks at a time.
      Change-Id: I6c72d4b3e8d1c830e61d4bb8d7f6337a100c3064
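As an illustration of the 8/D form, the sketch below computes the coded (downscaled) width from the full-resolution width. `SCALE_NUMERATOR_SKETCH` and `downscaled_width` are hypothetical names, and the round-to-nearest behaviour is an assumption; the commit message only specifies the ratio and the range of D.

```c
#include <assert.h>

#define SCALE_NUMERATOR_SKETCH 8 /* hypothetical name for the fixed numerator */

/* Returns the width after scaling by 8/denom, rounded to nearest.
 * denom must be in [8, 16]: 8 means no scaling, 16 means half width. */
static int downscaled_width(int upscaled_width, int denom) {
  assert(denom >= 8 && denom <= 16);
  return (upscaled_width * SCALE_NUMERATOR_SKETCH + denom / 2) / denom;
}
```

With a 1920-wide frame, D = 8 leaves the width at 1920, D = 12 gives 1280, and D = 16 gives 960.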
    • Superres: Remove unused variable. · e61923fa
      Urvang Joshi authored
      cm->superres_scale_numerator is used for both keyframes and
      non-keyframes, and is initialized from either
      oxcf->superres_scale_numerator or oxcf->superres_kf_scale_numerator as
      Change-Id: Ie46df576ef3830e181643ae591d836449a4bd38f
    • Upscale frame correctly in foreach_rtile_in_tile · b66894ac
      Rupert Swarbrick authored
      The restoration tiles (rtiles) divide the upscaled frame, not the
      encoded one.
      Change-Id: I2d08fe926d694fee7064461685289d3fd1c1de0c
    • Remove the speed optimization for rd_stats_stack · 9245d89d
      Debargha Mukherjee authored
      This speed optimization was useful only when the max tx-size
      was 32x32. With tx64x64, however, it was breaking certain assumptions,
      causing huge drops in coding efficiency. So I am removing this
      optimization for now. It can be brought back later as a speed feature.
      Removing this optimization recovers the loss seen when 32x64
      and 64x32 transforms are used.
      Change-Id: I15987ea9ff53fa36a2962fe5f156c30a11e809ed
    • [intra-edge] Pad intra edge samples to avoid valgrind warning · 7cfd5343
      Joe Young authored
      The SSE4 function filter_intra_edge_sse4_1() reads data slightly
      past the initialized part of the array. Those data are discarded
      later, but the read causes a valgrind warning. This change avoids the
      warning by initializing an extra 16 positions of the array.
      Change-Id: Ib610492cff91492ae379c5d62895773f8747c4bc
    • [CFL] Extract sum top row and left col from custom DC_PRED · 13281d4f
      Luc Trudeau authored
      To simplify the high bit depth commit, the summing of the top row and
      the left column is extracted out of cfl_dc_pred. This does not change the
      Change-Id: I5c9fe91df4942f736c5af29c1d93abb3a6c8501f
  5. 06 Oct, 2017 7 commits
    • Rework key frame intra mode context model · a45d842d
      Jingning Han authored
      Reduce the context model size for key frame modes from 30240 bits
      to 4500 bits, i.e., less than 1/6 of the original context model.
      The coding performance loss on key frames is 0.14% for lowres, and
      the difference over whole video sequences is at noise level. The loss
      on key frames for midres is 0.05%, again at noise level for whole
      video. The change in hdres key frame coding is 0.015%.
      Change-Id: I9e36825e5c5ee6ba35038c3ca349ad1ad3429910
    • Avoid large stack allocations · 5d108a36
      Debargha Mukherjee authored
      When ext-partition and ncobmc-adapt-weight are on, avoid overly large
      stack allocations.
      Change-Id: I8db74e45cac80c4e5dfd9e20cfc73d9978d1578e
    • Reduce division operations in update_scan_prob · bbcd8f76
      Angie Chiang authored
      Change-Id: I923931a9dbf828eb13670511852d55c953b479c1
    • Avoid left shift of negative value · 92acb81b
      Sebastien Alaiwan authored
      Left-shifting a negative value is undefined behaviour in C99 and could
      mislead the optimizer. This fixes the ubsan warning, and still
      generates optimal code (i.e. an inlined 'sar' instruction).
      Change-Id: I36b20a6780532b8c9379b9fbfd970933d56b1bc5
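For context, one common way to avoid left-shifting a negative value is to multiply by a power of two instead, which is well defined as long as the result fits in the type. This is a generic sketch and not the change made in this commit, whose exact expression is not shown here; `shift_left_signed` is a hypothetical helper.

```c
#include <assert.h>

/* Computes value * 2^bits without shifting a negative operand.
 * `value << bits` is undefined in C99 when value is negative; the
 * multiplication is defined provided the product does not overflow int,
 * which remains the caller's responsibility. */
static int shift_left_signed(int value, int bits) {
  assert(bits >= 0 && bits < 16);
  return value * (1 << bits);
}
```

Compilers typically turn the multiplication by a power of two back into a single shift instruction, so the defined form usually costs nothing at run time.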
    • Predict skip flag to speed up the TX type search · 8829a24d
      Alexander Bokov authored
      Average speed-up (lowres):
      low bitrates: 6.6%
      mid bitrates: 2.5%
      high bitrates: 0.0%
      Average PSNR loss:
      lowres: 0.010%
      midres: 0.005%
      Change-Id: Id34fb247e5e31f04ca324c58142e4b5ac4edacda
    • Lowbd SMOOTH_PRED intrapred ssse3 optimization · 46ae1ea3
      Yi Luo authored
      On i7-6700:
      Predictor    ssse3 v. C
      4x4          ~1.3x
      4x8          ~1.9x
      8x4          ~2.3x
      8x8          ~3.4x
      8x16         ~4.1x
      16x8         ~4.6x
      16x16        ~5.2x
      16x32        ~5.6x
      32x16        ~4.2x
      32x32        ~4.7x
      Change-Id: Ic12383cf9d4446361d6355eb8a480a3c7602060e
    • Explicit requirement about sizeof(tran_low_t) · 698af562
      Sebastien Alaiwan authored
      Here, we're testing CONFIG_HIGHBITDEPTH but what we really depend upon
      is the actual size of the coefficients.
      Change-Id: I33d71e4b38b4b83bb4232346f4d449f20bcf740e