1. 26 Jan, 2018 9 commits
      minor reorder of operations · 30bf8713
      Yaowu Xu
      This also fixes several UBSan warnings.
      Change-Id: I4ea5f744c42983ea44c7cd6925555eab4938097c
      Fix loopfilter function usage · 31791278
      Yi Luo
      Here we should use the aom_lpf_horizontal_16 function instead of the
      aom_lpf_horizontal_16_dual function. The use of
      aom_lpf_horizontal_16_dual, which works on two horizontal blocks, is
      also fixed.
      Change-Id: Icc991d3f98bb182fa30497f120021aeb17839d21
      Adjust last odd row weight in fast_sgr · 127b562a
      Debargha Mukherjee
      Change-Id: I2348a7c6a3553bbbb0d061820a7c546a1a0367df
      Fix compile warning with mono-video disabled · 6cd8e177
      David Barker
      The variable 'num_planes' is only used when mono-video is enabled,
      so move it inside a #if CONFIG_MONO_VIDEO block.
      Change-Id: I415f764b2629478edde579142b7242851991b1c0
      [seg] No need to decide temporal_update · e8d8879e
      Yushin Cho
      If error resilient mode is enabled, temporal update of seg_id is not
      used, so there is no need to decide the seg->temporal_update flag by calling
      Change-Id: Ifb2271be53f1a6bc64f1196af5e7fbe46741fab0
      Skip txfm search · 3c22260b
      Cheng Chen
      Skip transform type search.
      Without txk_sel:
      Skip remaining transform type search when all transform blocks inside
      the coding block have eob = 0.
      With txk_sel:
      For each transform block, whenever eob = 0, we skip remaining
      transform type search.
      Speed impact:
      On low bitrate, 25% speed up.
      On high bitrate, 15-20% speed up.
      Performance impact: Google test lowres, 30 frames
      With txk_sel: 0.15% drop
      Without txk_sel: 0.30% drop
      Change-Id: I5e8db730a19feec22e378611046b1ce1ab001c85
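      The early exit described above can be sketched in C. This is a minimal illustration, not libaom's code: TxResult, search_tx_type() and eval_from_table() are hypothetical names, and the sketch models only the per-transform-block (txk_sel) case, stopping as soon as a transform type yields eob = 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result of trying one transform type (not a libaom type). */
typedef struct {
  int eob;     /* end of block: 0 means every quantized coefficient is zero */
  int64_t rd;  /* rate-distortion cost of this transform type */
} TxResult;

typedef TxResult (*EvalTxTypeFn)(int tx_type, void *ctx);

/* Tries transform types in the given order and returns the one with the
 * lowest RD cost seen so far, stopping early once a type gives eob == 0.
 * *num_evaluated reports how many types were actually tried. */
static int search_tx_type(const int *order, int num_types, EvalTxTypeFn eval,
                          void *ctx, int *num_evaluated) {
  assert(num_types > 0);
  int best_type = order[0];
  int64_t best_rd = INT64_MAX;
  *num_evaluated = 0;
  for (int i = 0; i < num_types; ++i) {
    const TxResult r = eval(order[i], ctx);
    *num_evaluated += 1;
    if (r.rd < best_rd) {
      best_rd = r.rd;
      best_type = order[i];
    }
    if (r.eob == 0) break; /* all-zero block: skip the remaining types */
  }
  return best_type;
}

/* Example evaluator that reads precomputed results from a table (ctx). */
static TxResult eval_from_table(int tx_type, void *ctx) {
  return ((const TxResult *)ctx)[tx_type];
}
```

      Skipping is a heuristic: in a table such as {eob 5, rd 100}, {eob 0, rd 120}, {eob 3, rd 50}, only the first two types are tried, so the cheaper third type is never seen. This is consistent with the small compression drop reported above.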
      localize initialization of zero and max · 14b7967b
      Yaowu Xu
      This commit changes the initialization of two constants to a smaller
      scope, reducing the number of aligned parameters passed into
      cfl_predict_hbd. This fixes the compilation issues with VS2015.
      Change-Id: Idd19e945ac6312654b7b0184fcbf65ca398c46ce
      Speed up av1_find_mv_refs() · b41ffb95
      Yunqing Wang
      av1_update_mv_context() is only used to provide compound_mode_context,
      which is the same as mode_context in find_mv_refs_idx(). This patch
      removes the call to av1_update_mv_context(), which takes 0.5% of the
      decoder time. This doesn't change the bitstream.
      Change-Id: I6f0e082b237ff42c3b3e72361c46f98249ba07ab
      Remove const from parameter passed-by-value · 838ea62c
      Yaowu Xu
      This makes the usage of const consistent.
      Change-Id: I0ebf59842d8df234d0f4a91636b4bc2d6e9a6c81
  2. 25 Jan, 2018 20 commits
      Search the same set of neighbouring positions · 28f3fbf7
      Yunqing Wang
      This patch prepares for the removal of av1_update_mv_context(). The
      neighbouring positions searched in av1_update_mv_context() and
      av1_find_mv_refs() are not exactly the same; this patch fixes that.
      This causes bitstream changes, but shouldn't affect the coding
      Change-Id: I59d2f8c318df388f2d06634cd96802b773c8bb13
      Add num_plane to av1_copy_tree_context() · 68377282
      Yaowu Xu
      This supports monochrome video and fixes a nightly test segfault.
      Change-Id: I87dd3d5ca79e8f0ce51ee31738205ae5a53af072
      Re-enable the tx type pruning speed feature · 4e71fd94
      Hui Su
      Change-Id: I93702d24bf7d711b6910e2e502f9f97c661bcf6c
      [seg] Initialize temporal_update flag · b42e98de
      Yushin Cho
      seg->temporal_update was not initialized anywhere.
      Change-Id: I3ccc0e10e14a83859b683c026093b921ea6d5dbf
      Add SSE4 implementation of 64-point transform · 5a06fe32
      Frank Bossen
      Can reduce decoder run time by 4 percent.
      Change-Id: Ibdd5bb3a18002789852f2e367b32533163a8c022
      Use meaningful names in txk-sel rd control · 66965a20
      Jingning Han
      Change-Id: I83ca47c1469d8e383a815058c02c4826c6282873
      Use safe soft quantization speed feature setup · 802eeaa8
      Jingning Han
      Change-Id: If8836621586ab5090affbb8d6d7b0be3a3e4cde8
      [intra, bugfix] Prevent overflow in DC_PRED · b844ee1b
      David Barker
      Commit https://aomedia-review.googlesource.com/c/aom/+/40541 replaced
      a division in the DC intra predictor by an approximate
      multiply+shift sequence.
      Unfortunately, this approximation is able to produce out-of-range
      values. For example, consider 4x8 DC_PRED, with bit depth = 10.
      If all of the context pixels are 0x3FF (the max value), then we get:
      sum = 12 * 0x3FF
      expected_dc = (sum * 0xAB) >> 11 = 1024 = 0x400
      This means that we need to insert a clip_pixel(_highbd) operation
      at the end of the DC prediction, to bring this value back in range.
      Change-Id: I9beb9ac8a4b39803865f7e23932402ecd1d6f672
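      The numbers above can be checked with a short C sketch. dc_4x8_approx() and clip_pixel_bd() are hypothetical helpers that implement only the formula quoted in the message, (sum * 0xAB) >> 11 as an approximation of sum / 12; they are not libaom's predictor or its clip_pixel_highbd().

```c
#include <assert.h>
#include <stdint.h>

/* Clamps val to [0, (1 << bd) - 1], the valid range for bit depth bd. */
static uint16_t clip_pixel_bd(int val, int bd) {
  const int max_val = (1 << bd) - 1;
  if (val < 0) return 0;
  return (uint16_t)(val > max_val ? max_val : val);
}

/* Approximate DC value of a 4x8 block from its 4 above and 8 left
 * neighbours: 0xAB / 2048 (about 0.0835) stands in for 1 / 12. */
static int dc_4x8_approx(const uint16_t above[4], const uint16_t left[8]) {
  int sum = 0;
  for (int i = 0; i < 4; ++i) sum += above[i];
  for (int i = 0; i < 8; ++i) sum += left[i];
  return (sum * 0xAB) >> 11;
}
```

      With all twelve neighbours at 0x3FF, sum * 0xAB = 2099196, which shifts down to 1024 (0x400), one above the 10-bit maximum; clip_pixel_bd(1024, 10) brings it back to 1023.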
      Remove mode_context calculation in find_mv_refs_idx() · 8152737f
      Yunqing Wang
      mode_context[ref_frame] is calculated in find_mv_refs_idx(), but is
      set to 0 in setup_ref_mv_list. Therefore, the calculation in
      find_mv_refs_idx() is not needed.
      Change-Id: I65ca06a2000278ad21c2eaa81eb12c48a7c1fcb8
      Do MV scaling on the fly for memory and run time reduction · 7b6bb947
      Frank Bossen
      This change is not normative and produces the same results as before.
      TPL_MV_REF data structure is about 5x smaller.
      Observed overall decoder run time reduction is about 4%.
      No observed change in encoder run time.
      Change-Id: Id68a492bac3bf28f48b7ceeedf85cd29981238ee
      Add obu_sizing experiment. · 41150ad4
      Tom Finegan
      When enabled, writes OBU size fields as unsigned LEB128-encoded
      integers padded to PRE_OBU_SIZE_BYTES (currently 4) bytes:
      $ cmake path/to/aom -DCONFIG_OBU=1 -DCONFIG_OBU_SIZING=1 && cmake --build .
      Requires CONFIG_OBU.
      Change-Id: I4d184ef0c8587d24e9c8c3e63237ea5003386c6a
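      A padded LEB128 field can be produced by setting the continuation bit (0x80) on every byte except the last. The encoding then always occupies the same number of bytes, and a standard LEB128 reader still recovers the value. The sketch below assumes that is what "padded" means here; write_padded_leb128() and read_leb128() are hypothetical names, not libaom's own helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes value as unsigned LEB128 padded to exactly pad_bytes bytes.
 * Every byte except the last carries the continuation bit (0x80).
 * Returns 0 on success, -1 if value needs more than 7 * pad_bytes bits. */
static int write_padded_leb128(uint64_t value, size_t pad_bytes,
                               uint8_t *out) {
  assert(pad_bytes > 0 && pad_bytes <= 8);
  if ((value >> (7 * pad_bytes)) != 0) return -1;
  for (size_t i = 0; i < pad_bytes; ++i) {
    uint8_t byte = (uint8_t)(value & 0x7F); /* low 7 bits of payload */
    value >>= 7;
    if (i + 1 < pad_bytes) byte |= 0x80; /* more bytes follow */
    out[i] = byte;
  }
  return 0;
}

/* Standard LEB128 reader; stops at the first byte without 0x80 or after
 * max_bytes bytes. *len receives the number of bytes consumed. */
static uint64_t read_leb128(const uint8_t *in, size_t max_bytes,
                            size_t *len) {
  assert(max_bytes <= 8);
  uint64_t value = 0;
  size_t i = 0;
  while (i < max_bytes) {
    const uint8_t byte = in[i];
    value |= (uint64_t)(byte & 0x7F) << (7 * i);
    ++i;
    if (!(byte & 0x80)) break;
  }
  *len = i;
  return value;
}
```

      For example, 300 padded to 4 bytes becomes 0xAC 0x82 0x80 0x00 rather than the minimal 0xAC 0x02. A fixed-width field also has the practical advantage that its bytes can be reserved before the payload size is known and filled in afterwards.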
      Give skip_mode priority over segmentation · b3bb318d
      Frederic Barbier
      Change-Id: I7612e379aa7c63da56e975e95cd7266cd1f8c68d
      Clean up and rework rates in motion_mode_rd() · c5024215
      Yue Chen
      Remove all *bmc variables, which were used to record basic motion
      search results (no advanced masked compound) when obmc and warped
      motion modes were allowed to work with compound ref.
      Remove the switchable rate that is passed into it, since in most
      motion modes we need to recalculate the cost based on motion_mode
      and the refined mv. This change slightly improves the RD performance.
      Performance change: -0.024%
      Change-Id: I4afe0927e97cc7e7251022957f7665ed3032079c
      Simplify txfm table · 0c7b8d84
      Angie Chiang
      Instead of listing all possible stage_range values,
      we use set_fwd_txfm_non_scale_range() to generate the 2d stage_range
      from the 1d stage_range.
      This significantly reduces the complexity of the txfm table.
      This is a lossless change.
      The coding performance isn't changed.
      The txfm config is exactly the same as it was before.
      Change-Id: Ibd1d9e53772bb928faaeecc98d81cbc8f38b27ed
      Refactor buf_offset in av1_inv_txfm2d.c · 0822557b
      Angie Chiang
      Change-Id: I73d1d15ab678242737432064d203c476057286ed
      Simplify context identification for coding ref frames · fa8bad19
      Zoe Liu
      This patch simply aggregates the checking on the counts of certain
      reference frames in the neighboring above and left blocks. It does
      not incur any coding performance change.
      Change-Id: I59a962ba95e7ab16731ce97371ec5709a582a0ba
      Move av1_search_txk_type() to rdopt.c · 4a5c6cf8
      Hui Su
      Change-Id: I4f9d014324b35e30f25cae5fa570620249640cf6
      Reduce the size of av1_prob_cost[] · c1cd5194
      Hui Su
      Only half of it was necessary.
      Change-Id: I0b5fc9ae6a17f5d812e10ee903a12f23f1377d8e
      Do not fail on deprecated --good option · 67adf42f
      Debargha Mukherjee
      Temporary quick fix for broken compatibility with testing
      Change-Id: I9af93690dd107fc79a79062f4d6ea7c53c8b4798
      Return int from av1_pack_bitstream(). · e4099e38
      Tom Finegan
      - Stop relying on asserts for error checking.
      - Update callers to check for and return errors where required.
      Change-Id: Id6a39b14397394b85aaa9dc8b168f7a26f04919b
  3. 24 Jan, 2018 11 commits
      Record total rate cost in trellis · 82775f61
      Cheng Chen
      Record total rate cost when computing trellis optimization.
      Reduce redundant rate computation in later stages.
      Speed impact: ~6% speed up
      Coding performance should not be affected.
      Change-Id: I9e940a2d126bb55930fcf22ea04d061eee1fc944
      Adding timing info to sequence headers · 28e9ce29
      Andrey Norkin
      Change-Id: I0fdb09499196e02709e067f690dff71146ee5114
      Added SSE4.1 and AVX2 implementations of FAST SGR. · 9d234571
      Imdad Sardharwalla
      The self-guided filter speed tests show that:
      - The SSE4.1 implementation of FAST SGR is ~35% faster than the corresponding
        implementation of SGR;
      - The AVX2 implementation of FAST SGR is ~28% faster than the corresponding
        implementation of SGR.
      Change-Id: Iecdc1f8cee79500084c71d06dbb02d804272aa99
      Add a config flag/code for fast sgr computation · ed5e9673
      Debargha Mukherjee
      Adds an experiment for fast sgr computation where, for the r=2
      filter, the A and B stats are computed only for every other row
      and averaged for the rows in between.
      The motivation is to improve software performance with, hopefully,
      minimal loss.
      Change-Id: Ie36687826524dc18c1fbb7f6becff244187bf8da
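      The row-skipping idea can be sketched as follows. This is a simplification under stated assumptions: a single integer per row stands in for the A and B stats, fast_row_stats() and stat_from_table() are hypothetical, and a trailing odd row with no even row below simply copies the row above. The later "Adjust last odd row weight in fast_sgr" commit suggests libaom treats that row specially, but its exact weighting is not given in this log.

```c
#include <assert.h>

/* Hypothetical per-row statistic, standing in for SGR's A/B stats. */
typedef int (*RowStatFn)(int row, const void *ctx);

/* Computes the expensive statistic only on even rows and fills each odd row
 * with the rounded average of its even neighbours. Assumes non-negative
 * statistics. Returns how many rows were computed directly. */
static int fast_row_stats(int height, RowStatFn compute_row, const void *ctx,
                          int *out) {
  assert(height > 0);
  int computed = 0;
  for (int r = 0; r < height; r += 2) {
    out[r] = compute_row(r, ctx);
    ++computed;
  }
  for (int r = 1; r < height; r += 2) {
    if (r + 1 < height) {
      out[r] = (out[r - 1] + out[r + 1] + 1) >> 1; /* average above/below */
    } else {
      out[r] = out[r - 1]; /* last odd row: no even row below (assumption) */
    }
  }
  return computed;
}

/* Example statistic: reads precomputed per-row values from ctx. */
static int stat_from_table(int row, const void *ctx) {
  return ((const int *)ctx)[row];
}
```

      Roughly half of the per-row computations are avoided, at the cost of interpolated values on odd rows, which matches the "minimal loss" trade-off the message describes.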
      [loop-restoration, bugfix] Restrict sampling of deblocked pixels · dff901ff
      David Barker
      There is a special case with certain frame heights, where we
      end up with a loop restoration stripe which ends 1px above the
      crop border.
      Previously this case was handled in quite an ugly way, which also
      disagrees with the spec (and isn't great for hardware). This patch
      changes things to match the spec.
      Specifically, the old method was to sometimes upscale one extra
      row of deblocked pixels so that we could always have a 2px
      "below" border for each processing stripe. The new method is to
      only use rows inside the crop border, and to duplicate them if
      Change-Id: Idf8ab510e1091dc3f5b257de60e16bca214d8dc4
      Remove deadline · 47cc2559
      Sean DuBois
      Change-Id: I9df343f4a6a809b09446ff1f2083c38771ab068b
      Set input_shift properly · 913867b4
      Yaowu Xu
      Profile 0 now supports 10-bit input, so it no longer implies an
      input_shift of 0.
      Change-Id: Idae429b88ee5c073ee6e939a88d569c5ffde2b0d
      Simplify cos_bit setting in txfm · d4327bce
      Angie Chiang
      Move cos_bit from the txfm 1d cfg to the 2d cfg.
      Each txfm stage only uses one cos_bit.
      This is a lossless change and it speeds up the encoder by 2%.
      Change-Id: I45d398761e4729b8c4c37729571fe3765cb0c83f
      Cleanup redundant assertion · dc3d916b
      Frederic Barbier
      Change-Id: I6532e20c958d5bf6f6d73a6f076664e1b74ba055
      Skip RD search over lst 2/3 frame for non-nearest neighbor mvs · 8db5f17b
      Jingning Han
      Skip the rate distortion search over last 2/3 reference frames for
      the reference motion vectors derived from non-nearest neighbors.
      The overall coding performance change is within the noise range (0.05%
      better). The encoding process is sped up by 20%.
      Change-Id: I823b8ca2805ae332f4c9bc8ee255069a82db4331
      Use split and horz/vert to predict horzA/B/vertA/B · 6001fb05
      Zoe Liu
      In rd_pick_partition(), the first one or two blocks of the partition
      types HORZ_A, HORZ_B, VERT_A, and VERT_B may already have been evaluated
      during the evaluation of SPLIT, HORZ, and VERT. This patch saves the
      RD pick mode results and tries to reuse them to remove the duplicate
      RD mode evaluation operations.
      This patch should not incur any coding performance loss.
      Testing on a few lowres frames: when CFL is off, this patch obtains
      >10% encoder speedup.
      Change-Id: I932e233bc93873de62a88230254df44494236dde
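      The reuse idea can be sketched as a small cache of RD results keyed by block position and size. This is an assumed design for illustration only: RdCache, pick_mode_cached() and the other names are hypothetical, and the direct-mapped table is an arbitrary choice. A real encoder may also need to confirm that the surrounding coding context is unchanged before reusing a result, and this sketch does not check that.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RD_CACHE_SIZE 64 /* hypothetical capacity */

typedef struct {
  int valid;
  int mi_row, mi_col, bsize;
  int64_t rd_cost;
} RdCacheEntry;

typedef struct {
  RdCacheEntry entries[RD_CACHE_SIZE];
} RdCache;

typedef int64_t (*RdSearchFn)(int mi_row, int mi_col, int bsize, void *ctx);

static void rd_cache_reset(RdCache *c) { memset(c, 0, sizeof(*c)); }

/* Direct-mapped slot; a collision simply overwrites the older entry. */
static RdCacheEntry *rd_cache_slot(RdCache *c, int mi_row, int mi_col,
                                   int bsize) {
  const unsigned h =
      (unsigned)mi_row * 31u + (unsigned)mi_col * 17u + (unsigned)bsize;
  return &c->entries[h % RD_CACHE_SIZE];
}

/* Returns the cached RD cost for this block if present; otherwise runs the
 * (expensive) search and stores its result for later partitions. */
static int64_t pick_mode_cached(RdCache *c, int mi_row, int mi_col, int bsize,
                                RdSearchFn search, void *ctx) {
  RdCacheEntry *e = rd_cache_slot(c, mi_row, mi_col, bsize);
  if (e->valid && e->mi_row == mi_row && e->mi_col == mi_col &&
      e->bsize == bsize)
    return e->rd_cost;
  const int64_t rd = search(mi_row, mi_col, bsize, ctx);
  e->valid = 1;
  e->mi_row = mi_row;
  e->mi_col = mi_col;
  e->bsize = bsize;
  e->rd_cost = rd;
  return rd;
}

/* Example search that counts its invocations through ctx. */
static int64_t counting_search(int mi_row, int mi_col, int bsize, void *ctx) {
  int *calls = (int *)ctx;
  *calls += 1;
  return (int64_t)mi_row * 1000 + mi_col * 10 + bsize;
}
```

      In the scenario the message describes, the earlier SPLIT, HORZ and VERT searches would fill the cache, and the matching sub-blocks of HORZ_A/B and VERT_A/B would then be served from it instead of being searched again.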