1. 08 Jul, 2013 6 commits
    • Inline vp9_get_mv_joint(). · bd867f16
      Ronald S. Bultje authored
      Encode time for first 50 frames of bus (speed 0) @ 1500kbps goes from
      2min10.9 to 2min10.5, i.e. 0.3% faster overall, because inlining
      avoids the function call overhead.
      
      Change-Id: I1eab1a95dd3eae282f9b866f1f0b3dcadff073d5
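A minimal sketch of what an inlined joint classifier can look like. The type and enum names below (`mv_t`, `mv_joint_t`, `get_mv_joint`) are hypothetical stand-ins, not the codec's actual declarations; the point is only that a small `static inline` function in a header lets the compiler remove the call.

```c
#include <stdint.h>

/* Hypothetical stand-in for the codec's motion-vector type. */
typedef struct {
  int16_t row; /* vertical component */
  int16_t col; /* horizontal component */
} mv_t;

/* Hypothetical joint classes: which components are nonzero. */
typedef enum {
  JOINT_ZERO = 0,  /* both components zero */
  JOINT_HNZVZ = 1, /* horizontal nonzero, vertical zero */
  JOINT_HZVNZ = 2, /* horizontal zero, vertical nonzero */
  JOINT_HNZVNZ = 3 /* both components nonzero */
} mv_joint_t;

/* Defined static inline so each caller gets the body expanded in place
 * instead of paying for a function call. */
static inline mv_joint_t get_mv_joint(const mv_t *mv) {
  if (mv->row == 0)
    return mv->col == 0 ? JOINT_ZERO : JOINT_HNZVZ;
  return mv->col == 0 ? JOINT_HZVNZ : JOINT_HNZVNZ;
}
```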
    • Don't recalculate mv_ref costs for each block/partition. · 8fde07a3
      Ronald S. Bultje authored
      Changes cost_mv_ref() to do a table lookup (LUT) into pre-calculated
      cost arrays instead of recomputing the cost on each call. Encode time
      of first 50 frames of bus (speed 0) @ 1500kbps goes from 2min11.6 to
      2min10.9, i.e. 0.5% faster overall.
      
      Change-Id: If186e92c34c201b29cbbc058785a15c9c09e433a
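The idea of replacing a per-block cost computation with a lookup can be sketched as follows. Everything here is illustrative: the table dimensions, `compute_mode_cost()` and its formula, and the `cost_tables_t` struct are assumptions, not the encoder's real data structures.

```c
/* Hypothetical table dimensions. */
enum { NUM_CONTEXTS = 7, NUM_INTER_MODES = 4 };

/* Placeholder for the expensive computation that used to run on every
 * call (the formula is made up for illustration). */
static int compute_mode_cost(int ctx, int mode) {
  return 256 + 32 * ctx + 64 * mode;
}

typedef struct {
  int inter_mode_cost[NUM_CONTEXTS][NUM_INTER_MODES];
} cost_tables_t;

/* Fill the table once, e.g. whenever the probabilities change. */
static void init_cost_tables(cost_tables_t *t) {
  for (int ctx = 0; ctx < NUM_CONTEXTS; ctx++)
    for (int mode = 0; mode < NUM_INTER_MODES; mode++)
      t->inter_mode_cost[ctx][mode] = compute_mode_cost(ctx, mode);
}

/* The per-block query is now a single array read. */
static inline int cost_mode_lookup(const cost_tables_t *t, int ctx, int mode) {
  return t->inter_mode_cost[ctx][mode];
}
```

The trade-off is a small amount of memory and one up-front fill in exchange for removing repeated work from the inner mode-search loop.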
    • Remove unnecessary memset(best_index, 0) from trellis/optimize. · 5a732549
      Ronald S. Bultje authored
      First 50 frames of bus @ 1500kbps (speed 0) goes from 2min12.6 to
      2min11.6, i.e. 0.75% overall speedup.
      
      Change-Id: I67054f8146e82a02b6457c51a1c8627a937e5e1e
    • Remove memcpy() in handle_inter_mode() filter selection. · fcf7998a
      Ronald S. Bultje authored
      Encode time of first 50 frames of bus (speed 0) @ 1500kbps goes from
      2min4.9 to 2min3.1, i.e. a 1.4% speedup overall.
      
      Change-Id: Ibe8b08d159797504c5d0c5122de1b6da3b6595e0
    • Make frame-wide filter-type decision fully RD-based. · ed995afb
      Ronald S. Bultje authored
      Overall, on all test sets, this gains about +0.2% on all metrics.
      City is a clip where this really hurts (-1.0% on all metrics); I'm
      not quite sure why yet. It may be interesting to look into in the future.
      
      Change-Id: I6f0eecb20e72f0194633270d30bf00d76d9eae78
    • Implements several heuristics to prune mode search · d9b62160
      Deb Mukherjee authored
      Skips mode searches for intra and compound inter modes depending
      on the best mode so far and the reference frames. The various
      heuristics to be used are selected by bits from a flag. The
      previous direction based intra mode search pruning is also absorbed
      in this framework.
      
      Specifically the flags and their impact are:
      
      1) FLAG_SKIP_INTRA_BESTINTER (skip intra mode search for oblique
      directional modes and TM_PRED if the best so far is
      an inter mode)
      derfraw300: -0.15%, 10% speedup
      
      2) FLAG_SKIP_INTRA_DIRMISMATCH (skip D27, D63, D117 and D153
      mode search if the best so far is not one of the closest
      hor/vert/diagonal directions)
      derfraw300: -0.05%, about 9% speedup
      
      3) FLAG_SKIP_COMP_BESTINTRA (skip compound prediction mode
      search if the best so far is an intra mode)
      derfraw300: -0.06%, about 7-8% speedup
      
      4) FLAG_SKIP_COMP_REFMISMATCH (skip compound prediction search
      if the best single ref inter mode does not have the same ref
      as one of the two references being tested in the compound mode)
      derfraw300: -0.56%, about 10% speedup
      
      Change-Id: I1a736cd29b36325489e7af9f32698d6394b2c495
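The flag-driven pruning described above can be sketched as a bitmask test. The flag names come from the commit message, but the bit positions, the function `skip_compound_search()`, and its parameters are assumptions made for illustration (in particular, a negative `best_single_ref` is used here to mean "no single-reference inter mode found yet").

```c
/* Flag names from the commit message; bit positions are assumed. */
enum {
  FLAG_SKIP_INTRA_BESTINTER = 1 << 0,
  FLAG_SKIP_INTRA_DIRMISMATCH = 1 << 1,
  FLAG_SKIP_COMP_BESTINTRA = 1 << 2,
  FLAG_SKIP_COMP_REFMISMATCH = 1 << 3
};

/* Hypothetical helper: returns 1 if a compound mode using references
 * ref0/ref1 should be skipped given the best result so far. */
static int skip_compound_search(unsigned flags, int best_is_intra,
                                int best_single_ref, int ref0, int ref1) {
  /* Heuristic 3: best mode so far is intra. */
  if ((flags & FLAG_SKIP_COMP_BESTINTRA) && best_is_intra)
    return 1;
  /* Heuristic 4: best single-reference inter mode shares neither
   * reference with the compound candidate. */
  if ((flags & FLAG_SKIP_COMP_REFMISMATCH) && best_single_ref >= 0 &&
      best_single_ref != ref0 && best_single_ref != ref1)
    return 1;
  return 0;
}
```

Because each heuristic is a separate bit, a speed setting can enable any combination of them by OR-ing flags together.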
  2. 05 Jul, 2013 1 commit
  3. 04 Jul, 2013 1 commit
  4. 03 Jul, 2013 17 commits
  5. 02 Jul, 2013 15 commits
    • Removing redundant struct from union b_mode_info. · be77f6bb
      Dmitry Kovalev authored
      Change-Id: I08fc6e474ff2c12cfa065bae4989c724276e2c83
    • Adding write_selected_txfm_size function. · edb060a7
      Dmitry Kovalev authored
      Change-Id: I143b430b7c24a964ccd0ebb75944cf317a072214
    • Added a speed feature use_square_partition_only · 0d7b7c09
      Yaowu Xu authored
      This commit adds a speed feature where only square partitions are
      evaluated in partition picking. Enabling this feature in cpu-used 2
      reduces encoding time by ~30%.
      
      loss of compression:
      -0.9% on cif set
      -1.23% on stdhd
      
      Change-Id: Ia6fad11210f0b78365abb889f9245604513be5b9
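A rough sketch of how such a speed feature might gate the partition search. The enum and the helper `partition_allowed()` are hypothetical; only the feature name `use_square_partition_only` comes from the commit.

```c
/* Hypothetical partition types: NONE and SPLIT keep blocks square,
 * HORZ and VERT produce rectangular halves. */
typedef enum {
  PART_NONE,
  PART_HORZ,
  PART_VERT,
  PART_SPLIT
} partition_type_t;

/* Returns 1 if the partition type should be evaluated. With the speed
 * feature on, rectangular partitions are never tried. */
static int partition_allowed(partition_type_t p,
                             int use_square_partition_only) {
  if (!use_square_partition_only)
    return 1;
  return p == PART_NONE || p == PART_SPLIT;
}
```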
    • Use pmovmskb to skip quantize loops over empty coefficients. · e5fb4b61
      Ronald S. Bultje authored
      If none of the 16 coefficients that we quantize per loop iteration
      are larger than the zbin, directly skip to the next round of coeffs,
      rather than doing a full quantize loop that will eventually result
      in 16 zeroes. This incurs a jump cost, but saves a lot of other work.
      32x32 quant goes from 1349 -> 1184 cycles. The same approach yielded
      no significantly positive results for smaller transforms, so is not
      used there (8x8: 103 -> 101 cycles; 16x16: 302 -> 306 cycles).
      
      Change-Id: I8fca17dc2543fc8eed1dbcd5100145e3c3a9b647
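The early-out can be sketched in C with SSE2 intrinsics, where `_mm_movemask_epi8` is the intrinsic that maps to `pmovmskb`. This is not the commit's assembly: the function names, the division-based quantizer, and the scalar fallback are assumptions for illustration, and coefficients are assumed to be 16-bit values other than -32768.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Returns nonzero if any of 16 coefficients has magnitude above zbin. */
static int any_above_zbin16(const int16_t *coeff, int16_t zbin) {
#if defined(__SSE2__)
  const __m128i z = _mm_set1_epi16(zbin);
  __m128i a = _mm_loadu_si128((const __m128i *)coeff);
  __m128i b = _mm_loadu_si128((const __m128i *)(coeff + 8));
  /* abs(x) = (x ^ sign) - sign, with sign = x >> 15 (SSE2 has no pabsw). */
  const __m128i sa = _mm_srai_epi16(a, 15);
  const __m128i sb = _mm_srai_epi16(b, 15);
  a = _mm_sub_epi16(_mm_xor_si128(a, sa), sa);
  b = _mm_sub_epi16(_mm_xor_si128(b, sb), sb);
  /* One movemask over the OR of both comparisons answers "any lane set?". */
  const __m128i gt = _mm_or_si128(_mm_cmpgt_epi16(a, z), _mm_cmpgt_epi16(b, z));
  return _mm_movemask_epi8(gt) != 0;
#else
  for (int i = 0; i < 16; i++)
    if (abs(coeff[i]) > zbin) return 1;
  return 0;
#endif
}

/* Quantizes n coefficients (n a multiple of 16) with a placeholder
 * quantizer and returns the number of nonzero outputs. */
static int quantize_blocks(const int16_t *coeff, int16_t *qcoeff, int n,
                           int16_t zbin, int16_t step) {
  int nonzero = 0;
  for (int i = 0; i < n; i += 16) {
    if (!any_above_zbin16(coeff + i, zbin)) {
      /* Whole group would quantize to zero: skip the full loop. */
      memset(qcoeff + i, 0, 16 * sizeof(*qcoeff));
      continue;
    }
    for (int j = 0; j < 16; j++) {
      const int v = coeff[i + j];
      const int mag = abs(v);
      const int q = mag > zbin ? mag / step : 0;
      qcoeff[i + j] = (int16_t)(v < 0 ? -q : q);
      nonzero += q != 0;
    }
  }
  return nonzero;
}
```

The branch only pays off when all-zero groups are common, which fits the commit's observation that it helped 32x32 but not the smaller transforms.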
    • Remove unused function vp9_build_inter4x4_predictors_mbuv(). · 5b872402
      Ronald S. Bultje authored
      Change-Id: Ibfd2def2c088f4bc541a1de25990d73480b53d4b
    • new unit test for cpu-speed · b0520b61
      Jim Bankoski authored
      Tests q0 (lossless), very high bitrate, and low bitrates at cpu speed
      0, 1 and 2.
      
      Change-Id: I0c5cdca00acd8d01e7b13f124b3b08d4b1ae9f6d
    • Speed feature to binary search dir intramodes · 37501d68
      Deb Mukherjee authored
      This speed feature will skip searching the directional intra prediction
      modes D63, D117, D27, D153 if the best intra mode so far is not one of
      the diagonal, horizontal or vertical directions closest to the respective
      directions being tested. In other words, this implements a sort of
      binary search in the angular domain.
      
      Speedup: about 9-10%
      Results: -0.05% only on derfraw300.
      
      Change-Id: I413584c41f2a3e8dabfbdeb40718c8fc4b1d63a2
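One way to picture the angular "binary search" is the sketch below. The enum and `skip_oblique_intra()` are hypothetical, and the pairing of each oblique mode with its two nearest principal directions is an assumption read off the nominal angles in the mode names.

```c
/* Hypothetical intra mode identifiers. */
typedef enum {
  MODE_DC,
  MODE_V,    /* vertical */
  MODE_H,    /* horizontal */
  MODE_D45,  /* diagonal */
  MODE_D135, /* diagonal */
  MODE_D27,
  MODE_D63,
  MODE_D117,
  MODE_D153
} intra_mode_t;

/* Returns 1 if the oblique candidate can be skipped because the best
 * intra mode so far is not one of its two angular neighbors. Modes that
 * are not oblique are never skipped by this check. */
static int skip_oblique_intra(intra_mode_t candidate, intra_mode_t best) {
  switch (candidate) {
    case MODE_D27:  return best != MODE_H && best != MODE_D45;
    case MODE_D63:  return best != MODE_V && best != MODE_D45;
    case MODE_D117: return best != MODE_V && best != MODE_D135;
    case MODE_D153: return best != MODE_H && best != MODE_D135;
    default:        return 0;
  }
}
```

This relies on the principal directions being searched before the oblique ones, so that `best` already reflects the coarse angular result when the finer modes are considered.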
    • Deb Mukherjee's avatar
      66324d50
    • Tx size selection enhancements · 8d3d2b76
      Deb Mukherjee authored
      (1) Refines the modeling function and uses that to add some speed
      features. Specifically, instead of using a flag use_largest_txfm as
      a speed feature, an enum tx_size_search_method is used, of which
      two of the types are USE_FULL_RD and USE_LARGESTALL. Two other
      new types are added:
      USE_LARGESTINTRA (use largest only for intra)
      USE_LARGESTINTRA_MODELINTER (use largest for intra, and model for
      inter)
      
      (2) Another change is that the framework for deciding transform type
      is simplified to use a heuristic count based method rather than
      an rd based method using txfm_cache. In practice the new method
      is found to work just as well, with derf down by only 0.01.
      The new method is more compatible with the new framework where
      certain rd costs are based on full rd and certain others are
      based on modeled rd or are not computed. In this patch the existing
      rd based method is still kept for use in the USE_FULL_RD mode.
      In the other modes, the count based method is used.
      However, the recommendation is to remove the rd based method
      eventually, since its benefit is limited and removing it will
      eliminate a lot of complications in the code.
      
      (3) Finally, a bug is fixed in the existing use_largest_txfm speed feature
      that causes mismatches when the lossless mode and 4x4 WH transform are
      forced.
      
      Results on derf:
      USE_FULL_RD: +0.03% (due to change in the tables), 0% encode time reduction
      USE_LARGESTINTRA: -0.21%, 15% encode time reduction (this one is a
      pretty good compromise)
      USE_LARGESTINTRA_MODELINTER: -0.98%, 22% encode time reduction
      (currently the benefit of modeling is limited for txfm size selection,
      but keeping this enum as a placeholder).
      USE_LARGESTALL: -1.05%, 27% encode-time reduction (same as existing
      use_largest_txfm speed feature).
      
      Change-Id: I4d60a5f9ce78fbc90cddf2f97ed91d8bc0d4f936
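The move from a boolean flag to an enum can be sketched as follows. The four enumerator names come from the commit message; their order, the `tx_search_action_t` type, and `pick_tx_search()` are assumptions. In particular, treating inter blocks under USE_LARGESTINTRA as full RD is an inference from "use largest only for intra".

```c
/* Enumerator names from the commit message; ordering is assumed. */
typedef enum {
  USE_FULL_RD = 0,
  USE_LARGESTINTRA,
  USE_LARGESTINTRA_MODELINTER,
  USE_LARGESTALL
} tx_size_search_method;

/* Hypothetical: what the encoder does for one block. */
typedef enum {
  TX_ACTION_FULL_RD, /* try every transform size with full RD */
  TX_ACTION_MODEL,   /* estimate RD with the model */
  TX_ACTION_LARGEST  /* take the largest allowed size directly */
} tx_search_action_t;

static tx_search_action_t pick_tx_search(tx_size_search_method method,
                                         int is_inter) {
  switch (method) {
    case USE_FULL_RD:
      return TX_ACTION_FULL_RD;
    case USE_LARGESTINTRA:
      return is_inter ? TX_ACTION_FULL_RD : TX_ACTION_LARGEST;
    case USE_LARGESTINTRA_MODELINTER:
      return is_inter ? TX_ACTION_MODEL : TX_ACTION_LARGEST;
    case USE_LARGESTALL:
      return TX_ACTION_LARGEST;
  }
  return TX_ACTION_FULL_RD;
}
```

An enum leaves room for intermediate speed/quality points, which a single on/off flag such as use_largest_txfm could not express.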
    • Clean-up in forward update to use mapping tables · 9c20cedd
      Deb Mukherjee authored
      Uses mapping tables instead of complicated modulo/division
      operations for prob mapping for forward updates.
      
      No bit-stream or output change.
      
      Change-Id: Ifd9ce8ac1437835c305c94f64c18273c7a68f546
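A generic sketch of swapping modulo/division arithmetic for a mapping table. The mapping itself is invented for illustration (multiples of 16 are ranked first, everything else after) and is not the codec's actual probability-update mapping; `slow_map()`, `init_map_table()` and `fast_map()` are hypothetical names.

```c
#include <stdint.h>

enum { MAP_SIZE = 256, COARSE_STEP = 16 };

/* Arithmetic version: multiples of COARSE_STEP get indices 0..15,
 * the remaining values follow in increasing order. */
static int slow_map(int v) {
  if (v % COARSE_STEP == 0)
    return v / COARSE_STEP;
  return (MAP_SIZE / COARSE_STEP) + v - v / COARSE_STEP - 1;
}

static uint8_t map_table[MAP_SIZE];

/* Build the table once; in a codec this would typically be a
 * precomputed static const array. */
static void init_map_table(void) {
  for (int v = 0; v < MAP_SIZE; v++)
    map_table[v] = (uint8_t)slow_map(v);
}

/* Table version: same result, no division or modulo at lookup time. */
static inline int fast_map(int v) {
  return map_table[v];
}
```

Since the table is derived from the same function, the output is unchanged, which matches the commit's note that there is no bit-stream or output change.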
    • Dmitry Kovalev's avatar
      904070ca
    • Ronald S. Bultje's avatar
      3cc6eb7c
    • Dmitry Kovalev's avatar
    • Dmitry Kovalev's avatar
      18fd4360
    • Removing unused implicit segmentation code. · a3d2e6c9
      Dmitry Kovalev authored
      Change-Id: I8a2983fb14274a6ac53681fa4cd5d4209cbd2905