1. 11 Jul, 2013 2 commits
  2. 10 Jul, 2013 7 commits
    • Fix tx_type bug in intra4x4 rd loop · 18803f9c
      Jingning Han authored
      This commit fixes the misuse of tx_type for the inverse transform
      in the intra4x4 rate-distortion optimization loop. It improves the
      overall coding performance.
      
      Change-Id: I7fe9953175b74890357dbcee33c138573766e980
    • remove warnings when NDEBUG is set · 6591cf2f
      Jim Bankoski authored
      Change-Id: Ie0cb732fdcb98616a422c4463bff80642248d136
    • Prunes out full-rd computation based on modeled rd · 53ff43ad
      Deb Mukherjee authored
      Adds a speed feature to eliminate full-rd computation if the modeled
      rd or rd based on a different parameter in the same mode is already
      a lot larger than the best rd yet.
      
      Specifically, only search the sharp and smooth filters if the modeled
      rd cost based on the regular filter is within a certain factor of the
      best rd cost so far. Also, skip full-rd computation of non-splitmv
      inter modes if the modeled rd cost based on pred error is not within
      the same factor of the best rd cost so far.
      
      Also adds some enhancements in the rd search for splitmv mode to
      speed things up by early breakouts. Negligible impact on performance.
      
      Results on derfraw300:
      psnr:    -0.013% with the splitmv enhancements, -0.24% with the rd
               breakout feature on.
      speedup: 6% with splitmv enhancements, 20% with also residual breakout
               (tested on football sequence at 600 Kbps)
      
      Change-Id: I37abc308ea9f110c1679ce649b6a7e73ab1ad5fc
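
The breakout test described in this commit message can be sketched as a single comparison. This is a minimal illustration under stated assumptions, not the libvpx code: the name `within_rd_factor` and the numerator/denominator form of the factor are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 when the modeled rd cost is within
 * factor_num/factor_den of the best rd cost so far, i.e. when the
 * full-rd computation is still worth running. factor_num and
 * factor_den are assumed to be positive. */
int within_rd_factor(int64_t modeled_rd, int64_t best_rd,
                     int factor_num, int factor_den) {
  if (best_rd == INT64_MAX) return 1; /* no best cost recorded yet */
  /* Divide rather than multiply so large costs cannot overflow;
   * integer division makes the boundary slightly approximate. */
  return modeled_rd / factor_num <= best_rd / factor_den;
}
```

A caller would run the full-rd path only when this returns 1, for example with a factor of 3/2.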
    • Remove memcpy() in handle_inter_mode() filter selection. · b1df674a
      Ronald S. Bultje authored
      Encode time of first 50 frames of bus (speed 0) @ 1500kbps goes from
      2min4.9 to 2min3.1, i.e. a 1.4% speedup overall.
      
      Change-Id: I9b25e87974430cb942caa276410bb2eda815bd83
    • Add a feature to reduce chroma intra mode search · bed27a96
      Yaowu Xu authored
      Change-Id: I721ebdeef2b53ce3e5c3eba3f7462ae2103c95a8
    • removing case statements around prediction entropy coding · fb027a76
      Jim Bankoski authored
      Removes SEG_ID
      Removes MBSKIP
      Removes SWITCHABLE_INTERP
      Removes INTRA_INTER
      Removes COMP_INTER_INTER
      Removes COMP_REF_P
      Removes SINGLE_REF_P1
      Removes SINGLE_REF_P2
      Removes TX_SIZE
      
      Change-Id: Ie4520ae1f65c8cac312432c0616cc80dea5bf34b
    • Revert "Remove memcpy() in handle_inter_mode() filter selection." · 205efbc1
      Yaowu Xu authored
      This reverts commit fcf7998a.
      
      Change-Id: Ic6532223faec9f1483b78adb2e37b79c7b1a0efb
  3. 09 Jul, 2013 1 commit
  4. 08 Jul, 2013 4 commits
    • Don't recalculate mv_ref costs for each block/partition. · 8fde07a3
      Ronald S. Bultje authored
      Changes cost_mv_ref() into doing a LUT into pre-calculated cost
      arrays instead. Encode time of first 50 frames of bus (speed 0)
      @ 1500kbps goes from 2min11.6 to 2min10.9, i.e. 0.5% faster overall.
      
      Change-Id: If186e92c34c201b29cbbc058785a15c9c09e433a
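
The idea of replacing a per-call cost computation with a table lookup can be sketched as follows. The table dimensions, the stand-in cost function `compute_mode_cost`, and the name `cost_mv_ref_lut` are all hypothetical; the real cost derivation from mode probabilities is not shown in this log.

```c
enum { NUM_CONTEXTS = 7, NUM_MODES = 4 }; /* illustrative sizes only */

static int mode_cost_lut[NUM_CONTEXTS][NUM_MODES];

/* Stand-in for the per-call cost computation that the table replaces. */
int compute_mode_cost(int ctx, int mode) { return 100 + 10 * ctx + mode; }

/* Fill the table once, before the per-block mode search starts. */
void init_mode_costs(void) {
  for (int ctx = 0; ctx < NUM_CONTEXTS; ++ctx)
    for (int mode = 0; mode < NUM_MODES; ++mode)
      mode_cost_lut[ctx][mode] = compute_mode_cost(ctx, mode);
}

/* The per-block query is now a plain array read. */
int cost_mv_ref_lut(int ctx, int mode) { return mode_cost_lut[ctx][mode]; }
```

The trade-off is a small amount of memory and one up-front pass in exchange for removing repeated work from the inner search loop.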
    • Remove memcpy() in handle_inter_mode() filter selection. · fcf7998a
      Ronald S. Bultje authored
      Encode time of first 50 frames of bus (speed 0) @ 1500kbps goes from
      2min4.9 to 2min3.1, i.e. a 1.4% speedup overall.
      
      Change-Id: Ibe8b08d159797504c5d0c5122de1b6da3b6595e0
    • Make frame-wide filter-type decision fully RD-based. · ed995afb
      Ronald S. Bultje authored
      Overall, on all test sets, this gains about +0.2% on all metrics.
      City is a clip where this really hurts (-1.0% on all metrics), I'm
      not quite sure why yet. Maybe interesting to look into in the future.
      
      Change-Id: I6f0eecb20e72f0194633270d30bf00d76d9eae78
    • Implements several heuristics to prune mode search · d9b62160
      Deb Mukherjee authored
      Skips mode searches for intra and compound inter modes depending
      on the best mode so far and the reference frames. The various
      heuristics to be used are selected by bits from a flag. The
      previous direction based intra mode search pruning is also absorbed
      in this framework.
      
      Specifically the flags and their impact are:
      
      1) FLAG_SKIP_INTRA_BESTINTER (skip intra mode search for oblique
      directional modes and TM_PRED if the best so far is
      an inter mode)
      derfraw300: -0.15%, 10% speedup
      
      2) FLAG_SKIP_INTRA_DIRMISMATCH (skip D27, D63, D117 and D153
      mode search if the best so far is not one of the closest
      hor/vert/diagonal directions)
      derfraw300: -0.05%, about 9% speedup
      
      3) FLAG_SKIP_COMP_BESTINTRA (skip compound prediction mode
      search if the best so far is an intra mode)
      derfraw300: -0.06%, about 7-8% speedup
      
      4) FLAG_SKIP_COMP_REFMISMATCH (skip compound prediction search
      if the best single ref inter mode does not have the same ref
      as one of the two references being tested in the compound mode)
      derfraw300: -0.56%, about 10% speedup
      
      Change-Id: I1a736cd29b36325489e7af9f32698d6394b2c495
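
Selecting heuristics by bits from a flag might look like the sketch below. The flag names come from the commit message above; the bit positions and the two helper functions are assumptions made for illustration, and only two of the four heuristics are shown.

```c
/* Flag names are from the commit message; bit values are assumed. */
enum {
  FLAG_SKIP_INTRA_BESTINTER   = 1 << 0,
  FLAG_SKIP_INTRA_DIRMISMATCH = 1 << 1,
  FLAG_SKIP_COMP_BESTINTRA    = 1 << 2,
  FLAG_SKIP_COMP_REFMISMATCH  = 1 << 3
};

/* Hypothetical helper: skip the compound prediction search when the
 * heuristic is enabled and the best mode so far is an intra mode. */
int skip_comp_search(unsigned flags, int best_is_intra) {
  return (flags & FLAG_SKIP_COMP_BESTINTRA) && best_is_intra;
}

/* Hypothetical helper: skip oblique directional modes and TM_PRED when
 * the heuristic is enabled and the best mode so far is an inter mode. */
int skip_oblique_intra_search(unsigned flags, int best_is_inter) {
  return (flags & FLAG_SKIP_INTRA_BESTINTER) && best_is_inter;
}
```

Because each heuristic is one bit, a speed setting can enable any combination by OR-ing the flags together.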
  5. 04 Jul, 2013 1 commit
  6. 03 Jul, 2013 3 commits
    • Enable early termination in rd search · 2bd6fe08
      Jingning Han authored
      This commit allows the encoder to track the cumulative rate-distortion
      cost per transformed block inside a partition. If the cumulative
      rd cost is already above the best rd value, it skips the remaining
      operations and continues to the next prediction mode test.
      
      It reduces the runtime of bus at a target bit-rate of 2000 from 308 seconds
      to 266 seconds, i.e., about a 13% speed-up at no performance penalty.
      
      Change-Id: I5f15a3d8955d97031d5653006027866a00654e7a
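
The early-termination loop described in this commit message can be sketched as below. The function name, the array of per-block costs, and the use of `INT64_MAX` as a "mode rejected" marker are assumptions for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: sum per-block rd costs and stop as soon as the
 * running total exceeds the best rd cost found so far. Returns the
 * total, or INT64_MAX if the mode was abandoned early. *blocks_done
 * reports how many blocks were actually evaluated. */
int64_t accumulate_rd(const int64_t *block_rd, size_t num_blocks,
                      int64_t best_rd, size_t *blocks_done) {
  int64_t total = 0;
  size_t i;
  for (i = 0; i < num_blocks; ++i) {
    total += block_rd[i];
    if (total > best_rd) { /* cannot beat the current best: bail out */
      *blocks_done = i + 1;
      return INT64_MAX;
    }
  }
  *blocks_done = num_blocks;
  return total;
}
```

Since the running total only grows, stopping once it exceeds the best value cannot change which mode wins, which is consistent with the "no performance penalty" reported above.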
    • Fix to comp_inter_joint_search_thresh feature. · f58b44ad
      Paul Wilkins authored
      When this is 0 (BLOCK_SIZE_AB4X4) we want to do
      the inter joint search for all sizes.
      
      Change-Id: Id40cd6fe7790e7e1165352b9cef5e12fa8c0bc88
    • Added two new skip experiments. · 72c5778e
      Paul Wilkins authored
      sf->unused_mode_skip_lvl. Tests modes as normal for all
      sizes at or below the given level. At larger sizes it skips
      all modes that were not chosen at any smaller size.
      Hence setting BLOCK_SIZE_SB64X64 is in effect off.
      Setting BLOCK_SIZE_AB4X4 will only consider modes that
      were chosen for one or more 4x4 blocks at larger sizes.
      
      sf->reference_masking.
      Do a test encode of the NONE partition at one size and create
      a reference frame mask based on the best rd choice. In the
      full search only allow this reference frame.
      Currently it is testing 64x64 and repeats this in the full search.
      This does not work well with Jim's Partition code just now and
      is disabled by default.
      
      Change-Id: I8f8c52d2ef4a0c08100150b0ea4155d1aaab93dd
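
One way to picture the `sf->unused_mode_skip_lvl` rule is a bitmask of modes chosen at smaller sizes. The sketch below is an assumption-laden illustration: the integer block-size ordering (larger value means larger block), the bitmask representation, and the function name are all hypothetical.

```c
/* Hypothetical sketch of the unused-mode skip rule.
 * block_size and skip_lvl use an assumed ordering where a larger
 * value means a larger block. chosen_mode_mask has bit m set if
 * mode m was chosen for at least one smaller block. */
int skip_unused_mode(int block_size, int skip_lvl,
                     unsigned chosen_mode_mask, int mode) {
  if (block_size <= skip_lvl) return 0;      /* test modes as normal */
  return !(chosen_mode_mask & (1u << mode)); /* skip modes never chosen */
}
```

With the level set to the largest block size the first branch always applies, which matches the note above that this setting is in effect off.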
  7. 02 Jul, 2013 5 commits
    • Removing redundant struct from union b_mode_info. · be77f6bb
      Dmitry Kovalev authored
      Change-Id: I08fc6e474ff2c12cfa065bae4989c724276e2c83
    • Speed feature to binary search dir intramodes · 37501d68
      Deb Mukherjee authored
      This speed feature will skip searching the directional intra prediction
      modes D63, D117, D27, D153 if the best intra mode so far is not one of
      the diagonal, horizontal or vertical directions closest to the respective
      directions being tested. In other words, this implements a sort of
      binary search in the angular domain.
      
      Speedup: about 9-10%
      Results: -0.05% only on derfraw300.
      
      Change-Id: I413584c41f2a3e8dabfbdeb40718c8fc4b1d63a2
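
The angular pruning rule can be sketched as a lookup of each oblique mode's two nearest principal directions. The enum and function name are hypothetical, and the neighbour pairs are inferred from the nominal angles in the mode names (treating horizontal as 0/180 degrees and vertical as 90 degrees), which is an assumption rather than something stated in the commit.

```c
typedef enum {
  V_PRED, H_PRED, D45_PRED, D135_PRED,
  D117_PRED, D153_PRED, D27_PRED, D63_PRED, OTHER_PRED
} intra_mode; /* hypothetical enum; the order is arbitrary */

/* Returns 1 if the search of `mode` should be skipped because the best
 * intra mode so far is not one of its two angular neighbours. Modes
 * other than D27, D63, D117 and D153 are never skipped here. */
int skip_directional_mode(intra_mode mode, intra_mode best) {
  switch (mode) {
    case D27_PRED:  return best != H_PRED && best != D45_PRED;
    case D63_PRED:  return best != D45_PRED && best != V_PRED;
    case D117_PRED: return best != V_PRED && best != D135_PRED;
    case D153_PRED: return best != D135_PRED && best != H_PRED;
    default:        return 0;
  }
}
```

The principal directions are always searched first, so each oblique mode is only tried when the coarse search already points toward its part of the angular range.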
    • Tx size selection enhancements · 8d3d2b76
      Deb Mukherjee authored
      (1) Refines the modeling function and uses that to add some speed
      features. Specifically, instead of using a flag use_largest_txfm as
      a speed feature, an enum tx_size_search_method is used, of which
      two of the types are USE_FULL_RD and USE_LARGESTALL. Two other
      new types are added:
      USE_LARGESTINTRA (use largest only for intra)
      USE_LARGESTINTRA_MODELINTER (use largest for intra, and model for
      inter)
      
      (2) Another change is that the framework for deciding transform type
      is simplified to use a heuristic count based method rather than
      an rd based method using txfm_cache. In practice the new method
      is found to work just as well - with derf only -0.01 down.
      The new method is more compatible with the new framework where
      certain rd costs are based on full rd and certain others are
      based on modeled rd or are not computed. In this patch the existing
      rd based method is still kept for use in the USE_FULL_RD mode.
      In the other modes, the count based method is used.
      However, the recommendation is to remove it eventually, since its
      benefit is limited and removing it would eliminate a lot of
      complications in the code.
      
      (3) Finally, a bug is fixed in the existing use_largest_txfm speed feature
      that caused mismatches when the lossless mode and 4x4 WH transform are
      forced.
      
      Results on derf:
      USE_FULL_RD: +0.03% (due to change in the tables), 0% encode time reduction
      USE_LARGESTINTRA: -0.21%, 15% encode time reduction (this one is a
      pretty good compromise)
      USE_LARGESTINTRA_MODELINTER: -0.98%, 22% encode time reduction
      (currently the benefit of modeling is limited for txfm size selection,
      but keeping this enum as a placeholder).
      USE_LARGESTALL: -1.05%, 27% encode-time reduction (same as existing
      use_largest_txfm speed feature).
      
      Change-Id: I4d60a5f9ce78fbc90cddf2f97ed91d8bc0d4f936
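
How the four search methods map to a per-block decision can be sketched as below. The enumerator names come from the commit message; their order, the `tx_decision` type, and the function `pick_tx_decision` are hypothetical, and the choice of full rd for inter blocks under USE_LARGESTINTRA is an assumption read from "use largest only for intra".

```c
/* Enumerator names are from the commit message; the order is assumed. */
typedef enum {
  USE_FULL_RD,
  USE_LARGESTINTRA,
  USE_LARGESTINTRA_MODELINTER,
  USE_LARGESTALL
} tx_size_search_method;

/* Hypothetical result type: how the transform size gets decided. */
typedef enum { TX_BY_FULL_RD, TX_BY_MODEL, TX_LARGEST } tx_decision;

tx_decision pick_tx_decision(tx_size_search_method method, int is_intra) {
  switch (method) {
    case USE_LARGESTALL:
      return TX_LARGEST;
    case USE_LARGESTINTRA:
      return is_intra ? TX_LARGEST : TX_BY_FULL_RD;
    case USE_LARGESTINTRA_MODELINTER:
      return is_intra ? TX_LARGEST : TX_BY_MODEL;
    case USE_FULL_RD:
    default:
      return TX_BY_FULL_RD;
  }
}
```

An enum makes room for intermediate trade-offs that a single on/off flag such as use_largest_txfm could not express.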
    • Calculate rd cost per transformed block · b91a1586
      Jingning Han authored
      Compute the rate-distortion cost per transformed block, and accumulate
      the cost through all blocks inside a partition. This allows the encoder
      to detect if the cumulative rd cost is already above the best rd cost,
      thereby enabling early termination in the rate-distortion optimization
      search.
      
      Change-Id: I0a856367a9a7b6dd0b466e7b767f54d5018d09ac
    • Revert "New motion threshold factor - speed feature." · b7cd01ed
      Paul Wilkins authored
      This reverts commit 13772781.
      Also fixes a spelling mistake.
      
      Change-Id: I5be8aa4d8d3c0323d4a6f41968a7b2c048949c3f
  8. 01 Jul, 2013 3 commits
    • Make get_coef_context() branchless. · 26b6318d
      Ronald S. Bultje authored
      This should significantly speedup cost_coeffs(). Basically what the
      patch does is to make the neighbour arrays padded by one item to
      prevent an eob check in get_coef_context(), then it populates each
      col/row scan and left/top edge coefficient with two times the same
      neighbour - this prevents a single/double context branch in
      get_coef_context(). Lastly, it populates neighbour arrays in pixel
      order (rather than scan order), so we don't have to dereference the
      scantable to get the correct neighbours.
      
      Total encoding time of first 50 frames of bus (speed 0) at 1500kbps
      goes from 2min10.1 to 2min5.3, i.e. a 2.6% overall speed increase.
      
      Change-Id: I42bcd2210fd7bec03767ef0e2945a665b851df56
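
The effect of storing every position's neighbours as a pair, with edge positions holding the same neighbour twice, can be sketched as follows. The rounded-average formula, the `MAX_NEIGHBORS` constant, and the argument layout are assumptions made for illustration; the commit message only describes the duplication trick.

```c
#include <stdint.h>

enum { MAX_NEIGHBORS = 2 }; /* assumed: two neighbour slots per position */

/* Sketch of a branch-free context lookup. neighbors holds two indices
 * per coefficient position; positions with a single real neighbour
 * store that index twice, so the rounded average below yields that
 * neighbour's value without a single/double branch. */
int get_coef_context(const int16_t *neighbors, const uint8_t *token_cache,
                     int c) {
  return (1 + token_cache[neighbors[MAX_NEIGHBORS * c + 0]] +
          token_cache[neighbors[MAX_NEIGHBORS * c + 1]]) >> 1;
}
```

With a duplicated neighbour of value v the expression evaluates to (1 + 2v) >> 1, which is v, so the edge case needs no special handling.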
    • Quantize (64-bit only, for now) SSSE3 SIMD. · 7353ceab
      Ronald S. Bultje authored
      Total encoding time for first 50 frames of bus (speed 0) @ 1500kbps
      goes from 2min34.8 to 2min14.4, i.e. a 10.4% overall speedup. The code is
      x86-64 only, it needs some minor modifications to be 32bit compatible,
      because it uses 15 xmm registers, whereas 32bit only has 8.
      
      Change-Id: I2df53770c2e850813ffa713e1a91b45b0082b904
    • New motion threshold factor - speed feature. · 13772781
      Paul Wilkins authored
      Added a speed feature that focuses only on thresholds
      for new motion modes.
      
      Moved sf->comp_inter_joint_search_thresh into speed
      1.  This has ~+0.4% impact on quality at speed 0 as
      our quality reference baseline.
      
      Slight adjustment to baseline thresholds.
      
      Change-Id: I7ebf104f1fe29af77ed4837b2e84be065621bbe5
  9. 29 Jun, 2013 1 commit
  10. 28 Jun, 2013 5 commits
    • Inline vp9_get_coef_context() (and remove vp9_ prefix). · d00b8e5f
      Ronald S. Bultje authored
      Makes cost_coeffs() a lot faster:
      4x4: 236 -> 181 cycles
      8x8: 888 -> 588 cycles
      16x16: 3550 -> 2483 cycles
      32x32: 17392 -> 12010 cycles
      
      Total encode time of first 50 frames of bus (speed 0) @ 1500kbps goes
      from 2min51.6 to 2min43.9, i.e. 4.7% overall speedup.
      
      Change-Id: I16b8d595946393c8dc661599550b3f37f5718896
    • Minor change to prevent one level of dereference in cost_coeffs(). · e3ce2b2a
      Ronald S. Bultje authored
      4x4: 234 -> 236 cycles
      8x8: 878 -> 888 cycles
      16x16: 3664 -> 3550 cycles
      32x32: 18134 -> 17392 cycles
      
      Change-Id: I37a51bfbb0060a3a54f09c6045c14a989811ed78
    • Some minor optimizations for cost_coeffs(). · 91d223bd
      Ronald S. Bultje authored
      Cycle timings for first 3 frames of bus (speed 0) at 1500kbps:
      4x4: 298 -> 234 cycles
      8x8: 1227 -> 878 cycles
      16x16: 4906 -> 3664 cycles
      32x32: 23426 -> 18134 cycles
      
      Total encode time of first 50 frames of bus @ 1500kbps (speed 0) goes
      from 3min0.7 to 2min51.6 seconds, i.e. 5.3% faster.
      
      Change-Id: I68a0e1b530b0563b84a67342cca4b45146077e95
    • Make coefficient skip condition an explicit RD choice. · af660715
      Ronald S. Bultje authored
      This commit replaces zrun_zbin_boost, a method of biasing non-zero
      coefficients following runs of zero-coefficients to be rounded towards
      zero, with an explicit skip-block choice in the RD loop.
      
      The logic is basically that if individual coefficients should be rounded
      towards zero (from a RD point of view), the trellis/optimize loop should
      take care of it. If whole blocks should be zero (from a RD point of
      view), a single RD check is much more efficient than a complete
      serialization of the quantization loop.
      
      Quality change: derf +0.5% psnr, +1.6% ssim; yt +0.6% psnr, +1.1% ssim.
      SIMD for quantize will follow in a separate patch. Results for other
      test sets pending.
      
      Change-Id: Ife5fa641163ac5150ac428011e87188f1937c1f4
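
The "single RD check" for a whole block can be sketched as a comparison of two Lagrangian costs. The cost form (distortion plus lambda times rate), the function names, and the tie-break in favour of skipping are assumptions for illustration, not the encoder's actual cost macro.

```c
#include <stdint.h>

/* Hypothetical Lagrangian cost: distortion + lambda * rate. */
int64_t rd_cost(int64_t rate, int64_t dist, int64_t lambda) {
  return dist + lambda * rate;
}

/* Returns 1 if coding the block as skipped (all coefficients zero)
 * costs no more than coding its quantized coefficients. */
int choose_skip_block(int64_t coded_rate, int64_t coded_dist,
                      int64_t skip_rate, int64_t skip_dist,
                      int64_t lambda) {
  return rd_cost(skip_rate, skip_dist, lambda) <=
         rd_cost(coded_rate, coded_dist, lambda);
}
```

Skipping trades higher distortion for a much lower rate, so it tends to win when the residual is small relative to the cost of signalling coefficients.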
    • Minor cleanups · 8b9eea0a
      Yaowu Xu authored
      Change-Id: I379617c1c731a686b3f7e032b8805860c1055b12
  11. 27 Jun, 2013 1 commit
    • Make intra predictor reference buffer configurable · 861cb06c
      Jingning Han authored
      This commit makes the reference buffer pointer for the intra
      predictor configurable. This allows later removal of the spatial dependency between
      blocks inside a 64x64 superblock in the rate-distortion optimization
      loop.
      
      Change-Id: I02418c2077efe19adc86e046a6b49364a980f5b1
  12. 26 Jun, 2013 3 commits
    • Auto adapt step size feature. · 9f3ab834
      Paul Wilkins authored
      Also tweaks to other features and experiments with
      what is on and off at different speed settings.
      
      Change-Id: I3e1d0be0d195216bf17c2ac5df67f34ce0b306b2
    • Start adaptive threshold for each mode at max. · 689957e3
      Paul Wilkins authored
      Each frame we reset all adaptive thresholds to MAX
      rather than base. As modes are picked their thresholds
      drop down.
      
      Change-Id: Ia37f03a73003c2d9bfcda57edea07205e9a0e5e8
    • Change meaning of cpi->sf.first_step and rename. · e606cac0
      Paul Wilkins authored
      Renamed cpi->sf.first_step to cpi->sf.reduce_first_step_size
      and changed its meaning such that it is a delta applied to
      reduce the default first step size (>> x) in the motion search
      rather than an absolute value.
      
      The default first step size is already changed according to the image
      dimensions (smaller for smaller images). cpi->sf.reduce_first_step_size
      now applies a further correction from the default.
      
      Change-Id: Ia94e08bc24c67b604831f980909af7e982fcd16d
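
Reading "(>> x)" as a right shift, the new meaning of the setting can be sketched in one line. The function name and the idea that the default step arrives as a plain integer are assumptions; how the default is derived from the image dimensions is not shown in this log.

```c
/* Hypothetical sketch: reduce_first_step_size is a shift applied on
 * top of the default first step, not an absolute step size. It is
 * assumed to be non-negative and smaller than the width of int. */
int first_step_size(int default_first_step, int reduce_first_step_size) {
  return default_first_step >> reduce_first_step_size;
}
```

A value of 0 leaves the image-size-dependent default untouched, and each increment halves the first step.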
  13. 25 Jun, 2013 3 commits
  14. 21 Jun, 2013 1 commit