1. 22 Jul, 2013 2 commits
    • Optimize operation flow in sub8x8 rd loop · 409e77f2
      Jingning Han authored
      Stack the rate-distortion statistics in the sub8x8 rd loop. This allows
      the encoder to skip the forward transform, quantization, and coeff cost
      estimation, in the sub8x8 rd optimization search, if the motion
      vector(s) are of integer pixel value, and have been tested in the
      previous prediction filter type rd loops of the same block.
      This gives about a 2% speed-up for bus_cif at 2000 kbps at speed 0.
      Its efficacy depends on how frequently the motion search selects an
      integer motion vector.
      Change-Id: Iee15d4283ad4adea05522c1d40b198b127e6dd97
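The idea of reusing statistics for integer motion vectors can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the `MotionVector` and `RdStatCache` types, the 1/8-pel unit assumption, and `full_rd_eval()` (a stand-in for forward transform, quantization, and coefficient costing) are all hypothetical. It relies on the assumption that the prediction for an integer-pel vector does not depend on the interpolation filter, so the statistics from an earlier filter-type pass can be reused.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical motion vector; assumed to be in 1/8-pel units. */
typedef struct { int16_t row, col; } MotionVector;

/* Hypothetical per-block cache of rate-distortion statistics. */
typedef struct {
  bool valid;
  MotionVector mv;
  int rate;
  int64_t distortion;
} RdStatCache;

/* An MV is integer-pel when its three fractional bits are all zero. */
static bool is_integer_mv(MotionVector mv) {
  return ((mv.row | mv.col) & 7) == 0;
}

/* Counts how often the expensive path runs (for illustration only). */
static int full_eval_count = 0;

/* Stand-in for forward transform + quantization + coefficient costing. */
static void full_rd_eval(MotionVector mv, int *rate, int64_t *dist) {
  ++full_eval_count;
  *rate = 100 + mv.row;
  *dist = 1000 + mv.col;
}

/* Returns cached statistics when the same integer MV was already tested
 * in an earlier filter-type pass; otherwise evaluates and caches. */
static void block_rd(RdStatCache *cache, MotionVector mv,
                     int *rate, int64_t *dist) {
  if (cache->valid && is_integer_mv(mv) &&
      cache->mv.row == mv.row && cache->mv.col == mv.col) {
    *rate = cache->rate;
    *dist = cache->distortion;
    return;
  }
  full_rd_eval(mv, rate, dist);
  if (is_integer_mv(mv)) {
    cache->valid = true;
    cache->mv = mv;
    cache->rate = *rate;
    cache->distortion = *dist;
  }
}
```

Fractional vectors always take the full path in this sketch, which matches the note above that the gain depends on how often an integer vector is selected.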
    • Re-order mode search in rd. · 1d189d64
      Paul Wilkins authored
      Mode search order in rd loop changed to better reflect
      observed hit counts.
      Also some adjustment of the baseline mode rd thresholds
      to reflect the order change and observed frequencies.
      Change-Id: I47a131cc83e11551df8add6d6d8d413d78d3a63c
  2. 21 Jul, 2013 1 commit
    • Skip buffer update in sub8x8 rd loop · c725502b
      Jingning Han authored
      This commit allows the encoder to skip a few buffer update steps in
      rd_pick_best_mbsegmentation when an early breakout has been triggered
      in rd_check_segment_txsize. It provides about a 1% speed-up for
      bus_cif at 2000 kbps at speed 0.
      Change-Id: Ica034f10a24dec572b397d8389a2b81020ebc0b9
  3. 19 Jul, 2013 4 commits
  4. 18 Jul, 2013 9 commits
  5. 17 Jul, 2013 10 commits
    • Add a best_yrd shortcut in splitmv mode search. · c6917528
      Ronald S. Bultje authored
      Encoding of first 50 frames of bus (speed 0) @ 1500kbps goes from
      1min6.2 to 1min5.9, i.e. 0.5% faster overall.
      Change-Id: I59d8a3b2f0a75010fa041d5e2646c8caac5bd683
    • Skip redundant nearest/near/zero encodes in splitmv. · 161c9956
      Ronald S. Bultje authored
      Encode of first 50 frames of bus @ 1500kbps (speed 0) goes from
      1min7.3 to 1min6.2, i.e. 1.7% faster overall.
      Change-Id: I19d2deacfbffadd61d32551cee9586757ab4a987
    • changed mode checking order · 42facc29
      Yaowu Xu authored
      Change-Id: Ic4c4b363ed840935e42f495f13ea5e601a56f1b2
    • Skip nearest/near/zero redundant encodes. · 8fea880b
      Ronald S. Bultje authored
      Encode of first 50 frames of bus @ 1500kbps (speed 0) goes from 1min12.8
      to 1min7.3, i.e. 8% faster.
      Change-Id: Ia22d1c7b687316c553cc60eacae988b24e175b62
    • Best_rd breakout in rd partition search. · 9f427bfe
      Ronald S. Bultje authored
      About 15% faster for bus (speed 0) first 50 frames @ 1500kbps, which
      goes from 1min36 to 1min24. Results become slightly better (+0.2% on
      derf/yt, +0.4% on hd), probably because of a bugfix for skipmode in
      super_block_yrd(). Overall speed change (on derfraw300) is roughly
      -13%. This can probably be improved further by caching best_yrd
      between partition searches. Also, we might be able to get more
      speedups by always doing PARTITION_NONE before PARTITION_SPLIT, not
      just at the sb8x8 level.
      Change-Id: I83736949ebd5b4a3b400ee688d7661913fefc98b
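A best_rd breakout of the kind named in the commit title above can be sketched as below. This is only an illustration under assumptions: the function name, the array of per-sub-block costs, and the `INT64_MAX` sentinel are hypothetical and are not taken from the encoder source.

```c
#include <stdint.h>

/* Sums the rd cost of each sub-block of a split partition and abandons
 * the candidate as soon as the running total can no longer beat best_rd.
 * Returns the total cost, or INT64_MAX if the search broke out early.
 * *evaluated reports how many sub-blocks were actually costed. */
static int64_t split_rd_with_breakout(const int64_t *sub_rd, int n,
                                      int64_t best_rd, int *evaluated) {
  int64_t total = 0;
  *evaluated = 0;
  for (int i = 0; i < n; ++i) {
    total += sub_rd[i];  /* in a real encoder this is the expensive step */
    ++*evaluated;
    if (total >= best_rd) return INT64_MAX;  /* cannot win: stop here */
  }
  return total;
}
```

The earlier a good candidate sets best_rd, the more work later candidates can skip, which is why the message suggests trying PARTITION_NONE first.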
    • Do a skip-block check for sub8x8 partitions also. · 83c7e13a
      Ronald S. Bultje authored
      +0.2% SSIM and glbPSNR on derfraw300.
      Change-Id: I9cba0bca55e606a22f557c7732b064f738efe84d
    • Speed up motion estimation using small partitions' result (experiment) · df90d58f
      Yunqing Wang authored
      Current partition checking starts from small sizes, and then goes up
      to large sizes. This experiment uses the small partitions' motion
      estimation result, which is already available, to speed up the
      large partition's motion estimation. We can decide to skip some
      partition checks if they are unlikely choices. We could use the
      motion vector (MV) result as the current partition's prediction MV
      and limit the search range and reference frame.
      Current result at speed 1:
      psnr loss: 1.19% for stdhd, 0.287% for derf.
      speed gain: 14% for sunflower(hd), 11% for akiyo.
      Further improvement will be done later.
      Change-Id: I5abfd070e9cace2e91e2a0247d1325df313887ab
    • Move uv intra mode selection in rd loop. · 2ee338ce
      Paul Wilkins authored
      Use an estimate based on DC_PRED for the intra uv cost
      within the rd loop, then only do a full uv mode analysis
      if an intra mode is chosen.
      Significant speed gains in some cases. Currently only
      enabled for speed 2 pending speed/quality tests.
      Change-Id: Ie851a12400d5483bce47ec0e3ccb8516041e91c0
    • Limit transform sizes searched for uv intra. · 6c667f0f
      Paul Wilkins authored
      Apply limit if search_method == USE_LARGESTALL
      to the range of UV tx sizes searched.
      Change-Id: I6db29f0dd237285ffc50d75a37e8b68151ad821c
    • Skip redundant motion search in 4x4 level rd loop · a142d6fc
      Jingning Han authored
      This commit makes the encoder perform motion search only once
      per reference frame type for each 4x4/4x8/8x4 block. For bus_cif
      at 2000 kbps, the runtime goes from 253812ms -> 217817ms
      (14% speed-up) for speed 0.
      Change-Id: I5f17599ccc8cfaf93ccb4f98fcb6008af6d79e92
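Running the motion search only once per reference frame for a block amounts to memoizing the result. The sketch below shows one way that could look; the `MvCache` layout, the reference-frame count, and the `motion_search()` stand-in are assumptions made for illustration, not the structures used in the commit.

```c
#include <stdbool.h>

#define REF_FRAMES_SKETCH 3  /* hypothetical number of reference frames */

typedef struct { int row, col; } Mv;

/* Hypothetical per-block cache: one motion search result per reference. */
typedef struct {
  bool done[REF_FRAMES_SKETCH];
  Mv mv[REF_FRAMES_SKETCH];
} MvCache;

/* Counts how often the expensive search runs (for illustration only). */
static int search_count = 0;

/* Stand-in for the real motion search. */
static Mv motion_search(int ref) {
  ++search_count;
  Mv mv = { ref * 2, -ref };
  return mv;
}

/* Searches on the first request for a reference frame, then reuses it. */
static Mv get_block_mv(MvCache *cache, int ref) {
  if (!cache->done[ref]) {
    cache->mv[ref] = motion_search(ref);
    cache->done[ref] = true;
  }
  return cache->mv[ref];
}
```

The cache would be reset for each new 4x4/4x8/8x4 block, so repeated mode passes over the same block share one search per reference frame.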
  6. 16 Jul, 2013 3 commits
  7. 15 Jul, 2013 1 commit
    • Skip duplicate block encoding in the rd loop · faff6ed0
      Jingning Han authored
      This speed feature allows the encoder to largely remove the spatial
      dependency between blocks inside a 64x64 superblock, thereby removing
      the need to repeatedly encode superblocks per partition type in the
      rate-distortion optimization loop.
      A major challenge lies in the intra modes tested in the rate-distortion
      optimization loop. The subsequent blocks do not have access to the
      reconstructed boundary pixels without the intermediate coding steps.
      This was resolved by using the original pixels for intra prediction
      in the rd loop, followed by an appropriately designed distortion
      model based on the quantization parameters. Experiments also suggested
      that the performance impact is more discernible at lower bit-rate/psnr
      settings. Hence a quantizer-dependent threshold is applied to disable
      the skipping of block coding.
      For bus_cif at 2000 kbps,
      speed 0: runtime 269854ms -> 237774ms (12% speed-up) at 0.05dB
               performance loss.
      speed 1: runtime 65312ms  -> 61536ms (7% speed-up) at 0.04dB
               performance loss.
      This operation is currently enabled at speed 1.
      Change-Id: Ib689741dfff8dd38365d8c1b92860a3e176f56ec
  8. 12 Jul, 2013 3 commits
    • Fix a build issue · fb754b18
      Yaowu Xu authored
      Change-Id: I23a75c495ed7ea917d7f312bef0990e20a6b53d9
    • Some minor cleanups for efficiency · 94c481f9
      Deb Mukherjee authored
      Implements some of the helper functions more efficiently with
      lookups rather than branches. The modeling function is consolidated
      to reduce some computations.
      Also merged the two enums BLOCK_SIZE_TYPES and BlockSize into
      one because there is no need to keep them separate (even though
      the semantics are a little different).
      No bitstream or output change.
      About 0.5% speedup
      Change-Id: I7d71a66e8031ddb340744dc493f22976052b8f9f
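Replacing branches with lookups, as described in the commit above, generally looks like the following. The enum and helper names here are hypothetical and deliberately do not reproduce the encoder's own BLOCK_SIZE_TYPES definitions; the sketch only contrasts the two styles.

```c
/* Hypothetical block-size enum used only for this sketch. */
typedef enum {
  BLK_4X4, BLK_8X8, BLK_16X16, BLK_32X32, BLK_64X64, BLK_COUNT
} BlockSizeSketch;

/* Branch-based helper: one comparison chain per call. */
static int width_log2_branch(BlockSizeSketch b) {
  switch (b) {
    case BLK_4X4:   return 2;
    case BLK_8X8:   return 3;
    case BLK_16X16: return 4;
    case BLK_32X32: return 5;
    case BLK_64X64: return 6;
    default:        return -1;
  }
}

/* Lookup-based helper: a single indexed load, no branching. */
static const int kWidthLog2[BLK_COUNT] = { 2, 3, 4, 5, 6 };

static int width_log2_lookup(BlockSizeSketch b) {
  return kWidthLog2[b];
}
```

Both helpers return the same values for valid inputs, so a change of this kind leaves the output unchanged, in line with the "No bitstream or output change" note.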
    • Remove unused function block_error(). · ee09dd99
      Ronald S. Bultje authored
      Change-Id: I78a79fc51c2d7cc3c261f35b569155397f3dc0c4
  9. 11 Jul, 2013 2 commits
  10. 10 Jul, 2013 5 commits
    • Fix tx_type bug in intra4x4 rd loop · 18803f9c
      Jingning Han authored
      This commit fixes the misuse of the tx_type for the inverse transform
      in the intra4x4 rate-distortion optimization loop. It improves the
      overall coding performance.
      Change-Id: I7fe9953175b74890357dbcee33c138573766e980
    • remove warnings when NDEBUG is set · 6591cf2f
      Jim Bankoski authored
      Change-Id: Ie0cb732fdcb98616a422c4463bff80642248d136
    • Prunes out full-rd computation based on modeled rd · 53ff43ad
      Deb Mukherjee authored
      Adds a speed feature to eliminate full-rd computation if the modeled
      rd or rd based on a different parameter in the same mode is already
      a lot larger than the best rd yet.
      Specifically, only search the sharp and smooth filters if the modeled
      rd cost based on the regular filter is within a certain factor of the
      best rd cost so far. Also, skip full-rd computation of non-splitmv
      inter modes unless the modeled rd cost based on pred error is within
      the same factor of the best rd cost so far.
      Also adds some enhancements in the rd search for splitmv mode to
      speed things up by early breakouts. Negligible impact on performance.
      Results on derfraw300:
      psnr:    -0.013% with the splitmv enhancements, -0.24% with the rd
               breakout feature on.
      speedup: 6% with the splitmv enhancements, 20% with the residual
               breakout also enabled (tested on the football sequence at
               600 kbps)
      Change-Id: I37abc308ea9f110c1679ce649b6a7e73ab1ad5fc
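The pruning rule in the commit above (skip the full rd computation when a cheap modeled rd is already much larger than the best rd found so far) can be sketched as a single predicate. The function name and the shift-based factor are assumptions for illustration; the actual factor and how the encoder expresses it are not stated in the message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when the full rd computation should be skipped because
 * the modeled rd exceeds best_rd * (1 + 1/2^factor_shift).
 * Assumes best_rd is small enough that adding the margin cannot
 * overflow; INT64_MAX is treated as "no best candidate yet". */
static bool prune_full_rd(int64_t modeled_rd, int64_t best_rd,
                          int factor_shift) {
  if (best_rd == INT64_MAX) return false;  /* nothing to compare against */
  const int64_t threshold = best_rd + (best_rd >> factor_shift);
  return modeled_rd > threshold;
}
```

With `factor_shift` set to 1 the threshold is 1.5 times the best cost; a larger shift prunes more aggressively and trades more quality for speed.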
    • Remove memcpy() in handle_inter_mode() filter selection. · b1df674a
      Ronald S. Bultje authored
      Encode time of first 50 frames of bus (speed 0) @ 1500kbps goes from
      2min4.9 to 2min3.1, i.e. a 1.4% speedup overall.
      Change-Id: I9b25e87974430cb942caa276410bb2eda815bd83
    • Add a feature to reduce chroma intra mode search · bed27a96
      Yaowu Xu authored
      Change-Id: I721ebdeef2b53ce3e5c3eba3f7462ae2103c95a8