1. 24 Jul, 2013 1 commit
  2. 23 Jul, 2013 3 commits
    • Removing vp9_is_interpolating_filter array. · db7f5d28
      Dmitry Kovalev authored
      All filters are interpolating now, so we don't need this array: all
      of its values evaluate to true.
      
      Change-Id: I9af6d8219ae0eb984063cd15e4e2296374ae4961
    • clean up bw, bh · 86a9dec7
      Jim Bankoski authored
      Many structures use bw and bh, and they have different meanings. This CL
      starts the cleanup and removes unnecessary two-step operations (look up
      the log, then shift).
      
      Also removed partition type multiple operation code in bitstream.c.
      
      Change-Id: I7e03e552bdfc0939738e430862e3073d30fdd5db
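
The two-step lookup mentioned in this message can be pictured with a minimal C sketch. The enum, table names, and values below are hypothetical and are not taken from the codec source; the sketch only contrasts looking up a log2 width and then shifting with reading the width directly from a table.

```c
#include <assert.h>

/* Hypothetical block-size enum; names are illustrative only. */
typedef enum {
  BLK_4X4,
  BLK_8X8,
  BLK_16X16,
  BLK_32X32,
  BLK_64X64,
  BLK_COUNT
} BlockSize;

/* Two-step form: table holds log2 of the width in pixels. */
static const int kWidthLog2[BLK_COUNT] = { 2, 3, 4, 5, 6 };

/* One-step form: table holds the width in pixels directly. */
static const int kWidthPixels[BLK_COUNT] = { 4, 8, 16, 32, 64 };

/* Look up the log, then shift to recover the width. */
static int width_two_step(BlockSize b) {
  assert((unsigned)b < BLK_COUNT);
  return 1 << kWidthLog2[b];
}

/* Single table read, no shift needed. */
static int width_one_step(BlockSize b) {
  assert((unsigned)b < BLK_COUNT);
  return kWidthPixels[b];
}
```

Both functions return the same value for every block size; the one-step form simply saves the shift at each call site.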
    • vp9: make some static tables const · 3c8cce35
      James Zern authored
      Change-Id: I8bcae51271673da8755c66a51aea005dfe6a3739
  3. 22 Jul, 2013 5 commits
    • More optimizations for cost_coeffs(). · e20fcd95
      Ronald S. Bultje authored
      4x4:    163 ->  123 cycles (33% faster)
      8x8:    491 ->  399 cycles (23% faster)
      16x16: 1889 -> 1763 cycles (7% faster)
      32x32: 8311 -> 8180 cycles (1.6% faster)
      
      Overall encoding time of first 50 frames of bus (speed 0) @ 1500kbps
      goes from 1min4.33 to 1min3.00, i.e. 2.11% faster.
      
      Change-Id: Ib52d1dbb5649b14de769d3e7a74af67440b5284f
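
The percentages quoted in this message are consistent with the old cost divided by the new cost, minus one, rather than with the fraction of time saved. The helper below is a small sketch of that arithmetic; the function name is an assumption made for illustration.

```c
#include <assert.h>

/* Speed-up in percent, computed as (before / after - 1) * 100.
 * before and after are costs (cycles or seconds) and must be positive. */
static double percent_faster(double before, double after) {
  assert(before > 0.0 && after > 0.0);
  return (before / after - 1.0) * 100.0;
}
```

With the figures above, percent_faster(163, 123) is about 32.5 and percent_faster(64.33, 63.00) is about 2.11, matching the quoted 33% and 2.11%.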
    • Adding update_tx_counts function. · b2fc6fa9
      Dmitry Kovalev authored
      Moving common encoder/decoder code to update_tx_counts. Also renaming
      vp9_get_pred_probs_tx_size to get_tx_probs2 and adding get_tx_probs to
      call vp9_get_pred_context_tx_size inside read_selected_tx_size only once
      (twice before).
      
      Change-Id: Ia50247f3893de88ef8e9041b0d44be44a40aaa4d
    • fix a build error · fc186dca
      Yaowu Xu authored
      Change-Id: I3b05687f439ff6a7c426d2c97a6c58c831fa51ac
    • Optimize operation flow in sub8x8 rd loop · 409e77f2
      Jingning Han authored
      Stack the rate-distortion statistics in the sub8x8 rd loop. This allows
      the encoder to skip the forward transform, quantization, and coeff cost
      estimation, in the sub8x8 rd optimization search, if the motion
      vector(s) are of integer pixel value, and have been tested in the
      previous prediction filter type rd loops of the same block.
      
      This gives about a 2% speed-up for bus_cif at 2000 kbps, for speed 0.
      Its efficacy depends on how frequently the motion search selects an
      integer motion vector.
      
      Change-Id: Iee15d4283ad4adea05522c1d40b198b127e6dd97
    • Re-order mode search in rd. · 1d189d64
      Paul Wilkins authored
      Mode search order in rd loop changed to better reflect
      observed hit counts.
      
      Also some adjustment of the baseline mode rd thresholds
      to reflect the order change and observed frequencies.
      
      Change-Id: I47a131cc83e11551df8add6d6d8d413d78d3a63c
  4. 21 Jul, 2013 1 commit
    • Skip buffer update in sub8x8 rd loop · c725502b
      Jingning Han authored
      This commit allows the encoder to skip a few buffer update steps in
      rd_pick_best_mbsegmentation, when early breakout has been triggered
      in the rd_check_segment_txsize. It provides about 1% speed-up for
      bus_cif at 2000 kbps, in the settings of speed 0.
      
      Change-Id: Ica034f10a24dec572b397d8389a2b81020ebc0b9
  5. 19 Jul, 2013 5 commits
    • Reworked the auto_mv_step_size speed feature · 302698fb
      Deb Mukherjee authored
      This patch modifies the auto_mv_step_size speed feature to
      use a combination of the maximum magnitude mv from the last
      inter frame, and the maximum magnitude mv for the two reference
      mvs with the same reference. For arf frames, the max mv step
      for the resolution is used.
      The bounds therefore are slightly tighter. The feature is made
      a speed 1 feature.
      
      Rebased.
      
      Results (when this feature is turned on over speed 0):
      derfraw300: -0.046% psnr, about 5+% speedup
      (tested on football: goes from 4m30.760s to 4m17.410s).
      
      Change-Id: If492797a61b0b4b3e58c0b8f86afb880165fc9f6
    • Removing frame_type field from MACROBLOCKD struct. · 97e96bc4
      Dmitry Kovalev authored
      Change-Id: Ia4e83913251c1cdc7aa2abd64bf01ecb1a962119
    • Renaming TXFM_MODE to TX_MODE (like TX_SIZE, TX_TYPE). · c0eb5740
      Dmitry Kovalev authored
      Moving TX_MODE enum to vp9_enums.h. Renaming txfm_mode variables to
      tx_mode.
      
      Change-Id: I459d1af6dd928ce7fccdf8ce30b6f1ca057bef92
    • Removing redundant VP9_COMMON* from function signatures. · afe43d40
      Dmitry Kovalev authored
      Functions: vp9_get_pred_context_switchable_interp,
                 vp9_get_pred_context_intra_inter,
                 vp9_get_pred_context_single_ref_p1,
                 vp9_get_pred_context_single_ref_p2.
      
      Change-Id: I3d6fb8aee23c9062270768e1e6da416dd9bb8f96
    • Consistent names for inter mode probabilities and encodings. · bc7acb13
      Dmitry Kovalev authored
      Renaming vp9_sb_mv_ref_tree to vp9_inter_mode_tree, and
      vp9_sb_mv_ref_encoding_array to vp9_inter_mode_encodings.
      
      Change-Id: I0e91fbf81350d3ec5a2599064c74089b5d06133a
  6. 18 Jul, 2013 9 commits
  7. 17 Jul, 2013 10 commits
    • Add a best_yrd shortcut in splitmv mode search. · c6917528
      Ronald S. Bultje authored
      Encoding of first 50 frames of bus (speed 0) @ 1500kbps goes from
      1min6.2 to 1min5.9, i.e. 0.5% faster overall.
      
      Change-Id: I59d8a3b2f0a75010fa041d5e2646c8caac5bd683
    • Skip redundant nearest/near/zero encodes in splitmv. · 161c9956
      Ronald S. Bultje authored
      Encode of first 50 frames of bus @ 1500kbps (speed 0) goes from
      1min7.3 to 1min6.2, i.e. 1.7% faster overall.
      
      Change-Id: I19d2deacfbffadd61d32551cee9586757ab4a987
    • changed mode checking order · 42facc29
      Yaowu Xu authored
      Change-Id: Ic4c4b363ed840935e42f495f13ea5e601a56f1b2
    • Skip nearest/near/zero redundant encodes. · 8fea880b
      Ronald S. Bultje authored
      Encode of first 50 frames of bus @ 1500kbps (speed 0) goes from 1min12.8
      to 1min7.3, i.e. 8% faster.
      
      Change-Id: Ia22d1c7b687316c553cc60eacae988b24e175b62
    • Best_rd breakout in rd partition search. · 9f427bfe
      Ronald S. Bultje authored
      About 15% faster for bus (speed 0) first 50 frames @ 1500kbps, which
      goes from 1min36 to 1min24. Results become slightly better (+0.2% on
      derf/yt, +0.4% on hd), probably because of a bugfix for skipmode in
      super_block_yrd(). Overall speed change (on derfraw300) is roughly
      -13%. This can probably be improved further by caching best_yrd
      between partition searches. Also, we might be able to get more
      speedups by always doing PARTITION_NONE before PARTITIONS_SPLIT, not
      just at the sb8x8 level.
      
      Change-Id: I83736949ebd5b4a3b400ee688d7661913fefc98b
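
A best_rd breakout of the kind this message describes can be sketched generically as follows. This is not the encoder's code: the function name, the flat array of per-block costs, and the INT64_MAX sentinel are assumptions for illustration, and costs are assumed non-negative so the running total never decreases.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sums per-block rd costs for one candidate partitioning, but stops as
 * soon as the running total can no longer beat best_rd.
 * Returns the total, or INT64_MAX if the search broke out early.
 * *blocks_evaluated reports how many blocks were actually costed. */
static int64_t sum_cost_with_breakout(const int64_t *block_cost, int n,
                                      int64_t best_rd,
                                      int *blocks_evaluated) {
  int64_t total = 0;
  int i;
  assert(block_cost != NULL && blocks_evaluated != NULL && n >= 0);
  *blocks_evaluated = 0;
  for (i = 0; i < n; ++i) {
    total += block_cost[i];
    ++*blocks_evaluated;
    /* Already no better than the best candidate: skip remaining blocks. */
    if (total >= best_rd) return INT64_MAX;
  }
  return total;
}
```

The saving comes from the blocks that are never evaluated once a candidate is known to lose, which is why a tighter best_rd found early makes later candidates cheaper to reject.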
    • Do a skip-block check for sub8x8 partitions also. · 83c7e13a
      Ronald S. Bultje authored
      +0.2% SSIM and glbPSNR on derfraw300.
      
      Change-Id: I9cba0bca55e606a22f557c7732b064f738efe84d
    • Speed up motion estimation using small partitions' result(experiment) · df90d58f
      Yunqing Wang authored
      Current partition checking starts from small sizes, and then goes up
      to large sizes. This experiment uses the small partitions' motion
      estimation result, which is already available, to speed up the
      large partition's motion estimation. We can decide to skip some
      partition checks if they are unlikely choices. We could use the
      motion vector (MV) result as the current partition's prediction MV and
      limit the search range and reference frame.
      
      Current result at speed 1:
      psnr loss: 1.19% for stdhd, 0.287% for derf.
      speed gain: 14% for sunflower(hd), 11% for akiyo.
      
      Further improvement will be done later.
      
      Change-Id: I5abfd070e9cace2e91e2a0247d1325df313887ab
    • Move uv intra mode selection in rd loop. · 2ee338ce
      Paul Wilkins authored
      Use an estimate based on DC_PRED for intra uv cost
      within the rd loop then only do a full uv mode analysis
      if an intra mode is chosen.
      
      Significant speed gains in some cases. Currently only
      enabled for speed 2 pending speed/quality tests.
      
      Change-Id: Ie851a12400d5483bce47ec0e3ccb8516041e91c0
    • Limit transform sizes searched for uv intra. · 6c667f0f
      Paul Wilkins authored
      Apply limit if search_method == USE_LARGESTALL
      to the range of UV tx sizes searched.
      
      Change-Id: I6db29f0dd237285ffc50d75a37e8b68151ad821c
    • Skip redundant motion search in 4x4 level rd loop · a142d6fc
      Jingning Han authored
      This commit makes the encoder perform motion search only once
      per reference frame type for each 4x4/4x8/8x4 block. For bus_cif
      at 2000 kbps, the runtime goes from 253812ms -> 217817ms
      (14% speed-up) for speed 0.
      
      Change-Id: I5f17599ccc8cfaf93ccb4f98fcb6008af6d79e92
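
Running the motion search only once per reference frame type amounts to caching the result and reusing it on later passes. The sketch below shows that pattern in isolation; the types, the NUM_REFS value, and the stand-in search function are all hypothetical and do not reflect the encoder's actual data structures.

```c
#include <assert.h>
#include <string.h>

#define NUM_REFS 3 /* hypothetical number of reference frame types */

typedef struct {
  int row;
  int col;
} Mv;

/* Per-block cache: one slot per reference frame type. */
typedef struct {
  int valid[NUM_REFS];
  Mv mv[NUM_REFS];
} MvCache;

/* Counts how often the expensive search actually runs. */
static int search_calls = 0;

/* Stand-in for an expensive motion search; returns a made-up vector. */
static Mv full_motion_search(int ref) {
  Mv mv;
  ++search_calls;
  mv.row = 2 * ref;
  mv.col = -ref;
  return mv;
}

/* Marks every slot as empty. Call once before coding a new block. */
static void mv_cache_reset(MvCache *cache) {
  assert(cache != NULL);
  memset(cache, 0, sizeof(*cache));
}

/* Runs the search on the first request for ref, then reuses the result. */
static Mv cached_motion_search(MvCache *cache, int ref) {
  assert(cache != NULL && ref >= 0 && ref < NUM_REFS);
  if (!cache->valid[ref]) {
    cache->mv[ref] = full_motion_search(ref);
    cache->valid[ref] = 1;
  }
  return cache->mv[ref];
}
```

The cache has to be reset whenever the block being coded changes, since a stored vector is only meaningful for the block it was searched for.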
  8. 16 Jul, 2013 3 commits
  9. 15 Jul, 2013 1 commit
    • Skip duplicate block encoding in the rd loop · faff6ed0
      Jingning Han authored
      This speed feature allows the encoder to largely remove the spatial
      dependency between blocks inside a 64x64 superblock, thereby removing
      the need to repeatedly encode superblocks per partition type in the
      rate-distortion optimization loop.
      
      A major challenge lies in the intra modes tested in the rate-distortion
      optimization loop. The subsequent blocks do not have access to the
      reconstructed boundary pixels without the intermediate coding steps.
      This was resolved by using the original pixels for intra prediction
      in the rd loop, followed by an appropriately designed distortion
      modeling on the quantization parameters. Experiments also suggested
      that the performance impact is more discernible at lower bit-rate/psnr
      settings. Hence a quantizer dependent threshold is applied to deactivate
      skip of block coding.
      
      For bus_cif at 2000 kbps,
      speed 0: runtime 269854ms -> 237774ms (12% speed-up) at 0.05dB
               performance loss.
      
      speed 1: runtime 65312ms  -> 61536ms (7% speed-up) at 0.04dB
               performance loss.
      
      This operation is currently turned on in settings of speed 1.
      
      Change-Id: Ib689741dfff8dd38365d8c1b92860a3e176f56ec
  10. 12 Jul, 2013 2 commits
    • Fix a build issue · fb754b18
      Yaowu Xu authored
      Change-Id: I23a75c495ed7ea917d7f312bef0990e20a6b53d9
    • Some minor cleanups for efficiency · 94c481f9
      Deb Mukherjee authored
      Implements some of the helper functions more efficiently with
      lookups rather than branches. The modeling function is consolidated
      to reduce some computations.
      
      Also merged the two enums BLOCK_SIZE_TYPES and BlockSize into
      one because there is no need to keep them separate (even though
      the semantics are a little different).
      
      No bitstream or output change.
      
      About 0.5% speedup
      
      Change-Id: I7d71a66e8031ddb340744dc493f22976052b8f9f