1. 02 Jul, 2013 2 commits
    • Calculate rd cost per transformed block · b91a1586
      Jingning Han authored
      Compute the rate-distortion cost per transformed block, and accumulate
      the cost over all blocks inside a partition. This allows the encoder
      to detect when the cumulative rd cost is already above the best rd cost,
      thereby enabling early termination in the rate-distortion optimization
      search.
      
      Change-Id: I0a856367a9a7b6dd0b466e7b767f54d5018d09ac
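The early-termination idea in this commit can be sketched as follows. This is a minimal illustration, not the encoder's code: the function name, the array of per-block costs, and the use of INT64_MAX as an "abandoned" marker are all assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sums per-block RD costs and stops as soon as the
 * running total is above the best cost found so far. Returns INT64_MAX to
 * signal that the candidate was abandoned early. */
int64_t accumulate_rd_cost(const int64_t *block_rd, size_t num_blocks,
                           int64_t best_rd) {
  int64_t total = 0;
  for (size_t i = 0; i < num_blocks; ++i) {
    total += block_rd[i];
    if (total > best_rd) return INT64_MAX; /* early termination */
  }
  return total;
}
```

A caller would pass the best cost seen so far and treat INT64_MAX as "this candidate cannot win", skipping the remaining blocks of the partition.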
    • Revert "New motion threshold factor - speed feature." · b7cd01ed
      Paul Wilkins authored
      This reverts commit 13772781.
      Also fixes a spelling mistake.
      
      Change-Id: I5be8aa4d8d3c0323d4a6f41968a7b2c048949c3f
  2. 01 Jul, 2013 2 commits
    • Quantize (64-bit only, for now) SSSE3 SIMD. · 7353ceab
      Ronald S. Bultje authored
      Total encoding time for first 50 frames of bus (speed 0) @ 1500kbps
      goes from 2min34.8 to 2min14.4, i.e. a 10.4% overall speedup. The code is
      x86-64 only; it needs some minor modifications to be 32bit compatible,
      because it uses 15 xmm registers, whereas 32bit only has 8.
      
      Change-Id: I2df53770c2e850813ffa713e1a91b45b0082b904
    • New motion threshold factor - speed feature. · 13772781
      Paul Wilkins authored
      Added a speed feature that focuses only on thresholds
      for new motion modes.
      
      Moved sf->comp_inter_joint_search_thresh into speed
      1. This has a ~+0.4% impact on quality at speed 0, which
      is our quality reference baseline.
      
      Slight adjustment to baseline thresholds.
      
      Change-Id: I7ebf104f1fe29af77ed4837b2e84be065621bbe5
  3. 29 Jun, 2013 1 commit
  4. 28 Jun, 2013 5 commits
    • Inline vp9_get_coef_context() (and remove vp9_ prefix). · d00b8e5f
      Ronald S. Bultje authored
      Makes cost_coeffs() a lot faster:
      4x4: 236 -> 181 cycles
      8x8: 888 -> 588 cycles
      16x16: 3550 -> 2483 cycles
      32x32: 17392 -> 12010 cycles
      
      Total encode time of first 50 frames of bus (speed 0) @ 1500kbps goes
      from 2min51.6 to 2min43.9, i.e. 4.7% overall speedup.
      
      Change-Id: I16b8d595946393c8dc661599550b3f37f5718896
    • Minor change to prevent one level of dereference in cost_coeffs(). · e3ce2b2a
      Ronald S. Bultje authored
      4x4: 234 -> 236 cycles
      8x8: 878 -> 888 cycles
      16x16: 3664 -> 3550 cycles
      32x32: 18134 -> 17392 cycles
      
      Change-Id: I37a51bfbb0060a3a54f09c6045c14a989811ed78
    • Some minor optimizations for cost_coeffs(). · 91d223bd
      Ronald S. Bultje authored
      Cycle timings for first 3 frames of bus (speed 0) at 1500kbps:
      4x4: 298 -> 234 cycles
      8x8: 1227 -> 878 cycles
      16x16: 4906 -> 3664 cycles
      32x32: 23426 -> 18134 cycles
      
      Total encode time of first 50 frames of bus @ 1500kbps (speed 0) goes
      from 3min0.7 to 2min51.6 seconds, i.e. 5.3% faster.
      
      Change-Id: I68a0e1b530b0563b84a67342cca4b45146077e95
    • Make coefficient skip condition an explicit RD choice. · af660715
      Ronald S. Bultje authored
      This commit replaces zrun_zbin_boost, a method of biasing non-zero
      coefficients following runs of zero-coefficients to be rounded towards
      zero, with an explicit skip-block choice in the RD loop.
      
      The logic is basically that if individual coefficients should be rounded
      towards zero (from a RD point of view), the trellis/optimize loop should
      take care of it. If whole blocks should be zero (from a RD point of
      view), a single RD check is much more efficient than a complete
      serialization of the quantization loop.
      
      Quality change: derf +0.5% psnr, +1.6% ssim; yt +0.6% psnr, +1.1% ssim.
      SIMD for quantize will follow in a separate patch. Results for other
      test sets pending.
      
      Change-Id: Ife5fa641163ac5150ac428011e87188f1937c1f4
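The explicit skip-block choice described above amounts to a single comparison of two rate-distortion costs. The sketch below uses a generic Lagrangian cost (lambda times rate plus distortion); it is not the encoder's own cost macro, and all names and parameters are assumptions made for illustration.

```c
#include <stdint.h>

/* Generic Lagrangian RD cost; an assumption, not the encoder's macro. */
static int64_t rd_cost(int64_t lambda, int64_t rate, int64_t dist) {
  return lambda * rate + dist;
}

/* Hypothetical decision: returns 1 when zeroing the whole block is cheaper
 * in RD terms than coding its quantized coefficients, 0 otherwise. */
int choose_block_skip(int64_t lambda, int64_t rate_coded, int64_t dist_coded,
                      int64_t rate_skip, int64_t dist_skip) {
  return rd_cost(lambda, rate_skip, dist_skip) <
         rd_cost(lambda, rate_coded, dist_coded);
}
```

The point of the design is that this check runs once per block, whereas a zero-run boost has to be applied coefficient by coefficient inside the quantization loop.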
    • Minor cleanups · 8b9eea0a
      Yaowu Xu authored
      Change-Id: I379617c1c731a686b3f7e032b8805860c1055b12
  5. 27 Jun, 2013 1 commit
    • Make intra predictor reference buffer configurable · 861cb06c
      Jingning Han authored
      This commit enables a configurable reference buffer pointer for the
      intra predictor. This allows later removal of the spatial dependency
      between blocks inside a 64x64 superblock in the rate-distortion
      optimization loop.
      
      Change-Id: I02418c2077efe19adc86e046a6b49364a980f5b1
  6. 26 Jun, 2013 3 commits
    • Auto adapt step size feature. · 9f3ab834
      Paul Wilkins authored
      Also tweaks to other features and experiments with
      what is on and off at different speed settings.
      
      Change-Id: I3e1d0be0d195216bf17c2ac5df67f34ce0b306b2
    • Start adaptive threshold for each mode at max. · 689957e3
      Paul Wilkins authored
      Each frame we reset all adaptive thresholds to MAX
      rather than base. As modes are picked their thresholds
      drop down.
      
      Change-Id: Ia37f03a73003c2d9bfcda57edea07205e9a0e5e8
    • Change meaning of cpi->sf.first_step and rename. · e606cac0
      Paul Wilkins authored
      Renamed cpi->sf.first_step to cpi->sf.reduce_first_step_size
      and changed its meaning such that it is a delta applied to
      reduce the default first step size (>> x) in the motion search
      rather than an absolute value.
      
      The default first step size is already changed according to the image
      dimensions (smaller for smaller images). cpi->sf.reduce_first_step_size
      now applies a further correction from the default.
      
      Change-Id: Ia94e08bc24c67b604831f980909af7e982fcd16d
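One way to read the change from an absolute value to a delta is sketched below. The function and parameter names (max_step, default_shift, reduce_delta) are assumptions for illustration only and are not taken from the encoder source.

```c
/* Illustrative only: the default shift already depends on image size, and
 * the speed feature adds a further reduction on top of it rather than
 * replacing it with an absolute value. */
int first_step_size(int max_step, int default_shift, int reduce_delta) {
  return max_step >> (default_shift + reduce_delta);
}
```

With a delta of 0 the default first step size is left unchanged, so the speed feature can only shrink the initial search step.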
  7. 25 Jun, 2013 3 commits
  8. 21 Jun, 2013 3 commits
    • Transforming scale_mv_component_q4 into scale_mv_q4 function. · f27f76df
      Dmitry Kovalev authored
      Using MV instead of int_mv for function arguments.
      
      Change-Id: Ic25e13dccbc98fac1fa1b3255127e00cca2a57f6
    • Implement SSE2 block_error. · 54b2a596
      Ronald S. Bultje authored
      Change vp9_block_error() to return a 64bit error variable, change all
      callers to expect a 64bit return value (this will prevent overflows,
      which we basically don't check for at all right now). Remove duplicate
      block_error() function, which fixed that through truncation. Remove
      old (incompatible) mmx/sse2 block_error SIMD versions and replace with
      a new one that returns a 64bit value.
      
      Encoding time of first 50 frames of bus @ 1500kbps goes from 3min29 to
      3min23, i.e. a 3% overall speedup.
      
      Change-Id: Ib71ac5508b5ee8a80f1753cd85d72df1629abe68
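A plain C sketch of a block error with a 64-bit return value is shown below. It only illustrates why the wider accumulator avoids overflow; the function name and signature are assumptions and do not reproduce vp9_block_error() or its SIMD version.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum of squared differences between original and dequantized coefficients.
 * Each product is widened to 64 bits before it is added, so neither a single
 * large difference nor a large block can overflow the accumulator. */
int64_t block_error_sketch(const int16_t *coeff, const int16_t *dqcoeff,
                           size_t n) {
  int64_t error = 0;
  for (size_t i = 0; i < n; ++i) {
    const int32_t diff = (int32_t)coeff[i] - (int32_t)dqcoeff[i];
    error += (int64_t)diff * diff;
  }
  return error;
}
```

With 16-bit inputs a single squared difference can already exceed the range of a signed 32-bit integer, which is why a 32-bit return value needs truncation or clamping.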
    • rename variables to avoid build error in MSVC · ee07a261
      Yaowu Xu authored
      Change-Id: I7960178c95c54d5c4497e44cfc8c493566294b34
  9. 20 Jun, 2013 2 commits
    • Improving model rd with variance and quant step · 7947a33d
      Deb Mukherjee authored
      Improves the rd modeling function and implements it using interpolation
      from a table, which is a little faster. Also uses sse as input to the
      modeling function rather than var, since there is no dc prediction
      used and as a result the sse works a little better.
      
      derfraw300: +0.05%
      Speedup: ~1%
      
      Change-Id: I151353c6451e0e8fe3ae18ab9842f8f67e5151ff
    • convert all speed things to speed features · 1f94b976
      Jim Bankoski authored
      Change-Id: Ie24489a4d39f3e53e816eeebf75a1c9c7d94515a
  10. 19 Jun, 2013 1 commit
  11. 14 Jun, 2013 1 commit
  12. 11 Jun, 2013 1 commit
  13. 10 Jun, 2013 4 commits
    • Fix use of get_uv_tx_size in loopfilter · 717d744a
      John Koleszar authored
      Change the argument of get_uv_tx_size() to be an MBMI pointer, so that the
      correct column's MBMI can be passed to the function.
      
      Change-Id: Ied6b8ec33b77cdd353119e8fd2d157811815fc98
    • Rd check on segment level reference mode. · de6ec27d
      Paul Wilkins authored
      Do not allow the rd code to check compound modes if
      a segment level reference frame is selected.
      
      Change-Id: I95f0c57789e0eaceed7caf227e94b4ba3130a06c
    • Allow non-zeromv if ref_frame=intra with segmentation skip/ref enabled. · b12a8dac
      Ronald S. Bultje authored
      Change-Id: Ib5a95bb6ab643b276df3faa9bf99595e4a69ff18
    • Fixed point reference picture scaling · 86bb6df0
      Tero Rintaluoma authored
      Fixed point scaling factors are calculated once for each
      reference frame by using integer division. Otherwise fixed point
      scaling routines are used in all scaling calculations. This makes it
      possible to calculate fixed point scaling factors on device driver
      software and pass them to hardware and thus avoid division on hardware.
      
      TODO:
       - Missing check for maximum frame dimensions
         (currently scaling uses 14 bits)
       - Missing check for maximum scaling ratio
         (upscaling 16:1, downscaling 2:1)
      
      Problems:
       - Straightforward fixed point implementation can cause error +-1
         compared to integer division (i.e. in x_step_q4). Should only
         be an issue for frames larger than 16k.
      
      Change-Id: I3cf4dabd610a4dc18da3bdb31ae244ebaf5d579c
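The scheme described in this commit (one integer division per reference frame, then multiply-and-shift everywhere else) can be sketched as below. The 14-bit precision follows the TODO note above; the macro and function names are illustrative assumptions, not the names used in the codec.

```c
#include <stdint.h>

#define SCALE_SHIFT_SKETCH 14 /* assumed precision, per the "14 bits" note */

/* Computed once per reference frame; this is the only integer division. */
int fixed_scale_factor(int other_size, int this_size) {
  return (int)(((int64_t)other_size << SCALE_SHIFT_SKETCH) / this_size);
}

/* Per-value scaling uses a multiply and a shift, with no division. */
int apply_scale(int val, int scale_fp) {
  return (int)(((int64_t)val * scale_fp) >> SCALE_SHIFT_SKETCH);
}
```

For an exact 2:1 ratio the factor is 32768 and apply_scale(100, 32768) gives 200. For a ratio that is not exactly representable, such as 3:7, apply_scale(7, fixed_scale_factor(3, 7)) gives 2 where direct integer division (7 * 3 / 7) gives 3, which is the off-by-one discrepancy listed under "Problems".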
  14. 07 Jun, 2013 4 commits
    • Coding tx-size selection by use of spatial context · 21401942
      Deb Mukherjee authored
      Adds coding of transform size within a frame by use of context
      of transform sizes selected in left and above blocks.
      
      Also incorporates code for generating stats.
      
      TODO: generate and incorporate new default stats
      
      Change-Id: I6a7af099f6ad61d448521d9a51167aedaf638ed6
    • Change to segment ref frame feature. · 340c7a48
      Paul Wilkins authored
      Simplify feature to only support a single reference frame
      instead of a mask.
      
      Change-Id: I5dd3a98c7a224aafb35708850ab82e2f220e68fb
    • Coding updates for tx-size selection · 3ee1a21a
      Deb Mukherjee authored
      Changes to the coding of transform sizes, along with forward
      and backward probability updates.
      
      Results:
      derf300: +0.241%
      
      Context based coding of transform sizes will be in a separate
      patch.
      
      Change-Id: I97241d60a926f014fee2de21fa4446ca56495756
    • Change ref frame coding. · 6ef805eb
      Ronald S. Bultje authored
      Code intra/inter, then comp/single, then the ref frame selection.
      Use contextualization for all steps. Don't code two past frames
      in comp pred mode.
      
      Change-Id: I4639a78cd5cccb283023265dbcc07898c3e7cf95
  15. 06 Jun, 2013 7 commits
    • New intra mode and partitioning probabilities. · ad343687
      Ronald S. Bultje authored
      Split partition probabilities between keyframes and non-keyframes,
      since they are fairly different. Also have per-blocksize interframe
      y intramode probabilities, since these vary heavily between different
      blocksizes.
      
      Lastly, replace default probabilities for partitioning and intra modes
      with new ones generated from current codec. Replace counts with actual
      probabilities also.
      
      Change-Id: I77ca996e25e4a28e03bdbc542f27a3e64ca1234f
    • Bug fix in rd_pick_inter_mode_sb_ · d03e974f
      Jingning Han authored
      Fix the calculation of step size in height.
      
      Change-Id: I0e0c0175f141f5a41214ae51cef233d13942d3c5
    • signs reverted · b4c4f648
      Jim Bankoski authored
      Change-Id: Ieface458c83eb6e7ee95595d9fc662f372117c9a
    • Rd thresholds change with block size. · c3316c2b
      Paul Wilkins authored
      Added structures to support independent rd thresholds
      for different block sizes (and set experimental block
      size correction factors).
      
      Added structure to allow dynamic adaptation of thresholds
      on a per mode and per block size basis depending on how often
      the mode/block size combination is seen (currently fixed factor).
      
      Removed some unused variables.
      
      TODO
      - Adaptation of thresholds based on how often each mode is chosen.
      - The baseline mode values could also be adjusted based on
        the block size (e.g. for a particular intra mode use a low threshold
        for 4x4 prediction blocks but a relatively high value for 64x64).
      
      Change-Id: Iddee65ff3324ee309815ae7c1c5a8584720e7568
    • Turn off compound inter search refinement for good quality. · c880e02f
      Paul Wilkins authored
      Turn this feature off for some modes in "good" quality.
      
      Change-Id: I3f262d62cca8f01736b977af1465291e8be29f0a
    • don't tokenize & encode tokens for blocks in UMV · 5a88271b
      Jim Bankoski authored
      This avoids encoding tokens for blocks that are entirely
      in the UMV border. This changes the bitstream.
      
      Change-Id: I32b4df46ac8a990d0c37cee92fd34f8ddd4fb6c9
    • Fix UV intra coding rd loop · f04b1548
      Jingning Han authored
      This commit makes the coding/reconstruction operations of the intra
      coding rate-distortion loop for UV components consistent with those
      of the encoding process.
      
      key frame coding gains:
      derf:   0.11%
      stdhd:  0.42%
      
      Change-Id: I8d49f83924a320e3689ef2d60096c49d7f0c7a40