1. 02 Jul, 2013 4 commits
  2. 01 Jul, 2013 5 commits
    • Update quantize SSSE3 SIMD to cover 32x32 transform case also. · c8defcfd
      Ronald S. Bultje authored
      Encode time for the first 50 frames of bus (speed 0) @ 1500kbps goes
      from 2min14.4 to 2min10.1, i.e. a 3.3% overall speed increase.
      
      Change-Id: I3699580e74ec26c7d24e03681bc47ba25ee1ee87
    • Quantize (64-bit only, for now) SSSE3 SIMD. · 7353ceab
      Ronald S. Bultje authored
      Total encoding time for first 50 frames of bus (speed 0) @ 1500kbps
      goes from 2min34.8 to 2min14.4, i.e. a 15.2% overall speedup. The code
      is x86-64 only; it needs some minor modifications to be 32bit
      compatible, because it uses 15 xmm registers, whereas 32bit only has 8.
      
      Change-Id: I2df53770c2e850813ffa713e1a91b45b0082b904
    • Removing vp9_modecont.{h, c}. · 2ab3bc88
      Dmitry Kovalev authored
      Moving vp9_default_inter_mode_probs array to vp9_entropymode.c.
      
      Change-Id: I88ebda86ccc07f2a43c6c01d4b37898214cfb6de
    • fix a mismatch in cpuused 2 · 632289b3
      Yaowu Xu authored
      Change-Id: I921c9faba6386535aaf717a54301dd346a9b8540
    • New motion threshold factor - speed feature. · 13772781
      Paul Wilkins authored
      Added a speed feature that focuses only on thresholds
      for new motion modes.
      
      Moved sf->comp_inter_joint_search_thresh into speed
      1.  This has a ~+0.4% impact on quality at speed 0,
      which is our quality reference baseline.
      
      Slight adjustment to baseline thresholds.
      
      Change-Id: I7ebf104f1fe29af77ed4837b2e84be065621bbe5
  3. 29 Jun, 2013 4 commits
  4. 28 Jun, 2013 9 commits
    • Fix switch statement in 8x8 transform · 9def7f72
      Jingning Han authored
      Change-Id: I7c46354c4983feb5f6202c3ab4a1d9534da7e30f
    • Inline vp9_get_coef_context() (and remove vp9_ prefix). · d00b8e5f
      Ronald S. Bultje authored
      Makes cost_coeffs() a lot faster:
      4x4: 236 -> 181 cycles
      8x8: 888 -> 588 cycles
      16x16: 3550 -> 2483 cycles
      32x32: 17392 -> 12010 cycles
      
      Total encode time of first 50 frames of bus (speed 0) @ 1500kbps goes
      from 2min51.6 to 2min43.9, i.e. 4.7% overall speedup.
      
      Change-Id: I16b8d595946393c8dc661599550b3f37f5718896
    • Removing CONFIG_DEBUG checks on assertions. · 8e6ce6bb
      Dmitry Kovalev authored
      Adding CHECK_MEM_ERROR macro to vp9_common.h and removing two duplicated
      ones from vp9_onyx_int.h and vp9_onyxd_int.h.
      
      Change-Id: I916afec61b3019f18193135dac7c35ed0f89b8b6
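
A shared allocation-check macro of this kind might look like the sketch below. This is not the definition from vp9_common.h: the `error_info` struct, `report_mem_error`, `alloc_buffer`, and the name `CHECK_MEM_ERROR_SKETCH` are hypothetical stand-ins chosen for illustration.

```c
#include <stdlib.h>

/* Hypothetical error record; a real codec keeps its own error state. */
struct error_info {
  int has_error;
  const char *detail;
};

/* Hypothetical reporter: records the failure instead of aborting. */
static void report_mem_error(struct error_info *err, const char *what) {
  err->has_error = 1;
  err->detail = what;
}

/* Assign the result of expr to lval and report an error if it is NULL.
 * The do/while(0) wrapper makes the macro behave like one statement. */
#define CHECK_MEM_ERROR_SKETCH(err, lval, expr)                          \
  do {                                                                   \
    (lval) = (expr);                                                     \
    if (!(lval)) report_mem_error((err), "Failed to allocate " #lval);   \
  } while (0)

/* Usage: allocate a zeroed buffer and check it in a single statement. */
static int *alloc_buffer(struct error_info *err, size_t n) {
  int *buf;
  CHECK_MEM_ERROR_SKETCH(err, buf, (int *)calloc(n, sizeof(*buf)));
  return buf;
}
```

Keeping one definition in a common header means the encoder and decoder report allocation failures the same way.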
    • Minor change to prevent one level of dereference in cost_coeffs(). · e3ce2b2a
      Ronald S. Bultje authored
      4x4: 234 -> 236 cycles
      8x8: 878 -> 888 cycles
      16x16: 3664 -> 3550 cycles
      32x32: 18134 -> 17392 cycles
      
      Change-Id: I37a51bfbb0060a3a54f09c6045c14a989811ed78
    • Some minor optimizations for cost_coeffs(). · 91d223bd
      Ronald S. Bultje authored
      Cycle timings for first 3 frames of bus (speed 0) at 1500kbps:
      4x4: 298 -> 234 cycles
      8x8: 1227 -> 878 cycles
      16x16: 4906 -> 3664 cycles
      32x32: 23426 -> 18134 cycles
      
      Total encode time of first 50 frames of bus @ 1500kbps (speed 0) goes
      from 3min0.7 to 2min51.6 seconds, i.e. 5.3% faster.
      
      Change-Id: I68a0e1b530b0563b84a67342cca4b45146077e95
    • Make coefficient skip condition an explicit RD choice. · af660715
      Ronald S. Bultje authored
      This commit replaces zrun_zbin_boost, a method of biasing non-zero
      coefficients following runs of zero-coefficients to be rounded towards
      zero, with an explicit skip-block choice in the RD loop.
      
      The logic is basically that if individual coefficients should be rounded
      towards zero (from a RD point of view), the trellis/optimize loop should
      take care of it. If whole blocks should be zero (from a RD point of
      view), a single RD check is much more efficient than a complete
      serialization of the quantization loop.
      
      Quality change: derf +0.5% psnr, +1.6% ssim; yt +0.6% psnr, +1.1% ssim.
      SIMD for quantize will follow in a separate patch. Results for other
      test sets pending.
      
      Change-Id: Ife5fa641163ac5150ac428011e87188f1937c1f4
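
The explicit skip-block choice can be pictured as one cost comparison per block. The sketch below is illustrative only: `rd_stats`, `rd_cost`, `choose_skip`, and the 1/256 fixed-point lambda are hypothetical and do not reproduce the encoder's actual RD code.

```c
#include <stdint.h>

/* Rate (hypothetical bit units) and distortion (e.g. SSE) of one option. */
typedef struct {
  int64_t rate;
  int64_t dist;
} rd_stats;

/* Lagrangian cost J = D + lambda * R, with lambda in 1/256 fixed point. */
static int64_t rd_cost(int64_t lambda_q8, rd_stats s) {
  return s.dist + ((lambda_q8 * s.rate + 128) >> 8);
}

/* Returns 1 if zeroing the whole block costs less than coding its
 * coefficients, 0 otherwise. One comparison replaces per-coefficient
 * biasing inside the quantization loop. */
static int choose_skip(int64_t lambda_q8, rd_stats coded, rd_stats skipped) {
  return rd_cost(lambda_q8, skipped) < rd_cost(lambda_q8, coded);
}
```

Because the decision is made once per block, the quantizer itself has no data-dependent bias between coefficients, which is what makes it a better fit for SIMD.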
    • Minor cleanups · 8b9eea0a
      Yaowu Xu authored
      Change-Id: I379617c1c731a686b3f7e032b8805860c1055b12
    • Optimize partition search order · 1374a06b
      Yaowu Xu authored
      This commit changes the partition search order so that rectangular
      partitions are checked after square partitions. It also adds a speed
      feature that skips the rectangular partition check when NONE is better
      than SPLIT in the RD sense.
      
      This feature speeds up the encoder by roughly 1.5x, at the following
      cost in compression:
      -0.91% on cif set
      -0.56% on stdhd set
      
      Change-Id: I0d2d06993041aa9ea9073fcc39c54f73a127dfa4
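
The reordering and the early-out can be sketched as follows. Only the names NONE and SPLIT and the notion of rectangular partitions come from the commit message; `pick_partition`, the `rd_fn` callback, the enum, and the table-driven example callback are hypothetical.

```c
#include <stdint.h>

typedef enum { PART_NONE, PART_SPLIT, PART_HORZ, PART_VERT } partition_t;

/* Hypothetical callback returning the RD cost of coding with partition p. */
typedef int64_t (*rd_fn)(partition_t p, void *ctx);

static partition_t pick_partition(rd_fn rd, void *ctx,
                                  int skip_rect_if_none_wins) {
  /* Square options are evaluated first. */
  const int64_t none_rd = rd(PART_NONE, ctx);
  const int64_t split_rd = rd(PART_SPLIT, ctx);
  partition_t best = none_rd <= split_rd ? PART_NONE : PART_SPLIT;
  int64_t best_rd = none_rd <= split_rd ? none_rd : split_rd;

  /* Speed feature: when NONE beats SPLIT, do not try rectangles at all. */
  if (skip_rect_if_none_wins && none_rd < split_rd) return best;

  /* Otherwise the rectangular options are evaluated last. */
  const int64_t horz_rd = rd(PART_HORZ, ctx);
  if (horz_rd < best_rd) { best = PART_HORZ; best_rd = horz_rd; }
  const int64_t vert_rd = rd(PART_VERT, ctx);
  if (vert_rd < best_rd) { best = PART_VERT; best_rd = vert_rd; }
  return best;
}

/* Example callback: looks up costs in a table and counts evaluations. */
typedef struct { int64_t cost[4]; int calls; } table_ctx;

static int64_t table_rd(partition_t p, void *ctx) {
  table_ctx *t = (table_ctx *)ctx;
  t->calls++;
  return t->cost[p];
}
```

With costs {NONE 100, SPLIT 120, HORZ 90, VERT 95}, enabling the feature stops after two evaluations and returns NONE, while disabling it evaluates all four and finds HORZ. That trade of fewer evaluations for an occasionally missed better partition is the kind of speed-versus-compression exchange the commit reports.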
    • Fix tile independence with both column tiling and static_thresh set. · fd4eed3b
      Ronald S. Bultje authored
      Change-Id: I0b2be0ec2c410a527f88b95a44f24ac967b2dac1
  5. 27 Jun, 2013 3 commits
    • Decoder's code cleanup. · 3231da0a
      Dmitry Kovalev authored
      Using vp9_set_pred_flag function instead of custom code, adding
      decode_tokens function which is now called from decode_atom,
      decode_sb_intra, and decode_sb.
      
      Change-Id: Ie163a7106c0241099da9c5fe03069bd71f9d9ff8
    • Inline quantize so idiv instruction gets removed from inner loop. · 7a049be6
      Ronald S. Bultje authored
      Encoding time of first 50 frames of bus @ 1500kbps (speed 0) goes from
      3min15.0 to 3min10.9, i.e. 2.1% faster overall.
      
      Change-Id: If592ee99be09bcd34a7c8498347f44e7305e982c
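
The percentage quoted with timings like these can be checked from the two wall-clock times. The helpers below are a sketch with hypothetical names; they report the gain relative to the new time (old/new - 1), which is one common way to state a speedup and is an assumption about the convention used here.

```c
/* Convert an "MminS.s" style timing to seconds. */
static double to_seconds(int minutes, double seconds) {
  return 60.0 * minutes + seconds;
}

/* Speedup in percent: how much more work per unit time the new build does. */
static double speedup_percent(double old_s, double new_s) {
  return 100.0 * (old_s / new_s - 1.0);
}
```

For 3min15.0 and 3min10.9, `speedup_percent(to_seconds(3, 15.0), to_seconds(3, 10.9))` gives about 2.1.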
    • Make intra predictor reference buffer configurable · 861cb06c
      Jingning Han authored
      This commit enables configurable reference buffer pointer for intra
      predictor. This allows later removal of spatial dependency between
      blocks inside a 64x64 superblock in the rate-distortion optimization
      loop.
      
      Change-Id: I02418c2077efe19adc86e046a6b49364a980f5b1
  6. 26 Jun, 2013 7 commits
  7. 25 Jun, 2013 8 commits