  1. 26 Jun, 2013 3 commits
    • Auto adapt step size feature. · 9f3ab834
      Paul Wilkins authored
      Also tweaks to other features and experiments with
      what is on and off at different speed settings.
      Change-Id: I3e1d0be0d195216bf17c2ac5df67f34ce0b306b2
    • Start adaptive threshold for each mode at max. · 689957e3
      Paul Wilkins authored
      Each frame we reset all adaptive thresholds to MAX
      rather than base. As modes are picked their thresholds
      drop down.
      Change-Id: Ia37f03a73003c2d9bfcda57edea07205e9a0e5e8
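The per-frame reset described in 689957e3 might look roughly like the sketch below. The struct, the function names, the mode count, and the decay rule (dropping by one eighth per pick, floored at the base value) are all assumptions made for illustration; the commit message only says that thresholds start at MAX and drop as modes are picked.

```c
#include <limits.h>

#define NUM_MODES 4 /* illustrative mode count, not the encoder's */

typedef struct {
  int base[NUM_MODES];     /* per-mode baseline thresholds */
  int adaptive[NUM_MODES]; /* thresholds adapted during the frame */
} ModeThresholds;

/* At the start of each frame, every adaptive threshold starts at MAX
 * instead of at its base value. */
static void reset_for_frame(ModeThresholds *t) {
  for (int i = 0; i < NUM_MODES; ++i) t->adaptive[i] = INT_MAX;
}

/* Assumed decay rule: each time a mode is picked, its threshold drops by
 * one eighth, but never below the mode's base threshold. */
static void on_mode_picked(ModeThresholds *t, int mode) {
  int next = t->adaptive[mode] - (t->adaptive[mode] >> 3);
  t->adaptive[mode] = next < t->base[mode] ? t->base[mode] : next;
}
```

With this shape, modes that are never picked keep the MAX threshold for the whole frame, while frequently picked modes converge on their base value.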
    • Change meaning of cpi->sf.first_step and rename. · e606cac0
      Paul Wilkins authored
      Renamed cpi->sf.first_step to cpi->sf.reduce_first_step_size
      and changed its meaning such that it is a delta applied to
      reduce the default first step size (>> x) in the motion search
      rather than an absolute value.
      The default first step size is already changed according to the image
      dimensions (smaller for smaller images). cpi->sf.reduce_first_step_size
      now applies a further correction from the default.
      Change-Id: Ia94e08bc24c67b604831f980909af7e982fcd16d
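The delta semantics described in e606cac0 can be sketched as follows. Only reduce_first_step_size comes from the commit message; MAX_FIRST_STEP, default_step_shift(), first_step_size(), and the exact way the default is derived from the image dimensions are hypothetical and shown only to make the idea concrete.

```c
/* Assumed upper bound on the first step, in full pixels (illustrative). */
#define MAX_FIRST_STEP 1024

/* Hypothetical helper: derive a default shift from the smaller image
 * dimension, so that smaller images start with a smaller first step. */
static int default_step_shift(int width, int height) {
  int min_dim = width < height ? width : height;
  int shift = 0;
  if (min_dim <= 0) return 0;
  while ((min_dim << shift) < MAX_FIRST_STEP) ++shift;
  return shift > 0 ? shift - 1 : 0;
}

/* reduce_first_step_size is a delta: it is added to the default shift
 * rather than replacing it with an absolute value. */
static int first_step_size(int width, int height,
                           int reduce_first_step_size) {
  int shift = default_step_shift(width, height) + reduce_first_step_size;
  return MAX_FIRST_STEP >> shift;
}
```

Under these assumptions a 352x288 frame would start at a step of 512 with a delta of 0 and at 256 with a delta of 1, while a 1920x1080 frame with a delta of 0 keeps the full 1024.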
  2. 25 Jun, 2013 10 commits
    • Refactor intra predictor block · d19ea386
      Jingning Han authored
      Remove vp9_intra4x4_predict(). Use the common intra prediction
      function for all block sizes.
      Change-Id: Ibd19d51dfa3da8bbdfb79ddeb81530b2e2089560
    • Renaming "nmv" to "mv". · 6fb10f2d
      Dmitry Kovalev authored
      Change-Id: I8299f55c3b930221e52c2237f2ddea65b94fd33b
    • Only do metrics on cropped (visible) area of picture. · 450c7b57
      Ronald S. Bultje authored
      The part where we align it by 8 or 16 is an implementation detail that
      shouldn't matter to the outside world.
      Change-Id: I9edd6f08b51b31c839c0ea91f767640bccb08d53
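As a rough illustration of 450c7b57, the sketch below accumulates the sum of squared errors over the visible picture only, while still using the stride of the aligned buffer to step between rows. The function name and signature are hypothetical and are not taken from the codebase.

```c
#include <stdint.h>

/* Sum of squared errors over the visible width x height only. The strides
 * describe the (possibly 8- or 16-aligned) buffers, so padding columns and
 * rows are stepped over but never measured. */
static uint64_t sse_visible(const uint8_t *a, int a_stride,
                            const uint8_t *b, int b_stride,
                            int visible_w, int visible_h) {
  uint64_t sse = 0;
  for (int y = 0; y < visible_h; ++y) {
    for (int x = 0; x < visible_w; ++x) {
      const int d = a[y * a_stride + x] - b[y * b_stride + x];
      sse += (uint64_t)(d * d);
    }
  }
  return sse;
}
```

Differences that fall only in the alignment padding then have no effect on the reported metric.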
    • Don't skip right/bottom border pixels in SSIM calculations. · 44f349df
      Ronald S. Bultje authored
      Change-Id: I75acb55ade54bef6ad7703ed5e691581fa2f8fe1
    • Add averaging-SAD functions for 8-point comp-inter motion search. · c24d9223
      Ronald S. Bultje authored
      Makes first 50 frames of bus @ 1500kbps encode from 3min22.7 to 3min18.2,
      i.e. 2.3% faster. In addition, use the sub_pixel_avg functions to calc
      the variance of the averaging predictor. This is slightly suboptimal
      because the function is subpixel-position-aware, but it will (at least
      for the SSE2 version) not actually use a bilinear filter for a full-pixel
      position, thus leading to approximately the same performance compared to
      if we implemented an actual average-aware full-pixel variance function.
      That gains another 0.3 seconds (i.e. encode time goes to 3min17.4), thus
      leading to a total gain of 2.7%.
      Change-Id: I3f059d2b04243921868cfed2568d4fa65d7b5acd
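A plain C reference for the averaging SAD in c24d9223 might look like the sketch below: the candidate reference block is averaged with a second predictor (with rounding) before the absolute differences against the source are summed. The name sad_avg, its signature, and the assumption that the second predictor is stored contiguously with a stride equal to the block width are illustrative.

```c
#include <stdint.h>
#include <stdlib.h>

/* SAD between the source block and the rounded average of the reference
 * block and a second predictor. second_pred is assumed to be contiguous,
 * i.e. its stride equals the block width. */
static unsigned int sad_avg(const uint8_t *src, int src_stride,
                            const uint8_t *ref, int ref_stride,
                            const uint8_t *second_pred,
                            int width, int height) {
  unsigned int sad = 0;
  for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x) {
      const int avg =
          (ref[y * ref_stride + x] + second_pred[y * width + x] + 1) >> 1;
      sad += (unsigned int)abs(src[y * src_stride + x] - avg);
    }
  }
  return sad;
}
```

Folding the average into the SAD loop avoids building the compound predictor in a separate buffer for each candidate position, which is presumably where the reported speedup comes from.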
    • Tune the rounding operations in 8x8 ADST/DCT sse2 · 0084e61d
      Jingning Han authored
      Improve the round-trip precision to meet the unit test settings.
      Change-Id: I303febae56b4b990ea3798b8ebed94c0510ecf79
    • Removing unused code. · 87ee34aa
      Dmitry Kovalev authored
      Removing block index (ib) parameter from get_tx_type_{8x8, 16x16}
      Change-Id: Ia213335aae7a7cb027f97b9cc9b04519840250f1
    • Add 8x8 dct/adst unit tests · ab362621
      Jingning Han authored
      This commit enables 8x8 DCT and hybrid transform unit tests. It
      also tunes the forward hybrid transform rounding operations for
      more precise round-trip performance.
      Change-Id: If05c1ce59d75d641b9c6c91527d02d3a6ef498c3
    • Use aligned buffer operations in 8x8/16x16 2D-DCT · 82d504b5
      Jingning Han authored
      This reduces 16x16 2D-DCT runtime from 865 cycles to 837 cycles.
      Change-Id: I137758b81cd127b936175284310e81378db64552
    • Enable sse2 implementation of 8x8 ADST/DCT · a32a086d
      Jingning Han authored
      This commit makes use of the butterfly structure to enable the sse2
      version implementation of 8x8 ADST/DCT hybrid transform coding.
      The runtime of hybrid transform module goes down from 1170 cycles
      to 245 cycles. Overall speed-up around 1.5%.
      Change-Id: Ic808ffd21ece8a9d0410d8c0243d7b6c28ac3b3f
  3. 24 Jun, 2013 4 commits
  4. 21 Jun, 2013 8 commits
  5. 20 Jun, 2013 12 commits
    • SSE2/SSSE3 optimizations and unit test for sub_pixel_avg_variance(). · 1e6a32f1
      Ronald S. Bultje authored
      Encoding of bus @ 1500kbps (first 50 frames) goes from 3min57 to
      3min35, i.e. approximately a 10.5% speedup. Note that the SIMD versions
      which use a bilinear filter (x_offset & 7 || y_offset & 7) aren't
      perfectly interleaved, and can probably be improved further in the
      future. I've marked this with a few TODOs/FIXMEs in the code.
      Change-Id: I5c9e900c0f0d32e431a50fecae213b510b2549f9
    • Improving model rd with variance and quant step · 7947a33d
      Deb Mukherjee authored
      Improves the rd modeling function and implements them using interpolation
      from a table which is a little faster. Also uses sse as input to the
      modeling function rather than var - since there is no dc prediction
      used and as a result the sse works a little better.
      derfraw300: +0.05%
      Speedup: ~1%
      Change-Id: I151353c6451e0e8fe3ae18ab9842f8f67e5151ff
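The table-based evaluation mentioned in 7947a33d can be illustrated with a generic linear-interpolation lookup. The table values, the sample spacing, and the names below are invented for the example; the commit message does not say which quantity is tabulated or how the table is indexed.

```c
/* Hypothetical table: model output sampled every 8 input units
 * (x = 0, 8, 16, 24, 32). Values are made up for illustration. */
#define STEP_LOG2 3
#define TABLE_SIZE 5
static const int kModelTable[TABLE_SIZE] = {0, 100, 180, 240, 280};

/* Evaluate the model at x by linear interpolation between the two nearest
 * table entries, clamping at both ends of the table. */
static int interp_table(int x) {
  const int max_x = (TABLE_SIZE - 1) << STEP_LOG2;
  if (x <= 0) return kModelTable[0];
  if (x >= max_x) return kModelTable[TABLE_SIZE - 1];
  const int idx = x >> STEP_LOG2;               /* lower table entry */
  const int frac = x & ((1 << STEP_LOG2) - 1);  /* position between entries */
  const int delta = kModelTable[idx + 1] - kModelTable[idx];
  return kModelTable[idx] + (delta * frac) / (1 << STEP_LOG2);
}
```

A lookup like this replaces a per-call evaluation of the modeling function with one shift, one mask, and one multiply, which is consistent with the small speedup the commit reports.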
    • adds force partitioning greater than or less than block size · 9f2a1ae2
      Jim Bankoski authored
      adds a new speed feature to force partitioning to be greater than
      or less than a certain size
      Change-Id: I8c048eeeef93700ae822eccf98f8751a45b2e7d0
    • adds a set partitioning to speed features · 18bdf708
      Jim Bankoski authored
      this feature lets you set a partitioning size to be used by the entire
      Change-Id: I208a4c8c701375cbb054418266f677768b6f8f06
    • partition by variance using var from last frame · 476d73d2
      Jim Bankoski authored
      This uses variance to decide partition splits. Variance is calculated
      using the nearest mv, always from the last reference frame.
      Change-Id: Idd015b4a9aa3bc82591759eac239680c07496896
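A minimal sketch of the variance-driven split decision from 476d73d2 is shown below. It assumes the prediction block has already been fetched from the last reference frame using the nearest mv; the function names, the split rule (a single comparison), and the threshold are hypothetical.

```c
#include <stdint.h>

/* Variance of the difference between the source block and its prediction,
 * computed as sse - sum^2 / N (not normalized by the block size). */
static unsigned int block_variance(const uint8_t *src, int src_stride,
                                   const uint8_t *pred, int pred_stride,
                                   int w, int h) {
  int64_t sum = 0;
  uint64_t sse = 0;
  for (int y = 0; y < h; ++y) {
    for (int x = 0; x < w; ++x) {
      const int d = src[y * src_stride + x] - pred[y * pred_stride + x];
      sum += d;
      sse += (uint64_t)(d * d);
    }
  }
  return (unsigned int)(sse - (uint64_t)((sum * sum) / (w * h)));
}

/* Assumed rule: split the block when the variance exceeds a threshold. */
static int should_split(const uint8_t *src, int src_stride,
                        const uint8_t *pred, int pred_stride,
                        int w, int h, unsigned int threshold) {
  return block_variance(src, src_stride, pred, pred_stride, w, h) > threshold;
}
```

A uniform brightness offset between source and prediction yields zero variance here, so only structured mismatch pushes a block toward a split.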
    • convert all speed things to speed features · 1f94b976
      Jim Bankoski authored
      Change-Id: Ie24489a4d39f3e53e816eeebf75a1c9c7d94515a
    • new partition via variance · 727fa7b1
      Jim Bankoski authored
      Change-Id: Ideee45cad8b38087c509cd404484728e85d0c427
    • fix to set up new speed feature · 0fad6a9d
      Jim Bankoski authored
      This uses the speed feature functionality for code.
      Change-Id: I9cd16c0c5f98520ae27ebba81aa2c178546587f8
    • don't copy partitions for key frames or altrefs · df2314cf
      Jim Bankoski authored
      force us to go through slow partitioning for keyframes, altref and
      Change-Id: I1a286361bf74083e71973575a7296be46eb98742
    • Implement sse2 and ssse3 versions for all sub_pixel_variance sizes. · 8fb6c581
      Ronald S. Bultje authored
      Overall speedup around 5% (bus @ 1500kbps first 50 frames 4min10 ->
      3min58). Specific changes to timings for each function compared to
      original assembly-optimized versions (or just new version timings if
      no previous assembly-optimized version was available):
      sse2   4x4:    99 ->   82 cycles
      sse2   4x8:           128 cycles
      sse2   8x4:           121 cycles
      sse2   8x8:   149 ->  129 cycles
      sse2   8x16:  235 ->  245 cycles (?)
      sse2  16x8:   269 ->  203 cycles
      sse2  16x16:  441 ->  349 cycles
      sse2  16x32:          641 cycles
      sse2  32x16:          643 cycles
      sse2  32x32: 1733 -> 1154 cycles
      sse2  32x64:         2247 cycles
      sse2  64x32:         2323 cycles
      sse2  64x64: 6984 -> 4442 cycles
      ssse3  4x4:           100 cycles (?)
      ssse3  4x8:           103 cycles
      ssse3  8x4:            71 cycles
      ssse3  8x8:           147 cycles
      ssse3  8x16:          158 cycles
      ssse3 16x8:   188 ->  162 cycles
      ssse3 16x16:  316 ->  273 cycles
      ssse3 16x32:          535 cycles
      ssse3 32x16:          564 cycles
      ssse3 32x32:          973 cycles
      ssse3 32x64:         1930 cycles
      ssse3 64x32:         1922 cycles
      ssse3 64x64:         3760 cycles
      Change-Id: I81ff6fe51daf35a40d19785167004664d7e0c59d
    • disable speed > 1 speed corrections in firstpass · f954490b
      Jim Bankoski authored
      need to rework these
      Change-Id: I17dc2c88d2faadd2f8fb117c52c25f04ea2e9856
    • copy partitioning from last frame · f033b44e
      Jim Bankoski authored
      Change-Id: I26e80ede80cb4389378a95afa95d229092a9859a
  6. 19 Jun, 2013 3 commits
    • Add two-pass quantization · b5bf7b13
      Yunqing Wang authored
      Optimized the quantization function by making it a two-pass
      process. The first pass quickly checks the transform
      coefficients against the base ZBIN and keeps only the
      coefficients that need quantization. A skip
      check is added: if all coefficients are within the base ZBIN, no
      quantization is needed. The second pass is the actual quantization
      pass, which processes only the coefficient subset determined
      in the first pass. This reduces the computation. Furthermore, an
      alternative method is used for large transform sizes, which often
      have sparse nonzero quantized coefficients.
      Overall, the encoder speedup is about 4%. The quantization function
      itself gets 20% faster.
      Change-Id: I3a9dd0da6db030260b6d9c314a9fa48ecae89f22
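The two-pass structure described in b5bf7b13 can be sketched as follows. The function name, the MAX_COEFFS bound, and especially the quantization step itself (a plain integer division standing in for the encoder's real multiply, shift, and rounding) are simplifying assumptions; the alternative method for large transforms is not shown.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_COEFFS 1024 /* assumed upper bound, e.g. a 32x32 transform */

/* Returns the number of nonzero quantized coefficients, or -1 on bad input.
 * qcoeff must have room for n entries. */
static int quantize_two_pass(const int16_t *coeff, int n, int zbin, int step,
                             int16_t *qcoeff) {
  int candidates[MAX_COEFFS];
  int num_candidates = 0;
  int nonzero = 0;

  if (n < 0 || n > MAX_COEFFS || step <= 0) return -1;
  memset(qcoeff, 0, (size_t)n * sizeof(*qcoeff));

  /* Pass 1: cheap check against the base zero bin; remember only the
   * positions that can produce a nonzero result. */
  for (int i = 0; i < n; ++i) {
    if (abs(coeff[i]) >= zbin) candidates[num_candidates++] = i;
  }

  /* Skip check: everything fell inside the zero bin. */
  if (num_candidates == 0) return 0;

  /* Pass 2: quantize only the surviving subset. */
  for (int k = 0; k < num_candidates; ++k) {
    const int i = candidates[k];
    const int mag = abs(coeff[i]) / step; /* simplified quantizer */
    qcoeff[i] = (int16_t)(coeff[i] < 0 ? -mag : mag);
    if (mag != 0) ++nonzero;
  }
  return nonzero;
}
```

Because most coefficients in a typical block fall inside the zero bin, the more expensive second pass touches only a small subset, and fully skippable blocks return right after the first pass.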
    • Remove unnecessary copying of probs. · 12180c83
      Yaowu Xu authored
      Change-Id: Ic924f07c6ab0c929c6cdf11880d3c625806e272c
    • Renaming 'nmv' to 'mv' for several functions. · 87e1fa76
      Dmitry Kovalev authored
      Change-Id: I183a38997a9d01e4a1b869e92509f6915216fa09