1. 26 Jun, 2013 3 commits
    • Auto adapt step size feature. · 9f3ab834
      Paul Wilkins authored
      Also tweaks to other features and experiments with
      what is on and off at different speed settings.
      
      Change-Id: I3e1d0be0d195216bf17c2ac5df67f34ce0b306b2
    • Start adaptive threshold for each mode at max. · 689957e3
      Paul Wilkins authored
      Each frame we reset all adaptive thresholds to MAX
      rather than base. As modes are picked their thresholds
      drop down.
      
      Change-Id: Ia37f03a73003c2d9bfcda57edea07205e9a0e5e8
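The reset-to-MAX behaviour described in this commit can be sketched as follows. This is only an illustration under stated assumptions: `ModeThresholds`, `NUM_MODES`, `THRESH_MAX` and the quarter-step decay are hypothetical and are not taken from the encoder source.

```c
/* Hypothetical sketch; names and constants are assumptions, not encoder code. */
#define NUM_MODES 4   /* assumed number of candidate modes */
#define THRESH_MAX 64 /* assumed ceiling for an adaptive threshold */

typedef struct {
  int base[NUM_MODES];     /* per-mode floor */
  int adaptive[NUM_MODES]; /* per-mode working threshold */
} ModeThresholds;

/* Start of frame: every mode begins at the ceiling rather than at its base. */
static void reset_thresholds_for_frame(ModeThresholds *t) {
  int m;
  for (m = 0; m < NUM_MODES; ++m) t->adaptive[m] = THRESH_MAX;
}

/* A mode was picked: lower its threshold by a quarter, never below its base. */
static void on_mode_picked(ModeThresholds *t, int mode) {
  t->adaptive[mode] -= t->adaptive[mode] >> 2;
  if (t->adaptive[mode] < t->base[mode]) t->adaptive[mode] = t->base[mode];
}
```

With this scheme, modes that are never chosen in a frame keep a high threshold, while frequently chosen modes decay towards their base value.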
    • Change meaning of cpi->sf.first_step and rename. · e606cac0
      Paul Wilkins authored
      Renamed cpi->sf.first_step to cpi->sf.reduce_first_step_size
      and changed its meaning such that it is a delta applied to
      reduce the default first step size (>> x) in the motion search
      rather than an absolute value.
      
      The default first step size is already changed according to the image
      dimensions (smaller for smaller images). cpi->sf.reduce_first_step_size
      now applies a further correction from the default.
      
      Change-Id: Ia94e08bc24c67b604831f980909af7e982fcd16d
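A minimal sketch of the "delta on top of a size-dependent default" idea from this commit. Only `reduce_first_step_size` comes from the commit message. The helper names, `MAX_FIRST_STEP`, and the dimension cut-offs are assumptions made for illustration.

```c
/* Hypothetical sketch; constants and cut-offs are assumptions. */
#define MAX_FIRST_STEP 1024 /* assumed largest first step size */
#define MAX_STEP_SHIFT 10   /* keeps the resulting step at 1 or more */

/* Assumed default: smaller images get a smaller first step (larger shift). */
static int default_step_shift(int width, int height) {
  int min_dim = width < height ? width : height;
  if (min_dim > 480) return 0;
  if (min_dim > 240) return 1;
  return 2;
}

/* reduce_first_step_size is a delta added to the default shift,
 * not an absolute step value. */
static int first_step_size(int width, int height, int reduce_first_step_size) {
  int shift = default_step_shift(width, height) + reduce_first_step_size;
  if (shift < 0) shift = 0;
  if (shift > MAX_STEP_SHIFT) shift = MAX_STEP_SHIFT;
  return MAX_FIRST_STEP >> shift;
}
```

For example, under these assumptions a 352x288 frame has a default shift of 1, so a delta of 2 gives a total shift of 3.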
  2. 25 Jun, 2013 10 commits
    • Refactor intra predictor block · d19ea386
      Jingning Han authored
      Remove vp9_intra4x4_predict(). Use the common intra prediction
      function for all block sizes.
      
      Change-Id: Ibd19d51dfa3da8bbdfb79ddeb81530b2e2089560
    • Renaming "nmv" to "mv". · 6fb10f2d
      Dmitry Kovalev authored
      Change-Id: I8299f55c3b930221e52c2237f2ddea65b94fd33b
    • Only do metrics on cropped (visible) area of picture. · 450c7b57
      Ronald S. Bultje authored
      The part where we align it by 8 or 16 is an implementation detail that
      shouldn't matter to the outside world.
      
      Change-Id: I9edd6f08b51b31c839c0ea91f767640bccb08d53
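To illustrate what measuring only the visible area means, here is a small sketch that sums squared error over the cropped width and height while the buffers keep their padded stride. The function name and signature are hypothetical and are not taken from the encoder.

```c
#include <stdint.h>

/* Hypothetical sketch: sum of squared error over the visible region only.
 * The strides may be larger than width because of 8/16-pixel alignment. */
static uint64_t sse_visible(const uint8_t *a, int a_stride,
                            const uint8_t *b, int b_stride,
                            int width, int height) {
  uint64_t sse = 0;
  int x, y;
  for (y = 0; y < height; ++y) {
    for (x = 0; x < width; ++x) {
      const int d = a[y * a_stride + x] - b[y * b_stride + x];
      sse += (uint64_t)(d * d);
    }
  }
  return sse;
}
```

Passing the aligned width instead of the visible width would fold padding pixels into the metric, which is the behaviour this commit moves away from.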
    • Don't skip right/bottom border pixels in SSIM calculations. · 44f349df
      Ronald S. Bultje authored
      Change-Id: I75acb55ade54bef6ad7703ed5e691581fa2f8fe1
    • Add averaging-SAD functions for 8-point comp-inter motion search. · c24d9223
      Ronald S. Bultje authored
      Makes first 50 frames of bus @ 1500kbps encode from 3min22.7 to 3min18.2,
      i.e. 2.3% faster. In addition, use the sub_pixel_avg functions to calc
      the variance of the averaging predictor. This is slightly suboptimal
      because the function is subpixel-position-aware, but it will (at least
      for the SSE2 version) not actually use a bilinear filter for a full-pixel
      position, thus leading to approximately the same performance compared to
      if we implemented an actual average-aware full-pixel variance function.
      That gains another 0.3 seconds (i.e. encode time goes to 3min17.4), thus
      leading to a total gain of 2.7%.
      
      Change-Id: I3f059d2b04243921868cfed2568d4fa65d7b5acd
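As a rough scalar illustration of an averaging SAD, the sketch below compares the source block against the rounded average of a reference block and a second predictor. The name `sad_avg`, the parameter layout, and the contiguous `second_pred` buffer are assumptions. The commit's actual functions are per-block-size and SIMD-optimized.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar sketch of an averaging SAD.
 * second_pred is assumed to be stored contiguously (stride == width). */
static unsigned int sad_avg(const uint8_t *src, int src_stride,
                            const uint8_t *ref, int ref_stride,
                            const uint8_t *second_pred,
                            int width, int height) {
  unsigned int sad = 0;
  int x, y;
  for (y = 0; y < height; ++y) {
    for (x = 0; x < width; ++x) {
      /* Rounded average of the two predictors. */
      const int avg =
          (ref[y * ref_stride + x] + second_pred[y * width + x] + 1) >> 1;
      sad += (unsigned int)abs(src[y * src_stride + x] - avg);
    }
  }
  return sad;
}
```

Folding the average into the SAD loop avoids building the compound predictor in a separate buffer before each comparison, which is one plausible source of the speedup reported above.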
    • Tune the rounding operations in 8x8 ADST/DCT sse2 · 0084e61d
      Jingning Han authored
      Improve the round-trip precision to meet the unit test settings.
      
      Change-Id: I303febae56b4b990ea3798b8ebed94c0510ecf79
    • Removing unused code. · 87ee34aa
      Dmitry Kovalev authored
      Removing block index (ib) parameter from get_tx_type_{8x8, 16x16}
      functions.
      
      Change-Id: Ia213335aae7a7cb027f97b9cc9b04519840250f1
    • Add 8x8 dct/adst unit tests · ab362621
      Jingning Han authored
      This commit enables 8x8 DCT and hybrid transform unit tests. It
      also tunes the forward hybrid transform rounding operations for
      more precise round-trip performance.
      
      Change-Id: If05c1ce59d75d641b9c6c91527d02d3a6ef498c3
    • Use aligned buffer operations in 8x8/16x16 2D-DCT · 82d504b5
      Jingning Han authored
      This reduces 16x16 2D-DCT runtime from 865 cycles to 837 cycles.
      
      Change-Id: I137758b81cd127b936175284310e81378db64552
    • Enable sse2 implementation of 8x8 ADST/DCT · a32a086d
      Jingning Han authored
      This commit makes use of the butterfly structure to enable the sse2
      version implementation of 8x8 ADST/DCT hybrid transform coding.
      
      The runtime of hybrid transform module goes down from 1170 cycles
      to 245 cycles. Overall speed-up around 1.5%.
      
      Change-Id: Ic808ffd21ece8a9d0410d8c0243d7b6c28ac3b3f
  3. 24 Jun, 2013 4 commits
  4. 21 Jun, 2013 8 commits
  5. 20 Jun, 2013 12 commits
    • SSE2/SSSE3 optimizations and unit test for sub_pixel_avg_variance(). · 1e6a32f1
      Ronald S. Bultje authored
      Encoding of bus @ 1500kbps (first 50 frames) goes from 3min57 to
      3min35, i.e. approximately a 10.5% speedup. Note that the SIMD versions
      which use a bilinear filter (x_offset & 7 || y_offset & 7) aren't
      perfectly interleaved, and can probably be improved further in the
      future. I've marked this with a few TODOs/FIXMEs in the code.
      
      Change-Id: I5c9e900c0f0d32e431a50fecae213b510b2549f9
    • Improving model rd with variance and quant step · 7947a33d
      Deb Mukherjee authored
      Improves the rd modeling function and implements them using interpolation
      from a table which is a little faster. Also uses sse as input to the
      modeling function rather than var - since there is no dc prediction
      used and as a result the sse works a little better.
      
      derfraw300: +0.05%
      Speedup: ~1%
      
      Change-Id: I151353c6451e0e8fe3ae18ab9842f8f67e5151ff
    • adds force partitioning greater than or less than block size · 9f2a1ae2
      Jim Bankoski authored
      adds a new speed feature to force partitioning to be greater than
      or less than a certain size
      
      Change-Id: I8c048eeeef93700ae822eccf98f8751a45b2e7d0
    • adds a set partitioning to speed features · 18bdf708
      Jim Bankoski authored
      this feature lets you set a partitioning size to be used by the entire
      frame.
      
      Change-Id: I208a4c8c701375cbb054418266f677768b6f8f06
    • partition by variance using var from last frame · 476d73d2
      Jim Bankoski authored
      This uses variance to decide partition splits. Variance is calculated
      using the nearest mv, always from the last ref frame.
      
      Change-Id: Idd015b4a9aa3bc82591759eac239680c07496896
    • convert all speed things to speed features · 1f94b976
      Jim Bankoski authored
      Change-Id: Ie24489a4d39f3e53e816eeebf75a1c9c7d94515a
    • new partition via variance · 727fa7b1
      Jim Bankoski authored
      Change-Id: Ideee45cad8b38087c509cd404484728e85d0c427
    • fix to set up new speed feature · 0fad6a9d
      Jim Bankoski authored
      This uses the speed feature functionality for code.
      
      Change-Id: I9cd16c0c5f98520ae27ebba81aa2c178546587f8
    • don't copy partitions for key frames or altrefs · df2314cf
      Jim Bankoski authored
      force us to go through slow partitioning for keyframes, altref and
      overlays.
      
      Change-Id: I1a286361bf74083e71973575a7296be46eb98742
    • Implement sse2 and ssse3 versions for all sub_pixel_variance sizes. · 8fb6c581
      Ronald S. Bultje authored
      Overall speedup around 5% (bus @ 1500kbps first 50 frames 4min10 ->
      3min58). Specific changes to timings for each function compared to
      original assembly-optimized versions (or just new version timings if
      no previous assembly-optimized version was available):
      
      sse2   4x4:    99 ->   82 cycles
      sse2   4x8:           128 cycles
      sse2   8x4:           121 cycles
      sse2   8x8:   149 ->  129 cycles
      sse2   8x16:  235 ->  245 cycles (?)
      sse2  16x8:   269 ->  203 cycles
      sse2  16x16:  441 ->  349 cycles
      sse2  16x32:          641 cycles
      sse2  32x16:          643 cycles
      sse2  32x32: 1733 -> 1154 cycles
      sse2  32x64:         2247 cycles
      sse2  64x32:         2323 cycles
      sse2  64x64: 6984 -> 4442 cycles
      
      ssse3  4x4:           100 cycles (?)
      ssse3  4x8:           103 cycles
      ssse3  8x4:            71 cycles
      ssse3  8x8:           147 cycles
      ssse3  8x16:          158 cycles
      ssse3 16x8:   188 ->  162 cycles
      ssse3 16x16:  316 ->  273 cycles
      ssse3 16x32:          535 cycles
      ssse3 32x16:          564 cycles
      ssse3 32x32:          973 cycles
      ssse3 32x64:         1930 cycles
      ssse3 64x32:         1922 cycles
      ssse3 64x64:         3760 cycles
      
      Change-Id: I81ff6fe51daf35a40d19785167004664d7e0c59d
    • disable speed > 1 speed corrections in firstpass · f954490b
      Jim Bankoski authored
      need to rework these
      
      Change-Id: I17dc2c88d2faadd2f8fb117c52c25f04ea2e9856
    • copy partitioning from last frame · f033b44e
      Jim Bankoski authored
      Change-Id: I26e80ede80cb4389378a95afa95d229092a9859a
  6. 19 Jun, 2013 3 commits
    • Add two-pass quantization · b5bf7b13
      Yunqing Wang authored
      Optimized the quantization function by making it a two-pass
      process. The first pass quickly checks the transform
      coefficients against the base ZBIN and keeps only the coefficients
      that are large enough to need quantization. A skip
      check is added: if all coefficients are within the base ZBIN, no
      quantization is needed. The second pass is the actual quantization
      pass, which only processes the coefficient subset determined
      in the first pass. This reduces the computation. Furthermore, an
      alternative method is used for large transform sizes, which often
      have sparse nonzero quantized coefficients.
      
      Overall, the encoder speedup is about 4%. The quantization function
      itself gets 20% faster.
      
      Change-Id: I3a9dd0da6db030260b6d9c314a9fa48ecae89f22
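The two-pass structure described in this commit can be sketched in scalar form as below. This is an assumption-laden illustration: `two_pass_quantize`, `MAX_COEFFS`, the plain integer division, and the single `zbin`/`step` pair are hypothetical simplifications. The alternative path for large transforms is not shown.

```c
#include <stdlib.h>

#define MAX_COEFFS 1024 /* assumed upper bound, e.g. a 32x32 transform */

/* Hypothetical sketch of a two-pass quantizer.
 * Returns the number of coefficients that reached the second pass,
 * or -1 on invalid arguments. */
static int two_pass_quantize(const int *coeff, int n, int zbin, int step,
                             int *qcoeff) {
  int candidates[MAX_COEFFS];
  int count = 0;
  int i, k;

  if (n < 0 || n > MAX_COEFFS || step <= 0) return -1;

  /* Pass 1: cheap zero-bin check; remember only coefficients worth quantizing. */
  for (i = 0; i < n; ++i) {
    qcoeff[i] = 0;
    if (abs(coeff[i]) >= zbin) candidates[count++] = i;
  }

  /* Skip check: everything fell inside the zero bin, nothing left to do. */
  if (count == 0) return 0;

  /* Pass 2: quantize only the surviving subset. */
  for (k = 0; k < count; ++k) {
    const int idx = candidates[k];
    const int q = abs(coeff[idx]) / step;
    qcoeff[idx] = coeff[idx] < 0 ? -q : q;
  }
  return count;
}
```

The benefit grows with sparsity: when most coefficients fall inside the zero bin, the second loop touches only a handful of entries.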
    • Remove unnecessary copying of probs. · 12180c83
      Yaowu Xu authored
      Change-Id: Ic924f07c6ab0c929c6cdf11880d3c625806e272c
    • Renaming 'nmv' to 'mv' for several functions. · 87e1fa76
      Dmitry Kovalev authored
      Change-Id: I183a38997a9d01e4a1b869e92509f6915216fa09