  1. 25 Jun, 2013 3 commits
    • Removing unused code. · 87ee34aa
      Dmitry Kovalev authored
      Removing block index (ib) parameter from get_tx_type_{8x8, 16x16}
      functions.
      
      Change-Id: Ia213335aae7a7cb027f97b9cc9b04519840250f1
    • Use aligned buffer operations in 8x8/16x16 2D-DCT · 82d504b5
      Jingning Han authored
      This reduces 16x16 2D-DCT runtime from 865 cycles to 837 cycles.
      
      Change-Id: I137758b81cd127b936175284310e81378db64552
    • Enable sse2 implementation of 8x8 ADST/DCT · a32a086d
      Jingning Han authored
      This commit uses the butterfly structure to enable an sse2
      implementation of the 8x8 ADST/DCT hybrid transform.
      
      The runtime of the hybrid transform module drops from 1170 cycles
      to 245 cycles. Overall speed-up is around 1.5%.
      
      Change-Id: Ic808ffd21ece8a9d0410d8c0243d7b6c28ac3b3f
  2. 24 Jun, 2013 1 commit
  3. 21 Jun, 2013 8 commits
  4. 20 Jun, 2013 12 commits
    • SSE2/SSSE3 optimizations and unit test for sub_pixel_avg_variance(). · 1e6a32f1
      Ronald S. Bultje authored
      Encoding of bus @ 1500kbps (first 50 frames) goes from 3min57 to
      3min35, i.e. approximately a 10.5% speedup. Note that the SIMD versions
      which use a bilinear filter (x_offset & 7 || y_offset & 7) aren't
      perfectly interleaved, and can probably be improved further in the
      future. I've marked this with a few TODOs/FIXMEs in the code.
      
      Change-Id: I5c9e900c0f0d32e431a50fecae213b510b2549f9
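The condition quoted in the message above can be illustrated with a scalar sketch. It assumes, as one reading of `x_offset & 7 || y_offset & 7`, that offsets are expressed in sixteenths of a pixel, so that 0 (full pixel) and 8 (half pixel) can take cheaper paths and every other offset needs the general bilinear filter. The helper names `bilinear_tap` and `needs_general_filter` are hypothetical and are not taken from the codebase.

```c
#include <assert.h>
#include <stdint.h>

/* Blend two neighbouring pixels with weights (16 - offset) and offset,
 * rounding to nearest. ASSUMPTION: offset is in sixteenths of a pixel. */
static uint8_t bilinear_tap(uint8_t a, uint8_t b, int offset) {
  assert(offset >= 0 && offset < 16);
  return (uint8_t)((a * (16 - offset) + b * offset + 8) >> 4);
}

/* Nonzero when at least one direction is neither full-pixel (0) nor
 * half-pixel (8), i.e. the general bilinear filter is required. */
static int needs_general_filter(int x_offset, int y_offset) {
  return (x_offset & 7) || (y_offset & 7);
}
```

With these assumptions, an offset pair such as (8, 0) avoids the general filter, while (3, 0) does not.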
    • Improving model rd with variance and quant step · 7947a33d
      Deb Mukherjee authored
      Improves the rd modeling function and implements it using interpolation
      from a table, which is a little faster. Also uses sse rather than var as
      input to the modeling function: since no dc prediction is used, sse
      works a little better.
      
      derfraw300: +0.05%
      Speedup: ~1%
      
      Change-Id: I151353c6451e0e8fe3ae18ab9842f8f67e5151ff
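Table-based interpolation of a model curve, as mentioned in the message above, might look roughly like the following. This is only a sketch: the table contents, its uniform spacing, and the name `interp_table` are placeholders chosen for illustration, and the actual model function and its inputs are not described in the commit message.

```c
#include <assert.h>

#define TABLE_SIZE 5

/* Placeholder samples of a decreasing curve at x = 0, 1, 2, 3, 4.
 * Illustrative values only, not taken from the encoder. */
static const double kModelTable[TABLE_SIZE] = { 1024.0, 512.0, 256.0,
                                                128.0, 64.0 };

/* Linearly interpolate the table at x, clamping to the table range. */
static double interp_table(double x) {
  assert(x == x); /* reject NaN */
  if (x <= 0.0) return kModelTable[0];
  if (x >= TABLE_SIZE - 1) return kModelTable[TABLE_SIZE - 1];
  const int i = (int)x;        /* index of the sample at or below x */
  const double frac = x - i;   /* position between samples i and i + 1 */
  return kModelTable[i] + frac * (kModelTable[i + 1] - kModelTable[i]);
}
```

A lookup of this kind replaces evaluating the curve directly with one table read and one multiply, which is the usual reason such a change is "a little faster".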
    • adds force partitioning greater than or less than block size · 9f2a1ae2
      Jim Bankoski authored
      adds a new speed feature to force partitioning to be greater than
      or less than a certain size
      
      Change-Id: I8c048eeeef93700ae822eccf98f8751a45b2e7d0
    • adds a set partitioning to speed features · 18bdf708
      Jim Bankoski authored
      this feature lets you set a partitioning size to be used by the entire
      frame.
      
      Change-Id: I208a4c8c701375cbb054418266f677768b6f8f06
    • partition by variance using var from last frame · 476d73d2
      Jim Bankoski authored
      This uses variance to decide partition splits. Variance is calculated
      using the nearest mv, always from the last ref frame.
      
      Change-Id: Idd015b4a9aa3bc82591759eac239680c07496896
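A variance-driven split decision of the kind described above can be sketched as follows. The prediction buffer stands in for a block fetched from the last reference frame with the nearest mv. The function names, the threshold rule, and the variance scaling (sum of squared differences minus the squared sum divided by the pixel count) are assumptions made for this example, not details confirmed by the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Variance of (src - pred) over a w x h block, scaled by the pixel count:
 * returns sum(d^2) - sum(d)^2 / N. */
static unsigned int block_variance(const uint8_t *src, int src_stride,
                                   const uint8_t *pred, int pred_stride,
                                   int w, int h) {
  int64_t sum = 0, sse = 0;
  assert(w > 0 && h > 0);
  for (int r = 0; r < h; ++r) {
    for (int c = 0; c < w; ++c) {
      const int d = src[r * src_stride + c] - pred[r * pred_stride + c];
      sum += d;
      sse += d * d;
    }
  }
  return (unsigned int)(sse - (sum * sum) / (w * h));
}

/* Hypothetical rule: split the block when its variance exceeds a threshold. */
static int should_split(unsigned int variance, unsigned int threshold) {
  return variance > threshold;
}
```

A uniform offset between source and prediction yields zero variance under this definition, so only non-uniform residual energy pushes a block toward splitting.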
    • convert all speed things to speed features · 1f94b976
      Jim Bankoski authored
      Change-Id: Ie24489a4d39f3e53e816eeebf75a1c9c7d94515a
    • new partition via variance · 727fa7b1
      Jim Bankoski authored
      Change-Id: Ideee45cad8b38087c509cd404484728e85d0c427
    • fix to set up new speed feature · 0fad6a9d
      Jim Bankoski authored
      This uses the speed feature functionality for code.
      
      Change-Id: I9cd16c0c5f98520ae27ebba81aa2c178546587f8
    • don't copy partitions for key frames or altrefs · df2314cf
      Jim Bankoski authored
      force us to go through slow partitioning for keyframes, altref and
      overlays.
      
      Change-Id: I1a286361bf74083e71973575a7296be46eb98742
    • Implement sse2 and ssse3 versions for all sub_pixel_variance sizes. · 8fb6c581
      Ronald S. Bultje authored
      Overall speedup around 5% (bus @ 1500kbps first 50 frames 4min10 ->
      3min58). Specific changes to timings for each function compared to
      original assembly-optimized versions (or just new version timings if
      no previous assembly-optimized version was available):
      
      sse2   4x4:    99 ->   82 cycles
      sse2   4x8:           128 cycles
      sse2   8x4:           121 cycles
      sse2   8x8:   149 ->  129 cycles
      sse2   8x16:  235 ->  245 cycles (?)
      sse2  16x8:   269 ->  203 cycles
      sse2  16x16:  441 ->  349 cycles
      sse2  16x32:          641 cycles
      sse2  32x16:          643 cycles
      sse2  32x32: 1733 -> 1154 cycles
      sse2  32x64:         2247 cycles
      sse2  64x32:         2323 cycles
      sse2  64x64: 6984 -> 4442 cycles
      
      ssse3  4x4:           100 cycles (?)
      ssse3  4x8:           103 cycles
      ssse3  8x4:            71 cycles
      ssse3  8x8:           147 cycles
      ssse3  8x16:          158 cycles
      ssse3 16x8:   188 ->  162 cycles
      ssse3 16x16:  316 ->  273 cycles
      ssse3 16x32:          535 cycles
      ssse3 32x16:          564 cycles
      ssse3 32x32:          973 cycles
      ssse3 32x64:         1930 cycles
      ssse3 64x32:         1922 cycles
      ssse3 64x64:         3760 cycles
      
      Change-Id: I81ff6fe51daf35a40d19785167004664d7e0c59d
    • disable speed > 1 speed corrections in firstpass · f954490b
      Jim Bankoski authored
      need to rework these
      
      Change-Id: I17dc2c88d2faadd2f8fb117c52c25f04ea2e9856
    • copy partitioning from last frame · f033b44e
      Jim Bankoski authored
      Change-Id: I26e80ede80cb4389378a95afa95d229092a9859a
  5. 19 Jun, 2013 3 commits
    • Add two-pass quantization · b5bf7b13
      Yunqing Wang authored
      Optimized the quantization function by making it a two-pass
      process. The first pass does a quick check of the transform
      coefficients against the base ZBIN and keeps only the
      coefficients that fall outside it for quantization. A skip
      check is added: if all coefficients are within the base ZBIN, no
      quantization is needed. The second pass is the actual quantization
      pass, which only processes the coefficient subset determined
      in the first pass. This reduces the computation. Furthermore, an
      alternative method is used for large transform sizes, which often
      have sparse nonzero quantized coefficients.
      
      Overall, the encoder speedup is about 4%. The quantization function
      itself gets 20% faster.
      
      Change-Id: I3a9dd0da6db030260b6d9c314a9fa48ecae89f22
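The two-pass flow described above can be sketched in scalar C. This is a simplified illustration under stated assumptions, not the encoder's quantizer: it uses a plain division by a single step size, omits rounding offsets and scan order, and does not cover the alternative method for large transforms. The name `quantize_two_pass` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_COEFFS 1024 /* 32x32, the largest transform size assumed here */

/* Hypothetical two-pass quantizer. Returns the number of nonzero outputs.
 * zbin is the base zero-bin threshold, quant_step the quantizer step. */
static int quantize_two_pass(const int16_t *coeff, int n, int zbin,
                             int quant_step, int16_t *qcoeff) {
  int idx[MAX_COEFFS];
  int kept = 0, nonzero = 0;
  assert(n > 0 && n <= MAX_COEFFS && quant_step > 0);

  /* Pass 1: zero the output and record coefficients outside the zero bin. */
  for (int i = 0; i < n; ++i) {
    qcoeff[i] = 0;
    if (abs(coeff[i]) >= zbin) idx[kept++] = i;
  }

  /* Skip check: every coefficient fell inside the zero bin. */
  if (kept == 0) return 0;

  /* Pass 2: quantize only the recorded subset. */
  for (int k = 0; k < kept; ++k) {
    const int i = idx[k];
    const int mag = abs(coeff[i]) / quant_step;
    qcoeff[i] = (int16_t)(coeff[i] < 0 ? -mag : mag);
    if (mag != 0) ++nonzero;
  }
  return nonzero;
}
```

The saving comes from the second loop touching only the recorded indices, which matters most when the block is sparse.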
    • Remove unnecessary copying of probs. · 12180c83
      Yaowu Xu authored
      Change-Id: Ic924f07c6ab0c929c6cdf11880d3c625806e272c
    • Renaming 'nmv' to 'mv' for several functions. · 87e1fa76
      Dmitry Kovalev authored
      Change-Id: I183a38997a9d01e4a1b869e92509f6915216fa09
  6. 18 Jun, 2013 1 commit
    • Make fdct32 computation flow within 16bit range · a41a4860
      Jingning Han authored
      This commit uses two fdct32x32 versions, one for the rate-distortion
      optimization loop and one for the encoding process. The version for
      the rd loop requires only 16 bits of precision for intermediate steps.
      The original fdct32x32, which allows higher intermediate precision (18
      bits), is retained for the encoding process only.
      
      This allows speed-up for fdct32x32 in the rd loop. No performance
      loss observed.
      
      Change-Id: I3237770e39a8f87ed17ae5513c87228533397cc3
  7. 17 Jun, 2013 4 commits
  8. 14 Jun, 2013 3 commits
  9. 13 Jun, 2013 1 commit
    • Enable sse2 version of sad8x4/4x8 · 15f50e7b
      Jingning Han authored
      The encoding time for bus at CIF goes from 661s to 625s. This commit
      also enables the unit tests for sad8x4/4x8 in sad_test.cc.
      
      Change-Id: If3d10ebb56bda584bdb69bcf056599d580b12cb1
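For reference, the quantity that a sad8x4 or sad4x8 routine computes is the sum of absolute differences between a source block and a reference block. A plain scalar version, of the kind an SIMD implementation is typically checked against in a unit test, might look like this. The name `sad_block` and its signature are chosen for this sketch and are not taken from the codebase.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Sum of absolute differences over a width x height block.
 * Strides are in bytes and may exceed the block width. */
static unsigned int sad_block(const uint8_t *src, int src_stride,
                              const uint8_t *ref, int ref_stride,
                              int width, int height) {
  unsigned int sad = 0;
  assert(width > 0 && height > 0);
  for (int r = 0; r < height; ++r) {
    for (int c = 0; c < width; ++c) {
      sad += (unsigned int)abs(src[c] - ref[c]);
    }
    src += src_stride; /* advance both pointers to the next row */
    ref += ref_stride;
  }
  return sad;
}
```

Calling it with width 8 and height 4, or width 4 and height 8, covers the two block shapes named in the commit title.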
  10. 12 Jun, 2013 3 commits
  11. 11 Jun, 2013 1 commit
    • Minor change in forward updates · a4d906c1
      Deb Mukherjee authored
      Removes the case of coding prob = 0 for forward updates, since that
      is not an allowed probability to code.
      Slightly improves efficiency but may not matter in practice.
      
      Change-Id: I3b4caf82e8f0891992f0706d4089cc5a27568dba