1. 26 Jun, 2013 1 commit
  2. 25 Jun, 2013 9 commits
    • Only do metrics on cropped (visible) area of picture. · 450c7b57
      Ronald S. Bultje authored
      The part where we align it by 8 or 16 is an implementation detail that
      shouldn't matter to the outside world.
      
      Change-Id: I9edd6f08b51b31c839c0ea91f767640bccb08d53
      450c7b57
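A minimal sketch of the idea in the commit above, not the code from the commit itself: the error metric walks only the visible `width` x `height` region, while the stride (and any aligned padding behind it) is treated as a storage detail. The function name `cropped_sse` and its signature are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum of squared errors over the visible width x height region only.
 * Both planes may be stored with a larger, aligned stride; the padding
 * columns beyond `width` and rows beyond `height` are never read.
 * (Hypothetical helper for illustration.) */
static uint64_t cropped_sse(const uint8_t *a, int a_stride,
                            const uint8_t *b, int b_stride,
                            int width, int height) {
  uint64_t sse = 0;
  for (int y = 0; y < height; ++y) {
    const uint8_t *row_a = a + (size_t)y * (size_t)a_stride;
    const uint8_t *row_b = b + (size_t)y * (size_t)b_stride;
    for (int x = 0; x < width; ++x) {
      const int d = row_a[x] - row_b[x];
      sse += (uint64_t)(d * d);
    }
  }
  return sse;
}
```

Passing the aligned dimensions instead of the visible ones would let differences in the padding leak into the metric, which is the effect the commit message describes as an implementation detail that should not be visible to the outside world.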
    • Don't skip right/bottom border pixels in SSIM calculations. · 44f349df
      Ronald S. Bultje authored
      Change-Id: I75acb55ade54bef6ad7703ed5e691581fa2f8fe1
      44f349df
    • Add averaging-SAD functions for 8-point comp-inter motion search. · c24d9223
      Ronald S. Bultje authored
      Encoding the first 50 frames of bus @ 1500kbps goes from 3min22.7 to 3min18.2,
      i.e. 2.3% faster. In addition, use the sub_pixel_avg functions to calc
      the variance of the averaging predictor. This is slightly suboptimal
      because the function is subpixel-position-aware, but it will (at least
      for the SSE2 version) not actually use a bilinear filter for a full-pixel
      position, thus leading to approximately the same performance compared to
      if we implemented an actual average-aware full-pixel variance function.
      That gains another 0.8 seconds (i.e. encode time goes to 3min17.4), thus
      leading to a total gain of 2.7%.
      
      Change-Id: I3f059d2b04243921868cfed2568d4fa65d7b5acd
      c24d9223
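As a rough illustration of what an averaging SAD computes (a plain C sketch, not the optimized functions added by the commit): the reference block is first averaged with a second predictor, and the sum of absolute differences is taken against that average. The name `sad_avg`, the round-up average, and the assumption that `second_pred` is stored contiguously with a stride equal to `width` are all choices made for this example.

```c
#include <stdint.h>
#include <stdlib.h>

/* SAD between `src` and the rounded average of `ref` and `second_pred`.
 * Assumption for this sketch: `second_pred` is a packed width x height
 * block, so its stride equals `width`. */
static unsigned int sad_avg(const uint8_t *src, int src_stride,
                            const uint8_t *ref, int ref_stride,
                            const uint8_t *second_pred,
                            int width, int height) {
  unsigned int sad = 0;
  for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x) {
      /* Compound predictor: average of the two predictions, rounding up. */
      const int avg = (ref[x] + second_pred[x] + 1) >> 1;
      sad += (unsigned int)abs(src[x] - avg);
    }
    src += src_stride;
    ref += ref_stride;
    second_pred += width;
  }
  return sad;
}
```

Folding the average into the SAD loop avoids building the averaged predictor in a separate pass before each comparison, which is one plausible source of the speedup reported above.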
    • Tune the rounding operations in 8x8 ADST/DCT sse2 · 0084e61d
      Jingning Han authored
      Improve the round-trip precision to meet the unit test settings.
      
      Change-Id: I303febae56b4b990ea3798b8ebed94c0510ecf79
      0084e61d
    • Removing unused code. · 87ee34aa
      Dmitry Kovalev authored
      Removing block index (ib) parameter from get_tx_type_{8x8, 16x16}
      functions.
      
      Change-Id: Ia213335aae7a7cb027f97b9cc9b04519840250f1
      87ee34aa
    • Add 8x8 dct/adst unit tests · ab362621
      Jingning Han authored
      This commit enables 8x8 DCT and hybrid transform unit tests. It
      also tunes the forward hybrid transform rounding operations for
      more precise round-trip performance.
      
      Change-Id: If05c1ce59d75d641b9c6c91527d02d3a6ef498c3
      ab362621
    • Small mode_info_context cleanup in filter_block_plane · c787f40b
      Scott LaVarnway authored
      Removes unnecessary updates to xd->mode_info_context.
      
      Change-Id: I36d2d68ca48366f727548526726b1b5437f62968
      c787f40b
    • Use aligned buffer operations in 8x8/16x16 2D-DCT · 82d504b5
      Jingning Han authored
      This reduces 16x16 2D-DCT runtime from 865 cycles to 837 cycles.
      
      Change-Id: I137758b81cd127b936175284310e81378db64552
      82d504b5
    • Enable sse2 implementation of 8x8 ADST/DCT · a32a086d
      Jingning Han authored
      This commit uses the butterfly structure to enable an sse2
      implementation of the 8x8 ADST/DCT hybrid transform.
      
      The runtime of the hybrid transform module goes down from 1170 cycles
      to 245 cycles. Overall speed-up is around 1.5%.
      
      Change-Id: Ic808ffd21ece8a9d0410d8c0243d7b6c28ac3b3f
      a32a086d
  3. 24 Jun, 2013 4 commits
  4. 21 Jun, 2013 10 commits
  5. 20 Jun, 2013 15 commits
    • SSE2/SSSE3 optimizations and unit test for sub_pixel_avg_variance(). · 1e6a32f1
      Ronald S. Bultje authored
      Encoding of bus @ 1500kbps (first 50 frames) goes from 3min57 to
      3min35, i.e. approximately a 10.5% speedup. Note that the SIMD versions
      which use a bilinear filter (x_offset & 7 || y_offset & 7) aren't
      perfectly interleaved, and can probably be improved further in the
      future. I've marked this with a few TODOs/FIXMEs in the code.
      
      Change-Id: I5c9e900c0f0d32e431a50fecae213b510b2549f9
      1e6a32f1
    • Fix win64 warning. · c259af4f
      Frank Galligan authored
      - size_t vs int.
      
      Change-Id: Ib47ebd932a4b69db9f52a43000bb69d0a96b9134
      c259af4f
    • Improving model rd with variance and quant step · 7947a33d
      Deb Mukherjee authored
      Improves the rd modeling function and implements it using interpolation
      from a table, which is a little faster. Also uses sse rather than var as
      input to the modeling function: since no dc prediction is used, sse
      works a little better.
      
      derfraw300: +0.05%
      Speedup: ~1%
      
      Change-Id: I151353c6451e0e8fe3ae18ab9842f8f67e5151ff
      7947a33d
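The table-interpolation approach mentioned above can be sketched generically as follows. This is only an illustration of linear interpolation between uniformly spaced table entries using integer arithmetic; the table contents, the step size, and the names `kModelTab` and `model_lookup` are placeholders and are not taken from the commit.

```c
#include <stdint.h>

/* Hypothetical table: a modeled quantity sampled at x = 0, 8, 16, ..., 64.
 * The values are placeholders chosen only to make the example concrete. */
#define TAB_SHIFT 3                 /* log2 of the sampling step */
#define TAB_STEP (1 << TAB_SHIFT)   /* sampling step: 8 */
#define TAB_SIZE 9
static const int kModelTab[TAB_SIZE] = {
  1024, 900, 780, 650, 520, 400, 290, 190, 100
};

/* Linearly interpolate the table at x, clamping outside the sampled range. */
static int model_lookup(int x) {
  const int max_x = (TAB_SIZE - 1) * TAB_STEP;
  if (x <= 0) return kModelTab[0];
  if (x >= max_x) return kModelTab[TAB_SIZE - 1];
  const int idx = x >> TAB_SHIFT;        /* table cell containing x */
  const int frac = x & (TAB_STEP - 1);   /* position within the cell */
  const int lo = kModelTab[idx];
  const int hi = kModelTab[idx + 1];
  /* Weighted average of the two neighbours, rounded to nearest. */
  return (lo * (TAB_STEP - frac) + hi * frac + TAB_STEP / 2) / TAB_STEP;
}
```

A lookup of this kind replaces evaluating the model function directly with one shift, one mask, and a short weighted sum, which is consistent with the "a little faster" remark in the commit message.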
    • adds force partitioning greater than or less than block size · 9f2a1ae2
      Jim Bankoski authored
      adds a new speed feature to force partitioning to be greater than
      or less than a certain size
      
      Change-Id: I8c048eeeef93700ae822eccf98f8751a45b2e7d0
      9f2a1ae2
    • adds a set partitioning to speed features · 18bdf708
      Jim Bankoski authored
      this feature lets you set a partitioning size to be used by the entire
      frame.
      
      Change-Id: I208a4c8c701375cbb054418266f677768b6f8f06
      18bdf708
    • partition by variance using var from last frame · 476d73d2
      Jim Bankoski authored
      This uses variance to decide partition splits. Variance is calculated
      using the nearest mv, always from the last ref frame.
      
      Change-Id: Idd015b4a9aa3bc82591759eac239680c07496896
      476d73d2
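A simplified sketch of a variance-driven split decision, under stated assumptions: the predictor block is taken as already fetched (for example, from the last reference frame at the nearest motion vector), the variance is the residual variance scaled by the pixel count, and a block is split when that value exceeds a caller-supplied threshold. The function names, the minimum block size of 8, and the threshold semantics are hypothetical and are not drawn from the commit.

```c
#include <stdint.h>

/* Residual variance of a size x size block, scaled by the pixel count:
 * sse - sum^2 / n, where the residual is src - pred. */
static uint32_t residual_variance(const uint8_t *src, int src_stride,
                                  const uint8_t *pred, int pred_stride,
                                  int size) {
  int64_t sum = 0;
  uint64_t sse = 0;
  for (int y = 0; y < size; ++y) {
    for (int x = 0; x < size; ++x) {
      const int d = src[y * src_stride + x] - pred[y * pred_stride + x];
      sum += d;
      sse += (uint64_t)(d * d);
    }
  }
  const uint64_t n = (uint64_t)size * (uint64_t)size;
  return (uint32_t)(sse - (uint64_t)(sum * sum) / n);
}

/* Split when the block is larger than the assumed minimum size (8) and its
 * residual variance exceeds the (hypothetical) threshold. */
static int should_split(const uint8_t *src, int src_stride,
                        const uint8_t *pred, int pred_stride,
                        int size, uint32_t threshold) {
  return size > 8 &&
         residual_variance(src, src_stride, pred, pred_stride, size) >
             threshold;
}
```

A uniform offset between source and predictor yields zero variance and therefore no split, while a localized mismatch raises the variance and pushes the decision toward smaller partitions.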
    • convert all speed things to speed features · 1f94b976
      Jim Bankoski authored
      Change-Id: Ie24489a4d39f3e53e816eeebf75a1c9c7d94515a
      1f94b976
    • new partition via variance · 727fa7b1
      Jim Bankoski authored
      Change-Id: Ideee45cad8b38087c509cd404484728e85d0c427
      727fa7b1
    • fix to set up new speed feature · 0fad6a9d
      Jim Bankoski authored
      This uses the speed feature functionality for code.
      
      Change-Id: I9cd16c0c5f98520ae27ebba81aa2c178546587f8
      0fad6a9d
    • don't copy partitions for key frames or altrefs · df2314cf
      Jim Bankoski authored
      force us to go through slow partitioning for keyframes, altref and
      overlays.
      
      Change-Id: I1a286361bf74083e71973575a7296be46eb98742
      df2314cf
    • Implement sse2 and ssse3 versions for all sub_pixel_variance sizes. · 8fb6c581
      Ronald S. Bultje authored
      Overall speedup around 5% (bus @ 1500kbps first 50 frames 4min10 ->
      3min58). Specific changes to timings for each function compared to
      original assembly-optimized versions (or just new version timings if
      no previous assembly-optimized version was available):
      
      sse2   4x4:    99 ->   82 cycles
      sse2   4x8:           128 cycles
      sse2   8x4:           121 cycles
      sse2   8x8:   149 ->  129 cycles
      sse2   8x16:  235 ->  245 cycles (?)
      sse2  16x8:   269 ->  203 cycles
      sse2  16x16:  441 ->  349 cycles
      sse2  16x32:          641 cycles
      sse2  32x16:          643 cycles
      sse2  32x32: 1733 -> 1154 cycles
      sse2  32x64:         2247 cycles
      sse2  64x32:         2323 cycles
      sse2  64x64: 6984 -> 4442 cycles
      
      ssse3  4x4:           100 cycles (?)
      ssse3  4x8:           103 cycles
      ssse3  8x4:            71 cycles
      ssse3  8x8:           147 cycles
      ssse3  8x16:          158 cycles
      ssse3 16x8:   188 ->  162 cycles
      ssse3 16x16:  316 ->  273 cycles
      ssse3 16x32:          535 cycles
      ssse3 32x16:          564 cycles
      ssse3 32x32:          973 cycles
      ssse3 32x64:         1930 cycles
      ssse3 64x32:         1922 cycles
      ssse3 64x64:         3760 cycles
      
      Change-Id: I81ff6fe51daf35a40d19785167004664d7e0c59d
      8fb6c581
    • disable speed > 1 speed corrections in firstpass · f954490b
      Jim Bankoski authored
      need to rework these
      
      Change-Id: I17dc2c88d2faadd2f8fb117c52c25f04ea2e9856
      f954490b
    • new debug modes code · 2c6bdbbc
      Jim Bankoski authored
      The new print out includes skips and has prefixed sections so you can
      grep to find things like transforms chosen on each frame.
      
      Change-Id: I195043424647d9514cfc3ff6720a5b20d010fa1b
      2c6bdbbc
    • copy partitioning from last frame · f033b44e
      Jim Bankoski authored
      Change-Id: I26e80ede80cb4389378a95afa95d229092a9859a
      f033b44e
    • Removed a number of unnecessary checks on ref_frame · 6e3b34bd
      Yaowu Xu authored
      These checks are not needed, since intra block decoding is handled
      separately by decode_sb_intra().
      
      Change-Id: I42d757884714084c92fc23ec5d35d4dc946f4b15
      6e3b34bd
  6. 19 Jun, 2013 1 commit