1. 26 Jun, 2013 1 commit
  2. 25 Jun, 2013 4 commits
      Add averaging-SAD functions for 8-point comp-inter motion search. · c24d9223
      Ronald S. Bultje authored
      Encoding the first 50 frames of bus @ 1500kbps goes from 3min22.7 to
      3min18.2, i.e. 2.3% faster. In addition, use the sub_pixel_avg functions
      to calculate the variance of the averaging predictor. This is slightly
      suboptimal because the function is subpixel-position-aware, but (at
      least in the SSE2 version) it does not actually apply a bilinear filter
      at a full-pixel position, so performance is approximately the same as
      if we had implemented an actual average-aware full-pixel variance
      function. That gains another 0.8 seconds (i.e. encode time goes to
      3min17.4), for a total gain of 2.7%.
      
      Change-Id: I3f059d2b04243921868cfed2568d4fa65d7b5acd
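The averaging SAD described in this commit can be sketched in scalar C. This is a minimal illustration, not the libvpx code: the name `sad_avg`, the packed layout of `second_pred` (stride equal to `width`), and the round-up averaging are all assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: sum of absolute differences between the source block
 * and the rounded average of two predictors (ref and second_pred).
 * Assumption: second_pred is stored packed, i.e. its stride equals width. */
unsigned int sad_avg(const uint8_t *src, int src_stride,
                     const uint8_t *ref, int ref_stride,
                     const uint8_t *second_pred, int width, int height) {
  unsigned int sad = 0;
  assert(width > 0 && height > 0);
  for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x) {
      /* Compound prediction: average the two predictors, rounding up. */
      const int avg = (ref[x] + second_pred[x] + 1) >> 1;
      sad += (unsigned int)abs(src[x] - avg);
    }
    src += src_stride;
    ref += ref_stride;
    second_pred += width;
  }
  return sad;
}
```

Fusing the average into the SAD loop avoids writing the averaged predictor to a temporary buffer before the comparison, which is one plausible source of the reported speedup.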
      Removing unused code. · 87ee34aa
      Dmitry Kovalev authored
      Removing block index (ib) parameter from get_tx_type_{8x8, 16x16}
      functions.
      
      Change-Id: Ia213335aae7a7cb027f97b9cc9b04519840250f1
      Small mode_info_context cleanup in filter_block_plane · c787f40b
      Scott LaVarnway authored
      Removes unnecessary updates to xd->mode_info_context.
      
      Change-Id: I36d2d68ca48366f727548526726b1b5437f62968
      Enable sse2 implementation of 8x8 ADST/DCT · a32a086d
      Jingning Han authored
      This commit uses the butterfly structure to enable an SSE2
      implementation of the 8x8 ADST/DCT hybrid transform.
      
      The runtime of the hybrid transform module goes down from 1170 cycles
      to 245 cycles. Overall speed-up is around 1.5%.
      
      Change-Id: Ic808ffd21ece8a9d0410d8c0243d7b6c28ac3b3f
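A single butterfly stage of the kind this commit refers to can be sketched in scalar C as follows. The names, the 14-bit fixed-point scale, and the constant 11585 (2^14 times cos(pi/4), rounded) are assumptions for illustration, not taken from the commit.

```c
#include <assert.h>
#include <stdint.h>

#define CONST_BITS 14 /* assumed fixed-point precision of the cosine table */

/* round(2^14 * cos(pi / 4)); assumed constant, for illustration only. */
static const int32_t kCosPi4 = 11585;

/* Round to nearest and drop the fixed-point scale.
 * Note: relies on arithmetic right shift for negative values. */
int32_t round_shift(int64_t v) {
  return (int32_t)((v + (1 << (CONST_BITS - 1))) >> CONST_BITS);
}

/* One butterfly: rotate the input pair (a, b) by the angle whose fixed-point
 * cosine and sine are c and s. Each output is a sum of two products. */
void butterfly(int32_t a, int32_t b, int32_t c, int32_t s,
               int32_t *out0, int32_t *out1) {
  assert(out0 != 0 && out1 != 0);
  *out0 = round_shift((int64_t)a * c + (int64_t)b * s);
  *out1 = round_shift((int64_t)a * s - (int64_t)b * c);
}
```

Each output has the multiply-and-add-adjacent-pairs shape that the SSE2 instruction behind `_mm_madd_epi16` computes on 16-bit lanes, which is likely why a butterfly decomposition vectorizes well.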
  3. 24 Jun, 2013 2 commits
      Changed size of mb_mode_context to 8 bits · dfa2ecc3
      Scott LaVarnway authored
      This reduced the size of the MODE_INFO array (mip and prev_mip)
      by 425,568 bytes each for 1080p resolutions.
      
      Change-Id: Ifa513ec2d0a49e8ec0867ec90620762fb7f1261d
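The 425,568-byte figure can be reproduced with a short calculation. The breakdown below is one decomposition that matches the number, not something stated in the commit: it assumes four `mb_mode_context` entries per MODE_INFO shrinking from 32 bits to 8 bits each, 8x8-pixel MODE_INFO units, and an 8-unit border on the array. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical estimate of bytes saved per MODE_INFO array.
 * Assumptions (not from the commit): 8x8-pixel units, an 8-unit border in
 * each dimension, and 4 context entries going from int32_t to uint8_t. */
size_t mode_info_savings(int width, int height) {
  const int unit = 8;
  const int border = 8;
  const size_t entries = 4;
  assert(width > 0 && height > 0);
  const size_t cols = (size_t)(width / unit + border);
  const size_t rows = (size_t)(height / unit + border);
  const size_t saved_per_mi = entries * (sizeof(int32_t) - sizeof(uint8_t));
  return cols * rows * saved_per_mi;
}
```

For 1920x1080 this gives 248 * 143 * 12 = 425,568 bytes, matching the figure quoted for each of mip and prev_mip.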
      Fix loopfilter of leftmost 4x4 edges in SB · 858475a0
      John Koleszar authored
      For cases where there's no transform set in bit 0 (the left edge of
      the SB) but bit 0 of mask_4x4_int is set (the edge 4 pixels from the
      left edge needs filtering), it was incorrectly being skipped before.
      This situation only happens on the leftmost edge of the image, as
      the edge at column 0 is intentionally skipped since there aren't
      pixels to the left to read.
      
      Change-Id: Ib2fbbcb40166e90af31b1a0e13b85b68c226cbd3
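The fix can be pictured as a walk over per-column mask bits in which the internal 4x4 edge is tested independently of the transform-edge bit. The sketch below is a hypothetical reconstruction that only counts edges instead of filtering them; `walk_column_masks`, `EdgeCounts`, and the single combined `mask_tx` are illustrative names, not the libvpx code.

```c
#include <assert.h>

/* Counts of edges that would be filtered (illustration only). */
typedef struct {
  int outer;    /* edges on a transform boundary */
  int internal; /* edges 4 pixels to the right of that boundary */
} EdgeCounts;

/* mask_tx: one bit per column where a transform edge needs filtering.
 * mask_4x4_int: one bit per column where the internal 4x4 edge does.
 * The loop runs while either mask has bits left, and the internal edge is
 * checked on its own, so it is still handled when the transform bit is 0. */
EdgeCounts walk_column_masks(unsigned int mask_tx, unsigned int mask_4x4_int) {
  EdgeCounts n = {0, 0};
  for (unsigned int mask = mask_tx | mask_4x4_int; mask; mask >>= 1) {
    if (mask_tx & 1) n.outer++;
    if (mask_4x4_int & 1) n.internal++;
    mask_tx >>= 1;
    mask_4x4_int >>= 1;
  }
  assert(n.outer >= 0 && n.internal >= 0);
  return n;
}
```

With `mask_tx = 0x6` (bit 0 cleared at the image's left edge) and `mask_4x4_int = 0x1`, the internal edge in column 0 is still counted.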
  4. 21 Jun, 2013 6 commits
  5. 20 Jun, 2013 4 commits
      SSE2/SSSE3 optimizations and unit test for sub_pixel_avg_variance(). · 1e6a32f1
      Ronald S. Bultje authored
      Encoding of bus @ 1500kbps (first 50 frames) goes from 3min57 to
      3min35, i.e. approximately a 10.5% speedup. Note that the SIMD versions
      which use a bilinear filter (x_offset & 7 || y_offset & 7) aren't
      perfectly interleaved, and can probably be improved further in the
      future. I've marked this with a few TODOs/FIXMEs in the code.
      
      Change-Id: I5c9e900c0f0d32e431a50fecae213b510b2549f9
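A scalar sketch of the full-pixel path may help show what the SIMD code is optimizing. The predicate mirrors the `(x_offset & 7 || y_offset & 7)` condition quoted above; the function names, the packed `second_pred` layout, and the rounding of the average are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* A bilinear filter is only needed when either offset has a nonzero
 * fractional part (low three bits), as in the condition quoted above. */
int needs_bilinear_filter(int x_offset, int y_offset) {
  return (x_offset & 7) || (y_offset & 7);
}

/* Hypothetical full-pixel case: variance of (src - avg(ref, second_pred)).
 * Assumption: second_pred is packed with stride equal to width. */
unsigned int avg_variance_fullpel(const uint8_t *src, int src_stride,
                                  const uint8_t *ref, int ref_stride,
                                  const uint8_t *second_pred,
                                  int width, int height, unsigned int *sse) {
  int64_t sum = 0;
  uint64_t sq = 0;
  assert(width > 0 && height > 0 && sse != 0);
  for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x) {
      const int avg = (ref[x] + second_pred[x] + 1) >> 1;
      const int diff = src[x] - avg;
      sum += diff;
      sq += (uint64_t)(diff * diff);
    }
    src += src_stride;
    ref += ref_stride;
    second_pred += width;
  }
  *sse = (unsigned int)sq;
  /* variance * N = SSE - sum^2 / N, using integer division. */
  return (unsigned int)(sq - (uint64_t)((sum * sum) / (width * height)));
}
```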
      Fix win64 warning. · c259af4f
      Frank Galligan authored
      - size_t vs int.
      
      Change-Id: Ib47ebd932a4b69db9f52a43000bb69d0a96b9134
      Implement sse2 and ssse3 versions for all sub_pixel_variance sizes. · 8fb6c581
      Ronald S. Bultje authored
      Overall speedup around 5% (bus @ 1500kbps first 50 frames 4min10 ->
      3min58). Specific changes to timings for each function compared to
      original assembly-optimized versions (or just new version timings if
      no previous assembly-optimized version was available):
      
      sse2   4x4:    99 ->   82 cycles
      sse2   4x8:           128 cycles
      sse2   8x4:           121 cycles
      sse2   8x8:   149 ->  129 cycles
      sse2   8x16:  235 ->  245 cycles (?)
      sse2  16x8:   269 ->  203 cycles
      sse2  16x16:  441 ->  349 cycles
      sse2  16x32:          641 cycles
      sse2  32x16:          643 cycles
      sse2  32x32: 1733 -> 1154 cycles
      sse2  32x64:         2247 cycles
      sse2  64x32:         2323 cycles
      sse2  64x64: 6984 -> 4442 cycles
      
      ssse3  4x4:           100 cycles (?)
      ssse3  4x8:           103 cycles
      ssse3  8x4:            71 cycles
      ssse3  8x8:           147 cycles
      ssse3  8x16:          158 cycles
      ssse3 16x8:   188 ->  162 cycles
      ssse3 16x16:  316 ->  273 cycles
      ssse3 16x32:          535 cycles
      ssse3 32x16:          564 cycles
      ssse3 32x32:          973 cycles
      ssse3 32x64:         1930 cycles
      ssse3 64x32:         1922 cycles
      ssse3 64x64:         3760 cycles
      
      Change-Id: I81ff6fe51daf35a40d19785167004664d7e0c59d
      new debug modes code · 2c6bdbbc
      Jim Bankoski authored
      The new printout includes skips and has prefixed sections, so you can
      grep for things like the transforms chosen on each frame.
      
      Change-Id: I195043424647d9514cfc3ff6720a5b20d010fa1b
  6. 19 Jun, 2013 2 commits
  7. 18 Jun, 2013 1 commit
      Make fdct32 computation flow within 16bit range · a41a4860
      Jingning Han authored
      This commit uses two fdct32x32 versions: one for the rate-distortion
      optimization loop and one for the encoding process. The rd-loop
      version requires only 16 bits of precision for intermediate steps.
      The original fdct32x32, which allows higher intermediate precision
      (18 bits), is retained for the encoding process only.
      
      This speeds up fdct32x32 in the rd loop. No performance loss was
      observed.
      
      Change-Id: I3237770e39a8f87ed17ae5513c87228533397cc3
  8. 17 Jun, 2013 2 commits
  9. 14 Jun, 2013 3 commits
      Fix type mismatch in array definition · a9415d2e
      John Koleszar authored
      vp9_default_inter_mode_probs was being accessed with a different type
      than it was defined with. Ensure that its declaration is included
      prior to its definition.
      
      Change-Id: I2f963f513ab2f4e339f8a3c17e3d0f03749eba16
      Remove constant vp9_coef_update_prob table · 0f7a66e9
      John Koleszar authored
      All elements of this table are equal to 252, so replace it with a
      single constant VP9_COEF_UPDATE_PROB.
      
      Change-Id: I1e2d1d284326ce6df9899a740c2fc344b3ec81c9
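The condition that justifies this replacement is easy to check mechanically. In the sketch below only `VP9_COEF_UPDATE_PROB` and its value 252 come from the commit message; the helper `table_is_constant` is a hypothetical name used for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VP9_COEF_UPDATE_PROB 252

/* Hypothetical check: returns 1 if every entry equals `value`, which is the
 * condition under which a lookup table can be replaced by one constant. */
int table_is_constant(const uint8_t *table, size_t n, uint8_t value) {
  assert(table != NULL);
  for (size_t i = 0; i < n; ++i) {
    if (table[i] != value) return 0;
  }
  return 1;
}
```

Once the check holds, every lookup `table[i]` can be rewritten as the constant, which removes both the storage and the indexing.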
      Enable sse2 version of sad8x4/4x8 · c43af9a8
      Jingning Han authored
      The encoding time for bus at CIF goes from 661s to 625s. This commit
      also enables the sad8x4/4x8 unit tests in sad_test.cc.
      
      Change-Id: If3d10ebb56bda584bdb69bcf056599d580b12cb1
  10. 13 Jun, 2013 1 commit
      Enable sse2 version of sad8x4/4x8 · 15f50e7b
      Jingning Han authored
      The encoding time for bus at CIF goes from 661s to 625s. This commit
      also enables the sad8x4/4x8 unit tests in sad_test.cc.
      
      Change-Id: If3d10ebb56bda584bdb69bcf056599d580b12cb1
  11. 12 Jun, 2013 11 commits
  12. 11 Jun, 2013 2 commits
      Disallow wide loopfilter on some chroma borders · 9831f205
      John Koleszar authored
      Don't do the 15 tap filter if there aren't 8 pixels below/right of the
      edge.
      
      Change-Id: I62f16437c1d9ba59b6901a5fe71ddb2f472da344
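The guard described here amounts to a bounds check before selecting the wide filter. A minimal sketch, assuming positions are measured in pixels within the plane; `allow_wide_filter` is a hypothetical name, not a libvpx function.

```c
#include <assert.h>

/* Hypothetical guard: the wide (15-tap) filter is only allowed when at least
 * 8 pixels exist below/right of the edge inside the plane.
 * edge_pos is the edge's pixel offset; plane_size is the plane's extent in
 * the same direction. */
int allow_wide_filter(int edge_pos, int plane_size) {
  assert(edge_pos >= 0 && edge_pos <= plane_size);
  return plane_size - edge_pos >= 8;
}
```

For example, a chroma plane 36 pixels wide has only 4 pixels to the right of an edge at column 32, so the wide filter would be disallowed there.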
      Fix partition coding of corner block · 551f37d6
      Jingning Han authored
      This commit fixed the allowable partition types for bottom-right
      corner blocks.
      
      When more than half of a block's pixels are valid content in both the
      vertical and horizontal directions, allow all four partition types in
      the bit-stream. Otherwise, apply partition type constraints.
      
      Change-Id: I2252e2de7125a8bfb1c824bf34299a13c81102e3
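The "over half valid" rule can be expressed as a pair of comparisons against the frame bounds. This is a minimal sketch with hypothetical names and generic units (any consistent unit for positions and sizes). It covers only the case where all four partition types are allowed; the specific constraints applied otherwise are not stated in the commit message and are not modeled.

```c
#include <assert.h>

/* Hypothetical predicate: returns 1 when more than half of the block lies
 * inside the frame both vertically and horizontally, i.e. when the block's
 * midpoint is strictly inside the frame in each direction. */
int all_partitions_allowed(int row, int col, int block_size,
                           int frame_rows, int frame_cols) {
  assert(block_size > 0);
  const int half = block_size / 2;
  const int has_rows = (row + half) < frame_rows;
  const int has_cols = (col + half) < frame_cols;
  return has_rows && has_cols;
}
```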
  13. 10 Jun, 2013 1 commit
      New probs for filters/tx_size and a few others · a43ff153
      Deb Mukherjee authored
      * New probs for subpel filters/tx_count
      * Does not reset the tx_size probs to defaults if an intermediate
        frame reverts to using a fixed tx_size.
      * A few updates to the backward adaptation parameters for mode/mv
      * Some cosmetic cleanups
      
      derf300: +0.06%
      
      Change-Id: I22994d659bc31ca7a4fc8820fde24001e64a2920