1. 26 Jun, 2013 1 commit
  2. 25 Jun, 2013 4 commits
    • Add averaging-SAD functions for 8-point comp-inter motion search. · c24d9223
      Ronald S. Bultje authored
      Speeds up encoding of the first 50 frames of bus @ 1500kbps from
      3min22.7 to 3min18.2, i.e. 2.3% faster. In addition, use the
      sub_pixel_avg functions to calculate the variance of the averaging
      predictor. This is slightly suboptimal because the function is
      subpixel-position-aware, but (at least in the SSE2 version) it does not
      actually apply a bilinear filter at a full-pixel position, so
      performance is approximately the same as it would be with a dedicated
      average-aware full-pixel variance function. That gains another 0.8
      seconds (encode time goes to 3min17.4), for a total gain of 2.7%.
      Change-Id: I3f059d2b04243921868cfed2568d4fa65d7b5acd
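As a rough illustration of what an averaging SAD computes, here is a plain C sketch. It assumes the compound ("comp-inter") prediction is the rounded average of the reference block and a second predictor stored contiguously; the function name `sad_avg_sketch` and its parameter layout are hypothetical, not the libvpx API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch (not the libvpx signature): SAD between a source block
 * and the rounded average of a reference block and a second predictor.
 * second_pred is assumed to be stored contiguously with stride w. */
unsigned int sad_avg_sketch(const uint8_t *src, int src_stride,
                            const uint8_t *ref, int ref_stride,
                            const uint8_t *second_pred, int w, int h) {
  assert(w > 0 && h > 0);
  unsigned int sad = 0;
  for (int r = 0; r < h; ++r) {
    for (int c = 0; c < w; ++c) {
      /* Rounded average of the two predictors. */
      const int avg = (ref[r * ref_stride + c] + second_pred[r * w + c] + 1) >> 1;
      sad += (unsigned int)abs(src[r * src_stride + c] - avg);
    }
  }
  return sad;
}
```

The SIMD versions in the commit compute the same kind of quantity, just many pixels at a time.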
    • Removing unused code. · 87ee34aa
      Dmitry Kovalev authored
      Removes the unused block index (ib) parameter from get_tx_type_{8x8, 16x16}.
      Change-Id: Ia213335aae7a7cb027f97b9cc9b04519840250f1
    • Small mode_info_context cleanup in filter_block_plane · c787f40b
      Scott LaVarnway authored
      Removes unnecessary updates to xd->mode_info_context.
      Change-Id: I36d2d68ca48366f727548526726b1b5437f62968
    • Enable sse2 implementation of 8x8 ADST/DCT · a32a086d
      Jingning Han authored
      This commit uses the butterfly structure to enable an SSE2
      implementation of the 8x8 ADST/DCT hybrid transform. The runtime of the
      hybrid transform module goes down from 1170 cycles to 245 cycles, for an
      overall speed-up of around 1.5%.
      Change-Id: Ic808ffd21ece8a9d0410d8c0243d7b6c28ac3b3f
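A butterfly structure factors an N-point transform into stages of pairwise sums and differences. The C sketch below shows only the first butterfly stage of an 8-point DCT, which splits the input into an even half (sums, feeding a 4-point DCT) and an odd half (differences). It illustrates the general technique, not the libvpx SSE2 code; the function name is hypothetical and the ADST half of the hybrid transform is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* First butterfly stage of an 8-point DCT (hypothetical helper):
 * sum[i]  = in[i] + in[7 - i] feeds the even (4-point DCT) half,
 * diff[i] = in[i] - in[7 - i] feeds the odd half. */
void dct8_stage1_sketch(const int16_t in[8], int32_t sum[4], int32_t diff[4]) {
  assert(in && sum && diff);
  for (int i = 0; i < 4; ++i) {
    sum[i] = (int32_t)in[i] + in[7 - i];
    diff[i] = (int32_t)in[i] - in[7 - i];
  }
}
```

The regular add/subtract pattern, repeated across stages, is what makes butterfly-structured transforms a good fit for SIMD instructions.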
  3. 24 Jun, 2013 2 commits
    • Changed size of mb_mode_context to 8 bits · dfa2ecc3
      Scott LaVarnway authored
      This reduces the size of each MODE_INFO array (mip and prev_mip) by
      425,568 bytes at 1080p resolution.
      Change-Id: Ifa513ec2d0a49e8ec0867ec90620762fb7f1261d
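The reported saving is consistent with simple arithmetic if one assumes (the message does not say so) that mb_mode_context previously held four 32-bit ints, one per reference frame: shrinking each entry to 8 bits saves 12 bytes per MODE_INFO, and 425,568 / 12 = 35,464 MODE_INFO entries. The structs below are hypothetical stand-ins, not the real MODE_INFO layout, whose padding may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_REF_CONTEXTS 4 /* assumption: one entry per reference frame */

/* Hypothetical before/after layouts of the per-block mode context field. */
typedef struct { int mb_mode_context[NUM_REF_CONTEXTS]; } ctx_before;
typedef struct { uint8_t mb_mode_context[NUM_REF_CONTEXTS]; } ctx_after;

/* Bytes saved per MODE_INFO entry (12 when int is 32 bits). */
size_t bytes_saved_per_entry(void) {
  return sizeof(ctx_before) - sizeof(ctx_after);
}

/* Number of MODE_INFO entries implied by a total saving in bytes. */
size_t entries_for_saving(size_t total_saved) {
  const size_t per_entry = bytes_saved_per_entry();
  assert(per_entry > 0);
  return total_saved / per_entry;
}
```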
    • Fix loopfilter of leftmost 4x4 edges in SB · 858475a0
      John Koleszar authored
      Previously, when no transform edge was set in bit 0 (the left edge of
      the SB) but bit 0 of mask_4x4_int was set (the edge 4 pixels from the
      left edge needs filtering), that 4x4 edge was incorrectly skipped.
      This situation only occurs at the leftmost edge of the image, because
      the edge at column 0 is intentionally skipped: there are no pixels to
      its left to read.
      Change-Id: Ib2fbbcb40166e90af31b1a0e13b85b68c226cbd3
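A simplified model of the fixed behaviour, as a hedged C sketch: the function name and the single-row bitmask layout are assumptions, not the libvpx loop filter. Bit i of the edge mask marks the left edge of 8-pixel column i, and bit i of mask_4x4_int marks the internal edge 4 pixels to its right; the point is that the internal edge is checked independently, so it is still filtered when the outer edge bit is clear.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: counts the edges that would be filtered in one row of
 * 8-pixel columns. Bit i of edge_mask (transform edges) marks the left edge
 * of column i; bit i of mask_4x4_int marks the internal edge 4 pixels to the
 * right of it. */
int count_filtered_edges(uint32_t edge_mask, uint32_t mask_4x4_int) {
  int filtered = 0;
  /* Keep iterating while either mask still has bits set. */
  for (uint32_t mask = edge_mask | mask_4x4_int; mask; mask >>= 1) {
    if (edge_mask & 1) ++filtered;    /* column edge */
    if (mask_4x4_int & 1) ++filtered; /* internal 4x4 edge, checked independently */
    edge_mask >>= 1;
    mask_4x4_int >>= 1;
  }
  return filtered;
}
```

For the case in the commit message (no edge in bit 0, internal edge in bit 0), `count_filtered_edges(0x2, 0x1)` counts both the internal edge of column 0 and the edge of column 1.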
  4. 21 Jun, 2013 6 commits
  5. 20 Jun, 2013 4 commits
    • SSE2/SSSE3 optimizations and unit test for sub_pixel_avg_variance(). · 1e6a32f1
      Ronald S. Bultje authored
      Encoding of bus @ 1500kbps (first 50 frames) goes from 3min57 to
      3min35, i.e. approximately a 10.5% speedup. Note that the SIMD versions
      that use a bilinear filter (x_offset & 7 || y_offset & 7) aren't
      perfectly interleaved and can probably be improved further in the
      future; I've marked this with a few TODOs/FIXMEs in the code.
      Change-Id: I5c9e900c0f0d32e431a50fecae213b510b2549f9
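A plain C sketch of what a sub-pixel averaging variance computes may help when reading the SIMD code: bilinear-filter the reference at the sub-pixel offset, average the result with a second predictor, then take the variance against the source. The 1/8-pel offset convention, the 2-tap weights and the function names are assumptions for illustration, not the libvpx reference implementation. An offset of 0 reduces the filter to a plain copy, which is why full-pixel positions can skip filtering.

```c
#include <assert.h>
#include <stdint.h>

/* 2-tap bilinear filter, offset assumed in 1/8-pel units (0..7).
 * With off == 0 this returns a exactly, so full-pel positions need no filter. */
static int bilinear_sketch(int a, int b, int off) {
  return (a * (8 - off) + b * off + 4) >> 3;
}

/* Hypothetical sketch: variance of src minus avg(filtered ref, second_pred).
 * ref needs one extra readable column if x_off != 0 and one extra row if
 * y_off != 0. second_pred is contiguous with stride w. */
unsigned int sub_pixel_avg_variance_sketch(const uint8_t *src, int src_stride,
                                           const uint8_t *ref, int ref_stride,
                                           int x_off, int y_off,
                                           const uint8_t *second_pred,
                                           int w, int h, unsigned int *sse) {
  assert(w > 0 && h > 0 && x_off >= 0 && x_off < 8 && y_off >= 0 && y_off < 8);
  int64_t sum = 0;
  uint64_t sq = 0;
  for (int r = 0; r < h; ++r) {
    for (int c = 0; c < w; ++c) {
      const uint8_t *p = ref + r * ref_stride + c;
      /* Horizontal pass on this row, then a vertical pass if needed. */
      const int top = x_off ? bilinear_sketch(p[0], p[1], x_off) : p[0];
      int pred = top;
      if (y_off) {
        const uint8_t *q = p + ref_stride;
        const int bot = x_off ? bilinear_sketch(q[0], q[1], x_off) : q[0];
        pred = bilinear_sketch(top, bot, y_off);
      }
      /* Average with the second predictor, then accumulate the difference. */
      const int avg = (pred + second_pred[r * w + c] + 1) >> 1;
      const int diff = src[r * src_stride + c] - avg;
      sum += diff;
      sq += (uint64_t)(diff * diff);
    }
  }
  *sse = (unsigned int)sq;
  /* variance = SSE - sum^2 / N */
  return (unsigned int)(sq - (uint64_t)((sum * sum) / (w * h)));
}
```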
    • Fix win64 warning. · c259af4f
      Frank Galligan authored
      - size_t vs int.
      Change-Id: Ib47ebd932a4b69db9f52a43000bb69d0a96b9134
    • Implement sse2 and ssse3 versions for all sub_pixel_variance sizes. · 8fb6c581
      Ronald S. Bultje authored
      Overall speedup around 5% (bus @ 1500kbps, first 50 frames: 4min10 ->
      3min58). Per-function timings compared with the original
      assembly-optimized versions (or only the new timing where no
      assembly-optimized version previously existed):
      sse2   4x4:    99 ->   82 cycles
      sse2   4x8:           128 cycles
      sse2   8x4:           121 cycles
      sse2   8x8:   149 ->  129 cycles
      sse2   8x16:  235 ->  245 cycles (?)
      sse2  16x8:   269 ->  203 cycles
      sse2  16x16:  441 ->  349 cycles
      sse2  16x32:          641 cycles
      sse2  32x16:          643 cycles
      sse2  32x32: 1733 -> 1154 cycles
      sse2  32x64:         2247 cycles
      sse2  64x32:         2323 cycles
      sse2  64x64: 6984 -> 4442 cycles
      ssse3  4x4:           100 cycles (?)
      ssse3  4x8:           103 cycles
      ssse3  8x4:            71 cycles
      ssse3  8x8:           147 cycles
      ssse3  8x16:          158 cycles
      ssse3 16x8:   188 ->  162 cycles
      ssse3 16x16:  316 ->  273 cycles
      ssse3 16x32:          535 cycles
      ssse3 32x16:          564 cycles
      ssse3 32x32:          973 cycles
      ssse3 32x64:         1930 cycles
      ssse3 64x32:         1922 cycles
      ssse3 64x64:         3760 cycles
      Change-Id: I81ff6fe51daf35a40d19785167004664d7e0c59d
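Every block size in the table above has a power-of-two width and height, so the last step of a variance computation, SSE minus the squared sum divided by the pixel count, can use a right shift instead of a division. The helper below is a hypothetical C sketch of that final step only, assuming the sum and SSE of the source/prediction differences have already been accumulated.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: variance from accumulated statistics of a block of
 * (1 << log2_w) x (1 << log2_h) pixels, e.g. log2_w = log2_h = 6 for 64x64.
 * Dividing sum^2 by the pixel count becomes a shift by log2_w + log2_h. */
uint32_t variance_from_stats(uint32_t sse, int32_t sum, int log2_w, int log2_h) {
  assert(log2_w >= 2 && log2_w <= 6 && log2_h >= 2 && log2_h <= 6);
  const int64_t sum_sq = (int64_t)sum * sum;
  return (uint32_t)(sse - (uint32_t)(sum_sq >> (log2_w + log2_h)));
}
```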
    • new debug modes code · 2c6bdbbc
      Jim Bankoski authored
      The new printout includes skips and uses prefixed sections, so you can
      grep for things like the transforms chosen on each frame.
      Change-Id: I195043424647d9514cfc3ff6720a5b20d010fa1b
  6. 19 Jun, 2013 2 commits
  7. 18 Jun, 2013 1 commit
    • Make fdct32 computation flow within 16bit range · a41a4860
      Jingning Han authored
      This commit uses two fdct32x32 versions: one for the rate-distortion
      optimization (RD) loop and one for the encoding process. The RD-loop
      version needs only 16-bit precision for intermediate steps, while the
      original fdct32x32, which allows higher (18-bit) intermediate
      precision, is retained for the encoding process only.
      This speeds up fdct32x32 in the RD loop. No performance loss was
      observed.
      Change-Id: I3237770e39a8f87ed17ae5513c87228533397cc3
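Keeping every intermediate value within int16_t presumably matters because SIMD code can then pack eight values into each 128-bit register. The hedged C sketch below shows the kind of rounding right shift that can pull an 18-bit intermediate back into 16-bit range; the shift amount, the rounding rule and the helper names are assumptions, not the actual libvpx fdct32x32 code.

```c
#include <assert.h>
#include <stdint.h>

/* Returns nonzero if v fits in a signed 16-bit integer. */
int fits_int16(int32_t v) {
  return v >= INT16_MIN && v <= INT16_MAX;
}

/* Hypothetical rounding right shift for an intermediate value, rounding half
 * away from zero so positive and negative inputs are treated symmetrically.
 * Assumes v != INT32_MIN. */
int32_t round_shift_sym(int32_t v, int bits) {
  assert(bits > 0 && bits < 31);
  const int32_t half = 1 << (bits - 1);
  return v >= 0 ? (v + half) >> bits : -((-v + half) >> bits);
}
```

The trade-off is a small loss of intermediate precision, which the commit reports as causing no observed performance loss in the RD loop.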
  8. 17 Jun, 2013 2 commits
  9. 14 Jun, 2013 3 commits
    • Fix type mismatch in array definition · a9415d2e
      John Koleszar authored
      vp9_default_inter_mode_probs was being accessed with a different type
      than the one it was defined with. Ensure that its declaration is
      included before its definition, so that the compiler can catch such
      mismatches.
      Change-Id: I2f963f513ab2f4e339f8a3c17e3d0f03749eba16
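In C, a definition whose type conflicts with a declaration already in scope is a compile-time error, whereas the same mismatch split across translation units typically goes undiagnosed. The sketch below shows the pattern in a single file, with the header's declaration written inline; the table name, element type, dimensions and values are hypothetical placeholders, not the real vp9_default_inter_mode_probs.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t prob_t; /* hypothetical probability type */

/* What the header would provide: the declaration seen by all users. */
extern const prob_t default_probs_sketch[2][3];

/* Because the declaration above is in scope, changing the element type or
 * the dimensions here (e.g. to int, or to [3][2]) fails to compile. */
const prob_t default_probs_sketch[2][3] = {
  { 1, 2, 3 }, /* placeholder values */
  { 4, 5, 6 },
};

prob_t get_prob_sketch(int ctx, int i) {
  assert(ctx >= 0 && ctx < 2 && i >= 0 && i < 3);
  return default_probs_sketch[ctx][i];
}
```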
    • Remove constant vp9_coef_update_prob table · 0f7a66e9
      John Koleszar authored
      All elements of this table are equal to 252, so replace it with a
      single constant VP9_COEF_UPDATE_PROB.
      Change-Id: I1e2d1d284326ce6df9899a740c2fc344b3ec81c9
    • Enable sse2 version of sad8x4/4x8 · c43af9a8
      Jingning Han authored
      The encoding time for bus at CIF goes from 661s to 625s. This commit
      also enables the sad8x4/4x8 unit tests in sad_test.cc.
      Change-Id: If3d10ebb56bda584bdb69bcf056599d580b12cb1
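For reference when reading the SIMD code, here is a plain C sketch of an 8x4 SAD (sum of absolute differences). The function names and signatures are hypothetical rather than the exact libvpx prototypes; the 4x8 case is the same loop with the dimensions swapped.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Sum of absolute differences over a block of width w and height h. */
unsigned int sad_sketch(const uint8_t *src, int src_stride,
                        const uint8_t *ref, int ref_stride, int w, int h) {
  assert(w > 0 && h > 0);
  unsigned int sad = 0;
  for (int r = 0; r < h; ++r)
    for (int c = 0; c < w; ++c)
      sad += (unsigned int)abs(src[r * src_stride + c] - ref[r * ref_stride + c]);
  return sad;
}

/* 8 pixels wide, 4 rows tall. */
unsigned int sad8x4_sketch(const uint8_t *src, int src_stride,
                           const uint8_t *ref, int ref_stride) {
  return sad_sketch(src, src_stride, ref, ref_stride, 8, 4);
}
```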
  10. 13 Jun, 2013 1 commit
    • Enable sse2 version of sad8x4/4x8 · 15f50e7b
      Jingning Han authored
      The encoding time for bus at CIF goes from 661s to 625s. This commit
      also enables the sad8x4/4x8 unit tests in sad_test.cc.
      Change-Id: If3d10ebb56bda584bdb69bcf056599d580b12cb1
  11. 12 Jun, 2013 11 commits
  12. 11 Jun, 2013 2 commits
    • Disallow wide loopfilter on some chroma borders · 9831f205
      John Koleszar authored
      Don't do the 15-tap filter if there aren't 8 pixels below/right of the
      Change-Id: I62f16437c1d9ba59b6901a5fe71ddb2f472da344
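The condition in this commit amounts to a bounds check. The hedged C sketch below models only the pixel count stated in the message: the 15-tap filter is allowed only when at least 8 valid pixels lie below (or to the right of) the edge being filtered. The helper name and the way the available pixel count is measured are assumptions, since the original message is truncated.

```c
#include <assert.h>

/* Hypothetical helper: returns nonzero if the wide (15-tap) filter may be
 * applied at an edge located at position edge_pos, given that valid pixels
 * exist up to (but not including) position limit along the filter direction.
 * The filter needs 8 pixels on the far side of the edge. */
int wide_filter_allowed(int edge_pos, int limit) {
  assert(edge_pos >= 0 && edge_pos <= limit);
  return limit - edge_pos >= 8;
}
```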
    • Fix partition coding of corner block · 551f37d6
      Jingning Han authored
      This commit fixes the allowed partition types for bottom-right corner
      blocks.
      When more than half of a block's pixels are valid content in both the
      vertical and horizontal directions, all four partition types are
      allowed in the bitstream. Otherwise, partition type constraints apply.
      Change-Id: I2252e2de7125a8bfb1c824bf34299a13c81102e3
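The rule can be sketched as a function returning which partition types may be coded. The bitmask encoding and names are hypothetical, and the constrained cases follow the behaviour of the final VP9 bitstream (horizontal or split when the bottom half is outside the frame, vertical or split when the right half is outside, split only when both are), which may not match this exact commit.

```c
#include <assert.h>

/* Hypothetical bit flags for the four partition types. */
enum {
  PART_NONE  = 1 << 0,
  PART_HORZ  = 1 << 1,
  PART_VERT  = 1 << 2,
  PART_SPLIT = 1 << 3,
};

/* has_rows / has_cols: whether more than half of the block lies inside the
 * frame vertically / horizontally. Returns a mask of allowed partitions. */
int allowed_partitions(int has_rows, int has_cols) {
  if (has_rows && has_cols) return PART_NONE | PART_HORZ | PART_VERT | PART_SPLIT;
  if (has_cols) return PART_HORZ | PART_SPLIT; /* bottom half is outside */
  if (has_rows) return PART_VERT | PART_SPLIT; /* right half is outside */
  return PART_SPLIT;                           /* corner: must split */
}
```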
  13. 10 Jun, 2013 1 commit
    • New probs for filters/tx_size and a few others · a43ff153
      Deb Mukherjee authored
      * New probs for subpel filters/tx_count
      * No longer resets the tx_size probs to their defaults if an
      intermediate frame reverts to using a fixed tx_size.
      * A few updates to the parameters for backward adaptation for mode/mv
      * Some cosmetic cleanups
      derf300: +0.06%
      Change-Id: I22994d659bc31ca7a4fc8820fde24001e64a2920
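Backward adaptation blends a probability carried into the frame with the probability observed in that frame's symbol counts, weighting the observation more as more symbols are seen. The hedged C sketch below shows that general scheme; the function names, the saturation count and the maximum update factor used in the test are assumptions for illustration, not the parameter values changed in this commit.

```c
#include <assert.h>
#include <stdint.h>

/* Probability (out of 256) that a binary symbol is 0, from the frame's counts,
 * clamped to 1..255; 128 if nothing was counted. */
static uint8_t prob_from_counts(unsigned int n0, unsigned int n1) {
  const unsigned int den = n0 + n1;
  if (den == 0) return 128;
  const unsigned int p = (256u * n0 + den / 2) / den;
  return (uint8_t)(p < 1 ? 1 : p > 255 ? 255 : p);
}

/* Blend pre_prob toward the observed probability. The weight grows with the
 * number of observed symbols, saturating at count_sat, and never exceeds
 * max_update_factor (out of 256). */
uint8_t adapt_prob_sketch(uint8_t pre_prob, unsigned int n0, unsigned int n1,
                          unsigned int count_sat, unsigned int max_update_factor) {
  assert(count_sat > 0 && max_update_factor <= 256);
  const unsigned int count = (n0 + n1 < count_sat) ? n0 + n1 : count_sat;
  const unsigned int factor = max_update_factor * count / count_sat;
  const uint8_t prob = prob_from_counts(n0, n1);
  return (uint8_t)((pre_prob * (256 - factor) + prob * factor + 128) >> 8);
}
```

With no observed symbols the probability is left unchanged; with enough symbols it moves at most `max_update_factor / 256` of the way toward the observed value.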