1. 10 Mar, 2013 1 commit
    • Optimize vp9_tree_probs_from_distribution · bd84685f
      John Koleszar authored
      The previous implementation visited each node in the tree multiple times
      because it used each symbol's encoding to revisit the branches taken and
      increment its count. Instead, we can traverse the tree depth first and
      calculate the probabilities and branch counts as we walk back up. The
      complexity goes from somewhere between O(nlogn) and O(n^2) (depending on
      how balanced the tree is) to O(n).
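
The depth-first walk described above can be sketched as follows. This is a minimal illustration, not the code from the commit: the names `tree_index`, `binary_prob` and `walk` are hypothetical, and the tree layout is an assumption (an array of child pairs in which a value <= 0 is a leaf holding the negated symbol index and a positive value is the index of the next pair).

```c
#include <assert.h>
#include <stdint.h>

typedef int8_t tree_index; /* hypothetical node type for this sketch */

/* 8-bit probability of taking the left branch, clamped to [1, 255]. */
static uint8_t binary_prob(unsigned int left, unsigned int right) {
  const unsigned int total = left + right;
  unsigned int p;
  if (total == 0) return 128; /* no observations: assume even odds */
  p = (left * 256 + total / 2) / total;
  if (p < 1) p = 1;
  if (p > 255) p = 255;
  return (uint8_t)p;
}

/* Visits the node pair at index i exactly once and returns the total
 * count of its subtree, filling branch counts and probabilities on the
 * way back up. */
static unsigned int walk(const tree_index *tree, int i,
                         const unsigned int *counts, uint8_t *probs,
                         unsigned int (*branch_ct)[2]) {
  unsigned int side[2];
  int b;
  assert(i >= 0 && (i & 1) == 0); /* pairs start at even indices */
  for (b = 0; b < 2; ++b) {
    const tree_index child = tree[i + b];
    side[b] = (child <= 0) ? counts[-child]
                           : walk(tree, child, counts, probs, branch_ct);
  }
  branch_ct[i >> 1][0] = side[0];
  branch_ct[i >> 1][1] = side[1];
  probs[i >> 1] = binary_prob(side[0], side[1]);
  return side[0] + side[1];
}
```

Each internal node is touched once, which is where the O(n) bound comes from; the per-symbol approach instead re-walks the path from the root for every symbol.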
      
      Only tested one clip (256kbps, CIF), saw 13% decoding perf improvement.
      
      Note that this optimization should port trivially to VP8 as well. In VP8,
      the decoder doesn't use this function, but it does routinely show up
      on the profile for realtime encoding.
      
      Change-Id: I4f2848e4f41dc9a7694f73f3e75034bce08d1b12
  2. 05 Mar, 2013 1 commit
    • Make superblocks independent of macroblock code and data. · 111ca421
      Ronald S. Bultje authored
      Split macroblock and superblock tokenization and detokenization
      functions and coefficient-related data structs so that the bitstream
      layout and related code of superblock coefficients looks less like it's
      a hack to fit macroblocks in superblocks.
      
      In addition, unify chroma transform size selection from luma transform
      size (i.e. always use the same size, as long as it fits the predictor);
      in practice, this means 32x32 and 64x64 superblocks using the 16x16 luma
      transform will now use the 16x16 (instead of the 8x8) chroma transform,
      and 64x64 superblocks using the 32x32 luma transform will now use the
      32x32 (instead of the 16x16) chroma transform.
      
      Lastly, add a trellis optimize function for 32x32 transform blocks.
      
      HD gains about 0.3%, STDHD about 0.15% and derf about 0.1%. There are
      a few negative points here and there that I might want to analyze
      a little more closely.
      
      Change-Id: Ibad7c3ddfe1acfc52771dfc27c03e9783e054430
  3. 04 Mar, 2013 2 commits
    • Optimize vp9_short_idct4x4llm function · e8bc9f42
      Yunqing Wang authored
      Wrote an SSE2 version of vp9_short_idct4x4llm to improve decoder
      performance.
      
      Change-Id: I90b9d48c4bf37aaf47995bffe7e584e6d4a2c000
    • Support 16K sequence coding · 5957b2b5
      Jingning Han authored
      Fixed a couple of variable/function definitions, as well as header
      handling to support 16K sequence coding at high bit-rates.
      
      The width and height are each specified by two bytes in the header.
      Use an extra byte to explicitly indicate the scaling factors in
      both directions, each ranging from 0 to 15.
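
As a rough sketch of the header fields described above: two bytes per dimension plus one byte holding two 4-bit scaling factors. The byte order (little-endian) and the nibble assignment (horizontal in the high nibble) are assumptions made for illustration only, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Reads a 16-bit dimension; little-endian order is assumed here. */
static unsigned int read_u16_le(const uint8_t *data) {
  return (unsigned int)data[0] | ((unsigned int)data[1] << 8);
}

/* Packs two scaling factors (each 0..15) into one byte. The nibble
 * layout is an assumption of this sketch. */
static uint8_t pack_scale_byte(unsigned int horiz, unsigned int vert) {
  assert(horiz <= 15 && vert <= 15);
  return (uint8_t)((horiz << 4) | vert);
}

static void unpack_scale_byte(uint8_t byte, unsigned int *horiz,
                              unsigned int *vert) {
  *horiz = byte >> 4;
  *vert = byte & 0x0f;
}
```

With a full 16 bits per dimension, values up to 65535 are representable, so the tested 16400x16400 size fits.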
      
      Tested coding up to 16400x16400 dimension.
      
      Change-Id: Ibc2225c6036620270f2c0cf5172d1760aaec10ec
  4. 02 Mar, 2013 2 commits
  5. 01 Mar, 2013 1 commit
    • Add eob<=10 case in idct32x32 · c550bb3b
      Yunqing Wang authored
      Simplified idct32x32 calculation when there are 10 or fewer
      non-zero coefficients in a 32x32 block. This helps decoder
      performance.
      
      Change-Id: If7f8893d27b64a9892b4b2621a37fdf4ac0c2a6d
  6. 28 Feb, 2013 5 commits
  7. 27 Feb, 2013 10 commits
    • Code cleanup. · 347f3a0a
      Dmitry Kovalev authored
      Fixing code style, using array lookup instead of switch statements for
      forward hybrid transforms (in the same way as for their inverses).
      Consistent usage of ROUND_POWER_OF_TWO macro in appropriate places.
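
Replacing a switch with an array lookup might look roughly like the sketch below. The enum, the stub 1-D transforms and the table name are all hypothetical placeholders, not the identifiers used in the commit; the stubs only stand in for real DCT/ADST kernels.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { DCT_DCT = 0, ADST_DCT, DCT_ADST, ADST_ADST, TX_TYPES } tx_type;

typedef void (*transform_1d)(const int16_t *in, int16_t *out);
typedef struct {
  transform_1d cols, rows;
} transform_2d;

/* Placeholder kernels: a real implementation would compute a DCT/ADST. */
static void fdct4_stub(const int16_t *in, int16_t *out) {
  int i;
  for (i = 0; i < 4; ++i) out[i] = in[i];
}
static void fadst4_stub(const int16_t *in, int16_t *out) {
  int i;
  for (i = 0; i < 4; ++i) out[i] = (int16_t)-in[i];
}

/* One table row per transform type replaces a four-way switch. */
static const transform_2d FHT_4[TX_TYPES] = {
  { fdct4_stub, fdct4_stub },   /* DCT_DCT */
  { fadst4_stub, fdct4_stub },  /* ADST_DCT */
  { fdct4_stub, fadst4_stub },  /* DCT_ADST */
  { fadst4_stub, fadst4_stub }  /* ADST_ADST */
};

static transform_2d select_fht4(tx_type type) {
  assert((int)type >= 0 && (int)type < TX_TYPES);
  return FHT_4[type];
}
```

The table keeps the type-to-kernel mapping in one place, so the forward and inverse paths can be laid out the same way.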
      
      Change-Id: I0d3822ae11f928905fdbfbe4158f91d97c71015f
    • Remove unused file · 5ef694cf
      Yunqing Wang authored
      Removed vp9_idctllm_mmx.asm
      
      Change-Id: I7152756f23a5a09ed69e8fb40edb2ab3237290fe
    • Move eob from BLOCKD to MACROBLOCKD. · e8c74e2b
      Ronald S. Bultje authored
      Consistent with VP8.
      
      Change-Id: I8c316ee49f072e15abbb033a80e9c36617891f07
    • Remove unused vp9_copy32xn · 7ad8dbe4
      John Koleszar authored
      This function was part of an optimization used in VP8 that required
      caching two macroblocks. This is unused in VP9, and might not
      survive refactoring to support superblocks, so removing it for now.
      
      Change-Id: I744e585206ccc1ef9a402665c33863fc9fb46f0d
    • Fix --as=nasm compatibility for new asm code. · 82ed3f9a
      Jan Kratochvil authored
      s/movd/movq/
      
      Change-Id: Id1a56de91551f8dc796f14f1056c565dfc1ba626
    • Use 256-byte aligned filter tables · 6fd7dd1a
      John Koleszar authored
      This avoids duplicating all the filters twice. Includes fixups to the
      convolve routines and associated tests to make this work.
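
One way a 256-byte aligned table can help, sketched under assumptions: if a table of 16 kernels with 8 16-bit taps each (exactly 256 bytes) starts on a 256-byte boundary, the table base and the kernel index can both be recovered from a pointer to any single kernel. Whether this is the exact motivation of the commit is a guess; the names below are hypothetical and the table contents are left zeroed. C11 `_Alignas` is used for the alignment.

```c
#include <assert.h>
#include <stdint.h>

#define TAPS 8
#define SUBPEL_STEPS 16

typedef int16_t kernel_t[TAPS];

/* 16 kernels * 8 taps * 2 bytes = 256 bytes, aligned to 256 bytes.
 * Real filter coefficients are omitted in this sketch. */
static _Alignas(256) const kernel_t kernels[SUBPEL_STEPS] = {{0}};

/* Recovers the table base by masking off the low 8 address bits. */
static const kernel_t *kernel_base(const int16_t *kernel) {
  return (const kernel_t *)((uintptr_t)kernel & ~(uintptr_t)0xff);
}

/* Recovers the kernel's index from the low 8 address bits. */
static int kernel_index(const int16_t *kernel) {
  return (int)(((uintptr_t)kernel & 0xff) / sizeof(kernel_t));
}
```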
      
      Change-Id: I922f86021594e55072ddb63b42b2313605db6e00
    • Combined motion compensation with scaled predictors · 77f88e97
      John Koleszar authored
      This patch extends the previous support for using references of a
      different resolution in ZEROMV mode to all inter prediction modes.
      Subpixel based best-mv scoring is disabled when the reference frame
      differs in resolution from the current frame.
      
      Change-Id: Id4dc3e5e6692de98d9857fd56bfad3ac57e944ac
    • Set scale factors consistently for SPLITMV · 472eeaf0
      John Koleszar authored
      This commit updates the 4x4 prediction to consistently use the
      build_2x1_inter_predictor() method. That function is updated to
      calculate the scale offset, rather than relying on the caller
      to calculate it. In the case that the 2x1 prediction can not
      be used, the scale offset is recalculated for each 1x1 block.
      The idea here is that the offsets are calculated before each
      call to vp9_build_scaled_inter_predictor().
      
      Change-Id: I0ac3343dd54e2846efa3c4195fcd328b709ca04d
    • Spatial resampling of ZEROMV predictors · eb939f45
      John Koleszar authored
      This patch allows coding frames using references of different
      resolution, in ZEROMV mode. For compound prediction, either
      reference may be scaled.
      
      To test, I use the resize_test and enable WRITE_RECON_BUFFER
      in vp9_onyxd_if.c. It's also useful to apply this patch to
      test/i420_video_source.h:
      
        --- a/test/i420_video_source.h
        +++ b/test/i420_video_source.h
        @@ -93,6 +93,7 @@ class I420VideoSource : public VideoSource {
      
           virtual void FillFrame() {
             // Read a frame from input_file.
        +    if (frame_ != 3)
             if (fread(img_->img_data, raw_sz_, 1, input_file_) == 0) {
               limit_ = frame_;
             }
      
      This forces the frame that the resolution changes on to be coded
      with no motion, only scaling, and improves the quality of the
      result.
      
      Change-Id: I1ee75d19a437ff801192f767fd02a36bcbd1d496
    • Optimize vp9_dc_only_idct_add_c function · 35bc02c6
      Yunqing Wang authored
      Wrote SSE2 version of vp9_dc_only_idct_add_c function. In order to
      improve performance, clipped the absolute diff values to [0, 255].
      This allowed us to keep the additions/subtractions in 8 bits.
      Tests showed a decoder performance increase of over 2%.
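
The clipping trick can be modelled in scalar C as below. This is only a sketch of the idea, not the SSE2 code from the commit: the DC offset is split into a positive and a negative magnitude, each clipped to [0, 255], and applied with saturating 8-bit add and subtract (the operations that SSE2 provides as `_mm_adds_epu8` and `_mm_subs_epu8`). All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t clamp_u8(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Scalar models of unsigned saturating 8-bit add and subtract. */
static uint8_t sat_add_u8(uint8_t a, uint8_t b) {
  const unsigned int s = (unsigned int)a + b;
  return (uint8_t)(s > 255 ? 255 : s);
}
static uint8_t sat_sub_u8(uint8_t a, uint8_t b) {
  return (uint8_t)(a > b ? a - b : 0);
}

/* Adds dc to each predictor pixel while staying in 8 bits throughout. */
static void dc_add_row(const uint8_t *pred, uint8_t *dst, int n, int dc) {
  const uint8_t pos = clamp_u8(dc);  /* magnitude if dc > 0, else 0 */
  const uint8_t neg = clamp_u8(-dc); /* magnitude if dc < 0, else 0 */
  int i;
  for (i = 0; i < n; ++i)
    dst[i] = sat_sub_u8(sat_add_u8(pred[i], pos), neg);
}

/* Reference version: widen to int, add, then clamp. */
static void dc_add_row_ref(const uint8_t *pred, uint8_t *dst, int n, int dc) {
  int i;
  for (i = 0; i < n; ++i) dst[i] = clamp_u8(pred[i] + dc);
}
```

Clipping the magnitude loses nothing because any offset beyond 255 saturates every 8-bit pixel anyway, so both versions give the same result.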
      
      Change-Id: Ie1a236d23d207e4ffcd1fc9f3d77462a9c7fe09d
  8. 26 Feb, 2013 4 commits
    • Removing redundant 'extern' keyword from function declarations. · 971ff267
      Dmitry Kovalev authored
      Change-Id: I893fa36297b9bd9cff93d082f1736f6860b15c0d
    • Refactor inter recon functions to support scaling · 6a4f708c
      John Koleszar authored
      Ensure that all inter prediction goes through a common code path
      that takes scaling into account. Removes a bunch of duplicate
      1st/2nd predictor code. Also introduces a 16x8 mode for 8x8
      MVs, similar to the 8x4 trick we were doing before. This has an
      unexpected effect with EIGHTTAP_SMOOTH, so it's disabled in that
      case for now.
      
      Change-Id: Ia053e823a8bc616a988a0af30452e1e75a739cba
    • Improve 32x32 forward dct · 66d94ac1
      Yaowu Xu authored
      This commit improves the 32x32 forward dct implementation:
      1. change to use the same constants and rounding as other forward dcts
      2. select rounding to specifically minimize the roundtrip error, which
      improved the average error from 19/block to 0.77/block over 100000
      random inputs.
      
      Tests showed a small but consistent gain on all test sets, about 0.15%.
      
      Change-Id: If0afd6a71880a522f60c1c234be0462092c2eb53
    • Changing pitch value meaning for fht and iht transforms. · 9bf3f751
      Dmitry Kovalev authored
      Pitch now means the number of elements, not the number of bytes.
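
The difference between the two conventions can be illustrated as follows, assuming 16-bit coefficients. The function names are hypothetical and do not come from the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte-based convention: the pitch must be converted to elements
 * before it can be used for pointer arithmetic on int16_t data. */
static int16_t *row_from_byte_pitch(int16_t *base, int row, int pitch_bytes) {
  assert(pitch_bytes % (int)sizeof(int16_t) == 0);
  return base + (ptrdiff_t)row * (pitch_bytes / (int)sizeof(int16_t));
}

/* Element-based convention: the pitch is used directly. */
static int16_t *row_from_element_pitch(int16_t *base, int row, int pitch) {
  return base + (ptrdiff_t)row * pitch;
}
```

With an element pitch, callers no longer need the halving step that variables such as a "short pitch" typically exist for.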
      
      Change-Id: Idb9f2f012e39b09d596a3cc1802305a80b7c13af
  9. 25 Feb, 2013 3 commits
    • Code cleanup. · 9770d564
      Dmitry Kovalev authored
      Removing switch statements for inverse hybrid transforms. Making code style
      consistent for all similar transform implementations. Renaming shortpitch
      and short_pitch variables to half_pitch.
      
      Change-Id: I875f7a82aae4e8063a58777bf1cc3f1e67b48582
    • Code cleanup. · 20b0cb59
      Dmitry Kovalev authored
      Removing redundant parentheses, better code formatting, introducing
      ROUND_POWER_OF_TWO macro to replace repeated expression.
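
A rounding-shift macro of this kind is commonly written as shown below. The definition here is an assumption about the macro's shape (the commit text does not show it), and `descale_by_16` is a hypothetical caller added for illustration.

```c
#include <assert.h>

/* Divides value by 2^n, rounding to nearest (halves round up).
 * Assumed definition; n must be at least 1. */
#define ROUND_POWER_OF_TWO(value, n) (((value) + (1 << ((n) - 1))) >> (n))

/* Example caller: replaces the repeated expression (x + 8) >> 4. */
static int descale_by_16(int x) {
  return ROUND_POWER_OF_TWO(x, 4);
}
```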
      
      Change-Id: I91aad7a53ed03482428b2419de4bb99fd92c6771
    • clean up forward and inverse hybrid transform · 77a3becf
      Jingning Han authored
      Rebased.
      
      Remove the old matrix multiplication transform computation. The 16x16
      ADST/DCT can be switched on/off and evaluated by setting ACTIVE_HT16 to
      300 or 0 in vp9/common/vp9_blockd.h.
      
      Change-Id: Icab2dbd18538987e1dc4e88c45abfc4cfc6e133f
  10. 23 Feb, 2013 3 commits
    • Split coefficient token tables intra vs. inter. · 0c9e2e9a
      Ronald S. Bultje authored
      Change-Id: I5416455f8f129ca0f450d00e48358d2012605072
    • Further changes to coefficient contexts. · c17672a3
      Paul Wilkins authored
      This patch alters the balance of context between the
      coefficient bands (reflecting the position of coefficients
      within a transform block) and the energy of the previous
      token (or tokens) within a block.
      
      In this case the number of coefficient bands is reduced
      but more previous token energy bands are supported.
      
      Some initial rebalancing of the default tables has been done
      by running multiple derf clips at multiple data rates using
      the ENTROPY_STATS macro. Further balancing needs to be
      done using larger image formats, especially in regard to
      the bigger transform sizes, which are not as well represented
      in encodings of smaller image formats.
      
      Change-Id: If9736e95c391e711b04aef6393d26f60f36e1f8a
    • give vp9 variance struct a unique name · e5fb6321
      James Zern authored
      variance_vtable clashed with vp8/common/variance.h
      
      Change-Id: I09c1de44d5519f1bd13f58c01144c0de4706de6f
  11. 22 Feb, 2013 2 commits
    • Code cleanup. · 548b4dd5
      Dmitry Kovalev authored
      Removing redundant 'extern' keywords and parentheses, fixing indentation,
      making variable names lower case, using short expressions x *= c
      instead of x = x * c, minor code simplifications.
      
      Change-Id: If6a25fcf306d1db26e90d27e3c24a32735c607de
    • Forward butterfly hybrid transform · babbd5d1
      Jingning Han authored
      This patch includes 4x4, 8x8, and 16x16 forward butterfly ADST/DCT
      hybrid transform. The kernel of 4x4 ADST is sin((2k+1)*(n+1)/(2N+1)).
      The kernel of 8x8/16x16 ADST is of the form sin((2k+1)*(2n+1)/4N).
      
      Change-Id: I8f1ab3843ce32eb287ab766f92e0611e1c5cb4c1
  12. 21 Feb, 2013 2 commits
    • Remove "eobs" array in MACROBLOCKD. · 35524e22
      Ronald S. Bultje authored
      The information is a duplicate of "eob" in BLOCKD.
      
      Change-Id: Ia6416273bd004611da801e4bfa6e2d328d6f02a3
    • Refactoring of switchable filter search for speed · 28b1db92
      Deb Mukherjee authored
      Refactors the switchable filter search in the rd loop to
      improve encode speed.
      
      Uses a piecewise approximation to a closed form expression to estimate
      rd cost for a Laplacian source with a given variance and quantization
      step-size.
      
      About 40% encode time reduction is achieved.
      
      Results (on a feb 12 baseline) show a slight drop:
      
      derf: -0.019%
      yt: +0.010%
      std-hd: -0.162%
      hd: -0.050%
      
      Change-Id: Ie861badf5bba1e3b1052e29a0ef1b7e256edbcd0
  13. 20 Feb, 2013 3 commits
    • Code cleanup. · eb6aee50
      Dmitry Kovalev authored
      Change-Id: I7c6e3bebd94856b24dbe2aded7f9e04ef8bb8c08
    • Merge lossless experiment · d262e26c
      Yaowu Xu authored
      Change-Id: I7b7b8d4fda3a23699e0c920d727f8c15d37d43aa
    • Avoid division in intra prediction · 56e6c66b
      Tero Rintaluoma authored
      - Using multiplication and shifting instead of division in
        intra prediction.
      - Maximum absolute difference is 1 for division statements
        in d45, d27, d63 prediction modes. However, errors can
        accumulate for large block sizes when using already predicted
        values.
      - Maximum number of non-matching result values in loops using
        division are:
        4x4        0/16
        8x8        0/64
        16x16     10/256
        32x32     13/1024
        64x64    122/4096
      
        Overall PSNR
        derf:     0.005
        yt:      -0.022
        std-hd:   0.021
        hd:      -0.006
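
To illustrate how a multiply-and-shift can stand in for a division with an error of at most 1, here is a minimal sketch for dividing a sum of three 8-bit pixels by 3. The constant 85 and the 8-bit shift are chosen for illustration only; the commit text does not state which constants were actually used, and the function names are hypothetical.

```c
#include <assert.h>

static unsigned int div3_exact(unsigned int x) {
  return x / 3;
}

/* Approximates x / 3 as (x * 85) >> 8, i.e. x * 0.33203...
 * Valid for sums of three 8-bit values (x <= 765). */
static unsigned int div3_approx(unsigned int x) {
  assert(x <= 765);
  return (x * 85) >> 8;
}
```

Because 85/256 is slightly below 1/3, the approximation never overshoots, and over this input range it undershoots the exact quotient by at most 1.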
      
      Change-Id: I3979a02eb6351636442c1af1e23d6c4e6ec1d01d
  14. 19 Feb, 2013 1 commit
    • 16x16 butterfly inverse ADST/DCT hybrid transform · cd907b16
      Jingning Han authored
      rebased.
      
      This patch includes 16x16 butterfly inverse ADST/DCT hybrid
      transform. It uses the variant ADST of kernel
          sin((2k+1)*(2n+1)/4N),
      which allows a butterfly implementation.
      
      The coding gains as compared to DCT 16x16 are about 0.1% for
      both derf and std-hd. It is noteworthy that for std-hd sets
      many sequences gain about 0.5%, some 0.2%. There are also a few
      points that show -1% to -3% performance. Hence the average
      comes to about 0.1%.
      
      Change-Id: Ie80ac84cf403390f6e5d282caa58723739e5ec17