1. 25 Feb, 2013 3 commits
    • Code cleanup. · 9770d564
      Dmitry Kovalev authored
      Removing switch statements for inverse hybrid transforms. Making code style
      consistent for all similar transform implementations. Renaming shortpitch
      and short_pitch variables to half_pitch.
      
      Change-Id: I875f7a82aae4e8063a58777bf1cc3f1e67b48582
      9770d564
    • Code cleanup. · 20b0cb59
      Dmitry Kovalev authored
      Removing redundant parentheses, better code formatting, introducing
      ROUND_POWER_OF_TWO macro to replace repeated expression.
      
      Change-Id: I91aad7a53ed03482428b2419de4bb99fd92c6771
      20b0cb59
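
The commit message above names the ROUND_POWER_OF_TWO macro but does not show its body. A minimal sketch, assuming the usual "add half, then shift" rounding form; the helper scale_output_sketch is hypothetical and only illustrates a call site:

```c
#include <stdint.h>

/* Assumed definition: add half of 2^n, then shift right by n bits.
   Intended for n >= 1 and for values where the addition cannot overflow. */
#define ROUND_POWER_OF_TWO(value, n) (((value) + (1 << ((n) - 1))) >> (n))

/* Hypothetical helper: scale a transform output down by 2^4 with rounding. */
int16_t scale_output_sketch(int32_t x) {
  return (int16_t)ROUND_POWER_OF_TWO(x, 4);
}
```

The macro replaces a repeated expression such as (x + 8) >> 4, so the rounding offset and the shift amount can no longer drift apart.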
    • clean up forward and inverse hybrid transform · 77a3becf
      Jingning Han authored
      Rebased.
      
      Remove the old matrix multiplication transform computation. The 16x16
      ADST/DCT can be switched on/off and evaluated by setting ACTIVE_HT16
      300/0 in vp9/common/vp9_blockd.h.
      
      Change-Id: Icab2dbd18538987e1dc4e88c45abfc4cfc6e133f
      77a3becf
  2. 22 Feb, 2013 1 commit
    • Code cleanup. · 548b4dd5
      Dmitry Kovalev authored
      Removing redundant 'extern' keywords and parentheses, fixing indentation,
      making variable names lower case, using short expressions x *= c
      instead of x = x * c, minor code simplifications.
      
      Change-Id: If6a25fcf306d1db26e90d27e3c24a32735c607de
      548b4dd5
  3. 20 Feb, 2013 1 commit
  4. 19 Feb, 2013 1 commit
    • 16x16 butterfly inverse ADST/DCT hybrid transform · cd907b16
      Jingning Han authored
      rebased.
      
      This patch includes the 16x16 butterfly inverse ADST/DCT hybrid
      transform. It uses the variant ADST of kernel
          sin((2k+1)*(2n+1) / (4N)),
      which allows a butterfly implementation.
      
      The coding gains compared to the 16x16 DCT are about 0.1% for
      both derf and std-hd. Notably, for the std-hd set many sequences
      gain about 0.5% and some gain 0.2%, but a few points lose
      1% to 3%, which brings the average down to about 0.1%.
      
      Change-Id: Ie80ac84cf403390f6e5d282caa58723739e5ec17
      cd907b16
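
The kernel quoted in the commit message can be written out as a 1-D transform. The sketch below assumes a factor of π inside the sine, which the commit message omits; with that factor the kernel matches the standard DST-IV form. The symbols x, X and S are introduced here for illustration only.

```latex
X[k] = \sum_{n=0}^{N-1} x[n]\,
       \sin\!\left(\frac{\pi\,(2k+1)(2n+1)}{4N}\right),
\qquad k = 0, \dots, N-1

% With S_{k,n} = \sin\!\left(\pi (2k+1)(2n+1) / (4N)\right), the matrix S is symmetric and
S\,S^{\mathsf{T}} = \frac{N}{2}\, I
```

Under that assumption the inverse transform is the same matrix scaled by 2/N, which is why a single butterfly structure can serve both directions.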
  5. 15 Feb, 2013 1 commit
  6. 13 Feb, 2013 2 commits
    • enable bitstream lossless support · 17db5d00
      Yaowu Xu authored
      1. Added a bit in the frame header to indicate if a frame is encoded
      in lossless mode, so the decoder does not make the decision based on Q0.
      2. Minor changes to make sure that lossy coding works the same as when
      the lossless experiment is not enabled.
      3. Renamed function pointers for transforms to be consistent, using
      the prefixes fwd_txm and inv_txm for forward and inverse respectively.
      
      To encode in lossless mode, use "--lossless=1 --min-q=0 --max-q=0"
      with vpxenc.
      
      Change-Id: Ifae53b26d2ffbe378d707e29d96817b8a5e6c068
      17db5d00
    • Removal of Hybrid DWT/DCT experiment. · 649be94c
      Paul Wilkins authored
      Removal of experiment to simplify code base for other
      changes.
      
      Change-Id: If0a33952504558511926ad212bc311fc2bffb19a
      649be94c
  7. 11 Feb, 2013 1 commit
    • butterfly inverse 4x4 ADST · 57e995ff
      Jingning Han authored
      fixed format issues.
      
      Implement the inverse 4x4 ADST using 9 multiplications. For this
      particular dimension, the original ADST transform can be
      factorized into simpler operations, hence is retained.
      
      Change-Id: Ie5d9749942468df299ab74e90d92cd899569e960
      57e995ff
  8. 07 Feb, 2013 3 commits
    • move dct/idct constants to a header file · e6ad9ab0
      Yaowu Xu authored
      Also removed some unused functions.
      
      Change-Id: Ie363bcc8d94441d054137d2ef7c4fe59f56027e5
      e6ad9ab0
    • Butterfly ADST based hybrid transform · d15e1da4
      Jingning Han authored
      Refactor the 8x8 inverse hybrid transform. It is now consistent
      with the new inverse DCT. Overall performance loss (due to the
      use of this variant ADST, and the rounding errors in the butterfly
      implementation) for std-hd is -0.02.
      
      Fixed BUILD warning.
      
      Devise a variant of the original ADST, which allows butterfly
      computation structure. This new transform has kernel of the
      form: sin((2k+1)*(2n+1) / (4N)). One of its butterfly structures
      using floating-point multiplications was reported in Z. Wang,
      "Fast algorithms for the discrete W transform and for the discrete
      Fourier transform", IEEE Trans. on ASSP, 1984.
      
      This patch includes the butterfly implementation of the inverse
      ADST/DCT hybrid transform of dimension 8x8.
      
      Change-Id: I3533cb715f749343a80b9087ce34b3e776d1581d
      d15e1da4
    • Use configure checks for various inline keywords. · aac73df1
      Ronald S. Bultje authored
      Change-Id: I8508f1a3d3430f998bb9295f849e88e626a52a24
      aac73df1
  9. 05 Feb, 2013 2 commits
    • rewrite 4x4 idct and fdct · fa36981e
      Yaowu Xu authored
      This commit changes the 4x4 iDCT to use the same algorithm and constants
      as the other iDCTs. The 4x4 fDCT is also changed to be based on the new iDCT.
      
      Change-Id: Ib1a902693228af903862e1f5a08078c36f2089b0
      fa36981e
    • Added vp9_short_idct1_32x32_c · 5780c4cb
      Scott LaVarnway authored
      and called this function in vp9_dequant_idct_add_32x32_c when
      eob == 1.  For the test clip used, the decoder performance improved
      by 21+%.  Based on Yaowu's 16 point idct work.
      
      Change-Id: Ib579a90fed531d45777980e04bf0c9b23c093c43
      5780c4cb
  10. 04 Feb, 2013 2 commits
    • re-write 8 point idct · 1eb79dc1
      Yaowu Xu authored
      to be consistent with idct16 and idct32.
      
      Change-Id: Ie89dbd32b65c33274b7fecb4b41160fcf1962204
      1eb79dc1
    • a couple of minor fixes · ccaaeb4b
      Yaowu Xu authored
      fixed a function prototype to prevent compiler warnings;
      removed a function not in use;
      un-capitalized "Refstride" to ref_stride
      
      Change-Id: Ib4472b6084f357d96328c6a06e795b6813a9edba
      ccaaeb4b
  11. 01 Feb, 2013 1 commit
    • Changes 16 point idct · 91e0e801
      Yaowu Xu authored
      This commit changes the inverse 16 point dct to use the same algorithm
      as the one for the 32 point idct. In fact, the 16 point idct now uses
      exactly the same source code as the even portion of the 32 point idct.
      
      Tests showed the current implementation has significantly better accuracy
      than the previous version. With this implementation and the minor bug
      fix on the forward 16 point dct, encoding tests showed about 0.2% better
      compression on the CIF set; test results on the std-hd set are pending.
      
      Change-Id: I68224b60c816ba03434e9f08bee147c7e344fb63
      91e0e801
  12. 31 Jan, 2013 1 commit
    • A fix point implementation of 32x32 idct · 5149d7f7
      Yaowu Xu authored
      This commit changes the 32x32 idct to use integers only. The algorithm
      was taken directly from "A Fast Computational Algorithm for the
      Discrete Cosine Transform" by W. Chen, et al., which was published in
      IEEE Transactions on Communications, Vol. COM-25, No. 9, 1977. The signal
      flow graph in the original paper is for a 32 point forward dct; the
      current implementation of the inverse DCT was done by following the
      graph in the reverse direction.
      
      With this implementation, the 32 point inverse dct contains a 16 point
      inverse dct in its even portion, similarly the 16 point idct further
      contains 8 point and 4 point inverse dcts.
      
      As of patch 4, encoding tests showed there is no compression loss when
      compared against the floating point baseline. The numbers even showed
      very small positives (cif: .01%, std-hd: .05%).
      
      Change-Id: I2d2d17a424b0b04b42422ef33ec53f5802b0f378
      5149d7f7
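
The nesting described above bottoms out in a 4 point inverse dct. The following is a minimal sketch of such an integer-only 4 point butterfly, not the code from this commit. It assumes 14-bit fixed-point cosine constants (each is round(cos(k*pi/8) * 2^14)); every name ending in _SKETCH or _sketch is hypothetical.

```c
#include <stdint.h>

enum {
  CONST_BITS_SKETCH = 14,
  COSPI_8_64_SKETCH = 15137,  /* round(cos(1 * pi / 8) * 2^14) */
  COSPI_16_64_SKETCH = 11585, /* round(cos(2 * pi / 8) * 2^14) */
  COSPI_24_64_SKETCH = 6270   /* round(cos(3 * pi / 8) * 2^14) */
};

/* Round to nearest and drop the 14 fractional bits. Relies on arithmetic
   right shift for negative values, which common compilers provide. */
static int round_shift_sketch(int x) {
  return (x + (1 << (CONST_BITS_SKETCH - 1))) >> CONST_BITS_SKETCH;
}

/* 4 point inverse DCT butterfly using integer arithmetic only. */
void idct4_1d_sketch(const int16_t *in, int16_t *out) {
  int step[4];

  /* Stage 1: even part (in[0], in[2]) and odd part (in[1], in[3]). */
  step[0] = round_shift_sketch((in[0] + in[2]) * COSPI_16_64_SKETCH);
  step[1] = round_shift_sketch((in[0] - in[2]) * COSPI_16_64_SKETCH);
  step[2] = round_shift_sketch(in[1] * COSPI_24_64_SKETCH -
                               in[3] * COSPI_8_64_SKETCH);
  step[3] = round_shift_sketch(in[1] * COSPI_8_64_SKETCH +
                               in[3] * COSPI_24_64_SKETCH);

  /* Stage 2: combine even and odd parts. */
  out[0] = (int16_t)(step[0] + step[3]);
  out[1] = (int16_t)(step[1] + step[2]);
  out[2] = (int16_t)(step[1] - step[2]);
  out[3] = (int16_t)(step[0] - step[3]);
}
```

In a nested layout of this kind, the 8 point transform would feed its even-indexed inputs through this routine and handle the odd-indexed inputs with additional rotations, and so on up to 32 points.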
  13. 13 Jan, 2013 1 commit
    • Further enhancements/fixes on dct/dwt hybrid txfm · 516db21c
      Deb Mukherjee authored
      Fixes some scaling issues. Adds an option to only compute the
      dct on the low-low subband for 32x32 and 64x64 blocks using
      only a single 16x16 dct after 1 and 2 wavelet decomposition
      levels respectively. Also adds an option to use a 8x8 dct
      as building block.
      
      Currently, with the 2/6 filter and with a single 16x16 dct on
      the low-low band, the results compared to the full 32x32 dct are
      as follows:
      derf: -0.15%
      yt: -0.29%
      std-hd: -0.18%
      hd: -0.6%
      These are my current recommended settings, since the 2/6 filter
      is very simple.
      
      Results with 8x8 dct are about 0.3% worse.
      
      Change-Id: I00100cdc96e32deced591985785ef0d06f325e44
      516db21c
  14. 10 Jan, 2013 1 commit
  15. 08 Jan, 2013 1 commit
    • Adds 64x64 hybrid dct/dwt transform · 4b7304ee
      Deb Mukherjee authored
      This is to add to the 64x64 transform experiment as an alternative to
      a 64x64 DCT.
      Two levels of wavelet decomposition are used on a 64x64 block, followed
      by a 16x16 DCT on the four lowest subbands. The highest three subbands
      are left untransformed after the first level DWT.
      
      Change-Id: I3d48d5800468d655191933894df6b46e15adca56
      4b7304ee
  16. 27 Dec, 2012 1 commit
    • Switch the order of calculating 2-D inverse transform · cc80247f
      Yunqing Wang authored
      The 2-D inverse transform X = M1*Z*Transposed_M2 was calculated
      in 2 steps from left to right:
      1. Vertical transform: Y = M1*Z
      2. Horizontal transform: X= Y*Transposed_M2
      In SIMD, a transpose is needed in vertical transform.
      
      Here, switched the calculation order to do it from right to left.
      In this way, we could eliminate that transpose by writing the
      intermediate results out to their transposed positions.
      
      Change-Id: I34dfe5eb01292f6e363712420d99475e2e81e12c
      cc80247f
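
The reordering described above can be illustrated in scalar C. This is a sketch, not the SIMD code from the commit: it uses small integer matrices stored row-major in flat arrays, and all function names are hypothetical. Each pass transforms rows only and stores its result at the transposed position, so no separate transpose step is needed.

```c
enum { TX_N = 4 };

/* For each row r of `in`, compute out[j][r] = sum_k in[r][k] * m[j][k].
   That is, out = (in * m^T)^T: a row transform with a transposed store. */
static void row_pass_transposed(const int *m, const int *in, int *out) {
  for (int r = 0; r < TX_N; ++r) {
    for (int j = 0; j < TX_N; ++j) {
      int sum = 0;
      for (int k = 0; k < TX_N; ++k) sum += in[r * TX_N + k] * m[j * TX_N + k];
      out[j * TX_N + r] = sum; /* write to the transposed position */
    }
  }
}

/* Right to left: horizontal transform first, then vertical.
   t = (Z * M2^T)^T, then x = (t * M1^T)^T = M1 * Z * M2^T. */
void inverse_2d_right_to_left(const int *m1, const int *m2, const int *z,
                              int *x) {
  int t[TX_N * TX_N];
  row_pass_transposed(m2, z, t);
  row_pass_transposed(m1, t, x);
}

/* Left to right, for comparison: Y = M1 * Z, then X = Y * M2^T. */
void inverse_2d_left_to_right(const int *m1, const int *m2, const int *z,
                              int *x) {
  int y[TX_N * TX_N];
  for (int i = 0; i < TX_N; ++i) {
    for (int j = 0; j < TX_N; ++j) {
      int sum = 0;
      for (int k = 0; k < TX_N; ++k) sum += m1[i * TX_N + k] * z[k * TX_N + j];
      y[i * TX_N + j] = sum;
    }
  }
  for (int i = 0; i < TX_N; ++i) {
    for (int j = 0; j < TX_N; ++j) {
      int sum = 0;
      for (int k = 0; k < TX_N; ++k) sum += y[i * TX_N + k] * m2[j * TX_N + k];
      x[i * TX_N + j] = sum;
    }
  }
}
```

Both functions compute the same product. The first touches its input only row by row, which is the access pattern a SIMD implementation would want.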
  17. 26 Dec, 2012 1 commit
  18. 18 Dec, 2012 1 commit
  19. 14 Dec, 2012 1 commit
  20. 13 Dec, 2012 1 commit
    • Further improvements on the hybrid dwt/dct expt · 210dc5b2
      Deb Mukherjee authored
      Modifies the scanning pattern and uses a floating point 16x16
      dct implementation for now to handle scaling better.
      Also experiments are in progress with 2/6 and 9/7 wavelets.
      
      Results have improved to within ~0.25% of 32x32 dct for std-hd
      and about 0.03% for derf. This difference can probably be bridged by
      re-optimizing the entropy stats for these transforms. Currently
      the stats used are common between 32x32 dct and dwt/dct.
      
      Experiments are in progress with various scan pattern - wavelet
      combinations.
      
      Ideally the subbands should be tokenized separately, and an
      experiment will be conducted next on that.
      
      Change-Id: Ia9cbfc2d63cb7a47e562b2cd9341caf962bcc110
      210dc5b2
  21. 12 Dec, 2012 2 commits
    • Improved vp9_ihtllm_c · b575394e
      Scott LaVarnway authored
      As suggested by Yaowu, we can use eob to reduce the complexity
      of the vp9_ihtllm_c function.  For the 1080p test clip used, the decoder
      performance improved by 17%.
      
      Change-Id: I32486f2f06f9b8f60467d2a574209aa3a3daa435
      b575394e
    • Consistently use get_prob(), clip_prob() and newly added clip_pixel(). · 4d0ec7aa
      Ronald S. Bultje authored
      Add a function clip_pixel() to clip a pixel value to the [0,255] range
      of allowed values, and use this wherever appropriate (e.g. prediction,
      reconstruction). Likewise, consistently use the recently added function
      clip_prob(), which calculates a binary probability in the [1,255] range.
      If possible, try to use get_prob() or its sister get_binary_prob() to
      calculate binary probabilities, for consistency.
      
      Since in some places, this means that binary probability calculations
      are changed (we use {255,256}*count0/(total) in a range of places,
      and all of these are now changed to use (256*count0+(total>>1))/total),
      this changes the encoding result, so this patch warrants some extensive
      testing.
      
      Change-Id: Ibeeff8d886496839b8e0c0ace9ccc552351f7628
      4d0ec7aa
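
The helpers named above can be sketched as follows. The ranges [0,255] and [1,255] come from the commit message. Reading the probability expression as a rounded division, and returning 128 when both counts are zero, are assumptions; the _sketch suffix marks the functions as illustrative rather than the library's own.

```c
#include <stdint.h>

/* Clip a pixel value to the [0, 255] range. */
uint8_t clip_pixel_sketch(int val) {
  return (uint8_t)(val < 0 ? 0 : (val > 255 ? 255 : val));
}

/* Clip a probability to the [1, 255] range. */
uint8_t clip_prob_sketch(int p) {
  return (uint8_t)(p < 1 ? 1 : (p > 255 ? 255 : p));
}

/* Probability of the "0" branch given two event counts, with rounding.
   Assumption: 128 (an even split) is returned when no events were seen. */
uint8_t get_binary_prob_sketch(int count0, int count1) {
  const int total = count0 + count1;
  if (total == 0) return 128;
  return clip_prob_sketch((256 * count0 + (total >> 1)) / total);
}
```

Clipping to [1,255] keeps both branches of a binary decision codable even when one of the counts is zero.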
  22. 07 Dec, 2012 1 commit
    • 32x32 transform for superblocks. · c456b35f
      Ronald S. Bultje authored
      This adds Debargha's DCT/DWT hybrid and a regular 32x32 DCT, and adds
      code all over the place to wrap that in the bitstream/encoder/decoder/RD.
      
      Some implementation notes (these probably need careful review):
      - token range is extended by 1 bit, since the value range out of this
        transform is [-16384,16383].
      - the coefficients coming out of the FDCT are manually scaled back by
        1 bit, or else they won't fit in int16_t (they are 17 bits). Because
        of this, the RD error scoring does not right-shift the MSE score by
        two (unlike for 4x4/8x8/16x16).
      - to compensate for this loss in precision, the quantizer is halved
        also. This is currently a little hacky.
      - FDCT and IDCT are double-only right now. Needs a fixed-point impl.
      - There are no default probabilities for the 32x32 transform yet; I'm
        simply using the 16x16 luma ones. A future commit will add newly
        generated probabilities for all transforms.
      - No ADST version. I don't think we'll add one for this level; if an
        ADST is desired, transform-size selection can scale back to 16x16
        or lower, and use an ADST at that level.
      
      Additional notes specific to Debargha's DWT/DCT hybrid:
      - coefficient scale is different for the top/left 16x16 (DCT-over-DWT)
        block than for the rest (DWT pixel differences) of the block. Therefore,
        RD error scoring isn't easily scalable between coefficient and pixel
        domain. Thus, unfortunately, we need to compute the RD distortion in
        the pixel domain until we figure out how to scale these appropriately.
      
      Change-Id: I00386f20f35d7fabb19aba94c8162f8aee64ef2b
      c456b35f
  23. 29 Nov, 2012 1 commit
  24. 27 Nov, 2012 1 commit
    • Add vp9_ prefix to all vp9 files · fcccbcbb
      John Koleszar authored
      Support for gyp which doesn't support multiple objects in the same
      static library having the same basename.
      
      Change-Id: Ib947eefbaf68f8b177a796d23f875ccdfa6bc9dc
      fcccbcbb
  25. 25 Nov, 2012 1 commit
  26. 13 Nov, 2012 1 commit
    • Optimize 8x8 dequant and idct · e60478d4
      Yunqing Wang authored
      Similar to the 16x16 dequant and idct, based on the value of eobs, the
      8x8 dequant and idct calculation was simplified to improve decoder
      performance.
      
      Combined vp9_dequant_idct_add_8x8 and vp9_dequant_dc_idct_add_8x8
      to eliminate duplicate code.
      
      Change-Id: Ia58e50ab27f7012b7379c495837c9c0b5ba9cf7f
      e60478d4
  27. 08 Nov, 2012 1 commit
    • Optimize 16x16 dequant and idct · 6c17c9fa
      Yunqing Wang authored
      As suggested by Yaowu, simplified 16x16 dequant and idct. In decoder,
      after detoken step, we know the number of non-zero dct coefficients
      (eobs) in a macroblock. Idct calculation can be skipped or simplified
      based on eobs, which improves the decoder performance.
      
      Change-Id: I9ffa1cb134bcb5a7d64fcf90c81871a96d1b4018
      6c17c9fa
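
The eob-based shortcut described above can be sketched as a three-way dispatch. This is an illustration under stated assumptions, not the decoder's actual code: the function name, the dc_shift scaling parameter (must be at least 1), and the callback type are all hypothetical, and the coefficients are taken to be already dequantized.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical signature for a full inverse transform that adds the
   residual to the prediction in dst. */
typedef void (*full_idct_add_fn)(const int16_t *coeffs, uint8_t *dst,
                                 int stride);

static uint8_t clip_pixel_sketch(int val) {
  return (uint8_t)(val < 0 ? 0 : (val > 255 ? 255 : val));
}

/* Returns 0 if the transform was skipped, 1 if the DC-only path ran,
   2 if the full transform path was taken. */
int idct_add_by_eob_sketch(const int16_t *coeffs, int eob, int dc_shift,
                           uint8_t *dst, int stride, int size,
                           full_idct_add_fn full_idct_add) {
  if (eob == 0) return 0; /* no non-zero coefficients: prediction stands */

  if (eob == 1) {
    /* Only the DC coefficient is non-zero, so the residual is one
       constant; add it to every pixel with rounding and clipping. */
    const int dc = (coeffs[0] + (1 << (dc_shift - 1))) >> dc_shift;
    for (int r = 0; r < size; ++r) {
      for (int c = 0; c < size; ++c) {
        dst[r * stride + c] = clip_pixel_sketch(dst[r * stride + c] + dc);
      }
    }
    return 1;
  }

  if (full_idct_add != NULL) full_idct_add(coeffs, dst, stride);
  return 2;
}
```

The saving comes from the first two branches: a block with eob of 0 or 1 never enters the butterfly stages at all.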
  28. 06 Nov, 2012 1 commit
  29. 01 Nov, 2012 3 commits
  30. 31 Oct, 2012 1 commit