1. 07 Feb, 2013 1 commit
    • Jingning Han's avatar
      Butterfly ADST based hybrid transform · d15e1da4
      Jingning Han authored
      Refactor the 8x8 inverse hybrid transform. It is now consistent
      with the new inverse DCT. Overall performance loss (due to the
      use of this variant ADST, and the rounding errors in the butterfly
      implementation) for std-hd is -0.02.
      
      Fixed BUILD warning.
      
      Devise a variant of the original ADST, which allows butterfly
      computation structure. This new transform has kernel of the
      form: sin((2k+1)*(2n+1) / (4N)). One of its butterfly structures
      using floating-point multiplications was reported in Z. Wang,
      "Fast algorithms for the discrete W transform and for the discrete
      Fourier transform", IEEE Trans. on ASSP, 1984.
      
      This patch includes the butterfly implementation of the inverse
      ADST/DCT hybrid transform of dimension 8x8.
      
      Change-Id: I3533cb715f749343a80b9087ce34b3e776d1581d
      d15e1da4
  2. 05 Feb, 2013 1 commit
    • Scott LaVarnway's avatar
      Added vp9_short_idct1_32x32_c · 5780c4cb
      Scott LaVarnway authored
      and called this function in vp9_dequant_idct_add_32x32_c when
      eob == 1.  For the test clip used, the decoder performance improved
      by 21+%.  Based on Yaowu's 16 point idct work.
      
      Change-Id: Ib579a90fed531d45777980e04bf0c9b23c093c43
      5780c4cb
  3. 04 Feb, 2013 1 commit
    • Yaowu Xu's avatar
      re-write 8 point idct · 1eb79dc1
      Yaowu Xu authored
      to be consistent with idct16 and idct32.
      
      Change-Id: Ie89dbd32b65c33274b7fecb4b41160fcf1962204
      1eb79dc1
  4. 01 Feb, 2013 1 commit
    • Yaowu Xu's avatar
      Changes 16 point idct · 91e0e801
      Yaowu Xu authored
      This commit changes the inverse 16 point dct to use the same algorithm
      as the one for 32 point idct. In fact, now 16 point dct uses the exact
      version of the souce code for even portion of the 32 point idct.
      
      Tests showed current implementation has significant better accuracy
      than the previous version. With this implementation and the minor bug
      fix on forward 16 point dct, encoding tests showed about 0.2% better
      compression of CIF set, test results on std-hd setting pending.
      
      Change-Id: I68224b60c816ba03434e9f08bee147c7e344fb63
      91e0e801
  5. 25 Jan, 2013 1 commit
    • Scott LaVarnway's avatar
      Added eob == 0 check to vp9_dequant_idct_add_32x32_c · 9d4c2653
      Scott LaVarnway authored
      Added a quick eob == 0 check.  Once the integer version of the dct32x32 is
      complete, we can check for other eob cases.
      
      For the 1080p clip used, the decoder performance improved by 4%.
      
      Change-Id: I9390b6ed3c8be0c0c0a0c44c578d9a031d6e026e
      9d4c2653
  6. 10 Jan, 2013 1 commit
  7. 08 Jan, 2013 1 commit
  8. 18 Dec, 2012 1 commit
  9. 12 Dec, 2012 2 commits
    • Scott LaVarnway's avatar
      Improved vp9_ihtllm_c · b575394e
      Scott LaVarnway authored
      As suggested by Yaowu, we can use eob to reduce the complexity
      of the vp9_ihtllm_c function.  For the 1080p test clip used, the decoder
      performance improved by 17%.
      
      Change-Id: I32486f2f06f9b8f60467d2a574209aa3a3daa435
      b575394e
    • Ronald S. Bultje's avatar
      Consistently use get_prob(), clip_prob() and newly added clip_pixel(). · 4d0ec7aa
      Ronald S. Bultje authored
      Add a function clip_pixel() to clip a pixel value to the [0,255] range
      of allowed values, and use this where-ever appropriate (e.g. prediction,
      reconstruction). Likewise, consistently use the recently added function
      clip_prob(), which calculates a binary probability in the [1,255] range.
      If possible, try to use get_prob() or its sister get_binary_prob() to
      calculate binary probabilities, for consistency.
      
      Since in some places, this means that binary probability calculations
      are changed (we use {255,256}*count0/(total) in a range of places,
      and all of these are now changed to use 256*count0+(total>>1)/total),
      this changes the encoding result, so this patch warrants some extensive
      testing.
      
      Change-Id: Ibeeff8d886496839b8e0c0ace9ccc552351f7628
      4d0ec7aa
  10. 07 Dec, 2012 1 commit
    • Ronald S. Bultje's avatar
      32x32 transform for superblocks. · c456b35f
      Ronald S. Bultje authored
      This adds Debargha's DCT/DWT hybrid and a regular 32x32 DCT, and adds
      code all over the place to wrap that in the bitstream/encoder/decoder/RD.
      
      Some implementation notes (these probably need careful review):
      - token range is extended by 1 bit, since the value range out of this
        transform is [-16384,16383].
      - the coefficients coming out of the FDCT are manually scaled back by
        1 bit, or else they won't fit in int16_t (they are 17 bits). Because
        of this, the RD error scoring does not right-shift the MSE score by
        two (unlike for 4x4/8x8/16x16).
      - to compensate for this loss in precision, the quantizer is halved
        also. This is currently a little hacky.
      - FDCT and IDCT is double-only right now. Needs a fixed-point impl.
      - There are no default probabilities for the 32x32 transform yet; I'm
        simply using the 16x16 luma ones. A future commit will add newly
        generated probabilities for all transforms.
      - No ADST version. I don't think we'll add one for this level; if an
        ADST is desired, transform-size selection can scale back to 16x16
        or lower, and use an ADST at that level.
      
      Additional notes specific to Debargha's DWT/DCT hybrid:
      - coefficient scale is different for the top/left 16x16 (DCT-over-DWT)
        block than for the rest (DWT pixel differences) of the block. Therefore,
        RD error scoring isn't easily scalable between coefficient and pixel
        domain. Thus, unfortunately, we need to compute the RD distortion in
        the pixel domain until we figure out how to scale these appropriately.
      
      Change-Id: I00386f20f35d7fabb19aba94c8162f8aee64ef2b
      c456b35f
  11. 29 Nov, 2012 2 commits
    • Jim Bankoski's avatar
      ihtllm moves to rtcd · 030e268a
      Jim Bankoski authored
      clears up some warnings
      
      Change-Id: I9899637497c6ad7519f098e055ab98580ae6d688
      030e268a
    • Deb Mukherjee's avatar
      Fixing 8x8/4x4 ADST for intra modes with tx select · 0742b1e4
      Deb Mukherjee authored
      This patch allows use of 8x8 and 4x4 ADST correctly for Intra
      16x16 modes and Intra 8x8 modes when the block size selected
      is smaller than the prediction mode. Also includes some cleanups
      and refactoring.
      
      Rebase.
      
      Change-Id: Ie3257bdf07bdb9c6e9476915e3a80183c8fa005a
      0742b1e4
  12. 28 Nov, 2012 1 commit
  13. 27 Nov, 2012 1 commit
    • John Koleszar's avatar
      Add vp9_ prefix to all vp9 files · fcccbcbb
      John Koleszar authored
      Support for gyp which doesn't support multiple objects in the same
      static library having the same basename.
      
      Change-Id: Ib947eefbaf68f8b177a796d23f875ccdfa6bc9dc
      fcccbcbb
  14. 25 Nov, 2012 1 commit
  15. 16 Nov, 2012 1 commit
  16. 15 Nov, 2012 1 commit
  17. 13 Nov, 2012 1 commit
    • Yunqing Wang's avatar
      Optimize 8x8 dequant and idct · e60478d4
      Yunqing Wang authored
      Similar to 16x16 dequant and idct, based on the value of eobs, the
      8x8 dequant and idct calculation was simplified to improve decorder
      performance.
      
      Combined vp9_dequant_idct_add_8x8 and vp9_dequant_dc_idct_add_8x8
      to eliminate duplicate code.
      
      Change-Id: Ia58e50ab27f7012b7379c495837c9c0b5ba9cf7f
      e60478d4
  18. 08 Nov, 2012 1 commit
    • Yunqing Wang's avatar
      Optimize 16x16 dequant and idct · 6c17c9fa
      Yunqing Wang authored
      As suggested by Yaowu, simplified 16x16 dequant and idct. In decoder,
      after detoken step, we know the number of non-zero dct coefficients
      (eobs) in a macroblock. Idct calculation can be skipped or simplified
      based on eobs, which improves the decoder performance.
      
      Change-Id: I9ffa1cb134bcb5a7d64fcf90c81871a96d1b4018
      6c17c9fa
  19. 02 Nov, 2012 1 commit
  20. 01 Nov, 2012 1 commit
  21. 31 Oct, 2012 3 commits
  22. 22 Oct, 2012 1 commit
  23. 11 Oct, 2012 1 commit
  24. 30 Aug, 2012 1 commit
    • Jingning Han's avatar
      hybrid transform of 16x16 dimension · de6dfa6b
      Jingning Han authored
      Enable ADST/DCT of dimension 16x16 for I16X16 modes. This change provides
      benefits mostly for hd sequences.
      
      Set up the framework for selectable transform dimension.
      
      Also allowing quantization parameter threshold to control the use
      of hybrid transform (This is currently disabled by setting threshold
      always above the quantization parameter. Adaptive thresholding can
      be built upon this, which will further improve the coding performance.)
      
      The coding performance gains (with respect to the codec that has all
      other configuration settings turned on) are
      
      derf:   0.013
      yt:     0.086
      hd:     0.198
      std-hd: 0.501
      
      Change-Id: Ibb4263a61fc74e0b3c345f54d73e8c73552bf926
      de6dfa6b
  25. 07 Aug, 2012 1 commit
    • Jingning Han's avatar
      Refactoring hybrid transform coding · 66f440f1
      Jingning Han authored
      The forward and inverse hybrid transforms are now performed using
      single function modules, where the dimension is sent as argument.
      
      Added an inline function clip8b to clip the reconstruction pixels
      into range of 0-255.
      
      Change-Id: Id7d870b3e1aefc092721c80c0af6f641eb5f3747
      66f440f1
  26. 03 Aug, 2012 2 commits
    • Jingning Han's avatar
      Replacing the 8x8 DCT with 8x8 ADST/DCT for I8x8 · fcbff9ee
      Jingning Han authored
      Fixed the code review comments.
      
      Under the htrans8x8 experiment the 8X8 DCT in the
      I8X8 mode is replaced with a combination of 8X8 ADST and
      DCT.
      
      Overall coding gains with the htrans8x8 experiment are:
      derf:   0.486
      std-hd: 1.040
      hd:     1.063
      yt:     0.506
      
      Note that part of the gain comes from bigger transforms
      (8x8 instead of 4x4) and part comes from replacing the DCT
      wth the ADST.
      
      Change-Id: I92ca6bbfce11b4165d612b81d9adfad4d010c775
      fcbff9ee
    • Daniel Kang's avatar
      16x16 DCT blocks. · fed8a183
      Daniel Kang authored
      Set on all 16x16 intra/inter modes
      
      Features:
      - Butterfly fDCT/iDCT
      - Loop filter does not filter internal edges with 16x16
      - Optimize coefficient function
      - Update coefficient probability function
      - RD
      - Entropy stats
      - 16x16 is a config option
      
      Have not tested with experiments.
      
      hd:     2.60%
      std-hd: 2.43%
      yt:     1.32%
      derf:   0.60%
      
      Change-Id: I96fb090517c30c5da84bad4fae602c3ec0c58b1c
      fed8a183
  27. 19 Jul, 2012 1 commit
    • Jingning Han's avatar
      Adds hybrid transform · 9824230f
      Jingning Han authored
      Adds ADST/DCT hybrid transform coding for Intra4x4 mode.
      The ADST is applied to directions in which the boundary
      pixels are used for prediction, while DCT applied to
      directions without corresponding boundary prediction.
      
      Adds enum TX_TYPE in b_mode_infor to indicate the transform
      type used.
      
      Make coding style consistent with google style.
      Fixed the commented issues.
      
      Experimental results in terms of bit-rate reduction:
      derf:   0.731%
      yt:     0.982%
      std-hd: 0.459%
      hd:     0.725%
      
      Will be looking at 8x8 transforms next.
      
      Change-Id: I46dbd7b80dbb3e8856e9c34fbc58cb3764a12fcf
      9824230f
  28. 17 Jul, 2012 1 commit
  29. 29 Jun, 2012 1 commit
    • Hui Su's avatar
      Add lossless compression mode. · e44ee38a
      Hui Su authored
      This commit adds lossless compression capability to the experimental
      branch. The lossless experiment can be enabled using --enable-lossless
      in configure. When the experiment is enabled, the encoder will use
      lossless compression mode by command line option --lossless, and the
      decoder automatically recognizes a losslessly encoded clip and decodes
      accordingly.
      
      To achieve the lossless coding, this commit has changed the following:
          1. To encode at lossless mode, encoder forces the use of unit
      quantizer, i.e, Q 0, where effective quantization is 1. Encoder also
      disables the usage of 8x8 transform and allows only 4x4 transform;
          2. At Q 0, the first order 4x4  DCT/IDCT have been switched over
      to a pair of forward and inverse Walsh-Hadamard Transform
      (http://goo.gl/EIsfy),  with proper scaling applied to match the range
      of the original 4x4 DCT/IDCT pair;
          3. At Q 0, the second order remains to use the previous
      walsh-hadamard transform pair. However, to maintain the reversibility
      in second order transform at Q 0, scaling down is applied to first
      order DC coefficients prior to forward transform, and scaling up is
      applied to the second order output prior to quantization. Symmetric
      upscaling and downscaling are added around inverse second order
      transform;
          4. At lossless mode, encoder also disables a number of minor
      features to ensure no loss is introduced, these features includes:
              a. Trellis quantization optimization
              b. Loop filtering
              c. Aggressive zero-binning, rounding and zero-bin boosting
              d. Mode based zero-bin boosting
      
      Lossless coding test was performed on all clips within the derf set,
      to verify that the commit has achieved lossless compression for all
      clips. The average compression ratio is around 2.57 to 1.
      (http://goo.gl/dEShs)
      
      Change-Id: Ia3aba7dd09df40dd590f93b9aba134defbc64e34
      e44ee38a
  30. 15 Mar, 2012 1 commit
    • Yaowu Xu's avatar
      WebM Experimental Codec Branch Snapshot · 6035da54
      Yaowu Xu authored
      This is a code snapshot of experimental work currently ongoing for a
      next-generation codec.
      
      The codebase has been cut down considerably from the libvpx baseline.
      For example, we are currently only supporting VBR 2-pass rate control
      and have removed most of the code relating to coding speed, threading,
      error resilience, partitions and various other features.  This is in
      part to make the codebase easier to work on and experiment with, but
      also because we want to have an open discussion about how the bitstream
      will be structured and partitioned and not have that conversation
      constrained by past work.
      
      Our basic working pattern has been to initially encapsulate experiments
      using configure options linked to #IF CONFIG_XXX statements in the
      code. Once experiments have matured and we are reasonably happy that
      they give benefit and can be merged without breaking other experiments,
      we remove the conditional compile statements and merge them in.
      
      Current changes include:
      * Temporal coding experiment for segments (though still only 4 max, it
        will likely be increased).
      * Segment feature experiment - to allow various bits of information to
        be coded at the segment level. Features tested so far include mode
        and reference frame information, limiting end of block offset and
        transform size, alongside Q and loop filter parameters, but this set
        is very fluid.
      * Support for 8x8 transform - 8x8 dct with 2nd order 2x2 haar is used
        in MBs using 16x16 prediction modes within inter frames.
      * Compound prediction (combination of signals from existing predictors
        to create a new predictor).
      * 8 tap interpolation filters and 1/8th pel motion vectors.
      * Loop filter modifications.
      * Various entropy modifications and changes to how entropy contexts and
        updates are handled.
      * Extended quantizer range matched to transform precision improvements.
      
      There are also ongoing further experiments that we hope to merge in the
      near future: For example, coding of motion and other aspects of the
      prediction signal to better support larger image formats, use of larger
      block sizes (e.g. 32x32 and up) and lossless non-transform based coding
      options (especially for key frames). It is our hope that we will be
      able to make regular updates and we will warmly welcome community
      contributions.
      
      Please be warned that, at this stage, the codebase is currently slower
      than VP8 stable branch as most new code has not been optimized, and
      even the 'C' has been deliberately written to be simple and obvious,
      not fast.
      
      The following graphs have the initial test results, numbers in the
      tables measure the compression improvement in terms of percentage. The
      build has  the following optional experiments configured:
      --enable-experimental --enable-enhanced_interp --enable-uvintra
      --enable-high_precision_mv --enable-sixteenth_subpel_uv
      
      CIF Size clips:
      http://getwebm.org/tmp/cif/
      HD size clips:
      http://getwebm.org/tmp/hd/
      (stable_20120309 represents encoding results of WebM master branch
      build as of commit#7a159071)
      
      They were encoded using the following encode parameters:
      --good --cpu-used=0 -t 0 --lag-in-frames=25 --min-q=0 --max-q=63
      --end-usage=0 --auto-alt-ref=1 -p 2 --pass=2 --kf-max-dist=9999
      --kf-min-dist=0 --drop-frame=0 --static-thresh=0 --bias-pct=50
      --minsection-pct=0 --maxsection-pct=800 --sharpness=0
      --arnr-maxframes=7 --arnr-strength=3(for HD,6 for CIF)
      --arnr-type=3
      
      Change-Id: I5c62ed09cfff5815a2bb34e7820d6a810c23183c
      6035da54
  31. 01 Mar, 2012 1 commit
  32. 16 Feb, 2012 1 commit
    • Yaowu Xu's avatar
      moved scaling from dequantization to inverse transform for T8x8 · 454c7abc
      Yaowu Xu authored
      Previously, the scaling related to extended quantize range happens in
      dequantization stage, which implies the coefficients form forward
      transform are in different scale(4x) from dequantization coefficients
      This worked fine when there was not distortion computation done based
      on 8x8 transform, but it completely wracked the distortion estimation
      based on transform coefficients and dequantized transform coefficients
      introduced in commit f64725a0 for macroblocks using 8x8 transform.
      This commit fixed the issue by moving the scaling into the stage of
      inverse 8x8 transform.
      
      TODO: Test&Verify the transform/quantization pipeline accuracy.
      
      Change-Id: Iff77b36a965c2a6b247e59b9c59df93eba5d60e2
      454c7abc
  33. 09 Feb, 2012 1 commit
  34. 15 Dec, 2011 1 commit
    • Scott LaVarnway's avatar
      Moved dequant idct into common · a53d5a4c
      Scott LaVarnway authored
      These functions are now used by the encoder.
      This is WIP with the goal of creating a common idct/add for
      the encoder and decoder.  A boost of 1.8% was seen for
      the HD rt test clip used.
      
      [Tero] Added needed changes to ARM side.
      
      Change-Id: Ibbb8000be09034203d7adffc457d3c3f8b06a5bf
      a53d5a4c
  35. 25 Nov, 2011 1 commit
    • Scott LaVarnway's avatar
      Modified the inverse walsh to output directly · 4a91541c
      Scott LaVarnway authored
      to the dqcoeff or qcoeff buffer.  The encoder would
      populate the dc coeffs of the y blocks as a separate
      stage (recon_dcblock) and the decoder would use a special
      version of the idct.  This change eliminates the extra copy
      and reduces the code footprint.
      
      [Tero] Added needed changes to armv6 and NEON assembly.
      
      Change-Id: I83202ffdbaf83f6e5dd69f4ba2519fcf0b13b3ba
      4a91541c