1. 30 Sep, 2014 1 commit
    • Remove redundant header file declaration · 0829d2be
      Jingning Han authored
      Some header files included in vp9_idct.c are already included by
      vp9_idct.h. This commit removes the redundant includes.
      
      Change-Id: I0238c27e4efff5c981eb437022c6bc6970c4e445
  2. 12 Sep, 2014 1 commit
    • Adds high bitdepth transform functions and tests · 10783d4f
      Deb Mukherjee authored
      Adds various high bitdepth transform functions and tests.
      Most of the changes replace the fixed types int16_t/int with the
      typedefs tran_low_t and tran_high_t, used for the final transform
      coefficients and for the intermediate stages of the transform
      computation respectively. When the vp9_highbitdepth configure flag
      is off, these map to int16_t/int32_t; when the flag is on, they map
      to int32_t/int64_t to make room for the extra precision needed.
      
      Change-Id: I3c56de79e15b904d6f655b62ffae170729befdd8
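
The type mapping described in this commit message can be sketched as follows. This is a minimal illustration, not the actual header: the macro HYPOTHETICAL_HIGHBITDEPTH is an assumed stand-in for whatever symbol the vp9_highbitdepth configure flag defines, and scale_coeff is a hypothetical helper that only shows why the intermediate type is wider than the coefficient type.

```c
#include <stdint.h>

/* Hypothetical stand-in for the symbol set by the vp9_highbitdepth flag. */
#ifndef HYPOTHETICAL_HIGHBITDEPTH
#define HYPOTHETICAL_HIGHBITDEPTH 0
#endif

#if HYPOTHETICAL_HIGHBITDEPTH
typedef int32_t tran_low_t;  /* final transform coefficients */
typedef int64_t tran_high_t; /* intermediate stages */
#else
typedef int16_t tran_low_t;  /* final transform coefficients */
typedef int32_t tran_high_t; /* intermediate stages */
#endif

/* Hypothetical helper: multiply in the wide type, round, then narrow.
 * The product of a coefficient and a constant may not fit in tran_low_t,
 * which is why the intermediate value uses tran_high_t. */
tran_low_t scale_coeff(tran_low_t coeff, int constant, int shift) {
  tran_high_t wide = (tran_high_t)coeff * constant;
  wide += (tran_high_t)1 << (shift - 1); /* round to nearest */
  return (tran_low_t)(wide >> shift);
}
```

With either setting the intermediate type is twice as wide as the coefficient type, so a single multiply by a 14-bit constant cannot overflow it.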
  3. 08 May, 2014 1 commit
    • Change eob threshold for partial inverse 8x8 2D-DCT to 12 · 41a350a8
      Jingning Han authored
      The scanning order places the first 12 coefficients of the 8x8
      2D-DCT in the top-left 4x4 block. Hence the partial inverse 8x8
      2D-DCT can handle cases with eob below 12.
      
      The overall runtime of the inverse 8x8 2D-DCT unit is reduced from
      166 cycles (using SSE2) to 150 cycles (using SSSE3).
      
      Change-Id: I4514f9748042809ac84df4c14382c00f313f1cd2
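
A dispatch on eob along the lines this commit describes might look like the sketch below. The enum and function names are hypothetical, and the sketch assumes the threshold is inclusive (eob <= 12), which the commit title suggests but the message does not state outright.

```c
#include <assert.h>

/* Hypothetical labels for the three inverse 8x8 code paths. */
typedef enum {
  IDCT8X8_DC_ONLY,    /* only the DC coefficient is non-zero */
  IDCT8X8_PARTIAL_12, /* non-zero coefficients confined to the top-left 4x4 */
  IDCT8X8_FULL_64     /* general case */
} idct8x8_path;

/* eob is the number of coefficients up to and including the last
 * non-zero one in scan order, so it bounds where non-zeros can sit. */
idct8x8_path choose_idct8x8_path(int eob) {
  assert(eob >= 1 && eob <= 64);
  if (eob == 1) return IDCT8X8_DC_ONLY;
  if (eob <= 12) return IDCT8X8_PARTIAL_12; /* assumed inclusive threshold */
  return IDCT8X8_FULL_64;
}
```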
  4. 28 Jan, 2014 1 commit
  5. 20 Nov, 2013 1 commit
  6. 15 Nov, 2013 1 commit
  7. 24 Oct, 2013 1 commit
    • Add 32x32 idct function for eob<=34 case · f88315cb
      Yunqing Wang authored
      When only the upper-left 8x8 area has non-zero DCT coefficients, we
      can skip the 1D IDCT for the 9th to 32nd rows to save operations.
      This function is called when eob <= 34.
      
      Change-Id: I9684b75947bdde346cfe3720f08a953aa7a13fb5
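
The row-skipping idea can be illustrated on a toy 4x4 block. Everything here is an assumption made for illustration: toy_transform_1d is a 4-point Hadamard butterfly standing in for a real 1-D IDCT, and the block is 4x4 rather than 32x32. The point is only that a linear 1-D transform maps an all-zero row to zero, so those rows need not be processed in the row pass.

```c
#include <string.h>

#define N 4 /* toy block size; the commit deals with 32x32 */

/* Toy linear 1-D transform (4-point Hadamard butterfly), a stand-in for a
 * real 1-D IDCT. Like any linear transform it maps zero input to zero. */
void toy_transform_1d(const int *in, int *out) {
  const int a = in[0] + in[2], b = in[0] - in[2];
  const int c = in[1] + in[3], d = in[1] - in[3];
  out[0] = a + c;
  out[1] = b + d;
  out[2] = b - d;
  out[3] = a - c;
}

/* Row pass over the first `rows` rows only, then a full column pass.
 * `in` and `out` are N*N arrays in row-major order. */
void toy_transform_2d(const int *in, int *out, int rows) {
  int tmp[N * N];
  int col_in[N], col_out[N];
  int r, c;

  memset(tmp, 0, sizeof(tmp)); /* skipped rows simply stay zero */
  for (r = 0; r < rows; ++r) toy_transform_1d(in + r * N, tmp + r * N);

  for (c = 0; c < N; ++c) {
    for (r = 0; r < N; ++r) col_in[r] = tmp[r * N + c];
    toy_transform_1d(col_in, col_out);
    for (r = 0; r < N; ++r) out[r * N + c] = col_out[r];
  }
}
```

When the lower rows of the input are zero, calling toy_transform_2d with rows set to the number of non-zero rows gives the same output as processing all N rows.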
  8. 12 Oct, 2013 1 commit
  9. 11 Oct, 2013 3 commits
  10. 10 Oct, 2013 2 commits
    • Removing vp9_idct4_1d_sse2 function. · ddf1b762
      Dmitry Kovalev authored
      We have two SSE2-optimized functions for idct4_1d:
        vp9_idct4_1d_sse2 <-- removing this one
        idct4_1d_sse2
      
      vp9_idct4_1d_sse2 was used only by the following functions which already
      have SSE2 optimized variants:
        vp9_idct4x4_16_add_c   -> vp9_idct4x4_16_add_sse2
        idct8_1d               -> vp9_idct8x8_{16, 10, 1}_sse2
        vp9_short_iht4x4_add_c -> vp9_short_iht4x4_add_sse2
      
      Change-Id: Ib0a7f6d1373dbaf7a4a41208cd9d0671fdf15edb
    • Giving consistent names to IDCT 32x32 functions. · 1e766b50
      Dmitry Kovalev authored
      Renames:
        vp9_short_idct32x32_add   -> vp9_idct32x32_1024_add
        vp9_short_idct32x32_1_add -> vp9_idct32x32_1_add
        vp9_idct_add_32x32        -> vp9_idct32x32_add
      
      Change-Id: Id85306f5814bac6c47463a6b5901a93082510666
  11. 08 Oct, 2013 1 commit
    • All zero coeff skip in IDCT 32x32 · 6594ca88
      Jingning Han authored
      When all coefficients in a row are zero, skip the corresponding 1-D
      inverse transform. This practice is already used in the SSE2
      implementation of the inverse 32x32 DCT. This commit imports the
      algorithm into the C code.
      
      Change-Id: I0f58bfcb183a569fab85d524d5d9cf8ae8653f86
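
A minimal sketch of the all-zero check, under stated assumptions: the helper names are hypothetical, toy_transform_1d is a trivial linear placeholder rather than a real 1-D IDCT, and the check ORs the coefficients of a row together so that a single comparison decides whether the row can be skipped.

```c
#include <stdint.h>
#include <string.h>

/* OR all coefficients together: the result is zero only if every one is. */
int row_is_all_zero(const int16_t *row, int n) {
  int acc = 0, i;
  for (i = 0; i < n; ++i) acc |= row[i];
  return acc == 0;
}

/* Toy linear stand-in for a 1-D inverse transform (doubles each value). */
void toy_transform_1d(const int16_t *in, int16_t *out, int n) {
  int i;
  for (i = 0; i < n; ++i) out[i] = (int16_t)(in[i] * 2);
}

/* Row pass over an n x n block in row-major order. All-zero rows are
 * written as zeros without running the transform.
 * Returns how many rows were actually transformed. */
int row_pass_skip_zero(const int16_t *in, int16_t *out, int n) {
  int r, transformed = 0;
  for (r = 0; r < n; ++r) {
    if (row_is_all_zero(in + r * n, n)) {
      memset(out + r * n, 0, (size_t)n * sizeof(*out));
    } else {
      toy_transform_1d(in + r * n, out + r * n, n);
      ++transformed;
    }
  }
  return transformed;
}
```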
  12. 07 Oct, 2013 1 commit
    • Giving consistent names to IDCT 16x16 functions. · b096c5a3
      Dmitry Kovalev authored
      Renames:
        vp9_short_idct16x16_add    -> vp9_idct16x16_256_add
        vp9_short_idct16x16_10_add -> vp9_idct16x16_10_add
        vp9_short_idct16x16_1_add  -> vp9_idct16x16_1_add
        vp9_idct_add_16x16         -> vp9_idct16x16_add
      
      Change-Id: Ief8a3904de78deab0f4ede944c4d0339c228cfc3
  13. 06 Oct, 2013 1 commit
    • Giving consistent names to IDCT 8x8 functions. · c6ad70d5
      Dmitry Kovalev authored
      Renames:
        vp9_short_idct8x8_add    -> vp9_idct8x8_64_add
        vp9_short_idct8x8_1_add  -> vp9_idct8x8_1_add
        vp9_short_idct8x8_10_add -> vp9_idct8x8_10_add
        vp9_idct_add_8x8         -> vp9_idct8x8_add
      
      Change-Id: Ifb8d3a45b4c0397aa805b30463f3d14581bf72c1
  14. 04 Oct, 2013 1 commit
    • Giving consistent names to IDCT/IWHT functions. · 3a060257
      Dmitry Kovalev authored
      The idea is to have the following names for each transform size:
      
      vp9_idct4x4_add
        vp9_idct4x4_1_add
        vp9_idct4x4_10_add
        vp9_idct4x4_16_add
      
      vp9_idct8x8_add
        vp9_idct8x8_1_add
        vp9_idct8x8_10_add
        vp9_idct8x8_64_add
      
      etc. for 16x16 and 32x32
      
      The actual list of renames in this patch:
      
      vp9_idct_add_lossless     -> vp9_iwht4x4_add
      vp9_short_iwalsh4x4_add   -> vp9_iwht4x4_16_add
      vp9_short_iwalsh4x4_1_add -> vp9_iwht4x4_1_add
      
      vp9_idct_add            -> vp9_idct4x4_add
      vp9_short_idct4x4_add   -> vp9_idct4x4_16_add
      vp9_short_idct4x4_1_add -> vp9_idct4x4_1_add
      
      Change-Id: I6f43f7437c68dd30cdd05d72e213765578ed30b1
  15. 02 Oct, 2013 1 commit
    • Moving all idct/iht functions in one place. · be7eec79
      Dmitry Kovalev authored
      Moving functions from vp9_idct_blk to vp9_idct because these functions are
      used from both encoder and decoder. Removing duplicated code from
      vp9_encodemb.c and reusing existing functions.
      
      Change-Id: Ia0a6782f8c4c409efb891651b871dd4bf22d5fe8
  16. 30 Sep, 2013 1 commit
    • Removing vp9_add_constant_residual_{8x8, 16x16, 32x32} functions. · 548671dd
      Dmitry Kovalev authored
      We don't need these functions anymore. The only one that was
      actually used is vp9_add_constant_residual_32x32, and the addition
      of vp9_short_idct32x32_1_add eliminates that single usage. An SSE2
      optimized version of vp9_short_idct32x32_1_add will be added in the
      next patch set; for now there is only a C implementation. With this
      change, all idct functions are implemented in a consistent manner.
      
      Change-Id: I63df79a13cf62aa2c9360a7a26933c100f9ebda3
  17. 27 Sep, 2013 1 commit
  18. 26 Sep, 2013 1 commit
  19. 24 Sep, 2013 1 commit
    • Rename defined constants · 6037f179
      Yaowu Xu authored
      The change is to better reflect the nature of the constants.
      
      Change-Id: Icabac6e9bceefbdb3f03f8218f88ef75943c30fb
  20. 01 Aug, 2013 1 commit
    • Remove unused vp9_short_idct10_32x32_add · 67719abd
      Jingning Han authored
      The inverse 32x32 transform detects all zero entries and skips the
      computations accordingly per 8 rows in the first 1-D operation. The
      function vp9_short_idct10_32x32_add performs differently and is not
      used anywhere, hence removed.
      
      Change-Id: Ic4fad422debbde7b6b6ffed47c69fbd4268a906c
  21. 29 Jul, 2013 1 commit
    • 16x16 inverse 2D-DCT with DC only · a7c4de22
      Jingning Han authored
      This commit adds special handling for the 16x16 inverse 2D-DCT when
      only the DC coefficient is quantized to a non-zero value.
      
      Change-Id: I7bf71be7fa13384fab453dc8742b5b50e77a277c
  22. 26 Jul, 2013 1 commit
    • Special handle on DC only inverse 8x8 2D-DCT · 325e0aa6
      Jingning Han authored
      This commit enables special handling for the 8x8 inverse 2D-DCT
      when only the DC coefficient is quantized to a non-zero value. For
      bus_cif at 2000 kbps, it provides about 1% speed-up at speed 0.
      
      Change-Id: I2523222359eec26b144cf8fd4c63a4ad63b1b011
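
When only the DC coefficient is non-zero, the 2-D inverse transform reduces to one constant added to every pixel. The sketch below shows that shortcut for an 8x8 block. The function name is hypothetical, and the constants (a 14-bit fixed-point scale with cos(pi/4) represented as 11585, and a final shift of 5) are assumptions based on how fixed-point DCTs of this kind are commonly written, not values taken from this commit.

```c
#include <stdint.h>

#define SKETCH_CONST_BITS 14     /* assumed fixed-point precision */
#define SKETCH_COSPI_16_64 11585 /* assumed: round(cos(pi/4) * 2^14) */

/* Round to nearest and drop the fixed-point fraction. Relies on arithmetic
 * right shift for negative values, as most compilers provide. */
int sketch_round_shift(int v) {
  return (v + (1 << (SKETCH_CONST_BITS - 1))) >> SKETCH_CONST_BITS;
}

uint8_t sketch_clip_pixel(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* DC-only inverse 8x8: scale the DC term once per dimension, apply the
 * final rounding shift, then add the same value to all 64 pixels. */
void idct8x8_dc_only_add_sketch(const int16_t *input, uint8_t *dest,
                                int stride) {
  int i, j, a1;
  int out = sketch_round_shift(input[0] * SKETCH_COSPI_16_64);
  out = sketch_round_shift(out * SKETCH_COSPI_16_64);
  a1 = (out + 16) >> 5; /* assumed final rounding for the 8x8 case */
  for (j = 0; j < 8; ++j) {
    for (i = 0; i < 8; ++i) dest[i] = sketch_clip_pixel(dest[i] + a1);
    dest += stride;
  }
}
```

The whole block costs two multiplies plus 64 clipped additions, which is where a speed-up over the general path would come from.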
  23. 24 Jul, 2013 1 commit
  24. 17 Jul, 2013 1 commit
  25. 16 Jul, 2013 1 commit
    • SSE2 16x16 inverse ADST/DCT hybrid transform · d05f66aa
      Jingning Han authored
      This commit enables SSE2 implementation of 16x16 inverse ADST/DCT
      hybrid transform. The runtime goes from 5742 cycles -> 1821 cycles.
      This provides about 1% encoding speed-up at speed 0.
      
      Change-Id: I1678d0988bf30b9efd524877705bbb3645edb17b
  26. 13 Jul, 2013 1 commit
  27. 30 May, 2013 1 commit
    • Changed to use a new variant of WHT · 042e70e4
      Yaowu Xu authored
      This commit switches to a new variant of the Walsh-Hadamard
      Transform by Tim Terriberry. This variant has the best compression
      among a number of variants that Tim developed.
      
      Change-Id: Icb3a88515463cfc644b17ca046fcd139db2557e9
  28. 27 May, 2013 1 commit
  29. 21 May, 2013 1 commit
  30. 20 May, 2013 1 commit
    • WIP: 4x4 idct/recon merge · ba48a111
      Scott LaVarnway authored
      This patch eliminates the intermediate diff buffer usage by
      combining the short idct and the add residual into one function.
      The encoder can use the same code as well.
      
      Change-Id: I296604bf73579c45105de0dd1adbcc91bcc53c22
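
The merge described here (and in the 8x8, 16x16 and 32x32 patches that follow) can be sketched by comparing a two-step version with a fused one. All names are hypothetical, and toy_inverse_row is a placeholder that just halves each coefficient; it stands in for a real inverse transform only so that the two data flows can be compared.

```c
#include <stdint.h>

#define BLK 4

uint8_t clip_pixel(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Toy stand-in for the inverse transform of one row. */
void toy_inverse_row(const int16_t *coeffs, int16_t *residual) {
  int i;
  for (i = 0; i < BLK; ++i) residual[i] = (int16_t)(coeffs[i] / 2);
}

/* Two-step: inverse transform into a full diff buffer, then add it. */
void inverse_then_add(const int16_t *coeffs, uint8_t *dest, int stride) {
  int16_t diff[BLK * BLK];
  int r, c;
  for (r = 0; r < BLK; ++r) toy_inverse_row(coeffs + r * BLK, diff + r * BLK);
  for (r = 0; r < BLK; ++r)
    for (c = 0; c < BLK; ++c)
      dest[r * stride + c] =
          clip_pixel(dest[r * stride + c] + diff[r * BLK + c]);
}

/* Fused: each row's residual is added to dest as soon as it is computed,
 * so no block-sized diff buffer is needed. */
void inverse_add_fused(const int16_t *coeffs, uint8_t *dest, int stride) {
  int16_t row[BLK];
  int r, c;
  for (r = 0; r < BLK; ++r) {
    toy_inverse_row(coeffs + r * BLK, row);
    for (c = 0; c < BLK; ++c)
      dest[r * stride + c] = clip_pixel(dest[r * stride + c] + row[c]);
  }
}
```

Both functions write the same pixels; the fused form only changes where the residual lives in between.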
  31. 16 May, 2013 1 commit
    • WIP: 8x8 idct/recon merge · 794a7bed
      Scott LaVarnway authored
      This patch eliminates the intermediate diff buffer usage by
      combining the short idct and the add residual into one function.
      The encoder can use the same code as well.
      
      Change-Id: Iacfd57324fbe2b7beca5d7f3dcae25c976e67f45
  32. 15 May, 2013 1 commit
    • WIP: 16x16 idct/recon merge · a272ff25
      Scott LaVarnway authored
      This patch eliminates the intermediate diff buffer usage by
      combining the short idct and the add residual into one function.
      The encoder can use the same code as well.
      
      Change-Id: Iea7976b22b1927d24b8004d2a3fddae7ecca3ba1
  33. 14 May, 2013 1 commit
    • WIP: 32x32 idct/recon merge · 2cf0d4be
      Scott LaVarnway authored
      This patch eliminates the intermediate diff buffer usage by
      combining the short idct and the add residual into one function.
      The encoder can use the same code as well.
      
      Change-Id: I4ea09df0e162591e420d869b7431c2e7f89a8c1a
  34. 13 Mar, 2013 1 commit
    • removed reference to "LLM" and "x8" · 00555263
      Yaowu Xu authored
      This commit changes the names of files and functions to remove
      obsolete references to LLM and x8.
      
      Change-Id: I973b20fc1a55149ed68b5408b3874768e6f88516
  35. 08 Mar, 2013 1 commit
    • Add vp9_idct4_1d_sse2 · 11ca81f8
      Yunqing Wang authored
      Added SSE2 idct4_1d which is called by vp9_short_iht4x4. Also,
      modified the parameter type passed to vp9_short_iht functions to
      make it work with rtcd prototype.
      
      Change-Id: I81ba7cb4db6738f1923383b52a06deb760923ffe
  36. 05 Mar, 2013 1 commit
    • Make superblocks independent of macroblock code and data. · 111ca421
      Ronald S. Bultje authored
      Split macroblock and superblock tokenization and detokenization
      functions and coefficient-related data structs so that the bitstream
      layout and related code of superblock coefficients looks less like it's
      a hack to fit macroblocks in superblocks.
      
      In addition, unify chroma transform size selection from luma transform
      size (i.e. always use the same size, as long as it fits the predictor);
      in practice, this means 32x32 and 64x64 superblocks using the 16x16 luma
      transform will now use the 16x16 (instead of the 8x8) chroma transform,
      and 64x64 superblocks using the 32x32 luma transform will now use the
      32x32 (instead of the 16x16) chroma transform.
      
      Lastly, add a trellis optimize function for 32x32 transform blocks.
      
      HD gains about 0.3%, STDHD about 0.15% and derf about 0.1%. There
      are a few negative points here and there that I might want to
      analyze a little more closely.
      
      Change-Id: Ibad7c3ddfe1acfc52771dfc27c03e9783e054430
  37. 01 Mar, 2013 1 commit
    • Add eob<=10 case in idct32x32 · c550bb3b
      Yunqing Wang authored
      Simplified the idct32x32 calculation when there are 10 or fewer
      non-zero coefficients in a 32x32 block. This helps decoder
      performance.
      
      Change-Id: If7f8893d27b64a9892b4b2621a37fdf4ac0c2a6d