  1. 20 Jun, 2016 1 commit
  2. 09 May, 2016 1 commit
  3. 05 Oct, 2015 1 commit
  4. 29 Sep, 2015 1 commit
    • Accelerated transform in high bit depth · 406030d1
      Julia Robson authored
      When configured with high bitdepth enabled, the 8bit transform
      stopped using optimised code. This made 8bit content decode slowly.
      Change-Id: I67d91f9b212921d5320f949fc0a0d3f32f90c0ea
  5. 04 Aug, 2015 1 commit
  6. 02 Aug, 2015 1 commit
  7. 31 Jul, 2015 1 commit
    • Factor inverse transform functions into vpx_dsp · e8b133c7
      Jingning Han authored
      This commit moves the module inverse transform functions from vp9
      to vpx_dsp folder. The hybrid transform wrapper functions stay in
      the vp9 folder, since it involves codec-specific data structures.
      Change-Id: Ib066367c953d3d024c73ba65157bbd70a95c9ef8
  8. 26 Jul, 2015 1 commit
    • Refactor vp9_idct.h file · 5ebc8feb
      Jingning Han authored
      Separate the common coefficient constant into vpx_dsp/txfm_common.h.
      Move the SSE2 macro definitions to vpx_dsp/x86/txfm_common_sse2.h.
      This clears the use case of vp9_idct.h in vpx_dsp folder.
      Change-Id: I319735a2abf42888e5080ac14cfbcde34be7b121
  9. 04 Jun, 2015 1 commit
  10. 15 May, 2015 1 commit
  11. 13 May, 2015 1 commit
    • Relocate memory operations for common code · 1d7ccd53
      Johann authored
      With the sad functions, and hopefully the variance functions soon,
      moving to the vpx_dsp location, place the defines used in the
      reference C code in a common location.
      Change-Id: I4c8ce7778eb38a0a3ee674d2f1c488eda01cfeca
  12. 01 May, 2015 3 commits
  13. 11 Dec, 2014 1 commit
    • Corrected optimization of 8x8 DCT code · 5c22224e
      Peter de Rivaz authored
      The 8x8 DCT uses a fast version whenever possible.
      There was a mistake in the checking code which
      meant sometimes the fast version was used when it
      was not safe to do so.
      Change-Id: I154c84c9e2d836764768a11082947ca30f4b5ab7
      (cherry picked from commit fd05fb0c21e253b4d6f92d7e0b752850ff8ab188)
  14. 02 Dec, 2014 1 commit
    • Added high bitdepth sse2 transform functions · 7e40a55e
      Peter de Rivaz authored
      Also removes some spurious changes in common/vp9_blockd.h which
      were introduced by a rebase issue between nextgen and master branches.
      Change-Id: If359f0e9a71bca9c2ba685a87a355873536bb282
      (cherry picked from commit 005d80cd05269a299cd2f7ddbc3d4d8b791aebba)
      (cherry picked from commit 08d2f548007fd8d6fd41da8ef7fdb488b6485af3)
      (cherry picked from commit 4230c2306c194c058f56433a5275aa02a2e71d56)
  15. 05 Nov, 2014 1 commit
  16. 06 Sep, 2014 1 commit
  17. 28 May, 2014 1 commit
    • Enable SSSE3 inverse 2D-DCT with 10 non-zero coeffs · 6d21cbd2
      Jingning Han authored
      This commit enables SSSE3 implementation of the inverse 2D-DCT
      with only first 10 coefficients non-zero. It reduces the runtime
      of the SSE2 version from 745 cycles to 538 cycles, i.e., a 27% speed-up.
      Change-Id: I18ba4128859b09c704a6ee361d69a86c09fe8dfe
  18. 23 May, 2014 1 commit
    • Inverse 16x16 2D-DCT SSSE3 implementation · 48b08913
      Jingning Han authored
      This commit enables the SSSE3 implementation of full inverse 16x16
      2D-DCT. The unit runtime goes down from 1642 cycles to 1519 cycles,
      about 7% speed-up.
      Change-Id: I14d2fdf9da1fb4ed1e5db7ce24f77a1bfc8ea90d
  19. 08 May, 2014 1 commit
    • Change eob threshold for partial inverse 8x8 2D-DCT to 12 · 41a350a8
      Jingning Han authored
      The scanning order has the first 12 coefficients of the 8x8 2D-DCT
      sitting in the top left 4x4 block. Hence the partial inverse 8x8
      2D-DCT can handle cases with eob up to 12.
      The overall runtime of the inverse 8x8 2D-DCT unit is reduced from
      166 cycles (using SSE2) to 150 cycles (using SSSE3).
      Change-Id: I4514f9748042809ac84df4c14382c00f313f1cd2
  20. 28 Jan, 2014 1 commit
  21. 09 Jan, 2014 1 commit
    • Optimize inv 16x16 DCT with 10 non-zero coeffs - P2 · af31b27a
      Jingning Han authored
      This commit further optimizes SSE2 operations in the second 1-D
      inverse 16x16 DCT, with (<10) non-zero coefficients. The average
      runtime of this module goes down from 779 cycles -> 725 cycles.
      Change-Id: Iac31b123640d9b1e8f906e770702936b71f0ba7f
  22. 08 Jan, 2014 1 commit
    • Optimize inv 16x16 DCT with 10 non-zero coeffs - P1 · ba6ab46c
      Jingning Han authored
      This commit is the first patch optimizing SSE2 implementation of inverse
      16x16 DCT with <10 non-zero coefficients. It focused on the first 1-D (row)
      transformation. It exploits the fact that only top-left 4x4 block contains
      non-zero coefficients, in a 2-D inverse 16x16 DCT with <10 coefficients.
      The average runtime of idct16x16_10 unit is reduced from
      883 cycles -> 779 cycles (12% faster).
      For pedestrian_area_1080p 300 frames at 4000 kbps, the speed 2 runtime goes
      down from 310651 ms  -> 305910 ms. The decoding speed goes up from
      80.37 fps -> 80.87 fps.
      Change-Id: Ic6f3ac5a637a76c07ba73ddaafe318a699fea645
  23. 03 Jan, 2014 3 commits
    • Tune IDCT8_1D macro function interface · 3e0c62b5
      Jingning Han authored
      This commit adds input/output ports for IDCT8_1D macro function to
      provide more flexibility in variable use. This makes it possible to
      skip several buffer swap operations.
      Change-Id: I21f3450509537322293043b3281bfd3949868677
    • Reduce num of buffer swap calls in idct8_1d_sse2 · 0b1a2713
      Jingning Han authored
      This commit merges the initial buffer swap operations in idct8_1d_sse2
      into the array transpose step, hence reducing the number of instructions.
      Change-Id: I219f6f50813390d2ec3ee37eecf2a4a2b44ae479
    • Rework idct8x8_10 SSE2 implementation · 1bb11781
      Jingning Han authored
      This commit optimizes the SSE2 implementation of idct8x8_10. It exploits
      the fact that only top-left 4x4 block contains non-zero coefficients,
      and hence reduces the instructions needed.
      The runtime of idct8x8_10_sse2 goes down from 216 to 198 CPU cycles,
      estimated by averaging over 100000 runs. For pedestrian_area_1080p 300
      frames coded at 4000kbps, the average decoding speed goes up from
      79.3 fps to 79.7 fps.
      Change-Id: I6d277bbaa3ec9e1562667906975bae06904cb180
  24. 03 Dec, 2013 1 commit
  25. 26 Nov, 2013 1 commit
    • improve vp9_idct32x32_34(x1.472)&1024(x1.032)_add_sse2 · f97d91ab
      Abo Talib Mahfoodh authored
      speedup: 1.472
      IDCT32_1D_34 and MULTIPLICATION_AND_ADD_2 are optimized
      based on the fact that only upper-left 8x8 has
      non-zero values.
      speedup: 1.032
      Tested with: park_joy_420_720p50.y4m
      Change-Id: I8670ce547552b48695049de298e2fc46ce28dfbc
  26. 19 Nov, 2013 1 commit
    • Improve vp9_iht4x4_16_add_sse2 (x1.341) · 613e2d2e
      Abo Talib Mahfoodh authored
      This rebase is a better implementation of the previous ones.
      Modifications are done to reduce the total clock cycle count.
      Speedup: 1.341
      Compiled with -O3
      Tested with: park_joy_420_720p50.y4m
      Change-Id: I940eaf283f60597ca0d9d2e13d518878d55ff02d
  27. 24 Oct, 2013 1 commit
    • Add 32x32 idct function for eob<=34 case · f88315cb
      Yunqing Wang authored
      When only upper-left 8x8 area has non-zero dct coefficients, we
      could skip 1D IDCT for 9th to 32nd rows to save operations. This
      function is called when eob <= 34.
      Change-Id: I9684b75947bdde346cfe3720f08a953aa7a13fb5
  28. 23 Oct, 2013 1 commit
  29. 22 Oct, 2013 1 commit
    • Improve vp9_idct4x4_1_add_sse2 · 908a992d
      Abo Talib Mahfoodh authored
      Simple modification to reduce the number of cycles.
      Original function number of cycles: 973
      Modified function number of cycles: 835
      Improvement factor: 1.165
      Tested with: park_joy_420_720p50.y4m
      Change-Id: Ic5857272ea3aafe21d5ef9a69258d78c688f69bd
  30. 12 Oct, 2013 1 commit
  31. 11 Oct, 2013 1 commit
  32. 10 Oct, 2013 2 commits
    • Removing vp9_idct4_1d_sse2 function. · ddf1b762
      Dmitry Kovalev authored
      We have two SSE2-optimized functions for idct4_1d:
        vp9_idct4_1d_sse2 <-- removing this one
      vp9_idct4_1d_sse2 was used only by the following functions which already
      have SSE2 optimized variants:
        vp9_idct4x4_16_add_c   -> vp9_idct4x4_16_add_sse2
        idct8_1d               -> vp9_idct8x8_{16, 10, 1}_sse2
        vp9_short_iht4x4_add_c -> vp9_short_iht4x4_add_sse2
      Change-Id: Ib0a7f6d1373dbaf7a4a41208cd9d0671fdf15edb
    • Giving consistent names to IDCT 32x32 functions. · 1e766b50
      Dmitry Kovalev authored
        vp9_short_idct32x32_add   -> vp9_idct32x32_1024_add
        vp9_short_idct32x32_1_add -> vp9_idct32x32_1_add
        vp9_idct_add_32x32        -> vp9_idct32x32_add
      Change-Id: Id85306f5814bac6c47463a6b5901a93082510666
  33. 07 Oct, 2013 1 commit
    • Giving consistent names to IDCT 16x16 functions. · b096c5a3
      Dmitry Kovalev authored
        vp9_short_idct16x16_add    -> vp9_idct16x16_256_add
        vp9_short_idct16x16_10_add -> vp9_idct16x16_10_add
        vp9_short_idct16x16_1_add  -> vp9_idct16x16_1_add
        vp9_idct_add_16x16         -> vp9_idct16x16_add
      Change-Id: Ief8a3904de78deab0f4ede944c4d0339c228cfc3
  34. 06 Oct, 2013 1 commit
    • Giving consistent names to IDCT 8x8 functions. · c6ad70d5
      Dmitry Kovalev authored
        vp9_short_idct8x8_add    -> vp9_idct8x8_64_add
        vp9_short_idct8x8_1_add  -> vp9_idct8x8_1_add
        vp9_short_idct8x8_10_add -> vp9_idct8x8_10_add
        vp9_idct_add_8x8         -> vp9_idct8x8_add
      Change-Id: Ifb8d3a45b4c0397aa805b30463f3d14581bf72c1
  35. 04 Oct, 2013 1 commit
    • Giving consistent names to IDCT/IWHT functions. · 3a060257
      Dmitry Kovalev authored
      The idea is to have the following names for each transform size:
      etc for 16x16, 32x32
      The actual list of renames in this patch:
      vp9_idct_add_lossless     -> vp9_iwht4x4_add
      vp9_short_iwalsh4x4_add   -> vp9_iwht4x4_16_add
      vp9_short_iwalsh4x4_1_add -> vp9_iwht4x4_1_add
      vp9_idct_add            -> vp9_idct4x4_add
      vp9_short_idct4x4_add   -> vp9_idct4x4_16_add
      vp9_short_idct4x4_1_add -> vp9_idct4x4_1_add
      Change-Id: I6f43f7437c68dd30cdd05d72e213765578ed30b1