  1. 02 Feb, 2018 1 commit
  2. 01 Nov, 2016 1 commit
  3. 19 Oct, 2016 1 commit
  4. 02 Sep, 2016 1 commit
  5. 01 Sep, 2016 2 commits
  6. 12 Aug, 2016 1 commit
  7. 04 Aug, 2016 1 commit
  8. 22 Mar, 2016 2 commits
    • vp10/ -> av1/ · cfea7dd7
      Yaowu Xu authored
      Change-Id: Ia055d03656ad1580447eced8687949583fdf4089
      cfea7dd7
    • Rename vpx to aom · bf4202ed
      Yaowu Xu authored
      Change-Id: Ibc7933fba85feeb30ef9b14b302d932aff19f54e
      bf4202ed
  9. 28 Jan, 2016 1 commit
  10. 21 Jan, 2016 1 commit
  11. 09 Nov, 2015 1 commit
    • Release v1.5.0 · cbecf57f
      Johann authored
      Javan Whistling Duck release.
      
      Change-Id: If44c9ca16a8188b68759325fbacc771365cb4af8
      cbecf57f
  12. 10 Sep, 2015 1 commit
    • Isolate vp10's inv_txfm from vp9 · 87175ed5
      Angie Chiang authored
      1) copy the following files from vpx_dsp/ to vp10/common/
      vp10_inv_txfm.c
      vp10_inv_txfm.h
      vp10_inv_txfm_sse2.c
      vp10_inv_txfm_sse2.h
      
      2) change the function prefix "vpx_" to "vp10_" in the above files
      
      3) add a unit test in vp10_inv_txfm_test.cc
      
      Change-Id: I206f10f60c8b27d872c84b7482c3bb1d1cb4b913
      87175ed5
  13. 04 Aug, 2015 1 commit
  14. 02 Aug, 2015 1 commit
  15. 31 Jul, 2015 1 commit
    • Factor inverse transform functions into vpx_dsp · e8b133c7
      Jingning Han authored
      This commit moves the inverse transform functions from the vp9
      folder to the vpx_dsp folder. The hybrid transform wrapper functions stay in
      the vp9 folder, since they involve codec-specific data structures.
      
      Change-Id: Ib066367c953d3d024c73ba65157bbd70a95c9ef8
      e8b133c7
  16. 26 Jul, 2015 1 commit
    • Refactor vp9_idct.h file · 5ebc8feb
      Jingning Han authored
      Separate the common coefficient constants into vpx_dsp/txfm_common.h.
      Move the SSE2 macro definitions to vpx_dsp/x86/txfm_common_sse2.h.
      This removes the uses of vp9_idct.h in the vpx_dsp folder.
      
      Change-Id: I319735a2abf42888e5080ac14cfbcde34be7b121
      5ebc8feb
  17. 04 Jun, 2015 1 commit
  18. 15 May, 2015 1 commit
  19. 13 May, 2015 1 commit
    • Relocate memory operations for common code · 1d7ccd53
      Johann authored
      With the sad functions, and hopefully the variance functions soon,
      moving to the vpx_dsp location, place the defines used in the
      reference C code in a common location.
      
      Change-Id: I4c8ce7778eb38a0a3ee674d2f1c488eda01cfeca
      1d7ccd53
  20. 01 May, 2015 3 commits
  21. 11 Dec, 2014 1 commit
    • Corrected optimization of 8x8 DCT code · 5c22224e
      Peter de Rivaz authored
      The 8x8 DCT uses a fast version whenever possible.
      There was a mistake in the checking code which
      meant sometimes the fast version was used when it
      was not safe to do so.
      
      Change-Id: I154c84c9e2d836764768a11082947ca30f4b5ab7
      (cherry picked from commit fd05fb0c21e253b4d6f92d7e0b752850ff8ab188)
      5c22224e
  22. 02 Dec, 2014 1 commit
    • Added high bitdepth sse2 transform functions · 7e40a55e
      Peter de Rivaz authored
      Also removes some spurious changes in common/vp9_blockd.h which
      were introduced by a rebase issue between the nextgen and master branches.
      
      Change-Id: If359f0e9a71bca9c2ba685a87a355873536bb282
      (cherry picked from commit 005d80cd05269a299cd2f7ddbc3d4d8b791aebba)
      (cherry picked from commit 08d2f548007fd8d6fd41da8ef7fdb488b6485af3)
      (cherry picked from commit 4230c2306c194c058f56433a5275aa02a2e71d56)
      7e40a55e
  23. 05 Nov, 2014 1 commit
  24. 06 Sep, 2014 1 commit
  25. 28 May, 2014 1 commit
    • Enable SSSE3 inverse 2D-DCT with 10 non-zero coeffs · 6d21cbd2
      Jingning Han authored
      This commit enables the SSSE3 implementation of the inverse 2D-DCT
      with only the first 10 coefficients non-zero. It reduces the runtime
      from 745 cycles (SSE2 version) to 538 cycles, i.e., a 27% speed-up.
      
      Change-Id: I18ba4128859b09c704a6ee361d69a86c09fe8dfe
      6d21cbd2
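The commit messages in this log quote improvements in two ways: as a percentage of cycles saved (as in the entry above) and as a before/after ratio (as in the "speedup: 1.472" style used elsewhere). A minimal sketch of both calculations follows; the function names are hypothetical and only the cycle counts come from the commit message.

```python
def cycle_reduction_pct(before, after):
    """Percentage of cycles saved relative to the original runtime."""
    return 100.0 * (before - after) / before


def speedup_ratio(before, after):
    """How many times faster the new version is (old runtime / new runtime)."""
    return before / after


if __name__ == "__main__":
    # Cycle counts quoted in the commit message above: 745 -> 538.
    print(f"reduction: {cycle_reduction_pct(745, 538):.1f}%")  # about 27.8%
    print(f"ratio:     {speedup_ratio(745, 538):.2f}x")        # about 1.38x
```

The two figures are not interchangeable: a reduction of roughly 28% in cycles corresponds to a ratio of roughly 1.38, so it helps to check which convention a given message uses before comparing numbers across commits.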
  26. 23 May, 2014 1 commit
    • Inverse 16x16 2D-DCT SSSE3 implementation · 48b08913
      Jingning Han authored
      This commit enables the SSSE3 implementation of full inverse 16x16
      2D-DCT. The unit runtime goes down from 1642 cycles to 1519 cycles,
      about 7% speed-up.
      
      Change-Id: I14d2fdf9da1fb4ed1e5db7ce24f77a1bfc8ea90d
      48b08913
  27. 08 May, 2014 1 commit
    • Change eob threshold for partial inverse 8x8 2D-DCT to 12 · 41a350a8
      Jingning Han authored
      The scanning order places the first 12 coefficients of the 8x8 2D-DCT
      in the top-left 4x4 block. Hence the partial inverse 8x8
      2D-DCT can handle cases with eob below 12.
      
      The overall runtime of the inverse 8x8 2D-DCT unit is reduced from
      166 cycles (using SSE2) to 150 cycles (using SSSE3).
      
      Change-Id: I4514f9748042809ac84df4c14382c00f313f1cd2
      41a350a8
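The reasoning in the entry above, that a leading run of the scan order stays inside the top-left 4x4 block, can be checked mechanically for any scan table. The sketch below is illustrative only: `zigzag_scan` builds a classic zig-zag order as a stand-in and is not the codec's actual scan table, and both function names are hypothetical. With the codec's own 8x8 scan the commit message reports a prefix of 12; the stand-in order gives a different count.

```python
def zigzag_scan(n):
    """Classic zig-zag order over an n x n block as (row, col) pairs.

    Assumption: used here only as a stand-in scan, not the codec's table.
    """
    order = []
    for s in range(2 * n - 1):
        # All positions on anti-diagonal s, listed top-right to bottom-left.
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Alternate the traversal direction on successive diagonals.
        order.extend(diag if s % 2 else reversed(diag))
    return order


def prefix_inside_block(scan, k):
    """Count leading scan positions that lie inside the top-left k x k block."""
    count = 0
    for r, c in scan:
        if r >= k or c >= k:
            break
        count += 1
    return count


if __name__ == "__main__":
    # Any eob up to this value only touches the top-left 4x4 block
    # (for the stand-in zig-zag order).
    print(prefix_inside_block(zigzag_scan(8), 4))
```

Feeding the real scan table into `prefix_inside_block` is how a threshold such as the one in this commit could be derived rather than hard-coded.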
  28. 28 Jan, 2014 1 commit
  29. 09 Jan, 2014 1 commit
    • Optimize inv 16x16 DCT with 10 non-zero coeffs - P2 · af31b27a
      Jingning Han authored
      This commit further optimizes the SSE2 operations in the second 1-D
      inverse 16x16 DCT with <10 non-zero coefficients. The average
      runtime of this module goes down from 779 cycles -> 725 cycles.
      
      Change-Id: Iac31b123640d9b1e8f906e770702936b71f0ba7f
      af31b27a
  30. 08 Jan, 2014 1 commit
    • Optimize inv 16x16 DCT with 10 non-zero coeffs - P1 · ba6ab46c
      Jingning Han authored
      This commit is the first patch optimizing the SSE2 implementation of the inverse
      16x16 DCT with <10 non-zero coefficients. It focuses on the first 1-D (row)
      transformation. It exploits the fact that only the top-left 4x4 block contains
      non-zero coefficients in a 2-D inverse 16x16 DCT with <10 coefficients.
      
      The average runtime of the idct16x16_10 unit is reduced from
      883 cycles -> 779 cycles (12% faster).
      
      For pedestrian_area_1080p 300 frames at 4000 kbps, the speed 2 runtime goes
      down from 310651 ms -> 305910 ms. The decoding speed goes up from
      80.37 fps -> 80.87 fps.
      
      Change-Id: Ic6f3ac5a637a76c07ba73ddaafe318a699fea645
      ba6ab46c
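The idea described in the entry above, skipping work in the row pass because only the top-left 4x4 block is non-zero, can be illustrated with a plain floating-point separable inverse DCT. This is a sketch under stated assumptions, not the SSE2 code from the commit: it uses a naive orthonormal DCT rather than the codec's integer transform, and all function names are hypothetical.

```python
import math


def idct_1d(coeffs, used=None):
    """Naive orthonormal inverse DCT; reads only the first `used` inputs."""
    n = len(coeffs)
    used = n if used is None else used
    out = []
    for x in range(n):
        acc = 0.0
        for k in range(used):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            acc += scale * coeffs[k] * math.cos(math.pi * (2 * x + 1) * k / (2 * n))
        out.append(acc)
    return out


def idct_2d_full(block):
    """Reference 2-D inverse: transform every row, then every column."""
    n = len(block)
    rows = [idct_1d(row) for row in block]
    cols = [idct_1d([rows[r][c] for r in range(n)]) for c in range(n)]
    return [[cols[c][r] for c in range(n)] for r in range(n)]


def idct_2d_partial(block, k):
    """2-D inverse assuming only the top-left k x k coefficients are non-zero."""
    n = len(block)
    # Row pass: rows k..n-1 are all zero, so only k rows are transformed,
    # and each of those rows has just k live inputs.
    rows = [idct_1d(block[r], used=k) for r in range(k)]
    # Column pass: all n columns are needed, but each has only k live inputs.
    cols = [
        idct_1d([rows[r][c] for r in range(k)] + [0.0] * (n - k), used=k)
        for c in range(n)
    ]
    return [[cols[c][r] for c in range(n)] for r in range(n)]


if __name__ == "__main__":
    blk = [[0.0] * 16 for _ in range(16)]
    blk[0][0], blk[1][2], blk[3][3] = 64.0, -7.0, 5.0
    full, part = idct_2d_full(blk), idct_2d_partial(blk, 4)
    worst = max(abs(full[r][c] - part[r][c]) for r in range(16) for c in range(16))
    print("max difference:", worst)
```

For a 16x16 block with k = 4, the row pass shrinks from 16 full-length transforms to 4 short ones, which is the kind of saving the commit targets in the first 1-D stage.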
  31. 03 Jan, 2014 3 commits
    • Tune IDCT8_1D macro function interface · 3e0c62b5
      Jingning Han authored
      This commit adds input/output ports to the IDCT8_1D macro function to
      provide more flexibility in variable use. This makes it possible to skip
      several buffer swap operations.
      
      Change-Id: I21f3450509537322293043b3281bfd3949868677
      3e0c62b5
    • Reduce num of buffer swap calls in idct8_1d_sse2 · 0b1a2713
      Jingning Han authored
      This commit merges the initial buffer swap operations in idct8_1d_sse2
      into the array transpose step, reducing the number of instructions
      in that function.
      
      Change-Id: I219f6f50813390d2ec3ee37eecf2a4a2b44ae479
      0b1a2713
    • Rework idct8x8_10 SSE2 implementation · 1bb11781
      Jingning Han authored
      This commit optimizes the SSE2 implementation of idct8x8_10. It exploits
      the fact that only the top-left 4x4 block contains non-zero coefficients,
      and hence reduces the number of instructions needed.
      
      The runtime of idct8x8_10_sse2 goes down from 216 to 198 CPU cycles,
      estimated by averaging over 100000 runs. For pedestrian_area_1080p 300
      frames coded at 4000kbps, the average decoding speed goes up from
      79.3 fps to 79.7 fps.
      
      Change-Id: I6d277bbaa3ec9e1562667906975bae06904cb180
      1bb11781
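The entry above estimates runtime by averaging over 100000 runs. A minimal sketch of that kind of measurement loop is shown below; it is an assumption-laden stand-in, since it reports wall-clock nanoseconds from Python's `time.perf_counter_ns` rather than CPU cycles, and `average_ns` is a hypothetical helper name.

```python
import time


def average_ns(fn, runs=100000):
    """Mean wall-clock time of fn() in nanoseconds, averaged over `runs` calls."""
    start = time.perf_counter_ns()
    for _ in range(runs):
        fn()
    return (time.perf_counter_ns() - start) / runs


if __name__ == "__main__":
    # Time a trivial workload; substitute the routine under test.
    print(average_ns(lambda: sum(range(64))))
```

Averaging over many runs smooths out timer granularity and scheduling noise, which matters when the difference being measured is only a few percent.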
  32. 03 Dec, 2013 1 commit
  33. 26 Nov, 2013 1 commit
    • improve vp9_idct32x32_34(x1.472)&1024(x1.032)_add_sse2 · f97d91ab
      Abo Talib Mahfoodh authored
      vp9_idct32x32_34_add_sse2:
      speedup: 1.472
      IDCT32_1D_34 and MULTIPLICATION_AND_ADD_2 are optimized
      based on the fact that only the upper-left 8x8 block has
      non-zero values.
      
      vp9_idct32x32_1024_add_sse2:
      speedup: 1.032
      
      Tested with: park_joy_420_720p50.y4m
      
      Change-Id: I8670ce547552b48695049de298e2fc46ce28dfbc
      f97d91ab
  34. 19 Nov, 2013 1 commit
    • Improve vp9_iht4x4_16_add_sse2 (x1.341) · 613e2d2e
      Abo Talib Mahfoodh authored
      This rebase is a better implementation than the previous ones.
      
      Modifications were made to reduce the total clock cycle count.
      Speedup: 1.341
      Compiled with -O3
      Tested with: park_joy_420_720p50.y4m
      
      Change-Id: I940eaf283f60597ca0d9d2e13d518878d55ff02d
      613e2d2e