- 20 Jun, 2016 1 commit
-
-
Yaowu Xu authored
This fixes MSVC warnings. Change-Id: I675d8486230b2b74d7973d95720a4995c4750282
-
- 09 May, 2016 1 commit
-
-
Scott LaVarnway authored
Change-Id: I431ea0d9abe764d110a1ba32a8cb15e2fdac8805
-
- 05 Oct, 2015 1 commit
-
-
Scott LaVarnway authored
Change-Id: Ia1a2cac0e9dc05f3207b3433a6c1589fa7f2aee3
-
- 29 Sep, 2015 1 commit
-
-
Julia Robson authored
When configured with high bitdepth enabled, the 8bit transform stopped using optimised code. This made 8bit content decode slowly. Change-Id: I67d91f9b212921d5320f949fc0a0d3f32f90c0ea
-
- 04 Aug, 2015 1 commit
-
-
Jingning Han authored
This commit clears the function naming convention in vpx_dsp. It replaces vp9_ prefix of global functions with vpx_ prefix. It also removes the vp9_ prefix from static functions. Change-Id: I6394359a63b71a51dda01342eec6a3cc08dfeedf
-
- 02 Aug, 2015 1 commit
-
-
Jingning Han authored
Change-Id: Ibab434fb4bd6da02dba087582ed74811f555c3ed
-
- 31 Jul, 2015 1 commit
-
-
Jingning Han authored
This commit moves the module inverse transform functions from vp9 to vpx_dsp folder. The hybrid transform wrapper functions stay in the vp9 folder, since it involves codec-specific data structures. Change-Id: Ib066367c953d3d024c73ba65157bbd70a95c9ef8
-
- 26 Jul, 2015 1 commit
-
-
Jingning Han authored
Separate the common coefficient constant into vpx_dsp/txfm_common.h. Move the SSE2 macro definitions to vpx_dsp/x86/txfm_common_sse2.h. This clears the use case of vp9_idct.h in vpx_dsp folder. Change-Id: I319735a2abf42888e5080ac14cfbcde34be7b121
-
- 04 Jun, 2015 1 commit
-
-
hkuang authored
Change-Id: Ia0ff859ff1c813dbe100e2f27b1ef78167483f4e
-
- 15 May, 2015 1 commit
-
-
James Zern authored
silences a missing declaration warning Change-Id: I59a34e1a1377cf3529b678d7ec0122bd43ab1bf1
-
- 13 May, 2015 1 commit
-
-
Johann authored
With the sad functions, and hopefully the variance functions soon, moving to the vpx_dsp location, place the defines used in the reference C code in a common location. Change-Id: I4c8ce7778eb38a0a3ee674d2f1c488eda01cfeca
-
- 01 May, 2015 3 commits
-
-
James Zern authored
+ fix some whitespace Change-Id: Id61b739282014288a7e5d3c17a9d6448d9d4cda2
-
James Zern authored
offsetting by a variable stride prevents instruction reordering, resulting in poor assembly Change-Id: Id62d6b3299cdd23f8c44f97b630abf4fea241446
-
James Zern authored
offsetting by a variable stride prevents instruction reordering, resulting in poor assembly. additionally reroll 16x16/32x32 loops to reduce register spill with this new format Change-Id: I0635b8ba21ecdb88116e927dbdab53acdf256e11
-
- 11 Dec, 2014 1 commit
-
-
Peter de Rivaz authored
The 8x8 DCT uses a fast version whenever possible. There was a mistake in the checking code which meant sometimes the fast version was used when it was not safe to do so. Change-Id: I154c84c9e2d836764768a11082947ca30f4b5ab7 (cherry picked from commit fd05fb0c21e253b4d6f92d7e0b752850ff8ab188)
-
- 02 Dec, 2014 1 commit
-
-
Peter de Rivaz authored
Also removes some spurious changes in common/vp9_blockd.h which was introduced by a rebase issue between nextgen and master branches. Change-Id: If359f0e9a71bca9c2ba685a87a355873536bb282 (cherry picked from commit 005d80cd05269a299cd2f7ddbc3d4d8b791aebba) (cherry picked from commit 08d2f548007fd8d6fd41da8ef7fdb488b6485af3) (cherry picked from commit 4230c2306c194c058f56433a5275aa02a2e71d56)
-
- 05 Nov, 2014 1 commit
-
-
Yaowu Xu authored
For configured with --enable-vp9-highbitdepth Change-Id: I2b181519d7192f8d7a241ad5760c3578255f24e6
-
- 06 Sep, 2014 1 commit
-
-
Dmitry Kovalev authored
Change-Id: If91017b792572c9db6e257011ca307bef8428486
-
- 28 May, 2014 1 commit
-
-
Jingning Han authored
This commit enables SSSE3 implementation of the inverse 2D-DCT with only first 10 coefficients non-zero. It reduces the runtime of SSE2 version from 745 cycles to 538 cycles, i.e., 27% speed-up. Change-Id: I18ba4128859b09c704a6ee361d69a86c09fe8dfe
-
- 23 May, 2014 1 commit
-
-
Jingning Han authored
This commit enables the SSSE3 implementation of full inverse 16x16 2D-DCT. The unit runtime goes down from 1642 cycles to 1519 cycles, about 7% speed-up. Change-Id: I14d2fdf9da1fb4ed1e5db7ce24f77a1bfc8ea90d
-
- 08 May, 2014 1 commit
-
-
Jingning Han authored
The scanning order has the first 12 coefficients of the 8x8 2D-DCT sitting in the top left 4x4 block. Hence the partial inverse 8x8 2D-DCT allows to handle cases with eob below 12. The overall runtime of the inverse 8x8 2D-DCT unit is reduced from 166 cycles (using SSE2) to 150 cycles (using SSSE3). Change-Id: I4514f9748042809ac84df4c14382c00f313f1cd2
-
- 28 Jan, 2014 1 commit
-
-
Dmitry Kovalev authored
It is enough to specify (e.g.) idct16, it is obviously different from idct16x16. Change-Id: I6b408a37a945de3162429380b59a775b03b95db0
-
- 09 Jan, 2014 1 commit
-
-
Jingning Han authored
This commit further optimizes SSE2 operations in the second 1-D inverse 16x16 DCT, with (<10) non-zero coefficients. The average runtime of this module goes down from 779 cycles -> 725 cycles. Change-Id: Iac31b123640d9b1e8f906e770702936b71f0ba7f
-
- 08 Jan, 2014 1 commit
-
-
Jingning Han authored
This commit is the first patch optimizing SSE2 implementation of inverse 16x16 DCT with <10 non-zero coefficients. It focused on the first 1-D (row) transformation. It exploits the fact that only top-left 4x4 block contains non-zero coefficients, in a 2-D inverse 16x16 DCT with <10 coeffients. The average runtime of idct16x16_10 unit is reduced from 883 cycles -> 779 cycles (12% faster). For pedestrian_area_1080p 300 frames at 4000 kbps, the speed 2 runtime goes down from 310651 ms -> 305910 ms. The decoding speed goes up from 80.37 fps -> 80.87 fps. Change-Id: Ic6f3ac5a637a76c07ba73ddaafe318a699fea645
-
- 03 Jan, 2014 3 commits
-
-
Jingning Han authored
This commit adds input/output ports for IDCT8_1D macro function to provide more flexibility in variable use. It allows to skip several buffer swap operations. Change-Id: I21f3450509537322293043b3281bfd3949868677
-
Jingning Han authored
This commit merges the initial buffer swap operations in idct8_1d_sse2 into the array transpose step, hence reducing number of instructions therein. Change-Id: I219f6f50813390d2ec3ee37eecf2a4a2b44ae479
-
Jingning Han authored
This commit optimizes the SSE2 implmentation of idct8x8_10. It exploits the fact that only top-left 4x4 block contains non-zero coefficients, and hence reduces the instructions needed. The runtime of idct8x8_10_sse2 goes down from 216 to 198 CPU cycles, estimated by averaging over 100000 runs. For pedestrian_area_1080p 300 frames coded at 4000kbps, the average decoding speed goes up from 79.3 fps to 79.7 fps. Change-Id: I6d277bbaa3ec9e1562667906975bae06904cb180
-
- 03 Dec, 2013 1 commit
-
-
Abo Talib Mahfoodh authored
The performance gain of idct16x16_10_add_sse2 function is not noticeable. However since both functions use the IDCT16_1D, idct16x16_10_add_sse2 should be modified as well. Tested with: park_joy_420_720p50.y4m Change-Id: I02b957e36fcf997c677d15baf496533895271bff
-
- 26 Nov, 2013 1 commit
-
-
Abo Talib Mahfoodh authored
vp9_idct32x32_34_add_sse2: speedup: 1.472 IDCT32_1D_34 and MULTIPLICATION_AND_ADD_2 are optimized based on the fact that Only upper-left 8x8 has non-zero values. vp9_idct32x32_1024_add_sse2: speedup: 1.032 Tested with: park_joy_420_720p50.y4m Change-Id: I8670ce547552b48695049de298e2fc46ce28dfbc
-
- 19 Nov, 2013 1 commit
-
-
Abo Talib Mahfoodh authored
This rebase is a better implementation of the previous ones. Modifications are done to reduce the total clock cycle. Speedup: 1.341 Compiled with -O3 Tested with: park_joy_420_720p50.y4m Change-Id: I940eaf283f60597ca0d9d2e13d518878d55ff02d
-
- 24 Oct, 2013 1 commit
-
-
Yunqing Wang authored
When only upper-left 8x8 area has non-zero dct coefficients, we could skip 1D IDCT for 9th to 32th rows to save operations. This function is called when eob <= 34. Change-Id: I9684b75947bdde346cfe3720f08a953aa7a13fb5
-
- 23 Oct, 2013 1 commit
-
-
Dmitry Kovalev authored
For consistency with idct function names. Change-Id: I7b6af2f92c66eff56f84ed29edc3a66af8dc421f
-
- 22 Oct, 2013 1 commit
-
-
Abo Talib Mahfoodh authored
Simple modification to reduce number of cycles in the function. Original function number of cycles: 973 Modified function number of cycles: 835 Improvment factor: 1.165 Tested with: park_joy_420_720p50.y4m Change-Id: Ic5857272ea3aafe21d5ef9a69258d78c688f69bd
-
- 12 Oct, 2013 1 commit
-
-
Dmitry Kovalev authored
Also renaming dest_stride to stride in some places. Change-Id: I75f602b623a5a7071d4922b747c45fa0b7d7a940
-
- 11 Oct, 2013 1 commit
-
-
Dmitry Kovalev authored
Renames: vp9_short_iht4x4_add -> vp9_iht4x4_16_add vp9_short_iht8x8_add -> vp9_iht8x8_64_add vp9_short_iht16x16_add_c -> vp9_iht16x16_256_add Change-Id: Ibca7a188fd062b196787ac5efc1ea545e7f166c0
-
- 10 Oct, 2013 2 commits
-
-
Dmitry Kovalev authored
We have two SSE2-optimized functions for idct4_1d: vp9_idct4_1d_sse2 <-- removing this one idct4_1d_sse2 vp9_idct4_1d_sse2 was used only by the following functions which already have SSE2 optimized variants: vp9_idct4x4_16_add_c -> vp9_idct4x4_16_add_see2 idct8_1d -> vp9_idct8x8_{16, 10, 1}_see2 vp9_short_iht4x4_add_c -> vp9_short_iht4x4_add_see2 Change-Id: Ib0a7f6d1373dbaf7a4a41208cd9d0671fdf15edb
-
Dmitry Kovalev authored
Renames: vp9_short_idct32x32_add -> vp9_idct32x32_1024_add vp9_short_idct32x32_1_add -> vp9_idct32x32_1_add vp9_idct_add_32x32 -> vp9_idct32x32_add Change-Id: Id85306f5814bac6c47463a6b5901a93082510666
-
- 07 Oct, 2013 1 commit
-
-
Dmitry Kovalev authored
Renames: vp9_short_idct16x16_add -> vp9_idct16x16_256_add vp9_short_idct16x16_10_add -> vp9_idct16x16_10_add vp9_short_idct16x16_1_add -> vp9_idct16x16_1_add vp9_idct_add_16x16 -> vp9_idct16x16_add Change-Id: Ief8a3904de78deab0f4ede944c4d0339c228cfc3
-
- 06 Oct, 2013 1 commit
-
-
Dmitry Kovalev authored
Renames: vp9_short_idct8x8_add -> vp9_idct8x8_64_add vp9_short_idct8x8_1_add -> vp9_idct8x8_1_add vp9_short_idct8x8_10_add -> vp9_idct8x8_10_add vp9_idct_add_8x8 -> vp9_idct8x8_add Change-Id: Ifb8d3a45b4c0397aa805b30463f3d14581bf72c1
-
- 04 Oct, 2013 1 commit
-
-
Dmitry Kovalev authored
The idea is to have the following names for each transform size: vp9_idct4x4_add vp9_idct4x4_1_add vp9_idct4x4_10_add vp9_idct4x4_16_add vp9_idct8x8_add vp9_idct8x8_1_add vp9_idct8x8_10_add vp9_idct8x8_64_add etc for 16x16, 32x32 The actual list of renames in this patch: vp9_idct_add_lossless -> vp9_iwht4x4_add vp9_short_iwalsh4x4_add -> vp9_iwht4x4_16_add vp9_short_iwalsh4x4_1_add -> vp9_iwht4x4_1_add vp9_idct_add -> vp9_idct4x4_add vp9_short_idct4x4_add -> vp9_idct4x4_16_add vp9_short_idct4x4_1_add -> vp9_idct4x4_1_add Change-Id: I6f43f7437c68dd30cdd05d72e213765578ed30b1
-