- 28 Jul, 2015 1 commit
-
-
Jingning Han authored
Move the 32x32 2D-DCT implementations from vp9/ to vpx_dsp/. Change-Id: Id3980696f8b69906ff7a59ff9fb2b9013d60047d
-
- 15 May, 2015 1 commit
-
-
James Zern authored
This file shouldn't be built directly; it is included in vp9_dct_avx2.c to create a non-high-bitdepth and a high-bitdepth version. This silences missing-prototype warnings for the unused FDCT32x32* functions. Change-Id: I4c19935c0e035b393be513bde735e9a78064a494
-
- 28 Jul, 2014 1 commit
-
-
levytamar82 authored
Remove all the redundant DCT functions (dct4x4, dct8x8) in AVX2 except dct32x32. Those functions were originally copied from dct_sse2. Change-Id: I742576fbf5175f3ac09f2076976a9247b259323e
-
- 13 Feb, 2014 1 commit
-
-
Andrew Russell authored
Change-Id: Ia91c6c406273345b08505097ffe1af3896980f06
-
- 06 Feb, 2014 1 commit
-
-
Dmitry Kovalev authored
Change-Id: I5259b68dc1bcceb153e3ffe638a79a59a3019e9d
-
- 28 Jan, 2014 1 commit
-
-
Dmitry Kovalev authored
It is enough to specify (e.g.) idct16, it is obviously different from idct16x16. Change-Id: I6b408a37a945de3162429380b59a775b03b95db0
-
- 21 Nov, 2013 2 commits
-
-
levytamar82 authored
Change-Id: I6366e84490883b72362f762369d7e5bccb64f02f
-
Abo Talib Mahfoodh authored
Modifications are made to reduce the total clock cycle count. Speedup: 1.2 Tested with: park_joy_420_720p50.y4m Change-Id: Ia36b87e62e2f80a5fadaf5628729aedc80f38f3f
-
- 13 Nov, 2013 1 commit
-
-
Jingning Han authored
The step that sums three input samples could cause the intermediate result to exceed the 16-bit limit when operating as the second 1-D transform. This commit fixes the issue. Change-Id: Iaf512449ac2d25ddd8a806d760afab362c62a516
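The overflow condition described in this message can be illustrated with a minimal sketch. This is not the code from the commit: the helper names `sum3_fits_in_16_bits` and `sum3_saturate` are hypothetical, and saturation is only one possible remedy, chosen here for illustration. The sketch widens to 32 bits before adding, so the range check itself cannot overflow.

```c
#include <stdint.h>

/* Hypothetical helper (not from libvpx): returns 1 if the sum of three
 * 16-bit samples still fits in int16_t, 0 otherwise. The addition is done
 * in 32 bits so the check itself cannot overflow. */
static int sum3_fits_in_16_bits(int16_t a, int16_t b, int16_t c) {
  const int32_t sum = (int32_t)a + (int32_t)b + (int32_t)c;
  return sum >= INT16_MIN && sum <= INT16_MAX;
}

/* Hypothetical helper (not from libvpx): sums three 16-bit samples and
 * clamps the result to the int16_t range instead of letting it wrap. */
static int16_t sum3_saturate(int16_t a, int16_t b, int16_t c) {
  const int32_t sum = (int32_t)a + (int32_t)b + (int32_t)c;
  if (sum > INT16_MAX) return INT16_MAX;
  if (sum < INT16_MIN) return INT16_MIN;
  return (int16_t)sum;
}
```

Three samples of 20000 each sum to 60000, which is outside the signed 16-bit range; this is the kind of intermediate value a second 1-D pass can produce once the first pass has already scaled the data up.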
-
- 24 Oct, 2013 1 commit
-
-
Dmitry Kovalev authored
Change-Id: I78f7012f967a777ddd39bae6671eb501df6bbfe8
-
- 23 Oct, 2013 4 commits
-
-
Dmitry Kovalev authored
For consistency with idct function names. Renames: vp9_short_fdct4x4 -> vp9_fdct4x4 vp9_short_walsh4x4 -> vp9_fwht4x4 Change-Id: Id15497cc1270acca626447d846f0ce9199770f58
-
Dmitry Kovalev authored
For consistency with idct function names. Change-Id: Ie77b7178e0894c57cd5cb9243c949eb9224ece18
-
Dmitry Kovalev authored
For consistency with idct function names. Change-Id: I5ca355ba99fdba04f09254be95cf79808b534f71
-
Dmitry Kovalev authored
For consistency with idct function names. Change-Id: I7b6af2f92c66eff56f84ed29edc3a66af8dc421f
-
- 21 Oct, 2013 1 commit
-
-
Dmitry Kovalev authored
Just making fdct consistent with iht/idct/fht functions which all use stride (# of elements) as input argument. Change-Id: I0ba3c52513a5fdd194f1e7e2901092671398985b
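As a rough illustration of what "stride (# of elements)" means, the sketch below reads a 4x4 block out of a larger buffer. The function `block4x4_sum` is hypothetical and not part of libvpx; it only shows that the stride counts `int16_t` elements per row rather than bytes, so row `r`, column `c` is found at `input[r * stride + c]`.

```c
#include <stdint.h>

/* Hypothetical helper (not from libvpx): sums a 4x4 block that lives
 * inside a larger buffer. `stride` is the distance between the starts of
 * consecutive rows, measured in int16_t elements, not in bytes. */
static int32_t block4x4_sum(const int16_t *input, int stride) {
  int32_t sum = 0;
  for (int r = 0; r < 4; ++r) {
    for (int c = 0; c < 4; ++c) {
      sum += input[r * stride + c];
    }
  }
  return sum;
}
```

For an 8-element-wide buffer, the block at row 2, column 3 would be passed as `block4x4_sum(&frame[2 * 8 + 3], 8)`. With a byte-based stride the same call would need `8 * sizeof(int16_t)`, which is the inconsistency the commit message says it removes.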
-
- 18 Oct, 2013 2 commits
-
-
Dmitry Kovalev authored
Just making fdct consistent with iht/idct/fht functions which all use stride (# of elements) as input argument. Change-Id: Ibc944952a192e6c7b2b6a869ec2894c01da82ed1
-
Dmitry Kovalev authored
Just making fdct consistent with iht/idct/fht functions which all use stride (# of elements) as input argument. Change-Id: I2d95fdcbba96aaa0ed24a80870cb38f53487a97d
-
- 15 Oct, 2013 1 commit
-
-
Dmitry Kovalev authored
Change-Id: Icbcf68b5b685a56f255ebc3859c9692accdadf9e
-
- 24 Sep, 2013 1 commit
-
-
A.Mahfoodh authored
Mathematically the results are the same. Change-Id: I1c5126cd3ca64e8515ca6331e0989c6f7dd651a0
-
- 12 Aug, 2013 1 commit
-
-
Jingning Han authored
Enable the SSE2 implementation of the high-precision 32x32 forward DCT. The intermediate stacks are 32 bits wide. The run-time goes down from 32126 cycles to 13442 cycles. Change-Id: Ib5ccafe3176c65bd6f2dbdef790bd47bbc880e56
-
- 06 Aug, 2013 2 commits
-
-
Jingning Han authored
Resolve compile warnings on redefinition of the FDCT32x32_2D template. Change-Id: Idb3a54ef8d2710ce7245b726379a0e5c875f5cad
-
Christian Duvivier authored
This is in preparation for the SSE2 version of the high-precision 32x32 forward DCT which will share a lot of code with the existing low precision version used for rate-distortion search. Change-Id: I7084b6bdfb480b1fabb8493fb14e3f7fcc7888c0
-
- 10 Jul, 2013 1 commit
-
-
Jingning Han authored
This commit enables 16x16 ADST/DCT forward hybrid transform using SSE2 operations. It reduces the runtime from 5433 cycles to 1621 cycles, at no compression performance loss. Change-Id: I75fd7f1984e9e28846af459f810ff0d6ae125230
-
- 03 Jul, 2013 1 commit
-
-
Jingning Han authored
These serve as building blocks for SSE2 8x8 and 16x16 ADST/DCT hybrid transform coding. Change-Id: I4089a754c66e0c986f67d9b8ec4dfb9627ad430d
-
- 29 Jun, 2013 2 commits
-
-
Christian Duvivier authored
43,000 -> 5,750 cycles, about 7.5x faster. Change-Id: Ibfd92821b9603f4ed9c256e0ececec14fa4565d0
-
Jingning Han authored
This commit enables the SSE2 4x4 forward hybrid transform. The runtime goes from 249 cycles down to 74 cycles. Overall around 2% speed-up at no compression performance change. Change-Id: Iad4d526346e05c7be896466c05500711bb763660
-
- 28 Jun, 2013 1 commit
-
-
Jingning Han authored
Change-Id: I7c46354c4983feb5f6202c3ab4a1d9534da7e30f
-
- 26 Jun, 2013 1 commit
-
-
Yaowu Xu authored
The aligned array in the parameter list caused the win32 build to report a C2719 error. This commit fixes the issue by making the parameter type a pointer instead of an array. Change-Id: I4ed654ce4eba2db4995d9cdc136c68e9a6acc992
-
- 25 Jun, 2013 3 commits
-
-
Jingning Han authored
Improve the round-trip precision to meet the unit test settings. Change-Id: I303febae56b4b990ea3798b8ebed94c0510ecf79
-
Jingning Han authored
This reduces 16x16 2D-DCT runtime from 865 cycles to 837 cycles. Change-Id: I137758b81cd127b936175284310e81378db64552
-
Jingning Han authored
This commit makes use of the butterfly structure to enable the SSE2 implementation of the 8x8 ADST/DCT hybrid transform coding. The runtime of the hybrid transform module goes down from 1170 cycles to 245 cycles. Overall speed-up around 1.5%. Change-Id: Ic808ffd21ece8a9d0410d8c0243d7b6c28ac3b3f
-
- 26 Apr, 2013 2 commits
- 16 Apr, 2013 2 commits
-
-
Christian Duvivier authored
Scalar path is about 1.3x faster (2.1% overall encoder speedup). SSE2 path is about 5.0x faster (8.4% overall encoder speedup). Change-Id: I360d167b5ad6f387bba00406129323e2fe6e7dda
-
Christian Duvivier authored
Scalar path is about 1.3x faster (2.1% overall encoder speedup). SSE2 path is about 5.0x faster (8.4% overall encoder speedup). Change-Id: I360d167b5ad6f387bba00406129323e2fe6e7dda
-
- 18 Mar, 2013 1 commit
-
-
Yunqing Wang authored
Wrote SSE2 versions of vp9_short_idct8x8 and vp9_short_idct10_8x8. Compared to the C version, the SSE2 version is 2x faster. The decoder test didn't show a noticeable gain since the 8x8 idct doesn't take much of the decoding time (less than 1% in my test). Change-Id: I56313e18cd481700b3b52c4eda5ca204ca6365f3
-
- 15 Mar, 2013 1 commit
-
-
Christian Duvivier authored
Scalar path is about 1.5x faster (3.1% overall encoder speedup). SSE2 path is about 7.2x faster (7.8% overall encoder speedup). Change-Id: I06da5ad0cdae2488431eabf002b0d898d66d8289
-
- 28 Feb, 2013 2 commits
-
-
Jim Bankoski authored
Change-Id: Id786be31da3c91d95d2955aa569ecdc6e66650df
-
Christian Duvivier authored
Scalar path is about 1.4x faster (4% overall encoder speedup). SSE2 path is about 7x faster (13% overall encoder speedup). Change-Id: I7e85d8225a914a74c61ea370210414696560094d
-