- 15 Aug, 2013 1 commit
-
-
Dmitry Kovalev authored
27 degrees intra predictor is actually 207 degrees, so renaming it. Change-Id: Ife96a910437eb80ccdc0b7a5b7a62c77542ae5be
-
- 14 Aug, 2013 2 commits
-
-
hkuang authored
Change-Id: I27134b9a5cace2bdad53534562c91d829b48838d
-
Dmitry Kovalev authored
Adding const to above and left pointers. Cleanup. Change-Id: I51e195fa2e2923048043fe68b4e38a47ee82cda1
-
- 12 Aug, 2013 1 commit
-
-
Jingning Han authored
Enable SSE2 implementation of high precision 32x32 forward DCT. The intermediate stacks are 32 bits wide. The run-time goes down from 32126 cycles to 13442 cycles. Change-Id: Ib5ccafe3176c65bd6f2dbdef790bd47bbc880e56
-
- 07 Aug, 2013 1 commit
-
-
Christian Duvivier authored
Change-Id: Idec4cae0cb9b3a29835fd2750d354c1393d47aa4
-
- 06 Aug, 2013 6 commits
-
-
Jim Bankoski authored
Also fixed a bug in the SAD calculations. Change-Id: I6571fcbe37556c16ae32be66dc0fd879852aac1d
-
Jim Bankoski authored
Change-Id: I4a3c83119cdf8a205920034c8019d855d5504605
-
Jim Bankoski authored
Enable use_x86inc as a command-line option. Fix a bug with SSE2 when x86inc is disabled. Add SAD asm protection to the x86inc protection. Change-Id: Iee0f9dd235ea10e8ace512eb362ba9bebe8c9df6
-
Jim Bankoski authored
Change-Id: If0399d8e11f4ebe75a5c91abb8d6a52a7709065b
-
Jim Bankoski authored
Change-Id: Icb607745634e10b9bac5019d06661ece09fcdb40
-
Jim Bankoski authored
Support enabling it or disabling it. Moved the read out to configure.sh so that it's done once instead of in make and in config. Change-Id: I73a9190cf31de9f03e8a577f478fa522f8c01c8b
-
- 05 Aug, 2013 2 commits
-
-
Jim Bankoski authored
Change-Id: I226e5094d216b09dc47fa5511a66e2d314608000
-
Jim Bankoski authored
Chromium does not support 32bit builds for Mac which use x86inc.asm. Make the files which include it work if 64bit or not PIC enabled, starting with vp9_copy_sse2.asm. Consolidate these targets in vp9_rtcd_defs.sh. Change-Id: If18f0b957a611efd085a3ee7d245cf1eb91e8248
-
- 02 Aug, 2013 2 commits
-
-
Dmitry Kovalev authored
Change-Id: I3fe90eb40088a5b07bdc7d66d93ffe6ef99943d5
-
Mans Rullgard authored
Change-Id: I13e0880df234f15abc4cc7c57fe84488d5d46a75
-
- 01 Aug, 2013 1 commit
-
-
Jingning Han authored
The inverse 32x32 transform detects all zero entries and skips the computations accordingly per 8 rows in the first 1-D operation. The function vp9_short_idct10_32x32_add performs differently and is not used anywhere, hence removed. Change-Id: Ic4fad422debbde7b6b6ffed47c69fbd4268a906c
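The zero-skipping idea in the commit above can be sketched as follows. This is a minimal illustration, not the committed code: the names `rows_all_zero`, `placeholder_row_transform`, and `row_pass_skip_zero` are hypothetical, and the 1-D transform is a stand-in that simply copies its input. The shortcut relies only on the transform being linear, so an all-zero row produces an all-zero result.

```c
#include <stdint.h>
#include <string.h>

enum { N = 32, GROUP_ROWS = 8 };

/* Hypothetical signature for a 1-D row transform over N coefficients. */
typedef void (*row_transform_fn)(const int16_t *in, int16_t *out);

/* Stand-in for a real 1-D inverse transform: copies the row unchanged.
 * Like any linear transform, it maps an all-zero row to an all-zero row. */
void placeholder_row_transform(const int16_t *in, int16_t *out) {
  memcpy(out, in, N * sizeof(*in));
}

/* Returns 1 if all `count` coefficients are zero, 0 otherwise. */
int rows_all_zero(const int16_t *coeffs, int count) {
  int i;
  for (i = 0; i < count; ++i) {
    if (coeffs[i] != 0) return 0;
  }
  return 1;
}

/* First 1-D pass over a 32x32 block, checked in groups of 8 rows.
 * A group with no non-zero coefficient is written as zeros directly.
 * Returns the number of groups that were actually transformed. */
int row_pass_skip_zero(const int16_t *input, int16_t *output,
                       row_transform_fn transform) {
  int group, row, groups_computed = 0;
  for (group = 0; group < N; group += GROUP_ROWS) {
    const int16_t *in = input + group * N;
    int16_t *out = output + group * N;
    if (rows_all_zero(in, GROUP_ROWS * N)) {
      memset(out, 0, GROUP_ROWS * N * sizeof(*out));
      continue;
    }
    for (row = 0; row < GROUP_ROWS; ++row) {
      transform(in + row * N, out + row * N);
    }
    ++groups_computed;
  }
  return groups_computed;
}
```

Since quantized coefficients tend to cluster in the low-frequency rows, the later groups are often all zero, which is where a check like this would save work.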
-
- 29 Jul, 2013 1 commit
-
-
Jingning Han authored
This commit provides special handling for the 16x16 inverse 2D-DCT when only the DC coefficient is quantized to a non-zero value. Change-Id: I7bf71be7fa13384fab453dc8742b5b50e77a277c
-
- 26 Jul, 2013 2 commits
-
-
Jingning Han authored
This commit enables special handling for the 8x8 inverse 2D-DCT when only the DC coefficient is quantized to a non-zero value. For bus_cif at 2000 kbps, it provides about 1% speed-up at speed 0. Change-Id: I2523222359eec26b144cf8fd4c63a4ad63b1b011
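A DC-only shortcut of the kind described above can be sketched as below. When only the DC coefficient is non-zero, the 2-D inverse DCT yields the same value at every position, so a single scalar can be computed once and added to all 64 pixels. The function name `idct8x8_dc_only_add`, the 14-bit fixed-point constant, and the final 5-bit rounding shift are assumptions chosen for illustration; they are not taken from the commit itself.

```c
#include <stdint.h>

/* Assumed fixed-point setup: 14 fractional bits, and
 * COSPI_16_64 = round(2^14 * cos(pi / 4)). */
#define DCT_CONST_BITS 14
#define COSPI_16_64 11585

/* Rounds to nearest, then shifts right by `bits`. */
int round_shift(int value, int bits) {
  return (value + (1 << (bits - 1))) >> bits;
}

/* Clamps a reconstructed value to the 8-bit pixel range. */
uint8_t clip_pixel(int value) {
  return (uint8_t)(value < 0 ? 0 : (value > 255 ? 255 : value));
}

/* Adds the DC-only inverse transform result to an 8x8 block of pixels.
 * The DC term is scaled once per 1-D pass (rows, then columns), then
 * rounded down to pixel precision (assumed here to be a 5-bit shift). */
void idct8x8_dc_only_add(int16_t dc, uint8_t *dest, int stride) {
  int row, col, offset;
  int out = round_shift(dc * COSPI_16_64, DCT_CONST_BITS);
  out = round_shift(out * COSPI_16_64, DCT_CONST_BITS);
  offset = round_shift(out, 5);
  for (row = 0; row < 8; ++row) {
    for (col = 0; col < 8; ++col) {
      dest[col] = clip_pixel(dest[col] + offset);
    }
    dest += stride;
  }
}
```

The saving comes from replacing sixteen 1-D transforms with three multiplications and shifts plus one add-and-clip per pixel.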
-
Ronald S. Bultje authored
Change-Id: Ie48035ff4f93c41f8a9b3023e6444fd10432d8fb
-
- 25 Jul, 2013 1 commit
-
-
Jingning Han authored
Add SSE2 implementation to handle the special case of inverse 2D-DCT where only DC coefficient is non-zero. Change-Id: I2c6a59e21e5e77b8cf39a4af5eecf4d5ade32e2f
-
- 24 Jul, 2013 1 commit
-
-
Jingning Han authored
They share the same functionality, so merging together. Change-Id: I98a0386fcee052cb854f9ff90c283c1b844bcb79
-
- 18 Jul, 2013 1 commit
-
-
hkuang authored
Change-Id: Ic32acf3e2939c6d12d9c2bf192a5f5da59705fda
-
- 17 Jul, 2013 1 commit
-
-
Johann authored
Call the individually optimized horizontal and vertical functions. This implementation abuses the temp buffer. This will be replaced with a custom optimized function. Over 2x speedup. Change-Id: I5b908d2a73d264e9810d6022bbff73207a3055dd
-
- 16 Jul, 2013 1 commit
-
-
Jingning Han authored
This commit enables SSE2 implementation of 16x16 inverse ADST/DCT hybrid transform. The runtime goes from 5742 cycles -> 1821 cycles. This provides about 1% encoding speed-up at speed 0. Change-Id: I1678d0988bf30b9efd524877705bbb3645edb17b
-
- 13 Jul, 2013 1 commit
-
-
Jingning Han authored
This commit enables SSE2 implementation of 8x8 inverse ADST/DCT transform. The runtime goes from 1216 cycles -> 266 cycles. For bus_cif at 2000 kbps, the overall runtime reduces from 253707ms -> 248430ms, i.e., 2% speed-up at speed 0. Change-Id: Ib0372e17e9162d7b11a10d653b1c8be547c878fb
-
- 12 Jul, 2013 1 commit
-
-
Johann authored
Super basic conversion from the other implementations. Any changes to one should be trivial to copy over to keep them in sync. Change-Id: I1720b4128e0aba4b2779e3761f6494f8a09d3ea8
-
- 11 Jul, 2013 4 commits
-
-
Johann authored
Independent horizontal and vertical implementations. Requires that blocks be built from 4x4 and [xy]_step_q4 == 16. 6-10% improvement. CIF improved the least. Change-Id: I137f5ceae4440adc0960bf88e4453e55a618bcda
-
hkuang authored
Change-Id: Iae84ab945cc9662a0ddd839aa2b9ca59f2ae5423
-
Jingning Han authored
Enable SSE2 4x4 inverse ADST/DCT transform. The runtime goes from 292 cycles down to 89 cycles. Running bus_cif at 2000 kbps, the overall runtime of speed 0 goes from 301s to 295s (2% speed-up). Change-Id: I24098136e7fee7ab2fbf1c11755bdf2ca37f3628
-
Ronald S. Bultje authored
Change-Id: I3ce849452ed4f08527de9565a9914d5ee36170aa
-
- 10 Jul, 2013 6 commits
-
-
John Koleszar authored
Where possible, do the 16 pixel wide filter while doing the horizontal filtering pass. The same approach can be taken for the mbloop_filter when that's implemented. Doing so on the vertical pass is a little more involved, but possible. Change-Id: I010cb505e623464247ae8f67fa25a0cdac091320
-
Jingning Han authored
This commit enables 16x16 ADST/DCT forward hybrid transform using SSE2 operations. It reduces the runtime from 5433 cycles to 1621 cycles, at no compression performance loss. Change-Id: I75fd7f1984e9e28846af459f810ff0d6ae125230
-
Ronald S. Bultje authored
Change-Id: Iad70966b986f65259329070e258f76ef0af816b4
-
Ronald S. Bultje authored
Change-Id: I3441c059214c2956e8261331bbf521525a617a86
-
Ronald S. Bultje authored
Change-Id: I55a6cfa2daba738cbc0c4a02f806893f7e556997
-
Ronald S. Bultje authored
Change-Id: Ibe1690afc5459f3b3beca401e7734fcd03da6dd0
-
- 09 Jul, 2013 2 commits
-
-
Frank Galligan authored
- The vp9 mbfilter C code will branch on flat and mask. This CL will perform both branches and combine the data. A later CL will perform a check to see if all patch will take one branch. - These functions are about 1.75 times faster than the C code on Nexus 7. PS #3 - Changed all functions to dup limit, blimit, and thresh from vld {dx[]}, freeing up r4-r6. - Changed code to use vbif to reduce one instruction and free up a d register. Change-Id: I028dae0e434dc9891c3677bdb182e201ffb04777
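The "perform both branches and combine the data" approach mentioned above can be illustrated in scalar C. This is only a sketch of the general bitwise-select technique, which NEON's bit-select instructions such as vbif apply across whole vector registers; the function name `select_by_mask` is hypothetical, and the mask is assumed to be 0xFF or 0x00 per byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Branch-free select: both candidate results are computed beforehand,
 * then merged per byte. Where mask[i] is 0xFF the output takes
 * if_set[i]; where it is 0x00 the output takes if_clear[i]. */
void select_by_mask(const uint8_t *if_set, const uint8_t *if_clear,
                    const uint8_t *mask, uint8_t *out, size_t count) {
  size_t i;
  for (i = 0; i < count; ++i) {
    out[i] = (uint8_t)((if_set[i] & mask[i]) | (if_clear[i] & ~mask[i]));
  }
}
```

Computing both results costs extra arithmetic, but it removes per-pixel branches, which is what makes the operation suitable for SIMD lanes that cannot branch independently.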
-
Ronald S. Bultje authored
This probably has a mildly negative impact on performance, but will (in future commits - or possibly merged with this one) allow SIMD implementations of individual intra prediction functions. We may perhaps want to consider having separate functions per txfm-size also (i.e. 4x4, 8x8, 16x16 and 32x32 intra prediction functions for each intra prediction mode), but I haven't played much with that yet. Change-Id: Ie739985eee0a3fcbb7aed29ee6910fdb653ea269
-
- 01 Jul, 2013 2 commits
-
-
Ronald S. Bultje authored
Encode time of bus (speed 0) 50 frames @ 1500kbps goes from 2min14.4 to 2min10.1, i.e. a 2.3% overall speed increase. Change-Id: I3699580e74ec26c7d24e03681bc47ba25ee1ee87
-
Ronald S. Bultje authored
Total encoding time for the first 50 frames of bus (speed 0) @ 1500kbps goes from 2min34.8 to 2min14.4, i.e. a 10.4% overall speedup. The code is x86-64 only; it needs some minor modifications to be 32bit compatible, because it uses 15 xmm registers, whereas 32bit only has 8. Change-Id: I2df53770c2e850813ffa713e1a91b45b0082b904
-