- 25 Jun, 2013 2 commits
-
-
Jingning Han authored
Remove vp9_intra4x4_predict(). Use the common intra prediction function for all block sizes. Change-Id: Ibd19d51dfa3da8bbdfb79ddeb81530b2e2089560
-
Jingning Han authored
This commit makes use of the butterfly structure to enable the sse2 version implementation of 8x8 ADST/DCT hybrid transform coding. The runtime of hybrid transform module goes down from 1170 cycles to 245 cycles. Overall speed-up around 1.5%. Change-Id: Ic808ffd21ece8a9d0410d8c0243d7b6c28ac3b3f
-
- 21 Jun, 2013 3 commits
-
-
John Koleszar authored
The functions no longer referenced. Change-Id: If2705dfbc607f79ec8ec2242d5e03bec27a35aaf
-
Ronald S. Bultje authored
Change vp9_block_error() to return a 64bit error variable, change all callers to expect a 64bit return value (this will prevent overflows, which we basically don't check for at all right now). Remove duplicate block_error() function, which fixed that through truncation. Remove old (incompatible) mmx/sse2 block_error SIMD versions and replace with a new one that returns a 64bit value. Encoding time of first 50 frames of bus @ 1500kbps goes from 3min29 to 3min23, i.e. a 3% overall speedup. Change-Id: Ib71ac5508b5ee8a80f1753cd85d72df1629abe68
-
Ronald S. Bultje authored
3% faster overall (3min35.0 to 3min28.5). Change-Id: I5ff8a5c2c91586b6632ca5009ad1ea51ce94af5e
-
- 20 Jun, 2013 2 commits
-
-
Ronald S. Bultje authored
Encoding of bus @ 1500kbps (first 50 frames) goes from 3min57 to 3min35, i.e. approximately a 10.5% speedup. Note that the SIMD versions which use a bilinear filter (x_offset & 7 || y_offset & 7) aren't perfectly interleaved, and can probably be improved further in the future. I've marked this with a few TODOs/FIXMEs in the code. Change-Id: I5c9e900c0f0d32e431a50fecae213b510b2549f9
-
Ronald S. Bultje authored
Overall speedup around 5% (bus @ 1500kbps first 50 frames 4min10 -> 3min58). Specific changes to timings for each function compared to original assembly-optimized versions (or just new version timings if no previous assembly-optimized version was available): sse2 4x4: 99 -> 82 cycles sse2 4x8: 128 cycles sse2 8x4: 121 cycles sse2 8x8: 149 -> 129 cycles sse2 8x16: 235 -> 245 cycles (?) sse2 16x8: 269 -> 203 cycles sse2 16x16: 441 -> 349 cycles sse2 16x32: 641 cycles sse2 32x16: 643 cycles sse2 32x32: 1733 -> 1154 cycles sse2 32x64: 2247 cycles sse2 64x32: 2323 cycles sse2 64x64: 6984 -> 4442 cycles ssse3 4x4: 100 cycles (?) ssse3 4x8: 103 cycles ssse3 8x4: 71 cycles ssse3 8x8: 147 cycles ssse3 8x16: 158 cycles ssse3 16x8: 188 -> 162 cycles ssse3 16x16: 316 -> 273 cycles ssse3 16x32: 535 cycles ssse3 32x16: 564 cycles ssse3 32x32: 973 cycles ssse3 32x64: 1930 cycles ssse3 64x32: 1922 cycles ssse3 64x64: 3760 cycles Change-Id: I81ff6fe51daf35a40d19785167004664d7e0c59d
-
- 18 Jun, 2013 1 commit
-
-
Jingning Han authored
This commit makes use of dual fdct32x32 versions for rate-distortion optimization loop and encoding process, respectively. The one for rd loop requires only 16 bits precision for intermediate steps. The original fdct32x32 that allows higher intermediate precision (18 bits) was retained for the encoding process only. This allows speed-up for fdct32x32 in the rd loop. No performance loss observed. Change-Id: I3237770e39a8f87ed17ae5513c87228533397cc3
-
- 14 Jun, 2013 1 commit
-
-
Jingning Han authored
The encoding time for bus at CIF goes from 661s to 625s. This commit also enabled unit test of sad8x4/4x8 in sad_test.cc. Change-Id: If3d10ebb56bda584bdb69bcf056599d580b12cb1
-
- 13 Jun, 2013 1 commit
-
-
Jingning Han authored
The encoding time for bus at CIF goes from 661s to 625s. This commit also enabled unit test of sad8x4/4x8 in sad_test.cc. Change-Id: If3d10ebb56bda584bdb69bcf056599d580b12cb1
-
- 12 Jun, 2013 5 commits
-
-
Scott LaVarnway authored
Modified to work with 8x8 blocks of memory. Will revisit later for further optimizations. For the HD clip used, the decoder improved by almost 20%. Change-Id: Iaa4785be293a32a42e8db07141bd699f504b8c67
-
Ronald S. Bultje authored
Encoding time of crew (CIF, first 50 frames) @ 1500kbps goes from 4min56 to 4min42. Change-Id: I92c0c8b32980d2ae7c6dafc8b883a2c7fcd14a9f
-
Scott LaVarnway authored
Modified to work with 8x8 blocks of memory. Will revisit later for further optimizations. For the HD clip used, the decoder improved my 20%. Change-Id: Ia0057f55d66d1445882351ea6c43b595a5a980e5
-
John Koleszar authored
These functions are not used, and appear to have been superceded. Change-Id: I86fe51b088264f6b1b8d4d232bba97b371b98120
-
John Koleszar authored
The mmx routines work as expected for the loop filter, so enable them. Change-Id: I2bbd9b99a4445fcba17bb95002f1fb6e01fe8f85
-
- 10 Jun, 2013 1 commit
-
-
John Koleszar authored
Change-Id: I524ba98841f2e1850e3276ac365c501cea31546d
-
- 08 Jun, 2013 1 commit
-
-
Ronald S. Bultje authored
Change-Id: Iec41736c2b6140715f90f40de5ae6cf52497a9b8
-
- 06 Jun, 2013 1 commit
-
-
John Koleszar authored
This version of the loop filter supports non-4:2:0 subsampling and a fourth plane, as well as changing the filtering order to be more friendly to hardware implementations. The filters are applied first to all vertical edges within the 64x64 SB, followed by the top horizontal edge and any internal horizontal edges. Since filtering is applied on each 4x4 edge serially, a dependency is created from filtering one block edge to the next. It would be possible to remove this depencnecy by building all filtering decisions from the unfiltered reconstruction data. Change-Id: I08f3e9683eb7bded8a76651cbc50fc0dfdd05fa7
-
- 31 May, 2013 1 commit
-
-
Jim Bankoski authored
This speed 1 - uses variance threshold stolen from static-thresh to determine split. Any superblock with greater than the variance set by static thresh * quantizer index squared is split. In addition transform size is set to largest size less than or equal to partition size, sub pixel filter is set to normal, and only 12 modes are used at all. Change-Id: If7a2858ee70f96d1eb989c04fd87a332b147abef
-
- 23 May, 2013 1 commit
-
-
Jingning Han authored
Move 4x4/4x8/8x4 partition coding out of experimental list. This commit fixed the unit test failure issues. It also resolved the merge conflicts between 4x4 block level partition and iterative motion search for comp_inter_inter. Change-Id: I898671f0631f5ddc4f5cc68d4c62ead7de9c5a58
-
- 22 May, 2013 1 commit
-
-
Yunqing Wang authored
Added SSE2 version of variance functions for super blocks. Change-Id: Ibeaae8771ca21c99d41dd74067574a51e97b412d
-
- 21 May, 2013 1 commit
-
-
Scott LaVarnway authored
No longer used. Change-Id: Ica5166f7117f4693dffdf7633dcfc1b263103d0d
-
- 20 May, 2013 1 commit
-
-
Scott LaVarnway authored
This patch eliminates the intermediate diff buffer usage by combining the short idct and the add residual into one function. The encoder can use the same code as well. Change-Id: I296604bf73579c45105de0dd1adbcc91bcc53c22
-
- 16 May, 2013 2 commits
-
-
Scott LaVarnway authored
This patch eliminates the intermediate diff buffer usage by combining the short idct and the add residual into one function. The encoder can use the same code as well. Change-Id: Iacfd57324fbe2b7beca5d7f3dcae25c976e67f45
-
Jingning Han authored
These building blocks enable rate-distortion optimization search over block sizes of 8x4 and 4x8. Need to convert them into mmx/sse forms. Change-Id: I570ea2d22d14ceec3fe3575128d7dfa172a577de
-
- 15 May, 2013 1 commit
-
-
Scott LaVarnway authored
This patch eliminates the intermediate diff buffer usage by combining the short idct and the add residual into one function. The encoder can use the same code as well. Change-Id: Iea7976b22b1927d24b8004d2a3fddae7ecca3ba1
-
- 14 May, 2013 3 commits
-
-
Scott LaVarnway authored
This patch eliminates the intermediate diff buffer usage by combining the short idct and the add residual into one function. The encoder can use the same code as well. Change-Id: I4ea09df0e162591e420d869b7431c2e7f89a8c1a
-
Dmitry Kovalev authored
Change-Id: I299feefa64b93bd62263aea1ff1e41e85faeb6ca
-
John Koleszar authored
This reverts commit a9333111 Change-Id: I2321f88011178381adbcffeda1bcc6a430ab8f1d
-
- 13 May, 2013 1 commit
-
-
Dmitry Kovalev authored
Change-Id: Id1cc1c2663b9c2219cb830ffb4b0c6ab3468dc04
-
- 10 May, 2013 2 commits
-
-
Dmitry Kovalev authored
Change-Id: Ic11dc052fb641687c015e1bbc37181b9babcd43e
-
Yunqing Wang authored
In current code, motion vectors got from single prediction mode are used in compound prediction mode directly. These motion vectors may not give accurate prediction since they are searched independently. In this patch, we took Pascal's suggestion, and did joint motion search in compound prediction mode to find better motion vectors in this situation. Test results: Overall PSNR: 0.570%(derf), 0.918%(stdhd); SSIM: 0.572%(derf), 1.009%(stdhd); The encoder is a little slower. This can be improved since some c code is used in motion search. Change-Id: Ib30c9240f6c56c9b070867b4ca89412a76d9f3c6
-
- 07 May, 2013 1 commit
-
-
Jingning Han authored
Pull sb8x8 out of experimental list. verified via borg run tests. Fixed unit test failures. Change-Id: I12a4bbd17395930580c048ab68becad1ffe46e76
-
- 04 May, 2013 1 commit
-
-
Ronald S. Bultje authored
Change-Id: I1df17f45721c690d157800daa6a0b377e3d32bc2
-
- 02 May, 2013 2 commits
-
-
Ronald S. Bultje authored
Fixes valgrind uninitialized value use warnings. Change-Id: Ie9314d684e2ad194f8aca5bde1729fb9b7c0221d
-
Ronald S. Bultje authored
The encoder reconstruction is now correct. Decoder to follow shortly. Change-Id: Iedf98cdaebb4ca1256c7714cad7024a75853ad6a
-
- 01 May, 2013 1 commit
-
-
Ronald S. Bultje authored
Change-Id: I390bb1cedc835f439fd5dd6cda6572b29cbb139c
-
- 26 Apr, 2013 3 commits
-
-
John Koleszar authored
All members can be referenced from their per-plane counterparts, and removes assumptions about 24 blocks per macroblock. Change-Id: I7ff2fa72d22c29163eb558981c8193765a8113d9
-
John Koleszar authored
Access these members from MACROBLOCKD instead. Change-Id: I7907230dd473ff12ebe182b9280d8b7f12a888c4
-
Scott LaVarnway authored
This originally was "Removed update_blockd_bmi()". Now, this patch removed bmi from blockd and uses the bmi found in mode_info_context. Eliminates unnecessary bmi copies between blockd and mode_info_context. Change-Id: I287a4972974bb363f49e528daa9b2a2293f4bc76
-