- Sep 29, 2010
-
-
Adrian Grange authored
Moved the bounds computation on vertical MV component out of the loop that processes MBs within a MB row.
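As a minimal sketch of the idea (not the actual libvpx code), the vertical limits depend only on the macroblock row, so they can be computed once per row instead of once per macroblock. The `mv_bounds` struct, the `row_mv_bounds` function, and the `MB_SIZE` and `BORDER` constants are hypothetical names and values chosen for illustration.

```c
#include <assert.h>

/* Hypothetical bounds record; not libvpx's actual type. */
typedef struct {
    int row_min, row_max; /* vertical MV limits, in pixels */
    int col_min, col_max; /* horizontal MV limits, in pixels */
} mv_bounds;

enum { MB_SIZE = 16, BORDER = 16 }; /* assumed values for illustration */

/* Fill out[0..mb_cols-1] with the MV limits for every MB in row mb_row. */
static void row_mv_bounds(int mb_row, int mb_rows, int mb_cols, mv_bounds *out)
{
    assert(mb_row >= 0 && mb_row < mb_rows);

    /* Loop-invariant: the vertical limits depend only on mb_row,
       so they are computed once, outside the per-MB loop. */
    const int row_min = -(mb_row * MB_SIZE + BORDER);
    const int row_max = (mb_rows - 1 - mb_row) * MB_SIZE + BORDER;

    for (int mb_col = 0; mb_col < mb_cols; ++mb_col) {
        out[mb_col].row_min = row_min;
        out[mb_col].row_max = row_max;
        /* The horizontal limits do change per MB, so they stay in the loop. */
        out[mb_col].col_min = -(mb_col * MB_SIZE + BORDER);
        out[mb_col].col_max = (mb_cols - 1 - mb_col) * MB_SIZE + BORDER;
    }
}
```

Calling `row_mv_bounds(1, 4, 3, bounds)` fills three entries that share the same vertical limits and differ only in their horizontal ones.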
-
- Sep 28, 2010
-
-
Adrian Grange authored
Enabled the first-pass encode to output the map of macroblock coding modes required by the AltRef filter.
-
Adrian Grange authored
Modified AltRef temporal filter to adapt filter length based on macroblock coding modes selected during first-pass encode. Also added sub-pixel motion compensation to the AltRef filter.
-
Timothy B. Terriberry authored
The existing code applied a 6-tap filter with 0's on either end. We're already paying the branch penalty to avoid computing the two extra columns needed as input to this filter. We might as well save time computing the filter as well. This reduces the inner loop from 21 instructions to 16, the number of loads per iteration from 4 to 1, and the number of multiplies from 7 to 4. The gain in overall decoding performance, however, is small (less than 1%). This change also means we are now valgrind clean on ARMv6, which is its real purpose. The errors reported here were valgrind's fault (it does not detect that 0 times an uninitialized value is initialized), but Julian Seward says it would slow down valgrind considerably to make such checks. Speeding up libvpx instead, even by a small amount, seems a much better idea, if only to enable proper valgrind checking of the rest of the codec. Change-Id: Ifb376ea195e086b60f61daf1097d8910c4d8ff16
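The equivalence this commit relies on can be sketched in scalar C: when the two outer coefficients of a 6-tap filter are zero, a 4-tap filter over the inner samples gives the same result and never reads the outer samples at all. This is an illustration only, not the ARMv6 assembly; the tap values, the rounding constant, and all function names are assumptions.

```c
#include <assert.h>

/* Illustrative taps with zero outer coefficients (they sum to 128). */
static const int kTaps6[6] = { 0, -6, 123, 12, -1, 0 };

/* Round-shift by 7 and clamp to 0..255; sum already includes the +64 rounding. */
static unsigned char round_clamp(int sum)
{
    if (sum < 0) return 0;
    sum >>= 7;
    return (unsigned char)(sum > 255 ? 255 : sum);
}

/* Reference 6-tap filter: reads src[-2] .. src[3]. */
static unsigned char filter6(const unsigned char *src)
{
    int sum = 64;
    for (int k = 0; k < 6; ++k) sum += kTaps6[k] * src[k - 2];
    return round_clamp(sum);
}

/* 4-tap version: reads only src[-1] .. src[2], skipping the zero taps. */
static unsigned char filter4(const unsigned char *src)
{
    int sum = 64;
    for (int k = 1; k < 5; ++k) sum += kTaps6[k] * src[k - 2];
    return round_clamp(sum);
}

/* Returns 1 when both filters agree at every interior position of row[0..n-1]. */
static int filters_agree(const unsigned char *row, int n)
{
    assert(n >= 6);
    for (int i = 2; i + 3 < n; ++i)
        if (filter6(row + i) != filter4(row + i)) return 0;
    return 1;
}
```

Because `filter4` never loads `src[-2]` or `src[3]`, a tool that tracks uninitialized memory has nothing to flag when those columns were never computed.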
-
- Sep 27, 2010
-
-
Paul Wilkins authored
This affects control of the active quantizer range. Change-Id: I30511fc81ac9f75ff20d9f1372382423d56739da
-
John Koleszar authored
Missed the .h file in the move. Change-Id: Ib408183fbb4d019fd46394b362f89ca6ea9d10bc
-
- Sep 24, 2010
-
-
Timothy B. Terriberry authored
This function was accessing values below the stack pointer, which can be corrupted by signal delivery at any time. Change-Id: I92945b30817562eb0340f289e74c108da72aeaca
-
Johann Koenig authored
The previous implementation compared each set of values to the limit and then ANDed the results together, requiring a compare and an AND for each value. This version does the accumulation first, requiring only one compare. Change-Id: Ia5e3a1a50e47699c88470b8c41964f92a0dc1323
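A scalar sketch of the two approaches follows. It assumes the accumulation is a running maximum of the absolute differences, which the commit message does not spell out, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Before: one compare per difference, with the results ANDed together. */
static int mask_compare_each(const int *diffs, int n, int limit)
{
    int mask = 1;
    for (int i = 0; i < n; ++i)
        mask &= (abs(diffs[i]) <= limit);
    return mask;
}

/* After: accumulate the largest absolute difference, then compare once. */
static int mask_accumulate_first(const int *diffs, int n, int limit)
{
    assert(n > 0);
    int worst = 0;
    for (int i = 0; i < n; ++i) {
        const int a = abs(diffs[i]);
        if (a > worst) worst = a;
    }
    return worst <= limit; /* the single compare */
}
```

Both functions return the same mask for any input, since every value is within the limit exactly when the largest one is.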
-
John Koleszar authored
This patch avoids compiling some debugging code in onyx_if.c. The most significant fix is to avoid generating code for vp8_write_yuv_frame, which is never called. Some other code was removed by the dead code elimination performed by the compiler, and this patch does it with the preprocessor instead. There are advantages both ways. Change-Id: I044fd43179d2e947553f0d6f2cad5b40907ac458
-
John Koleszar authored
reconintra_mt.c is only required for building the decoder right now. It could definitely be used for the encoder in the future, but it currently depends on decoder only data structures. (onyxd_int.h, VP8D_COMP, etc). Move it from common/ to decoder/ until the necessary changes to the common multithread code are complete. This patch is needed to build with --disable-vp8-decoder. Change-Id: I568c52221a2b309234d269675cba97131ce35c86
-
- Sep 23, 2010
-
-
John Koleszar authored
Having these symbols be available as functions rather than data is occasionally more convenient. Implemented this way rather than a get-codec-by-id style to avoid creating a link-time dependency between the encoder and the decoder. Fixes issue #169 Change-Id: I319f281277033a5e7e3ee3b092b9a87cce2f463d
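The difference between the two forms can be sketched as below. Every name here (`codec_iface`, `example_decoder_algo`, `example_decoder`, `getter_matches`) is hypothetical and stands in for whatever the real interface declares.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical interface descriptor; the real structure is not shown here. */
typedef struct {
    const char *name;
    int abi_version;
} codec_iface;

/* Data form: callers refer to the object itself. */
static const codec_iface example_decoder_algo = { "example decoder", 1 };

/* Function form: callers obtain the same object through a call. */
static const codec_iface *example_decoder(void)
{
    return &example_decoder_algo;
}

/* A getter can be stored in a table or passed as a callback,
   which is one case where the function form is more convenient. */
typedef const codec_iface *(*codec_getter)(void);

static int getter_matches(codec_getter get, const char *name)
{
    assert(get != NULL);
    return strcmp(get()->name, name) == 0;
}
```

Each codec exports its own accessor, so code that only needs the decoder never has to reference an encoder symbol.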
-
Yunqing Wang authored
In multi-threaded decoder, set different sync ranges for different video resolutions. Change-Id: Iea48fd36f51919e0152c8ed3b1f10e1b723c0ca7
-
- Sep 22, 2010
-
-
Johann Koenig authored
The new loopfilter was originally introduced as an experimental change. It's permanent now. Change-Id: I25dbedb6ceff3e9f9c04e18bb29f84c3ecb7e546
-
- Sep 21, 2010
-
-
John Koleszar authored
Change-Id: I6625ee41f8872908cb015ce0729e1c7a105b5217
-
John Koleszar authored
The MV decoding changes in c5fb0eb8 introduced a bug where the macroblock clamping state was reset for each partition, so if an earlier partition needed clamping but a subsequent one didn't, the MB wouldn't receive clamping. With this fix, the state is only set during splitmv decoding and never cleared. Change-Id: I224fe258493405ee0f6a04596acdb622c475e845
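The bug pattern reduces to a flag that is overwritten on every iteration instead of only being raised. The sketch below is a simplified model, not the decoder code; both function names and the per-partition input array are assumptions.

```c
#include <assert.h>

/* Buggy pattern: the flag is overwritten for every partition,
   so only the last partition decides the outcome. */
static int need_clamp_overwrite(const int *part_needs_clamp, int n)
{
    int need = 0;
    for (int i = 0; i < n; ++i)
        need = part_needs_clamp[i];
    return need;
}

/* Fixed pattern: the flag is only ever set, never cleared. */
static int need_clamp_sticky(const int *part_needs_clamp, int n)
{
    assert(n >= 0);
    int need = 0;
    for (int i = 0; i < n; ++i)
        if (part_needs_clamp[i]) need = 1;
    return need;
}
```

With partitions `{1, 0, 0}`, the overwriting version reports that no clamping is needed, while the sticky version correctly reports that it is.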
-
- Sep 20, 2010
-
-
Fritz Koenig authored
Movdqu is more expensive (throughput, uops) than movq. Minimal impact for newer big cores, but ~2.25% gain on Atom. Change-Id: I62c80bb1cc01d8a91c350c4c7719462809a4ef7f
-
Fritz Koenig authored
Use pmaxub instead of a combination of psubusb/por to determine if any comparisons go over the limit. Change-Id: I3f0bd7d2aabe5fee9ba6620508e2b60605abcb82
-
Guillermo Ballester Valor authored
The patch related to issue #55 (5a72620d) fixed some warnings, but the fix was not optimal: it was a trick to confuse the compiler rather than a real fix. This patch fixes it properly by adding a new macro for the cases that need only a high-limit check on an unsigned value. Change-Id: I94b322e0f7fb07604b3b1df1f9321185f48cfcb5
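A sketch of why a separate macro helps: a two-sided range check applied to an unsigned value with a lower bound of 0 contains a comparison that is always true, which compilers tend to warn about. The macro names and the limits 64 and 63 below are hypothetical and chosen only for illustration.

```c
#include <assert.h>

/* Two-sided check: appropriate for signed values. */
#define CHECK_RANGE(v, lo, hi) (((v) >= (lo)) && ((v) <= (hi)))

/* High-limit-only check: for unsigned values, where "v >= 0" is always
   true and would only draw a compiler warning. */
#define CHECK_HI(v, hi) ((v) <= (hi))

/* Unsigned parameter: only the upper bound is meaningful. */
static int valid_thread_count(unsigned int n)
{
    return CHECK_HI(n, 64u);
}

/* Signed parameter: both bounds are checked. */
static int valid_quantizer(int q)
{
    return CHECK_RANGE(q, 0, 63);
}

/* Example of using the check as a precondition. */
static unsigned int checked_thread_count(unsigned int n)
{
    assert(valid_thread_count(n));
    return n;
}
```

This keeps the intent explicit at each call site instead of hiding the redundant comparison from the compiler.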
-
- Sep 17, 2010
-
-
Johann Koenig authored
The previous commit laid the groundwork by doing two sets of IDCTs together. This one moves that further by grouping the interesting data (q[0], q+16[0]) together to allow using wider instructions. It also drops a few instructions by recognizing that the constant for sinpi8sqrt2 could be downshifted all the time, which avoids a downshift as well as workarounds for a function which only accepted signed data. Looks like a modest gain for performance: at QCIF, went from ~180 fps to ~183. Change-Id: I842673f3080b8239e026cc9b50346dbccbab4adf
-
Yunqing Wang authored
On each MB, loop filtering is done right after MB decoding. This combines two loops in the multi-threaded code into one, which halves the number of synchronizations. The above-row/left-col data are saved in temp buffers for next-row/next-MB decoding. Tests on a 4-core gLucid machine showed a 10% decoder performance gain with threads=4 (tulip clip). Testing on other platforms hasn't been done yet. Change-Id: Id18ea7c1e84965dabea65d4c01ca5bc056ddeac9
-
- Sep 16, 2010
-
-
John Koleszar authored
These files aren't currently used, and we can get them back if we need them. Change-Id: I62aa3bff828e491a80c80eeb84a7c44903df29b5
-
John Koleszar authored
This patch reduces the size of the global tables maintained by the tokenizer to 16k from 80k-96k. See issue #177. Change-Id: If0275d5f28389af11ac83c5d929d1157cde90fbe
-
- Sep 14, 2010
-
-
Fritz Koenig authored
There is no need to make sure that the lower byte of the register is 0 because the downshift by 11 overwrites that byte. Change-Id: I89cbf004b2ff532a2c68e0dc399c45a49cdad5a1
-
- Sep 10, 2010
-
-
Fritz Koenig authored
Sequentially accessing memory from a low address to a high address should make it easier for the processor to predict the cache. Change-Id: I1921ce996bdd547144fe864fea6435f527f5842d
-
- Sep 09, 2010
-
-
Scott LaVarnway authored
Improved the subset block search and fill. (about 3% improvement for 32 bit) Modified/merged the code in order to create vp8_read_mb_modes_mv which can decode the modes/mvs on a macroblock level. This will allow the decode loop (in the future) to decode modes/mvs on a frame, row, or mb level. Change-Id: If637d994b508792f846d39b5d44a7bf9aa5cddf3
-
Johann Koenig authored
Extend 93c32a55, which used SSE2 instructions to do two idct/dequant/recons at a time, to NEON. Initial working commit. More work needs to be put into rearranging and interlacing the data to take advantage of quadword operations, which is when we'll hopefully see a much better boost. Change-Id: I86d59d96f15e0d0f9710253e2c098ac2ff2865d1
-
John Koleszar authored
When ARFs are enabled in non-lagged compress modes, the GF interval was being reset to zero. Non-lagged ARF updates were enabled in commit 63ccfbd5, but this incorrect GF interval caused a quality regression. Change-Id: I615c3b493f4ce2127044f4e68d0bcb07d6b730c3
-
John Koleszar authored
Changes 'The VP8 project' to 'The WebM project', for consistency with other webmproject.org repositories. Fixes issue #97. Change-Id: I37c13ed5fbdb9d334ceef71c6350e9febed9bbba
-
- Sep 08, 2010
-
-
Jim Bankoski authored
vp8_get_compressed_data() was defeating logic in encode_frame_to_datarate() that determined the reference buffers to search and forcing all frames to be eligible to search. In cases where buffers have identical contents, this is unnecessary extra work. Change-Id: I9e667ac39128ae32dc455a3db4c62e3efce6f114
-
Jim Bankoski authored
ARFs were explicitly disabled except in lagged compress mode. New ARF logic allows for the ARF buffer to hold an older golden frame, which does not require lagged compress. Change-Id: I1dff82b6f53e8311f1e0514b1794ae05919d5f79
-
Fritz Koenig authored
Used pmaddubsw for multiply and add of two filter taps at once for 16x16 and 8x8 blocks. Change-Id: Idccf2d6e094561624407b109fa7e80ba799355ea
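For readers unfamiliar with the instruction, here is a scalar model of what pmaddubsw computes per output lane: it multiplies unsigned bytes by signed bytes and adds each adjacent pair of products with signed 16-bit saturation, so one instruction applies two filter taps at once. The function names are hypothetical and the sketch says nothing about how the actual SSSE3 code arranges its data.

```c
#include <assert.h>
#include <stdint.h>

/* Saturate to the signed 16-bit range, as the instruction does. */
static int16_t sat16(int v)
{
    return (int16_t)(v > 32767 ? 32767 : (v < -32768 ? -32768 : v));
}

/* Scalar model of pmaddubsw: for each output lane i,
   out[i] = sat16(pixels[2i] * taps[2i] + pixels[2i+1] * taps[2i+1]),
   with pixels treated as unsigned bytes and taps as signed bytes. */
static void maddubs_model(const uint8_t *pixels, const int8_t *taps,
                          int16_t *out, int pairs)
{
    assert(pairs > 0);
    for (int i = 0; i < pairs; ++i) {
        const int lo = pixels[2 * i] * taps[2 * i];
        const int hi = pixels[2 * i + 1] * taps[2 * i + 1];
        out[i] = sat16(lo + hi);
    }
}
```

Interleaving pixels so that each byte pair lines up with a pair of taps is what lets a single multiply-add cover two taps.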
-
- Sep 03, 2010
-
-
Scott LaVarnway authored
Moved partition_bmi and partition_count out of MB_MODE_INFO and placed into MACROBLOCK. Also reduced the size of other members of the MB_MODE_INFO struct. For 1080p, the memory was reduced by 1,209,516 bytes. The decoder performance appeared to improve by 3% for the clip used. Note: The main goal for this change is to improve the decoder performance. The encoder will be revisited at a later date for further structure cleanup. Change-Id: I4733621292ee9cc3fffa4046cb3fd4d99bd14613
-
- Sep 02, 2010
-
-
John Koleszar authored
Change-Id: I8b9fdf9875a8fcff4cb49a3357ce44f18108c2e7
-
James Zern authored
Remove the dependency on postproc.c for the encoder in general. The only unchecked need for it is when CONFIG_PSNR is enabled; all other cases are already wrapped in CONFIG_POSTPROC. In the CONFIG_PSNR case the file will still be included. Additionally, when VP8_SET_POSTPROC is used with the encoder and post processing has been disabled, an error will be returned. This addresses issue #153. Change-Id: Ia6dfe20167f7077734a6058cbd1d794550346089
-
Yaowu Xu authored
This allows experiments of using different rounding and zerobin constants for 2nd order blocks. Change-Id: Idd829adba3edd1f713c66151a8d29bb245e33a71
-
John Koleszar authored
This is not the behavior that most users expect. Change-Id: I226126ea400c22cf1f7918e80ea7fe0771c569cb
-
Frank Galligan authored
There was an extremely rare deadlock that happened when one thread was waiting to start the loop filter on frame n while the other threads were starting to work on frame n+1. Change-Id: Icc94f728b3b6663405435640d9a2996735ba19ef
-
- Sep 01, 2010
-
-
Yunqing Wang authored
This is a workaround for a gLucid problem. Change-Id: I188a016a07e4c2ea212444c5a6284ff3c48a5caa
-
- Aug 31, 2010
-
-
Paul Wilkins authored
These changes improve the behaviour of the code with forced key frames sent in by a calling application. The sizing of the frames is still suboptimal for two pass in particular but the behaviour is much better than it was. Change-Id: I35fae610c67688ccc69d11f385e87dfc884e65a1
-
Johann Koenig authored
Make the ARM asm detokenizer work with the new structures. Change-Id: I7cd92c2a018ec24032bb1cfd1bb9739bc84b444a
-