- Sep 24, 2010
Johann Koenig authored
The previous implementation compared each set of values against the limit and then ANDed the results together, requiring a compare and an AND for each value. This version does the accumulation first, requiring only one compare. Change-Id: Ia5e3a1a50e47699c88470b8c41964f92a0dc1323
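A minimal C sketch of the two strategies, assuming the "accumulation" is a running maximum of the absolute differences: a set of values passes only if that maximum is within the limit. The function names are hypothetical and this is not the libvpx loop filter code.

```c
#include <stdlib.h>

/* Old style: compare every |difference| against the limit, then AND the
 * results. Costs one compare plus one AND per value. Returns 1 if all
 * values are within the limit. */
static int mask_compare_each(const int *diffs, int n, int limit) {
    int mask = 1;
    for (int i = 0; i < n; i++)
        mask &= (abs(diffs[i]) <= limit);
    return mask;
}

/* New style: accumulate the largest magnitude first, then compare once.
 * Returns the same result as mask_compare_each. */
static int mask_accumulate_first(const int *diffs, int n, int limit) {
    int max = 0;
    for (int i = 0; i < n; i++) {
        int a = abs(diffs[i]);
        if (a > max)
            max = a;
    }
    return max <= limit; /* single compare */
}
```

Both functions agree for any input, because "every value is within the limit" is equivalent to "the largest value is within the limit".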
-
John Koleszar authored
-
Yunqing Wang authored
-
John Koleszar authored
reconintra_mt.c is only required for building the decoder right now. It could definitely be used for the encoder in the future, but it currently depends on decoder-only data structures (onyxd_int.h, VP8D_COMP, etc.). Move it from common/ to decoder/ until the necessary changes to the common multithread code are complete. This patch is needed to build with --disable-vp8-decoder. Change-Id: I568c52221a2b309234d269675cba97131ce35c86
-
John Koleszar authored
Shared libs generally require PIC, so this saves a little typing at configure time. Change-Id: I357d70cc68434f3283fee78873052d2b7d77c777
-
John Koleszar authored
Build with -O2 rather than -O3, to dissuade the compiler from inlining so much. See issue #1. Change-Id: Iacb8ddb59125d3f01c5fea846b45a1c004c9aee0
-
John Koleszar authored
-
- Sep 23, 2010
John Koleszar authored
Having these symbols be available as functions rather than data is occasionally more convenient. Implemented this way rather than in a get-codec-by-id style to avoid creating a link-time dependency between the encoder and the decoder. Fixes issue #169. Change-Id: I319f281277033a5e7e3ee3b092b9a87cce2f463d
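A minimal sketch of the pattern in C, with hypothetical type and symbol names (not the actual libvpx declarations): the interface descriptor stays exported as data, and a per-codec accessor function returns its address.

```c
#include <stddef.h>

/* Hypothetical stand-in for a codec interface descriptor. */
typedef struct codec_iface {
    const char *name;
    int is_encoder;
} codec_iface_t;

/* The interface exported as data: callers reference the symbol directly. */
codec_iface_t example_cx_algo = { "example encoder", 1 };

/* The same interface exported as a function. There is one accessor per
 * codec, so referencing this function pulls in only the encoder object. */
codec_iface_t *example_cx(void) {
    return &example_cx_algo;
}
```

A single lookup-by-id function would have to reference every codec it can return, so linking it would drag in both encoder and decoder; one accessor per codec avoids that dependency.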
-
Yunqing Wang authored
In the multi-threaded decoder, set different sync ranges for different video resolutions. Change-Id: Iea48fd36f51919e0152c8ed3b1f10e1b723c0ca7
-
- Sep 22, 2010
Johann Koenig authored
The new loopfilter was originally introduced as an experimental change. It's permanent now. Change-Id: I25dbedb6ceff3e9f9c04e18bb29f84c3ecb7e546
-
- Sep 21, 2010
John Koleszar authored
Change-Id: I6625ee41f8872908cb015ce0729e1c7a105b5217
-
Johann Koenig authored
-
Johann Koenig authored
Also, move it alongside the other ppc32 options. Change-Id: I0b97413c767909c5682afc9bdd954f3d43401f6c
-
John Koleszar authored
-
John Koleszar authored
The MV decoding changes in c5fb0eb8 introduced a bug where the macroblock clamping state was reset for each partition, so if an earlier partition needed clamping but a subsequent one didn't, the MB wouldn't receive clamping. With this fix, the state is only set during splitmv decoding, never cleared. Change-Id: I224fe258493405ee0f6a04596acdb622c475e845
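A simplified C sketch of the bug and the fix. The struct, field, and function names are hypothetical, and the clamping test is reduced to a symmetric range check; the sketch assumes the flag is initialized once per macroblock before its partitions are decoded.

```c
#include <stdbool.h>

typedef struct { int row, col; } mv_t;            /* hypothetical MV */
typedef struct { bool need_to_clamp; } macroblock_t; /* hypothetical MB state */

/* Hypothetical predicate: does this MV fall outside the allowed range? */
static bool mv_needs_clamp(mv_t mv, int limit) {
    return mv.row < -limit || mv.row > limit ||
           mv.col < -limit || mv.col > limit;
}

/* Buggy pattern: the flag is overwritten for every partition, so only the
 * last partition's result survives. */
static void decode_split_mvs_buggy(macroblock_t *mb, const mv_t *mvs,
                                   int n, int limit) {
    for (int i = 0; i < n; i++)
        mb->need_to_clamp = mv_needs_clamp(mvs[i], limit);
}

/* Fixed pattern: the flag is only ever set here, never cleared, so any
 * partition that needs clamping marks the whole MB. */
static void decode_split_mvs_fixed(macroblock_t *mb, const mv_t *mvs,
                                   int n, int limit) {
    for (int i = 0; i < n; i++)
        if (mv_needs_clamp(mvs[i], limit))
            mb->need_to_clamp = true;
}
```

With an out-of-range MV in the first partition and an in-range MV in the second, the buggy version ends with the flag cleared while the fixed version keeps it set.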
-
John Koleszar authored
-
John Koleszar authored
-
John Koleszar authored
-
Yunqing Wang authored
-
- Sep 20, 2010
Fritz Koenig authored
movdqu is more expensive (throughput, uops) than movq. Minimal impact on newer big cores, but a ~2.25% gain on Atom. Change-Id: I62c80bb1cc01d8a91c350c4c7719462809a4ef7f
-
Fritz Koenig authored
-
Johann Koenig authored
-
Johann Koenig authored
-
Fritz Koenig authored
Use pmaxub instead of a combination of psubusb/por to determine if any comparisons go over the limit. Change-Id: I3f0bd7d2aabe5fee9ba6620508e2b60605abcb82
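A scalar C model of the two instruction sequences, one byte lane at a time. psubusb is modeled as an unsigned saturating subtract and pmaxub as an unsigned byte max; the final test in the pmaxub version is modeled as one more saturating subtract, which is an assumption, since the exact sequence in the patch may differ.

```c
#include <stdint.h>

/* Scalar models of the SSE2 byte operations. */
static uint8_t subs_u8(uint8_t a, uint8_t b) { /* psubusb */
    return a > b ? (uint8_t)(a - b) : 0;
}
static uint8_t max_u8(uint8_t a, uint8_t b) {  /* pmaxub */
    return a > b ? a : b;
}

/* Old approach: saturating-subtract the limit from every value and OR the
 * results; any nonzero bit means some value exceeded the limit. */
static int over_limit_subs_or(const uint8_t *v, int n, uint8_t limit) {
    uint8_t acc = 0;
    for (int i = 0; i < n; i++)
        acc |= subs_u8(v[i], limit);
    return acc != 0;
}

/* New approach: keep a running maximum, then subtract the limit once. */
static int over_limit_max(const uint8_t *v, int n, uint8_t limit) {
    uint8_t acc = 0;
    for (int i = 0; i < n; i++)
        acc = max_u8(acc, v[i]);
    return subs_u8(acc, limit) != 0;
}
```

The max version replaces a subtract-and-OR pair per value with a single max per value, and defers the only comparison against the limit to the end.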
-
Guillermo Ballester Valor authored
The patch related to issue #55 (5a72620d) fixed some warnings, but the fix was not optimal: it was actually a trick to confuse the compiler rather than a real fix. This patch fixes it properly by creating a new macro, used where only a high-limit check on an unsigned value is needed. Change-Id: I94b322e0f7fb07604b3b1df1f9321185f48cfcb5
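A sketch of why a high-limit-only macro helps, assuming the warnings were of the "comparison is always true" kind that GCC reports under -Wtype-limits. Both macro names are hypothetical; they are not the macros the patch introduces.

```c
/* Hypothetical general range check. With an unsigned x and lo == 0, the
 * (x) >= (lo) half is always true, which compilers may warn about. */
#define CHECK_RANGE(x, lo, hi) ((x) >= (lo) && (x) <= (hi))

/* Hypothetical high-limit-only check for unsigned values: there is no
 * lower-bound comparison, so there is nothing for the compiler to flag. */
#define CHECK_HIGH(x, hi) ((x) <= (hi))

static int valid_index(unsigned int idx, unsigned int max_idx) {
    /* CHECK_RANGE(idx, 0, max_idx) would evaluate idx >= 0, which is
     * always true for an unsigned idx. */
    return CHECK_HIGH(idx, max_idx);
}
```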
-
- Sep 17, 2010
Johann Koenig authored
The previous commit laid the groundwork by doing two sets of IDCTs together. This one takes that further by grouping the interesting data (q[0], q+16[0]) together to allow the use of wider instructions. It also drops a few instructions by recognizing that the constant for sinpi8sqrt2 could always be downshifted, which avoided a downshift as well as workarounds for a function that only accepted signed data. This looks like a modest performance gain: at QCIF, throughput went from ~180 fps to ~183 fps. Change-Id: I842673f3080b8239e026cc9b50346dbccbab4adf
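A C check of the arithmetic identity behind downshifting the constant. It assumes the scaled product is `(x * sinpi8sqrt2) >> 16` with `sinpi8sqrt2 = 35468`, as in the VP8 C reference IDCT, and that the signed-only function is a signed 16-bit multiply such as SSE2's pmulhw. It does not reproduce the SSE2 code; the macro and function names are hypothetical.

```c
#include <stdint.h>

/* 35468 does not fit in a signed 16-bit lane, but half of it does. */
#define SINPI8SQRT2      35468
#define SINPI8SQRT2_HALF (SINPI8SQRT2 / 2) /* 17734 */

/* Floor division by 2^s that avoids implementation-defined right shifts
 * of negative values. */
static int64_t floor_shr(int64_t v, int s) {
    return v >= 0 ? v >> s : -((-v + ((INT64_C(1) << s) - 1)) >> s);
}

/* For every 16-bit input, compares (x * 35468) >> 16 with
 * (x * 17734) >> 15 and returns the number of mismatches. */
static int count_mismatches(void) {
    int bad = 0;
    for (int32_t x = INT16_MIN; x <= INT16_MAX; x++) {
        int64_t full = floor_shr((int64_t)x * SINPI8SQRT2, 16);
        int64_t half = floor_shr((int64_t)x * SINPI8SQRT2_HALF, 15);
        if (full != half)
            bad++;
    }
    return bad;
}
```

Because 35468 is even, x * 35468 is exactly 2 * (x * 17734), so shifting it right by 16 gives the same floor as shifting x * 17734 right by 15, and count_mismatches() returns 0.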
-
Yunqing Wang authored
On each MB, loop filtering is done right after MB decoding. This combines the two loops in the multi-threaded code into one, which halves the number of synchronizations. The above-row/left-col data are saved in temp buffers for next-row/next-MB decoding. Tests on a 4-core gLucid machine showed a 10% decoder performance gain with threads=4 (tulip clip). Testing on other platforms hasn't been done yet. Change-Id: Id18ea7c1e84965dabea65d4c01ca5bc056ddeac9
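A minimal pthreads sketch of a fused decode-and-filter loop with one thread per MB row. The frame size, `SYNC_RANGE`, the progress counters, and all function names are hypothetical, and the per-MB work is a placeholder; the point is that each MB needs a single wait on the row above instead of one wait in a decode loop and another in a separate filter loop.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>

#define MB_ROWS 4    /* hypothetical frame size, in macroblocks */
#define MB_COLS 8
#define SYNC_RANGE 1 /* hypothetical lead, in MBs, required of the row above */

/* Per-row progress: MBs in that row that are both decoded and filtered. */
static atomic_int progress[MB_ROWS];
/* Marks finished MBs so the result can be checked after the threads join. */
static int done[MB_ROWS][MB_COLS];

/* Placeholder for the per-MB work: decode the MB, save its unfiltered
 * bottom row and right column into temp buffers for the next row / next
 * MB, then loop-filter it, all in the same pass. */
static void decode_and_filter_mb(int row, int col) {
    done[row][col] = 1;
}

static void *decode_row(void *arg) {
    int row = (int)(intptr_t)arg;
    for (int col = 0; col < MB_COLS; col++) {
        if (row > 0) {
            /* One wait per MB covers both decoding and filtering: the row
             * above must be SYNC_RANGE MBs ahead (or finished). */
            int need = col + 1 + SYNC_RANGE;
            if (need > MB_COLS)
                need = MB_COLS;
            while (atomic_load(&progress[row - 1]) < need)
                sched_yield();
        }
        decode_and_filter_mb(row, col);
        atomic_store(&progress[row], col + 1);
    }
    return NULL;
}

/* Runs one thread per MB row and returns the number of finished MBs. */
static int run_all_rows(void) {
    pthread_t threads[MB_ROWS];
    int created = 0;
    for (int r = 0; r < MB_ROWS; r++)
        atomic_store(&progress[r], 0);
    for (; created < MB_ROWS; created++)
        if (pthread_create(&threads[created], NULL, decode_row,
                           (void *)(intptr_t)created) != 0)
            break;
    for (int r = 0; r < created; r++)
        pthread_join(threads[r], NULL);

    int finished = 0;
    for (int r = 0; r < MB_ROWS; r++)
        for (int c = 0; c < MB_COLS; c++)
            finished += done[r][c];
    return finished;
}
```

The spin-with-sched_yield wait is a simplification chosen for brevity; a production decoder would more likely block on a condition variable or semaphore.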
-
- Sep 16, 2010
John Koleszar authored
These files aren't currently used, and we can get them back if we need them. Change-Id: I62aa3bff828e491a80c80eeb84a7c44903df29b5
-
John Koleszar authored
This patch reduces the size of the global tables maintained by the tokenizer to 16k from 80k-96k. See issue #177. Change-Id: If0275d5f28389af11ac83c5d929d1157cde90fbe
-
- Sep 15, 2010
Fritz Koenig authored
GET_GOT was producing a zero-length call. This resulted in pipeline flushes occurring when returning from the assembly functions. The effect is masked on out-of-order cores but evident on Atom cores. Change-Id: I8c375af313e8a169c77adbaf956693c0cfeb5ccd
-
- Sep 14, 2010
Fritz Koenig authored
There is no need to make sure that the lower byte of the register is 0 because the downshift by 11 overwrites that byte. Change-Id: I89cbf004b2ff532a2c68e0dc399c45a49cdad5a1
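A small C illustration of why the masking step is redundant, modeling the register as an unsigned 32-bit value (an assumption; the function names are hypothetical). A right shift by 11 discards the low 11 bits, which include the low byte, so clearing that byte first cannot change the result.

```c
#include <stdint.h>

/* Old sequence: clear the low byte, then shift right by 11. */
static uint32_t shift_with_mask(uint32_t x) {
    return (x & ~UINT32_C(0xFF)) >> 11;
}

/* Simplified sequence: just shift; the low byte is discarded anyway. */
static uint32_t shift_only(uint32_t x) {
    return x >> 11;
}
```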
-
- Sep 13, 2010
Fritz Koenig authored
-
John Koleszar authored
Fixes issue 89. Thanks to josejx for the patch. Change-Id: I7e664fed703b49f2fb3af4c5e6ce1173742000c2
-
John Koleszar authored
Change-Id: I88ddb0afb56ef2be8184b56fe125ad938ead7a84
-
- Sep 10, 2010
Fritz Koenig authored
Sequentially accessing memory from a low address to a high address should make it easier for the processor to predict the access pattern and prefetch into the cache. Change-Id: I1921ce996bdd547144fe864fea6435f527f5842d
-
- Sep 09, 2010
Scott LaVarnway authored
-
Scott LaVarnway authored
Improved the subset block search and fill (about a 3% improvement for 32-bit). Modified/merged the code to create vp8_read_mb_modes_mv, which can decode the modes/MVs at the macroblock level. This will allow the decode loop (in the future) to decode modes/MVs at the frame, row, or MB level. Change-Id: If637d994b508792f846d39b5d44a7bf9aa5cddf3
-
Johann Koenig authored
Extend 93c32a55, which used SSE2 instructions to do two idct/dequant/recons at a time, to NEON. Initial working commit. More work needs to go into rearranging and interleaving the data to take advantage of quadword operations, which is when we'll hopefully see a much better boost. Change-Id: I86d59d96f15e0d0f9710253e2c098ac2ff2865d1
-
John Koleszar authored
When ARFs are enabled in non-lagged compress modes, the GF interval was being reset to zero. Non-lagged ARF updates were enabled in commit 63ccfbd5, but this incorrect GF interval caused a quality regression. Change-Id: I615c3b493f4ce2127044f4e68d0bcb07d6b730c3
-