- Sep 21, 2010
-
-
Johann Koenig authored
Also, move with other ppc32 options Change-Id: I0b97413c767909c5682afc9bdd954f3d43401f6c
-
John Koleszar authored
-
John Koleszar authored
-
Yunqing Wang authored
-
- Sep 20, 2010
-
-
Fritz Koenig authored
Movdqu is more expensive (throughput, uops) than movq. Minimal impact for newer big cores, but ~2.25% gain on Atom. Change-Id: I62c80bb1cc01d8a91c350c4c7719462809a4ef7f
-
Fritz Koenig authored
-
Johann Koenig authored
-
Johann Koenig authored
-
Fritz Koenig authored
Use pmaxub instead of a combination of psubusb/por to determine if any comparisons go over the limit. Change-Id: I3f0bd7d2aabe5fee9ba6620508e2b60605abcb82
-
Guillermo Ballester Valor authored
The patch related with issue #55 (5a72620d) fixed some warnings, but the fix was not optimal. It actually was a trick to confuse compiler rather than a fix. This patch fixes it by creating a new macro used when needed just a high limit check for an unsigned. Change-Id: I94b322e0f7fb07604b3b1df1f9321185f48cfcb5
-
- Sep 17, 2010
-
-
Johann Koenig authored
the previous commit laid the groundwork by doing two sets of idcts together. this moved that further by grouping the interesting data (q[0], q+16[0]) together to allow using wider instructions. also managed to drop a few instructions by recognizing that the constant for sinpi8sqrt2 could be downshifted all the time which avoided a dowshift as well as workarounds for a function which only accepted signed data looks like a modest gain for performance: at qcif, went from ~180 fps to ~183 Change-Id: I842673f3080b8239e026cc9b50346dbccbab4adf
-
Yunqing Wang authored
On each MB, loopfiltering is done right after MB decoding. This combines two loops in multi-threaded code into one, which reduces number of synchronizations to half. The above-row/left-col data are saved in temp buffers for next-row/next MB decoding. Tests on 4-core gLucid machine showed 10% decoder performance gain with threads=4 (tulip clip). Testing on other platforms isn't done yet. Change-Id: Id18ea7c1e84965dabea65d4c01ca5bc056ddeac9
-
- Sep 16, 2010
-
-
John Koleszar authored
These files aren't currently used, and we can get them back if we need them. Change-Id: I62aa3bff828e491a80c80eeb84a7c44903df29b5
-
John Koleszar authored
This patch reduces the size of the global tables maintained by the tokenizer to 16k from 80k-96k. See issue #177. Change-Id: If0275d5f28389af11ac83c5d929d1157cde90fbe
-
- Sep 15, 2010
-
-
Fritz Koenig authored
GET_GOT was producing a zero length call. This resulted in pipeline flushes occuring when returing from the assembly functions. Masked on out of order cores, but evident on Atom cores. Change-Id: I8c375af313e8a169c77adbaf956693c0cfeb5ccd
-
- Sep 14, 2010
-
-
Fritz Koenig authored
There is no need to make sure that the lower byte of the register is 0 because the downshift by 11 overwrites that byte. Change-Id: I89cbf004b2ff532a2c68e0dc399c45a49cdad5a1
-
- Sep 13, 2010
-
-
Fritz Koenig authored
-
John Koleszar authored
Fixes issue 89. Thanks to josejx for the patch. Change-Id: I7e664fed703b49f2fb3af4c5e6ce1173742000c2
-
John Koleszar authored
Change-Id: I88ddb0afb56ef2be8184b56fe125ad938ead7a84
-
- Sep 10, 2010
-
-
Fritz Koenig authored
Sequentially accessing memory from a low address to a high address should make it easier for the processor to predict the cache. Change-Id: I1921ce996bdd547144fe864fea6435f527f5842d
-
- Sep 09, 2010
-
-
Scott LaVarnway authored
-
Scott LaVarnway authored
Improved the subset block search and fill. (about 3% improvement for 32 bit) Modified/merged the code in order to create vp8_read_mb_modes_mv which can decode the modes/mvs on a macroblock level. This will allow the decode loop (in the future) to decode modes/mvs on a frame, row, or mb level. Change-Id: If637d994b508792f846d39b5d44a7bf9aa5cddf3
-
Johann Koenig authored
Expand 93c32a55 which used SSE2 instructions to do two idct/dequant/recons at a time to NEON. Initial working commit. More work needs to be put into rearranging and interlacing the data to take advantage of quadword operations, which is when we'll hopefully see a much better boost Change-Id: I86d59d96f15e0d0f9710253e2c098ac2ff2865d1
-
John Koleszar authored
When ARFs are enabled in non-lagged compress modes, the GF interval was being reset to zero. Non-lagged ARF updates were enabled in commit 63ccfbd5, but this incorrect GF interval caused a quality regression. Change-Id: I615c3b493f4ce2127044f4e68d0bcb07d6b730c3
-
John Koleszar authored
Changes 'The VP8 project' to 'The WebM project', for consistency with other webmproject.org repositories. Fixes issue #97. Change-Id: I37c13ed5fbdb9d334ceef71c6350e9febed9bbba
-
- Sep 08, 2010
-
-
Jim Bankoski authored
vp8_get_compressed_data() was defeating logic in encode_frame_to_datarate() that determined the reference buffers to search and forcing all frames to be eligible to search. In cases where buffers have identical contents, this is unnecessary extra work. Change-Id: I9e667ac39128ae32dc455a3db4c62e3efce6f114
-
Jim Bankoski authored
ARFs were explicitly disabled except in lagged compress mode. New ARF logic allows for the ARF buffer to hold an older golden frame, which does not require lagged compress. Change-Id: I1dff82b6f53e8311f1e0514b1794ae05919d5f79
-
Fritz Koenig authored
Used pmaddubsw for multiply and add of two filter taps at once for 16x16 and 8x8 blocks. Change-Id: Idccf2d6e094561624407b109fa7e80ba799355ea
-
- Sep 03, 2010
-
-
Scott LaVarnway authored
Moved partition_bmi and partition_count out of MB_MODE_INFO and placed into MACROBLOCK. Also reduced the size of other members of the MB_MODE_INFO struct. For 1080p, the memory was reduced by 1,209,516 bytes. The decoder performance appeared to improve by 3% for the clip used. Note: The main goal for this change is to improve the decoder performance. The encoder will be revisited at a later date for further structure cleanup. Change-Id: I4733621292ee9cc3fffa4046cb3fd4d99bd14613
-
- Sep 02, 2010
-
-
John Koleszar authored
Change-Id: I184e927987544e9f34f890249b589ea13a93a330
-
John Koleszar authored
Change-Id: I0395ffa107651a773fd11d12682ab9372f76a90b
-
John Koleszar authored
Change-Id: I8b9fdf9875a8fcff4cb49a3357ce44f18108c2e7
-
John Koleszar authored
Changed to use QueryPerformanceCounter on Windows rather than only when building with MSVC, so that MSVC can link libs built with MinGW. Fixes issue #149. Change-Id: Ie2dc7edc8f4d096cf95ec5ffb1ab00f2d67b3e7d
-
John Koleszar authored
gcc -dumpmachine returns only 'mingw32' Change-Id: I774d05a97c5131fc12009e436712c319e54490a5
-
John Koleszar authored
Fixes http://code.google.com/p/webm/issues/detail?id=112 Thanks to Ramiro Polla for the issue/fix. Change-Id: I7f7b547a4ea3270e183f59280510066cc29a619e
-
James Zern authored
Remove the dependency on postproc.c for the encoder in general, the only unchecked need for it is when CONFIG_PSNR is enabled. All other cases are already wrapped in CONFIG_POSTPROC. In the CONFIG_PSNR case the file will still be included. Additionally, when VP8_SET_POSTPROC is used with the encoder when post processing has been disabled an error will be returned. This addresses issue #153. Change-Id: Ia6dfe20167f7077734a6058cbd1d794550346089
-
John Koleszar authored
-
John Koleszar authored
-
Yaowu Xu authored
This allows experiments of using different rounding and zerobin constants for 2nd order blocks. Change-Id: Idd829adba3edd1f713c66151a8d29bb245e33a71
-