Commits · e4d43c21c709572349b0226ac62c6c0e4c95132b · Xiph.Org / aom-rav1e

Sep 28, 2010

Merge "update gitignore" · e4d43c21
Johann Koenig authored 14 years ago


update gitignore · 6fa5c24a

Johann Koenig authored 14 years ago

this was excluding all .asm files when it should have just been .asm
files in the top level directory and .asm.s files lower down. also be
more restrictive on some other items, and run the whole thing through
sort to keep it organized

Change-Id: Ia48525033226b13098a491ce89465d0377b990c2

6fa5c24a

Add 4-tap version of 2nd-pass ARMv6 MC filter. · 18dc92fd

Timothy B. Terriberry authored 14 years ago

The existing code applied a 6-tap filter with 0's on either end.
We're already paying the branch penalty to avoid computing the two
 extra columns needed as input to this filter.
We might as well save time computing the filter as well.
This reduces the inner loop from 21 instructions to 16, the number
 of loads per iteration from 4 to 1, and the number of multiplies
 from 7 to 4.
The gain in overall decoding performance, however, is small (less
 than 1%).

This change also means we are now valgrind-clean on ARMv6, which is
 its real purpose.
The errors reported here were valgrind's fault (it does not detect
 that 0 times an uninitialized value is initialized), but Julian
 Seward says it would slow down valgrind considerably to make such
 checks.
Speeding up libvpx instead, even by a small amount, seems a much
 better idea if only to enable proper valgrind checking of the
 rest of the codec.

Change-Id: Ifb376ea195e086b60f61daf1097d8910c4d8ff16
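
As a rough illustration of the idea in this commit (not the ARMv6 assembly itself), the sketch below shows why a 6-tap filter whose outer taps are zero can be replaced by a 4-tap loop that never reads the two outer samples. The function names, the 7-bit rounding, and the coefficients are assumptions chosen for the example.

```c
#include <stdint.h>

/* Clamp a filtered sum to the 8-bit pixel range. */
static uint8_t demo_clamp_u8(int v) {
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Generic 6-tap filter: reads samples -2..+3 around src. */
static uint8_t demo_filter6(const uint8_t *src, int stride, const int16_t taps[6]) {
    int sum = 64; /* rounding term, assuming taps sum to 128 */
    for (int k = 0; k < 6; k++)
        sum += taps[k] * src[(k - 2) * stride];
    if (sum < 0) sum = 0; /* avoid shifting a negative value */
    return demo_clamp_u8(sum >> 7);
}

/* 4-tap variant: only valid when taps[0] == taps[5] == 0.
 * It reads samples -1..+2 only, so the outer samples are never
 * loaded and may be left uninitialized. */
static uint8_t demo_filter4(const uint8_t *src, int stride, const int16_t taps[6]) {
    int sum = 64;
    for (int k = 1; k < 5; k++)
        sum += taps[k] * src[(k - 2) * stride];
    if (sum < 0) sum = 0;
    return demo_clamp_u8(sum >> 7);
}
```

Because the 4-tap version never touches the outer samples, a tool that tracks uninitialized memory has nothing to complain about, which matches the stated purpose of the change.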


Sep 27, 2010
- Badly placed initialization of rolling rate monitors. · 305be4e4
  Paul Wilkins authored 14 years ago
  
  This affects control of the active quantizer range.

  Change-Id: I30511fc81ac9f75ff20d9f1372382423d56739da
- move reconintra_mt to decoder (fixup) · 2b521ab5
  John Koleszar authored 14 years ago
  
  Missed the .h file in the move.

  Change-Id: Ib408183fbb4d019fd46394b362f89ca6ea9d10bc
- Merge "disable compilation of debugging code" · 9fdcdc51
  John Koleszar authored 14 years ago
  
- Merge "combine max values and compare once" · 063be9b8
  Johann Koenig authored 14 years ago
  
- Merge "Fix valgrind errors in vp8_sixtap_predict8x4_armv6()." · b955a69b
  Johann Koenig authored 14 years ago
  
- Merge "darwin-icc: build for specific SDKs" · 02e8a7bb
  John Koleszar authored 14 years ago
  
Sep 24, 2010

Fix valgrind errors in vp8_sixtap_predict8x4_armv6(). · e2795e99

Timothy B. Terriberry authored 14 years ago

This function was accessing values below the stack pointer, which
 can be corrupted by signal delivery at any time.

Change-Id: I92945b30817562eb0340f289e74c108da72aeaca


combine max values and compare once · f30e8dd7

Johann Koenig authored 14 years ago

previous implementation compared each set of values to limit and then
&'d them together, requiring a compare and & for each value.

this does the accumulation first, requiring only one compare

Change-Id: Ia5e3a1a50e47699c88470b8c41964f92a0dc1323
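
A scalar sketch of the two approaches described above, under the assumption that the values are unsigned 8-bit differences checked against a single limit. The function names are hypothetical and the real code is vectorized.

```c
#include <stdint.h>

/* Before: compare every value to the limit, then AND the results.
 * One compare and one AND per value. */
static int demo_mask_compare_each(const uint8_t d[], int n, uint8_t limit) {
    int ok = 1;
    for (int i = 0; i < n; i++)
        ok &= (d[i] <= limit);
    return ok;
}

/* After: accumulate the maximum first, then compare once. */
static int demo_mask_compare_once(const uint8_t d[], int n, uint8_t limit) {
    uint8_t m = 0;
    for (int i = 0; i < n; i++)
        if (d[i] > m) m = d[i];
    return m <= limit;
}
```

Both functions return the same answer for any input, since every value is within the limit exactly when the largest one is.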


Merge "move reconintra_mt to decoder (for now)" · dbd57c26
John Koleszar authored 14 years ago


disable compilation of debugging code · 8ca779ab

John Koleszar authored 14 years ago

This patch avoids compiling some debugging code in onyx_if.c. The most
significant fix is to avoid generating code for vp8_write_yuv_frame,
which is never called. Some other code was removed by the dead code
elimination performed by the compiler, and this patch does it with the
preprocessor instead. There are advantages both ways.

Change-Id: I044fd43179d2e947553f0d6f2cad5b40907ac458
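
The trade-off mentioned above can be sketched as follows. A helper that nothing calls may still be compiled into the object file (for instance, when it has external linkage), whereas wrapping it in a preprocessor conditional removes it before the compiler ever sees it. `DEMO_OUTPUT_YUV` and both function names are hypothetical, not the identifiers used in onyx_if.c.

```c
#include <stdio.h>

/* Hypothetical debug switch; set to 1 to compile the dump helper in. */
#define DEMO_OUTPUT_YUV 0

#if DEMO_OUTPUT_YUV
/* Debug-only helper: with the switch off, no code is generated for it. */
static void demo_write_frame(FILE *f, const unsigned char *buf, size_t n) {
    fwrite(buf, 1, n, f);
}
#endif

/* Stand-in for real work; optionally dumps its input when debugging. */
static int demo_process(const unsigned char *buf, size_t n) {
    int checksum = 0;
    for (size_t i = 0; i < n; i++)
        checksum += buf[i];
#if DEMO_OUTPUT_YUV
    demo_write_frame(stderr, buf, n);
#endif
    return checksum;
}
```

The preprocessor route makes the exclusion explicit and independent of optimization level; leaving it to dead code elimination keeps the debug code visible to the compiler, so it continues to be type-checked.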


darwin-icc: build for specific SDKs · cbdc1298

John Koleszar authored 14 years ago

Add the missing -isysroot and -mmacosx-version-min flags to ICC builds.
Fixes issue #185.

Change-Id: I2fb37fcaaafef7122a61ced603569f4aa17f8bbc


Merge "Adjust multi-thread sync ranges according to image sizes" · aab0f5b1
Yunqing Wang authored 14 years ago


move reconintra_mt to decoder (for now) · 48e76ff4

John Koleszar authored 14 years ago

reconintra_mt.c is only required for building the decoder right now.
It could definitely be used for the encoder in the future, but it
currently depends on decoder-only data structures (onyxd_int.h,
VP8D_COMP, etc.). Move it from common/ to decoder/ until the
necessary changes to the common multithread code are complete.

This patch is needed to build with --disable-vp8-decoder.

Change-Id: I568c52221a2b309234d269675cba97131ce35c86


configure: enable PIC for shared libs by default · e913eb97

John Koleszar authored 14 years ago

Shared libs generally require PIC, so this saves a little typing at
configure time.

Change-Id: I357d70cc68434f3283fee78873052d2b7d77c777


configure: add --enable-small · f9b2ca5b

John Koleszar authored 14 years ago

Build with -O2 rather than -O3, to dissuade the compiler from inlining
so much. See issue #1.

Change-Id: Iacb8ddb59125d3f01c5fea846b45a1c004c9aee0


Merge "Add getter functions for the interface data symbols" · 329aaaf4
John Koleszar authored 14 years ago


Sep 23, 2010

Add getter functions for the interface data symbols · fa7a55bb

John Koleszar authored 14 years ago

Having these symbols be available as functions rather than data is
occasionally more convenient. Implemented this way rather than a
get-codec-by-id style to avoid creating a link-time dependency
between the encoder and the decoder.

Fixes issue #169

Change-Id: I319f281277033a5e7e3ee3b092b9a87cce2f463d
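
A minimal sketch of the pattern, with made-up names (`demo_iface_t`, `demo_codec_enc_algo`, `demo_codec_enc`) standing in for the real interface symbols. Each getter lives next to the data symbol it returns, so a program that only calls the encoder getter never references the decoder.

```c
#include <stddef.h>

/* Hypothetical interface descriptor. */
typedef struct demo_iface {
    const char *name;
} demo_iface_t;

/* Encoder side: the data symbol and its getter, defined together. */
demo_iface_t demo_codec_enc_algo = { "demo encoder" };
demo_iface_t *demo_codec_enc(void) { return &demo_codec_enc_algo; }

/* Decoder side: would normally sit in a separate translation unit. */
demo_iface_t demo_codec_dec_algo = { "demo decoder" };
demo_iface_t *demo_codec_dec(void) { return &demo_codec_dec_algo; }

/* A get-codec-by-id function, by contrast, would have to mention both
 * demo_codec_enc_algo and demo_codec_dec_algo in one place, pulling
 * both into every link. */
```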


Adjust multi-thread sync ranges according to image sizes · 8db5da29

Yunqing Wang authored 14 years ago

In the multi-threaded decoder, set different sync ranges for
different video resolutions.

Change-Id: Iea48fd36f51919e0152c8ed3b1f10e1b723c0ca7
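
The commit message does not give the ranges or the resolution cut-offs, so the sketch below uses illustrative values only. It assumes that "sync range" means how many macroblock columns a thread processes between checks on the progress of the row above, and that the range is a power of two.

```c
/* Pick a sync range from the frame width.
 * The thresholds and return values are assumptions for illustration. */
static int demo_sync_range(int width) {
    if (width < 640)   return 1;
    if (width <= 1280) return 8;
    if (width <= 2560) return 16;
    return 32;
}

/* True when the thread should check the row above before continuing.
 * Assumes range is a power of two so a mask can replace the modulo. */
static int demo_should_check(int mb_col, int range) {
    return (mb_col & (range - 1)) == 0;
}
```

Larger frames have more macroblocks per row, so checking less often trades a little scheduling slack for fewer synchronization points.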


Sep 22, 2010

Remove dead code · 7fed3832

Johann Koenig authored 14 years ago

The new loopfilter was originally introduced as an experimental change.
It's permanent now.

Change-Id: I25dbedb6ceff3e9f9c04e18bb29f84c3ecb7e546


Sep 21, 2010
- unset execute bit on c source · cdd20666
  John Koleszar authored 14 years ago
  
  Change-Id: I6625ee41f8872908cb015ce0729e1c7a105b5217
- Merge "Fix typo" · a8a38bcf
  Johann Koenig authored 14 years ago
  
- Fix typo · 0511cbff
  Johann Koenig authored 14 years ago
  
  Also, move with other ppc32 options.

  Change-Id: I0b97413c767909c5682afc9bdd954f3d43401f6c
- Merge "Don't reset mb clamping state during splitmv decoding" · 6f4c0435
  John Koleszar authored 14 years ago
  
- Don't reset mb clamping state during splitmv decoding · 4d391e8e
  John Koleszar authored 14 years ago
  
  The MV decoding changes in c5fb0eb8 introduced a bug where the
  macroblock clamping state was reset for each partition, so if an
  earlier partition needed clamping but a subsequent one didn't, the MB
  wouldn't receive clamping. Instead, the state is only set during
  splitmv decoding, never cleared.

  Change-Id: I224fe258493405ee0f6a04596acdb622c475e845
- Merge "gitignore: initial version" · 3d5f8291
  John Koleszar authored 14 years ago
  
- Merge "configure: support for ppc32-linux-gcc" · 12651b3c
  John Koleszar authored 14 years ago
  
- Merge "Add high limit check for unsigned parameters" · 015cfcaf
  John Koleszar authored 14 years ago
  
- Merge "Restructure multi-threaded decoder" · a23ccf8f
  Yunqing Wang authored 14 years ago
  
  a23ccf8f
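
The bug described in "Don't reset mb clamping state during splitmv decoding" (4d391e8e) can be modelled with a per-macroblock flag that is updated once per partition. This is a simplified sketch with hypothetical names, not the decoder's actual code.

```c
#include <stdbool.h>

/* Buggy pattern: the flag is overwritten for every partition,
 * so only the last partition's answer survives. */
static bool demo_need_clamp_buggy(const bool part_needs_clamp[], int n) {
    bool mb_needs_clamp = false;
    for (int i = 0; i < n; i++)
        mb_needs_clamp = part_needs_clamp[i];
    return mb_needs_clamp;
}

/* Fixed pattern: the flag is only ever set, never cleared,
 * so any partition that needs clamping marks the whole macroblock. */
static bool demo_need_clamp_fixed(const bool part_needs_clamp[], int n) {
    bool mb_needs_clamp = false;
    for (int i = 0; i < n; i++)
        if (part_needs_clamp[i])
            mb_needs_clamp = true;
    return mb_needs_clamp;
}
```

With partitions `{true, false}`, the buggy version reports that no clamping is needed while the fixed version reports that it is.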
Sep 20, 2010

Use movq instead of movdqu. · b7dc9398

Fritz Koenig authored 14 years ago

Movdqu is more expensive (throughput, uops) than movq.  Minimal
impact for newer big cores, but ~2.25% gain on Atom.

Change-Id: I62c80bb1cc01d8a91c350c4c7719462809a4ef7f


Merge "Better choice of instruction filter mask comparision." · 1c906448
Fritz Koenig authored 14 years ago

Merge "reorder data to use wider instructions" · 6cf2b4aa
Johann Koenig authored 14 years ago

Merge "Update NEON wide idcts" · 9c9afbab
Johann Koenig authored 14 years ago


Better choice of instruction filter mask comparision. · 8eae7fe7

Fritz Koenig authored 14 years ago

Use pmaxub instead of a combination of psubusb/por to
determine if any comparisons go over the limit.

Change-Id: I3f0bd7d2aabe5fee9ba6620508e2b60605abcb82
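
The following is a scalar model of one byte lane, intended only to show why the two instruction choices give the same answer. `demo_sat_sub_u8` models psubusb (unsigned saturating subtract) and `demo_max_u8` models pmaxub (unsigned maximum). The surrounding function names and the three-value shape are assumptions, and the actual assembly is organized differently.

```c
#include <stdint.h>

/* Scalar model of psubusb: unsigned subtract that saturates at 0. */
static uint8_t demo_sat_sub_u8(uint8_t a, uint8_t b) {
    return (uint8_t)(a > b ? a - b : 0);
}

/* Scalar model of pmaxub: unsigned maximum. */
static uint8_t demo_max_u8(uint8_t a, uint8_t b) {
    return a > b ? a : b;
}

/* psubusb on each value, por to merge: a nonzero result means
 * at least one value went over the limit. */
static int demo_over_limit_sub_or(uint8_t a, uint8_t b, uint8_t c, uint8_t limit) {
    uint8_t r = (uint8_t)(demo_sat_sub_u8(a, limit) |
                          demo_sat_sub_u8(b, limit) |
                          demo_sat_sub_u8(c, limit));
    return r != 0;
}

/* pmaxub to merge the values, then a single saturating subtract. */
static int demo_over_limit_max(uint8_t a, uint8_t b, uint8_t c, uint8_t limit) {
    uint8_t m = demo_max_u8(demo_max_u8(a, b), c);
    return demo_sat_sub_u8(m, limit) != 0;
}
```

The second form needs one merge operation per value and a single subtract, instead of a subtract and a merge per value.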


Add high limit check for unsigned parameters · 23690686

Guillermo Ballester Valor authored 14 years ago

The patch related to issue #55 (5a72620d) fixed some warnings, but the
fix was not optimal: it was really a trick to confuse the compiler
rather than a fix.

This patch fixes it by adding a new macro for the cases where only a
high limit check on an unsigned value is needed.

Change-Id: I94b322e0f7fb07604b3b1df1f9321185f48cfcb5
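
A sketch of the kind of macro pair this describes. A two-sided range check applied to an unsigned value with a lower bound of 0 contains a comparison that is always true, which compilers such as GCC can warn about; a high-limit-only macro avoids the pointless comparison. The macro and function names here are hypothetical.

```c
/* Two-sided check, suitable for signed values. */
#define DEMO_RANGE_CHECK(val, lo, hi) (((val) >= (lo)) && ((val) <= (hi)))

/* High-limit-only check, for unsigned values whose lower bound is 0. */
#define DEMO_RANGE_CHECK_HI(val, hi) ((val) <= (hi))

/* Hypothetical config validation: returns 0 if valid, -1 otherwise. */
static int demo_validate(unsigned int threads, int quality) {
    if (!DEMO_RANGE_CHECK_HI(threads, 64u))
        return -1;
    if (!DEMO_RANGE_CHECK(quality, 0, 63))
        return -1;
    return 0;
}
```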


Sep 17, 2010

reorder data to use wider instructions · 022323bf

Johann Koenig authored 14 years ago

the previous commit laid the groundwork by doing two sets of idcts
together. this moved that further by grouping the interesting data
(q[0], q+16[0]) together to allow using wider instructions. also
managed to drop a few instructions by recognizing that the constant
for sinpi8sqrt2 could be downshifted all the time, which avoided a
downshift as well as workarounds for a function which only accepted
signed data

looks like a modest gain for performance: at qcif, went from ~180
fps to ~183

Change-Id: I842673f3080b8239e026cc9b50346dbccbab4adf
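
One way to read the sinpi8sqrt2 remark, sketched in scalar C. The assumptions are that the constant is 35468 (which does not fit in a signed 16-bit value) and that the multiply in question behaves like a signed 16-bit doubling multiply returning the high half. The commit does not name the instruction, so `demo_doubling_mul_high` is only a model.

```c
#include <stdint.h>

enum {
    DEMO_SINPI8SQRT2      = 35468,                 /* > INT16_MAX */
    DEMO_SINPI8SQRT2_HALF = DEMO_SINPI8SQRT2 >> 1  /* 17734, fits in int16 */
};

/* Reference: full-width multiply, then keep the high 16 bits. */
static int32_t demo_mul_ref(int16_t x) {
    return ((int32_t)x * DEMO_SINPI8SQRT2) >> 16;
}

/* Model of a signed 16-bit doubling multiply-high: (2 * a * b) >> 16. */
static int16_t demo_doubling_mul_high(int16_t a, int16_t b) {
    return (int16_t)((2 * (int32_t)a * (int32_t)b) >> 16);
}

/* Same result using the pre-halved constant: the doubling restores the
 * factor of two, so no separate shift or unsigned workaround is needed. */
static int32_t demo_mul_halved(int16_t x) {
    return demo_doubling_mul_high(x, (int16_t)DEMO_SINPI8SQRT2_HALF);
}
```

Since 2 * 17734 equals 35468 exactly, the two functions compute the same product before the shift, so they agree for every input.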


Restructure multi-threaded decoder · f857a850

Yunqing Wang authored 14 years ago

On each MB, loopfiltering is done right after MB decoding. This
combines two loops in the multi-threaded code into one, which halves
the number of synchronizations.

The above-row/left-col data are saved in temp buffers for
next-row/next MB decoding.

Tests on a 4-core gLucid machine showed a 10% decoder performance
gain with threads=4 (tulip clip). Testing on other platforms
isn't done yet.

Change-Id: Id18ea7c1e84965dabea65d4c01ca5bc056ddeac9


Sep 16, 2010

cleanup: remove unused xprintf · 9100073e

John Koleszar authored 14 years ago

These files aren't currently used, and we can get them back if we
need them.

Change-Id: I62aa3bff828e491a80c80eeb84a7c44903df29b5
