Commits · f30e8dd7bd8d91b230586ce18ccc1001a6af1a18 · Xiph.Org / aom-rav1e

Sep 24, 2010

combine max values and compare once · f30e8dd7

Johann Koenig authored 14 years ago

previous implementation compared each set of values to limit and then
&'d them together, requiring a compare and & for each value.

this does the accumulation first, requiring only one compare

Change-Id: Ia5e3a1a50e47699c88470b8c41964f92a0dc1323

f30e8dd7
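
A minimal scalar sketch of the idea in "combine max values and compare once", assuming the values are unsigned byte differences checked against a single limit. The function names are hypothetical and the actual change is not shown in this log. In a SIMD implementation the running maximum would typically be a single vector max instruction per value.

```c
#include <assert.h>

/* Hypothetical helper: compare every value to the limit and AND the results.
 * Costs one compare and one AND per value. */
static int within_limit_compare_each(const unsigned char *diffs, int n,
                                     unsigned char limit) {
    int mask = 1;
    assert(n > 0);
    for (int i = 0; i < n; i++)
        mask &= (diffs[i] <= limit);
    return mask;
}

/* Hypothetical helper: accumulate the maximum first, then compare once. */
static int within_limit_compare_once(const unsigned char *diffs, int n,
                                     unsigned char limit) {
    unsigned char max = 0;
    assert(n > 0);
    for (int i = 0; i < n; i++)
        if (diffs[i] > max)
            max = diffs[i];
    return max <= limit; /* the only comparison against the limit */
}
```

Both helpers return the same answer for any input, because every value is within the limit exactly when the largest one is.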

Merge "move reconintra_mt to decoder (for now)" · dbd57c26
John Koleszar authored 14 years ago

dbd57c26

Merge "Adjust multi-thread sync ranges according to image sizes" · aab0f5b1
Yunqing Wang authored 14 years ago

aab0f5b1

move reconintra_mt to decoder (for now) · 48e76ff4

John Koleszar authored 14 years ago

reconintra_mt.c is only required for building the decoder right now.
It could definitely be used for the encoder in the future, but it
currently depends on decoder only data structures. (onyxd_int.h,
VP8D_COMP, etc). Move it from common/ to decoder/ until the
necessary changes to the common multithread code are complete.

This patch is needed to build with --disable-vp8-decoder.

Change-Id: I568c52221a2b309234d269675cba97131ce35c86

48e76ff4

configure: enable PIC for shared libs by default · e913eb97

John Koleszar authored 14 years ago

Shared libs generally require PIC, so this saves a little typing at
configure time.

Change-Id: I357d70cc68434f3283fee78873052d2b7d77c777

e913eb97

configure: add --enable-small · f9b2ca5b

John Koleszar authored 14 years ago

Build with -O2 rather than -O3, to dissuade the compiler from inlining
so much. See issue #1.

Change-Id: Iacb8ddb59125d3f01c5fea846b45a1c004c9aee0

f9b2ca5b

Merge "Add getter functions for the interface data symbols" · 329aaaf4
John Koleszar authored 14 years ago

329aaaf4

Sep 23, 2010

Add getter functions for the interface data symbols · fa7a55bb

John Koleszar authored 14 years ago

Having these symbols be available as functions rather than data is
occasionally more convenient. Implemented this way rather than a
get-codec-by-id style to avoid creating a link-time dependency
between the encoder and the decoder.

Fixes issue #169

Change-Id: I319f281277033a5e7e3ee3b092b9a87cce2f463d

fa7a55bb
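
A minimal sketch of the pattern described in "Add getter functions for the interface data symbols", assuming a made-up interface type. The names `codec_iface_t`, `example_decoder_algo` and `example_decoder` are hypothetical and are not the library's real symbols.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interface descriptor standing in for the library's real type. */
typedef struct codec_iface {
    const char *name;
} codec_iface_t;

/* The interface exposed as a data symbol. */
codec_iface_t example_decoder_algo = { "example decoder" };

/* The same interface exposed through a getter function. Each codec gets its
 * own getter, so calling this one references only the decoder's symbol. */
codec_iface_t *example_decoder(void) {
    assert(example_decoder_algo.name != NULL);
    return &example_decoder_algo;
}
```

A single get-codec-by-id function would have to reference every interface it can return, which is the link-time dependency between encoder and decoder that the commit message says it avoids.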

Adjust multi-thread sync ranges according to image sizes · 8db5da29

Yunqing Wang authored 14 years ago

In multi-threaded decoder, set different sync ranges for
different video resolutions.

Change-Id: Iea48fd36f51919e0152c8ed3b1f10e1b723c0ca7

8db5da29
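
A rough sketch of choosing a sync range from the frame size, as in "Adjust multi-thread sync ranges according to image sizes". It assumes the sync range is the number of macroblocks a thread decodes between checks on the progress of the row above. The function name and every threshold below are illustrative assumptions, not values taken from the commit.

```c
#include <assert.h>

/* Hypothetical mapping from frame width (in pixels) to a sync range.
 * Wider frames get a larger range, so threads synchronize less often. */
static int pick_sync_range(int width) {
    assert(width > 0);
    if (width < 640)
        return 1;
    if (width <= 1280)
        return 8;
    if (width <= 2560)
        return 16;
    return 32;
}
```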

Sep 22, 2010

Remove dead code · 7fed3832

Johann Koenig authored 14 years ago

The new loopfilter was originally introduced as an experimental change.
It's permanent now.

Change-Id: I25dbedb6ceff3e9f9c04e18bb29f84c3ecb7e546

7fed3832

Sep 21, 2010
- unset execute bit on c source · cdd20666
  John Koleszar authored 14 years ago
  
  Change-Id: I6625ee41f8872908cb015ce0729e1c7a105b5217
  cdd20666
- Merge "Fix typo" · a8a38bcf
  Johann Koenig authored 14 years ago
  
  a8a38bcf
- Fix typo · 0511cbff
  Johann Koenig authored 14 years ago
  
  Also, move with other ppc32 options
  
  Change-Id: I0b97413c767909c5682afc9bdd954f3d43401f6c
  0511cbff
- Merge "Don't reset mb clamping state during splitmv decoding" · 6f4c0435
  John Koleszar authored 14 years ago
  
  6f4c0435
- Don't reset mb clamping state during splitmv decoding · 4d391e8e
  John Koleszar authored 14 years ago
  
  The MV decoding changes in c5fb0eb8 introduced a bug where the macroblock clamping state was reset for each partition, so if an earlier partition needed clamping but a subsequent one didn't, the MB wouldn't receive clamping. Instead, the state is only set during splitmv decoding, never cleared.
  
  Change-Id: I224fe258493405ee0f6a04596acdb622c475e845
  4d391e8e
- Merge "gitignore: initial version" · 3d5f8291
  John Koleszar authored 14 years ago
  
  3d5f8291
- Merge "configure: support for ppc32-linux-gcc" · 12651b3c
  John Koleszar authored 14 years ago
  
  12651b3c
- Merge "Add high limit check for unsigned parameters" · 015cfcaf
  John Koleszar authored 14 years ago
  
  015cfcaf
- Merge "Restructure multi-threaded decoder" · a23ccf8f
  Yunqing Wang authored 14 years ago
  
  a23ccf8f
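
The bug described in "Don't reset mb clamping state during splitmv decoding" (4d391e8e) can be sketched as follows. This is a simplified model, assuming each partition reports whether it needs clamping as a 0/1 flag. The function names are hypothetical and the real decoder code is not shown in this log.

```c
#include <assert.h>

/* Buggy pattern: the macroblock flag is overwritten for each partition,
 * so only the last partition's result survives. */
static int mb_need_clamp_reset_each(const int *part_needs_clamp, int n) {
    int mb_need_clamp = 0;
    assert(n > 0);
    for (int i = 0; i < n; i++)
        mb_need_clamp = part_needs_clamp[i];
    return mb_need_clamp;
}

/* Fixed pattern: the flag is only ever set inside the loop, never cleared. */
static int mb_need_clamp_sticky(const int *part_needs_clamp, int n) {
    int mb_need_clamp = 0;
    assert(n > 0);
    for (int i = 0; i < n; i++)
        if (part_needs_clamp[i])
            mb_need_clamp = 1;
    return mb_need_clamp;
}
```

With partitions `{1, 0}`, the first version loses the earlier partition's request for clamping, while the second keeps it.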
Sep 20, 2010

Use movq instead of movdqu. · b7dc9398

Fritz Koenig authored 14 years ago

Movdqu is more expensive (throughput, uops) than movq.  Minimal
impact for newer big cores, but ~2.25% gain on Atom.

Change-Id: I62c80bb1cc01d8a91c350c4c7719462809a4ef7f

b7dc9398

Merge "Better choice of instruction filter mask comparison." · 1c906448
Fritz Koenig authored 14 years ago

1c906448

Merge "reorder data to use wider instructions" · 6cf2b4aa
Johann Koenig authored 14 years ago

6cf2b4aa

Merge "Update NEON wide idcts" · 9c9afbab
Johann Koenig authored 14 years ago

9c9afbab

Better choice of instruction filter mask comparison. · 8eae7fe7

Fritz Koenig authored 14 years ago

Use pmaxub instead of a combination of psubusb/por to
determine if any comparisons go over the limit.

Change-Id: I3f0bd7d2aabe5fee9ba6620508e2b60605abcb82

8eae7fe7
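
A scalar model of the two approaches named in 8eae7fe7. The helpers below emulate a single byte lane of `psubusb` (unsigned saturating subtract) and `pmaxub` (unsigned maximum). The function names are hypothetical and the sketch shows only why the two forms agree, not the actual assembly.

```c
#include <assert.h>
#include <stdint.h>

/* One byte lane of psubusb: unsigned saturating subtract. */
static uint8_t subs_u8(uint8_t a, uint8_t b) {
    return a > b ? (uint8_t)(a - b) : 0;
}

/* One byte lane of pmaxub: unsigned maximum. */
static uint8_t max_u8(uint8_t a, uint8_t b) {
    return a > b ? a : b;
}

/* Subtract the limit from every value and OR the results together.
 * The accumulator is nonzero exactly when some value exceeds the limit. */
static int over_limit_subs_or(const uint8_t *v, int n, uint8_t limit) {
    uint8_t acc = 0;
    assert(n > 0);
    for (int i = 0; i < n; i++)
        acc |= subs_u8(v[i], limit);
    return acc != 0;
}

/* Take the maximum of all values, then subtract the limit once. */
static int over_limit_max(const uint8_t *v, int n, uint8_t limit) {
    uint8_t m = 0;
    assert(n > 0);
    for (int i = 0; i < n; i++)
        m = max_u8(m, v[i]);
    return subs_u8(m, limit) != 0;
}
```

The max form needs one operation per value plus a single subtract at the end, where the subtract/or form needs two operations per value.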

Add high limit check for unsigned parameters · 23690686

Guillermo Ballester Valor authored 14 years ago

The patch related to issue #55 (5a72620d) fixed some warnings, but the
fix was not optimal. It was actually a trick to confuse the compiler
rather than a fix.

This patch fixes it by creating a new macro for cases that need only a
high limit check on an unsigned value.

Change-Id: I94b322e0f7fb07604b3b1df1f9321185f48cfcb5

23690686
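
A minimal sketch of the kind of macro 23690686 describes, assuming a made-up config struct. The names `RANGE_CHECK`, `RANGE_CHECK_HI`, `struct cfg` and `validate_threads` are illustrative and are not copied from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical configuration struct with an unsigned field. */
struct cfg {
    unsigned int threads;
};

/* Two-sided check. With lo == 0 and an unsigned member, the lower comparison
 * is always true, which is the kind of test compilers warn about. */
#define RANGE_CHECK(p, memb, lo, hi) \
    (((p)->memb >= (lo)) && ((p)->memb <= (hi)))

/* High-limit-only check for unsigned members. */
#define RANGE_CHECK_HI(p, memb, hi) ((p)->memb <= (hi))

/* Returns 1 if the thread count is acceptable; the limit of 64 is arbitrary. */
static int validate_threads(const struct cfg *c) {
    assert(c != NULL);
    return RANGE_CHECK_HI(c, threads, 64u);
}
```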

Sep 17, 2010

reorder data to use wider instructions · 022323bf

Johann Koenig authored 14 years ago

the previous commit laid the groundwork by doing two sets of idcts
together. this moved that further by grouping the interesting data
(q[0], q+16[0]) together to allow using wider instructions. also
managed to drop a few instructions by recognizing that the constant
for sinpi8sqrt2 could be downshifted all the time, which avoided a
downshift as well as workarounds for a function which only accepted
signed data

looks like a modest gain for performance: at qcif, went from ~180
fps to ~183

Change-Id: I842673f3080b8239e026cc9b50346dbccbab4adf

022323bf

Restructure multi-threaded decoder · f857a850

Yunqing Wang authored 14 years ago

On each MB, loopfiltering is done right after MB decoding. This
combines two loops in multi-threaded code into one, which reduces
number of synchronizations to half.

The above-row/left-col data are saved in temp buffers for
next-row/next MB decoding.

Tests on 4-core gLucid machine showed 10% decoder performance
gain with threads=4 (tulip clip). Testing on other platforms
isn't done yet.

Change-Id: Id18ea7c1e84965dabea65d4c01ca5bc056ddeac9

f857a850

Sep 16, 2010

cleanup: remove unused xprintf · 9100073e

John Koleszar authored 14 years ago

These files aren't currently used, and we can get them back if we
need them.

Change-Id: I62aa3bff828e491a80c80eeb84a7c44903df29b5

9100073e

Reduce size of tokenizer tables · 147b125b

John Koleszar authored 14 years ago

This patch reduces the size of the global tables maintained by the
tokenizer to 16k from 80k-96k. See issue #177.

Change-Id: If0275d5f28389af11ac83c5d929d1157cde90fbe

147b125b

Sep 15, 2010

Modify GET_GOT macro for performance. · 746439ef

Fritz Koenig authored 14 years ago

GET_GOT was producing a zero length call.  This resulted in
pipeline flushes occurring when returning from the assembly
functions.  Masked on out-of-order cores, but evident on
Atom cores.

Change-Id: I8c375af313e8a169c77adbaf956693c0cfeb5ccd

746439ef

Sep 14, 2010

Removed unnecessary pxor. · 769f2424

Fritz Koenig authored 14 years ago

There is no need to make sure that the lower byte of the
register is 0 because the downshift by 11 overwrites that byte.

Change-Id: I89cbf004b2ff532a2c68e0dc399c45a49cdad5a1

769f2424

Sep 13, 2010
- Merge "Make block access to frame buffer sequential" · 71a1c197
  Fritz Koenig authored 14 years ago
  
  71a1c197
- configure: support for ppc32-linux-gcc · 887d6ef4
  John Koleszar authored 14 years ago
  
  Fixes issue 89. Thanks to josejx for the patch.
  
  Change-Id: I7e664fed703b49f2fb3af4c5e6ce1173742000c2
  887d6ef4
- cosmetics: expand tabs in configure · 7f1a908b
  John Koleszar authored 14 years ago
  
  Change-Id: I88ddb0afb56ef2be8184b56fe125ad938ead7a84
  7f1a908b
Sep 10, 2010

Make block access to frame buffer sequential · a65cd3de

Fritz Koenig authored 14 years ago

Sequentially accessing memory from a low address to a high
address should make it easier for the processor to predict
the cache.

Change-Id: I1921ce996bdd547144fe864fea6435f527f5842d

a65cd3de

Sep 09, 2010

Merge "Improved subset block search" · a32ded1d
Scott LaVarnway authored 14 years ago

a32ded1d

Improved subset block search · c5fb0eb8

Scott LaVarnway authored 14 years ago

Improved the subset block search and fill.  (about 3% improvement for
32 bit)  Modified/merged the code in order to create
vp8_read_mb_modes_mv which can decode the modes/mvs on a macroblock
level. This will allow the decode loop (in the future) to decode
modes/mvs on a frame, row, or mb level.

Change-Id: If637d994b508792f846d39b5d44a7bf9aa5cddf3

c5fb0eb8

Update NEON wide idcts · 14ba7642

Johann Koenig authored 14 years ago

Extend 93c32a55, which used SSE2 instructions to do two
idct/dequant/recons at a time, to NEON. Initial working
commit. More work needs to be put into rearranging and
interlacing the data to take advantage of quadword
operations, which is when we'll hopefully see a much
better boost.

Change-Id: I86d59d96f15e0d0f9710253e2c098ac2ff2865d1

14ba7642

Fix GF interval for non-lagged ARFs · edcbb1c1

John Koleszar authored 14 years ago

When ARFs are enabled in non-lagged compress modes, the GF interval
was being reset to zero. Non-lagged ARF updates were enabled in commit
63ccfbd5, but this incorrect GF interval caused a quality regression.

Change-Id: I615c3b493f4ce2127044f4e68d0bcb07d6b730c3

edcbb1c1

Merge branch 'master' of git://review.webmproject.org/libvpx · 6d90f867
Fritz Koenig authored 14 years ago

6d90f867