Commits · 0511cbff7a9eb1869ec9b4c534e7a4989ac42e8a · Xiph.Org / aom-rav1e

Sep 21, 2010
- Fix typo · 0511cbff
  Johann Koenig authored 14 years ago
  
  Also, move with other ppc32 options.
  
  Change-Id: I0b97413c767909c5682afc9bdd954f3d43401f6c
  0511cbff
- Merge "configure: support for ppc32-linux-gcc" · 12651b3c
  John Koleszar authored 14 years ago
  
  12651b3c
- Merge "Add high limit check for unsigned parameters" · 015cfcaf
  John Koleszar authored 14 years ago
  
  015cfcaf
- Merge "Restructure multi-threaded decoder" · a23ccf8f
  Yunqing Wang authored 14 years ago
  
  a23ccf8f
Sep 20, 2010

Fritz Koenig authored 14 years ago

Movdqu is more expensive (throughput, uops) than movq.  Minimal
impact for newer big cores, but ~2.25% gain on Atom.

Change-Id: I62c80bb1cc01d8a91c350c4c7719462809a4ef7f

b7dc9398

Merge "Better choice of instruction filter mask comparison." · 1c906448
Fritz Koenig authored 14 years ago

1c906448
Merge "reorder data to use wider instructions" · 6cf2b4aa
Johann Koenig authored 14 years ago

6cf2b4aa
Merge "Update NEON wide idcts" · 9c9afbab
Johann Koenig authored 14 years ago

9c9afbab

Better choice of instruction filter mask comparison. · 8eae7fe7

Fritz Koenig authored 14 years ago

Use pmaxub instead of a combination of psubusb/por to
determine if any comparisons go over the limit.

Change-Id: I3f0bd7d2aabe5fee9ba6620508e2b60605abcb82

8eae7fe7
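
The equivalence behind 8eae7fe7 can be sketched in portable C. This is a scalar model of one byte lane, not the assembly from the commit. The helper names are hypothetical, and the assumption is that the mask only needs to know whether any absolute difference exceeds the limit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unsigned saturating subtract on one byte lane (what psubusb does per lane). */
static uint8_t sat_sub_u8(uint8_t a, uint8_t b) {
    return (uint8_t)(a > b ? a - b : 0);
}

/* Older pattern: one saturating subtract per difference, OR-ed together
 * (psubusb + por for every comparison). */
static int over_limit_sub_or(const uint8_t *diffs, size_t n, uint8_t limit) {
    uint8_t acc = 0;
    assert(diffs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        acc = (uint8_t)(acc | sat_sub_u8(diffs[i], limit));
    return acc != 0;
}

/* Newer pattern: keep a running maximum (pmaxub) and do a single
 * saturating subtract against the limit at the end. */
static int over_limit_max(const uint8_t *diffs, size_t n, uint8_t limit) {
    uint8_t m = 0;
    assert(diffs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (diffs[i] > m)
            m = diffs[i];
    return sat_sub_u8(m, limit) != 0;
}
```

Both functions return nonzero exactly when some difference is greater than the limit, because the largest difference exceeds the limit if and only if at least one difference does. The max-based form needs one instruction per difference instead of two.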

Add high limit check for unsigned parameters · 23690686

Guillermo Ballester Valor authored 14 years ago

The patch for issue #55 (5a72620d) fixed some warnings, but the fix
was not optimal: it was a trick to confuse the compiler rather than
a real fix.

This patch fixes it properly by adding a new macro for the cases where
only a high limit check on an unsigned value is needed.

Change-Id: I94b322e0f7fb07604b3b1df1f9321185f48cfcb5

23690686
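
A minimal sketch of the idea in 23690686, assuming the warnings came from comparing an unsigned member against a lower bound of 0. The struct, member names, limits, and macro names below are hypothetical and are not copied from the source tree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical configuration struct used only for illustration. */
struct enc_cfg {
    unsigned int threads; /* unsigned: a lower bound of 0 is always true */
    int quality;          /* signed: both bounds are meaningful */
};

/* Two-sided check, appropriate for signed members. Evaluates to 0 when
 * the member is in range and -1 otherwise. */
#define RANGE_CHECK(p, memb, lo, hi) \
    ((((p)->memb >= (lo)) && ((p)->memb <= (hi))) ? 0 : -1)

/* High-limit-only check for unsigned members. Omitting the ">= 0" test
 * avoids "comparison is always true" warnings without any tricks. */
#define RANGE_CHECK_HI(p, memb, hi) \
    (((p)->memb <= (hi)) ? 0 : -1)

/* Returns 0 if every member is within its (hypothetical) limits. */
static int validate(const struct enc_cfg *cfg) {
    assert(cfg != NULL);
    if (RANGE_CHECK(cfg, quality, 0, 63) != 0)
        return -1;
    if (RANGE_CHECK_HI(cfg, threads, 64u) != 0)
        return -1;
    return 0;
}
```

Keeping a separate macro for the unsigned case makes the intent explicit at each call site instead of hiding the redundant comparison from the compiler.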

Sep 17, 2010

reorder data to use wider instructions · 022323bf

Johann Koenig authored 14 years ago

The previous commit laid the groundwork by doing two sets of idcts
together. This moves that further by grouping the interesting data
(q[0], q+16[0]) together to allow using wider instructions. Also
managed to drop a few instructions by recognizing that the constant
for sinpi8sqrt2 could be downshifted all the time, which avoided a
downshift as well as workarounds for a function which only accepted
signed data.

Looks like a modest performance gain: at qcif, went from ~180 fps
to ~183 fps.

Change-Id: I842673f3080b8239e026cc9b50346dbccbab4adf

022323bf
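
The constant trick mentioned in 022323bf can be illustrated in scalar C. This sketch assumes sinpi8sqrt2 is 35468, the value used in the C inverse DCT, and that the "function which only accepted signed data" is a signed doubling multiply that returns the high half, such as NEON's vqdmulh. The commit message names neither, so treat both as assumptions. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum {
    SINPI8SQRT2 = 35468,          /* does not fit in a signed 16-bit lane */
    SINPI8SQRT2_HALF = 35468 / 2  /* 17734, which does fit */
};

/* Reference: full-width multiply, keep the bits above bit 15.
 * Relies on arithmetic right shift of negative values, which mainstream
 * compilers provide. */
static int32_t mul_hi_ref(int16_t x) {
    return ((int32_t)x * SINPI8SQRT2) >> 16;
}

/* Scalar model of a signed doubling multiply returning the high half:
 * (2 * a * b) >> 16. Saturation is omitted because it cannot trigger
 * for the constant used here. */
static int32_t doubling_mul_hi(int16_t a, int16_t b) {
    return (int32_t)(((int64_t)2 * a * b) >> 16);
}

/* Using the pre-halved constant: the doubling inside the multiply
 * restores the original scale, so no extra shift is needed. */
static int32_t mul_hi_halved(int16_t x) {
    return doubling_mul_hi(x, (int16_t)SINPI8SQRT2_HALF);
}
```

Because 35468 is exactly twice 17734, the two products are identical before the shift, so both functions agree for every 16-bit input.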

Restructure multi-threaded decoder · f857a850

Yunqing Wang authored 14 years ago

On each MB, loop filtering is done right after MB decoding. This
combines two loops in the multi-threaded code into one, which halves
the number of synchronizations.

The above-row/left-col data are saved in temp buffers for
next-row/next MB decoding.

Tests on a 4-core gLucid machine showed a 10% decoder performance
gain with threads=4 (tulip clip). Testing on other platforms
hasn't been done yet.

Change-Id: Id18ea7c1e84965dabea65d4c01ca5bc056ddeac9

f857a850

Sep 16, 2010

cleanup: remove unused xprintf · 9100073e

John Koleszar authored 14 years ago

These files aren't currently used, and we can get them back if we
need them.

Change-Id: I62aa3bff828e491a80c80eeb84a7c44903df29b5

9100073e

Reduce size of tokenizer tables · 147b125b

John Koleszar authored 14 years ago

This patch reduces the size of the global tables maintained by the
tokenizer to 16k from 80k-96k. See issue #177.

Change-Id: If0275d5f28389af11ac83c5d929d1157cde90fbe

147b125b

Sep 15, 2010

Modify GET_GOT macro for performance. · 746439ef

Fritz Koenig authored 14 years ago

GET_GOT was producing a zero-length call.  This resulted in
pipeline flushes occurring when returning from the assembly
functions.  The cost is masked on out-of-order cores, but evident
on Atom cores.

Change-Id: I8c375af313e8a169c77adbaf956693c0cfeb5ccd

746439ef

Sep 14, 2010

Removed unnecessary pxor. · 769f2424

Fritz Koenig authored 14 years ago

There is no need to make sure that the lower byte of the
register is 0 because the downshift by 11 overwrites that byte.

Change-Id: I89cbf004b2ff532a2c68e0dc399c45a49cdad5a1

769f2424

Sep 13, 2010
- Merge "Make block access to frame buffer sequential" · 71a1c197
  Fritz Koenig authored 14 years ago
  
  71a1c197
- configure: support for ppc32-linux-gcc · 887d6ef4
  John Koleszar authored 14 years ago
  
  Fixes issue 89. Thanks to josejx for the patch.
  
  Change-Id: I7e664fed703b49f2fb3af4c5e6ce1173742000c2
  887d6ef4
- cosmetics: expand tabs in configure · 7f1a908b
  John Koleszar authored 14 years ago
  
  Change-Id: I88ddb0afb56ef2be8184b56fe125ad938ead7a84
  7f1a908b
Sep 10, 2010

Make block access to frame buffer sequential · a65cd3de

Fritz Koenig authored 14 years ago

Sequentially accessing memory from a low address to a high
address should make the access pattern easier for the processor
to predict and prefetch into the cache.

Change-Id: I1921ce996bdd547144fe864fea6435f527f5842d

a65cd3de

Sep 09, 2010

Merge "Improved subset block search" · a32ded1d
Scott LaVarnway authored 14 years ago

a32ded1d

Improved subset block search · c5fb0eb8

Scott LaVarnway authored 14 years ago

Improved the subset block search and fill (about 3% improvement for
32 bit).  Modified/merged the code in order to create
vp8_read_mb_modes_mv, which can decode the modes/mvs on a macroblock
level. This will allow the decode loop (in the future) to decode
modes/mvs on a frame, row, or mb level.

Change-Id: If637d994b508792f846d39b5d44a7bf9aa5cddf3

c5fb0eb8

Update NEON wide idcts · 14ba7642

Johann Koenig authored 14 years ago

Expand 93c32a55, which used SSE2 instructions to do two
idct/dequant/recons at a time, to NEON. Initial working
commit. More work needs to be put into rearranging and
interlacing the data to take advantage of quadword
operations, which is when we'll hopefully see a much
better boost.

Change-Id: I86d59d96f15e0d0f9710253e2c098ac2ff2865d1

14ba7642

Fix GF interval for non-lagged ARFs · edcbb1c1

John Koleszar authored 14 years ago

When ARFs are enabled in non-lagged compress modes, the GF interval
was being reset to zero. Non-lagged ARF updates were enabled in commit
63ccfbd5, but this incorrect GF interval caused a quality regression.

Change-Id: I615c3b493f4ce2127044f4e68d0bcb07d6b730c3

edcbb1c1

Merge branch 'master' of git://review.webmproject.org/libvpx · 6d90f867
Fritz Koenig authored 14 years ago

6d90f867

Use WebM in copyright notice for consistency · c2140b8a

John Koleszar authored 14 years ago

Changes 'The VP8 project' to 'The WebM project', for consistency
with other webmproject.org repositories.

Fixes issue #97.

Change-Id: I37c13ed5fbdb9d334ceef71c6350e9febed9bbba

c2140b8a

Sep 08, 2010

Skip unnecessary search of identical frames · 69ae8f47

Jim Bankoski authored 14 years ago

vp8_get_compressed_data() was defeating logic in
encode_frame_to_datarate() that determined the reference buffers to
search and forcing all frames to be eligible to search. In cases
where buffers have identical contents, this is unnecessary extra
work.

Change-Id: I9e667ac39128ae32dc455a3db4c62e3efce6f114

69ae8f47

Enable ARFs for non-lagged compress · 63ccfbd5

Jim Bankoski authored 14 years ago

ARFs were explicitly disabled except in lagged compress mode. New
ARF logic allows for the ARF buffer to hold an older golden frame,
which does not require lagged compress.

Change-Id: I1dff82b6f53e8311f1e0514b1794ae05919d5f79

63ccfbd5

Bilinear subpixel optimizations for ssse3. · 3fb37162

Fritz Koenig authored 14 years ago

Used pmaddubsw for multiply and add of two filter taps
at once for 16x16 and 8x8 blocks.

Change-Id: Idccf2d6e094561624407b109fa7e80ba799355ea

3fb37162
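
What "two filter taps at once" means in 3fb37162 can be modeled in scalar C. The sketch below emulates one 16-bit lane of pmaddubsw (unsigned byte times signed byte, adjacent products added with signed saturation) and uses it for a 2-tap horizontal bilinear filter. The taps summing to 128, the rounding constant 64, and the shift by 7 are assumptions about the filter convention, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one 16-bit output lane of pmaddubsw:
 * p0*f0 + p1*f1 with pixels unsigned, taps signed, result saturated. */
static int16_t maddubs_lane(uint8_t p0, uint8_t p1, int8_t f0, int8_t f1) {
    int32_t sum = (int32_t)p0 * f0 + (int32_t)p1 * f1;
    if (sum > INT16_MAX) sum = INT16_MAX;
    if (sum < INT16_MIN) sum = INT16_MIN;
    return (int16_t)sum;
}

/* Horizontal 2-tap bilinear filter. src must hold width + 1 pixels.
 * Each output needs a single multiply-add on the interleaved pair
 * (src[x], src[x + 1]) instead of two multiplies and an add. */
static void bilinear_row(const uint8_t *src, uint8_t *dst, int width,
                         int8_t f0, int8_t f1) {
    for (int x = 0; x < width; x++) {
        int32_t acc = maddubs_lane(src[x], src[x + 1], f0, f1);
        dst[x] = (uint8_t)((acc + 64) >> 7); /* round, then scale by 1/128 */
    }
}
```

One caveat of this formulation: a tap of 128 does not fit in a signed byte, so the full-pixel case (taps 128 and 0) would need separate handling, for example a plain copy.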

Sep 03, 2010

Reduced the size of MB_MODE_INFO · 0de458f6

Scott LaVarnway authored 14 years ago

Moved partition_bmi and partition_count out of MB_MODE_INFO and
placed into MACROBLOCK.  Also reduced the size of other members
of the MB_MODE_INFO struct.  For 1080p, the memory was reduced
by 1,209,516 bytes.  The decoder performance appeared to improve
by 3% for the clip used.
Note:  The main goal for this change is to improve the decoder
performance.  The encoder will be revisited at a later date for
further structure cleanup.

Change-Id: I4733621292ee9cc3fffa4046cb3fd4d99bd14613

0de458f6
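
The kind of change described in 0de458f6 can be sketched with hypothetical structs. None of the field lists below are taken from the real MB_MODE_INFO or MACROBLOCK definitions. They only illustrate moving per-partition scratch data into a single context and narrowing the remaining members.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { int16_t row, col; } mv_t;

/* Before (hypothetical): every macroblock record carries partition
 * scratch data and int-sized fields. */
typedef struct {
    int mode;
    int ref_frame;
    mv_t mv;
    int partition_count;
    mv_t partition_mv[16];
} mode_info_before;

/* After (hypothetical): only what must persist per macroblock,
 * stored in narrow types. */
typedef struct {
    uint8_t mode;
    uint8_t ref_frame;
    mv_t mv;
} mode_info_after;

/* The partition data lives once in the working context instead of once
 * per macroblock. */
typedef struct {
    int partition_count;
    mv_t partition_mv[16];
} macroblock_ctx;

/* Bytes used by per-macroblock records for a frame, ignoring any
 * border entries a real allocator might add. */
static size_t frame_mode_info_bytes(size_t per_mb, size_t width, size_t height) {
    size_t mb_cols = (width + 15) / 16;
    size_t mb_rows = (height + 15) / 16;
    return per_mb * mb_cols * mb_rows;
}
```

Since there is one record per 16x16 macroblock, every byte trimmed from the record is multiplied by thousands of macroblocks at 1080p, and a smaller record also keeps more of the array in cache during decoding.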

Sep 02, 2010

Update CHANGELOG for v0.9.2 release · b0519a26

John Koleszar authored 14 years ago

Change-Id: I184e927987544e9f34f890249b589ea13a93a330

b0519a26

Update AUTHORS · e4b50024

John Koleszar authored 14 years ago

Change-Id: I0395ffa107651a773fd11d12682ab9372f76a90b

e4b50024

Whitespace: nuke CRLFs · 4496db45

John Koleszar authored 14 years ago

Change-Id: I8b9fdf9875a8fcff4cb49a3357ce44f18108c2e7

4496db45

Use native win32 timers on mingw · daab4bcb

John Koleszar authored 14 years ago

Changed to use QueryPerformanceCounter on all Windows builds, rather
than only when building with MSVC, so that MSVC can link libs built
with MinGW.

Fixes issue #149.

Change-Id: Ie2dc7edc8f4d096cf95ec5ffb1ab00f2d67b3e7d

daab4bcb
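
A minimal sketch of the approach in daab4bcb, assuming the platform test is keyed on `_WIN32` (defined by both MSVC and MinGW) rather than on a compiler-specific macro. The commit message does not name the macros involved, and `now_usec` is a hypothetical helper rather than a function from the source tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(_WIN32)
#include <windows.h>   /* QueryPerformanceCounter, QueryPerformanceFrequency */
#else
#include <sys/time.h>  /* gettimeofday */
#endif

/* Returns a timestamp in microseconds using the native timer of the
 * target platform, regardless of which compiler built the code. */
static int64_t now_usec(void) {
#if defined(_WIN32)
    LARGE_INTEGER freq, count;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&count);
    /* Split into whole seconds and remainder to avoid overflow. */
    return (int64_t)(count.QuadPart / freq.QuadPart) * 1000000
         + (int64_t)((count.QuadPart % freq.QuadPart) * 1000000 / freq.QuadPart);
#else
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (int64_t)tv.tv_sec * 1000000 + (int64_t)tv.tv_usec;
#endif
}
```

Selecting the timer by target platform rather than by compiler means objects built with MinGW and MSVC rely on the same Windows API, so neither depends on the other toolchain's C runtime for timing.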

Fix target detection on mingw32 · d6ee72a7

John Koleszar authored 14 years ago

gcc -dumpmachine returns only 'mingw32'

Change-Id: I774d05a97c5131fc12009e436712c319e54490a5

d6ee72a7

Use -fno-common for mingw · 21039ce1

John Koleszar authored 14 years ago

Fixes http://code.google.com/p/webm/issues/detail?id=112

Thanks to Ramiro Polla for the issue/fix.

Change-Id: I7f7b547a4ea3270e183f59280510066cc29a619e

21039ce1

encoder: remove postproc dependency · 76640f85

James Zern authored 14 years ago

Remove the encoder's general dependency on postproc.c. The only
unchecked need for it is when CONFIG_PSNR is enabled; all other cases
are already wrapped in CONFIG_POSTPROC. In the CONFIG_PSNR case the file
will still be included.

Additionally, using VP8_SET_POSTPROC with the encoder when post
processing has been disabled now returns an error.

This addresses issue #153.

Change-Id: Ia6dfe20167f7077734a6058cbd1d794550346089

76640f85

Merge "added separate rounding/zbin constants for 2nd order" · 7a3e0a1d
John Koleszar authored 14 years ago

7a3e0a1d
Merge "Disable frame dropping by default" · 9398be0f
John Koleszar authored 14 years ago

9398be0f

added separate rounding/zbin constants for 2nd order · fca12920

Yaowu Xu authored 14 years ago

This allows experiments with different rounding and
zerobin constants for 2nd order blocks.

Change-Id: Idd829adba3edd1f713c66151a8d29bb245e33a71

fca12920