Commits · b0660457fe46a48246e42a8e5c0ce78c0e2e4164 · Xiph.Org / aom-rav1e

Aug 19, 2010

Revert "Removed ssse3 sixtap code" · b0660457

Jim Bankoski authored 14 years ago

This reverts commit 6ea5bb85.

b0660457

Johann Koenig authored 14 years ago

move some things around, reorder some instructions

constant 0 is used several times. load it once per call in horiz,
once per loop in vert.

separate saturating instructions to avoid stalls.

just use one usub8 call to set GE flags, rather than uqsub8 followed by
usub8 w/ 0

document some stalls for further consideration

Change-Id: Ic3877e0ddbe314bb8a17fd5db73501a7d64570ec

52852da7

Merge "fix armv6 simpleloop filter" · a522be29
Johann Koenig authored 14 years ago

a522be29

fix armv6 simpleloop filter · 467a0b99

Johann Koenig authored 14 years ago

test cases were causing a crash because the count was being read
incorrectly. after fixing that, noticed that the output was not
matching. fixed that.

Change-Id: Idb0edb887736bd566a3cf6d4aa1a03ea8d20eb27

467a0b99

Aug 18, 2010

Removed ssse3 sixtap code · 6ea5bb85

Scott LaVarnway authored 14 years ago

Change-Id: I0f20fbb898ee31eb94a143471aa6f1ca17a229a4

6ea5bb85

Aug 16, 2010

Merge "store more vars than we removed" · 496cf8cc
John Koleszar authored 14 years ago

496cf8cc

store more vars than we removed · c75f3993

Johann Koenig authored 14 years ago

only saved r4-r11+lr, but were storing r4-r12+lr

Change-Id: If77df1998af50e9badee7d99ef53543046434675

c75f3993

arm: fix missing dependency with --enable-shared · 9aa498b8

John Koleszar authored 14 years ago

The C version of the dequant/idct/add function depends on the C
version of the IDCT, but this isn't compiled in on ARM. Since this
code has an asm version, we can just remove this file to eliminate the
link error.

Change-Id: I21de74d89d3765a1db2da27292b20727c53178e9

9aa498b8

Aug 13, 2010

move segmentation_common to encoder · 80d3923a

John Koleszar authored 14 years ago

vp8_update_gf_useage_maps() is only used by the encoder. This patch
fixes the ability to build in decode-only or encode-only
configurations.

Change-Id: I3a5211428e539886ba998e09e8abd747ac55c9aa

80d3923a

Aug 12, 2010

framework for assembly version of the detokenizer · 9602799c

Johann Koenig authored 14 years ago

adds a compile time option: --enable-arm-asm-detok which pulls in
vp8/decoder/arm/detokenize.asm

currently about break even speed wise, but changes are pending to
the fill code (branch and load 3 bytes versus conditionally always
load one) and the error handling. Currently it doesn't handle zero
runs or overrunning the buffer.

this is really just so i don't have to rebase my changes all the
time to run benchmarks - now just need to replace one file!

Change-Id: I56d0e2354dc0ca3811bffd0e88fe1f952fa6c797

9602799c

update structure · 633646b7

Johann Koenig authored 14 years ago

mode_info_context->mbmi no longer gets copied up a level

Change-Id: Icd2d27d381909721326c34594a1ccdc26d48a995

633646b7

remove unused definition · 1ec7981c

Johann Koenig authored 14 years ago

asm_offsets contains some definitions which are no longer used. this
was one of them. v6 build works now

Change-Id: If370cfa8acd145de4fead2d9a11b048fccc090df

1ec7981c

Removed unnecessary MB_MODE_INFO copies · 9c7a0090

Scott LaVarnway authored 14 years ago

These copies occurred for each macroblock in the encoder and decoder.
The temp MB_MODE_INFO mbmi was removed from MACROBLOCKD.  As a result,
a large number of compile errors had to be fixed.

Change-Id: I4cf0ffae3ce244f6db04a4c217d52dd256382cf3

9c7a0090

Aug 11, 2010

Merge "Finished vp8_sixtap_predict4x4_ssse3 function" · f5615b61
Scott LaVarnway authored 14 years ago

f5615b61

cosmetics: add missing 2D array braces · d22e2968

John Koleszar authored 14 years ago

Silences compile warning.

Change-Id: I4b207d97f8570fe29aa2710e4ce4f02e7e43b57a

d22e2968

avoid negative array subscript warnings · 392a9582

John Koleszar authored 14 years ago

The mv_ref and sub_mv_ref token encodings are indexed from NEARESTMV
and LEFT4X4, respectively, rather than being zero-based like the
other token encodings.

Change-Id: I3699c3f84111209ecfb91097c4b900773e9a3ad5

392a9582
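The indexing fix described in 392a9582 can be illustrated with a minimal C sketch. The enum values, table contents, and function name below are hypothetical stand-ins, not the actual VP8 definitions. The point is only that a table whose first entry corresponds to a non-zero enum value can be indexed by subtracting that base, so no negative subscript (or pre-offset array pointer) is needed.

```c
/* Hypothetical prediction modes: the motion-vector modes do not start at 0. */
enum pred_mode { DC_PRED, V_PRED, H_PRED, NEARESTMV, NEARMV, ZEROMV, NEWMV };

#define MV_REF_COUNT (NEWMV - NEARESTMV + 1)

/* Placeholder values; entry 0 belongs to NEARESTMV, not to mode 0. */
static const int mv_ref_bits[MV_REF_COUNT] = { 2, 3, 1, 4 };

/* Rebase the mode before indexing instead of offsetting the array itself. */
int mv_ref_cost(enum pred_mode mode)
{
    return mv_ref_bits[mode - NEARESTMV];
}
```

Callers keep passing the mode value unchanged; only the lookup site knows about the base offset.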

Finished vp8_sixtap_predict4x4_ssse3 function · b07e5b6f

Scott LaVarnway authored 14 years ago

Added vp8_filter_block1d4_h6_ssse3 and vp8_filter_block1d4_v6_ssse3
assembly routines.  Also removed unused assembly.

Change-Id: I01c1021835f2edda9da706822345f217087ca0d0

b07e5b6f

rename DETOK_[AL] · c0ba42d3

Johann Koenig authored 14 years ago

everything else uses lowercase detok

Change-Id: I9671e2e90eb2961208dfa81c00b3accb5749ec04

c0ba42d3

Moved gf_active code to encoder only · 99f46d62

Scott LaVarnway authored 14 years ago

The gf_active code is only used by the encoder, so it was moved from
common and decoder.

Change-Id: Iada15acd5b2b33ff70c34668ca87d4cfd0d05025

99f46d62

Removed duplicate functions · c404fa42

Yaowu Xu authored 14 years ago

Change-Id: Ie587972ccefd3c762b8cdf8ef39345cd22924b9b

c404fa42

Normalize quantizer's zero bin and rounding factors · 3b95a46c

Yaowu Xu authored 14 years ago

This patch changes a few numbers in the two constant arrays
for the quantizer's zerobin and rounding factors, in general so
that the two factors sum to 128 for any Q.  While
it might be beneficial to calibrate the two arrays for best
quantizer performance, that is not the purpose of this patch.
Normalizing the two arrays will enable quick optimization
of the current faster quantizer, i.e. the zerobin check can be
removed.

Change-Id: If9abfd7929bf4b8e9ecd64a79d817c6728c820bd

3b95a46c
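One plausible reading of why factors summing to 128 make the zerobin check redundant can be sketched in C. This sketch assumes the factors are Q7 fractions of the quantizer step (128 representing 1.0) and that the rounding term is the exact complement of the zero bin. The function names are hypothetical, and the sketch uses plain division rather than whatever fixed-point arithmetic the real quantizer uses.

```c
/* Dead-zone threshold from a Q7 zero-bin factor (128 would mean 1.0 * q). */
int zbin_for(int q, int zbin_factor)
{
    return (q * zbin_factor) >> 7;
}

/* Assumption: rounding is the exact complement, so zbin + round == q. */
int round_for(int q, int zbin_factor)
{
    return q - zbin_for(q, zbin_factor);
}

/* Quantize a non-negative magnitude with an explicit zero-bin check. */
int quantize_checked(int x, int q, int zbin_factor)
{
    if (x < zbin_for(q, zbin_factor))
        return 0;
    return (x + round_for(q, zbin_factor)) / q;
}

/* Same result without the check: x < zbin implies x + round < q. */
int quantize_unchecked(int x, int q, int zbin_factor)
{
    return (x + round_for(q, zbin_factor)) / q;
}
```

Under these assumptions any magnitude below the zero bin already divides down to zero, so both functions agree for every non-negative input.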

Add trellis quantization. · 8fa38096

Timothy B. Terriberry authored 14 years ago

Replace the exponential search for optimal rounding during
 quantization with a linear Viterbi trellis and enable it
 by default when using --best.
Right now this operates on top of the output of the adaptive
 zero-bin quantizer in vp8_regular_quantize_b() and gives a small
 gain.
It can be tested as a replacement for that quantizer by
 enabling the call to vp8_strict_quantize_b(), which uses
 normal rounding and no zero bin offset.
Ultimately, the quantizer will have to become a function of lambda
 in order to take advantage of activity masking, since there is
 limited ability to change the quantization factor itself.
However, currently vp8_strict_quantize_b() plus the trellis
 quantizer (which is lambda-dependent) loses to
 vp8_regular_quantize_b() alone (which is not) on my test clip.

Patch Set 3:

Fix an issue related to the cost evaluation of successor
states when a coefficient is reduced to zero. With this
issue fixed, now the trellis search almost exactly matches
the exponential search.

Patch Set 2:

Overall, the goal of this patch set is to make the "trellis"
search produce encodings that match the exponential
search version. There are three main differences between
Patch Set 2 and 1:
a. Patch set 1 did not properly account for the scale of
2nd order error, so patch set 2 disables it altogether
for 2nd blocks.
b. Patch set 1 was not consistent on when to enable the
quantization optimization. Patch set 2 restores the
condition to be consistent.
c. Patch set 1 checks quantized levels L-1 and L for any
input coefficient that was quantized to L. Patch set 2 limits
the candidate coefficients to those that were rounded up
to L. It is worth noting here that a strategy to check
L and L+1 for coefficients that were truncated down to L
might work.

(a and b get trellis quant to basically match the exponential
search on all mid/low rate encodings on cif set; without
a and b, trellis quant can hurt the psnr by 0.2 to 0.3 dB at
200kbps for some cif clips)
(c gets trellis quant to match the exponential search
at Q0 encoding; without c, trellis quant can be
1.5 to 2 dB lower for encodings with fixed Q at 0 on most
derf cif clips)

Change-Id: Ib1a043b665d75fbf00cb0257b7c18e90eebab95e

8fa38096

Aug 10, 2010

Added ssse3 version of sixtap filters · e4fe8669

Scott LaVarnway authored 14 years ago

Improved decoder performance by 9% for the clip used.

Change-Id: I8fc5609213b7bef10248372595dc85b29f9895b9

e4fe8669

First modification of multi-thread decoder · ba2e107d

Yunqing Wang authored 14 years ago

This is the first modification of the VP8 multi-thread decoder, which uses
the same threads to decode macroblocks and then do loopfiltering for each
frame.

Inspired by Rob Clark, synchronization was done on every 8 macroblocks
instead of every macroblock to reduce lock contention.

Compared with the original code, this implementation gave about a
15%-20% performance gain while decoding my test clips on a Core2 Quad
platform (Linux).

The work is not done yet.

Tests on other platforms are needed.

Change-Id: Ice9ddb0b511af1359b9f71e65066143c04fef3b5

ba2e107d

Aug 09, 2010

Mark loopfilter C functions as static · 618c7d27

John Koleszar authored 14 years ago

Clang defaults to C99 mode, and inline works differently in C99.
(gcc, on the other hand, defaults to a special gnu-style inlining,
which uses different syntax.)   Making the functions static makes sure
clang doesn't decide to discard a function because it's too large to
inline.

Thanks to eli.friedman for the patch.

Fixes http://code.google.com/p/webm/issues/detail?id=114

Change-Id: If3c1c3c176eb855a584a60007237283b0cc631a4

618c7d27
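The linkage issue behind 618c7d27 can be shown with a short C sketch. In C99, a function defined with plain `inline` (and no `extern` declaration) supplies only an inline definition; if the compiler declines to inline a call, it refers to an external definition that may not exist, and the link fails. Declaring the function `static inline` gives it internal linkage, so the translation unit always has a usable definition. The helper and caller names below are hypothetical.

```c
/* Hypothetical helper in the style of a loop-filter clamp. */
static inline signed char clamp_s8(int t)
{
    if (t < -128) return -128;
    if (t > 127) return 127;
    return (signed char)t;
}

/* The call resolves whether or not the compiler inlines clamp_s8. */
int filter_tap(int a, int b)
{
    return clamp_s8(a + b);
}
```

The cost is that each translation unit including the helper may carry its own copy when the function is not inlined.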

Aug 02, 2010

Merge "Issue 150: Fixing linker warning in extend.c." · cfb204ea
John Koleszar authored 14 years ago

cfb204ea

configure: support directories containing .o · 4e6827a0

John Koleszar authored 14 years ago

Fixes http://code.google.com/p/webm/issues/detail?id=96

The regex which postprocesses the gcc make-deps (-M) output was too
greedy and matched in the dependencies part of the rule rather than
the target only. The patch provided with the issue was not correct, as
it tried to match the .o at the end of the line, which isn't correct
at least for my GCC version. This patch matches word characters
instead of .*

Thanks to raimue and the MacPorts community for isolating this issue.

Change-Id: I28510da2252e03db910c017101d9db12e5945a27

4e6827a0

nasm: avoid space before the :data symbol type. · 0e8f108f

Jan Kratochvil authored 14 years ago

global label:data
           ^^

Provide nasm compatibility.  No binary change by this patch with yasm
on {x86_64,i686}-fedora13-linux-gnu.  A few longer opcodes with nasm on
{x86_64,i686}-fedora13-linux-gnu have been checked as safe.

Change-Id: I10f17eb1e4d4a718d4ebd1d0ccddc807c365e021

0e8f108f

nasm: end labels with colon (':') · 0327d3df

Jan Kratochvil authored 14 years ago

Labels should end with a colon (':'); nasm requires it.

Provide nasm compatibility.  No binary change by this patch with yasm
on {x86_64,i686}-fedora13-linux-gnu.  A few longer opcodes with nasm on
{x86_64,i686}-fedora13-linux-gnu have been checked as safe.

Change-Id: I0b2ec6f01afb061d92841887affb5ca0084f936f

0327d3df

nasm: use OWORD vs DQWORD · c8134bc5

Jan Kratochvil authored 14 years ago

nasm knows only OWORD.  yasm knows both OWORD and DQWORD.

Provide nasm compatibility.  No binary change by this patch with yasm on
{x86_64,i686}-fedora13-linux-gnu.  A few longer opcodes with nasm on
{x86_64,i686}-fedora13-linux-gnu have been checked as safe.

Change-Id: I62151390089e90df9a7667822fa594ac20b00e78

c8134bc5

Merge "Replace pinsrw (SSE) with MMX instructions" · 67529821
John Koleszar authored 14 years ago

67529821

Replace pinsrw (SSE) with MMX instructions · 7d243701

Philip Jägenstedt authored 14 years ago

Fixes http://code.google.com/p/webm/issues/detail?id=136

Change-Id: I5a3e294061644a1a9718e8ba4a39548ede25cc42

7d243701

Jul 29, 2010

apple: include proper mach primitives · 38a20e03

John Koleszar authored 14 years ago

Fixes implicit declaration warning for 'mach_task_self'. Patch
courtesy of timeless at gmail.com

Change-Id: I9991dedd1ccfddc092eca86705ecbc3b764b799d

38a20e03

Merge "Enable the switch between two versions of quantizer" · c2a8d8b5
Yaowu Xu authored 14 years ago

c2a8d8b5

Jul 28, 2010

Removed two unused global variables. · 062e6c18

Frank Galligan authored 14 years ago

Removed the global variables vp8_an and vp8_cd. vp8_an was causing problems
because it was increasing the .bss by 1572864 bytes.

Change-Id: I6c12e294133c7fb6e770c0e4536d8287a5720a87

062e6c18

Enable the switch between two versions of quantizer · f95c80b6

Yaowu Xu authored 14 years ago

To facilitate more testing related to the quantizer and rate
control, the old version of the quantizer is added back. The old
and new quantizers can be switched back and forth by defining or
undefining the macro "EXACT_QUANT".

Change-Id: Ia77e687622421550f10e9d65a9884128a79a65ff

f95c80b6
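A compile-time switch of the kind described in f95c80b6 might look like the following C sketch. Only the macro name EXACT_QUANT comes from the commit message. The two quantizer bodies are simplified placeholders, and which of them corresponds to the "old" or "new" quantizer is an assumption made here for illustration.

```c
#include <stdlib.h>

/* Placeholder "exact" path: plain integer division of the magnitude. */
int quantize_exact(int coeff, int q)
{
    int mag = abs(coeff) / q;
    return coeff < 0 ? -mag : mag;
}

/* Placeholder "fast" path: multiply by a 16-bit fixed-point reciprocal. */
int quantize_fast(int coeff, int q)
{
    int recip = (1 << 16) / q;
    int mag = (abs(coeff) * recip) >> 16;
    return coeff < 0 ? -mag : mag;
}

/* Build with -DEXACT_QUANT to select the exact path. */
#ifdef EXACT_QUANT
#define quantize_b quantize_exact
#else
#define quantize_b quantize_fast
#endif
```

Call sites use `quantize_b` throughout, so flipping the macro swaps the implementation without touching any caller.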

Jul 27, 2010

configure: pass original arguments through to make dist · 23d68a5f

John Koleszar authored 14 years ago

When running configure automatically through the make dist target,
reuse the arguments passed to the original configure command.

Change-Id: I40e5b8384d6485a565b91e6d2356d5bc9c4c5928

23d68a5f

Merge "msvs: fix install of codec sources" · aa82363c
John Koleszar authored 14 years ago

aa82363c

x86/sse2: disable asm quantizer · a570bbd4

Johann Koenig authored 14 years ago

follow up to Change I0e51492d: neon: disable asm quantizer

Now x86 doesn't segfault with --disable-runtime-cpu-detect and -p=2

Change-Id: I8ca127bb299198efebbcbd5a661e81788361933f

a570bbd4

Fix build w/o RTCD · b9a038a5

Johann Koenig authored 14 years ago

So many places to update ...

Change-Id: Ide957b40cc833f99c2d1849acade6850fbf7585d

b9a038a5