Commits · 385865f8202a8f273d6c9920134ddfbeb2b323da · Xiph.Org / aom-rav1e

Oct 25, 2010

Johann Koenig authored 14 years ago

clean up compiler warnings, man in the yellow hat warnings, and start to
remove unused #includes

Change-Id: I6267e98d9b3024b6fb1ef2732b29067a33cb96f6

385865f8

reuse common loopfilter code · 1376f061

Johann Koenig authored 14 years ago

there were four versions for the regular and
macroblock loopfilters:
horizontal [y|uv]
vertical [y|uv]

this moves all the common code into 2 functions:
vp8_loop_filter_neon
vp8_mbloop_filter_neon

this provides no gain in performance. there's a bit
of jitter, but it trends down ~0.25-0.5%. however,
this is a huge gain for maintenance. also, there is the
potential to drop some stack usage in the macroblock
loopfilter.

Change-Id: I91506f07d2f449631ff67ad6f1b3f3be63b81a92

1376f061

Add runtime CPU detection support for ARM. · b71962fd

Timothy B. Terriberry authored 14 years ago

The primary goal is to allow a binary to be built which supports
 NEON, but can fall back to non-NEON routines, since some Android
 devices do not have NEON, even if they are otherwise ARMv7 (e.g.,
 Tegra).
The configure-generated flags HAVE_ARMV7, etc., are used to decide
 which versions of each function to build, and when
 CONFIG_RUNTIME_CPU_DETECT is enabled, the correct version is chosen
 at run time.
In order for this to work, the CFLAGS must be set to something
 appropriate (e.g., without -mfpu=neon for ARMv7, and with
 appropriate -march and -mcpu for even earlier configurations), or
 the native C code will not be able to run.
The ASFLAGS must remain set for the most advanced instruction set
 required at build time, since the ARM assembler will refuse to emit
 them otherwise.
I have not attempted to make any changes to configure to do this
 automatically.
Doing so will probably require the addition of new configure options.

Many of the hooks for RTCD on ARM were already there, but a lot of
 the code had bit-rotted, and a good deal of the ARM-specific code
 is not integrated into the RTCD structs at all.
I did not try to resolve the latter, merely to add the minimal amount
 of protection around them to allow RTCD to work.
Those functions that were called based on an ifdef at the calling
 site were expanded to check the RTCD flags at that site, but they
 should be added to an RTCD struct somewhere in the future.
The functions invoked with global function pointers still are, but
 these should be moved into an RTCD struct for thread safety (I
 believe every platform currently supported has atomic pointer
 stores, but this is not guaranteed).

The encoder's boolhuff functions did not even have _c and armv7
 suffixes, and the correct version was resolved at link time.
The token packing functions did have appropriate suffixes, but the
 version was selected with a define, with no associated RTCD struct.
However, for both of these, the only armv7 instruction they actually
 used was rbit, and this was completely superfluous, so I reworked
 them to avoid it.
The only non-ARMv4 instruction remaining in them is clz, which is
 ARMv5 (not even ARMv5TE is required).
Considering that there are no ARM-specific configs which are not at
 least ARMv5TE, I did not try to detect these at runtime, and simply
 enable them for ARMv5 and above.

Finally, the NEON register saving code was completely non-reentrant,
 since it saved the registers to a global, static variable.
I moved the storage for this onto the stack.
A single binary built with this code was tested on an ARM11 (ARMv6)
 and a Cortex A8 (ARMv7 w/NEON), for both the encoder and decoder,
 and produced identical output, while using the correct accelerated
 functions on each.
I did not test on any earlier processors.

Change-Id: I45cbd63a614f4554c3b325c45d46c0806f009eaa

b71962fd
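
The run-time selection described in b71962fd can be illustrated with a minimal sketch. It assumes a per-instance table of function pointers that is filled in once from detected CPU flags. The flag names, `struct rtcd_table`, and `sad_neon_stub` are hypothetical and are not the identifiers used in the tree; the "NEON" routine is a C stand-in so that the sketch builds anywhere.

```c
#include <stddef.h>

/* Hypothetical capability bits; the real flag names may differ. */
enum { CPU_HAS_MEDIA = 1 << 0, CPU_HAS_NEON = 1 << 1 };

typedef unsigned int (*sad_fn)(const unsigned char *a,
                               const unsigned char *b, size_t n);

/* Portable C fallback: sum of absolute differences. */
unsigned int sad_c(const unsigned char *a, const unsigned char *b, size_t n) {
    unsigned int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (unsigned int)(a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]);
    return sum;
}

/* Stand-in for an accelerated routine; a real build would link assembly. */
unsigned int sad_neon_stub(const unsigned char *a, const unsigned char *b,
                           size_t n) {
    return sad_c(a, b, n);
}

/* Per-instance dispatch table, rather than global function pointers. */
struct rtcd_table {
    sad_fn sad;
};

/* Start from the C version, then upgrade if the CPU reports support. */
void rtcd_init(struct rtcd_table *t, int cpu_flags) {
    t->sad = sad_c;
    if (cpu_flags & CPU_HAS_NEON)
        t->sad = sad_neon_stub;
}
```

Keeping the pointers in a struct owned by each codec instance, instead of in globals, is the direction the commit message suggests for thread safety.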

isolate new temporal filtering code · e81e30c2

Johann Koenig authored 14 years ago

onyx_if is getting pretty big. split out the temporal code to make it
easier to look at.

Change-Id: I207c3a94c90e91b32e3ea5e1836a53b7a990fabd

e81e30c2

Oct 22, 2010

Merge "Improve handling of invalid frames." · 3b9e72b2
John Koleszar authored 14 years ago

Change-Id: Icef5226a70260607c190126c1c0cc28b796e759c

3b9e72b2

Improve handling of invalid frames. · 09bcc1f7

Timothy B. Terriberry authored 14 years ago

The code was not checking for frame sizes smaller than 3 bytes, and the
partition size checks might have failed if the input buffer was within
16MB of the top of the heap.
In addition, the reference count on the current frame buffer was not
being decremented on error, so after a small number of errors, no new
frame buffer could be found and it would run off the list of them.

Change-Id: I0c60dba6adb1e2a29df39754f72a56ab6c776b46

09bcc1f7
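
A sketch of the kind of check 09bcc1f7 describes, assuming the problem was a comparison of the form `ptr + size > end`, where the addition can wrap when the buffer sits near the top of the address space. The function names are hypothetical. The safe form compares the size against the bytes remaining and never forms an out-of-range pointer.

```c
#include <stddef.h>

/* A frame shorter than 3 bytes cannot be parsed, per the commit message. */
int frame_size_ok(size_t data_sz) {
    return data_sz >= 3;
}

/* Returns 1 if `size` bytes starting at `ptr` end at or before `end`.
 * Comparing against the remaining length avoids computing ptr + size,
 * which could overflow for a large declared partition size. */
int partition_fits(const unsigned char *ptr, const unsigned char *end,
                   size_t size) {
    return ptr <= end && size <= (size_t)(end - ptr);
}
```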

Convert [4][4] matrices to [16] arrays. · 8f75ea6b

Timothy B. Terriberry authored 14 years ago

Most of the code that actually uses these matrices indexes them as
 if they were a single contiguous array, and coverity produces
 reports about the resulting accesses that overflow the static
 bounds of the first row.
This is perfectly legal in C, but converting them to actual [16]
 arrays should eliminate the report, and removes a good deal of
 extraneous indexing and address operators from the code.

Change-Id: Ibda479e2232b3e51f9edf3b355b8640520fdbf23

8f75ea6b
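
To illustrate 8f75ea6b, here is a small sketch of the flat layout, assuming row-major order. The function names are hypothetical. With a `[16]` declaration, a linear walk over all coefficients stays within the declared bounds, and a (row, col) access becomes explicit index arithmetic.

```c
/* Linear walk over a 4x4 block stored as a flat array of 16 values. */
int block_sum(const short coeffs[16]) {
    int sum = 0;
    for (int i = 0; i < 16; i++)
        sum += coeffs[i];
    return sum;
}

/* Row-major access: element (row, col) lives at index row * 4 + col. */
short coeff_at(const short coeffs[16], int row, int col) {
    return coeffs[row * 4 + col];
}
```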

Oct 21, 2010

Change altref times to preceding pts+1. · 45e64941

Frank Galligan authored 14 years ago

Change the pts of the altref frame to be as close as possible to the
pts of the preceding frame and still be strictly increasing.

Change-Id: Iae3033a4c89ae5a9d0e5c4198e9196e5f3ee57c7

45e64941

Merge "Move firstpass motion map to stats packet" · 1ee3ebcd
John Koleszar authored 14 years ago

1ee3ebcd

Move firstpass motion map to stats packet · bb7dd5b1

John Koleszar authored 14 years ago

The first implementation of the firstpass motion map for motion
compensated temporal filtering created a file, fpmotionmap.stt,
in the current working directory. This was not safe for multiple
encoder instances. This patch merges this data into the first pass
stats packet interface, so that it is handled like the other
(numerical) firstpass stats.

The new stats packet is defined as follows:
    Numerical Stats (16 doubles) -- 128 bytes
    Motion Map                   -- 1 byte / Macroblock
    Padding                      -- to align packet to 8 bytes

The fpmotionmap.stt file can still be generated for debugging
purposes in the same way that the textual version of the stats
are available (defining OUTPUT_FPF in firstpass.c)

Change-Id: I083ffbfd95e7d6a42bb4039ba0e81f678c8183ca

bb7dd5b1
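
The packet layout in bb7dd5b1 implies a simple size calculation, sketched below. `stats_packet_size` and `NUM_STATS` are hypothetical names, and the sketch assumes an 8-byte `double`, matching the 128 bytes quoted for 16 doubles.

```c
#include <stddef.h>

#define NUM_STATS 16 /* numerical stats, stored as doubles */

/* Size of one first-pass stats packet: the numerical stats, then one
 * byte of motion map per macroblock, rounded up to a multiple of 8. */
size_t stats_packet_size(size_t num_macroblocks) {
    size_t size = NUM_STATS * sizeof(double) + num_macroblocks;
    return (size + 7) & ~(size_t)7;
}
```

For a 352x288 frame (22 x 18 = 396 macroblocks), this gives 128 + 396 = 524 bytes, padded to 528.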

Add MMWORD PTR/XMMWORD PTR in subtract_sse2.asm · 4cefb443

Yunqing Wang authored 14 years ago

Change-Id: Ia649b500ef020225d8bbf611799d0f47658dc2ac

4cefb443

Merge "Rewrite vp8_short_walsh4x4_sse2()" · 31752f2f
Yunqing Wang authored 14 years ago

31752f2f

Merge "Add SSE2 subtract functions" · 09187475
Yunqing Wang authored 14 years ago

09187475

Rewrite vp8_short_walsh4x4_sse2() · fc94ffce

Yunqing Wang authored 14 years ago

This rewrite reflects changes made in commit "Improve the
accuracy of forward walsh-hadamard transform". Since this function
is not called much, only a small encoder performance gain (~0.5%)
is seen.

Change-Id: Ie9df58a43028a11fd5b115c4bbe3141f7596578b

fc94ffce

Oct 20, 2010

Merge "Update arnr strength range from 1-6 to 0-6." · bdf469c9
John Koleszar authored 14 years ago

bdf469c9

Update arnr strength range from 1-6 to 0-6. · 15542721

Frank Galligan authored 14 years ago

Change-Id: I8eb49c56f7509f0a8074d440e8345b9e3344b85b

15542721

Oct 19, 2010

Merge "fixed a typo that mis-used Y plane stride for UV blocks." · fc2f8daf
Yaowu Xu authored 14 years ago

fc2f8daf

Merge "change to make use of more trellis quantization" · b9fe6d4d
Yaowu Xu authored 14 years ago

b9fe6d4d

Oct 18, 2010

Add SSE2 subtract functions · 4db20765

Yunqing Wang authored 14 years ago

Instead of doing 8-bit data unpack and 16-bit subtraction, use
psubb to do 16 8-bit subtractions and pcmpgtb to preserve the
sign information. This does not bring a noticeable gain since
these functions are not called frequently.

Change-Id: I90a0dfaa3db9d422e4ada324076596ffb178548e

4db20765
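
The idea in 4db20765 can be modelled one lane at a time in portable C. This is a scalar sketch, not the SSE2 code: `psubb` yields the difference modulo 256, and a byte compare supplies the sign, which becomes the high byte of the 16-bit result. The XOR with 0x80 before the compare is an assumption made here, because `pcmpgtb` compares signed bytes while pixel values are unsigned. The function names are hypothetical.

```c
#include <stdint.h>

/* Reinterpret a byte as a signed 8-bit value, as pcmpgtb does. */
int as_signed8(uint8_t x) {
    return x < 128 ? x : x - 256;
}

/* One lane of the byte-subtract approach: returns src - pred as a
 * sign-extended 16-bit quantity, without widening before subtracting. */
int sub_u8_to_s16(uint8_t src, uint8_t pred) {
    /* psubb: difference modulo 256 */
    uint8_t low = (uint8_t)(src - pred);
    /* Bias both operands so a signed compare orders them as unsigned. */
    int src_biased = as_signed8((uint8_t)(src ^ 0x80));
    int pred_biased = as_signed8((uint8_t)(pred ^ 0x80));
    /* pcmpgtb: all ones when pred > src, i.e. when the result is negative */
    uint8_t high = (pred_biased > src_biased) ? 0xFF : 0x00;
    /* Interleaving low and high bytes forms the 16-bit word. */
    unsigned word = ((unsigned)high << 8) | low;
    return word < 32768u ? (int)word : (int)word - 65536;
}
```

In the vector version, one subtract and one compare would cover 16 pixels, and the interleave step would correspond to byte unpack instructions.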

copy compiler warning fixes · ce1ce992

Johann Koenig authored 14 years ago

generic version got fixed, but not the arm version. fixes:
vp8/encoder/arm/mcomp_arm.c: In function 'vp8_full_search_sadx3':
vp8/encoder/arm/mcomp_arm.c:1208: warning: pointer targets in passing
argument 5 of 'fn_ptr->sdx3f' differ in signedness
vp8/encoder/arm/mcomp_arm.c:1208: note: expected 'unsigned int *' but
argument is of type 'int *'

and another unsigned change to keep the files similar

Change-Id: I1b6255dc3a03b90394a791ee0d15d8167d9454db

ce1ce992

Oct 15, 2010

remove dead code · 963bcd6c

Johann Koenig authored 14 years ago

vp8_diamond_search_sadx4 isn't used in arm because there is no
corresponding sdx4df as in x86. rather than keep it in sync with
../mcomp.c, delete it

vp8_hex_search had the original, more readable/understandable code if'd
out. it's also available in ../mcomp.c, so remove the dead copy

Change-Id: Ia42aa6e23b3a2e88040f467280befec091ec080e

963bcd6c

change to make use of more trellis quantization · 2e53e9e5

Yaowu Xu authored 14 years ago

when a subsequent frame is encoded as an alt reference frame, it is
unlikely that any mb in current frame will be used as reference for
future frames, so we can enable quantization optimization even when
the RD constant is slightly rate-biased. The change yields an overall
bit savings of 0.1% to 0.2% on the test sets, based on vpxssim scores.

Change-Id: I9aa7bc5cd573ea84e3ee655d2834c18c4460ceea

2e53e9e5

Oct 14, 2010

Merge "Fix one gcc compiler warning" · a2b598a2
Yunqing Wang authored 14 years ago

a2b598a2

Fix one gcc compiler warning · 7804befb

Yunqing Wang authored 14 years ago

../libvpx/vp8/encoder/bitstream.c: In function ‘pack_inter_mode_mvs’:
../libvpx/vp8/encoder/bitstream.c:1026: warning: array subscript has type ‘char’

Change-Id: Ic77491e0a172fa1821e5b3e914d0dc41fe87c00f

7804befb

Merge "Improve bounds checking in vp8_diamond_search_sadx4()" · 7f31d987
Yunqing Wang authored 14 years ago

7f31d987

Improve bounds checking in vp8_diamond_search_sadx4() · d6da7b8e

Yunqing Wang authored 14 years ago

In order to know if all 4/8 neighbor points are within the bounds,
4 bounds checks are enough, instead of checking 4 bounds for
each point (16/32 checks). This improvement reduces the cost of
vp8_diamond_search_sadx4() by 30%, and gives the encoder a 1.5%
performance gain (test options: 1 pass, good, speed=4).

Change-Id: Ie8da29d18a6ecfc9829e74ac02f6fa70e042331a

d6da7b8e
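
A sketch of the reduced check from d6da7b8e, assuming the search pattern is symmetric around the current center, so that its extent can be summarized by a maximum row and column offset. The struct and function names are hypothetical. If the extreme points in each direction are inside the search window, every point of the pattern is.

```c
/* Inclusive search window, in the same units as the motion vector. */
typedef struct {
    int row_min, row_max, col_min, col_max;
} search_bounds;

/* Four comparisons cover the whole pattern around (row, col), where
 * max_dr and max_dc are the largest row and column offsets it uses. */
int all_points_in_bounds(int row, int col, int max_dr, int max_dc,
                         const search_bounds *b) {
    return row - max_dr >= b->row_min && row + max_dr <= b->row_max &&
           col - max_dc >= b->col_min && col + max_dc <= b->col_max;
}
```

When this returns true, the per-point checks can be skipped and all candidates evaluated together; otherwise a slower per-point path is still needed.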

Fix compiler warning about vp8_fast_quantize_b_impl_ssse2. · 1dc0ca13

Fritz Koenig authored 14 years ago

Typo had function defined as _ssse2 and prototyped as _sse2.

Change-Id: If9f19da1a83cff40774a90cf936d601c0bf1b7fe

1dc0ca13

Oct 13, 2010

Correct QWORD usage in assembly files · 92df4a06

Fritz Koenig authored 14 years ago

QWORD was being undefined because it was being used
incorrectly.

Change-Id: I3610cefa3d6f0da4054316760f78b9694cde3876

92df4a06

Add processor detection for x86. · 0f5c63e4

Fritz Koenig authored 14 years ago

Use cpuid to check the vendor string against known
architectures.

Change-Id: I3fbd7f73638d71857a0c4a44a6275eb295fb4cef

0f5c63e4
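
A portable sketch of the vendor check in 0f5c63e4. It does not execute cpuid itself. It assumes the caller has already obtained the EBX, EDX and ECX values returned by cpuid leaf 0, which hold the 12-character vendor string in that register order. The enum and function names are hypothetical, and the set of vendors checked is only an example.

```c
#include <string.h>

typedef enum { VENDOR_UNKNOWN, VENDOR_INTEL, VENDOR_AMD } cpu_vendor;

/* Assemble the vendor string from the leaf-0 register values.
 * Each register holds four characters, lowest byte first. */
void vendor_string(unsigned ebx, unsigned edx, unsigned ecx, char out[13]) {
    const unsigned regs[3] = {ebx, edx, ecx};
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 4; j++)
            out[i * 4 + j] = (char)((regs[i] >> (8 * j)) & 0xFF);
    out[12] = '\0';
}

/* Compare against known vendor strings. */
cpu_vendor classify_vendor(const char *vendor) {
    if (strcmp(vendor, "GenuineIntel") == 0) return VENDOR_INTEL;
    if (strcmp(vendor, "AuthenticAMD") == 0) return VENDOR_AMD;
    return VENDOR_UNKNOWN;
}
```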

Oct 12, 2010

GCC inline restrictions were not adequate. · e50f5d40

Fritz Koenig authored 14 years ago

=r was not restrictive enough and the compiler was not returning
ebx correctly.

Change-Id: I7606e384067bd5fb69189802f1ff64ccc5aa02d6

e50f5d40

Centralize mb skip state calculation · 13685747

John Koleszar authored 14 years ago

This patch moves the scattered updates to the mb skip state
(mode_info_context->mbmi.mb_skip_coeff) to vp8_tokenize_mb. Recent
changes to the quantizer exposed a bug where if a macroblock
could be coded as a skip but isn't, the encoder would run the
loopfilter but the decoder wouldn't, causing a reference buffer
mismatch.

The loopfilter is controlled by a flag called dc_diff. The decoder
looks at the number of decoded coefficients when setting this flag.
The encoder sets this flag based on the skip state, since any
skippable macroblock should be transmitted as a skip. The coefficient
optimization pass (vp8_optimize_b()) could change the coefficients
such that a block that was not a skip becomes one. The encoder was
not updating the skip state in this situation for intra coded blocks.

The underlying issue predates it, but this bug was recently triggered
by enabling trellis quantization on the Y2 block in commit dcd29e36,
and by changing the quantizer range control in commit 305be4e4.

Change-Id: I5cce5da0dbc2d22f7d79ee48149f01e868a64802

13685747
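
The fix in 13685747 amounts to deriving the skip flag in one place, after coefficient optimization has run. The sketch below is deliberately simplified: it treats a block as empty when its end-of-block position is zero, whereas the real encoder's rule may have special cases. `mb_is_skippable` and the `eobs` array are hypothetical names.

```c
/* Returns 1 if no block in the macroblock has any coefficients left.
 * Called once, after quantization and any coefficient optimization,
 * so the flag always reflects what will actually be transmitted. */
int mb_is_skippable(const int *eobs, int num_blocks) {
    for (int i = 0; i < num_blocks; i++)
        if (eobs[i] > 0)
            return 0;
    return 1;
}
```

Computing the flag late means that a block zeroed by the optimization pass is reported as a skip, so the encoder and decoder make the same loopfilter decision.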

Merge "Add const qualifiers to variance/SAD functions." · acff1627
John Koleszar authored 14 years ago

acff1627

Add const qualifiers to variance/SAD functions. · f4a85944

Timothy B. Terriberry authored 14 years ago

These functions should never change their input, and there's no
 reason not to declare that.
This allows them to be passed static const data.

Change-Id: Ia49fe4b01e80e9afcb24b4844817694d4da5995c

f4a85944
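
A short sketch of what f4a85944 enables, using hypothetical names. Once the inputs are declared `const`, a read-only table can be passed directly, with no cast and no warning.

```c
#include <stddef.h>

/* Sum of absolute differences; neither input is modified. */
unsigned int sad_bytes(const unsigned char *src, const unsigned char *ref,
                       size_t n) {
    unsigned int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (unsigned int)(src[i] > ref[i] ? src[i] - ref[i]
                                              : ref[i] - src[i]);
    return sum;
}

/* Read-only reference data, which may be placed in read-only storage. */
static const unsigned char flat_gray[16] = {
    128, 128, 128, 128, 128, 128, 128, 128,
    128, 128, 128, 128, 128, 128, 128, 128
};

/* Passing static const data is valid because the parameters are const. */
unsigned int distance_from_gray(const unsigned char *block) {
    return sad_bytes(block, flat_gray, sizeof flat_gray);
}
```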

Merge "Move vp8_strict_quantize_b inside EXACT_QUANT #define." · 037345eb
John Koleszar authored 14 years ago

037345eb
Merge "Remove INTRARDOPT #define and intra_rd_opt option." · fc018e0d
John Koleszar authored 14 years ago

fc018e0d

Oct 11, 2010

Move vp8_strict_quantize_b inside EXACT_QUANT #define. · 82c43398

Timothy B. Terriberry authored 14 years ago

There is currently no inexact version of this function, so do not
 even compile it without EXACT_QUANT.
This will prevent someone from inadvertently trying to use it without
 the proper EXACT_QUANT setup.

Change-Id: Ia13491e0128afb281c05c9222ee5987101e4010d

82c43398

Remove INTRARDOPT #define and intra_rd_opt option. · dd08db93

Timothy B. Terriberry authored 14 years ago

This is just eliminating some cruft.
Although a number of variables are declared only when INTRARDOPT
 is defined, they are used elsewhere without that protection, and
 no longer just for intra RDO.
The intra_rd_opt flag was hard-coded to 1 and never checked.

Change-Id: I83a81554ecee8053e7b4ccd8aa04e18fa60f8e4f

dd08db93

Merge "Added vp8_fast_quantize_b_sse2" · 6b1b28a8
Scott LaVarnway authored 14 years ago

6b1b28a8
Merge "Remove ivfenc usage message leading underscores" · 4d2b178a
John Koleszar authored 14 years ago

4d2b178a

Remove ivfenc usage message leading underscores · 78f2d3ed

John Koleszar authored 14 years ago

An earlier automatic transform changed, e.g., '\nOptions' to '\n_options',
which is incorrect in these printfs. Fix these.

Change-Id: I7e0f37931ef82b79fadddd7058ce0df5572e2ca1

78f2d3ed