Commits · f496f601fb2e48390c822f89df60c6b2398026ab · Xiph.Org / aom-rav1e

Feb 12, 2013

Add tile column size limits (256 pixels min, 4096 pixels max). · f496f601

Ronald S. Bultje authored 12 years ago

This is after discussion with the hardware team. Update the unit test
to take these sizes into account. Split out some duplicate code into
a separate file so it can be shared.

Change-Id: I8311d11b0191d8bb37e8eb4ac962beb217e1bff5

f496f601
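
Note on f496f601: the commit message only states the limits (256 pixels minimum, 4096 pixels maximum per tile column). A minimal sketch of how such limits could bound the number of tile columns is given here, assuming power-of-two tile counts and an even split of the frame width. The function and constant names are hypothetical and are not taken from the patch.

```c
#include <assert.h>

/* Limits quoted in the commit message; the names are hypothetical. */
enum { MIN_TILE_WIDTH_PX = 256, MAX_TILE_WIDTH_PX = 4096 };

/* Smallest log2(tile columns) for which no tile exceeds the maximum width. */
static int min_log2_tile_cols(int frame_width) {
  int log2_cols = 0;
  assert(frame_width > 0);
  while ((MAX_TILE_WIDTH_PX << log2_cols) < frame_width) ++log2_cols;
  return log2_cols;
}

/* Largest log2(tile columns) for which every tile keeps the minimum width. */
static int max_log2_tile_cols(int frame_width) {
  int log2_cols = 0;
  assert(frame_width > 0);
  while ((frame_width >> (log2_cols + 1)) >= MIN_TILE_WIDTH_PX) ++log2_cols;
  return log2_cols;
}
```

Under these assumptions a 1920-pixel-wide frame may use 1, 2 or 4 tile columns, while a 352-pixel-wide frame is too narrow to be split at all.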

Clean up detokenize contextualization to be like tokenizer. · 491d0952

Ronald S. Bultje authored 12 years ago

Change-Id: I47174f797df2103da8913c6fb4f4e741817bae82

491d0952

Faster convolve8_avg. · 094e2572

Christian Duvivier authored 12 years ago

Implement convolve8_avg using common functions which are already optimized
instead of using more obscure ones which have only C versions. Encoder
overall speed-up of about 12%.

Change-Id: I8c57aa76936c8a48f22b115f19f61d9f2ae1e4b6

094e2572
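
Note on 094e2572: one plausible way to build an averaging convolve from already-optimized pieces is to run the plain convolve into a temporary buffer and then average that buffer into the destination. Whether the patch does exactly this is an assumption. The sketch covers only the averaging step, and `average_into` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Average a freshly filtered block into dst, rounding to nearest. */
static void average_into(const uint8_t *pred, ptrdiff_t pred_stride,
                         uint8_t *dst, ptrdiff_t dst_stride, int w, int h) {
  assert(w > 0 && h > 0);
  for (int y = 0; y < h; ++y) {
    for (int x = 0; x < w; ++x) {
      dst[x] = (uint8_t)((dst[x] + pred[x] + 1) >> 1);
    }
    pred += pred_stride;
    dst += dst_stride;
  }
}
```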

Feb 11, 2013

butterfly inverse 4x4 ADST · 57e995ff

Jingning Han authored 12 years ago

fixed format issues.

Implement the inverse 4x4 ADST using 9 multiplications. For this
particular dimension, the original ADST transform can be
factorized into simpler operations, hence is retained.

Change-Id: Ie5d9749942468df299ab74e90d92cd899569e960

57e995ff

Change rd thresholds and add speed trade off flags. · aec5bed3

Paul Wilkins authored 12 years ago

Experimental tweaks to various thresholds to measure
quality / speed trade off.

Add flag that allows static segmentation to be turned off
and disables it unless in the second pass of a two pass
encode.

Change-Id: I219702ffe858412a83db801cbbbd869924b8c61b

aec5bed3

Feb 09, 2013

Bug fix: ssse3 version of subpixel did not match C code · eda30b41

Scott LaVarnway authored 12 years ago

A 16 bit overflow condition occurs when using the EIGHTTAP_SMOOTH filters
(vp9_sub_pel_filters_8lp). Changed the order of the adds to fix this problem.
Also added ssse3 support for 4x4 subpixel filtering.

Change-Id: I475eaadae920794c2de5e01e9735c059a856518e

eda30b41
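
Note on eda30b41: the sketch below illustrates why the order of adds can matter when 16-bit partial sums are combined with saturating arithmetic, as the SSSE3 code path does. The partial products are made-up values chosen to trigger the effect; they are not the real intermediate values of vp9_sub_pel_filters_8lp, and the function names are hypothetical.

```c
#include <stdint.h>

/* Saturating signed 16-bit add, a scalar analogue of the paddsw instruction. */
static int16_t sat_add16(int16_t a, int16_t b) {
  int32_t s = (int32_t)a + (int32_t)b;
  if (s > INT16_MAX) s = INT16_MAX;
  if (s < INT16_MIN) s = INT16_MIN;
  return (int16_t)s;
}

/* Sum four partial products strictly left to right. */
static int16_t sum_in_order(const int16_t p[4]) {
  return sat_add16(sat_add16(sat_add16(p[0], p[1]), p[2]), p[3]);
}

/* Pair each large positive term with a negative one before combining. */
static int16_t sum_paired(const int16_t p[4]) {
  return sat_add16(sat_add16(p[0], p[2]), sat_add16(p[1], p[3]));
}
```

With partial products {20000, 15000, -10000, -5000} the exact sum is 20000. The left-to-right order saturates at the first add and ends at 17767, while the paired order stays in range and returns 20000, which is what plain C arithmetic in a wider type would produce.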

Port sadNxNx4d functions to x86inc.asm. · c0ce2ab3

Ronald S. Bultje authored 12 years ago

Change-Id: Ic639f5742f7a007753d7a3fa5c66235172eb31d8

c0ce2ab3

Add sad64x64 and sad32x32 SSE2 versions. · 02ff360b

Ronald S. Bultje authored 12 years ago

Also port the 4x4, 16x16, 8x16 and 16x8 versions to x86inc.asm; this
makes them all slightly faster, particularly on x86-64. Remove SSE3
sad16x16 version, since the SSE2 version is now faster.

About 1.5% overall encoding speedup.

Change-Id: Id4011a78cce7839f554b301d0800d5ca021af797

02ff360b
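
Note on the SAD commits (c0ce2ab3, 02ff360b): for reference, a plain C sketch of what a sum-of-absolute-differences function and its "x4d" variant compute. The SIMD versions are expected to return the same values. The names and the generic width/height parameters are illustrative assumptions, not the signatures used in the tree.

```c
#include <stdint.h>
#include <stdlib.h>

/* Sum of absolute differences between a w x h source and reference block. */
static unsigned int sad_wxh(const uint8_t *src, int src_stride,
                            const uint8_t *ref, int ref_stride, int w, int h) {
  unsigned int sad = 0;
  for (int y = 0; y < h; ++y) {
    for (int x = 0; x < w; ++x) sad += (unsigned int)abs(src[x] - ref[x]);
    src += src_stride;
    ref += ref_stride;
  }
  return sad;
}

/* "x4d" form: one source block compared against four candidate references. */
static void sad_wxh_x4d(const uint8_t *src, int src_stride,
                        const uint8_t *const ref[4], int ref_stride,
                        int w, int h, unsigned int sad_out[4]) {
  for (int i = 0; i < 4; ++i) {
    sad_out[i] = sad_wxh(src, src_stride, ref[i], ref_stride, w, h);
  }
}
```

Evaluating four candidates per call lets a SIMD implementation load the source block once and reuse it, which is a likely reason the x4d variants pay off in motion search.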

Make cost_coeffs() more efficient. · 639b863d

Ronald S. Bultje authored 12 years ago

Cache the constant offset in one variable to prevent re-loading that
in each loop iteration, and mark the function as inline so we can use
the fact that the transform size is always known in the caller.

Almost 1% faster encoding overall.

Change-Id: Id78325a60b025057d8f4ecd9003a74086ccbf85a

639b863d

Feb 08, 2013

Pass macroblock index to pick inter functions · 6125a1ed

John Koleszar authored 12 years ago

Pass the current mb row and column around rather than the
recon_yoffset and recon_uvoffset, since those offsets will
change from predictor to predictor, based on the reference
frame selection.

Change-Id: If3f9df059e00f5048ca729d3d083ff428e1859c1

6125a1ed
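
Note on 6125a1ed: a small sketch of why passing the macroblock row and column is more robust than passing precomputed offsets. The offset depends on the stride of whichever reference frame is selected, so it has to be derived per predictor. The sketch assumes 16x16 luma macroblocks with 4:2:0 chroma; the struct and function names are hypothetical.

```c
#include <stddef.h>

/* Per-reference plane strides; a hypothetical stand-in for the real buffer type. */
typedef struct {
  int y_stride;
  int uv_stride;
} RefFrameLayout;

/* Luma offset of macroblock (mb_row, mb_col) inside the given reference. */
static ptrdiff_t recon_y_offset(const RefFrameLayout *ref, int mb_row, int mb_col) {
  return (ptrdiff_t)mb_row * 16 * ref->y_stride + (ptrdiff_t)mb_col * 16;
}

/* Chroma offset of the same macroblock (8x8 per plane in 4:2:0). */
static ptrdiff_t recon_uv_offset(const RefFrameLayout *ref, int mb_row, int mb_col) {
  return (ptrdiff_t)mb_row * 8 * ref->uv_stride + (ptrdiff_t)mb_col * 8;
}
```

Two references with different strides yield different offsets for the same macroblock position, so a single cached pair of offsets cannot serve both.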

Initial support for resolution changes on P-frames · 393b4856

John Koleszar authored 12 years ago

Allows inter-frames to change resolution. Currently these are
almost equivalent to keyframes, as only intra prediction modes
are allowed, but without the other context resets that occur on
keyframes.

Change-Id: Icd1a2a5af0d9462cc792588427b0a1f5b12e40d3

393b4856

Avoid allocating memory when resizing frames · c03d45de

John Koleszar authored 12 years ago

As long as the new frame is smaller than the size that was originally
allocated, we don't need to free and reallocate the memory allocated.
Instead, do the allocation on the size of the first frame. We could
make this passed in from the application instead, if we wanted to
support external upscaling.

Change-Id: I204d17a130728bbd91155bb4bd863a99bb99b038

c03d45de
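
Note on c03d45de: a minimal sketch of the allocate-once idea, assuming a single plane with one byte per pixel. The `FrameBuffer` type and its functions are hypothetical. Rejecting frames larger than the first one is a simplification of this sketch; the commit message does not say how that case is handled.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical single-plane frame buffer sized by the first frame seen. */
typedef struct {
  uint8_t *data;
  size_t capacity;
  int width;
  int height;
} FrameBuffer;

/* Returns 0 on success, -1 on bad arguments, allocation failure, or growth. */
static int frame_buffer_resize(FrameBuffer *fb, int width, int height) {
  size_t needed;
  if (width <= 0 || height <= 0) return -1;
  needed = (size_t)width * (size_t)height;
  if (fb->data == NULL) {
    fb->data = malloc(needed); /* first frame: allocate once */
    if (fb->data == NULL) return -1;
    fb->capacity = needed;
  } else if (needed > fb->capacity) {
    return -1; /* growing past the first frame is out of scope here */
  }
  fb->width = width; /* smaller frames reuse the existing allocation */
  fb->height = height;
  return 0;
}

static void frame_buffer_free(FrameBuffer *fb) {
  free(fb->data);
  fb->data = NULL;
  fb->capacity = 0;
  fb->width = fb->height = 0;
}
```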

Adds a test for the VP8E_SET_SCALEMODE control · 88f99f4e

John Koleszar authored 12 years ago

Tests that the external interface to set the internal codec scaling
works as expected. Also updates the test to pull the height from
the decoded frame size rather than parsing the keyframe header,
in anticipation of allowing resolution changes on non-keyframes.

Change-Id: I3ed92117d8e5288fbbd1e7b618f2f233d0fe2c17

88f99f4e

Restore SSSE3 subpixel filters in new convolve framework · 29d47ac8

John Koleszar authored 12 years ago

This commit adds the 8 tap SSSE3 subpixel filters back into the code
underneath the convolve API. The C code is still called for 4x4
blocks, as well as compound prediction modes. This restores the
encode performance to be within about 8% of the baseline.

Change-Id: Ife0d81477075ae33c05b53c65003951efdc8b09c

29d47ac8

Integerization of dct32x32 · dbccffe2

Yunqing Wang authored 12 years ago

Test on derf set showed 0.047% overall psnr change.

Change-Id: Id16c276c251a3943850ac9b95e9b09a56cf42b19

dbccffe2

Nearest / Zero Mv default entropy tweak. · bbede82f

Paul Wilkins authored 12 years ago

Tweak to default mode context to account for the fact
that when there are no non zero motion candidates
Nearest is now the preferred mode for coding a 0,0
vector.

Also resolve duplicate function name and typos.

Change-Id: I76802788d46c84e3d1c771be216a537ab7b12817

bbede82f

Feb 07, 2013

move dct/idct constants to a header file · e6ad9ab0

Yaowu Xu authored 12 years ago

also removed some unused functions.

Change-Id: Ie363bcc8d94441d054137d2ef7c4fe59f56027e5

e6ad9ab0

Butterfly ADST based hybrid transform · d15e1da4

Jingning Han authored 12 years ago

Refactor the 8x8 inverse hybrid transform. It is now consistent
with the new inverse DCT. Overall performance loss (due to the
use of this variant ADST, and the rounding errors in the butterfly
implementation) for std-hd is -0.02.

Fixed BUILD warning.

Devise a variant of the original ADST, which allows butterfly
computation structure. This new transform has a kernel of the
form: sin(pi*(2k+1)*(2n+1) / (4N)). One of its butterfly structures
using floating-point multiplications was reported in Z. Wang,
"Fast algorithms for the discrete W transform and for the discrete
Fourier transform", IEEE Trans. on ASSP, 1984.

This patch includes the butterfly implementation of the inverse
ADST/DCT hybrid transform of dimension 8x8.

Change-Id: I3533cb715f749343a80b9087ce34b3e776d1581d

d15e1da4
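
Note on d15e1da4: written out, the kernel described in the message has the form below. The factor of pi in the sine argument and the omission of any normalization constant are assumptions based on the standard form of this transform family; the commit message itself gives only the product of the odd indices over 4N.

```latex
S_{k,n} = \sin\!\left(\frac{\pi\,(2k+1)(2n+1)}{4N}\right),
\qquad 0 \le k,\, n < N
```

For the 8x8 case in this patch, N = 8, so the argument is pi*(2k+1)*(2n+1)/32.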

Added skip switches for SB32 and SB64 · 29731308

Paul Wilkins authored 12 years ago

Added switches and code to skip/breakout from
doing SB32 and SB64 tests based on whether
the 16x16 MB tests used split modes. Also to
optionally skip 64x64 if 16x16 was chosen over
32x32.

Impact varies depending on clip from a few %
up to almost 50% on encode speed. Only the
split mode breakout is currently enabled.

Change-Id: Ib5836140b064b350ffa3057778ed2cadcc495cf8

29731308
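
Note on 29731308: the breakout logic described in the message can be sketched as two predicates. The flag and function names are hypothetical, and the real patch may structure the checks differently.

```c
/* Hypothetical speed flags mirroring the two switches in the commit message. */
typedef struct {
  int skip_sb_on_split;     /* skip SB32/SB64 tests if 16x16 used split modes */
  int skip_sb64_if_16_wins; /* skip 64x64 if 16x16 was chosen over 32x32 */
} SbSkipFlags;

/* Should the 32x32 superblock test run for this region? */
static int should_test_sb32(const SbSkipFlags *f, int mb16_used_split) {
  return !(f->skip_sb_on_split && mb16_used_split);
}

/* Should the 64x64 superblock test run for this region? */
static int should_test_sb64(const SbSkipFlags *f, int mb16_used_split,
                            int mb16_beat_sb32) {
  if (f->skip_sb_on_split && mb16_used_split) return 0;
  if (f->skip_sb64_if_16_wins && mb16_beat_sb32) return 0;
  return 1;
}
```

Since the message says only the split mode breakout is currently enabled, the corresponding configuration in this sketch would be `{1, 0}`.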

Use fdct8x4 instead of fdct4x4 where the block size allows it. · 5cfd82bc

Ronald S. Bultje authored 12 years ago

This allows for faster SIMD implementations in the future (currently
there is no speed impact).

Change-Id: I732647e9148b5dcb44e6bc8728138f0141218329

5cfd82bc

Use configure checks for various inline keywords. · aac73df1

Ronald S. Bultje authored 12 years ago

Change-Id: I8508f1a3d3430f998bb9295f849e88e626a52a24

aac73df1

Feb 06, 2013

Add sse2 versions of sub_pixel_variance{32x32,64x64}. · a788e0fe

Ronald S. Bultje authored 12 years ago

7.5% faster overall encoding.

Change-Id: Ie9bb7f9fdf93659eda106404cb342525df1ba02f

a788e0fe

Reindent segmentation code. · 55cafb61

Ronald S. Bultje authored 12 years ago

Indentation was off by 2 spaces for this particular block.

Change-Id: I1e587b7ad3eff77ade5521252d20c7bb2daa0f6d

55cafb61

Eliminate tautology · 31cbe2ed

John Koleszar authored 12 years ago

      Unreachable code
  that does nothing anyway
      removed forever.

Change-Id: I14105d2dd9dbc9d558f36464055e350dbeb45488

31cbe2ed

Fix mismatch after merge of the tiling patch. · 278df745

Ronald S. Bultje authored 12 years ago

Change-Id: I8ecc178b4d4069e721c7fec6d7631c00e4a3e5d5

278df745

Feb 05, 2013

[WIP] Add column-based tiling. · 1407bdc2

Ronald S. Bultje authored 12 years ago

This patch adds column-based tiling. The idea is to make each tile
independently decodable (after reading the common frame header) and
also independently encodable (minus within-frame cost adjustments in
the RD loop) to speed up hardware & software en/decoders if they use
multi-threading. Column-based tiling has the added advantage (over
other tiling methods) that it minimizes realtime use-case latency,
since all threads can start encoding data as soon as the first SB-row
worth of data is available to the encoder.

There is some test code that does random tile ordering in the decoder,
to confirm that each tile is indeed independently decodable from other
tiles in the same frame. At tile edges, all contexts assume default
values (i.e. 0, 0 motion vector, no coefficients, DC intra4x4 mode),
and motion vector search and ordering do not cross tiles in the same
frame.

Tile independence is not maintained between frames ATM, i.e. tile 0 of
frame 1 is free to use motion vectors that point into any tile of frame
0. We support 1 (i.e. no tiling), 2 or 4 column-tiles.

The loopfilter crosses tile boundaries. I discussed this briefly with Aki
and he says that's OK. An in-loop loopfilter would need to do some sync
between tile threads, but that shouldn't be a big issue.

Results: with tiling disabled, we go up slightly because of improved edge
use in the intra4x4 prediction. With 2 tiles, we lose about 1% on derf,
~0.35% on HD and ~0.55% on STD/HD. With 4 tiles, we lose another ~1.5%
on derf, ~0.77% on HD and ~0.85% on STD/HD. Most of this loss is
concentrated in the low-bitrate end of clips, and most of it is because
of the loss of edges at tile boundaries and the resulting loss of intra
predictors.

TODO:
- more tiles (perhaps allow row-based tiling also, and max. 8 tiles)?
- maybe optionally (for EC purposes), motion vectors themselves
  should not cross tile edges, or we should emulate such borders as
  if they were off-frame, to limit error propagation to within one
  tile only. This doesn't have to be the default behaviour but could
  be an optional bitstream flag.

Change-Id: I5951c3a0742a767b20bc9fb5af685d9892c2c96f

1407bdc2
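
Note on 1407bdc2: a sketch of how a frame could be split into 1, 2 or 4 column tiles of near-equal width. It works in units of 64-pixel superblock columns, which is an assumption (the patch may well count in macroblocks), and the function names are hypothetical.

```c
#include <assert.h>

/* Number of 64-pixel superblock columns needed to cover the frame width. */
static int sb64_cols(int frame_width) {
  assert(frame_width > 0);
  return (frame_width + 63) >> 6;
}

/*
 * First superblock column of tile `tile_idx` when the frame is split into
 * (1 << log2_tile_cols) column tiles. Passing tile_idx equal to the tile
 * count yields the end of the last tile.
 */
static int tile_col_start_sb(int tile_idx, int n_sb_cols, int log2_tile_cols) {
  assert(tile_idx >= 0 && tile_idx <= (1 << log2_tile_cols));
  return (tile_idx * n_sb_cols) >> log2_tile_cols;
}
```

For a 1920-pixel-wide frame (30 superblock columns) and 4 tiles, the tile boundaries under this scheme fall at superblock columns 0, 7, 15, 22 and 30.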

Add SSE3 versions for sad{32x32,64x64}x4d functions. · 58c983d1

Ronald S. Bultje authored 12 years ago

Overall encoding about 15% faster.

Change-Id: I176a775c704317509e32eee83739721804120ff2

58c983d1

Convert subpixel filters to use convolve framework · 7a07eea1

John Koleszar authored 12 years ago

Update the code to call the new convolution functions to do subpixel
prediction rather than the existing functions. Remove the old C and
assembly code, since it is unused. This causes a 50% performance
reduction on the decoder, but that will be resolved when the asm for
the new functions is available.

There is no consensus for whether 6-tap or 2-tap predictors will be
supported in the final codec, so these filters are implemented in
terms of the 8-tap code, so that quality testing of these modes
can continue. Implementing the lower complexity algorithms is a
simple exercise, should it be necessary.

This code produces slightly better results in the EIGHTTAP_SMOOTH
case, since the filter is now applied in only one direction when
the subpel motion is only in one direction. Like the previous code,
the filtering is skipped entirely on full-pel MVs. This combination
seems to give the best quality gains, but this may be indicative of a
bug in the encoder's filter selection, since the encoder could
achieve the result of skipping the filtering on full-pel by selecting
one of the other filters. This should be revisited.

Quality gains on derf positive on almost all clips. The only clip
that seemed to be hurt at all datarates was football
(-0.115% PSNR average, -0.587% min). Overall averages 0.375% PSNR,
0.347% SSIM.

Change-Id: I7d469716091b1d89b4b08adde5863999319d69ff

7a07eea1

Add 8-tap generic convolver · 5ca6a366

John Koleszar authored 12 years ago

This commit introduces a new convolution function which will be used to
replace the existing subpixel interpolation functions. It is much the
same as the existing functions, but allows for changing the filter
kernel on a per-pixel basis, and doesn't bake in knowledge of the
filter to be applied or the size of the resulting block into the
function name.

Replacing the existing subpel filters will come in a later commit.

Change-Id: Ic9a5615f2f456cb77f96741856fc650d6d78bb91

5ca6a366
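
Note on 5ca6a366: a one-dimensional sketch of a convolver in the spirit described above, where the kernel is looked up per output pixel from the fractional position and neither the filter nor the block size is baked into the function. The 1/16-pel step, the 7-bit kernel precision (taps summing to 128) and all names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

enum { SUBPEL_BITS = 4, SUBPEL_MASK = 15, TAPS = 8, FILTER_BITS = 7 };

static uint8_t clip_pixel(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/*
 * Filter one row of w output pixels. x0_q4 and x_step_q4 are positions in
 * 1/16-pel units; the kernel for each pixel is chosen by its fractional part.
 * src needs 3 readable samples before the first position and 4 after the last.
 */
static void convolve_row(const uint8_t *src, uint8_t *dst, int w,
                         const int16_t kernels[16][TAPS],
                         int x0_q4, int x_step_q4) {
  int x_q4 = x0_q4;
  assert(w > 0 && x0_q4 >= 0 && x_step_q4 > 0);
  for (int x = 0; x < w; ++x) {
    const uint8_t *s = &src[(x_q4 >> SUBPEL_BITS) - (TAPS / 2 - 1)];
    const int16_t *k = kernels[x_q4 & SUBPEL_MASK];
    int sum = 0;
    for (int t = 0; t < TAPS; ++t) sum += s[t] * k[t];
    /* Round, drop the kernel precision bits, and clamp to 8 bits. */
    dst[x] = clip_pixel((sum + (1 << (FILTER_BITS - 1))) >> FILTER_BITS);
    x_q4 += x_step_q4;
  }
}
```

A step of 16 gives ordinary unscaled prediction with a fixed fractional phase; other step values would change the kernel from pixel to pixel, which is the flexibility the commit message points to.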

rewrite 4x4 idct and fdct · fa36981e

Yaowu Xu authored 12 years ago

This commit changes the 4x4 iDCT to use same algorithm & constants as
other iDCTs. The 4x4 fDCT is also changed to be based on the new iDCT.

Change-Id: Ib1a902693228af903862e1f5a08078c36f2089b0

fa36981e

Change definition of NearestMV. · 81043e8d

Paul Wilkins authored 12 years ago

This commit makes the NearestMV match the chosen
best reference MV. It can be a 0,0 or non zero vector
which means the compound nearest mv mode can
combine a 0,0 and a non zero vector.

Change-Id: I2213d09996ae2916e53e6458d7d110350dcffd7a

81043e8d

Added vp9_short_idct1_32x32_c · 5780c4cb

Scott LaVarnway authored 12 years ago

and called this function in vp9_dequant_idct_add_32x32_c when
eob == 1.  For the test clip used, the decoder performance improved
by 21+%.  Based on Yaowu's 16 point idct work.

Change-Id: Ib579a90fed531d45777980e04bf0c9b23c093c43

5780c4cb
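
Note on 5780c4cb: when only the DC coefficient is non-zero (eob == 1), the 2-D inverse DCT produces the same value at every position, so the full transform can be replaced by computing one value and adding it to the whole block. The sketch below shows only that add step. The scaling of the DC coefficient is left out, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Add an already-scaled DC value to every pixel of a size x size block. */
static void add_dc_to_block(int dc, uint8_t *dst, int stride, int size) {
  assert(size > 0 && stride >= size);
  for (int y = 0; y < size; ++y) {
    for (int x = 0; x < size; ++x) {
      int v = dst[x] + dc;
      dst[x] = (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v)); /* clamp to 8 bits */
    }
    dst += stride;
  }
}
```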

Feb 04, 2013

Re-factor code for rd thresholds. · 3ab53876

Paul Wilkins authored 12 years ago

Separate out code to set the main encode speed
related rd thresholds. Some values changed from
the initial defaults for various new modes.

Quality test results pending but even the addition
of some further non-zero defaults helps encode speed
somewhat in limited testing on derf clips.

Adjustment of thresholds for quality / speed tradeoff
to follow.

Change-Id: I117ee473157e151a1b93193d5f393449328de20d

3ab53876

re-write 8 point idct · 1eb79dc1

Yaowu Xu authored 12 years ago

to be consistent with idct16 and idct32.

Change-Id: Ie89dbd32b65c33274b7fecb4b41160fcf1962204

1eb79dc1

a couple of minor fixes · ccaaeb4b

Yaowu Xu authored 12 years ago

fixed a function prototype to prevent compiler warnings;
removed a function not in use;
un-capitalized "Refstride" to ref_stride

Change-Id: Ib4472b6084f357d96328c6a06e795b6813a9edba

ccaaeb4b

Feb 01, 2013

Changes 16 point idct · 91e0e801

Yaowu Xu authored 12 years ago

This commit changes the inverse 16 point dct to use the same algorithm
as the one for 32 point idct. In fact, the 16 point inverse dct now uses
the exact source code of the even portion of the 32 point idct.

Tests showed the current implementation has significantly better accuracy
than the previous version. With this implementation and the minor bug
fix on forward 16 point dct, encoding tests showed about 0.2% better
compression on the CIF set; test results on the std-hd setting are pending.

Change-Id: I68224b60c816ba03434e9f08bee147c7e344fb63

91e0e801

Jan 31, 2013

fix a small bug in 16 point forward dct · ab1cad9b

Yaowu Xu authored 12 years ago

This commit fixes a minor error in the 16 point fdct wherein a rotation can
produce a result of -1 instead of 0.

Change-Id: I45aac4a52bcd06225c6d04e643547a13e1c1aade

ab1cad9b

A fix point implementation of 32x32 idct · 5149d7f7

Yaowu Xu authored 12 years ago

This commit changes the 32x32 idct to use integers only. The algorithm
was taken directly from "A Fast Computational Algorithm for the
Discrete Cosine Transform" by W. Chen, et al., which was published in
IEEE Transactions on Communications, Vol. COM-25, No. 9, 1977. The signal
flow graph in the original paper is for a 32 point forward dct; the
current implementation of the inverse DCT was done by following the graph
in the reverse direction.

With this implementation, the 32 point inverse dct contains a 16 point
inverse dct in its even portion, similarly the 16 point idct further
contains 8 point and 4 point inverse dcts.

As of patch 4, encoding tests showed there is no compression loss when
compared against the floating point baseline. Numbers even showed very
small positives (cif: .01%, std-hd: .05%).

Change-Id: I2d2d17a424b0b04b42422ef33ec53f5802b0f378

5149d7f7
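
Note on 5149d7f7: an integer-only inverse DCT of this kind replaces each floating-point cosine with a scaled integer constant and follows every multiply with a rounding shift. The sketch below shows the 4-point core that the message says sits at the bottom of the nesting. The 14-bit constants equal round(2^14 * cos(k*pi/64)) and follow the naming later used in libvpx; whether this commit used exactly these values is an assumption. The shift assumes arithmetic right shift for negative values, as common compilers provide.

```c
#include <stdint.h>

enum { DCT_CONST_BITS = 14 };

/* round(2^14 * cos(k * pi / 64)) for k = 8, 16, 24. */
static const int cospi_8_64 = 15137;
static const int cospi_16_64 = 11585;
static const int cospi_24_64 = 6270;

/* Divide by 2^14 with rounding to nearest. */
static int round_shift(int x) {
  return (x + (1 << (DCT_CONST_BITS - 1))) >> DCT_CONST_BITS;
}

/* 4-point inverse DCT: two fixed-point rotations followed by a butterfly. */
static void idct4(const int16_t in[4], int16_t out[4]) {
  const int e0 = round_shift((in[0] + in[2]) * cospi_16_64); /* even part */
  const int e1 = round_shift((in[0] - in[2]) * cospi_16_64);
  const int o0 = round_shift(in[1] * cospi_24_64 - in[3] * cospi_8_64); /* odd part */
  const int o1 = round_shift(in[1] * cospi_8_64 + in[3] * cospi_24_64);
  out[0] = (int16_t)(e0 + o1);
  out[1] = (int16_t)(e1 + o0);
  out[2] = (int16_t)(e1 - o0);
  out[3] = (int16_t)(e0 - o1);
}
```

In the nested structure the message describes, a larger transform would apply a core like this to its even-indexed inputs and add further rotation stages for the odd-indexed ones.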

Jan 30, 2013

don't code the branch for the predicted seg_id if that flag is false. · 3a4b18bc

Ronald S. Bultje authored 12 years ago

Change-Id: Icb6e21dc0c2d9918faa33c8bf70943660df7ad88

3a4b18bc

Default superblock skip flag to 32x32 for skip-blocks. · 3febf970

Ronald S. Bultje authored 12 years ago

This is identical to the later decisions made in encode_superblock().
This commit doesn't actually change anything, but makes the mbmi state
more consistent between the RD loop and the final encode result.

Change-Id: I9e735afb7c5a52e5b61728cb88c67ef9b9bf59be

3febf970