Commits · e55974bf86714e6403f68c010b89fbd62a4f35e5 · Xiph.Org / aom-rav1e

Nov 18, 2011

Speed selection support for disabled reference frames · e55974bf

John Koleszar authored Nov 18, 2011

There was an implicit reference frame test order (typically LAST,
GOLD, ARF) in the mode selection logic, but this doesn't provide the
expected results when some reference frames are disabled. For
instance, in real-time mode, the speed selection logic often disables
the ARF modes. So if the user disables the LAST and GOLD frames, the
encoder was always choosing INTRA, when in reality searching the ARF
in this case has the same speed penalty as searching LAST would have
had.

Instead, introduce the notion of a reference frame search order. This
patch preserves the former priorities, so if a frame is disabled, the
other frames bump up a slot to take its place. This patch lays the
groundwork for doing something smarter in the frame test order, for
example considering temporal distance or looking at the frames used by
nearby blocks.
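
A minimal sketch of the slot-compaction idea described above, not the code from this patch: enabled frames keep their former relative priority and fill the search order from the front. The enum, the bitmask convention, and `build_ref_search_order` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference-frame identifiers, listed in the former
 * implicit priority order (LAST, GOLD, ARF). */
enum ref_frame { REF_LAST = 0, REF_GOLD = 1, REF_ARF = 2, REF_COUNT = 3 };

/* Fill `order` with the enabled frames in priority order and return how
 * many slots were filled. Disabled frames are skipped, so the remaining
 * frames move up a slot. Bit i of `enabled_mask` enables frame i. */
static size_t build_ref_search_order(unsigned enabled_mask,
                                     enum ref_frame order[REF_COUNT])
{
    size_t n = 0;
    assert(order != NULL);
    for (int f = REF_LAST; f < REF_COUNT; ++f) {
        if (enabled_mask & (1u << f))
            order[n++] = (enum ref_frame)f;
    }
    return n;
}
```

With only the ARF enabled, the ARF lands in slot 0, so it is searched where LAST would otherwise have been.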

Change-Id: I1199149f8662a408537c653d2c021c7f1d29a700

e55974bf

Nov 11, 2011

avoid resetting framerate during vpx_codec_enc_config_set() · bdd35c13

John Koleszar authored Nov 11, 2011

The calculated frame_rate is a state variable in the codec, and
shouldn't be maintained in the configuration struct. Move it to the
main part of cpi so that it isn't clobbered when the configuration
struct is updated. The initial framerate estimate is moved from the
vp8_cx_iface.c wrapper into the body of init_config() in onyx_if.c, so
that it is only called once and not reset on every call to
vp8_change_config().

Change-Id: I8d9a3d1283330d1ee297d07e9d78d1f2875f2465

bdd35c13

Nov 08, 2011

Additional clipping of buffer level to maximum buffer size · fa25a31e

Adrian Grange authored Nov 07, 2011

Added additional check of buffer level against maximum
buffer size.

Change-Id: Iaf1fbaf008601161e402b43ce82c3dbc129bf740

fa25a31e

Added check to make sure maximum buffer size not exceeded · 9dc95b0a

Adrian Grange authored Nov 07, 2011

Added code to clip the buffer level to the maximum buffer
size. Without this the buffer level would increase
unchecked.

This bug was found when encoding an essentially static
scene at 2Mb/s. The encoder is unable to generate frames
consistent with the high data-rate because Q bottoms out
at Qmin.

As frames generated are consistently undersized the buffer
level increases and does not get checked against the
maximum size specified by the user (or default).
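
The clip can be illustrated with a small sketch. This is an assumption about the general shape of such an update, not the encoder's actual code; `update_buffer_level` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Credit the per-frame bit budget, debit the bits actually spent, then
 * clip the result to the configured maximum. All names are illustrative. */
static int64_t update_buffer_level(int64_t level, int64_t per_frame_bits,
                                   int64_t frame_bits, int64_t max_level)
{
    assert(max_level >= 0);
    level += per_frame_bits - frame_bits;
    if (level > max_level)
        level = max_level;   /* the clip this commit describes */
    return level;
}
```

Without the final clip, a run of undersized frames would push the level past the maximum with nothing to stop it.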

Change-Id: Id8a3c6323d3246da50f7cb53ddbf78b5528032c6

9dc95b0a

Oct 20, 2011

Fix: check cx_data buffer prior to write · bc715113

James Berry authored Oct 12, 2011

Check that the cx_data buffer has enough room before writing to it.
The prior behavior did not, which could result in a crash.
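
A generic sketch of a capacity check before a write, assuming a simple byte buffer with a fill counter. `checked_append` is a hypothetical helper and does not reflect how cx_data is managed in the encoder.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append `len` bytes to `buf` only if they fit. Returns 0 on success and
 * -1 (leaving the buffer untouched) when there is not enough room. */
static int checked_append(unsigned char *buf, size_t cap, size_t *used,
                          const unsigned char *src, size_t len)
{
    assert(buf && used && src && *used <= cap);
    if (len > cap - *used)
        return -1;
    memcpy(buf + *used, src, len);
    *used += len;
    return 0;
}
```

Comparing `len` against `cap - *used` rather than testing `*used + len > cap` avoids an overflow in the check itself.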

Change-Id: I3fab6f2bc4a96d7c675ea81acd39ece121738b28

bc715113

Oct 11, 2011

Added rate-targeted temporal scalability · 217591fd

Adrian Grange authored Oct 06, 2011

Added the ability to create rate-targeted, temporally
scalable, VP8 compatible bitstreams.

The application vp8_scalable_patterns.c demonstrates how
to use this capability. Users can create output bitstreams
containing up to 5 temporally separable streams encoded
as a single VP8 bitstream.
(previously abandoned as:
I92d1483e887adb274d07ce9e567e4d0314881b0a)

Change-Id: I156250a3fe930be57c069d508c41b6a7a4ea8d6a

217591fd

Reset FPU state after calc_plane_error() · 07ba4119

John Koleszar authored Oct 11, 2011

Fixes an MMX/SSE2 mismatch when building with --enable-internal-stats.

Change-Id: I0c50a1f246f6916b7a5fc6f36864ceb362f25520

07ba4119

Sep 30, 2011

CQ and two pass rate control. · b6e27d5f

Paul Wilkins authored Sep 15, 2011

Changes to the selection of Q limits for two pass
and two pass CQ mode.

Allowance made for Mode and motion vector costs.
Some refactoring of common code.

For Derf and YT sets CQ mode average improvement
circa 1% (SSIM and Global PSNR).

Some increased tendency to undershoot even when
user CQ not reached.

Patch2: Removed some test code accidentally merged.

Change-Id: Icf74d13af77437c08602571dc7a97e747cce5066

b6e27d5f

Sep 29, 2011

Multithreaded encoder, late sync loopfilter · 380d64ec

Attila Nagy authored Sep 16, 2011

Sync with the loopfilter thread only at the beginning of the next frame's
encoding. This returns control to the application faster and allows better
multicore scaling. When PSNR packets are generated, the final filtered frame
is needed immediately, so we cannot delay the sync.

Change-Id: I288d97b5e331d41d6f5bb49d97986fa12ac6f066

380d64ec

Aug 25, 2011

Minor modification on key frame decision · 1f20202e

Yunqing Wang authored Aug 25, 2011

This change makes sure that key frames are not recoded in real-time mode,
even if CONFIG_REALTIME_ONLY is not configured.

Change-Id: Ifc34141f3217a6bb63cc087d78b111fadb35eec2

1f20202e

Aug 19, 2011

Copy less when active map is in use · 4e8d35a4

Alpha Lam authored Aug 09, 2011

When an active map is specified and the current frame is not a key frame,
golden frame, or altref frame, copy only the active regions.

This significantly reduces encoding time, by as much as 19% on the test
system where realtime encoding is used. This is particularly useful
when the frame size is large (e.g. 2560x1600) and there are only a few
active macroblocks.
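
A rough sketch of a per-macroblock copy driven by an active map, assuming a single plane, 16x16 macroblocks, and one map byte per macroblock in raster order. `copy_active_mbs` and its layout assumptions are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MB_SIZE 16  /* macroblock width and height in luma pixels */

/* Copy only macroblocks flagged active (non-zero in `active_map`, one
 * entry per macroblock in raster order) from `src` to `dst`. Both planes
 * share `stride` and cover mb_cols x mb_rows whole macroblocks. */
static void copy_active_mbs(unsigned char *dst, const unsigned char *src,
                            size_t stride, size_t mb_rows, size_t mb_cols,
                            const unsigned char *active_map)
{
    assert(dst && src && active_map && stride >= mb_cols * MB_SIZE);
    for (size_t r = 0; r < mb_rows; ++r) {
        for (size_t c = 0; c < mb_cols; ++c) {
            if (!active_map[r * mb_cols + c])
                continue;  /* inactive: leave dst as it is */
            size_t off = r * MB_SIZE * stride + c * MB_SIZE;
            for (size_t y = 0; y < MB_SIZE; ++y)
                memcpy(dst + off + y * stride, src + off + y * stride,
                       MB_SIZE);
        }
    }
}
```

The work done scales with the number of active macroblocks rather than with the frame size, which is why large, mostly static frames benefit most.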

Change-Id: If394a813ec2df5a0201745d1348dbde4278f7ad4

4e8d35a4

Aug 12, 2011

Revert "Improved 1-pass CBR rate control" · e9613170

John Koleszar authored Aug 12, 2011

This reverts commit b5ea2fbc. Further
testing showed noticeable keyframe popping in some cases. Reverting this
for now to give time for a proper fix.

Conflicts:

	vp8/encoder/onyx_if.c
	vp8/encoder/ratectrl.c

Change-Id: I159f53d1bf0e24c035754ab3ded8ccfd58fd04af

e9613170

Aug 03, 2011

Fix source buffer selection · 238dae86

John Koleszar authored Aug 03, 2011

This patch fixes a bug in the interaction between the recode loop and
spatial resampling. If the codec was in a spatial resampling state,
and a subsequent iteration of the recode loop disables resampling,
then the source buffer must be reset to the unscaled source.

Change-Id: I4e4cd47b943f6cd26a47449dc7f4255b38e27c77

238dae86

Aug 01, 2011

Fix building with --disable-postproc · 06c3d5bb

John Koleszar authored Aug 01, 2011

Change-Id: I7e6bc28e7974a376da747300744e0dd5dc1d21e9

06c3d5bb

Jul 26, 2011

cosmetics: consistently use [u]int64_t · b45065d3

James Zern authored Jul 25, 2011

Removes mixed usage of (unsigned) long long and INT64.
Fixes Issue #208.

Change-Id: I220d3ed5ce4bb1280cd38bb3715f208ce23cf83a

b45065d3

Jul 22, 2011

fix sharpness bug and clean up · a04ed0e8

Johann Koenig authored Jul 20, 2011

sharpness was not recalculated in vp8cx_pick_filter_level_fast

remove last_filter_type. all values are calculated, don't need to update
the lfi data when it changes.

always use cm->sharpness_level. the extra indirection was annoying.

don't track last frame_type or sharpness_level manually. frame type
only matters for motion search and sharpness_level is taken care of in
frame_init

move function declarations to their proper header

Change-Id: I7ef037bd4bf8cf5e37d2d36bd03b5e22a2ad91db

a04ed0e8

Preload reference area to an intermediate buffer in sub-pixel motion search · 20bd1446

Yunqing Wang authored Jun 28, 2011

In sub-pixel motion search, the search range is small (+/- 3 pixels).
Preload the whole search area from the reference buffer into a 32-byte
aligned buffer. Then, during the search, load reference data from this
buffer instead. This keeps data in cache and reduces the cache-line
crossing penalty. For the tulip clip, tests on an Intel Core2 Quad machine
(Linux) showed encoder speed improvement:
  3.4%   at --rt --cpu-used=-4
  2.8%   at --rt --cpu-used=-3
  2.3%   at --rt --cpu-used=-2
  2.2%   at --rt --cpu-used=-1

Test on an Atom notebook showed only 1.1% speed improvement (speed=-4).
Test on a Xeon machine also showed less improvement, since unaligned
data access latency is greatly reduced in newer cores.

Next, I will apply a similar idea to the other 2 sub-pixel search functions
for encoding speed > 4.

Make this change exclusively for x86 platforms.

Change-Id: Ia7bb9f56169eac0f01009fe2b2f2ab5b61d2eb2f

20bd1446

Jul 20, 2011

Increase chroma row alignment to 16 bytes. · 7d1b37cd

Timothy B. Terriberry authored Jul 20, 2011

This is done by expanding luma row to 32-byte alignment, since
 there is currently a bunch of code that assumes that
 uv_stride == y_stride/2 (see, for example, vp8/common/postproc.c,
 common/reconinter.c, common/arm/neon/recon16x16mb_neon.asm,
 encoder/temporal_filter.c, and possibly others; I haven't done a
 full audit).
It also replaces the hardcoded border of 16 in a number of
 encoder buffers with VP8BORDERINPIXELS (currently 32), as the
 chroma rows start at an offset of border/2.
Together, these two changes have the nice advantage that simply
 dumping the frame memory as a contiguous blob produces a valid,
 if padded, image.
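
The arithmetic behind the alignment can be sketched as follows. This assumes the luma stride is the bordered width rounded up to 32 bytes. The helper names are hypothetical, and the real allocation code may compute the stride differently.

```c
#include <assert.h>

/* Round `v` up to the next multiple of `a`, where `a` is a power of two. */
static int align_up(int v, int a)
{
    assert(a > 0 && (a & (a - 1)) == 0);
    return (v + a - 1) & ~(a - 1);
}

/* Luma stride: padded width rounded up to 32 bytes. */
static int luma_stride(int width, int border)
{
    return align_up(width + 2 * border, 32);
}

/* Chroma stride under the uv_stride == y_stride / 2 assumption. */
static int chroma_stride(int width, int border)
{
    return luma_stride(width, border) / 2;
}
```

Because any multiple of 32 halves to a multiple of 16, aligning luma to 32 bytes gives 16-byte chroma alignment without breaking code that assumes uv_stride == y_stride/2.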

Change-Id: Iaf5ea722ae5c82d5daa50f6e2dade9de753f1003

7d1b37cd

Jul 18, 2011

Improved 1-pass CBR rate control · b5ea2fbc

John Koleszar authored Jun 29, 2011

This patch attempts to improve the handling of CBR streams with
respect to the short term buffering requirements. The "buffer level"
is changed to be an average over the rc buffer, rather than a long
running average. Overshoot is also tracked over the same interval
and the golden frame targets suppressed accordingly to correct for
overly aggressive boosting.

Testing shows that this is fairly consistently positive in one
metric or another -- some clips that show significant decreases
in quality have better buffering characteristics, others show
improvements in both.

Change-Id: I924c89aa9bdb210271f2e03311e63de3f1f8f920

b5ea2fbc

Jul 14, 2011

Remove unused speed features · 04dce631

John Koleszar authored Jul 14, 2011

min_fs_radius, max_fs_radius, full_freq were set but never read.

Change-Id: I82657f4e7f2ba2acc3cbc3faa5ec0de5b9c6ec74

04dce631

Jul 13, 2011

Add improvements made in good-quality mode to real-time mode · 0e9a6ed7

Yunqing Wang authored Jul 13, 2011

Several improvements we made in good-quality mode can be added
into real-time mode to speed up encoding in speed 1, 2, and 3
with small quality loss. Tests using tulip clip showed:

--rt --cpu-used=-1
(before change)
PSNR: 38.028
time: 1m33.195s
(after change)
PSNR: 38.014
time: 1m20.851s

--rt --cpu-used=-2
(before change)
PSNR: 37.773
time: 0m57.650s
(after change)
PSNR: 37.759
time: 0m54.594s

--rt --cpu-used=-3
(before change)
PSNR: 37.392
time: 0m42.865s
(after change)
PSNR: 37.375
time: 0m41.949s

Change-Id: I76ab2a38d72bc5efc91f6fe20d332c472f6510c9

0e9a6ed7

Jul 08, 2011

New loop filter interface · 62295844

Attila Nagy authored Jun 10, 2011

Separate simple filter with reduced no. of parameters.
MB filter level picking based on precalculated table. Level table updated for
each frame. Inside and edge limits precalculated and updated just when
sharpness changes. HEV threshold is constant.
ARM targets use scalars and others vectors.

Change works only with --target=generic-gnu
All other targets have to be updated!

Change-Id: I6b73aca6b525075b20129a371699b2561bd4d51c

62295844

Jul 07, 2011

Set VPX_FRAME_IS_DROPPABLE · 37de0b8b

John Koleszar authored Jul 07, 2011

Allow the encoder to inform the application that the encoded frame will not
be used as a reference.

Change-Id: I90e41962325ef73d44da03327deb340d6f7f4860

37de0b8b

Jun 29, 2011

Change to arf boost calculation. · 11694aab

Paul Wilkins authored Jun 28, 2011

In this commit I have added an experimental function
that tests prediction quality either side of a central position
to calculate a suggested boost number for an ARF frame.

The function is passed an offset from the current position and
a number of frames to search forwards and backwards.
It returns a forward, backward and compound boost number.

The new code can be deactivated using #define NEW_BOOST 0

In its current default state the code searches forwards and backwards
from the proposed position of the next alt ref.

The old code used a boost number calculated by scanning forward
from the previous GF up to the proposed alt ref frame position.

I have also added some code to try and prevent placement of a gf/arf
where there is a brief flash.

Change-Id: I98af789a5181148659f10dd5dd2ff2d4250cd51c

11694aab

Jun 28, 2011

Adding support for independent partitions · 4cb0ebe5

Stefan Holmer authored Jun 10, 2011

Adding support in the encoder for generating
independent residual partitions by forcing
equal probabilities over the prev coef entropy
contexts.

Change-Id: I402f5c353255f3ca20eae2620af739f6a498cd21

4cb0ebe5

Jun 23, 2011

Revert "Reduce overshoot in 1 pass rate control" · db67dcba

John Koleszar authored Jun 23, 2011

This reverts commit 212f6183.

Further testing shows that the overshoot accumulation/damping is too
aggressive on some clips. Allowing the accumulated overshoot to
decay and limiting the damping to golden frames shows some promise.
But some clips show significant overshoot in the buffer window, so
I think this still needs work.

Change-Id: Ic02a9ca34f55229f9cc04786f4fab54cdc1a3ef5

db67dcba

Jun 03, 2011

Reduce overshoot in 1 pass rate control · 212f6183

John Koleszar authored May 03, 2011

This patch attempts to reduce the peak bitrate hit by the encoder
when using small buffer windows.

Tested on the CIF set over 200-500kbps using these settings:

  --buf-sz=500 --buf-initial-sz=250 --buf-optimal-sz=250 \
  --undershoot-pct=100

Two pass encodes were tested at best quality. One pass encodes were
tested only at realtime speed 4:

  --rt --cpu-used=-4

The peak datarate (over the specified 500ms window) was measured
for each encode, and averaged together to get a metric for
"average peak," computed as SUM(peak)/SUM(target). This patch
reduces the average peak datarate as follows:

  One pass:
    baseline:   1.29715
    this patch: 1.23664

  Two pass:
    baseline:   1.32702
    this patch: 1.37824
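
For reference, the "average peak" metric defined above, SUM(peak)/SUM(target), can be computed as in this sketch. The function name and the array-based interface are illustrative assumptions, not part of the patch or its test scripts.

```c
#include <assert.h>
#include <stddef.h>

/* Average peak ratio over n encodes: SUM(peak) / SUM(target). */
static double average_peak(const double *peak, const double *target, size_t n)
{
    double peak_sum = 0.0, target_sum = 0.0;
    assert(peak && target && n > 0);
    for (size_t i = 0; i < n; ++i) {
        peak_sum += peak[i];
        target_sum += target[i];
    }
    assert(target_sum > 0.0);
    return peak_sum / target_sum;
}
```

Summing before dividing weights each encode by its target rate, so high-bitrate encodes contribute more than they would in a plain mean of per-encode ratios.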

This change had a positive effect on our quality metrics as well:

  One pass CBR:
                    Min  / Mean / Max (pct)
    Average PSNR    -0.42 / 2.86 / 27.32
    Overall PSNR    -0.90 / 2.00 / 17.27
    SSIM            -0.05 / 3.95 / 37.46

  Two pass CBR:
                    Min  / Mean / Max (pct)
    Average PSNR    -4.47 / 4.35 / 35.99
    Overall PSNR    -3.40 / 4.18 / 36.46
    SSIM            -4.56 / 6.98 / 53.67

  One pass VBR:
                    Min  / Mean / Max (pct)
    Average PSNR    -5.21 /  0.01 / 3.30
    Overall PSNR    -8.10 / -0.38 / 1.21
    SSIM            -7.38 / -0.11 / 3.17
    (note: most values here were close to the mean, there were a few
     outliers on files that were very sensitive to golden frame size)

  Two pass VBR:
                    Min  / Mean / Max (pct)
    Average PSNR    0.00 / 0.00 / 0.00
    Overall PSNR    0.00 / 0.00 / 0.00
    SSIM            0.00 / 0.00 / 0.00

Neither one pass nor two pass CBR mode adheres particularly strictly
to the short term buffer constraints, and two pass is less
consistent, even in the baseline commit. This should be addressed
in a later commit. This likely will hurt the quality numbers, as it
will have to reduce the burstiness of golden frames.

Aside: My work on this commit makes it clear that we need to make
rate control modes "pluggable", where you can easily write a new
one or work on one in isolation.

Change-Id: I1ea9a48f2beedd59891f1288aabf7064956b4716

212f6183

Jun 01, 2011

Fix code under #if CONFIG_INTERNAL_STATS. · 34ba1876

Ronald S. Bultje authored Jun 01, 2011

Change-Id: Iccbd78d91c3071b16fb3b2911523a22092652ecd

34ba1876

neon fast quantize block pair · 61f0c090

Tero Rintaluoma authored May 09, 2011

vp8_fast_quantize_b_pair_neon function added to quantize
two adjacent blocks at the same time to improve performance.
 - Additional 3-6% speedup compared to neon optimized fast
   quantizer (Tanya VGA@30fps, 1Mbps stream, cpu-used=-5..-16)

Change-Id: I3fcbf141e5d05e9118c38ca37310458afbabaa4e

61f0c090

May 31, 2011

Initialize first_time_stamp_ever · 0a72f568

John Koleszar authored May 31, 2011

Misplaced #endif caused first_time_stamp_ever to only be initialized if
CONFIG_INTERNAL_STATS was set.

Change-Id: I2296a4ab00f7dfb767583edcc5d59b94f48c0621

0a72f568

May 27, 2011

bug fix check frame buffer index before copy · 8795b525

James Berry authored May 27, 2011

In onyx_if.c update_reference_frames(), make
sure that frame buffer indexes are not equal
before performing a buffer copy. If two frames
share the same buffer, the flags will already be
set correctly.
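
A toy sketch of the guard, assuming frames live in a small indexed pool. `struct frame_pool` and `copy_frame_if_distinct` are hypothetical and are not the data structures used by update_reference_frames().

```c
#include <assert.h>
#include <string.h>

#define FRAME_BYTES 16  /* toy frame size for the sketch */

struct frame_pool {
    unsigned char frames[4][FRAME_BYTES];
};

/* Copy frame `src_idx` into `dst_idx`, skipping the copy when both
 * indexes name the same buffer. Returns 1 if data was copied. */
static int copy_frame_if_distinct(struct frame_pool *pool,
                                  int dst_idx, int src_idx)
{
    assert(pool && dst_idx >= 0 && dst_idx < 4 && src_idx >= 0 && src_idx < 4);
    if (dst_idx == src_idx)
        return 0;  /* same buffer: nothing to copy */
    memcpy(pool->frames[dst_idx], pool->frames[src_idx], FRAME_BYTES);
    return 1;
}
```

Skipping the copy also avoids calling memcpy with overlapping source and destination, which is undefined behavior in C.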

Change-Id: Ida9b5516d08e3435c90f131d2dc19d842cfb536e

8795b525

Use hex search for realtime mode speed>4 · 4d052bdd

Yunqing Wang authored May 27, 2011

Tests showed that using hex search in realtime mode greatly speeds up
the encoding process while still achieving quality similar to the
diamond search we have. Therefore, the diamond search option was
removed.

Change-Id: I975767d0ec0539f9f6ed7fdfc09506e39761b66c

4d052bdd

May 20, 2011

disable trellis optimization for first pass · d5b8f786

Yaowu Xu authored May 19, 2011

also remove 2 #defines and 1 function declaration that are not in use.

Change-Id: I8f743d0e3dd9ebf1de24a8b0c30ff09f29b00c53

d5b8f786

May 19, 2011

bug fix active_worst_quality set below active_best_quality · caa1b28b

James Berry authored May 19, 2011

fixed a bug where active_worst_quality could be set
below active_best_quality which could result in an
infinite loop.
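
One way to restore the invariant is sketched below, under the assumption that raising the worst bound is acceptable. The commit does not say which bound was adjusted, and `struct q_range` and `fix_q_range` are hypothetical names.

```c
#include <assert.h>

struct q_range { int best; int worst; };

/* Restore the invariant best <= worst by raising `worst` when needed. */
static struct q_range fix_q_range(struct q_range q)
{
    if (q.worst < q.best)
        q.worst = q.best;
    assert(q.best <= q.worst);  /* postcondition: range is never inverted */
    return q;
}
```

A search that narrows the range until best meets worst can only terminate if the range is never inverted, which is why the invariant matters here.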

Change-Id: I93c229c3bc5bff2a82b4c33f41f8acf4dd194039

caa1b28b

cleanup: collect twopass variables · 63cb1a7c

John Koleszar authored May 19, 2011

This patch collects the twopass specific members of VP8_COMP into a
dedicated struct. This is a first step towards isolating the two pass
rate control and aids readability by decorating these variables with
the 'twopass.' namespace. This makes it clear to the reader in what
contexts the variable will be valid, and is a hint that a section of
code might be a good candidate to move to firstpass.c in later
refactoring. There likely will be other rate control modes that need
their own specific data as well.

This notation is probably overly verbose in firstpass.c, so an
alternative would be to access this struct through a pointer like
'rc->' instead of 'cpi->firstpass.' in that file. Feel free to make
a review comment to that effect if you prefer.

Change-Id: I0ab8254647cb4b493a77c16b5d236d0d4a94ca4d

63cb1a7c

Remove unused members of VP8_COMP · 04849772

John Koleszar authored May 19, 2011

Various members that were either completely unreferenced or written
and not read.

Change-Id: Ie41ebac0ff0364a76f287586e4fe09a68907806e

04849772

Move quantizer init functions to quantize.c · 87254e0b

John Koleszar authored May 19, 2011

Group related functions together.

Change-Id: I92fd779225b75a7204650f1decb713142c655d71

87254e0b

May 13, 2011

Restructure of activity masking code. · ff52bf36

Paul Wilkins authored May 12, 2011

This commit restructures the mb activity masking code
to better facilitate experimentation using different metrics
etc. and also allows for adjustment of the zero bin either
for encode only or both the encode and mode selection
stages

It also uses information from the current frame rather than
the previous frame and the default strength has been
reduced.

Change-Id: Id39b19eace37574dc429f25aae810c203709629b

ff52bf36

May 12, 2011

Improve framerate adaptation · 5ed116e2

John Koleszar authored May 12, 2011

This patch improves the accuracy of frame rate estimation by using a
larger, 1 second window. It also more quickly adapts to step changes
in the input frame rate (e.g. 30fps to 15fps).

Change-Id: I39e48a8f5ac880b4c4b2ebd81049259b81a0218e

5ed116e2

Removed mv_bits_sadcost · 71a7501b

Scott LaVarnway authored May 12, 2011

This sad cost is being generated but never used.

Change-Id: I562eebdcb792b743770954feca365b5b37491ecd

71a7501b