Commits · 86a74779deb4e69f890c5b4c983759b2ca03dc3e · Timothy B. Terriberry / rav1e

Mar 28, 2019

Return blocks by reference, not by value · 86a74779

Romain Vimont authored 5 years ago

The methods above_of(), left_of() and above_left_of() returned the
matching block by value, or a default block if the offset resulted in a
block outside boundaries.

The Block structure is quite big (std::mem::size_of::<Block>() == 120).
For reading a field, it is probably not optimal to return a whole Block
copy or a new default block (although the compiler might optimize such
accesses).

Moreover, the boundary checks were often redundant, because the callers
had already performed them.

Instead, let the callers check boundaries and return a reference to the
matching block.
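
As a minimal sketch of the change described above (not the actual rav1e code), the accessors can return `&Block` and leave the boundary check to the caller. The `Blocks` container, the two-field `Block`, `at_mut()` and `above_is_skipped()` are hypothetical stand-ins introduced for illustration; only the names `above_of()` and `left_of()` come from the commit message.

```rust
/// Hypothetical stand-in for rav1e's `Block`; the real struct is much larger.
#[derive(Clone, Default)]
pub struct Block {
    pub skip: bool,
    pub mode: u8,
}

/// Hypothetical row-major grid of blocks.
pub struct Blocks {
    cols: usize,
    rows: usize,
    data: Vec<Block>,
}

impl Blocks {
    pub fn new(cols: usize, rows: usize) -> Self {
        Blocks { cols, rows, data: vec![Block::default(); cols * rows] }
    }

    pub fn at_mut(&mut self, x: usize, y: usize) -> &mut Block {
        assert!(x < self.cols && y < self.rows);
        &mut self.data[y * self.cols + x]
    }

    /// Returns a reference to the block above (x, y).
    /// The caller must have checked that y > 0.
    pub fn above_of(&self, x: usize, y: usize) -> &Block {
        debug_assert!(y > 0);
        &self.data[(y - 1) * self.cols + x]
    }

    /// Returns a reference to the block left of (x, y).
    /// The caller must have checked that x > 0.
    pub fn left_of(&self, x: usize, y: usize) -> &Block {
        debug_assert!(x > 0);
        &self.data[y * self.cols + (x - 1)]
    }
}

/// Caller side: check the boundary once, then read one field through the
/// reference instead of copying a whole `Block`.
pub fn above_is_skipped(blocks: &Blocks, x: usize, y: usize) -> bool {
    y > 0 && blocks.above_of(x, y).skip
}
```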

86a74779

Simplify BlockContext logic · 18ad2b5e
Romain Vimont authored 5 years ago
```
Rewrite conditions to make them easier to read.
```
18ad2b5e

Create RefType enum. · db58658a

Thomas Daede authored 6 years ago

Moves all functions that previously used usize to this type.
Instead of direct conversions to a slot number, use a to_index fn.

This also changes the size of the global mv state and context
ref counting arrays as they don't need LAST_FRAME.
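
A rough sketch of the idea, not the actual rav1e definition: a typed enum replaces raw `usize` values, and a `to_index` function is the only place that maps a reference type to an array slot. The variant names, the `INTER_REFS` constant, the `Option` return type and `count_refs()` are all assumptions made for this example.

```rust
/// Hypothetical reference-frame kinds (names are assumptions).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum RefType {
    Intra,
    Last,
    Golden,
    AltRef,
}

/// Number of inter reference kinds in this sketch.
pub const INTER_REFS: usize = 3;

impl RefType {
    /// Maps an inter reference to a zero-based slot; intra has no slot.
    pub fn to_index(self) -> Option<usize> {
        match self {
            RefType::Intra => None,
            RefType::Last => Some(0),
            RefType::Golden => Some(1),
            RefType::AltRef => Some(2),
        }
    }
}

/// Counts how often each inter reference is used. The array needs only one
/// entry per inter reference because indexing goes through `to_index`.
pub fn count_refs(refs: &[RefType]) -> [u32; INTER_REFS] {
    let mut counts = [0u32; INTER_REFS];
    for r in refs {
        if let Some(i) = r.to_index() {
            counts[i] += 1;
        }
    }
    counts
}
```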

db58658a

Mar 27, 2019

RDO of transform size for intra block · 6f320b1c

Yushin Cho authored 5 years ago and Yushin Cho committed 5 years ago

Enabled for speed <= 2, i.e. when
fi.config.speed_settings.rdo_tx_decision == true.

AWCY result for speed 2

   PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
-1.3858 | -1.1944 | -0.8549 |  -0.9902 | -1.3141 | -0.9488 |    -1.1913

Encoding time increases by less than ~5%.

6f320b1c

Mar 26, 2019
- Fix deblocking code to deal with luma TX partitioning · 904cfa2d
  Monty Montgomery authored 5 years ago and Yushin Cho committed 5 years ago
  
  Subpartitioning the luma TX to sub-partition size broke the Chroma deblocking code. This patch implements proper luma/chroma tx size determination in the deblocking code.
  904cfa2d
- Disable tx partition (intra) for 422 chroma mode · c51dc9f6
  Yushin Cho authored 5 years ago and Yushin Cho committed 5 years ago
  
  c51dc9f6
- Revise rdo_tx_size_type() for tx_size decision · 6fb6ab1b
  Yushin Cho authored 5 years ago and Yushin Cho committed 5 years ago
  
  Also add asserts to check that there is only one tx block for chroma. These asserts will no longer be valid once the partition size is > 64x64 and the chroma format is not 420.
  6fb6ab1b
- Add tx_size write function for intra mode block and its required sub functions · d6e08b5f
  Yushin Cho authored 5 years ago and Yushin Cho committed 5 years ago
  
  Required sub functions include get_tx_size_context(), tx_size_to_depth(), bsize_to_max_depth() and bsize_to_tx_size_cat(). Also add the sub_tx_size_map[] table.
  d6e08b5f
- Add left and above tx_size contexts to BlockContext · df7c345b
  Yushin Cho authored 5 years ago and Yushin Cho committed 5 years ago
  
  Also add reset_left_tx_context() function to BlockContext.
  df7c345b
- Add tx_size_cdf to CDFContext and init its counters · 77cdf116
  Yushin Cho authored 5 years ago and Yushin Cho committed 5 years ago
  
  77cdf116
- Add tx_mode_select field under FrameInvariant and encode it to bitstream · cdf5356f
  Yushin Cho authored 5 years ago and Yushin Cho committed 5 years ago
  
  cdf5356f
- Add max_txsize_rect_lookup[] table, and rewrite largest_uv_tx_size() based on it · 701b1b11
  Yushin Cho authored 5 years ago and Yushin Cho committed 5 years ago
  
  701b1b11
- Move set_tx_size() to encode_block_b() · a74b0084
  Yushin Cho authored 5 years ago and Yushin Cho committed 5 years ago
  
  a74b0084
- Add checkpoint structure for BlockContext · 06204913
  Romain Vimont authored 5 years ago
  
  BlockContext::checkpoint() returned a whole copy of BlockContext, while only a subset of its fields was actually part of the checkpoint. The other, unused fields were filled with default values. Instead, use a structure dedicated to BlockContext checkpoints, containing only the relevant fields.
  06204913
- Add FrameBlocks structure · 65bd158b
  Romain Vimont authored 5 years ago
  
  In BlockContext, the blocks were stored as Vec<Vec<Block>>. Use a dedicated structure instead. This will make it possible to provide a tiled view (TileBlocks).
  65bd158b
- Avoid unneeded mutable borrows on BlockContext · 853a9752
  Romain Vimont authored 5 years ago
  
  BlockContext was borrowed but only read.
  853a9752
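
The dedicated checkpoint structure mentioned above for BlockContext could look roughly like the following. This is a sketch under assumptions: the field names (`above`, `left`, `scratch`), the type name `BlockContextCheckpoint` and the `rollback()` method are invented for illustration and are not taken from rav1e.

```rust
/// Hypothetical context: `above` and `left` are part of the checkpoint,
/// `scratch` is not.
pub struct BlockContext {
    pub above: Vec<u8>,
    pub left: Vec<u8>,
    pub scratch: Vec<u8>,
}

/// Holds only the fields that need to be saved and restored.
pub struct BlockContextCheckpoint {
    above: Vec<u8>,
    left: Vec<u8>,
}

impl BlockContext {
    pub fn new(len: usize) -> Self {
        BlockContext {
            above: vec![0; len],
            left: vec![0; len],
            scratch: vec![0; len],
        }
    }

    /// Copies only the relevant fields instead of the whole context.
    pub fn checkpoint(&self) -> BlockContextCheckpoint {
        BlockContextCheckpoint {
            above: self.above.clone(),
            left: self.left.clone(),
        }
    }

    /// Restores the saved fields; everything else is left untouched.
    pub fn rollback(&mut self, checkpoint: &BlockContextCheckpoint) {
        self.above.copy_from_slice(&checkpoint.above);
        self.left.copy_from_slice(&checkpoint.left);
    }
}
```

With this shape, the type system makes it explicit which fields belong to a checkpoint, and no placeholder default values are needed for the rest.
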
Mar 25, 2019
- Fast-path for aligned 16x16 in sad_sse2 · 15e0de25
  David Michael Barr authored 5 years ago
  
  Tidy up the loop as well.
  15e0de25
Mar 23, 2019
- Enable LTO on release and bench builds. · 4f3d5e51
  Thomas Daede authored 5 years ago and Luca Barbato committed 5 years ago
  
  This shows a pretty nice improvement of ~15% on encode block benchmarks.
  4f3d5e51
Mar 22, 2019

Remove unnecessary Arc in Option · 162f7243

Romain Vimont authored 5 years ago

Semantically, the functions want to receive either Some reference or
None. They don't care whether the actual object happens to be
atomically refcounted.

As a downside, the caller is required to do some conversions.
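
To illustrate the trade-off, here is a small sketch in which the callee takes `Option<&Frame>` and the caller converts from `Option<Arc<Frame>>`. The `Frame` type and both function names are hypothetical; the conversion uses the standard `Option::as_deref()`, which may differ from what the commit itself used.

```rust
use std::sync::Arc;

/// Hypothetical frame type standing in for whatever the real functions receive.
pub struct Frame {
    pub width: usize,
}

/// The callee only reads the frame, so it accepts `Option<&Frame>` and does
/// not need to know that the caller keeps it in an `Arc`.
pub fn width_or_default(frame: Option<&Frame>) -> usize {
    frame.map_or(0, |f| f.width)
}

/// Caller-side conversion from `Option<Arc<Frame>>` to `Option<&Frame>`.
pub fn caller(stored: &Option<Arc<Frame>>) -> usize {
    width_or_default(stored.as_deref())
}
```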

162f7243

Remove unnecessary Arc in parameters · 5669d76c

Romain Vimont authored 5 years ago

> Don't pass a smart pointer as a function parameter unless you want to
> use or manipulate the smart pointer itself, such as to share or
> transfer ownership.
                      (Herb Sutter)

<https://herbsutter.com/2013/06/05/gotw-91-solution-smart-pointer-parameters/>

5669d76c

Mar 21, 2019

Do not pass both (blk_w, blk_h) and BlockSize · 9317e7d2
Romain Vimont authored 5 years ago
```
This is redundant. For consistency, keep blk_w and blk_h.
```
9317e7d2
Fix associativity in build.sh desync check · ced2b342
Monty Montgomery authored 5 years ago and Thomas Daede committed 5 years ago

ced2b342

Make tmp_plane_opt internal to diamond_me_search() · 06c8bda1

Romain Vimont authored 5 years ago

An optional temporary plane is passed from callers to callees to avoid
reallocations. However, the callers of diamond_me_search() need not
provide this plane; they just need a flag to enable subpixel motion
estimation.

06c8bda1

Make temporary plane internal · 59a458f3

Romain Vimont authored 5 years ago

The temporary plane need not be provided by the caller of
sub_pixel_me() and telescopic_subpel_search().

Expose it only when doing so avoids reallocations between several
function calls.

59a458f3

Do not pass both BlockOffset and PlaneOffset · 537679d0

Romain Vimont authored 5 years ago

In motion estimation, several functions received both the offset
expressed in blocks and in pixels for the luma plane. This information
is redundant: a block offset is trivially convertible to a luma plane
offset.

With tiling, we need to manage both absolute offsets (relative to the
frame) and offsets relative to the current tile. This will be simpler
without duplication.
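
A minimal sketch of the conversion, assuming one block unit covers 4x4 luma pixels (a shift of 2). The constant, the field layout and the method name `to_luma_plane_offset()` are assumptions for this example rather than the actual rav1e API.

```rust
/// Assumption: one block unit covers 4x4 luma pixels, i.e. a shift of 2.
const BLOCK_TO_PLANE_SHIFT: usize = 2;

/// Offset expressed in blocks.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct BlockOffset {
    pub x: usize,
    pub y: usize,
}

/// Offset expressed in pixels of a plane.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct PlaneOffset {
    pub x: isize,
    pub y: isize,
}

impl BlockOffset {
    /// Derives the luma pixel offset, so functions only need the block offset.
    pub fn to_luma_plane_offset(self) -> PlaneOffset {
        PlaneOffset {
            x: (self.x << BLOCK_TO_PLANE_SHIFT) as isize,
            y: (self.y << BLOCK_TO_PLANE_SHIFT) as isize,
        }
    }
}
```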

537679d0

Set timeout for cargo kcov to 20 minutes. · bec54843
Thomas Daede authored 5 years ago and Luca Barbato committed 5 years ago

bec54843

Mar 20, 2019

Make PlaneOffset derive Copy · 93151ecc
Romain Vimont authored 5 years ago
```
Like previous commits did for BlockOffset and SuperBlockOffset.
```
93151ecc
Make SuperBlockOffset derive Copy · 32effc32
Romain Vimont authored 5 years ago
```
Like previous commit did for BlockOffset.
```
32effc32

Make BlockOffset derive Copy · 42d96a36

Romain Vimont authored 5 years ago

BlockOffset has a size of 128 bits (the same as a slice), and is
trivially copyable, so make it derive Copy.

Once it derives Copy, Clippy suggests never passing it by reference:
<https://rust-lang.github.io/rust-clippy/master/index.html#trivially_copy_pass_by_ref>

So pass it by value everywhere to simplify usage.

In particular, this avoids lifetime bounds where they are not necessary
(e.g. in get_sub_partitions()).

See <https://github.com/xiph/rav1e/pull/1126#issuecomment-474532123>.
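
As an illustration, a two-field offset deriving `Copy` can be passed by value with no lifetime annotations. The field layout and the signature of `get_sub_partitions()` below are assumptions for this sketch, not the real rav1e function; the size equals 128 bits only on targets where `usize` is 64 bits.

```rust
use std::mem::size_of;

/// Two `usize` fields: the same size as a slice reference.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct BlockOffset {
    pub x: usize,
    pub y: usize,
}

/// Size check matching the claim in the commit message.
pub fn same_size_as_slice() -> bool {
    size_of::<BlockOffset>() == size_of::<&[u8]>()
}

/// Hypothetical signature: takes the offset by value, so the returned array
/// carries no lifetime tied to the argument.
pub fn get_sub_partitions(bo: BlockOffset, half: usize) -> [BlockOffset; 4] {
    [
        bo,
        BlockOffset { x: bo.x + half, y: bo.y },
        BlockOffset { x: bo.x, y: bo.y + half },
        BlockOffset { x: bo.x + half, y: bo.y + half },
    ]
}
```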

42d96a36

estimate_motion_ss2: include it in the MotionEstimation trait · a3499ef0
Adrien Maglo authored 6 years ago

a3499ef0
Use diamond search for the half resolution motion estimation · 2c83dfd6
Adrien Maglo authored 6 years ago

2c83dfd6

diamond_me: save only selected frame motion vectors · fef34fda

Adrien Maglo authored 6 years ago

Save them by reference frame types instead of picture slot.
Do not add the zero motion vector to the predictor list more than once.

fef34fda

Mar 19, 2019

Enable the Clippy's len_zero lint (#1128) · 465ae7f9
Vladimir Kazakov authored 6 years ago and Josh Holmer committed 6 years ago
```
https://rust-lang.github.io/rust-clippy/master/index.html#len_zero
```
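
For context, the len_zero lint flags comparisons of `.len()` against zero and suggests `is_empty()` instead. The function below is a made-up example, not code from rav1e.

```rust
/// Returns true when there is something left to process.
pub fn has_work(queue: &[u32]) -> bool {
    // len_zero would flag `queue.len() > 0`; the suggested form is:
    !queue.is_empty()
}
```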
465ae7f9

Add struct FrameMotionVectors · 5638ae64

Romain Vimont authored 6 years ago

The motion vectors were stored in a Vec<Vec<MotionVector>>.

The innermost Vec contains a flattened matrix (fi.w_in_b x fi.h_in_b) of
MotionVectors, and there are REF_FRAMES instances of them (the outermost
Vec).

Introduce a typed structure to replace the innermost Vec:
 - this improves readability;
 - this allows exposing it as a 2D array, thanks to the Index and IndexMut
   traits;
 - this will allow splitting it into (non-overlapping) tiled views,
   containing only the motion vectors for a bounded region of the plane
   (see <https://github.com/xiph/rav1e/pull/1126>).
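
A sketch of how a flattened matrix can be exposed as a 2D array through `Index` and `IndexMut`. The field names, the `MotionVector` layout and the constructor are assumptions made for this example; the real structure may differ.

```rust
use std::ops::{Index, IndexMut};

/// Hypothetical motion vector layout.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct MotionVector {
    pub row: i16,
    pub col: i16,
}

/// Flattened (cols x rows) matrix of motion vectors for one reference frame.
pub struct FrameMotionVectors {
    mvs: Box<[MotionVector]>,
    pub cols: usize,
    pub rows: usize,
}

impl FrameMotionVectors {
    pub fn new(cols: usize, rows: usize) -> Self {
        Self {
            mvs: vec![MotionVector::default(); cols * rows].into_boxed_slice(),
            cols,
            rows,
        }
    }
}

/// Indexing by row returns a slice, so `fmv[y][x]` works like a 2D array.
impl Index<usize> for FrameMotionVectors {
    type Output = [MotionVector];

    fn index(&self, y: usize) -> &[MotionVector] {
        &self.mvs[y * self.cols..(y + 1) * self.cols]
    }
}

impl IndexMut<usize> for FrameMotionVectors {
    fn index_mut(&mut self, y: usize) -> &mut [MotionVector] {
        &mut self.mvs[y * self.cols..(y + 1) * self.cols]
    }
}
```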

5638ae64

Enable the Clippy's if_same_then_else lint · 8f07aebc
Vladimir Kazakov authored 6 years ago
```
https://rust-lang.github.io/rust-clippy/master/index.html#if_same_then_else
```
8f07aebc

Mar 18, 2019

Inline often called and almost-trivial functions (#1124) · 7a479a0c

David Michael Barr authored 6 years ago

* Inline constrain and msb for cdef_filter_block
  This reduces its average time by around 42%.
* Inline round_shift for pred_directional and others
  This reduces its average time by around 10%.
* Inline sgrproj_sum_finish to its various callers
  It is at the lowest level of a hot call graph and almost trivial.
* Inline get_mv_rate in motion estimation
  It is almost trivial and called often.
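
For illustration, an almost-trivial helper marked for inlining might look like this. The body of `round_shift()` and the `scale_row()` caller are assumptions for the sketch, and whether the commit used `#[inline]` or `#[inline(always)]` is not stated in the message.

```rust
/// Divides `value` by 2^bit, rounding to nearest (sketch of a small helper).
#[inline]
pub fn round_shift(value: i32, bit: usize) -> i32 {
    (value + (1 << bit >> 1)) >> bit
}

/// Hypothetical hot loop: with the hint, the call can be inlined into the
/// loop body instead of being emitted as a function call per element.
pub fn scale_row(row: &mut [i32], bit: usize) {
    for v in row.iter_mut() {
        *v = round_shift(*v, bit);
    }
}
```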

7a479a0c

Mar 16, 2019
- Enable the Clippy's manual_memcpy lint (#1122) · a6aedef7
  Vladimir Kazakov authored 6 years ago and Josh Holmer committed 6 years ago
  
  https://rust-lang.github.io/rust-clippy/master/index.html#manual_memcpy
  a6aedef7
- Enable prep_8tap assembly · 132e9027
  David Michael Barr authored 6 years ago
  
  132e9027
- Cast before left shift in native prep_8tap · d8017c92
  David Michael Barr authored 6 years ago
  
  d8017c92
Mar 15, 2019
- Disable prep_8tap assembly. · ffc99ed9
  Thomas Daede authored 6 years ago
  
  Temporarily fixes #1115.
  ffc99ed9