Commits · master · Vibhoothi / Demo

Jun 19, 2020

clear transient status · 99114995
Dayanne Fernandes authored 4 years ago

add elapsed time · 27a48fcc
Dayanne Fernandes authored 4 years ago

rav1e_js: Workaround BigUint64Array safari browser · 425c5ca9
Takahiro authored 4 years ago
```
Co-authored-by: Urhengulas <johann.hemmann@code.berlin>
```
rav1e_js: Add missing license · 61166e6c
Vibhoothi authored 4 years ago


Urhengulas authored 4 years ago

* expose Encoder, EncoderConfig, Frame, Packet to JavaScript
    * implement most config methods
    * supply docs for structs and methods
    * simple error-handling for various methods
* move simple_Encoding to tests/utils.rs

90c07031

v_frame: Prepare for jsapi v0.1 · bfff1757

Urhengulas authored 4 years ago

* add wasm_bindgen as dependency
* annotate pixel::ChromaSampling with #[wasm_bindgen]


rav1e: Prepare for jsapi v0.1 · da155623

Urhengulas authored 4 years ago

* add wasm_bindgen as dependency
* annotate with #[wasm_bindgen]
    * encoder::Tune
    * api::color::ChromaSamplePosition
    * api::color::PixelRange
* impl Debug for Context


Decoders version bump (#2323) · d7d1a2e9

Luca Barbato authored 4 years ago


* Bump aom-sys to 0.2.1
* Bump dav1d-sys to 0.3.2
* Update the aom tests accordingly
* Update the CI to match the new dependencies

Co-authored-by: Vibhoothi <vibhoothiiaanand@gmail.com>
Co-authored-by: Luni-4 <luni-4@hotmail.it>


Jun 18, 2020

Bump AV-Metrics to 0.5.1 · 3b63cd09
Vibhoothi authored 4 years ago


Use incremental builds for the release profile · 02c1ff0a

Kyle Siefring authored 4 years ago

Dramatically improves the speed of rebuilding with the release profile
(i.e. rebuilds that do not start from scratch). Little to no difference
in runtime on awcy.

https://beta.arewecompressedyet.com/?job=codegen-units-incremental%402020-06-17T17%3A26%3A41.471Z&job=codegen-units%402020-06-17T14%3A13%3A14.505Z

before:
  release profile: full build time
    real    0m51.344s
    user    6m47.160s
  release profile: touch src/lib.rs
    real    0m18.483s
    user    1m29.580s

after:
  release profile: full build time
    real    0m48.754s
    user    6m53.001s
  release profile: touch src/lib.rs
    real    0m3.177s
    user    0m2.793s

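Working from the `real` (wall-clock) timings quoted above, the incremental rebuild after touching src/lib.rs becomes roughly 5.8 times faster, while the full build time barely changes (its `user` CPU time actually rises slightly, from 6m47.160s to 6m53.001s):

$$
\frac{18.483\,\mathrm{s}}{3.177\,\mathrm{s}} \approx 5.8,
\qquad
\frac{51.344\,\mathrm{s}}{48.754\,\mathrm{s}} \approx 1.05
$$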

Remove limit on codegen units for release profile · aaec49ba

Kyle Siefring authored 4 years ago

Decreases compile time on AWCY from ~4 mins to ~2.25 mins. Increases run
time by roughly 4%, and by about 10 seconds on the most intense clip/qp
(MINECRAFT @ 80).

Testing also shows that codegen-units=1 actually causes a substantial
performance decrease when iterative compilation is enabled.


Unbreak the CI deploy · 51afb318
Luca Barbato authored 4 years ago
```
Fixes #2407
```

Jun 17, 2020

Remove the crate-type specifier · c944c52b
Luca Barbato authored 4 years ago
```
It was added to work around a limitation of the old cargo-c.
```

Remove lto from the release profile · 35e904e0

Kyle Siefring authored 4 years ago

The performance difference has shrunk substantially after adding
#[inline] to a number of functions. The performance difference with or
without lto is about 4 seconds on the slowest clip/qp on a standard awcy
run (MINECRAFT, objective-1-fast, qp 80). The compile time difference is
42 seconds.

If it's possible to have good performance with lto off, then optimizing
without it will provide a good development profile. Going forward, it
would be advisable to use lto to check for performance problems. It
may be possible to create a separate profile between dev and release.
This would require https://github.com/rust-lang/cargo/issues/6988, and
AWCY would need to be modified to use the new profile while still
working with old versions of rav1e.


Jun 16, 2020
- Make some inlined fns const. · b14cbc73
  Kyle Siefring authored 4 years ago
  
  The previous commit made a large number of functions inline. Make the functions that we can const.
- Inline various code to improve non-lto performance · 90fa2d54
  Kyle Siefring authored 4 years ago
  
  Improves non-lto performance by 15-20%
- Remove unneeded parentheses · 610fa562
  Luca Barbato authored 4 years ago
  
- Use the patch syntax in Cargo.toml · 92111aa5
  Luca Barbato authored 4 years ago
  
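The "Make some inlined fns const" and "Inline various code" entries combine two attributes on small helpers. A minimal Rust sketch, using a hypothetical `rounded_shift` helper (not copied from rav1e): `#[inline]` lets the compiler inline the body across crate boundaries even without LTO, and `const fn` additionally allows the same function to be evaluated at compile time.

```rust
// Hypothetical helper for illustration; not taken from rav1e.
// #[inline]: callers in other crates may inline the body without LTO.
// const fn: the same code can also run at compile time.
#[inline]
pub const fn rounded_shift(value: i32, bit: u32) -> i32 {
    if bit == 0 {
        value
    } else {
        // Add half of the divisor before shifting: rounds to nearest,
        // with ties going toward +infinity.
        (value + (1 << (bit - 1))) >> bit
    }
}

// Because the function is const, it can initialize constants and tables.
pub const HALF: i32 = rounded_shift(5, 1);
pub const TABLE: [i32; 3] = [rounded_shift(7, 1), rounded_shift(8, 2), rounded_shift(-3, 1)];
```

The function is still an ordinary function at run time, so making it `const` costs nothing for existing callers.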
Jun 15, 2020

Correctly chunk sse calc in importance block sizes · 0680e52a

Kyle Siefring authored 4 years ago

A previous patch tried to do this, but missed the 4x4 case on subsampled
frames. Roughly 2% speedup on the default speed level.

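A minimal Rust sketch of computing SSE in fixed-size chunks per plane, assuming 8x8 luma importance blocks (an assumption; the commit does not state the size). The point of the fix is that on a subsampled (4:2:0) chroma plane the chunk must shrink with the plane's decimation, to 4x4 in this example. `sse_per_chunk` and `chunk_size` are hypothetical names, not rav1e's API, and the sketch ignores partial chunks at plane edges.

```rust
/// Sum of squared errors for each `chunk_w` x `chunk_h` block of a plane.
/// Both planes are row-major with the same `stride`; partial chunks at the
/// right/bottom edges are skipped in this sketch.
fn sse_per_chunk(
    src: &[u8],
    dst: &[u8],
    stride: usize,
    width: usize,
    height: usize,
    chunk_w: usize,
    chunk_h: usize,
) -> Vec<u64> {
    let cols = width / chunk_w;
    let rows = height / chunk_h;
    let mut out = Vec::with_capacity(cols * rows);
    for by in 0..rows {
        for bx in 0..cols {
            let mut sse = 0u64;
            for y in by * chunk_h..(by + 1) * chunk_h {
                for x in bx * chunk_w..(bx + 1) * chunk_w {
                    let d = src[y * stride + x] as i64 - dst[y * stride + x] as i64;
                    sse += (d * d) as u64;
                }
            }
            out.push(sse);
        }
    }
    out
}

/// Chunk size for a plane: the (assumed) luma importance block size shifted
/// by the plane's horizontal/vertical decimation (0 for luma, 1 for 4:2:0 chroma).
fn chunk_size(imp_block: usize, xdec: usize, ydec: usize) -> (usize, usize) {
    (imp_block >> xdec, imp_block >> ydec)
}
```

With 8x8 luma chunks, a 4:2:0 chroma plane gets 4x4 chunks, so each chroma chunk still covers the same picture area as its luma chunk.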

Jun 14, 2020
- README: drop appveyor badge · db69fb4f
  Riccardo Magliocchetti authored 4 years ago
  
  As appveyor integration has been removed
- README: Remove Windows builds sections · 5de9af0d
  Christopher Degawa authored 4 years ago
  
  Since appveyor is no longer used, these sections no longer apply.
  
  Signed-off-by: Christopher Degawa <ccom@randomderp.com>
Jun 12, 2020
- Pin the nasm-rs version · 132be5cf
  Luca Barbato authored 4 years ago
  
  CI has a version of nasm that is not compatible with nasm-rs 0.1.8.
- Add more information on rust target-cpu=native support · 151863ea
  Luca Barbato authored 4 years ago
  
Jun 11, 2020

Setup CI: wasm-pack build, pack, test · c1432f0a

Urhengulas authored 4 years ago

Change:
* test in multiple browsers (firefox and chrome)
* test in nodejs (node version set explicitly)


Add rav1e_js as subcrate · 47a5b460

Urhengulas authored 4 years ago

Changes:
* also add an example website
* make alert() catchable
* let caller define # of frames to encode
* move alert to wrapper function
* add license headers
* add tests


Jun 10, 2020

CI: Run ignored dav1d decode tests on aarch64 · 1bf36a46
David Michael Barr authored 4 years ago


Integrate aarch64 assembly: src/arm/64/mc16.S · 96350855

David Michael Barr authored 4 years ago

Integrated:
* rav1e_avg_16bpc_neon
* rav1e_prep_8tap_regular_16bpc_neon
* rav1e_prep_8tap_regular_sharp_16bpc_neon
* rav1e_prep_8tap_regular_smooth_16bpc_neon
* rav1e_prep_8tap_sharp_16bpc_neon
* rav1e_prep_8tap_sharp_regular_16bpc_neon
* rav1e_prep_8tap_sharp_smooth_16bpc_neon
* rav1e_prep_8tap_smooth_16bpc_neon
* rav1e_prep_8tap_smooth_regular_16bpc_neon
* rav1e_prep_8tap_smooth_sharp_16bpc_neon
* rav1e_prep_bilin_16bpc_neon
* rav1e_put_8tap_regular_16bpc_neon
* rav1e_put_8tap_regular_sharp_16bpc_neon
* rav1e_put_8tap_regular_smooth_16bpc_neon
* rav1e_put_8tap_sharp_16bpc_neon
* rav1e_put_8tap_sharp_regular_16bpc_neon
* rav1e_put_8tap_sharp_smooth_16bpc_neon
* rav1e_put_8tap_smooth_16bpc_neon
* rav1e_put_8tap_smooth_regular_16bpc_neon
* rav1e_put_8tap_smooth_sharp_16bpc_neon
* rav1e_put_bilin_16bpc_neon

Future work:
* rav1e_blend_16bpc_neon
* rav1e_blend_h_16bpc_neon
* rav1e_blend_v_16bpc_neon
* rav1e_emu_edge_16bpc_neon
* rav1e_mask_16bpc_neon
* rav1e_warp_affine_8x8_16bpc_neon
* rav1e_warp_affine_8x8t_16bpc_neon
* rav1e_w_avg_16bpc_neon
* rav1e_w_mask_420_16bpc_neon
* rav1e_w_mask_422_16bpc_neon
* rav1e_w_mask_444_16bpc_neon

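The integrated 8-tap functions follow a regular naming pattern: each variant pairs two of the three filter types (regular, sharp, smooth), and a pair of identical types collapses to a single name. The hypothetical Rust helper below (not part of rav1e) regenerates the `put`/`prep` names, which is one way to check that a list like the one above is complete. It deliberately does not claim which axis each filter type applies to.

```rust
/// Hypothetical helper: build the NEON MC symbol names for one operation
/// ("put" or "prep") at a given bit depth, in the style listed above.
fn mc_symbols(op: &str, bpc: u32) -> Vec<String> {
    const TYPES: [&str; 3] = ["regular", "sharp", "smooth"];
    let mut names = Vec::new();
    for &a in TYPES.iter() {
        for &b in TYPES.iter() {
            // An identical pair collapses to one name, e.g. "8tap_sharp".
            let filter = if a == b {
                a.to_string()
            } else {
                format!("{}_{}", a, b)
            };
            names.push(format!("rav1e_{}_8tap_{}_{}bpc_neon", op, filter, bpc));
        }
    }
    // The bilinear variant has no filter-type pair.
    names.push(format!("rav1e_{}_bilin_{}bpc_neon", op, bpc));
    names
}
```

This yields 10 names per operation; 10 `prep` plus 10 `put` plus `rav1e_avg_16bpc_neon` gives the 21 integrated functions.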

Align with dav1d high bit-depth parameters · 828fa167

David Michael Barr authored 4 years ago

Assembly for high bit-depth functions in dav1d accepts the maximum value
rather than the bit depth,

i.e. bitdepth_max == (1 << bit_depth) - 1

Also, use correct pointer types for consistency.

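A minimal Rust sketch of the parameter convention described above. `bitdepth_max` and `clip_pixel` are illustrative helpers, not rav1e or dav1d functions; the sketch computes the maximum once from the bit depth and then uses it directly as the upper clamp bound, which is the kind of use such a parameter allows.

```rust
/// Maximum pixel value for a bit depth: (1 << bit_depth) - 1.
const fn bitdepth_max(bit_depth: u32) -> i32 {
    (1 << bit_depth) - 1
}

/// Clamp a reconstructed value to [0, bitdepth_max]; note that the bound,
/// not the bit depth, is what gets passed around.
fn clip_pixel(v: i32, bitdepth_max: i32) -> u16 {
    v.clamp(0, bitdepth_max) as u16
}
```

For example, 10-bit content uses `bitdepth_max(10)`, i.e. 1023.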

Jun 09, 2020
- Add details of valid key value pairs accepted by rav1e_config_parse. (#2332) · d435f94e
  amCap1712 authored 4 years ago
  
Jun 08, 2020
- CI: Update cargo-c version · e8e0b42c
  Luni-4 authored 4 years ago
  
Jun 07, 2020

Fix for LRF choosing different LRU sizes in Y and UV when not 4:2:0 · 0be21f39

Monty Montgomery authored 4 years ago

This is a fix for issue 2311: Corrupt bitstream: multi-pass
encoding. The problem had nothing to do with multipass.  It's related
to issue 2212, specifically a missed corner-case check from that fix.

The LRF code correctly fixes up the UV LRU size to heed 4:2:2
colorspace restrictions, but then does not reconcile Y.  Because only
4:2:0 can have different LRU sizes for Y and UV, the fixup to only
chroma leads to a desync.  The fix is to treat the corrected 4:2:2 UV
LRU size value as a maximum, and when Y and UV differ, set Y and UV to
min(Y, UV).

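The reconciliation rule stated in this commit message can be written as a small Rust function. This is a hypothetical sketch of the rule, not rav1e's LRF code; `reconcile_lru_sizes` and its parameters are illustrative names, and the chroma size is assumed to have already been corrected for the 4:2:2 restriction.

```rust
/// Hypothetical sketch: only 4:2:0 may use different LRU sizes for Y and UV.
/// For any other layout, if the (already corrected) sizes differ, both planes
/// fall back to the smaller one so Y and UV cannot desync.
fn reconcile_lru_sizes(y_size: usize, uv_size: usize, is_420: bool) -> (usize, usize) {
    if is_420 || y_size == uv_size {
        (y_size, uv_size)
    } else {
        let size = y_size.min(uv_size);
        (size, size)
    }
}
```

Taking the minimum means the chroma fixup acts as an upper bound that luma must also respect.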

Jun 05, 2020
- Suppress a new clippy check · 17f5c387
  Luca Barbato authored 4 years ago
  
  Introduced in rustc 1.44.0
Jun 04, 2020
- Stop precomputing and passing around pmvs index · da7cd280
  Kyle Siefring authored 4 years ago
  
  Deduplicates a bunch of code.
Jun 02, 2020

Clean up recon_intra.rs, which are ported from libaom · b750e481

Yushin Cho authored 4 years ago and

Yushin Cho committed 4 years ago

- Remove leftover libaom code from when it was ported to rav1e
- Uncomment some array declarations, which are not used now but required
for PARTITION_VERT_A and _B in the future.


msac: Avoid attempting to refill after eob has already been reached · 88c8d38d

Henrik Gramner authored 4 years ago and

Vibhoothi committed 4 years ago

Utilize the unsigned representation of a signed integer to skip
the refill code if the count was already negative to begin with,
which saves a few clock cycles at the end of each tile.

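The unsigned-representation trick can be illustrated in Rust; this is a sketch, not the code from the commit. It assumes a model in which a refill is needed only when a non-negative bit count would drop below zero after consuming `d` bits; `needs_refill` and `needs_refill_signed` are hypothetical names. Reinterpreting a negative `i32` as `u32` yields a value of at least 2^31, so a single unsigned comparison both detects the underflow and skips the refill once the count is already negative.

```rust
/// One unsigned comparison. For cnt >= 0, `cnt as u32 < d` means cnt - d < 0.
/// For cnt < 0 (eob already reached), `cnt as u32` >= 2^31 > d, so no refill.
/// Assumes d < 2^31.
fn needs_refill(cnt: i32, d: u32) -> bool {
    (cnt as u32) < d
}

/// The same decision spelled out with signed arithmetic, for comparison.
fn needs_refill_signed(cnt: i32, d: u32) -> bool {
    cnt >= 0 && (cnt as i64) - (d as i64) < 0
}
```

Both functions agree for every `cnt` and any `d` below 2^31, but the first needs no separate sign test.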

arm64: itx: Add NEON implementation of itx for 10 bpc · 3d94d425

Martin Storsjö authored 5 years ago and

Vibhoothi committed 4 years ago

Add an element size specifier to the existing individual transform
functions for 8 bpc, naming them e.g. inv_dct_8h_x8_neon, to clarify
that they operate on input vectors of 8h, and make the symbols
public, to let the 10 bpc case call them from a different object file.
The same convention is used in the new itx16.S, like inv_dct_4s_x8_neon.

Make the existing itx.S be compiled regardless of whether 8 bpc support
is enabled. For builds with 8 bpc support disabled, this includes the
unused frontend functions, but this is hopefully tolerable to avoid
having to split the file into a sharable file for transforms and a
separate one for frontends.

This only implements the 10 bpc case, as that case can use transforms
operating on 16 bit coefficients in the second pass.

Relative speedup vs C for a few functions:

Cortex A53 A72 A73
inv_txfm_add_4x4_dct_dct_0_10bpc_neon: 4.14 4.06 4.49
inv_txfm_add_4x4_dct_dct_1_10bpc_neon: 6.51 6.49 6.42
inv_txfm_add_8x8_dct_dct_0_10bpc_neon: 5.02 4.63 6.23
inv_txfm_add_8x8_dct_dct_1_10bpc_neon: 8.54 7.13 11.96
inv_txfm_add_16x16_dct_dct_0_10bpc_neon: 5.52 6.60 8.03
inv_txfm_add_16x16_dct_dct_1_10bpc_neon: 11.27 9.62 12.22
inv_txfm_add_16x16_dct_dct_2_10bpc_neon: 9.60 6.97 8.59
inv_txfm_add_32x32_dct_dct_0_10bpc_neon: 2.60 3.48 3.19
inv_txfm_add_32x32_dct_dct_1_10bpc_neon: 14.65 12.64 16.86
inv_txfm_add_32x32_dct_dct_2_10bpc_neon: 11.57 8.80 12.68
inv_txfm_add_32x32_dct_dct_3_10bpc_neon: 8.79 8.00 9.21
inv_txfm_add_32x32_dct_dct_4_10bpc_neon: 7.58 6.21 7.80
inv_txfm_add_64x64_dct_dct_0_10bpc_neon: 2.41 2.85 2.75
inv_txfm_add_64x64_dct_dct_1_10bpc_neon: 12.91 10.27 12.24
inv_txfm_add_64x64_dct_dct_2_10bpc_neon: 10.96 7.97 10.31
inv_txfm_add_64x64_dct_dct_3_10bpc_neon: 8.95 7.42 9.55
inv_txfm_add_64x64_dct_dct_4_10bpc_neon: 7.97 6.12 7.82


arm: Mark global symbols hidden · 373aa43a

Martin Storsjö authored 4 years ago and

Vibhoothi committed 4 years ago

This matches what is done in C by -fvisibility=hidden.

This avoids issues with relocations against other symbols exported
from another assembly file.


arm64: itx: Prepare for other bitdepths · 2272ef97

Martin Storsjö authored 5 years ago and

Vibhoothi committed 4 years ago


This commit also includes squashed rav1e-specific changes
(src/asm/aarch64/transform/inverse.rs) that update the symbol names so
the build does not break. It adds an "_8bpc" suffix to differentiate
between 8 bpc and 16 bpc.

Co-authored-by: Vibhoothi <vibhoothiiaanand@gmail.com>


arm64: itx: Share code for the three horz_16x8 functions · 39f95dd4
Martin Storsjö authored 4 years ago and Vibhoothi committed 4 years ago


arm64: itx: Fix the eob checking for dct_dct_64x16 · c284bcdd

Martin Storsjö authored 4 years ago and

Vibhoothi committed 4 years ago

Before this, we never did the early exit from the first pass.

Before:                               Cortex A53      A72      A73
inv_txfm_add_64x16_dct_dct_1_8bpc_neon:   7275.7   5198.3   5250.9
inv_txfm_add_64x16_dct_dct_2_8bpc_neon:   7276.1   5197.0   5251.3
inv_txfm_add_64x16_dct_dct_3_8bpc_neon:   7275.8   5196.2   5254.5
inv_txfm_add_64x16_dct_dct_4_8bpc_neon:   7273.6   5198.8   5254.2
After:
inv_txfm_add_64x16_dct_dct_1_8bpc_neon:   5187.8   3763.8   3735.0
inv_txfm_add_64x16_dct_dct_2_8bpc_neon:   7280.6   5185.6   5256.3
inv_txfm_add_64x16_dct_dct_3_8bpc_neon:   7270.7   5179.8   5250.3
inv_txfm_add_64x16_dct_dct_4_8bpc_neon:   7271.7   5212.4   5256.4

The other related variants didn't have this bug and properly exited
early when possible.

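The first-pass early exit can be modelled in a few lines of Rust. This is a hypothetical sketch, not the assembly from the commit: it assumes the scan order guarantees that, for a small eob, only the first few row groups can hold non-zero coefficients, so the first pass can stop after those groups and treat the rest as zero. `first_pass_groups` and the threshold values in the test are made up for illustration. In the table above, only the `_1` (lowest eob) variant benefits, by roughly 1.4x on all three cores (e.g. 7275.7 / 5187.8 on Cortex A53).

```rust
/// Hypothetical: number of row groups the first pass must transform.
/// `thresholds[i]` is the smallest eob at which row group `i + 1` may hold
/// non-zero coefficients; `thresholds` must be sorted in ascending order.
/// Groups beyond the returned count are known to be all zero and are skipped.
fn first_pass_groups(eob: u32, thresholds: &[u32]) -> usize {
    1 + thresholds.iter().take_while(|&&t| eob >= t).count()
}
```

Without the check, every eob takes the path for the largest count, which matches the "Before" rows being nearly identical.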