- Jun 19, 2020
-
-
Dayanne Fernandes authored
-
Dayanne Fernandes authored
-
Takahiro authored
Co-authored-by: Urhengulas <johann.hemmann@code.berlin>
-
Vibhoothi authored
-
Urhengulas authored
* expose Encoder, EncoderConfig, Frame, Packet to JavaScript
* implement most config methods
* supply docs for structs and methods
* simple error-handling for various methods
* move simple_Encoding to tests/utils.rs
-
Urhengulas authored
* add wasm_bindgen as dependency
* annotate pixel::ChromaSampling with #[wasm_bindgen]
-
Urhengulas authored
* add wasm_bindgen as dependency
* annotate with #[wasm_bindgen]
  * encoder::Tune
  * api::color::ChromaSamplePosition
  * api::color::PixelRange
* impl Debug for Context
-
Luca Barbato authored
* Bump aom-sys to 0.2.1
* Bump dav1d-sys to 0.3.2
* Update the aom tests accordingly
* Update the CI to match the new dependencies

Co-authored-by: Vibhoothi <vibhoothiiaanand@gmail.com>
Co-authored-by: Luni-4 <luni-4@hotmail.it>
-
- Jun 18, 2020
-
-
Vibhoothi authored
-
Kyle Siefring authored
Dramatically improves the speed of rebuilding release (i.e. not from scratch). Little to no difference in runtime on awcy.

https://beta.arewecompressedyet.com/?job=codegen-units-incremental%402020-06-17T17%3A26%3A41.471Z&job=codegen-units%402020-06-17T14%3A13%3A14.505Z

before:
  release profile: full build time
    real 0m51.344s
    user 6m47.160s
  release profile: touch src/lib.rs
    real 0m18.483s
    user 1m29.580s

after:
  release profile: full build time
    real 0m48.754s
    user 6m53.001s
  release profile: touch src/lib.rs
    real 0m3.177s
    user 0m2.793s
-
Kyle Siefring authored
Decreases compile time on AWCY from ~4 mins to ~2.25 mins. Increases run time by roughly 4% and 10 seconds on the most intense clip/qp (MINECRAFT @ 80). Testing also shows that codegen-units=1 actually causes a substantial performance decrease when incremental compilation is enabled.
-
Luca Barbato authored
Fixes #2407
-
- Jun 17, 2020
-
-
Luca Barbato authored
It was added to workaround a limitation of the old cargo-c.
-
Kyle Siefring authored
The performance difference has shrunk substantially after adding inline to a bunch of functions. The performance difference with or without lto is about 4 seconds on the slowest clip/qp on a standard awcy run (MINECRAFT, objective-1-fast, qp 80). The compile time difference is 42 seconds.

If it's possible to have good performance with lto off, then optimizing without it will provide a good development profile. Going forward, it would be advisable to use lto to check for performance problems.

It may be possible to create a separate profile between dev and release. That would require https://github.com/rust-lang/cargo/issues/6988, and AWCY would need to be modified to use the new profile while still working with old versions of rav1e.
-
- Jun 16, 2020
-
-
Kyle Siefring authored
The previous commit made a large number of functions const. Make the functions that we can const.
-
Kyle Siefring authored
Improves non-lto performance by 15-20%
-
Luca Barbato authored
-
Luca Barbato authored
-
- Jun 15, 2020
-
-
Kyle Siefring authored
A previous patch tried to do this, but missed the 4x4 case on subsampled frames. Roughly 2% speedup on the default speed level.
-
- Jun 14, 2020
-
-
Riccardo Magliocchetti authored
As appveyor integration has been removed
-
Christopher Degawa authored
Since appveyor is no longer used, this does not apply.

Signed-off-by: Christopher Degawa <ccom@randomderp.com>
-
- Jun 12, 2020
-
-
Luca Barbato authored
CI has a version of nasm that is not compatible with nasm-rs 0.1.8.
-
Luca Barbato authored
-
- Jun 11, 2020
-
-
Urhengulas authored
Change:
* test in multiple browsers (firefox and chrome)
* test in nodejs (node version set explicitly)
-
Urhengulas authored
Changes:
* also add example website
* make alert() catchable
* let caller define # of frames to encode
* move alert to wrapper function
* add license headers
* add tests
-
- Jun 10, 2020
-
-
David Michael Barr authored
-
David Michael Barr authored
Integrated:
* rav1e_avg_16bpc_neon
* rav1e_prep_8tap_regular_16bpc_neon
* rav1e_prep_8tap_regular_sharp_16bpc_neon
* rav1e_prep_8tap_regular_smooth_16bpc_neon
* rav1e_prep_8tap_sharp_16bpc_neon
* rav1e_prep_8tap_sharp_regular_16bpc_neon
* rav1e_prep_8tap_sharp_smooth_16bpc_neon
* rav1e_prep_8tap_smooth_16bpc_neon
* rav1e_prep_8tap_smooth_regular_16bpc_neon
* rav1e_prep_8tap_smooth_sharp_16bpc_neon
* rav1e_prep_bilin_16bpc_neon
* rav1e_put_8tap_regular_16bpc_neon
* rav1e_put_8tap_regular_sharp_16bpc_neon
* rav1e_put_8tap_regular_smooth_16bpc_neon
* rav1e_put_8tap_sharp_16bpc_neon
* rav1e_put_8tap_sharp_regular_16bpc_neon
* rav1e_put_8tap_sharp_smooth_16bpc_neon
* rav1e_put_8tap_smooth_16bpc_neon
* rav1e_put_8tap_smooth_regular_16bpc_neon
* rav1e_put_8tap_smooth_sharp_16bpc_neon
* rav1e_put_bilin_16bpc_neon

Future work:
* rav1e_blend_16bpc_neon
* rav1e_blend_h_16bpc_neon
* rav1e_blend_v_16bpc_neon
* rav1e_emu_edge_16bpc_neon
* rav1e_mask_16bpc_neon
* rav1e_warp_affine_8x8_16bpc_neon
* rav1e_warp_affine_8x8t_16bpc_neon
* rav1e_w_avg_16bpc_neon
* rav1e_w_mask_420_16bpc_neon
* rav1e_w_mask_422_16bpc_neon
* rav1e_w_mask_444_16bpc_neon
-
David Michael Barr authored
Assembly for high bit-depth functions in dav1d accepts the maximum sample value rather than the bit-depth, i.e. bitdepth_max == (1 << bit_depth) - 1. Also, use correct pointer types for consistency.
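The conversion between the two parameters can be sketched in Rust as follows. This is only an illustration of the formula in the message: the function name `bitdepth_max` and its signature are assumptions made for this sketch, not identifiers taken from the rav1e or dav1d sources.

```rust
/// Hypothetical helper (name and signature are assumptions for this sketch):
/// converts a bit depth into the maximum sample value, which is what the
/// high bit-depth assembly takes instead of the bit depth itself.
fn bitdepth_max(bit_depth: u32) -> i32 {
    // Keep the shift within the range of a 32-bit signed integer.
    assert!(bit_depth >= 1 && bit_depth < 31);
    (1i32 << bit_depth) - 1
}
```

Under this sketch, 10-bit content yields 1023 and 12-bit content yields 4095.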
-
- Jun 09, 2020
-
-
amCap1712 authored
-
- Jun 08, 2020
-
-
Luni-4 authored
-
- Jun 07, 2020
-
-
Monty Montgomery authored
This is a fix for issue 2311: Corrupt bitstream: multi-pass encoding. The problem had nothing to do with multipass. It's related to issue 2212, specifically a missed corner-case check from that fix. The LRF code correctly fixes up the UV LRU size to heed 4:2:2 colorspace restrictions, but then does not reconcile Y. Because only 4:2:0 can have different LRU sizes for Y and UV, the fixup to only chroma leads to a desync. The fix is to treat the corrected 4:2:2 UV LRU size value as a maximum, and when Y and UV differ, set Y and UV to min(Y, UV).
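The reconciliation rule described above can be sketched as below. This is a minimal illustration, not the actual rav1e code: the function name `reconcile_lru_sizes`, the `u32` size type, and the `is_420` flag are all assumptions introduced for the example.

```rust
/// Hypothetical sketch of the rule in the message (all names are assumptions):
/// given the Y LRU size and the already-corrected UV LRU size, return sizes
/// that stay in sync for formats where Y and UV may not differ.
fn reconcile_lru_sizes(y_size: u32, uv_size: u32, is_420: bool) -> (u32, u32) {
    if !is_420 && y_size != uv_size {
        // Only 4:2:0 may use different sizes, so treat the corrected UV
        // value as a maximum and clamp both planes to the smaller size.
        let size = y_size.min(uv_size);
        (size, size)
    } else {
        // 4:2:0, or sizes that already agree: nothing to reconcile.
        (y_size, uv_size)
    }
}
```

With this sketch, a 4:2:2 input of Y = 256 and UV = 128 becomes (128, 128), while the same pair on 4:2:0 is left unchanged.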
-
- Jun 05, 2020
-
-
Luca Barbato authored
Introduced in rustc 1.44.0
-
- Jun 04, 2020
-
-
Kyle Siefring authored
Deduplicates a bunch of code.
-
- Jun 02, 2020
-
-
- Remove leftover libaom code from when porting to rav1e
- Uncomment some array declarations, which are not used now but required for PARTITION_VERT_A and _B in the future.
-
Utilize the unsigned representation of a signed integer to skip the refill code if the count was already negative to begin with, which saves a few clock cycles at the end of each tile.
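One way to express this kind of trick in Rust is shown below. The commit itself is assembly and its code is not reproduced here, so this is only a sketch of the idea: the name `consume_bits`, the parameter names, and the assumption that the number of consumed bits `d` is small are all hypothetical.

```rust
/// Hypothetical sketch (names are assumptions, not the real decoder API):
/// subtracts `d` consumed bits from the bit count and reports whether a
/// refill is needed. `d` is assumed to be small (well below 2^31).
fn consume_bits(cnt: i32, d: u32) -> (i32, bool) {
    // Reinterpret the signed count as unsigned. A count that is already
    // negative becomes a very large value, so the subtraction cannot borrow.
    // A borrow therefore happens only when a non-negative count drops below
    // zero, which is exactly the case that calls for a refill.
    let (new_cnt, borrow) = (cnt as u32).overflowing_sub(d);
    (new_cnt as i32, borrow)
}
```

In this sketch a single borrow check replaces a separate signed comparison: a count of 3 with 5 bits consumed requests a refill, while a count that starts at -2 does not.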
-
Add an element size specifier to the existing individual transform functions for 8 bpc, naming them e.g. inv_dct_8h_x8_neon, to clarify that they operate on input vectors of 8h, and make the symbols public, to let the 10 bpc case call them from a different object file. The same convention is used in the new itx16.S, like inv_dct_4s_x8_neon.

Make the existing itx.S compiled regardless of whether 8 bpc support is enabled. For builds with 8 bpc support disabled, this does include the unused frontend functions though, but this is hopefully tolerable to avoid having to split the file into a sharable file for transforms and a separate one for frontends.

This only implements the 10 bpc case, as that case can use transforms operating on 16 bit coefficients in the second pass.

Relative speedup vs C for a few functions:

Cortex                                      A53    A72    A73
inv_txfm_add_4x4_dct_dct_0_10bpc_neon:     4.14   4.06   4.49
inv_txfm_add_4x4_dct_dct_1_10bpc_neon:     6.51   6.49   6.42
inv_txfm_add_8x8_dct_dct_0_10bpc_neon:     5.02   4.63   6.23
inv_txfm_add_8x8_dct_dct_1_10bpc_neon:     8.54   7.13  11.96
inv_txfm_add_16x16_dct_dct_0_10bpc_neon:   5.52   6.60   8.03
inv_txfm_add_16x16_dct_dct_1_10bpc_neon:  11.27   9.62  12.22
inv_txfm_add_16x16_dct_dct_2_10bpc_neon:   9.60   6.97   8.59
inv_txfm_add_32x32_dct_dct_0_10bpc_neon:   2.60   3.48   3.19
inv_txfm_add_32x32_dct_dct_1_10bpc_neon:  14.65  12.64  16.86
inv_txfm_add_32x32_dct_dct_2_10bpc_neon:  11.57   8.80  12.68
inv_txfm_add_32x32_dct_dct_3_10bpc_neon:   8.79   8.00   9.21
inv_txfm_add_32x32_dct_dct_4_10bpc_neon:   7.58   6.21   7.80
inv_txfm_add_64x64_dct_dct_0_10bpc_neon:   2.41   2.85   2.75
inv_txfm_add_64x64_dct_dct_1_10bpc_neon:  12.91  10.27  12.24
inv_txfm_add_64x64_dct_dct_2_10bpc_neon:  10.96   7.97  10.31
inv_txfm_add_64x64_dct_dct_3_10bpc_neon:   8.95   7.42   9.55
inv_txfm_add_64x64_dct_dct_4_10bpc_neon:   7.97   6.12   7.82
-
This matches what is done in C by -fvisibility=hidden. This avoids issues with relocations against other symbols exported from another assembly file.
-
This commit also includes squashed rav1e-specific changes (src/asm/aarch64/transform/inverse.rs) that update the symbols so the build will not be broken. Adds a "_8bpc" suffix to differentiate between 8bpc and 16bpc.

Co-authored-by: Vibhoothi <vibhoothiiaanand@gmail.com>
-
-
Before this, we never did the early exit from the first pass.

Before:
Cortex                                       A53     A72     A73
inv_txfm_add_64x16_dct_dct_1_8bpc_neon:   7275.7  5198.3  5250.9
inv_txfm_add_64x16_dct_dct_2_8bpc_neon:   7276.1  5197.0  5251.3
inv_txfm_add_64x16_dct_dct_3_8bpc_neon:   7275.8  5196.2  5254.5
inv_txfm_add_64x16_dct_dct_4_8bpc_neon:   7273.6  5198.8  5254.2

After:
inv_txfm_add_64x16_dct_dct_1_8bpc_neon:   5187.8  3763.8  3735.0
inv_txfm_add_64x16_dct_dct_2_8bpc_neon:   7280.6  5185.6  5256.3
inv_txfm_add_64x16_dct_dct_3_8bpc_neon:   7270.7  5179.8  5250.3
inv_txfm_add_64x16_dct_dct_4_8bpc_neon:   7271.7  5212.4  5256.4

The other related variants didn't have this bug and properly exited early when possible.
-