  1. Jun 19, 2020
  2. Jun 18, 2020
  3. Jun 17, 2020
    • Remove the crate-type specifier · c944c52b
      Luca Barbato authored
      It was added to work around a limitation of the old cargo-c.
    • Remove lto from the release profile · 35e904e0
      Kyle Siefring authored
      The performance difference has shrunk substantially after adding inline
      to a number of functions. With and without lto, the difference is about
      4 seconds on the slowest clip/qp of a standard AWCY run (MINECRAFT,
      objective-1-fast, qp 80), while the compile time difference is 42
      seconds.
      
      If good performance is possible with lto off, then optimizing without it
      will provide a good development profile. Going forward, it would be
      advisable to use lto to check for performance problems. It may also be
      possible to create a separate profile between dev and release. This
      would require https://github.com/rust-lang/cargo/issues/6988, and AWCY
      would need to be modified to use the new profile while still working
      with old versions of rav1e.
  4. Jun 16, 2020
  5. Jun 15, 2020
  6. Jun 14, 2020
  7. Jun 12, 2020
  8. Jun 11, 2020
    • Setup CI: wasm-pack build, pack, test · c1432f0a
      Urhengulas authored
      Change:
      * test in multiple browsers (firefox and chrome)
      * test in nodejs (node version set explicitly)
    • Add rav1e_js as subcrate · 47a5b460
      Urhengulas authored
      Changes:
      * also add an example website
      * make alert() catchable
      * let caller define # of frames to encode
      * move alert to wrapper function
      * add license headers
      * add tests
  9. Jun 10, 2020
    • David Michael Barr
      1bf36a46
    • Integrate aarch64 assembly: src/arm/64/mc16.S · 96350855
      David Michael Barr authored
      Integrated:
      * rav1e_avg_16bpc_neon
      * rav1e_prep_8tap_regular_16bpc_neon
      * rav1e_prep_8tap_regular_sharp_16bpc_neon
      * rav1e_prep_8tap_regular_smooth_16bpc_neon
      * rav1e_prep_8tap_sharp_16bpc_neon
      * rav1e_prep_8tap_sharp_regular_16bpc_neon
      * rav1e_prep_8tap_sharp_smooth_16bpc_neon
      * rav1e_prep_8tap_smooth_16bpc_neon
      * rav1e_prep_8tap_smooth_regular_16bpc_neon
      * rav1e_prep_8tap_smooth_sharp_16bpc_neon
      * rav1e_prep_bilin_16bpc_neon
      * rav1e_put_8tap_regular_16bpc_neon
      * rav1e_put_8tap_regular_sharp_16bpc_neon
      * rav1e_put_8tap_regular_smooth_16bpc_neon
      * rav1e_put_8tap_sharp_16bpc_neon
      * rav1e_put_8tap_sharp_regular_16bpc_neon
      * rav1e_put_8tap_sharp_smooth_16bpc_neon
      * rav1e_put_8tap_smooth_16bpc_neon
      * rav1e_put_8tap_smooth_regular_16bpc_neon
      * rav1e_put_8tap_smooth_sharp_16bpc_neon
      * rav1e_put_bilin_16bpc_neon
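The integrated list follows a regular pattern: for each of `put` and `prep` there is one 8-tap function per ordered pair of the three filter types (regular, sharp, smooth), where an identical pair collapses to a single name, plus one bilinear function. The sketch below regenerates those names to make the pattern explicit. `mc16_symbols` is a hypothetical helper for illustration, not part of rav1e, and its output order differs from the alphabetical list above.

```rust
/// Hypothetical helper: rebuilds the put/prep 16 bpc symbol names from the
/// three filter types. Illustrates the naming pattern only.
fn mc16_symbols() -> Vec<String> {
    let filters = ["regular", "sharp", "smooth"];
    let mut names = Vec::new();
    for op in ["prep", "put"] {
        for a in filters {
            for b in filters {
                // An identical filter pair uses a single name, e.g. "8tap_sharp".
                let pair = if a == b { a.to_string() } else { format!("{a}_{b}") };
                names.push(format!("rav1e_{op}_8tap_{pair}_16bpc_neon"));
            }
        }
        names.push(format!("rav1e_{op}_bilin_16bpc_neon"));
    }
    names
}

fn main() {
    let names = mc16_symbols();
    // 2 operations x (9 filter pairs + 1 bilinear) = 20 symbols;
    // together with rav1e_avg_16bpc_neon that makes 21 integrated functions.
    assert_eq!(names.len(), 20);
    for n in &names {
        println!("{n}");
    }
}
```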
      
      Future work:
      * rav1e_blend_16bpc_neon
      * rav1e_blend_h_16bpc_neon
      * rav1e_blend_v_16bpc_neon
      * rav1e_emu_edge_16bpc_neon
      * rav1e_mask_16bpc_neon
      * rav1e_warp_affine_8x8_16bpc_neon
      * rav1e_warp_affine_8x8t_16bpc_neon
      * rav1e_w_avg_16bpc_neon
      * rav1e_w_mask_420_16bpc_neon
      * rav1e_w_mask_422_16bpc_neon
      * rav1e_w_mask_444_16bpc_neon
    • Align with dav1d high bit-depth parameters · 828fa167
      David Michael Barr authored
      Assembly for high bit-depth functions in dav1d accept the maximum value
      rather than the bit-depth.
      
      i.e. bitdepth_max == (1 << bit_depth) - 1
      
      Also, use correct pointer types for consistency.
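As a minimal sketch, assuming the caller holds the bit depth as a `u32`, the value passed to the assembly in place of the bit depth could be derived as follows. The function name `bitdepth_max` is hypothetical; only the formula comes from the commit message.

```rust
/// Hypothetical helper: converts a bit depth into the maximum pixel value
/// that dav1d-style high bit-depth assembly takes as its parameter.
fn bitdepth_max(bit_depth: u32) -> i32 {
    // e.g. 8-bit -> 255, 10-bit -> 1023, 12-bit -> 4095
    (1 << bit_depth) - 1
}

fn main() {
    for bd in [8, 10, 12] {
        println!("bit_depth {bd} -> bitdepth_max {}", bitdepth_max(bd));
    }
}
```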
  10. Jun 09, 2020
  11. Jun 08, 2020
  12. Jun 07, 2020
    • Fix for LRF choosing different LRU sizes in Y and UV when not 4:2:0 · 0be21f39
      Monty Montgomery authored
      This is a fix for issue 2311: Corrupt bitstream: multi-pass
      encoding. The problem had nothing to do with multipass.  It's related
      to issue 2212, specifically a missed corner-case check from that fix.
      
      The LRF code correctly fixes up the UV LRU size to heed 4:2:2
      colorspace restrictions, but then does not reconcile Y. Because only
      4:2:0 can have different LRU sizes for Y and UV, fixing up only chroma
      leads to a desync. The fix is to treat the corrected 4:2:2 UV LRU size
      as a maximum and, when Y and UV differ, set both Y and UV to
      min(Y, UV).
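The reconciliation rule can be sketched as a small function. This is an illustration under stated assumptions, not rav1e's code: `reconcile_lru_sizes` and its parameters are hypothetical, sizes are plain pixel widths, and `chroma_420` stands in for whatever check the encoder uses to detect 4:2:0 subsampling.

```rust
/// Hypothetical sketch of the fix: outside 4:2:0, Y and UV must share one
/// LRU size, so when they differ both are set to the smaller value.
fn reconcile_lru_sizes(y_size: u32, uv_size: u32, chroma_420: bool) -> (u32, u32) {
    if chroma_420 || y_size == uv_size {
        // 4:2:0 may signal different Y and UV sizes; equal sizes need no change.
        (y_size, uv_size)
    } else {
        // The (already corrected) UV size acts as a cap: use min(Y, UV) for both.
        let s = y_size.min(uv_size);
        (s, s)
    }
}

fn main() {
    // 4:2:2 case where the UV size was reduced but Y was not.
    println!("{:?}", reconcile_lru_sizes(128, 64, false));
}
```

Taking the minimum rather than the maximum keeps the UV size within the limit that the 4:2:2 fixup already enforced.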
  13. Jun 05, 2020
  14. Jun 04, 2020
  15. Jun 02, 2020
    • Clean up recon_intra.rs, which was ported from libaom · b750e481
      Yushin Cho authored and Yushin Cho committed
      - Remove leftover libaom code from the port to rav1e
      - Uncomment some array declarations which are not used now but will be
      required for PARTITION_VERT_A and _B in the future.
    • msac: Avoid attempting to refill after eob has already been reached · 88c8d38d
      Henrik Gramner authored and Vibhoothi committed
      Utilize the unsigned representation of a signed integer to skip
      the refill code if the count was already negative to begin with,
      which saves a few clock cycles at the end of each tile.
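The general idea behind the unsigned trick can be shown with a generic sketch; this is not the actual msac code, whose comparison may differ. Here `cnt` is a hypothetical signed bit count that may already be negative once the end of the buffer has been reached, and `consumed` is a small positive number of bits about to be used. Casting a negative `i32` to `u32` yields a value of at least 2^31, so a single unsigned comparison replaces two signed ones.

```rust
/// Generic illustration (hypothetical names): decide whether a refill is
/// needed, skipping it when the count was already negative.
fn needs_refill(cnt: i32, consumed: i32) -> bool {
    debug_assert!(consumed > 0);
    // A negative cnt reinterpreted as u32 is >= 2^31 and therefore never
    // below `consumed`, so this is equivalent to `cnt >= 0 && cnt < consumed`.
    (cnt as u32) < (consumed as u32)
}

fn main() {
    println!("{}", needs_refill(3, 5));  // count would go negative: refill
    println!("{}", needs_refill(9, 5));  // enough bits left: no refill
    println!("{}", needs_refill(-4, 5)); // already past eob: skip the refill
}
```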
    • arm64: itx: Add NEON implementation of itx for 10 bpc · 3d94d425
      Martin Storsjö authored and Vibhoothi committed
      Add an element size specifier to the existing individual transform
      functions for 8 bpc, naming them e.g. inv_dct_8h_x8_neon, to clarify
      that they operate on input vectors of 8h, and make the symbols
      public, to let the 10 bpc case call them from a different object file.
      The same convention is used in the new itx16.S, like inv_dct_4s_x8_neon.
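The element-size specifier in these names is the AArch64 NEON arrangement the transform operates on: `8h` is eight 16-bit lanes and `4s` is four 32-bit lanes. The hypothetical helper below only composes names in that convention to make the two examples from the message line up; it is not part of rav1e or dav1d, and the `x8` part is simply carried over from the existing names.

```rust
/// Hypothetical helper: builds an individual-transform symbol name in the
/// convention described above, e.g. inv_dct_8h_x8_neon.
fn itx_symbol(transform: &str, arrangement: &str, n: u32) -> String {
    // arrangement: "8h" (8 x 16-bit lanes, 8 bpc) or "4s" (4 x 32-bit lanes, itx16.S)
    format!("inv_{transform}_{arrangement}_x{n}_neon")
}

fn main() {
    println!("{}", itx_symbol("dct", "8h", 8));
    println!("{}", itx_symbol("dct", "4s", 8));
}
```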
      
      Compile the existing itx.S regardless of whether 8 bpc support is
      enabled. For builds with 8 bpc support disabled, this does include the
      unused frontend functions, but this is hopefully tolerable, since it
      avoids having to split the file into a sharable file for transforms
      and a separate one for frontends.
      
      This only implements the 10 bpc case, as that case can use transforms
      operating on 16 bit coefficients in the second pass.
      
      Relative speedup vs C for a few functions:
      
                                           Cortex A53    A72    A73
      inv_txfm_add_4x4_dct_dct_0_10bpc_neon:     4.14   4.06   4.49
      inv_txfm_add_4x4_dct_dct_1_10bpc_neon:     6.51   6.49   6.42
      inv_txfm_add_8x8_dct_dct_0_10bpc_neon:     5.02   4.63   6.23
      inv_txfm_add_8x8_dct_dct_1_10bpc_neon:     8.54   7.13  11.96
      inv_txfm_add_16x16_dct_dct_0_10bpc_neon:   5.52   6.60   8.03
      inv_txfm_add_16x16_dct_dct_1_10bpc_neon:  11.27   9.62  12.22
      inv_txfm_add_16x16_dct_dct_2_10bpc_neon:   9.60   6.97   8.59
      inv_txfm_add_32x32_dct_dct_0_10bpc_neon:   2.60   3.48   3.19
      inv_txfm_add_32x32_dct_dct_1_10bpc_neon:  14.65  12.64  16.86
      inv_txfm_add_32x32_dct_dct_2_10bpc_neon:  11.57   8.80  12.68
      inv_txfm_add_32x32_dct_dct_3_10bpc_neon:   8.79   8.00   9.21
      inv_txfm_add_32x32_dct_dct_4_10bpc_neon:   7.58   6.21   7.80
      inv_txfm_add_64x64_dct_dct_0_10bpc_neon:   2.41   2.85   2.75
      inv_txfm_add_64x64_dct_dct_1_10bpc_neon:  12.91  10.27  12.24
      inv_txfm_add_64x64_dct_dct_2_10bpc_neon:  10.96   7.97  10.31
      inv_txfm_add_64x64_dct_dct_3_10bpc_neon:   8.95   7.42   9.55
      inv_txfm_add_64x64_dct_dct_4_10bpc_neon:   7.97   6.12   7.82
    • arm: Mark global symbols hidden · 373aa43a
      Martin Storsjö authored and Vibhoothi committed
      This matches what is done in C by -fvisibility=hidden.
      
      This avoids issues with relocations against other symbols exported
      from another assembly file.
    • arm64: itx: Prepare for other bitdepths · 2272ef97
      Martin Storsjö authored and Vibhoothi committed
      
      This commit also includes squashed rav1e-specific changes
      (src/asm/aarch64/transform/inverse.rs) that update the symbols so the
      build does not break. It adds an "_8bpc" suffix to differentiate
      between 8 bpc and 16 bpc.
      
      Co-authored-by: Vibhoothi <vibhoothiiaanand@gmail.com>
    • arm64: itx: Share code for the three horz_16x8 functions · 39f95dd4
      Martin Storsjö authored and Vibhoothi committed
    • arm64: itx: Fix the eob checking for dct_dct_64x16 · c284bcdd
      Martin Storsjö authored and Vibhoothi committed
      Before this, we never did the early exit from the first pass.
      Before this, we never did the early exit from the first pass.
      
      Before:                               Cortex A53      A72      A73
      inv_txfm_add_64x16_dct_dct_1_8bpc_neon:   7275.7   5198.3   5250.9
      inv_txfm_add_64x16_dct_dct_2_8bpc_neon:   7276.1   5197.0   5251.3
      inv_txfm_add_64x16_dct_dct_3_8bpc_neon:   7275.8   5196.2   5254.5
      inv_txfm_add_64x16_dct_dct_4_8bpc_neon:   7273.6   5198.8   5254.2
      After:
      inv_txfm_add_64x16_dct_dct_1_8bpc_neon:   5187.8   3763.8   3735.0
      inv_txfm_add_64x16_dct_dct_2_8bpc_neon:   7280.6   5185.6   5256.3
      inv_txfm_add_64x16_dct_dct_3_8bpc_neon:   7270.7   5179.8   5250.3
      inv_txfm_add_64x16_dct_dct_4_8bpc_neon:   7271.7   5212.4   5256.4
      
      The other related variants didn't have this bug and properly exited
      early when possible.
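The early exit being fixed here can be sketched generically: the first pass works through the coefficients in slices, and once the end-of-block (eob) position is below the threshold at which a later slice could hold nonzero coefficients, the remaining slices are known to be zero and their transform can be skipped. This is a hypothetical illustration, not the NEON code; `slices_to_transform` and the threshold values are made up. With the bug described above, every call behaved as if the eob exceeded all thresholds.

```rust
/// Hypothetical sketch: count how many first-pass slices must be
/// transformed for a given eob. `thresholds[i]` is the smallest eob at
/// which slice i can contain nonzero coefficients; slice 0 is always done.
fn slices_to_transform(eob: u32, thresholds: &[u32]) -> usize {
    let mut n = 1;
    for &t in thresholds.iter().skip(1) {
        if eob < t {
            break; // all remaining slices are zero: exit early
        }
        n += 1;
    }
    n
}

fn main() {
    // Made-up thresholds for a transform split into four slices.
    let thresholds = [0u32, 36, 136, 300];
    for eob in [10, 36, 500] {
        println!("eob {eob}: {} slice(s)", slices_to_transform(eob, &thresholds));
    }
}
```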