- Feb 23, 2024
-
-
The existing code in vec_avx.h produced the warning "dereferencing type-punned pointer will break strict-aliasing rules" with gcc 6.4.0. We already had a macro to work around this within the rules of the C standard, but using it here does not get optimized into a single MOVD as we had hoped. Replacing it with memcpy() does get optimized correctly, but requires switching from a macro to an inline function in order to declare a local variable and return a value. We already have such an inline function in NSQ_del_dec_avx2.c, so hoist that out and use it everywhere, and convert vec_avx.h to use it as well.
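A minimal sketch of the memcpy()-based approach described above, assuming a plain 32-bit integer load. The helper name `load_i32_unaligned` is hypothetical and is not the function hoisted out of NSQ_del_dec_avx2.c; it only illustrates why an inline function (local variable plus return value) is needed where a macro would not do.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (name is an assumption, not the Opus function).
   Reads 32 bits from an arbitrary address without violating the
   strict-aliasing rules. */
static inline int32_t load_i32_unaligned(const void *p)
{
    int32_t v;
    /* memcpy() is the aliasing-safe way to reinterpret bytes. With
       optimization enabled, compilers typically fold this fixed-size
       copy into a single 32-bit load. */
    memcpy(&v, p, sizeof(v));
    return v;
}
```

Whether the compiler actually emits a single MOVD depends on the compiler version and flags, so the generated assembly is worth checking.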
-
Jean-Marc Valin authored
Also, remove -march=native because of AVX512VNNI and valgrind
-
- Feb 22, 2024
-
-
Jean-Marc Valin authored
Fixes regression in 83368e6. vcgez_s16() is A64-only, but vcge_s16(..., vdup_n_s16(0)) works everywhere.
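As a rough illustration of the portable form mentioned above, the sketch below builds a "greater than or equal to zero" mask with vcge_s16() and vdup_n_s16(0). The function name `ge_zero_mask4` and the scalar fallback are assumptions added so the example also builds on non-ARM hosts; they are not part of the Opus sources.

```c
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: out[i] is 0xFFFF where in[i] >= 0, else 0. */
static void ge_zero_mask4(const int16_t in[4], uint16_t out[4])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Comparing against a vector of zeros avoids vcgez_s16(),
       which the commit message notes is A64-only. */
    uint16x4_t mask = vcge_s16(vld1_s16(in), vdup_n_s16(0));
    vst1_u16(out, mask);
#else
    /* Scalar fallback (assumption) for hosts without NEON. */
    int i;
    for (i = 0; i < 4; i++)
        out[i] = (uint16_t)(in[i] >= 0 ? 0xFFFF : 0);
#endif
}
```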
-
Jean-Marc Valin authored
broken in 9cf12e92
-
Jean-Marc Valin authored
-
Since any value of dQ > 0 will cause the initial quantizer to degrade to the format-implied maximum (15) with a sufficient number of DRED frames, allow signaling a maximum smaller than 15. This allows encoders to improve the minimum quality of long DRED sequences (at the expense of bitrate) without requiring a constant quantizer for all frames (dQ == 0).
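The effect of a signaled maximum can be sketched as a clamp on a growing quantizer index. This is a simplified model, assuming the quantizer grows linearly by dQ per DRED frame; the actual mapping from dQ to a per-frame step in Opus may differ, and the function name `dred_quantizer_sketch` is hypothetical.

```c
/* Hypothetical model of the quantizer for DRED frame i.
   q0:   initial quantizer
   dQ:   per-frame increase (assumed linear here)
   qmax: signaled maximum, at most the format-implied 15 */
static int dred_quantizer_sketch(int q0, int dQ, int qmax, int i)
{
    int q = q0 + dQ * i;
    /* Clamp so long sequences never degrade past qmax. */
    return q > qmax ? qmax : q;
}
```

Under this model, with q0 = 6 and dQ = 1, frame 20 would use quantizer 15 when qmax is 15 but only 10 when qmax is 10, while dQ = 0 keeps the quantizer at 6 for every frame.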
-
-
Timothy B. Terriberry authored
-
Timothy B. Terriberry authored
Commit 735c4070 added uses of intrinsics that require at least gcc 9.0 (cf. <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78782>), even though AVX2 support may appear to be available in earlier gcc versions. We were not testing for this. Update the compiler test in configure.ac to use these intrinsics explicitly, so it will error out and disable AVX2 if they are not available.
-
- Feb 21, 2024
-
-
Jan Buethe authored
-
Jean-Marc Valin authored
Also, fix documentation about return value of zero.
-
- Feb 20, 2024
-
-
Jean-Marc Valin authored
-
Jean-Marc Valin authored
Trying to add padding in-place breaks when we have extensions, which causes a memcpy() with overlapping data. Just doing a copy instead.
-
Jean-Marc Valin authored
Silences NONTHREADSAFE_PSEUDOSTACK warnings
-
Jean-Marc Valin authored
-
Jean-Marc Valin authored
warning: expression does not compute the number of elements in this array. It seems gcc thinks we're trying to get the number of elements in our array. It then suggests adding parentheses to silence the warning.
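The pattern behind this warning can be sketched as follows, assuming the code deliberately divides the size of an array by the size of a type other than its element type. The array `example_state` and the function name are hypothetical and do not come from the Opus sources.

```c
#include <stddef.h>
#include <stdint.h>

#define NB_FLOATS 8

/* Hypothetical buffer whose element type is float. */
float example_state[NB_FLOATS];

/* Intentionally measures the buffer in units of int16_t, not float.
   Written as sizeof(example_state) / sizeof(int16_t), recent gcc
   versions may warn that this does not compute the element count.
   The extra parentheses around the second sizeof signal that the
   division is intentional. */
static size_t state_size_in_int16(void)
{
    return sizeof(example_state) / (sizeof(int16_t));
}
```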
-
- Feb 18, 2024
-
-
Jean-Marc Valin authored
-
Jean-Marc Valin authored
-
- Feb 17, 2024
-
-
Jean-Marc Valin authored
Also skip the first loss values being generated since they're biased towards "not lost" due to the initialization.
-
- Feb 16, 2024
-
-
They time out on GitHub Actions because those runners are slower.
-
Signed-off-by:
Jean-Marc Valin <jmvalin@jmvalin.ca>
-
Signed-off-by:
Jean-Marc Valin <jmvalin@jmvalin.ca>
-
Signed-off-by:
Jean-Marc Valin <jmvalin@jmvalin.ca>
-
Jan Buethe authored
-
Jean-Marc Valin authored
We don't need redundancy for the first active frame since we already have the main Opus payload.
-
Jean-Marc Valin authored
-
Jean-Marc Valin authored
Allows us to exclude the most recent silence from DRED
-
Jean-Marc Valin authored
-
Jean-Marc Valin authored
-
- Feb 15, 2024
-
-
Jean-Marc Valin authored
Use the neon version of silk_noise_shape_quantizer_short_prediction()
-
Jan Buethe authored
-
Jan Buethe authored
-
Jan Buethe authored
-
Signed-off-by:
Jean-Marc Valin <jmvalin@jmvalin.ca>
-
- Feb 14, 2024
-
-
Jean-Marc Valin authored
-
Jean-Marc Valin authored
Thanks to Igor Palaguta for reporting the issue. https://github.com/xiph/opus/issues/313
-
Jan Buethe authored
-
- Feb 11, 2024
-
-
Jean-Marc Valin authored
-
- Feb 10, 2024
-
-
xcorr_kernel_neon_fixed() read one more sample from y[] in the main loop than it needed, in order to allow use of vector loads, but unlike the native asm in celt_pitch_xcorr_arm.s, the loop condition did not exit early enough to prevent this from overrunning the end of the array. Additionally, the tail loop _always_ read one value beyond what it needed.

This patch fixes the loop condition on the main loop. Since this makes the tail section run even for lengths that are a multiple of 8 (e.g., on fully half the multiplies for usages like celt_fir() or celt_iir() with an order of 16, which is common), rather than try to fix the tail loop, we replace it with a non-looping adaptation of the native asm, which continues to use vector loads as much as possible for the remaining elements (and also does not read ahead past the end of the y[] array).

Overall slowdown of test_opus_encode on a Raspberry Pi 5 Model B Rev 1.0 is 0.12% vs. 0.13% for fixing the existing tail loop.

Signed-off-by:
Jean-Marc Valin <jmvalin@jmvalin.ca>
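To make the array bound in the xcorr_kernel_neon_fixed() fix concrete, here is a scalar sketch of a four-lag cross-correlation kernel. It assumes the kernel accumulates four neighboring lags per call, so a correct implementation touches y[0] through y[len+2] and nothing further. The name `xcorr_kernel_sketch` is hypothetical and this is not the Opus reference code.

```c
#include <stdint.h>

/* Scalar sketch (assumption: four lags per call).
   Reads x[0..len-1] and y[0..len+2]; reading y[len+3] or beyond
   would be the kind of overrun described in the commit message. */
static void xcorr_kernel_sketch(const int16_t *x, const int16_t *y,
                                int32_t sum[4], int len)
{
    int j, k;
    for (j = 0; j < len; j++) {
        for (k = 0; k < 4; k++) {
            /* Highest index used is y[(len - 1) + 3]. */
            sum[k] += (int32_t)x[j] * y[j + k];
        }
    }
}
```

A vectorized version has to produce the same sums while keeping every load, including any read-ahead, inside those len + 3 samples of y[].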
-
Compare the output of xcorr_kernel() against the results of xcorr_kernel_c() when configured with --enable-check-asm. Currently this is only checked in fixed point, as a float check requires more sophisticated error analysis and may need to be customized for each vector implementation.

Signed-off-by:
Jean-Marc Valin <jmvalin@jmvalin.ca>
-