  1. Feb 23, 2024
    • Rework 32-bit SSE loads yet again. · 59dc75fa
      Timothy B. Terriberry authored and Jean-Marc Valin committed
      The existing code in vec_avx.h produced
        warning: dereferencing type-punned pointer will break
         strict-aliasing rules
       with gcc 6.4.0.
      We already had a macro to work around this within the rules of the
       C standard, but trying to use that here does not get optimized
       into a single MOVD like we were hoping.
      Replacing it with memcpy() instead does get optimized correctly,
       but requires switching from a macro to an inline function in order
       to be able to declare a local variable and return a value.
      We already have such an inline function in NSQ_del_dec_avx2.c, so
       hoist that out and use it everywhere, and then convert vec_avx.h
       to use it also.
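The memcpy() approach described in this commit message can be sketched roughly as follows. This is a minimal illustration and not the code from the repository: the helper names `load_i32_unaligned` and `load_si32_to_xmm` are hypothetical, and the SSE2 part is guarded so the sketch still compiles on other targets.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: read 32 bits from a possibly unaligned address.
   memcpy() into a local variable stays within the aliasing rules, and
   compilers typically turn the fixed-size copy into a single load. */
static inline int32_t load_i32_unaligned(const void *p)
{
    int32_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}

#if defined(__SSE2__)
/* Hypothetical helper: place those 32 bits in the low lane of an XMM
   register; the upper lanes are zeroed by _mm_cvtsi32_si128(). */
static inline __m128i load_si32_to_xmm(const void *p)
{
    return _mm_cvtsi32_si128(load_i32_unaligned(p));
}
#endif

/* Tiny self-check: a value stored at an odd offset round-trips. */
static void load_i32_selfcheck(void)
{
    unsigned char buf[8] = {0};
    int32_t in = 0x12345678;
    int32_t out;
    memcpy(buf + 1, &in, sizeof(in));
    out = load_i32_unaligned(buf + 1);
    assert(out == in);
}
```

A function is needed here rather than a macro because the copy requires a named local variable and a return value, which is the point the commit message makes.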
    • Add Deep PLC/DRED/OSCE to random tests · 1186fb8e
      Jean-Marc Valin authored
      Also, remove -march=native because of AVX512VNNI and valgrind
  2. Feb 22, 2024
  3. Feb 21, 2024
  4. Feb 20, 2024
  5. Feb 18, 2024
  6. Feb 17, 2024
    • Add lossgen_demo · 393d463f
      Jean-Marc Valin authored
      Also skip the first loss values being generated since they're
      biased towards "not lost" due to the initialization.
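The idea of skipping the first generated values can be illustrated with a toy stateful generator. This is only a sketch: it swaps in a simple two-state Markov chain driven by an LCG, which is not the loss generator in the repository, and every name and probability below is hypothetical. Because the state starts as "not lost", the first outputs are lost with probability of about 0.10, while the long-run loss rate of this chain is about 0.29, so a warm-up phase discards the biased start.

```c
#include <assert.h>
#include <stdint.h>

/* Toy generator state (hypothetical, for illustration only). */
typedef struct {
    uint32_t rng;  /* LCG state */
    int lost;      /* 1 if the previous packet was lost */
} ToyLossGen;

/* 16-bit pseudo-random value from a 32-bit LCG (unsigned wraparound). */
static uint32_t toy_rand(ToyLossGen *g)
{
    g->rng = g->rng * 1664525u + 1013904223u;
    return g->rng >> 16;
}

static void toy_init(ToyLossGen *g, uint32_t seed)
{
    g->rng = seed;
    g->lost = 0;  /* always starts in the "not lost" state */
}

/* Returns 1 for a lost packet, 0 otherwise.
   P(lost | previous lost) = 0.75, P(lost | previous ok) is about 0.10. */
static int toy_next(ToyLossGen *g)
{
    uint32_t r = toy_rand(g);
    uint32_t thresh = g->lost ? 49152u : 6554u;
    g->lost = r < thresh;
    return g->lost;
}

/* Initialize, then discard the first `skip` outputs so that later
   values no longer depend strongly on the fixed starting state. */
static void toy_init_warm(ToyLossGen *g, uint32_t seed, int skip)
{
    assert(skip >= 0);
    toy_init(g, seed);
    for (int i = 0; i < skip; i++)
        (void)toy_next(g);
}
```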
  7. Feb 16, 2024
  8. Feb 15, 2024
  9. Feb 14, 2024
  10. Feb 11, 2024
  11. Feb 10, 2024
    • Fix OOB read in fixed-point NEON intrinsics. · 3e69410e
      Timothy B. Terriberry authored and Jean-Marc Valin committed
      
      xcorr_kernel_neon_fixed() read one more sample from y[] in the
       main loop than it needed to allow use of vector loads, but unlike
       the native asm in celt_pitch_xcorr_arm.s, the loop condition did
       not exit early enough to prevent this from overrunning the end of
       the array.
      Additionally, the tail loop _always_ read one value beyond what it
       needed.
      
      This patch fixes the loop condition on the main loop.
      Since this makes the tail section run even for lengths that are a
       multiple of 8 (e.g., on fully half the multiplies for usages like
       celt_fir() or celt_iir() with an order of 16, which is common),
       rather than try to fix the tail loop, we replace it with a
       non-looping adaptation of the native asm, which continues to use
       vector loads as much as possible for the remaining elements (and
       also does not read ahead past the end of the y[] array).
      
      Overall slowdown of test_opus_encode on a Raspberry Pi 5 Model B
       Rev 1.0 is 0.12% vs. 0.13% for fixing the existing tail loop.
      
      Signed-off-by: Jean-Marc Valin <jmvalin@jmvalin.ca>
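The loop-condition issue described in this commit can be sketched with a scalar analogue. This is a simplified illustration under stated assumptions, not the NEON code from the patch: the function names are hypothetical, a 12-sample memcpy() stands in for the vector loads, and the tail here is an ordinary scalar loop rather than the non-looping adaptation the commit describes. The kernel accumulates four lags, so y[] holds exactly len + 3 valid samples. Each main-loop pass reads y[j..j+11], one sample more than the 11 it uses, so it may only run while more than 8 samples remain.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reference: sum[k] += x[j] * y[j + k] for k = 0..3.
   Touches exactly len + 3 samples of y. */
static void xcorr4_ref(const int16_t *x, const int16_t *y,
                       int32_t sum[4], int len)
{
    for (int j = 0; j < len; j++)
        for (int k = 0; k < 4; k++)
            sum[k] += (int32_t)x[j] * y[j + k];
}

/* Blocked variant (hypothetical). Returns the highest index of y read,
   or -1 if len is 0. */
static int xcorr4_blocked(const int16_t *x, const int16_t *y,
                          int32_t sum[4], int len)
{
    int j = 0;
    int last_read = -1;
    /* "> 8" rather than ">= 8": with exactly 8 samples left, the
       12-sample read would end at index len + 3, one past the end. */
    while (len - j > 8) {
        int16_t yb[12];
        memcpy(yb, y + j, sizeof(yb));  /* reads y[j .. j + 11] */
        last_read = j + 11;
        assert(last_read <= len + 2);   /* still inside y[] */
        for (int i = 0; i < 8; i++)
            for (int k = 0; k < 4; k++)
                sum[k] += (int32_t)x[j + i] * yb[i + k];
        j += 8;
    }
    /* Tail: 1 to 8 samples remain (for len > 0), with no read-ahead. */
    for (; j < len; j++) {
        for (int k = 0; k < 4; k++)
            sum[k] += (int32_t)x[j] * y[j + k];
        last_read = j + 3;
    }
    return last_read;
}
```

With this condition the tail also runs when len is a multiple of 8, which is the cost the commit message weighs against the out-of-bounds read.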
    • Add check-asm for fixed-point xcorr_kernel(). · d5031251
      Timothy B. Terriberry authored and Jean-Marc Valin committed
      
      Compare the output of xcorr_kernel() against the results of
       xcorr_kernel_c() when configured with --enable-check-asm.
      Currently this is only checked in fixed point, as a float check
       requires more sophisticated error analysis and may need to be
       customized for each vector implementation.
      
      Signed-off-by: Jean-Marc Valin <jmvalin@jmvalin.ca>
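A check of this kind might look roughly like the sketch below. It is an illustration only: `kernel_c`, `kernel_opt`, and `kernel_checked` are hypothetical names, and `kernel_opt` is a plain C stand-in that merely changes the summation order. In fixed point the comparison can be bit-exact, because integer sums do not depend on the order of accumulation as long as nothing overflows. Floating-point sums do depend on the order, which is why a float check would need a tolerance derived from error analysis.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Plain C reference: sum[k] += x[j] * y[j + k] for k = 0..3. */
static void kernel_c(const int16_t *x, const int16_t *y,
                     int32_t sum[4], int len)
{
    for (int j = 0; j < len; j++)
        for (int k = 0; k < 4; k++)
            sum[k] += (int32_t)x[j] * y[j + k];
}

/* Stand-in for an optimized version: same math, but each lag is
   accumulated separately, so the order of additions differs. */
static void kernel_opt(const int16_t *x, const int16_t *y,
                       int32_t sum[4], int len)
{
    for (int k = 0; k < 4; k++) {
        int32_t acc = 0;
        for (int j = 0; j < len; j++)
            acc += (int32_t)x[j] * y[j + k];
        sum[k] += acc;
    }
}

/* Run both versions from the same starting sums and require
   identical results. */
static void kernel_checked(const int16_t *x, const int16_t *y,
                           int32_t sum[4], int len)
{
    int32_t sum_c[4];
    memcpy(sum_c, sum, sizeof(sum_c));
    kernel_c(x, y, sum_c, len);
    kernel_opt(x, y, sum, len);
    assert(memcmp(sum, sum_c, sizeof(sum_c)) == 0);
}
```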