  1. Feb 13, 2025
  2. Feb 12, 2025
  3. Jan 27, 2025
  4. Sep 11, 2024
  5. Mar 12, 2024
  6. Mar 11, 2024
  7. Mar 09, 2024
    • Fix unaligned load with MSVC · 824f1bec
      Jean-Marc Valin authored
      MSVC doesn't have a real __m128i_u, so it would generate an aligned
      store, resulting in a segfault. Adding explicit loadu/storeu
      intrinsics to make sure the compiler generates unaligned load/store.
      824f1bec
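      The fix described above can be illustrated with a minimal sketch. It is
      not the code from the commit: the helper name copy16_unaligned and the
      HAVE_SSE2_SKETCH guard are hypothetical, and the memcpy() fallback is
      only there so the example builds on targets without SSE2. The point is
      that _mm_loadu_si128() and _mm_storeu_si128() explicitly request
      unaligned accesses instead of relying on how a compiler treats a
      dereferenced vector pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical guard: detect SSE2 on GCC/Clang and on MSVC. */
#if defined(__SSE2__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define HAVE_SSE2_SKETCH 1
#endif

/* Hypothetical helper: copy 16 bytes between buffers that carry no
 * alignment guarantee. */
static void copy16_unaligned(void *dst, const void *src)
{
    assert(dst != NULL && src != NULL);
#ifdef HAVE_SSE2_SKETCH
    /* Explicit unaligned load and store intrinsics. Dereferencing a
     * __m128i pointer instead would permit an aligned access. */
    __m128i v = _mm_loadu_si128((const __m128i *)src);
    _mm_storeu_si128((__m128i *)dst, v);
#else
    /* Portable fallback for targets without SSE2. */
    memcpy(dst, src, 16);
#endif
}
```

      Calling copy16_unaligned(out + 1, in + 3) on byte buffers is then safe
      even though neither address is a multiple of 16.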
  8. Mar 03, 2024
  9. Mar 01, 2024
  10. Feb 25, 2024
  11. Feb 23, 2024
    • Rework 32-bit SSE loads yet again. · 59dc75fa
      Timothy B. Terriberry authored and Jean-Marc Valin committed
      The existing code in vec_avx.h produced
        warning: dereferencing type-punned pointer will break
         strict-aliasing rules
       with gcc 6.4.0.
      We already had a macro to work around this within the rules of the
       C standard, but trying to use that here does not get optimized
       into a single MOVD like we were hoping.
      Replacing it with memcpy() instead does get optimized correctly,
       but requires switching from a macro to an inline function in order
       to be able to declare a local variable and return a value.
      We already have such an inline function in NSQ_del_dec_avx2.c, so
       hoist that out and use it everywhere, and then convert vec_avx.h
       to use it also.
      59dc75fa
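      The memcpy() approach described in this commit message can be sketched
      as follows. This is an illustration under stated assumptions, not the
      function from NSQ_del_dec_avx2.c or vec_avx.h: the names
      load_i32_unaligned and load_i32_to_m128i and the HAVE_SSE2_SKETCH guard
      are hypothetical. Copying through a local variable avoids the
      type-punned dereference that triggers the strict-aliasing warning, and
      it needs a function rather than a macro because it declares a local
      and returns a value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical guard: detect SSE2 on GCC/Clang and on MSVC. */
#if defined(__SSE2__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define HAVE_SSE2_SKETCH 1
#endif

/* Read 32 bits from an arbitrary address without type punning.
 * memcpy() into a local is well defined for any source type and
 * alignment; optimizing compilers typically reduce it to one load. */
static inline int32_t load_i32_unaligned(const void *p)
{
    int32_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}

#ifdef HAVE_SSE2_SKETCH
/* Place those 32 bits in the low lane of an SSE register and zero
 * the upper lanes. */
static inline __m128i load_i32_to_m128i(const void *p)
{
    return _mm_cvtsi32_si128(load_i32_unaligned(p));
}
#endif
```

      Whether the pair collapses into a single MOVD depends on the compiler
      and optimization level, so it is worth checking the generated assembly
      on the toolchains that matter.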
  12. Feb 22, 2024
  13. Feb 21, 2024
  14. Feb 20, 2024
  15. Feb 16, 2024
  16. Feb 15, 2024
  17. Feb 02, 2024
  18. Feb 01, 2024
  19. Jan 31, 2024
  20. Jan 25, 2024
  21. Dec 20, 2023
  22. Dec 15, 2023
  23. Nov 30, 2023
  24. Nov 29, 2023
  25. Nov 28, 2023
  26. Nov 21, 2023
  27. Nov 20, 2023
    • Misc fixes on previous patch · 6f99a338
      Jean-Marc Valin authored
      Fixes warnings, undefined behaviour, and check-asm failure
      6f99a338
    • Optimize NSQ_del_dec() for AVX2 · 735c4070
      Victor Ding authored and Jean-Marc Valin committed
      The optimization is bit-exact with the C function.
      
      This optimization speeds up the SILK encoder (floating point) as follows:
      
      AMD Zen:
      Complexity 0-5 :      0%
      Complexity 6-7 : 3 -  7%
      Complexity 8-10: 8 - 15%
      
      Intel Skylake:
      Complexity 0-5 :       0%
      Complexity 6-7 : 14 - 18%
      Complexity 8-10: 17 - 22%
      
      Adapted by Jean-Marc Valin
      735c4070