1. 19 Sep, 2018 1 commit
  2. 06 May, 2018 1 commit
  3. 26 Jun, 2017 1 commit
  4. 19 Feb, 2017 3 commits
    • SIMD: Accelerate decoding of 16 bit FLAC · ec795695
      Erik de Castro Lopo authored
      This patch removes FLAC__lpc_restore_signal_16_intrin_sse2().
      
      It's faster than C code, but not faster than MMX-accelerated
      ASM functions. It's also slower than the new SSE4.1 functions
      that were added by the previous patch.
      So this function wasn't very useful before, and now it's
      even less useful. I don't see a reason to keep it.
      
      Patch-from: lvqcl <lvqcl.mail@gmail.com>
    • SIMD: Improve decoding of some 24 bit files · f9f5646a
      Erik de Castro Lopo authored
      Accelerates decoding of non-Subset 24-bit FLAC files (where lpc_order
      > 12).
      
      The improved function is FLAC__lpc_restore_signal_wide_intrin_sse41().
      It requires SSE4.1 and it's used only by 32-bit libFLAC.
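
For orientation, the operation this function vectorizes can be sketched in scalar C as below. This is a simplified illustration that assumes the usual FLAC linear-prediction definition; `restore_signal_wide_sketch` and its parameter names are hypothetical and are not libFLAC's API. The "wide" part is the 64-bit accumulator, which keeps the sum of products from overflowing with 24-bit input and high predictor orders.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar sketch, not the libFLAC implementation.
 * data must point just past `order` already-decoded warm-up samples,
 * so that data[-1], data[-2], ... are valid reads. */
static void restore_signal_wide_sketch(const int32_t *residual, size_t n,
                                       const int32_t *qlp_coeff, size_t order,
                                       int lp_quantization, int32_t *data)
{
    for (size_t i = 0; i < n; i++) {
        int64_t sum = 0;                    /* 64-bit accumulator */
        const int32_t *history = data + i;  /* history[-1] is the newest sample */
        for (size_t j = 0; j < order; j++)
            sum += (int64_t)qlp_coeff[j] * history[-(ptrdiff_t)j - 1];
        /* Right-shifting a negative value is implementation-defined in C;
         * this sketch assumes an arithmetic shift, as common compilers do. */
        data[i] = residual[i] + (int32_t)(sum >> lp_quantization);
    }
}
```

With one warm-up sample of 10, a single coefficient of 1, a shift of 0 and residuals 1, 2, 3, the sketch reconstructs 11, 13, 16.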
      
      Patch-from: lvqcl <lvqcl.mail@gmail.com>
    • SIMD: Add const qualifier where appropriate · 086b493a
      Erik de Castro Lopo authored
      Patch-from: lvqcl <lvqcl.mail@gmail.com>
  5. 31 Jan, 2017 2 commits
  6. 14 Jan, 2017 1 commit
    • Purge usage of `unsigned` type · c6318e9d
      Erik de Castro Lopo authored
      As pointed out by Ozkan Sezer, on some platforms `int32_t` is actually
      a typedef for `long`, so `unsigned` cannot be used interchangeably with
      `FLAC__uint32`. The fix is to switch from `unsigned` to the explicitly
      sized ISO C types defined in <stdint.h>.
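
A minimal sketch of the idea, not taken from the FLAC sources: `sum_u32_sketch` is a hypothetical helper whose prototype uses `uint32_t` throughout, so it stays consistent whether the platform defines `uint32_t` as `unsigned int` or as `unsigned long`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of libFLAC. Because the parameter and
 * return types are spelled uint32_t rather than unsigned, a caller that
 * holds a uint32_t array never passes a pointer of an incompatible type,
 * even where uint32_t is a typedef for unsigned long. */
static uint32_t sum_u32_sketch(const uint32_t *values, size_t count)
{
    uint32_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += values[i];   /* unsigned arithmetic wraps modulo 2^32 */
    return total;
}
```

When printing such values, the `PRIu32` macro from <inttypes.h> selects the matching printf conversion on either kind of platform.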
  7. 04 Dec, 2016 1 commit
  8. 20 Jun, 2016 1 commit
  9. 11 May, 2016 1 commit
  10. 25 Nov, 2014 1 commit
  11. 20 Sep, 2014 1 commit
  12. 28 Jul, 2014 1 commit
    • libFLAC : SSE optimisations. · 02591f6b
      Erik de Castro Lopo authored
      Add new function:
      
          FLAC__lpc_compute_residual_from_qlp_coefficients_intrin_sse41()
      
      and rewrite function:
      
          FLAC__lpc_compute_residual_from_qlp_coefficients_16_intrin_sse2()
      
      Testing shows a noticeable speed increase on Intel Core i3/5/7 (up to 30%
      for -8 mode), AMD Athlon64, Phenom and Bulldozer/Piledriver, but no increase,
      or even a very small speed decrease (~2% for -8 mode), on Intel Core2.
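
As a point of reference, the computation these functions accelerate can be sketched in scalar C as follows. This assumes the usual FLAC linear-prediction definition and is not the libFLAC code; `compute_residual_sketch` and its parameter names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar sketch, not the libFLAC implementation.
 * data must point just past `order` warm-up samples, so that
 * data[-1], data[-2], ... are valid reads. The 32-bit accumulator
 * assumes the sum of products fits in 32 bits. */
static void compute_residual_sketch(const int32_t *data, size_t n,
                                    const int32_t *qlp_coeff, size_t order,
                                    int lp_quantization, int32_t *residual)
{
    for (size_t i = 0; i < n; i++) {
        int32_t sum = 0;
        const int32_t *history = data + i;  /* history[-1] is the previous sample */
        for (size_t j = 0; j < order; j++)
            sum += qlp_coeff[j] * history[-(ptrdiff_t)j - 1];
        /* Residual = actual sample minus the quantized prediction.
         * The shift is assumed to be arithmetic for negative sums. */
        residual[i] = data[i] - (sum >> lp_quantization);
    }
}
```

For samples 1, 2, 3, 5, 5 with the first two used as warm-up, coefficients {2, -1} and a shift of 0, the predictions are 3, 4, 7 and the residuals are 0, 1, -2.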
      
      Patch-from: lvqcl <lvqcl.mail@gmail.com>
  13. 28 Jun, 2014 1 commit
    • libFLAC/lpc_intrin_sseN.c : Disambiguate macro names. · 9aa15464
      Erik de Castro Lopo authored
      Previously, the files lpc_intrin_sse2.c and lpc_intrin_sse41.c both defined
      macros named RESIDUAL_RESULT and DATA_RESULT. This made it impossible
      to merge these files, which we may do at some stage.
      
      Patch-from: lvqcl <lvqcl.mail@gmail.com>
  14. 15 Jun, 2014 1 commit
  15. 01 Jun, 2014 1 commit
  16. 24 Mar, 2014 1 commit
  17. 31 Jan, 2014 1 commit
    • Add a fast shift for int64 values. · 4618512d
      Erik de Castro Lopo authored
      This patch changes the code from:
      	(FLAC__int32)(xmm.m128i_i64[0] >> lp_quantization)
      into:
      	_mm_cvtsi128_si32(_mm_srli_epi64(xmm, lp_quantization));
      
      Encoding of 24-bit .wav files with 32-bit FLAC became noticeably faster.
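
The replacement uses a logical shift where the original used a signed one. A sketch of why the two can agree after truncation to 32 bits: the results differ only in the top `shift` bits of the 64-bit value, so the low 32 bits match for shift counts up to 32. The function names are hypothetical, and the sketch assumes `lp_quantization` stays within that range; it falls back to plain C when SSE2 is not available.

```c
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>  /* SSE2 intrinsics */
#define SHIFT_SKETCH_HAVE_SSE2 1
#endif

/* Reference form: signed 64-bit shift, then truncation to 32 bits.
 * Assumes the compiler shifts negative values arithmetically. */
static int32_t shift_scalar_sketch(int64_t sum, int shift)
{
    return (int32_t)(sum >> shift);
}

/* Logical 64-bit shift, then take the low 32 bits. Valid as a
 * replacement only for 0 <= shift <= 32. */
static int32_t shift_logical_sketch(int64_t sum, int shift)
{
#ifdef SHIFT_SKETCH_HAVE_SSE2
    __m128i xmm = _mm_set_epi64x(0, sum);  /* sum in the low 64-bit lane */
    return _mm_cvtsi128_si32(_mm_srli_epi64(xmm, shift));
#else
    /* Portable equivalent of the intrinsic sequence above. */
    return (int32_t)(uint32_t)((uint64_t)sum >> shift);
#endif
}
```

For example, both functions map -1000 with a shift of 3 to -125.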
      
      Patch-from: lvqcl <lvqcl.mail@gmail.com>
  18. 30 Jan, 2014 1 commit
  19. 03 Oct, 2013 1 commit
    • Improve x86 intrinsic implementation. · ecd0acba
      Erik de Castro Lopo authored
      * Split lpc_x86intrin.c into lpc_intrin_sse.c and lpc_intrin_sse2.c
      * Add FLAC__lpc_compute_residual_from_qlp_coefficients_intrin_sse2()
        function to lpc_intrin_sse2.c
      * Add lpc_intrin_sse41.c with two ..._wide_intrin_sse41() functions
        (useful for 24-bit en-/decoding)
      * Add precompute_partition_info_sums_intrin_sse2() / ...ssse3() and
        disable precompute_partition_info_sums_32bit_asm_ia32_().
        The SSE2 version uses 4 SSE2 instructions in place of the single SSSE3
        instruction PABSD, so it is slightly slower.
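
PABSD computes the absolute value of each packed 32-bit integer. One common way to get the same result with SSE2 only is the sign-mask trick sketched below; it may not match the exact instruction sequence in the patch, and `abs4_i32_sketch` is a hypothetical name. The sketch falls back to the scalar form of the same trick when SSE2 is not available.

```c
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>  /* SSE2 intrinsics */
#define ABS_SKETCH_HAVE_SSE2 1
#endif

/* Absolute value of four 32-bit integers without PABSD.
 * mask is all ones for negative lanes and zero otherwise, so
 * (x ^ mask) - mask negates the negative lanes and leaves the rest alone.
 * INT32_MIN has no positive counterpart and is not handled here. */
static void abs4_i32_sketch(const int32_t in[4], int32_t out[4])
{
#ifdef ABS_SKETCH_HAVE_SSE2
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    __m128i mask = _mm_srai_epi32(v, 31);             /* sign of each lane */
    v = _mm_sub_epi32(_mm_xor_si128(v, mask), mask);  /* (x ^ mask) - mask */
    _mm_storeu_si128((__m128i *)out, v);
#else
    for (int i = 0; i < 4; i++) {
        int32_t mask = in[i] < 0 ? -1 : 0;
        out[i] = (in[i] ^ mask) - mask;
    }
#endif
}
```

Given the input {-5, 7, 0, -123456}, the sketch produces {5, 7, 0, 123456}.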
      
      Patch-from: lvqcl <lvqcl.mail@gmail.com>