1. 11 May, 2016 1 commit
  2. 25 Nov, 2014 1 commit
  3. 20 Sep, 2014 1 commit
  4. 28 Jul, 2014 1 commit
      libFLAC : SSE optimisations. · 02591f6b
      Erik de Castro Lopo authored
      Add new function:
      
          FLAC__lpc_compute_residual_from_qlp_coefficients_intrin_sse41()
      
      and rewrite function:
      
          FLAC__lpc_compute_residual_from_qlp_coefficients_16_intrin_sse2()
      
      Testing shows a noticeable speed increase on Intel Core i3/5/7 (up to 30%
      for -8 mode), AMD Athlon64, Phenom and Bulldozer/Piledriver, but no
      increase, or even a very small decrease (~2% for -8 mode), on Intel Core2.
      
      Patch-from: lvqcl <lvqcl.mail@gmail.com>
  5. 28 Jun, 2014 1 commit
      libFLAC/lpc_intrin_sseN.c : Disambiguate macro names. · 9aa15464
      Erik de Castro Lopo authored
      Previously, the files lpc_intrin_sse2.c and lpc_intrin_sse41.c both defined
      the macros RESIDUAL_RESULT and DATA_RESULT. This made it impossible to
      merge the two files, which we may want to do at some stage.
      
      Patch-from: lvqcl <lvqcl.mail@gmail.com>
  6. 15 Jun, 2014 1 commit
  7. 01 Jun, 2014 1 commit
  8. 24 Mar, 2014 1 commit
  9. 31 Jan, 2014 1 commit
      Add a fast shift for int64 values. · 4618512d
      Erik de Castro Lopo authored
      This patch changes the code from:
      	(FLAC__int32)(xmm.m128i_i64[0] >> lp_quantization)
      into:
      	_mm_cvtsi128_si32(_mm_srli_epi64(xmm, lp_quantization));
      
      Encoding of 24-bit .wav files with 32-bit FLAC became noticeably faster.
      
      Patch-from: lvqcl <lvqcl.mail@gmail.com>
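To illustrate why the replacement in this commit is safe, here is a minimal sketch (not code from the patch). The helper names `shift_to_int32_scalar` and `shift_to_int32_sse2` are hypothetical. `_mm_srli_epi64` is a logical shift while `>>` on a signed value is normally arithmetic, but only the low 32 bits are kept, so the two agree as long as the shift count is between 0 and 32, which is assumed here. On targets without SSE2 the sketch falls back to the scalar form.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: scalar form, shift a 64-bit sum and keep the low
 * 32 bits. Assumes the compiler implements >> on negative values as an
 * arithmetic shift (true for mainstream compilers). */
static int32_t shift_to_int32_scalar(int64_t sum, int shift)
{
    return (int32_t)(sum >> shift);
}

/* Hypothetical helper: same result using SSE2, valid for 0 <= shift <= 32. */
static int32_t shift_to_int32_sse2(int64_t sum, int shift)
{
#if defined(__SSE2__)
    /* Load the 64-bit value into the low half of an XMM register. */
    __m128i xmm = _mm_loadl_epi64((const __m128i *)&sum);
    /* Logical right shift of each 64-bit lane, then take the low 32 bits. */
    return _mm_cvtsi128_si32(_mm_srli_epi64(xmm, shift));
#else
    /* Portable fallback when SSE2 is not available. */
    return shift_to_int32_scalar(sum, shift);
#endif
}
```

The logical and arithmetic shifts differ only in the bits shifted in at the top of the 64-bit lane, and with a count of at most 32 none of those bits reach the low 32-bit result.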
  10. 30 Jan, 2014 1 commit
  11. 03 Oct, 2013 1 commit
      Improve x86 intrinsic implementation. · ecd0acba
      Erik de Castro Lopo authored
      * Split lpc_x86intrin.c into lpc_intrin_sse.c and lpc_intrin_sse2.c
      * Add FLAC__lpc_compute_residual_from_qlp_coefficients_intrin_sse2()
        function to lpc_intrin_sse2.c
      * Add lpc_intrin_sse41.c with two ..._wide_intrin_sse41() functions
        (useful for 24-bit en-/decoding)
      * Add precompute_partition_info_sums_intrin_sse2() / ...ssse3() and
        disable precompute_partition_info_sums_32bit_asm_ia32_().
        The SSE2 version uses 4 SSE2 instructions instead of the single SSSE3
        instruction PABSD, so it is slightly slower.
      
      Patch-from: lvqcl <lvqcl.mail@gmail.com>
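The PABSD remark can be illustrated with a common SSE2 idiom for a packed 32-bit absolute value. This is a sketch under assumptions, not the sequence used in the patch, and the function name `abs4_int32` is hypothetical. SSSE3 provides the same operation as a single instruction (`_mm_abs_epi32`, i.e. PABSD), while SSE2 has to build it from a shift, an XOR and a subtract. On targets without SSE2 the sketch falls back to a scalar loop.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: absolute value of four 32-bit integers.
 * INT32_MIN is not handled specially; it maps to itself. */
static void abs4_int32(const int32_t in[4], int32_t out[4])
{
#if defined(__SSE2__)
    __m128i x = _mm_loadu_si128((const __m128i *)in);
    /* sign is all ones in lanes where x is negative, zero elsewhere. */
    __m128i sign = _mm_srai_epi32(x, 31);
    /* (x ^ sign) - sign negates the negative lanes, leaves others as-is. */
    __m128i r = _mm_sub_epi32(_mm_xor_si128(x, sign), sign);
    _mm_storeu_si128((__m128i *)out, r);
#else
    /* Portable fallback when SSE2 is not available. */
    int i;
    for (i = 0; i < 4; i++)
        out[i] = in[i] < 0 ? -in[i] : in[i];
#endif
}
```

The extra instructions per vector are what makes an SSE2 path of this kind slightly slower than one using PABSD.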