1. 11 May, 2016 1 commit
  2. 25 Nov, 2014 1 commit
  3. 20 Sep, 2014 1 commit
  4. 28 Jul, 2014 1 commit
    • libFLAC : SSE optimisations. · 02591f6b
      Erik de Castro Lopo authored
      Add new function:
      
          FLAC__lpc_compute_residual_from_qlp_coefficients_intrin_sse41()
      
      and rewrite function:
      
          FLAC__lpc_compute_residual_from_qlp_coefficients_16_intrin_sse2()
      
      Testing shows a noticeable speed increase on Intel Core i3/5/7 (up to 30%
      for -8 mode), AMD Athlon64, Phenom and Bulldozer/Piledriver, but no
      increase, or even a very small decrease (~2% for -8 mode), on Intel Core2.
      
      Patch-from: lvqcl <lvqcl.mail@gmail.com>
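
      For orientation, the operation these functions accelerate can be written
      as a plain scalar loop. This is a minimal sketch, not libFLAC's code: the
      name lpc_residual_scalar and its argument layout are hypothetical, and it
      assumes the first `order` samples of `data` are warm-up samples.

      ```c
      #include <stddef.h>
      #include <stdint.h>

      /* Hypothetical scalar reference, not the libFLAC implementation.
       * For each sample after the warm-up, predict it from the previous
       * `order` samples and store the prediction error (the residual).
       * Assumes >> on a negative value is an arithmetic shift, which holds
       * on mainstream compilers but is implementation-defined in C. */
      static void lpc_residual_scalar(const int32_t *data, size_t len,
                                      const int32_t *qlp_coeff, unsigned order,
                                      int shift, int32_t *residual)
      {
          for (size_t i = order; i < len; i++) {
              int32_t sum = 0;
              for (unsigned j = 0; j < order; j++)
                  sum += qlp_coeff[j] * data[i - j - 1];
              residual[i - order] = data[i] - (sum >> shift);
          }
      }
      ```

      With coefficients {2, -1} and no shift, a linear ramp is predicted
      exactly, so every residual is zero.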
  5. 28 Jun, 2014 1 commit
    • libFLAC/lpc_intrin_sseN.c : Disambiguate macro names. · 9aa15464
      Erik de Castro Lopo authored
      Previously, the files lpc_intrin_sse2.c and lpc_intrin_sse41.c both defined
      macros named RESIDUAL_RESULT and DATA_RESULT. This made it impossible to
      merge the two files, which we may want to do at some stage.
      
      Patch-from: lvqcl <lvqcl.mail@gmail.com>
  6. 15 Jun, 2014 1 commit
  7. 01 Jun, 2014 1 commit
  8. 24 Mar, 2014 1 commit
  9. 24 Feb, 2014 1 commit
    • Don't use intrinsics when they are slower. · cf0e42ae
      Erik de Castro Lopo authored
      More thorough en-/decoding tests show that the functions that use
      intrinsics are sometimes slower (or not really faster) than the old
      plain C functions.
      
      After this patch the encoder doesn't use these new functions
      when their usefulness is questionable.
      
      Patch-from: lvqcl <lvqcl.mail@gmail.com>
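
      The general idea of falling back to plain C can be sketched with a
      function pointer chosen once at setup time. This is an illustration
      only: every name below is hypothetical, the "SIMD" routine is a stand-in
      that simply calls the plain one, and the real encoder's selection
      criteria are not shown in this log.

      ```c
      #include <stddef.h>
      #include <stdint.h>

      typedef void (*residual_fn)(const int32_t *data, size_t len, int32_t *out);

      /* Plain C routine: first-order difference, out[0] keeps the first sample. */
      static void residual_plain_c(const int32_t *data, size_t len, int32_t *out)
      {
          if (len == 0)
              return;
          out[0] = data[0];
          for (size_t i = 1; i < len; i++)
              out[i] = data[i] - data[i - 1];
      }

      /* Stand-in for an intrinsics version; it must produce identical output. */
      static void residual_simd_stub(const int32_t *data, size_t len, int32_t *out)
      {
          residual_plain_c(data, len, out);
      }

      /* Hypothetical selector: use the SIMD routine only when the CPU supports
       * it AND benchmarks showed it to be faster for this configuration. */
      static residual_fn select_residual_fn(int cpu_has_simd, int simd_measured_faster)
      {
          if (cpu_has_simd && simd_measured_faster)
              return residual_simd_stub;
          return residual_plain_c;
      }
      ```

      Because both routines share one signature, the caller does not need to
      know which one was selected.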
  10. 01 Feb, 2014 1 commit
  11. 30 Jan, 2014 2 commits
  12. 03 Oct, 2013 1 commit
    • Improve x86 intrinsic implementation. · ecd0acba
      Erik de Castro Lopo authored
      * Split lpc_x86intrin.c into lpc_intrin_sse.c and lpc_intrin_sse2.c
      * Add FLAC__lpc_compute_residual_from_qlp_coefficients_intrin_sse2()
        function to lpc_intrin_sse2.c
      * Add lpc_intrin_sse41.c with two ..._wide_intrin_sse41() functions
        (useful for 24-bit en-/decoding)
      * Add precompute_partition_info_sums_intrin_sse2() / ...ssse3() and
        disable precompute_partition_info_sums_32bit_asm_ia32_().
        The SSE2 version uses 4 SSE2 instructions in place of the single SSSE3
        instruction PABSD, so it is slightly slower.
      
      Patch-from: lvqcl <lvqcl.mail@gmail.com>
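
      The remark about PABSD can be illustrated with one common way of taking
      a 32-bit absolute value using only SSE2: build a sign mask with an
      arithmetic shift, XOR with it, then subtract it (compilers typically add
      a register copy as well). This is a sketch under assumptions, not
      necessarily the sequence used in the patch; abs_sum4 is a hypothetical
      helper, and a scalar path with the same mask trick is used when SSE2 is
      not available at compile time.

      ```c
      #include <stddef.h>
      #include <stdint.h>

      #if defined(__SSE2__) || defined(_M_X64) || \
          (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
      #include <emmintrin.h>
      #define SKETCH_HAVE_SSE2 1
      #endif

      /* Hypothetical helper: sum of absolute values of four residuals. */
      static uint32_t abs_sum4(const int32_t r[4])
      {
      #ifdef SKETCH_HAVE_SSE2
          __m128i v = _mm_loadu_si128((const __m128i *)r);
          __m128i mask = _mm_srai_epi32(v, 31);             /* -1 where negative */
          v = _mm_sub_epi32(_mm_xor_si128(v, mask), mask);  /* (v ^ m) - m */
          uint32_t out[4];
          _mm_storeu_si128((__m128i *)out, v);
          return out[0] + out[1] + out[2] + out[3];
      #else
          uint32_t sum = 0;
          for (size_t k = 0; k < 4; k++) {
              uint32_t u = (uint32_t)r[k];
              uint32_t m = r[k] < 0 ? 0xFFFFFFFFu : 0u;     /* sign mask */
              sum += (u ^ m) - m;                           /* same trick, scalar */
          }
          return sum;
      #endif
      }
      ```

      With SSSE3, the shift, XOR and subtract collapse into a single
      _mm_abs_epi32 call, which compiles to PABSD.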