    Improve x86 intrinsic implementation. · ecd0acba
    Erik de Castro Lopo authored
    * Split lpc_x86intrin.c into lpc_intrin_sse.c and lpc_intrin_sse2.c
    * Add FLAC__lpc_compute_residual_from_qlp_coefficients_intrin_sse2()
      function to lpc_intrin_sse2.c
    * Add lpc_intrin_sse41.c with two ..._wide_intrin_sse41() functions
      (useful for 24-bit en-/decoding)
    * Add precompute_partition_info_sums_intrin_sse2() / ...ssse3() and
      disable precompute_partition_info_sums_32bit_asm_ia32_().
      The SSE2 version uses 4 SSE2 instructions in place of the single
      SSSE3 instruction PABSD, so it is slightly slower.
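
For illustration only, here is one common way to compute a per-lane 32-bit absolute value with SSE2 alone, which is what PABSD does in a single instruction on SSSE3. This is a sketch, not code from the patch: the helper names abs_epi32_sse2() and abs4_sse2() are hypothetical, and it is an assumption that the patch uses this particular sequence. The three operations below, plus the register copy a compiler typically needs to keep the input alive, would account for four instructions.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper (not from the patch): |x| for four packed int32
 * lanes using only SSE2. Like PABSD, INT32_MIN maps to itself. */
static __m128i abs_epi32_sse2(__m128i x)
{
    /* Arithmetic shift yields 0 for non-negative lanes, -1 for negative. */
    __m128i sign = _mm_srai_epi32(x, 31);
    /* XOR with the mask flips all bits of negative lanes only. */
    __m128i flipped = _mm_xor_si128(x, sign);
    /* Subtracting -1 adds 1, completing two's-complement negation. */
    return _mm_sub_epi32(flipped, sign);
}

/* Hypothetical convenience wrapper for unaligned arrays of four values. */
static void abs4_sse2(const int32_t in[4], int32_t out[4])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, abs_epi32_sse2(v));
}
```

On a build with SSSE3 enabled, the body of abs_epi32_sse2() could be replaced by a single call to _mm_abs_epi32() from <tmmintrin.h>.
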
    Patch-from: lvqcl <lvqcl.mail@gmail.com>
