Update and re-enable SILK SSE4.1 optimizations
A number of fixed-point SIMD intrinsics optimizations for Intel processors (up to the SSE4.1 instruction set) were added to the SILK code a few years ago, but were subsequently disabled after the corresponding C functions changed, for example in the precision of their input data. This branch updates those optimizations so that they can be used again with the latest codebase.
Also included are bit-exactness checks against the C code. They can be enabled like so:
./autogen.sh && ./configure --enable-fixed-point --enable-check-asm --enable-assertions && make check
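To illustrate the idea behind such a check, here is a minimal sketch and not the code in this branch. All function names are hypothetical, and the "optimized" routine is a plain-C stand-in with four partial sums in place of real SSE4.1 intrinsics, so that it compiles anywhere. The wrapper runs the optimized path and asserts that it matches the C reference exactly.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: plain fixed-point inner product with a 64-bit accumulator. */
static int64_t inner_prod_c(const int16_t *a, const int16_t *b, int len)
{
    int64_t sum = 0;
    for (int i = 0; i < len; i++) {
        sum += (int32_t)a[i] * b[i];
    }
    return sum;
}

/* Hypothetical stand-in for a vectorized routine: four independent
 * partial sums (as four SIMD lanes would hold), then a scalar tail. */
static int64_t inner_prod_opt(const int16_t *a, const int16_t *b, int len)
{
    int64_t acc[4] = {0, 0, 0, 0};
    int i = 0;
    for (; i + 4 <= len; i += 4) {
        for (int k = 0; k < 4; k++) {
            acc[k] += (int32_t)a[i + k] * b[i + k];
        }
    }
    int64_t sum = acc[0] + acc[1] + acc[2] + acc[3];
    for (; i < len; i++) {
        sum += (int32_t)a[i] * b[i];
    }
    return sum;
}

/* Wrapper: run the optimized path and require a bit-exact match with
 * the reference. In a real build this comparison would presumably be
 * compiled in only when the checks are enabled. */
static int64_t inner_prod(const int16_t *a, const int16_t *b, int len)
{
    int64_t result = inner_prod_opt(a, b, len);
    assert(result == inner_prod_c(a, b, len));
    return result;
}
```

In this toy case the results always match, because integer addition without overflow can be reordered freely. A check of this kind becomes useful when the two versions round, shift, or saturate at different points, which is the sort of divergence that changes to the C functions can introduce.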
Note 1: I only briefly looked at the CELT optimizations, as they are still enabled on master. However, the C code seems to have changed a little over the past few years, so updates may be required there too. I also tried adding bit-exactness checks to some of them (those whose corresponding C code hadn't changed) and got failures, so that could be something to look at in the future.
Note 2: Unfortunately, re-enabling the SILK SSE4.1 optimizations hardly seems to make a difference to overall performance, as most cycles seem to be spent in other bottlenecks, particularly the noise shape feedback warping filter. That filter is inherently difficult to parallelize, as each element of the filter depends on the result of the previous one. I therefore did not attempt to make changes in that area as part of this branch, as my primary goal was just to reinstate the existing optimizations. I may try again in a separate branch.
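To make the dependency concrete, here is a simplified sketch of a cascade of first-order allpass sections in Q16 fixed point. It is not the SILK code: the function names, the state layout, and the `mla_q16` helper are assumptions made for illustration. Each section computes y = x[n-1] + lambda * (y[n-1] - x[n]), and its output is the input of the next section.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: a + ((b * c) >> 16) with a 64-bit intermediate.
 * Assumes an arithmetic right shift for negative values. */
static int32_t mla_q16(int32_t a, int32_t b, int32_t c)
{
    return (int32_t)(a + (((int64_t)b * c) >> 16));
}

/* Push one input sample through `order` allpass sections.
 * state[i] holds the previous input of section i; state[order] holds
 * the previous output of the last section, so state needs order + 1
 * entries. */
static int32_t allpass_chain_step(int32_t in, int32_t *state, int order,
                                  int32_t lambda_q16)
{
    assert(order >= 1);
    int32_t x = in;
    for (int i = 0; i < order; i++) {
        /* state[i + 1] is still the old value here: it is the previous
         * output of this section. */
        int32_t y = mla_q16(state[i], state[i + 1] - x, lambda_q16);
        state[i] = x;
        x = y; /* the next section cannot start until y is known */
    }
    state[order] = x;
    return x;
}
```

The loop-carried value `x` is what prevents a straightforward vectorization across sections: iteration i + 1 needs the `y` produced by iteration i. With `lambda_q16` set to 0 the chain reduces to a pure delay of `order` samples, which is a convenient sanity check.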