Draft: Optimize NSQ_del_dec() for AVX2
The optimization is bit-exact with the C function.
This optimization speeds up the SILK encoder (floating point) as follows:
AMD Zen:
  Complexity 0-5 : 0%
  Complexity 6-7 : 3 - 7%
  Complexity 8-10: 8 - 15%

Intel Skylake:
  Complexity 0-5 : 0%
  Complexity 6-7 : 14 - 18%
  Complexity 8-10: 17 - 22%