Draft: Optimize NSQ_del_dec() for AVX2
The optimization is bit-exact with the C function.
It speeds up the SILK encoder (floating point) as follows:
AMD Zen:
Complexity 0-5 : 0%
Complexity 6-7 : 3 - 7%
Complexity 8-10: 8 - 15%
Intel Skylake:
Complexity 0-5 : 0%
Complexity 6-7 : 14 - 18%
Complexity 8-10: 17 - 22%
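
For reviewers who want to see what the bit-exactness claim means in practice, here is a minimal sketch of a comparison harness. It is not the test used for this change and does not reproduce NSQ_del_dec(); `scale_ref`, `scale_opt`, `lcg_next` and `outputs_bit_exact` are hypothetical names, and the "optimized" path is a plain 4-way unrolled loop standing in for an AVX2 kernel. The idea is only that both paths receive identical pseudo-random input and their outputs are compared byte for byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define N_SAMPLES 64

/* Hypothetical reference path: Q16 fixed-point scaling, one sample at a time.
 * Assumes arithmetic right shift of negative values, as mainstream compilers provide. */
static void scale_ref(const int32_t *in, int16_t gain, int32_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = (int32_t)(((int64_t)in[i] * gain) >> 16);
    }
}

/* Hypothetical "optimized" path: same arithmetic, unrolled by 4 with a scalar tail.
 * A real AVX2 kernel would take this function's place. */
static void scale_opt(const int32_t *in, int16_t gain, int32_t *out, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        out[i + 0] = (int32_t)(((int64_t)in[i + 0] * gain) >> 16);
        out[i + 1] = (int32_t)(((int64_t)in[i + 1] * gain) >> 16);
        out[i + 2] = (int32_t)(((int64_t)in[i + 2] * gain) >> 16);
        out[i + 3] = (int32_t)(((int64_t)in[i + 3] * gain) >> 16);
    }
    for (; i < n; i++) {
        out[i] = (int32_t)(((int64_t)in[i] * gain) >> 16);
    }
}

/* Small linear congruential generator so runs are reproducible from a seed. */
static uint32_t lcg_next(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return *state;
}

/* Returns 1 if both paths produce identical bytes for every trial, 0 otherwise. */
int outputs_bit_exact(int trials, uint32_t seed)
{
    int32_t in[N_SAMPLES], ref[N_SAMPLES], opt[N_SAMPLES];

    assert(trials > 0);
    for (int t = 0; t < trials; t++) {
        for (size_t i = 0; i < N_SAMPLES; i++) {
            in[i] = (int32_t)lcg_next(&seed);
        }
        int16_t gain = (int16_t)(lcg_next(&seed) >> 16);
        /* Vary the length so the unrolled body and the tail are both exercised. */
        size_t n = 1 + (size_t)(lcg_next(&seed) % N_SAMPLES);

        scale_ref(in, gain, ref, n);
        scale_opt(in, gain, opt, n);
        if (memcmp(ref, opt, n * sizeof(int32_t)) != 0) {
            return 0;
        }
    }
    return 1;
}
```

Calling `outputs_bit_exact(1000, 12345u)` runs 1000 randomized trials and returns 1 only if every output matches exactly. Comparing with `memcmp` rather than a tolerance is what distinguishes a bit-exact check from an approximate one.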