Implement sum_sqr_shift() using two passes with no branch inside the loops
Slightly slower on x86, about the same speed on ARMv7, should be faster on DSPs.
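
For illustration, here is a minimal sketch of the two-pass idea. It is not the code from this commit. The names sum_sqr_shift_sketch() and clz32(), the plain C integer types, and the choice of two bits of headroom are assumptions made for the example. The first pass uses a conservative shift derived only from the length, which gives a coarse energy estimate that cannot overflow. The second pass recomputes the sum with the smallest shift that estimate allows. Neither loop body tests for overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: count leading zeros of a 32-bit value (32 for 0). */
static int clz32(uint32_t v)
{
    int n = 0;
    if (v == 0) return 32;
    while ((v & 0x80000000u) == 0) { v <<= 1; n++; }
    return n;
}

/* Sketch only: computes energy = sum(x[i]^2) >> shift, choosing shift so
 * that the result fits in a signed 32-bit integer with headroom. */
static void sum_sqr_shift_sketch(int32_t *energy, int *shift,
                                 const int16_t *x, int len)
{
    int i, shft;
    uint32_t nrg;

    assert(energy != NULL && shift != NULL && x != NULL);
    if (len <= 0) { *energy = 0; *shift = 0; return; }

    /* Pass 1: shift by floor(log2(len)), the most we could ever need.
     * Start at len so the rounding lost by the per-sample shifts cannot
     * make the estimate smaller than the true scaled energy. */
    shft = 31 - clz32((uint32_t)len);
    nrg = (uint32_t)len;
    for (i = 0; i < len; i++) {
        uint32_t sq = (uint32_t)((int32_t)x[i] * (int32_t)x[i]);
        nrg += sq >> shft;
    }

    /* Pick the final shift from the estimate, keeping two bits of headroom. */
    shft = shft + 3 - clz32(nrg);
    if (shft < 0) shft = 0;

    /* Pass 2: accumulate again with the final shift. */
    nrg = 0;
    for (i = 0; i < len; i++) {
        uint32_t sq = (uint32_t)((int32_t)x[i] * (int32_t)x[i]);
        nrg += sq >> shft;
    }

    *energy = (int32_t)nrg;
    *shift = shft;
}
```

The input is read twice, but both loops are straight-line code, which is consistent with the timings above: little to gain on CPUs with good branch prediction, more on DSPs where a branch inside the loop is costly.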