Luc Trudeau authored
Includes unit tests for conformance and speed.

SSE2/CFLAverageSpeedTest:
4x4: C time = 499 us, SIMD time = 156 us (~3.2x)
8x8: C time = 1124 us, SIMD time = 221 us (~5.1x)
16x16: C time = 4228 us, SIMD time = 620 us (~6.8x)
32x32: C time = 8743 us, SIMD time = 2236 us (~3.9x)

AVX2/CFLAverageSpeedTest:
4x4: C time = 482 us, SIMD time = 180 us (~2.7x)
8x8: C time = 1007 us, SIMD time = 227 us (~4.4x)
16x16: C time = 3471 us, SIMD time = 324 us (~11x)
32x32: C time = 8758 us, SIMD time = 1443 us (~6.1x)

Change-Id: Id5ae80142a9764f388c0770ebcff4e46fa3a4dad
b4faea73