    [CFL] SSE2/AVX2 versions of subtract_average · b4faea73
    Luc Trudeau authored
    Includes unit tests for conformance and speed.
    
    SSE2/CFLAverageSpeedTest:
    4x4: C time = 499 us, SIMD time = 156 us (~3.2x)
    8x8: C time = 1124 us, SIMD time = 221 us (~5.1x)
    16x16: C time = 4228 us, SIMD time = 620 us (~6.8x)
    32x32: C time = 8743 us, SIMD time = 2236 us (~3.9x)
    
    AVX2/CFLAverageSpeedTest:
    4x4: C time = 482 us, SIMD time = 180 us (~2.7x)
    8x8: C time = 1007 us, SIMD time = 227 us (~4.4x)
    16x16: C time = 3471 us, SIMD time = 324 us (~11x)
    32x32: C time = 8758 us, SIMD time = 1443 us (~6.1x)
    
    Change-Id: Id5ae80142a9764f388c0770ebcff4e46fa3a4dad
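For readers unfamiliar with the operation being benchmarked, the following is a minimal scalar sketch of what a "subtract average" step could look like. It is not the code from this commit: the function name `subtract_average_sketch`, its in-place signature, and the rounding rule are assumptions made for illustration. It also assumes non-negative input samples and a power-of-two pixel count, which holds for the 4x4 to 32x32 block sizes listed above.

```c
#include <stdint.h>

/* Hypothetical scalar reference (not the commit's implementation):
 * subtract the rounded block average from every sample, in place.
 * Assumes width * height is a power of two and samples are >= 0. */
static void subtract_average_sketch(int16_t *buf, int width, int height) {
  const int num_pel = width * height;

  /* Find log2(num_pel) so the division can be done with a shift. */
  int log2_pel = 0;
  while ((1 << log2_pel) < num_pel) ++log2_pel;

  /* Accumulate in 32 bits: 1024 samples of int16_t cannot overflow it. */
  int32_t sum = 0;
  for (int i = 0; i < num_pel; ++i) sum += buf[i];

  /* Rounded average: add half the divisor before shifting. */
  const int32_t avg = (sum + (num_pel >> 1)) >> log2_pel;

  for (int i = 0; i < num_pel; ++i) buf[i] = (int16_t)(buf[i] - avg);
}
```

A SIMD version would typically vectorize both loops, the summation and the subtraction, which is consistent with the larger speedups reported for the bigger block sizes.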
acm_random.h 2.17 KB