    SSE4 and AVX2 implementations of updated FAST_SGR · d051e560
    Imdad Sardharwalla authored
    The SSE4.1 and AVX2 implementations of the self-guided filter have been updated
    to match the updated FAST_SGR C implementation in restoration.c.
    
    The self-guided filter speed tests have been altered to compare the speeds of
    the SIMD and C implementations of the relevant functions.
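    The commit does not include the test code itself. As a hedged illustration of the general pattern such a speed test follows (first check that the optimized and reference outputs agree, then time each over many iterations and report the ratio), here is a minimal C sketch. The names row_sums_c, row_sums_opt, time_fn and compare_speed are hypothetical. The "optimized" version is a portable 4-way unrolled loop standing in for real SSE4.1/AVX2 intrinsics, so the sketch compiles anywhere.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical plain-C reference: sum each row of a width x height buffer. */
void row_sums_c(const uint16_t *src, int width, int height, uint32_t *out) {
  for (int y = 0; y < height; ++y) {
    uint32_t s = 0;
    for (int x = 0; x < width; ++x) s += src[y * width + x];
    out[y] = s;
  }
}

/* Hypothetical stand-in for a SIMD kernel: 4 independent accumulators,
 * plus a scalar tail loop for widths that are not a multiple of 4.
 * A real implementation would use SSE4.1/AVX2 intrinsics instead. */
void row_sums_opt(const uint16_t *src, int width, int height, uint32_t *out) {
  for (int y = 0; y < height; ++y) {
    const uint16_t *row = src + y * width;
    uint32_t a = 0, b = 0, c = 0, d = 0;
    int x = 0;
    for (; x + 4 <= width; x += 4) {
      a += row[x];
      b += row[x + 1];
      c += row[x + 2];
      d += row[x + 3];
    }
    for (; x < width; ++x) a += row[x];
    out[y] = a + b + c + d;
  }
}

typedef void (*row_sums_fn)(const uint16_t *, int, int, uint32_t *);

/* Run fn `iters` times and return the elapsed CPU time in seconds. */
double time_fn(row_sums_fn fn, const uint16_t *src, int w, int h,
               uint32_t *out, int iters) {
  clock_t start = clock();
  for (int i = 0; i < iters; ++i) fn(src, w, h, out);
  return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Assert both implementations produce identical output, then return the
 * speedup t_ref / t_opt. Returns 0.0 if the optimized run was too short
 * for clock() to measure. */
double compare_speed(const uint16_t *src, int w, int h, uint32_t *out_ref,
                     uint32_t *out_opt, int iters) {
  row_sums_c(src, w, h, out_ref);
  row_sums_opt(src, w, h, out_opt);
  assert(memcmp(out_ref, out_opt, (size_t)h * sizeof(*out_ref)) == 0);
  double t_ref = time_fn(row_sums_c, src, w, h, out_ref, iters);
  double t_opt = time_fn(row_sums_opt, src, w, h, out_opt, iters);
  return t_opt > 0.0 ? t_ref / t_opt : 0.0;
}
```

    Checking output equality before timing matters: a fast kernel that computes the wrong answer would otherwise look like a win.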
    
    Speed Tests (code compiled with Clang)
    ======================================
    
    For LowBD:
    - The SSE4.1 implementation is ~220% faster (~69% less time) than the C code
    - The AVX2 implementation is ~314% faster (~76% less time) than the C code
    
    For HighBD:
    - The SSE4.1 implementation is ~240% faster (~71% less time) than the C code
    - The AVX2 implementation is ~343% faster (~77% less time) than the C code
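    The two figures in each line describe the same measurement. Writing the speedup as s = t_C / t_SIMD, "X% faster" means s = 1 + X/100, and the time saved is 100 * (1 - 1/s) percent. For example, 220% faster gives s = 3.2 and 1 - 1/3.2 = 0.6875, i.e. ~69% less time. A minimal C sketch of this conversion, with hypothetical helper names:

```c
#include <assert.h>

/* Convert "X% faster" into the speedup factor s = t_C / t_SIMD.
 * For example, 220% faster -> s = 3.2. */
double speedup_from_percent_faster(double pct_faster) {
  assert(pct_faster > -100.0); /* the speedup factor must be positive */
  return 1.0 + pct_faster / 100.0;
}

/* Convert "X% faster" into "Y% less time": Y = 100 * (1 - 1/s). */
double percent_less_time(double pct_faster) {
  double s = speedup_from_percent_faster(pct_faster);
  return 100.0 * (1.0 - 1.0 / s);
}
```

    Applied to 220, 314, 240 and 343, percent_less_time gives about 68.8, 75.8, 70.6 and 77.4, which round to the 69%, 76%, 71% and 77% quoted above.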
    
    Change-Id: Ic2734bb89ccd3f66667c68647e5f677a5a496233