    Rework idct8x8_10 SSE2 implementation · 1bb11781
    Jingning Han authored
    This commit optimizes the SSE2 implementation of idct8x8_10. It exploits
    the fact that only the top-left 4x4 block contains non-zero coefficients,
    and hence reduces the number of instructions needed.
    
    The runtime of idct8x8_10_sse2 drops from 216 to 198 CPU cycles
    (about 8%), estimated by averaging over 100000 runs. When decoding
    300 frames of pedestrian_area_1080p coded at 4000 kbps, the average
    decoding speed rises from 79.3 fps to 79.7 fps.
    
    Change-Id: I6d277bbaa3ec9e1562667906975bae06904cb180