    Optimize 32x32 2D inverse DCT for speed-up · 9d67495f
    Jingning Han authored
    This commit exploits the sparsity of the quantized coefficient matrix.
    It checks each 32x8 array and skips the corresponding inverse
    transform if all of its entries are zero.
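
The check described above can be sketched in scalar C as follows. This is only an illustration under stated assumptions, not the SSE2 code from this commit: it assumes a row-major 32x32 block of 16-bit coefficients split into four strips of 8 rows, and the names `strip_is_zero` and `nonzero_strip_mask` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_DIM 32 /* the transform block is 32x32 */
#define STRIP_ROWS 8 /* assumed strip shape: 8 rows of 32 coefficients */

/* Returns 1 if every coefficient in one 32x8 strip is zero.
 * OR-ing all entries together avoids a branch per coefficient. */
static int strip_is_zero(const int16_t *strip) {
  int acc = 0;
  for (size_t i = 0; i < (size_t)BLOCK_DIM * STRIP_ROWS; ++i) acc |= strip[i];
  return acc == 0;
}

/* Hypothetical helper: bit s of the result is set when strip s holds at
 * least one nonzero coefficient, i.e. its inverse transform cannot be
 * skipped. A caller would run the transform only for the set bits. */
static unsigned nonzero_strip_mask(const int16_t *coeffs) {
  unsigned mask = 0;
  for (int s = 0; s < BLOCK_DIM / STRIP_ROWS; ++s) {
    const int16_t *strip = coeffs + (size_t)s * STRIP_ROWS * BLOCK_DIM;
    if (!strip_is_zero(strip)) mask |= 1u << s;
  }
  return mask;
}
```

After quantization the nonzero coefficients tend to cluster in the low-frequency corner, so in a sketch like this the higher-numbered strips would often report all zeros and their share of the transform could be skipped.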
    
    For ped1080p at 8000 kbps, this reduces the average runtime of the
    32x32 inverse 2D-DCT SSE2 function from 6256 to 5200 cycles
    (roughly 17% fewer). It makes the overall encoding process about 2%
    faster at speed 0. The speed-up is more pronounced for the decoding
    process.
    
    Change-Id: If20056c3566bd117642a76f8884c83e8bc8efbcf