1. 08 May, 2014 2 commits
    • Change eob threshold for partial inverse 8x8 2D-DCT to 12 · 41a350a8
      Jingning Han authored
      In the scanning order, the first 12 coefficients of the 8x8 2D-DCT
      sit in the top-left 4x4 block. Hence the partial inverse 8x8
      2D-DCT can handle cases with eob up to 12.
      
      The overall runtime of the inverse 8x8 2D-DCT unit is reduced from
      166 cycles (using SSE2) to 150 cycles (using SSSE3).
      
      Change-Id: I4514f9748042809ac84df4c14382c00f313f1cd2
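
The threshold described in this commit can be pictured as a dispatch on eob. The following is a minimal sketch, not the code from the commit: the enum, the constant, and `select_idct8x8_path` are hypothetical names, the separate DC-only path is an assumption, and eob is assumed to count coefficients in scan order up to and including the last non-zero one.

```c
#include <assert.h>

/* Hypothetical labels for the inverse-transform paths (illustration only). */
typedef enum {
  IDCT8X8_DC_ONLY, /* assumed special case: only the DC coefficient is set */
  IDCT8X8_PARTIAL, /* non-zero coefficients confined to the top-left 4x4 */
  IDCT8X8_FULL     /* full 64-coefficient inverse transform */
} idct8x8_path;

/* Threshold from the commit message: the first 12 coefficients in scan
 * order lie inside the top-left 4x4 block. */
enum { PARTIAL_EOB_MAX = 12 };

/* Pick a path from eob. Assumes eob is the number of coefficients in scan
 * order up to and including the last non-zero one, so eob <= 12 means
 * only scan positions 0..11 can be non-zero. */
static idct8x8_path select_idct8x8_path(int eob) {
  assert(eob >= 1 && eob <= 64);
  if (eob == 1) return IDCT8X8_DC_ONLY;
  if (eob <= PARTIAL_EOB_MAX) return IDCT8X8_PARTIAL;
  return IDCT8X8_FULL;
}
```

Under these assumptions, raising the threshold from 10 to 12 sends blocks with eob of 11 or 12 to the cheaper partial path instead of the full transform.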
    • SSSE3 8x8 inverse 2D-DCT with first 10 coeffs non-zero · 9e7b09bc
      Jingning Han authored
      This commit enables the SSSE3 assembly implementation of the 8x8
      inverse 2D-DCT for blocks in which only the first 10 coefficients
      are non-zero. The average runtime for this unit goes down from
      198 cycles to 129 cycles (34.8% fewer cycles).
      
      Change-Id: Ie7fa4386f6d3a2fe0d47a2eb26fc2a6bbc592ac7
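
The percentage quoted above is a reduction in cycle count relative to the old runtime. A small sketch of that arithmetic follows; `cycle_reduction_percent` is a hypothetical helper name, and the only inputs used are the cycle counts from the commit messages.

```c
#include <assert.h>

/* Relative reduction in cycle count, in percent of the old runtime. */
static double cycle_reduction_percent(double before, double after) {
  assert(before > 0.0);
  return 100.0 * (before - after) / before;
}
```

For the figures in this commit, `cycle_reduction_percent(198, 129)` evaluates to roughly 34.8, which matches the number quoted in the message.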
2. 05 May, 2014 1 commit
    • SSSE3 implementation of full inverse 8x8 2D-DCT · 52ae97b6
      Jingning Han authored
      This commit enables the SSSE3 version of the full inverse 8x8
      2D-DCT and reconstruction. It reduces the runtime of
      vp9_idct8x8_64_add from 256 cycles (SSE2) to 246 cycles.
      
      Change-Id: I0600feac894d6a443a3c9d18daf34156d4e225c3