    VP9_LPF_VERTICAL_16_DUAL_SSE2 optimization · 3c5256d5
    Tamar Levy authored
    Optimized the vp9_lpf_vertical_16_dual function for the x86 32-bit target. The hot spot in that function was the call to transpose8x16.
    The assembly generated by gcc contained unneeded fills and spills to the stack. By interleaving 2 loads and unpack instructions, and by hoisting the consumer
    instructions closer to the producer instructions, we eliminated most of the fills and spills and improved function-level performance by 17%.
    Credit for writing the function, as well as for finding the root cause, goes to Erik Niemeyer (erik.a.niemeyer@intel.com).
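
    The load/unpack interleaving described above can be sketched roughly as follows. This is not the code from this change: the helper names load_and_interleave_rows and interleave_four_rows are hypothetical, and only the first stage of a byte transpose over four 8-byte rows is shown. The idea is that each pair of loads is consumed by its unpack right away, so fewer values have to stay live at once on a target with only eight XMM registers.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper (not the libvpx code): load two 8-byte rows and
 * interleave their bytes immediately, so the loaded values are consumed
 * right after they are produced. */
static __m128i load_and_interleave_rows(const uint8_t *row0,
                                        const uint8_t *row1) {
  const __m128i a = _mm_loadl_epi64((const __m128i *)row0); /* a0 .. a7 */
  const __m128i b = _mm_loadl_epi64((const __m128i *)row1); /* b0 .. b7 */
  return _mm_unpacklo_epi8(a, b); /* a0 b0 a1 b1 ... a7 b7 */
}

/* Hypothetical first transpose stage for four rows of eight bytes.
 * On return, out[4 * col + row] holds src[row * stride + col]. */
static void interleave_four_rows(const uint8_t *src, int stride,
                                 uint8_t out[32]) {
  /* Each pair of loads is followed directly by its unpack. */
  const __m128i r01 = load_and_interleave_rows(src, src + stride);
  const __m128i r23 =
      load_and_interleave_rows(src + 2 * stride, src + 3 * stride);

  /* Combine the 16-bit pairs: columns 0..3, then columns 4..7. */
  _mm_storeu_si128((__m128i *)out, _mm_unpacklo_epi16(r01, r23));
  _mm_storeu_si128((__m128i *)(out + 16), _mm_unpackhi_epi16(r01, r23));
}
```

    Whether the compiler keeps this ordering and avoids the spills depends on the gcc version and flags, so the generated assembly should be inspected rather than assumed.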
    
    Change-Id: I6173cf53956d52918a047d1c53d9a673f952ec46