    Do vertical loopfiltering in parallel · ed36720b
    Yunqing Wang authored
    This patch follows the "Add filter_selectively_vert_row2 to enable
    parallel loopfiltering" commit and adds an x86 SSE2 optimization
    that filters 16 pixels in parallel. For the other optimizations
    (NEON and DSPr2), the current 16-pixel functions are implemented by
    calling the 8-pixel functions twice; real 16-pixel functions can be
    added later.
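
The "call the 8-pixel function twice" fallback described above can be sketched as follows. This is only an illustration under stated assumptions: the function names and the toy smoothing rule are hypothetical and are not the codec's actual loop filter or its API. The only point shown is how a 16-pixel vertical-edge wrapper can reuse an 8-pixel routine by advancing the pointer eight rows.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an 8-pixel vertical-edge filter.
 * `s` points at the first pixel to the right of the edge; the filter
 * walks down 8 rows and smooths the two pixels that straddle the edge.
 * The smoothing rule is a toy, not the real loop filter. */
static void toy_filter_vertical_edge_8(uint8_t *s, int pitch) {
  for (int row = 0; row < 8; ++row) {
    uint8_t *p = s + row * pitch;
    const int p0 = p[-1];                 /* pixel left of the edge  */
    const int q0 = p[0];                  /* pixel right of the edge */
    const int avg = (p0 + q0 + 1) >> 1;   /* rounded midpoint        */
    p[-1] = (uint8_t)((p0 + avg + 1) >> 1);
    p[0] = (uint8_t)((q0 + avg + 1) >> 1);
  }
}

/* Hypothetical 16-pixel wrapper: no dedicated 16-pixel kernel exists,
 * so it filters rows 0..7 and then rows 8..15 with the 8-pixel routine. */
static void toy_filter_vertical_edge_16(uint8_t *s, int pitch) {
  assert(pitch > 0);
  toy_filter_vertical_edge_8(s, pitch);
  toy_filter_vertical_edge_8(s + 8 * pitch, pitch);
}
```

A caller would pass a pointer to the column just right of the edge, for example `toy_filter_vertical_edge_16(frame + edge_col, stride)`, where `frame`, `edge_col`, and `stride` are placeholder names. A true 16-pixel kernel would process all 16 rows in one pass instead of two.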
    
    Decoder speedup:
    tulip clip:     2% speed gain;
    old_town_cross: 1.2% speed gain;
    bus:            2% speed gain.
    
    Change-Id: I4818a0c72f84b34f5fe678e496cf4a10238574b7