    Add SSE2 vectorized warp filter for lowbd · d5dfa96e
    David Barker authored
    End-to-end speed improvements (measured on tempete_cif.y4m, using
    20 frames for the encoder and all 260 frames for the decoder):
    * GLOBAL_MOTION encoder: ~10% faster
    * GLOBAL_MOTION decoder: 100-200% faster depending on bitrate
    * WARPED_MOTION encoder: ~2.5% faster
    * WARPED_MOTION decoder: ~20-40% faster depending on bitrate
    The improvement in the GLOBAL_MOTION decoder is particularly
    large because its runtime is dominated by calls to warp_plane().
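    As a rough illustration of why a dominant hot spot yields such a
    large end-to-end gain, the sketch below applies Amdahl's law. The
    function names and the example fractions are hypothetical and are
    not measurements from this change.

```c
#include <assert.h>

/* Amdahl's law: overall speedup when a fraction `p` of the runtime
 * (0 <= p <= 1) is made `s` times faster (s > 0).
 * Both parameters are illustrative assumptions, not profiled values. */
static double overall_speedup(double p, double s) {
  assert(p >= 0.0 && p <= 1.0 && s > 0.0);
  return 1.0 / ((1.0 - p) + p / s);
}

/* Convert a speedup factor into a "percent faster" figure,
 * e.g. a 2.5x speedup is reported as 150% faster. */
static double percent_faster(double speedup) {
  return (speedup - 1.0) * 100.0;
}
```

    For example, if a hypothetical 80% of decoder time were spent in the
    warp filter and the filter became 4x faster, overall_speedup(0.8, 4.0)
    would give about 2.5x, i.e. roughly 150% faster. The same 4x filter
    speedup with only 10% of the time in the filter would give well under
    10% overall.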
    This change slightly alters the output of the warp filter in some
    cases, but such differences should be rare.
    Change-Id: I5813ab9e90311e27587045153c32d400b6b9eb92
