    Add AVX vectorized vp9_diamond_search_sad · 5eefd3eb
    Geza Lore authored
    This function now has an AVX intrinsics version that is about 80%
    faster than the C implementation. This provides a 2-4% total
    speed-up for encoding, depending on the encoding parameters. The
    function relies on three properties of the cost function lookup
    tables constructed in 'cal_nmvjointsadcost' and 'cal_nmvsadcosts'.
    For the joint cost:
      - mvjointsadcost[1] == mvjointsadcost[2] == mvjointsadcost[3]
    For the component costs:
      - For all i: mvsadcost[0][i] == mvsadcost[1][i]
            (both components share the same cost)
      - For all i: mvsadcost[0][i] == mvsadcost[0][-i]
            (the cost function is even)
    All three properties must hold; otherwise the AVX version of the
    function cannot be used.
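
The three table properties listed above can be verified with a short check. The following is only a minimal sketch: the helper name `sad_cost_tables_allow_avx` and the `max_mv` parameter are hypothetical and not part of the codebase, and it assumes each component cost pointer refers to the middle of its table so that indices from `-max_mv` to `max_mv` are valid.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper (not an existing function in the codebase).
 * Returns true when the cost tables satisfy the three properties the
 * AVX version depends on.
 *
 * Assumptions: joint_cost has at least 4 entries, and comp_cost[0] and
 * comp_cost[1] each point at the centre of a table that can be indexed
 * from -max_mv to +max_mv. */
static bool sad_cost_tables_allow_avx(const int *joint_cost,
                                      const int *comp_cost[2],
                                      int max_mv) {
  assert(max_mv >= 0);

  /* Joint cost: entries 1, 2 and 3 must all be equal. */
  if (joint_cost[1] != joint_cost[2] || joint_cost[2] != joint_cost[3])
    return false;

  for (int i = 0; i <= max_mv; ++i) {
    /* Both components must share the same cost at +i and at -i. */
    if (comp_cost[0][i] != comp_cost[1][i]) return false;
    if (comp_cost[0][-i] != comp_cost[1][-i]) return false;
    /* The cost function must be even: cost(i) == cost(-i). */
    if (comp_cost[0][i] != comp_cost[0][-i]) return false;
  }
  return true;
}
```

A caller could run such a check once after the tables are built and fall back to the C implementation if it returns false.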
    
    Change-Id: I6c2791d43022822a9e6ab43cd124a773946d0bdc