    Add AVX vectorized vp9_diamond_search_sad · f1342a7b
    Geza Lore authored
    vp9_diamond_search_sad now has an AVX intrinsics version that is about
    80% faster than the C implementation, giving a 2-4% overall encode
    speed-up depending on encoding parameters. The AVX version relies on
    three properties of the cost lookup tables constructed in
    'cal_nmvjointsadcost' and 'cal_nmvsadcosts'.
    For the joint cost:
      - mvjointsadcost[1] == mvjointsadcost[2] == mvjointsadcost[3]
    For the component costs:
      - For all i: mvsadcost[0][i] == mvsadcost[1][i]
            (both components share the same cost table)
      - For all i: mvsadcost[0][i] == mvsadcost[0][-i]
            (the cost function is even)
    All three must hold; otherwise the AVX version of the function cannot
    be used.
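
The three table properties listed above could be checked with a small C helper along the following lines. This is only a sketch under stated assumptions: the function name `sad_cost_tables_allow_avx`, its parameters, and the `max_mv` bound are hypothetical and do not come from this commit. It also assumes that each component table pointer refers to the centre of its array, so that negative indices are valid.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper (not part of this commit): returns true when the
 * SAD cost tables satisfy the three properties listed above.
 *
 * joint_cost : the 4-entry joint cost table.
 * comp_cost  : two pointers, each assumed to point at the centre of a
 *              component cost table, valid for indices -max_mv..max_mv.
 * max_mv     : assumed largest motion vector magnitude to check. */
bool sad_cost_tables_allow_avx(const int joint_cost[4],
                               const int *const comp_cost[2],
                               int max_mv) {
  assert(max_mv >= 0);

  /* Joint cost: entries 1, 2 and 3 must be identical. */
  if (joint_cost[1] != joint_cost[2] || joint_cost[2] != joint_cost[3])
    return false;

  for (int i = -max_mv; i <= max_mv; ++i) {
    /* Both components must have the same cost at every index. */
    if (comp_cost[0][i] != comp_cost[1][i]) return false;
    /* The cost must be even: cost(i) == cost(-i). */
    if (comp_cost[0][i] != comp_cost[0][-i]) return false;
  }
  return true;
}
```

A caller could run such a check once after the tables are built and fall back to the C implementation if it fails. Whether the codec does this at run time is not stated here.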
    
    Change-Id: I184055b864c5a2dc37b2d8c5c9012eb801e9daf6
vp9cx.mk 5.71 KB