Flip the result of the inverse transform for FLIPADST.
When using FLIPADST, the vp10_inv_txfm_add functions used to flip the destination array, add the result of the inverse transform to it, and then flip the destination back. This has been replaced by flipping the result of the inverse transform before adding it to the destination.

Up-down flipping is done by negating the destination stride and starting from the bottom row, so it should now be free. Left-right flipping is done with the usual SSE2 instructions in the optimized code.

The C functions are expected to match the SSE2 functions, so they now do the flipping as well when required. Adding this cleanly required some refactoring of the C functions, but there is no measurable performance impact when ext-tx is not enabled.

Encode speedup with ext-tx enabled is about 3%.

Change-Id: I5b04e5d720f0b9f0d54fd8607a8764f2314c7234
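
The flipping described above can be sketched in plain C as follows. This is an illustration only, not the code from this change: the names add_residual_flipped and clip_pixel_sketch, the square n x n residual layout, and the 8-bit pixel range are all assumptions made for the example. The up-down flip costs nothing extra because it only changes the starting row and the sign of the row step; the left-right flip is done here by mirroring the column index, where the optimized code would use SSE2 instructions instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not the libvpx one): clamp to the 8-bit pixel range. */
static uint8_t clip_pixel_sketch(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/*
 * Hypothetical sketch: add an n x n inverse-transform result `res`
 * (row-major, n values per row) to `dest`, flipping the result rather
 * than the destination.
 */
static void add_residual_flipped(const int *res, int n, uint8_t *dest,
                                 int stride, int flip_ud, int flip_lr) {
  /* Up-down flip: start at the bottom row and walk upwards. */
  uint8_t *base = flip_ud ? dest + (ptrdiff_t)(n - 1) * stride : dest;
  const ptrdiff_t step = flip_ud ? -(ptrdiff_t)stride : (ptrdiff_t)stride;

  for (int r = 0; r < n; ++r) {
    uint8_t *row = base + (ptrdiff_t)r * step;
    for (int c = 0; c < n; ++c) {
      /* Left-right flip: read the residual row in mirrored order. */
      const int src_c = flip_lr ? n - 1 - c : c;
      row[c] = clip_pixel_sketch(row[c] + res[r * n + src_c]);
    }
  }
}
```

With both flags set to 0 the function reduces to a plain add of the residual to the destination, which is consistent with the statement that the non-ext-tx path is unaffected.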
Showing with 425 additions and 364 deletions