Add optimized vpx_sum_squares_2d_i16 for vp10.
Using this function eliminates a large number of calls to predict intra, and it is also faster than most of the variance functions it replaces. This is an equivalence transform, so coding performance is unaffected. Encoder speedup is approximately 7% when var_tx, super_tx and ext_tx are all enabled.

Change-Id: I0d4c83afc4a97a1826f3abd864bd68e41bb504fb
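
For illustration only, here is a minimal scalar sketch of why summing squares over a residual block is equivalent to computing the squared error between source and prediction pixels. The function names, signatures and block layout are assumptions made for this example; they are not taken from the change itself, and the optimized version in the change is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference: sum of squares over a size x size block
 * of int16 residuals, with rows `stride` elements apart. */
static uint64_t sum_squares_2d_i16_sketch(const int16_t *src, int stride,
                                          int size) {
  uint64_t ss = 0;
  assert(src != NULL && size >= 0);
  for (int r = 0; r < size; ++r) {
    for (int c = 0; c < size; ++c) {
      const int32_t v = src[c]; /* widen so v * v cannot overflow */
      ss += (uint64_t)(v * v);
    }
    src += stride;
  }
  return ss;
}

/* Hypothetical pixel-domain counterpart: squared error between source and
 * prediction pixels. This needs the prediction buffer to exist first. */
static uint64_t sse_from_pixels_sketch(const uint8_t *src, int src_stride,
                                       const uint8_t *pred, int pred_stride,
                                       int size) {
  uint64_t ss = 0;
  assert(src != NULL && pred != NULL && size >= 0);
  for (int r = 0; r < size; ++r) {
    for (int c = 0; c < size; ++c) {
      const int32_t d = (int32_t)src[c] - (int32_t)pred[c];
      ss += (uint64_t)(d * d);
    }
    src += src_stride;
    pred += pred_stride;
  }
  return ss;
}
```

When the residual `src - pred` is already available as an int16 buffer, the first function returns the same value as the second without touching the prediction, which is the sense in which the two are interchangeable in this sketch.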
Showing with 336 additions and 28 deletions