Upgrade fwht4x4_mmx() to fwht4x4_sse2() for vp9 and vp10.
Function level timing test shows about 27% time saving on a Xeon E5-2680 v2 desktop. Rename vp9_dct_sse2.c to vp9_dct_intrin_sse2.c for vp9 and rename dct_sse2.c to dct_intrin_sse2.c for vp10 to avoid duplicate basenames. Actually vp9_fwht4x4_mmx/sse2() and vp10_fwht4x4_mmx/sse2() are identical. TODO: They should be unified later if there is no intention to keep a duplicate. Change-Id: I3e537b7bbd9ba417c606cd7c68c4dbbfa583f77d
Showing with 183 additions and 18 deletions