    commit 50a164a1
    Author: Yi Luo

    Implemented DST 16x16 SSE2 intrinsics optimization
    - Implemented fdst16_sse2() and fdst16_8col(), verified against the C
      version, fdst16().
    - Enabled the 7 DST-related hybrid transform types in vp10_fht16x16_sse2().
    - Replaced vp10_fht16x16_c() with vp10_fht16x16_sse2() in
    - Added a unit test comparing vp10_fht16x16_sse2() against the C version,
      vp10_fht16x16_c() (--gtest_filter=*VP10Trans16x16*).
    - Unit test passed.
    - Encoder speed improvement: 2.4%, 3.2%, and 3.2% for city_cif.y4m,
      garden_sif.y4m, and mobile_cif.y4m, respectively.
    Change-Id: Ib30a67ce5d5964bef143d588d0f8fa438be8901f
