    Highbd fht4x4 SSE4.1 optimization for DCT_DCT mode · 977dccd1
    Yi Luo authored
    - Set up function vp10_highbd_fht4x4_sse4_1 for highbd SSE4.1
      intrinsics optimization.
    - Wrote SSE4.1 functions: load_buffer_4x4(), write_buffer_4x4(),
      and fdct4x4_sse4_1().
    - Used logical right shift to avoid coeff memory write/read.
    - Turned on vp10_highbd_fht4x4_sse4_1 for DCT_DCT mode only.
    - Improved overall encoding performance by >2.3% on a 50-frame
      sequence, park_joy_1080p_12.y4m, encoded with
      --input-bit-depth=12 and --bit-depth=12.
    - Unit test passed.
    Change-Id: Idd6dc6e472cbbf235f0ade4f66fbe859a860a004
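
Note on the DCT_DCT-only enablement: a minimal sketch of how such gating might look, not the code in this commit. All names below (tx_type_t, fht4x4_dispatch, fht4x4_c_stub, fht4x4_sse4_1_stub, last_path) are hypothetical stand-ins, and the two transform bodies are stubs that only record which path ran. The assumption is that every transform type other than DCT_DCT keeps using the existing C implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the codec's transform-type enum. */
typedef enum { DCT_DCT, ADST_DCT, DCT_ADST, ADST_ADST } tx_type_t;

/* Records which path ran last, so the gating can be observed. */
typedef enum { PATH_NONE, PATH_C, PATH_SSE4_1 } path_t;
static path_t last_path = PATH_NONE;

/* Stub for the generic C transform (assumed to handle every type). */
static void fht4x4_c_stub(const int16_t *input, int32_t *output,
                          int stride, tx_type_t tx_type) {
  (void)input; (void)output; (void)stride; (void)tx_type;
  last_path = PATH_C;
}

/* Stub for the optimized transform (assumed to handle DCT_DCT only). */
static void fht4x4_sse4_1_stub(const int16_t *input, int32_t *output,
                               int stride) {
  (void)input; (void)output; (void)stride;
  last_path = PATH_SSE4_1;
}

/* Route DCT_DCT to the optimized path; everything else falls back to C. */
static void fht4x4_dispatch(const int16_t *input, int32_t *output,
                            int stride, tx_type_t tx_type) {
  assert(stride >= 4); /* a 4x4 block needs at least 4 samples per row */
  switch (tx_type) {
    case DCT_DCT:
      fht4x4_sse4_1_stub(input, output, stride);
      break;
    default:
      fht4x4_c_stub(input, output, stride, tx_type);
      break;
  }
}
```

Keeping the fallback in the default branch means further transform types can be moved to the optimized path one case at a time without changing callers.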
highbd_fwd_txfm_sse4.c 7.15 KB