1. 23 Mar, 2016 1 commit
      Highbd fht4x4 SSE4.1 optimization for DCT_DCT mode · 977dccd1
      Yi Luo authored
      - Set up function vp10_highbd_fht4x4_sse4_1 for highbd SSE4.1
        intrinsics optimization.
      - Wrote SSE4.1 functions: load_buffer_4x4(), write_buffer_4x4(),
        and fdct4x4_sse4_1().
      - Used logical right shift to avoid coeff memory write/read.
      - Turned on vp10_highbd_fht4x4_sse4_1 for DCT_DCT mode only.
      - Improved overall encoding performance by >2.3% on a 50-frame
        sequence, park_joy_1080p_12.y4m, encoded with
        --input-bit-depth=12 and --bit-depth=12.
      - Unit test passed.
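
The "DCT_DCT mode only" gating described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the enum, the function names (fht4x4_dispatch, fht4x4_c_stub, fht4x4_simd_stub), and the last_path flag are all hypothetical, and both stubs only copy the 4x4 block instead of computing a real transform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical transform-type enum; only DCT_DCT is named in the commit. */
typedef enum { DCT_DCT = 0, ADST_DCT, DCT_ADST, ADST_ADST } TxType;

/* Records which path ran last: 0 = C fallback, 1 = SIMD path (illustration only). */
static int last_path = -1;

/* Placeholder body shared by both stubs: copies a 4x4 block row by row.
 * A real implementation would compute the forward transform here. */
static void copy_4x4(const int16_t *in, int32_t *out, int stride) {
  assert(in != NULL && out != NULL);
  for (int r = 0; r < 4; ++r)
    for (int c = 0; c < 4; ++c)
      out[r * 4 + c] = in[r * stride + c];
}

/* Stand-in for the generic C path that handles every transform type. */
static void fht4x4_c_stub(const int16_t *in, int32_t *out, int stride,
                          TxType tx_type) {
  (void)tx_type;
  last_path = 0;
  copy_4x4(in, out, stride);
}

/* Stand-in for an optimized path that only supports DCT_DCT. */
static void fht4x4_simd_stub(const int16_t *in, int32_t *out, int stride) {
  last_path = 1;
  copy_4x4(in, out, stride);
}

/* Route DCT_DCT to the optimized path and everything else to the fallback. */
static void fht4x4_dispatch(const int16_t *in, int32_t *out, int stride,
                            TxType tx_type) {
  if (tx_type == DCT_DCT) {
    fht4x4_simd_stub(in, out, stride);
  } else {
    fht4x4_c_stub(in, out, stride, tx_type);
  }
}
```

Keeping the fallback in the dispatcher means the other transform types stay on the existing C path until optimized versions are written for them.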
      
      Change-Id: Idd6dc6e472cbbf235f0ade4f66fbe859a860a004