  1. 30 Apr, 2016 1 commit
    • HBD hybrid transform 8x8 SSE4.1 optimization · 299c5fc2
      Yi Luo authored
      - Tx_type: DCT_DCT, DCT_ADST, ADST_DCT, ADST_ADST.
      - Update bit-exact unit test against current C version.
      - HBD encoder speed improves ~3.8%.
      
      Change-Id: Ie13925ba11214eef2b5326814940638507bf68ec
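The four tx_type values named in these commits each pair a 1-D transform for the column pass with one for the row pass. A minimal sketch of that table-driven selection follows. The enum order, the `Kernel1D` tags, `kHybridTable`, and `select_transform()` are hypothetical names for illustration, and the convention that the first half of a name (for example ADST in ADST_DCT) refers to the column pass is an assumption, not something the commit messages state.

```c
#include <assert.h>

/* Transform types listed in the commit message; numeric order is assumed. */
typedef enum { DCT_DCT = 0, ADST_DCT = 1, DCT_ADST = 2, ADST_ADST = 3 } TxType;

/* Hypothetical tags standing in for the real 1-D kernels. */
typedef enum { KERNEL_DCT, KERNEL_ADST } Kernel1D;

typedef struct {
  Kernel1D cols; /* kernel applied down each column (vertical pass) */
  Kernel1D rows; /* kernel applied along each row (horizontal pass) */
} Transform2D;

/* One entry per tx_type, indexed by the enum value. */
static const Transform2D kHybridTable[4] = {
  { KERNEL_DCT,  KERNEL_DCT  }, /* DCT_DCT   */
  { KERNEL_ADST, KERNEL_DCT  }, /* ADST_DCT  */
  { KERNEL_DCT,  KERNEL_ADST }, /* DCT_ADST  */
  { KERNEL_ADST, KERNEL_ADST }, /* ADST_ADST */
};

/* Look up the column/row kernel pair for a given transform type. */
static Transform2D select_transform(TxType tx_type) {
  assert((int)tx_type >= 0 && (int)tx_type < 4);
  return kHybridTable[tx_type];
}
```

An optimized path would typically branch on the same value once per block and then run the two selected 1-D passes with a transpose in between.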
  2. 25 Apr, 2016 1 commit
    • HBD hybrid transform 4x4 SSE4.1 optimization · a4593f17
      Yi Luo authored
      - Optimization on tx_type: DCT_DCT, DCT_ADST, ADST_DCT, ADST_ADST.
      - Overall encoder speed improves ~4.5%-6%.
      - Update bit-exact unit test against current C version.
      
      Change-Id: If751c030612245b1c2470200c9570cf40d655504
  3. 15 Apr, 2016 1 commit
  4. 23 Mar, 2016 2 commits
    • Misc. updates for highbd changes · 659c2c98
      Yi Luo authored
      - Use Makefile to control the build for highbd_fwd_txfm_sse4.c.
      - Fixed hybrid transform (HT) types due to recent update.
      - Added new unit test cases for highbd HT.
      
      Change-Id: Ifd768a9b429a8c21ed40c1de8152fb5ac71e2f90
    • Highbd fht4x4 SSE4.1 optimization for DCT_DCT mode · 977dccd1
      Yi Luo authored
      - Set up function vp10_highbd_fht4x4_sse4_1 for highbd SSE4.1
        intrinsics optimization.
      - Wrote SSE4.1 functions: load_buffer_4x4(), write_buffer_4x4(),
        and fdct4x4_sse4_1().
      - Used a logical right shift to avoid a coeff memory write/read.
      - Turned on vp10_highbd_fht4x4_sse4_1 for DCT_DCT mode only.
      - Improved overall encoding performance by >2.3% on a 50-frame
        sequence, park_joy_1080p_12.y4m, encoded with
        --input-bit-depth=12 and --bit-depth=12.
      - Unit test passed.
      
      Change-Id: Idd6dc6e472cbbf235f0ade4f66fbe859a860a004
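For readers unfamiliar with what a routine such as fdct4x4_sse4_1() vectorizes, here is a scalar sketch of a 4-point forward DCT butterfly with a rounding right shift. It is not the SSE4.1 code from the commit. The 14-bit fixed-point precision, the constant names, and the helpers `round_shift()` and `fdct4_scalar()` are assumptions made for illustration; the constants are simply round(2^14 * cos(k*pi/64)) for k = 8, 16, 24.

```c
#include <assert.h>
#include <stdint.h>

#define DCT_CONST_BITS 14 /* assumed fixed-point precision */

static const int32_t kCos8  = 15137; /* round(2^14 * cos( 8*pi/64)) */
static const int32_t kCos16 = 11585; /* round(2^14 * cos(16*pi/64)) */
static const int32_t kCos24 = 6270;  /* round(2^14 * cos(24*pi/64)) */

/* Add half of 2^bits, then floor-divide by 2^bits. Written with division
 * so the result does not depend on how the compiler shifts negative values. */
static int32_t round_shift(int64_t x, int bits) {
  assert(bits > 0 && bits < 32);
  const int64_t d = (int64_t)1 << bits;
  const int64_t v = x + (d >> 1);
  int64_t q = v / d;      /* truncates toward zero */
  if (v % d < 0) q -= 1;  /* adjust to floor for negative remainders */
  return (int32_t)q;
}

/* Scalar 4-point forward DCT: even/odd butterfly followed by rounding. */
static void fdct4_scalar(const int32_t in[4], int32_t out[4]) {
  const int64_t s0 = (int64_t)in[0] + in[3];
  const int64_t s1 = (int64_t)in[1] + in[2];
  const int64_t s2 = (int64_t)in[1] - in[2];
  const int64_t s3 = (int64_t)in[0] - in[3];
  out[0] = round_shift((s0 + s1) * kCos16, DCT_CONST_BITS);
  out[2] = round_shift((s0 - s1) * kCos16, DCT_CONST_BITS);
  out[1] = round_shift(s2 * kCos24 + s3 * kCos8, DCT_CONST_BITS);
  out[3] = round_shift(s3 * kCos24 - s2 * kCos8, DCT_CONST_BITS);
}
```

A 4x4 forward transform would apply such a 1-D pass to every column, transpose, and apply it again to every row. A SIMD version can process four 32-bit lanes per instruction, which is the kind of work the load/transform/write helpers in the commit are organized around.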