1. 06 May, 2016 1 commit
  2. 30 Apr, 2016 1 commit
    • HBD hybrid transform 8x8 SSE4.1 optimization · 299c5fc2
      Yi Luo authored
      - Update bit-exact unit test against current C version.
      - HBD encoder speed improves ~3.8%.
      Change-Id: Ie13925ba11214eef2b5326814940638507bf68ec
  3. 25 Apr, 2016 1 commit
    • HBD hybrid transform 4x4 SSE4.1 optimization · a4593f17
      Yi Luo authored
      - Optimization on tx_type: DCT_DCT, DCT_ADST, ADST_DCT, ADST_ADST.
      - Overall encoder speed improves ~4.5%-6%.
      - Update bit-exact unit test against current C version.
      Change-Id: If751c030612245b1c2470200c9570cf40d655504
  4. 15 Apr, 2016 1 commit
  5. 30 Mar, 2016 1 commit
    • Extend superblock size to 128x128 pixels. · 552d5cd7
      Geza Lore authored
      If --enable-ext-partition is used at build time, the superblock size
      (sometimes also referred to as coding unit (CU) size) is extended to
      128x128 pixels.
      Change-Id: Ie09cec6b7e8d765b7555ff5d80974aab60803f3a
  6. 25 Mar, 2016 1 commit
    • 8x8/16x16 HT types V_DCT to H_FLIPADST SSE2 optimization · 770bf715
      Yi Luo authored
      - Wrote function: fidtx8_sse2() and fidtx16_sse2().
      - Turned on vp10_fht8x8_sse2()/vp10_fht16x16_sse2() for new types.
      - Updated 8x8/16x16 unit tests for accuracy/speed.
      - Running 20K times with random numbers and getting through
        tx type from V_DCT to H_FLIPADST, SSE2 speed improvement:
        8x8: ~131%
        16x16: ~66%
      Change-Id: Ibbb707e932a08fec3b1f423a7dab280a1d696c9a
  7. 24 Mar, 2016 1 commit
    • 4x4 hybrid transform type V_DCT to H_FLIPADST SSE2 optimization · 4970388c
      Yi Luo authored
      - Added function fidtx4_sse2().
      - Turned on vp10_fht4x4_sse2() for these tx types.
      - Updated 4x4 unit test for speed/accuracy.
      - 4x4 Unit test passed.
      - Running 20K times with random numbers for tx type from
        V_DCT to H_FLIPADST, SSE2 against C, speed improves ~46%.
      Change-Id: I828088b7f98dc0f5939a72e3fcd6cb0b8d8dd8bf
  8. 23 Mar, 2016 2 commits
    • Misc. updates for highbd changes · 659c2c98
      Yi Luo authored
      - Use Makefile to control the build for highbd_fwd_txfm_sse4.c.
      - Fixed hybrid transform (HT) types due to recent update.
      - Added new unit test cases for highbd HT.
      Change-Id: Ifd768a9b429a8c21ed40c1de8152fb5ac71e2f90
    • Highbd fht4x4 SSE4.1 optimization for DCT_DCT mode · 977dccd1
      Yi Luo authored
      - Setup function vp10_highbd_fht4x4_sse4_1 for highbd SSE4.1
        intrinsics optimization.
      - Wrote SSE4.1 functions: load_buffer_4x4(), write_buffer_4x4(),
        and fdct4x4_sse4_1().
      - Used logic right shift to avoid coeff memory write/read.
      - Turned on vp10_highbd_fht4x4_sse4_1 for DCT_DCT mode only.
      - Improved overall encoding performance by more than 2.3% on a
        50-frame sequence, park_joy_1080p_12.y4m, encoded with
        --input-bit-depth=12 and --bit-depth=12.
      - Unit test passed.
      Change-Id: Idd6dc6e472cbbf235f0ade4f66fbe859a860a004
  9. 21 Mar, 2016 1 commit
    • Adds 1D transforms for ADST/FlipADST to make 16 · 1b175593
      Debargha Mukherjee authored
      Makes a set of 16 transforms total, adding all 1D
      combinations of ADST and FlipADST, and removing all DST
      lowres, midres both improve by about 0.1% and hdres by
      -0.378% in BDRATE but with fewer transforms that are also
      Further experiments to continue later.
      Change-Id: I7348a4c0e12078fdea5ae3a2d36a89a319ffcc6e
  10. 08 Mar, 2016 1 commit
    • Implemented DST 16x16 SSE2 intrinsics optimization · 50a164a1
      Yi Luo authored
      - Implemented fdst16_sse2(), fdst16_8col() against C version: fdst16().
      - Turned on 7 DST related hybrid txfm types in vp10_fht16x16_sse2().
      - Replaced vp10_fht16x16_c() with vp10_fht16x16_sse2() in
      - Added vp10_fht16x16_sse2() unit test against C version:
        vp10_fht16x16_c() (--gtest_filter=*VP10Trans16x16*).
      - Unit test passed.
      - Speed improvement: 2.4%, 3.2%, 3.2%, for city_cif.y4m, garden_sif.y4m,
        and mobile_cif.y4m.
      Change-Id: Ib30a67ce5d5964bef143d588d0f8fa438be8901f
  11. 02 Mar, 2016 1 commit
    • Fixed a computation bug in fdct16_sse2() · 68d6a507
      Yi Luo authored
      fdct16_sse2() was not bit-exact with C reference, fdct16().
      The inconsistency was found by writing a unit test for
      vp10_fht16x16_sse2(). Since the unit test needs a pending
      change on the inherited base class, I will commit this unit
      test after making a header file for this base class.
      Passed the uncommitted unit test: vp10_fht16x16_test.cc.
      Change-Id: If2b617883c633a3ea90c19e1d018240c8007102b
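The bit-exactness check described in the commit above (an optimized transform compared against its C reference on random input) can be sketched roughly as follows. This is a minimal illustration, not the vp10_fht16x16_test.cc code: the `txfm_fn` signature, `check_bit_exact()`, and the three `toy_*` transforms are hypothetical stand-ins, and the input range assumes 8-bit video residuals.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK_COEFFS 256 /* one 16x16 block */

/* Hypothetical transform signature; the real libvpx prototypes differ. */
typedef void (*txfm_fn)(const int16_t *input, int32_t *output);

/* Stand-in "reference" transform: doubles every sample. */
static void toy_ref(const int16_t *input, int32_t *output) {
  for (int i = 0; i < BLOCK_COEFFS; ++i) output[i] = 2 * (int32_t)input[i];
}

/* Stand-in "optimized" transform: same result, computed differently. */
static void toy_opt(const int16_t *input, int32_t *output) {
  for (int i = 0; i < BLOCK_COEFFS; ++i) {
    output[i] = (int32_t)input[i] + (int32_t)input[i];
  }
}

/* Stand-in buggy transform: off by one in the first coefficient. */
static void toy_bad(const int16_t *input, int32_t *output) {
  toy_ref(input, output);
  output[0] += 1;
}

/* Feeds the same random blocks to both functions and compares the outputs
 * byte for byte. Returns -1 if every run matches, otherwise the index of
 * the first run that differs. */
static int check_bit_exact(txfm_fn ref, txfm_fn opt, int runs, unsigned seed) {
  int16_t input[BLOCK_COEFFS];
  int32_t out_ref[BLOCK_COEFFS];
  int32_t out_opt[BLOCK_COEFFS];
  srand(seed);
  for (int run = 0; run < runs; ++run) {
    for (int i = 0; i < BLOCK_COEFFS; ++i) {
      /* Assumed input range: signed residuals in [-255, 255]. */
      input[i] = (int16_t)(rand() % 511 - 255);
    }
    ref(input, out_ref);
    opt(input, out_opt);
    if (memcmp(out_ref, out_opt, sizeof(out_ref)) != 0) return run;
  }
  return -1;
}
```

Calling `check_bit_exact(toy_ref, toy_opt, 1000, 1)` returns -1, while passing `toy_bad` as the second function makes the first run fail. Comparing whole output buffers with `memcmp` catches any single-coefficient mismatch, which is the kind of inconsistency the commit describes.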
  12. 24 Feb, 2016 1 commit
    • Implemented DST 8x8 with SSE2 intrinsics. · 0353f596
      Yi Luo authored
      Implemented fdst8_sse2() function against C version: fdst8().
      Added seven DST related hybrid transform types in vp10_fht8x8_sse2().
      Replaced vp10_fht8x8_c() with vp10_fht8x8_sse2() in fwd_txfm_8x8().
      Speedup: 18.1%, 11.5%, 22.0% based on speed test from
      city_cif.y4m, garden_sif.y4m, mobile_cif.y4m.
      Change-Id: Ia4aa1ea44c7a33e494f64ce843037f8703f975e3
  13. 19 Feb, 2016 1 commit
    • Initial SSE2 function fdst4_sse2(). · 5456aee6
      Yi Luo authored
      Applied DST sse2 to 4x4 transform.
      Fixed DST coefficient packing to satisfy 4x4 transpose requirement.
      Change-Id: I9164714c77049523dbbc9e145ebb10d7911fba9d
  14. 14 Dec, 2015 1 commit
  15. 09 Nov, 2015 1 commit
    • Release v1.5.0 · cbecf57f
      Johann authored
      Javan Whistling Duck release.
      Change-Id: If44c9ca16a8188b68759325fbacc771365cb4af8
  16. 03 Nov, 2015 1 commit
    • Eliminate copying for FLIPADST in fwd transforms. · 01bb4a31
      Geza Lore authored
      This patch eliminates the copying of data when using FLIPADST forward
      transforms, by incorporating the necessary data flipping into the
      load_buffer_* functions of the SSE2 optimized forward transforms. The
      load_buffer_* functions are normally inlined, so the overhead of copying
      the data is removed and the overhead of flipping is minimized. Left to
      right flipping is still not free, as the columns need to be shuffled in
      To preserve identity between the C and SSE2 implementations, the
      appropriate C implementations now also do the data flipping as part of
      the transform, rather than relying on the caller for flipping the input.
      Overall speedup is about 1.5-2% in encode on my tests. Note that these
      are only the forward transforms. Inverse transforms to come in a later
      There are also a few code hygiene changes:
      - Fixed some indents of switch statements.
      - DCT_DCT transforms now always use vp10_fht* functions, which dispatch
        to vpx_fdct* for DCT_DCT (some of them used to call vpx_fdct*
        directly, some of them used to call vp10_fht*).
      Change-Id: I93439257dc5cd104ac6129cfed45af142fb64574
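The idea in the commit above, folding the FLIPADST data flip into the load step instead of copying the block first, can be illustrated with a scalar sketch. This is not the libvpx load_buffer_* code (which uses SSE2 intrinsics): `load_block_4x4()` and its `flipud`/`fliplr` parameters are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: loads a 4x4 block of residuals into a working
 * buffer and applies the requested flips while loading, so no separate
 * flipped copy of the input is ever made. */
static void load_block_4x4(const int16_t *input, int stride, int flipud,
                           int fliplr, int16_t out[16]) {
  assert(input != NULL && out != NULL);
  for (int r = 0; r < 4; ++r) {
    /* Up/down flip: read the source rows in reverse order. */
    const int16_t *src = input + (flipud ? 3 - r : r) * stride;
    for (int c = 0; c < 4; ++c) {
      /* Left/right flip: reverse the column index within the row. */
      out[r * 4 + c] = src[fliplr ? 3 - c : c];
    }
  }
}
```

In this sketch the up/down flip only changes which row pointer is read, whereas the left/right flip reorders elements within each row. That mirrors the commit's remark that left-to-right flipping is still not free.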
  17. 12 Aug, 2015 2 commits