1. 30 Nov, 2016 1 commit
    • Add 2x2 fwd transform · 12402227
      Jingning Han authored
      Add a 2x2 forward transform function for 4x4 coding block unit.
      Change-Id: I44c8f0d55f371db68541e7e5f7cbd340a82cd788
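As a rough illustration of what a 2x2 forward transform computes, the sketch below applies a two-point sum/difference butterfly along rows and then along columns. It is not the function added by this commit: the name `fwd_txfm_2x2_sketch`, the output layout, and the absence of any scaling or rounding are assumptions made for the example.

```c
#include <stdint.h>

/* Hypothetical sketch: unnormalized 2x2 Hadamard-style forward transform.
 * in:  2x2 residual block, row-major with the given stride.
 * out: 4 coefficients, row-major; out[0] is the DC term. */
void fwd_txfm_2x2_sketch(const int16_t *in, int stride, int32_t *out) {
  const int32_t a = in[0], b = in[1];
  const int32_t c = in[stride], d = in[stride + 1];

  /* Row pass: sum and difference within each row. */
  const int32_t r0_sum = a + b, r0_diff = a - b;
  const int32_t r1_sum = c + d, r1_diff = c - d;

  /* Column pass: sum and difference across the two rows. */
  out[0] = r0_sum + r1_sum;
  out[1] = r0_diff + r1_diff;
  out[2] = r0_sum - r1_sum;
  out[3] = r0_diff - r1_diff;
}
```

A real codec function would typically also apply a scale factor so that the coefficient range matches the larger transform sizes.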
  2. 09 Nov, 2016 1 commit
  3. 03 Nov, 2016 1 commit
  4. 02 Nov, 2016 1 commit
  5. 20 Oct, 2016 1 commit
    • Fix the overflow of av1_fht32x32() in 2D DCT_DCT · 157e45a4
      Yi Luo authored
      - Use range check function to avoid DCT_DCT overflow.
        We need to re-develop the column txfm side scaling/rounding. Now,
        we prefer to maintain the current BDRate level.
      - Encoder user level time reduction <1% owing to av1_fht32x32_avx2.
      - Add MemCheck unit test and fdct32() unit test.
      Change-Id: I1e67030f67bc637859798ebe2f6698afffb8531c
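The commit message does not show its range check function, so the following is only a minimal sketch of the general idea: after a transform stage, confirm that every intermediate value still fits in a signed integer of a given width. The name `fits_in_signed_bits` and its signature are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if every value in buf fits in a signed
 * integer that is `bits` wide, and 0 otherwise. */
int fits_in_signed_bits(const int32_t *buf, int n, int bits) {
  assert(bits >= 2 && bits <= 32);
  const int64_t max = ((int64_t)1 << (bits - 1)) - 1;
  const int64_t min = -((int64_t)1 << (bits - 1));
  for (int i = 0; i < n; ++i) {
    if (buf[i] < min || buf[i] > max) return 0;
  }
  return 1;
}
```

A caller could run such a check between the column and row passes and fall back to a wider or rescaled path when it fails.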
  6. 12 Oct, 2016 1 commit
    • Hybrid forward transform 32x32 AVX2 optimization · fed8e1c0
      Yi Luo authored
      - av1_fht32x32 AVX2 function level time reduction ~89% compared to C.
      - av1_fht32x32_avx2() on DCT_DCT improves 42.62% over aom_fdct32x32_avx2().
        But function replacement must go with the corresponding inverse txfm.
      - No obvious user level time reduction due to 32x32 TX_TYPE selection.
      - Zero high 128b YMM to avoid AVX-SSE transition penalties
        (fix 16x16 case).
      - Added 32x32 AVX2 unit tests to verify bit-exactness.
      - AVX2 optimization summary:
        On CPU i7-6700, based on 16x16/32x32 fwd txfm optimization results:
        C to AVX2: function level time reduction, ~86-89%.
        SSE2 to AVX2: function level time reduction, ~51%.
      Change-Id: Idd0cd8bf066a61c7117140ef15ab6c1f8eb4b036
  7. 02 Sep, 2016 1 commit
  8. 01 Sep, 2016 2 commits
  9. 18 Aug, 2016 1 commit
  10. 15 Aug, 2016 1 commit
  11. 12 Aug, 2016 1 commit
  12. 21 Jul, 2016 1 commit
    • Rectangular transforms 4x8 & 8x4 · e5848dea
      Debargha Mukherjee authored
      Added a new expt rect-tx to be used in conjunction with ext-tx.
      [rect-tx is a temporary config flag and will eventually be
      merged into ext-tx once it works correctly with all other
      Added 4x8 and 8x4 transforms for use initially with rectangular
      sub8x8 y blocks as part of this experiment.
      There is about a -0.2% BDRATE improvement on lowres, others pending.
      When var-tx is on rectangular transforms are currently not used.
      That will be enabled in a subsequent patch.
      Change-Id: Iaf3f88ede2740ffe6a0ffb1ef5fc01a16cd0283a
  13. 23 Jun, 2016 1 commit
  14. 18 May, 2016 1 commit
    • Integrate HBD row/column flip fwd txfm SSE4.1 optimization · 1d307368
      Yi Luo authored
      - Integrate 5 flip transform types for each 4x4, 8x8, and 16x16
        block, for experiment, EXT_TX.
      - Encoder speed improves about 12%-15%.
      - Update the unit tests for bit-exact result against C.
      Change-Id: Idf27c87f1e516ca5b66c7b70142477a115404ccb
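One common way to realize a row or column flip transform is to mirror the input block and then run the unflipped kernel. The sketch below shows only the mirroring step in plain C; the name `copy_with_flip` and its parameters are hypothetical, and the commit's SSE4.1 code is not reproduced here.

```c
#include <stdint.h>

/* Hypothetical sketch: copy an n x n block into a packed n x n buffer,
 * optionally mirroring it top-to-bottom (flipud) and/or
 * left-to-right (fliplr). */
void copy_with_flip(const int16_t *src, int src_stride, int16_t *dst, int n,
                    int flipud, int fliplr) {
  for (int r = 0; r < n; ++r) {
    const int sr = flipud ? n - 1 - r : r; /* source row */
    for (int c = 0; c < n; ++c) {
      const int sc = fliplr ? n - 1 - c : c; /* source column */
      dst[r * n + c] = src[sr * src_stride + sc];
    }
  }
}
```

In a SIMD implementation the same effect is usually folded into the load step, so no extra pass over the block is needed.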
  15. 11 May, 2016 1 commit
  16. 10 May, 2016 1 commit
  17. 22 Apr, 2016 1 commit
    • Change hybrid transform function argument from TXFM_2D_CFG* to int · cf7f0069
      Yi Luo authored
        Unit test shows manually developed SSE4.1 code would perform ~30%
        better if TXFM_2D_CFG configuration is set in lower level. This
        change only updates function signature. There is no performance
      Change-Id: I62692bd50a21ffc8a944bbd6c155c0a2020ad77b
  18. 14 Apr, 2016 1 commit
  19. 04 Apr, 2016 1 commit
  20. 28 Mar, 2016 1 commit
  21. 25 Mar, 2016 1 commit
    • 8x8/16x16 HT types V_DCT to H_FLIPADST SSE2 optimization · 770bf715
      Yi Luo authored
      - Wrote function: fidtx8_sse2() and fidtx16_sse2().
      - Turned on vp10_fht8x8_sse2()/vp10_fht16x16_sse2() for new types.
      - Updated 8x8/16x16 unit tests for accuracy/speed.
      - Running 20K times with random numbers and getting through
        tx type from V_DCT to H_FLIPADST, SSE2 speed improvement:
        8x8: ~131%
        16x16: ~66%
      Change-Id: Ibbb707e932a08fec3b1f423a7dab280a1d696c9a
  22. 24 Mar, 2016 1 commit
    • 4x4 hybrid transform type V_DCT to H_FLIPADST SSE2 optimization · 4970388c
      Yi Luo authored
      - Added function fidtx4_sse2().
      - Turned on vp10_fht4x4_sse2() for these tx types.
      - Updated 4x4 unit test for speed/accuracy.
      - 4x4 Unit test passed.
      - Running 20K times with random numbers for tx type from
        V_DCT to H_FLIPADST, SSE2 against C, speed improves ~46%.
      Change-Id: I828088b7f98dc0f5939a72e3fcd6cb0b8d8dd8bf
  23. 23 Mar, 2016 2 commits
    • Use vp10_[fwd/inv]_txfm2d_add_#x# for bd 10 · d9a0cbb1
      Angie Chiang authored
      Change-Id: Ie35bdbd7aafae693e3106d7ccbbdd8e65ee8800c
    • Highbd fht4x4 SSE4.1 optimization for DCT_DCT mode · 977dccd1
      Yi Luo authored
      - Setup function vp10_highbd_fht4x4_sse4_1 for highbd SSE4.1
        intrinsics optimization.
      - Wrote SSE4.1 functions: load_buffer_4x4(), write_buffer_4x4(),
        and fdct4x4_sse4_1().
      - Used logical right shift to avoid coeff memory write/read.
      - Turned on vp10_highbd_fht4x4_sse4_1 for DCT_DCT mode only.
      - Improved overall encoding performance by >2.3% on a 50-frame
        sequence, park_joy_1080p_12.y4m, encoded with --input-bit-depth=12
        and --bit-depth=12.
      - Unit test passed.
      Change-Id: Idd6dc6e472cbbf235f0ade4f66fbe859a860a004
  24. 21 Mar, 2016 1 commit
    • Adds 1D transforms for ADST/FlipADST to make 16 · 1b175593
      Debargha Mukherjee authored
      Makes a set of 16 transforms total, adding all 1D
      combinations of ADST and FlipADST, and removing all DST
      lowres, midres both improve by about 0.1% and hdres by
      -0.378% in BDRATE but with fewer transforms that are also
      Further experiments to continue later.
      Change-Id: I7348a4c0e12078fdea5ae3a2d36a89a319ffcc6e
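One plausible way to count the 16 transforms is nine 2-D combinations of DCT, ADST and FlipADST, one identity transform, and six 1-D variants. The enum below is a hypothetical listing for illustration only: `DCT_DCT`, `V_DCT` and `H_FLIPADST` appear elsewhere in this log, while the remaining names, the `IDTX` entry, the ordering, and the type name `TxTypeSketch` are assumptions.

```c
/* Hypothetical listing of 16 transform types; not copied from the commit. */
typedef enum {
  /* 2-D combinations of DCT, ADST and FlipADST (9 entries). */
  DCT_DCT,
  ADST_DCT,
  DCT_ADST,
  ADST_ADST,
  FLIPADST_DCT,
  DCT_FLIPADST,
  FLIPADST_FLIPADST,
  ADST_FLIPADST,
  FLIPADST_ADST,
  /* Identity in both directions (1 entry, assumed). */
  IDTX,
  /* 1-D variants: transform in one direction only (6 entries). */
  V_DCT,
  H_DCT,
  V_ADST,
  H_ADST,
  V_FLIPADST,
  H_FLIPADST,
  TX_TYPES_SKETCH /* number of entries above */
} TxTypeSketch;
```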
  25. 15 Mar, 2016 1 commit
    • Refactor 1D transforms · 9b88762b
      Debargha Mukherjee authored
      In preparation for adding more 1D variants with ADST/FlipADST/etc.
      BDRATE actually improves by 0.21% on lowres.
      Change-Id: I2fa4720c69fe001fa666119a284dfc6b17fffab2
  26. 10 Mar, 2016 1 commit
    • Enable hybrid 1-D/2-D transform coding for highbd setting · c453ae53
      Jingning Han authored
      This commit enables the hybrid 1-D/2-D transform coding scheme for
      high bit-depth setting. It improves the compression performance of
      ext-tx experiment by 0.98% for lowres_all set.
      Change-Id: Ic27f5037f2c36b095a93b9f15dbae34bdcdf00aa
  27. 08 Mar, 2016 1 commit
    • Implemented DST 16x16 SSE2 intrinsics optimization · 50a164a1
      Yi Luo authored
      - Implemented fdst16_sse2(), fdst16_8col() against C version: fdst16().
      - Turned on 7 DST related hybrid txfm types in vp10_fht16x16_sse2().
      - Replaced vp10_fht16x16_c() with vp10_fht16x16_sse2() in
      - Added vp10_fht16x16_sse2() unit test against C version:
        vp10_fht16x16_c() (--gtest_filter=*VP10Trans16x16*).
      - Unit test passed.
      - Speed improvement: 2.4%, 3.2%, 3.2%, for city_cif.y4m, garden_sif.y4m,
        and mobile_cif.y4m.
      Change-Id: Ib30a67ce5d5964bef143d588d0f8fa438be8901f
  28. 07 Mar, 2016 1 commit
    • Hybrid 1-D/2-D transform coding · a8dc9694
      Jingning Han authored
      This commit enables a hybrid 1-D/2-D transform coding scheme and
      the accompanying entropy coding system. It currently uses hybrid
      1-D/2-D DCT transform coding. It provides coding performance gains:
      lowres_all  0.55%
      hdres_all   0.43%
      Change-Id: I2b30dcafd21eb2bb3371f6e854cbab440a4dfa78
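The difference between a 1-D and a 2-D transform can be sketched with a separable 4x4 routine that takes one kernel per direction: passing the identity kernel for one direction yields a 1-D transform, and passing a real kernel for both yields a 2-D one. This is an illustration under stated assumptions, not the commit's code. All names are hypothetical, and an unnormalized 4-point Walsh-Hadamard kernel stands in for the DCT to keep the example short.

```c
#include <stdint.h>

#define TX_N 4

typedef void (*txfm_1d_fn)(const int32_t *in, int32_t *out);

/* Stand-in 1-D kernel (assumption): unnormalized 4-point Walsh-Hadamard. */
void wht4(const int32_t *in, int32_t *out) {
  const int32_t s0 = in[0] + in[1], d0 = in[0] - in[1];
  const int32_t s1 = in[2] + in[3], d1 = in[2] - in[3];
  out[0] = s0 + s1;
  out[1] = d0 + d1;
  out[2] = s0 - s1;
  out[3] = d0 - d1;
}

/* Identity kernel: leaves one direction untransformed. */
void idtx4(const int32_t *in, int32_t *out) {
  for (int i = 0; i < TX_N; ++i) out[i] = in[i];
}

/* Separable transform of a packed 4x4 block: col_fn runs down each column,
 * then row_fn runs along each row of the intermediate result. */
void txfm_2d_sketch(const int32_t *in, int32_t *out, txfm_1d_fn col_fn,
                    txfm_1d_fn row_fn) {
  int32_t tmp[TX_N * TX_N], a[TX_N], b[TX_N];
  for (int c = 0; c < TX_N; ++c) {
    for (int r = 0; r < TX_N; ++r) a[r] = in[r * TX_N + c];
    col_fn(a, b);
    for (int r = 0; r < TX_N; ++r) tmp[r * TX_N + c] = b[r];
  }
  for (int r = 0; r < TX_N; ++r) row_fn(&tmp[r * TX_N], &out[r * TX_N]);
}
```

Calling `txfm_2d_sketch(in, out, wht4, idtx4)` transforms columns only (a vertical 1-D transform), while `txfm_2d_sketch(in, out, wht4, wht4)` is the full 2-D case.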
  29. 24 Feb, 2016 2 commits
    • Implemented DST 8x8 with SSE2 intrinsics. · 0353f596
      Yi Luo authored
      Implemented fdst8_sse2() function against C version: fdst8().
      Added seven DST related hybrid transform types in vp10_fht8x8_sse2().
      Replaced vp10_fht8x8_c() with vp10_fht8x8_sse2() in fwd_txfm_8x8().
      Speedup: 18.1%, 11.5%, 22.0% based on speed test from
      city_cif.y4m, garden_sif.y4m, mobile_cif.y4m.
      Change-Id: Ia4aa1ea44c7a33e494f64ce843037f8703f975e3
    • Hooks to use 32x32 masked transforms for ext-tx · da2d4a7a
      Debargha Mukherjee authored
      Adds hooks to use 32x32 ext-tx. Also adds scan orders for the masked
      transforms for 32x32.
      Make macro USE_MSKTX_FOR_32X32 1 in blockd.h to support 32x32 masked
      transforms for ext-tx.
      Change-Id: Ie6564830266651fcafae2d536c274dafd664ce17
  30. 19 Feb, 2016 1 commit
    • Initial SSE2 function fdst4_sse2(). · 5456aee6
      Yi Luo authored
      Applied DST sse2 to 4x4 transform.
      Fixed DST coefficient packing to satisfy 4x4 transpose requirement.
      Change-Id: I9164714c77049523dbbc9e145ebb10d7911fba9d
  31. 20 Jan, 2016 1 commit
    • Making the forward transform consistent with high bit depth · c178b2d1
      Julia Robson authored
      This patch changes the code for 16bit buffers to use the same
      optimisation as is used for 8bit buffers. (See change-Id:
      I0452da1786d59bc8bcbe0a35fdae9f623d1d44e1 for more information
      about the optimisation)
      Change-Id: I5f327a13a7b01fc356114a2aa9d1261bf76d8d69
  32. 25 Nov, 2015 1 commit
    • Create hybrid_fwd_txfm.c · 96baa73e
      Angie Chiang authored
      Move txfm functions from encodemb to hybrid_fwd_txfm.c
      to make encodemb's code flow clearer
      Change-Id: If174d8ddb490d149c103e5127d30ef19adfbed13