1. 10 Jan, 2017 1 commit
      Fix RunAccuracyCheck failure · e6aece86
      Angie Chiang authored
      Measure the accuracy of each transform on a per-coefficient basis.
      Set up an accuracy limit corresponding to the current transform
      implementation.
      
      Change-Id: Ib7db9680c963427e94e728bf453b66180ce30b89
  2. 29 Nov, 2016 1 commit
  3. 28 Nov, 2016 1 commit
      Fix compiling of tests with emulate-hardware · 46f0f299
      Yaowu Xu authored
      CONFIG_EMULATE_HARDWARE disables SIMD versions of transform functions.
      This commit adds !CONFIG_EMULATE_HARDWARE guards so that tests that use
      SIMD versions of transforms compile.
      
      Change-Id: I4b9ef5a46ae8f12c439f4fe18766b95f8a520d34
  4. 22 Nov, 2016 1 commit
  5. 01 Nov, 2016 1 commit
      Hybrid inverse transforms 16x16 AVX2 optimization · 73172000
      Yi Luo authored
      - Add unit tests to verify the bit-exact result.
      - User level time reduction (EXT_TX):
          encoder: 3.63%
          decoder: 2.36%
      - Also add SSE2 versions of tx_type V_DCT through H_FLIPADST for the
        16x16 inverse transform.
      
      Change-Id: Idc6d9e8254aa536e5f18a87fa0d37c6bd551c083
  6. 13 Oct, 2016 1 commit
  7. 06 Oct, 2016 1 commit
      Hybrid forward transforms 16x16 AVX2 optimization · e8e8cd8f
      Yi Luo authored
      - Unit tests are added for AVX2 SIMD.
      - Encoder speed improvement:
        AV1 baseline and EXT_TX, three 1080p sequences at bitrates of
        800 Kbps, 2 Mbps, and 6 Mbps, on an i7-6700 CPU; average
        user-level time reduction: 3.86%.
      
      Change-Id: Ibbd7837ee3a831c6b1e4e471bf6c8d3fa3a19ff4
  8. 01 Sep, 2016 2 commits
  9. 12 Aug, 2016 1 commit
  10. 18 May, 2016 2 commits
      Turn on flip in inverse txfm2d · 6f28581b
      Angie Chiang authored
      Fix build failure
      Reduce txfm test time
      
      Change-Id: Ieaf6b27f3a272d06286f817f01230413fa8adcf6
      Integrate HBD row/column flip fwd txfm SSE4.1 optimization · 1d307368
      Yi Luo authored
      - Integrate 5 flip transform types for each of the 4x4, 8x8, and
        16x16 block sizes, for the EXT_TX experiment.
      - Encoder speed improves about 12%-15%.
      - Update the unit tests for bit-exact result against C.
      
      Change-Id: Idf27c87f1e516ca5b66c7b70142477a115404ccb
  11. 09 May, 2016 1 commit
      HBD hybrid transform 16x16 SSE4.1 optimization · 412ad22f
      Yi Luo authored
      - Tx_type: DCT_DCT, DCT_ADST, ADST_DCT, ADST_ADST.
      - Update vp10_fht16x16_test.cc to run a bit-exact test against
        the latest C version.
      - HBD encoder speed improves ~1.8%.
      
      Change-Id: Icfc799a212e5289bcf6cedcae3722032133a2bc6
  12. 27 Apr, 2016 1 commit
  13. 25 Mar, 2016 1 commit
      8x8/16x16 HT types V_DCT to H_FLIPADST SSE2 optimization · 770bf715
      Yi Luo authored
      - Wrote functions fidtx8_sse2() and fidtx16_sse2().
      - Turned on vp10_fht8x8_sse2()/vp10_fht16x16_sse2() for the new types.
      - Updated 8x8/16x16 unit tests for accuracy/speed.
      - Running 20K iterations with random input across tx types V_DCT
        through H_FLIPADST, SSE2 speed improvement:
        8x8: ~131%
        16x16: ~66%
      
      Change-Id: Ibbb707e932a08fec3b1f423a7dab280a1d696c9a
  14. 21 Mar, 2016 1 commit
      Adds 1D transforms for ADST/FlipADST to make 16 · 1b175593
      Debargha Mukherjee authored
      Makes a set of 16 transforms in total, adding all 1D
      combinations of ADST and FlipADST and removing all DST
      transforms.
      
      lowres and midres both improve by about 0.1% and hdres by
      -0.378% in BDRATE, while using fewer transforms that are also
      simpler.
      
      Further experiments to continue later.
      
      Change-Id: I7348a4c0e12078fdea5ae3a2d36a89a319ffcc6e
  15. 08 Mar, 2016 1 commit
      Implemented DST 16x16 SSE2 intrinsics optimization · 50a164a1
      Yi Luo authored
      - Implemented fdst16_sse2() and fdst16_8col(), with the C version
        fdst16() as reference.
      - Turned on 7 DST related hybrid txfm types in vp10_fht16x16_sse2().
      - Replaced vp10_fht10x10_c() with vp10_fht16x16_sse2() in
        fwd_txfm_16x16().
      - Added vp10_fht16x16_sse2() unit test against C version:
        vp10_fht16x16_c() (--gtest_filter=*VP10Trans16x16*).
      - Unit test passed.
      - Speed improvement: 2.4%, 3.2%, and 3.2% for city_cif.y4m,
        garden_sif.y4m, and mobile_cif.y4m, respectively.
      
      Change-Id: Ib30a67ce5d5964bef143d588d0f8fa438be8901f
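
Commit e6aece86 changes RunAccuracyCheck to measure transform error coefficient by coefficient against a fixed limit. The C sketch below illustrates that idea. It is not the actual libaom test code: `max_coeff_error()` and `passes_accuracy_check()` are hypothetical helpers, and the caller supplies the limit.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: the largest absolute difference between a reference
 * output and a test output, taken coefficient by coefficient. */
static int32_t max_coeff_error(const int32_t *ref, const int32_t *out, int n) {
  assert(n > 0);
  int32_t max_err = 0;
  for (int i = 0; i < n; ++i) {
    const int32_t err = abs(ref[i] - out[i]);
    if (err > max_err) max_err = err;
  }
  return max_err;
}

/* Passes only when every coefficient is within `limit` of the reference. */
static int passes_accuracy_check(const int32_t *ref, const int32_t *out,
                                 int n, int32_t limit) {
  return max_coeff_error(ref, out, n) <= limit;
}
```

A per-coefficient bound catches a single badly wrong coefficient that an averaged error over the whole block could hide.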
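
Commit 46f0f299 compiles SIMD-based tests only when CONFIG_EMULATE_HARDWARE is off. The sketch below shows that kind of preprocessor guard. The 0/1 defaults stand in for a generated configuration header, and `num_transform_test_variants()` is a hypothetical function used only for illustration.

```c
/* Stand-in defaults for a standalone build; a real build would take these
 * 0/1 values from its generated configuration header (assumed here). */
#ifndef HAVE_SSE2
#define HAVE_SSE2 0
#endif
#ifndef CONFIG_EMULATE_HARDWARE
#define CONFIG_EMULATE_HARDWARE 0
#endif

/* Hypothetical: how many variants of a transform test get compiled. */
static int num_transform_test_variants(void) {
  int n = 1; /* The C reference variant is always built. */
#if HAVE_SSE2 && !CONFIG_EMULATE_HARDWARE
  n += 1; /* SSE2 variant only when SIMD versions are not disabled. */
#endif
  return n;
}
```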
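
Several of these commits (73172000, 1d307368, 412ad22f) verify SIMD transforms bit-exactly against the C version. The sketch below shows the general shape of such a check: run both functions on identical random inputs and compare the outputs byte for byte. A 4-point Walsh-Hadamard transform stands in for a real AV1 kernel here. The names `wht4_ref`, `wht4_fast`, and `is_bit_exact` are hypothetical, and the "fast" version is a scalar butterfly rather than actual SIMD code.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef void (*txfm4_fn)(const int16_t *in, int32_t *out);

/* Reference: 4-point Walsh-Hadamard in direct matrix form (illustrative
 * stand-in, not an AV1 transform kernel). */
static void wht4_ref(const int16_t *in, int32_t *out) {
  static const int m[4][4] = {
      {1, 1, 1, 1}, {1, 1, -1, -1}, {1, -1, -1, 1}, {1, -1, 1, -1}};
  for (int i = 0; i < 4; ++i) {
    int32_t sum = 0;
    for (int j = 0; j < 4; ++j) sum += m[i][j] * in[j];
    out[i] = sum;
  }
}

/* "Optimized" butterfly form that must match the reference exactly. */
static void wht4_fast(const int16_t *in, int32_t *out) {
  const int32_t a = in[0] + in[1], b = in[0] - in[1];
  const int32_t c = in[2] + in[3], d = in[2] - in[3];
  out[0] = a + c;
  out[1] = a - c;
  out[2] = b - d;
  out[3] = b + d;
}

/* Feeds both functions the same random inputs; returns 1 only if every
 * output is identical bit for bit. */
static int is_bit_exact(txfm4_fn ref, txfm4_fn test, int iterations,
                        unsigned seed) {
  srand(seed);
  for (int it = 0; it < iterations; ++it) {
    int16_t in[4];
    int32_t out_ref[4], out_test[4];
    for (int i = 0; i < 4; ++i) in[i] = (int16_t)(rand() % 2048 - 1024);
    ref(in, out_ref);
    test(in, out_test);
    if (memcmp(out_ref, out_test, sizeof(out_ref)) != 0) return 0;
  }
  return 1;
}
```

A bit-exact comparison is stricter than an accuracy bound. It fits when the optimized code is meant to reproduce the C arithmetic exactly, which is what those commit messages describe.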
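
Commits 1d307368 and 6f28581b deal with row/column flips for the FlipADST transform types. One way to build a forward FlipADST is to reverse the input order and then apply the ordinary ADST. That is assumed here, and the sketch covers only the flip step on a square block. `flip_ud` and `flip_lr` are hypothetical names.

```c
#include <stdint.h>

/* Reverse the order of rows (up-down flip) of an n x n block in place. */
static void flip_ud(int16_t *block, int n, int stride) {
  for (int r = 0; r < n / 2; ++r) {
    int16_t *top = block + r * stride;
    int16_t *bot = block + (n - 1 - r) * stride;
    for (int c = 0; c < n; ++c) {
      const int16_t t = top[c];
      top[c] = bot[c];
      bot[c] = t;
    }
  }
}

/* Reverse the order of columns (left-right flip) of an n x n block in place. */
static void flip_lr(int16_t *block, int n, int stride) {
  for (int r = 0; r < n; ++r) {
    int16_t *row = block + r * stride;
    for (int c = 0; c < n / 2; ++c) {
      const int16_t t = row[c];
      row[c] = row[n - 1 - c];
      row[n - 1 - c] = t;
    }
  }
}
```

Under this assumption, a vertical FlipADST uses `flip_ud` and a horizontal one uses `flip_lr`. A type that flips in both directions applies both.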
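
Commit 1b175593 arrives at 16 transforms. One way to count them is to pair a vertical and a horizontal 1D kernel chosen from {DCT, ADST, FLIPADST, IDTX}, where IDTX is the identity. That gives 4 x 4 = 16 2D types. The sketch below enumerates those pairs. The enum, the `tx_pair` struct, and the "VERT_HORZ" naming order are assumptions for illustration, not the codebase's definitions.

```c
#include <stdio.h>

/* 1D kernels assumed for illustration; IDTX is the identity (no transform). */
enum { K_DCT, K_ADST, K_FLIPADST, K_IDTX, NUM_1D_KERNELS };
static const char *const kernel_names[NUM_1D_KERNELS] = {"DCT", "ADST",
                                                         "FLIPADST", "IDTX"};

typedef struct {
  int vert; /* kernel applied to columns */
  int horz; /* kernel applied to rows */
} tx_pair;

/* Fills `out` with every (vertical, horizontal) pair, up to `max` entries,
 * and returns how many were written. */
static int enumerate_2d_types(tx_pair *out, int max) {
  int n = 0;
  for (int v = 0; v < NUM_1D_KERNELS; ++v)
    for (int h = 0; h < NUM_1D_KERNELS; ++h)
      if (n < max) out[n++] = (tx_pair){v, h};
  return n;
}

/* Writes a "VERT_HORZ" name such as "ADST_DCT"; returns snprintf's result. */
static int pair_name(tx_pair p, char *buf, size_t len) {
  return snprintf(buf, len, "%s_%s", kernel_names[p.vert],
                  kernel_names[p.horz]);
}
```

Under this scheme, a one-directional type such as V_DCT presumably corresponds to the pair (DCT, IDTX).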