1. 26 Jul, 2017 1 commit
  2. 20 Jul, 2017 1 commit
    • Add new MRC_DCT tx type · 53f93dbd
      Sarah Parker authored
      This adds the new transform to the list of possible transforms.
      The impact on performance is in the noise range because the transform
      implementation currently performs DCT as a placeholder. This transform
      will initially only have an implementation for TX_32X32 and it is
      skipped in the tx search for smaller transform sizes.
      Change-Id: Iab2faddc525b478ca06972a753428a4f4ef53ac6
  3. 17 Jul, 2017 1 commit
    • Unify FWD_TXFM_PARAM and INV_TXFM_PARAM · 27319b6e
      Lester Lu authored
      Merge two similar structs, FWD_TXFM_PARAM and INV_TXFM_PARAM,
      into a common struct: TxfmParam. Its definition is moved to
      aom_dsp/txfm_common.h to simplify dependencies.
      This change is made so that, in later changes of the LGT
      experiment, functions requiring FWD_TXFM_PARAM and
      INV_TXFM_PARAM, such as get_fwd_lgt4 and get_inv_lgt4, can
      also be unified.
      Change-Id: I756b0176a02314005060adbf8e62386f10eeb344
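The unification described above can be pictured with a minimal sketch. Only the struct name TxfmParam and the two fields tx_type and bd come from the commit messages; every `Sketch`/`sketch_` name, the function signatures, and the placeholder bodies (an identity "transform" and a bit-depth clamp) are assumptions made for illustration, not libaom code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the codec's transform-type enum. */
typedef enum { SKETCH_DCT_DCT = 0, SKETCH_ADST_DCT = 1 } SketchTxType;

/* One parameter struct shared by the forward and inverse paths.
 * tx_type and bd are the two fields named in the commit messages. */
typedef struct {
  SketchTxType tx_type;
  int bd; /* bit depth */
} SketchTxfmParam;

/* Placeholder forward pass: copies samples into the coefficient buffer.
 * A real transform would dispatch on param->tx_type here. */
static void sketch_fwd_txfm(const int16_t *input, int32_t *coeff, int n,
                            const SketchTxfmParam *param) {
  assert(param != NULL);
  (void)param->tx_type;
  for (int i = 0; i < n; ++i) coeff[i] = input[i];
}

/* Placeholder inverse pass: clamps each value to the range allowed by
 * param->bd, showing why the inverse side needs the same struct. */
static void sketch_inv_txfm(const int32_t *coeff, uint16_t *dest, int n,
                            const SketchTxfmParam *param) {
  assert(param != NULL);
  const int32_t max_pixel = (1 << param->bd) - 1;
  for (int i = 0; i < n; ++i) {
    int32_t v = coeff[i];
    if (v < 0) v = 0;
    if (v > max_pixel) v = max_pixel;
    dest[i] = (uint16_t)v;
  }
}
```

Because both directions take the same struct type, a helper that needs these parameters can be written once instead of once per direction, which is the motivation the commit gives for get_fwd_lgt4 and get_inv_lgt4.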
  4. 07 Jul, 2017 1 commit
    • Signature changes for the LGT experiment · d8b1ddce
      Lester Lu authored
      The input arguments of the av1_fht* and av1_iht* functions (and their
      HBD versions) change slightly: tx_type and bd are now carried by a
      struct, fwd_txfm_param/inv_txfm_param. This struct is meant to
      carry further prediction information later on, such as intra
      top/left boundaries, down to the transform level, so that the
      choice of transforms can be more adaptive to the prediction mode
      and local video content.
      Change-Id: Ia42544248a51845be64b72855b642ef1fe5910a9
  5. 28 Jun, 2017 1 commit
  6. 26 Jun, 2017 1 commit
    • New experiment: LGT · ad8290b8
      Lester Lu authored
      In previous ADSTs, DST-7 and DST-4 are used for length 4 and length
      8/16/32, respectively. In this LGT experiment we explore transforms
      between DST-4 and DST-7. When the CONFIG_LGT flag is on, adst4 and
      adst8 are replaced by lgt4 and lgt8, intermediate transforms with
      pre-chosen parameters.
      The LGTs applied here are lgt4_160 and lgt8_170, where the numbers
      mean the self-loop weights times 100. The associated values for DST-7
      and DST-4 are 100 and 200.
      lowres: -0.140
      midres: -0.131
      hdres: -0.078
      These changes are not applied to the highbd scenario in the
      current version.
      Change-Id: I20600456da8766528b2b6b11aa28801e70af498e
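One way to read the "self-loop weight" above is the line-graph formulation, in which each transform is the eigenvector basis of the generalized Laplacian of a path graph with unit edge weights and a single self-loop on the first node. That formulation is an assumption here; the commit message does not spell it out. Under it, weight 1 gives the Laplacian whose eigenvectors are DST-7 and weight 2 the one for DST-4, matching the 100 and 200 quoted in the message. The helper below is hypothetical (not libaom code) and only builds the matrix, scaled by 100 so the numbers line up with names like lgt4_160. The eigendecomposition that would yield the actual basis is left out.

```c
#include <assert.h>

/* Hypothetical helper, not libaom code: fills an n x n row-major matrix
 * with 100 times the generalized Laplacian of a path graph that has unit
 * edge weights and one self-loop on node 0. The self-loop weight is passed
 * already multiplied by 100 (e.g. 160 for lgt4_160). */
static void sketch_line_graph_laplacian_x100(int n, int self_loop_x100,
                                             int *lap) {
  assert(n >= 2);
  for (int i = 0; i < n * n; ++i) lap[i] = 0;
  for (int i = 0; i < n; ++i) {
    /* Degree: one neighbour at either end of the path, two inside. */
    const int degree = (i == 0 || i == n - 1) ? 1 : 2;
    lap[i * n + i] = 100 * degree;
    if (i > 0) lap[i * n + (i - 1)] = -100;    /* edge to the left  */
    if (i + 1 < n) lap[i * n + (i + 1)] = -100; /* edge to the right */
  }
  /* The self-loop only changes the first diagonal entry. */
  lap[0] += self_loop_x100;
}
```

With n = 4, the first diagonal entry is 200 for weight 100, 300 for weight 200, and 260 for weight 160, so the intermediate weights move only that one entry between the two endpoint matrices.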
  7. 08 Jun, 2017 1 commit
    • Remove deprecated high-bitdepth functions · 31c66502
      Sarah Parker authored
      This unifies the codepath for high-bitdepth transforms and deletes
      all calls to the old deprecated versions. This required reworking
      the way 1d configurations are combined in order to support rectangular
      transforms.
      There is one remaining codepath that calls the deprecated 4x4 hbd
      transform from encoder/encodemb.c. I need to take a closer look
      at what is happening there and will leave that for a followup
      since this change has already gotten so large.
      lowres 10 bit: -0.035%
      lowres 12 bit: 0.021%
      Change-Id: I34cdeaed2461ed7942364147cef10d7d21e3779c
  8. 01 Jun, 2017 1 commit
    • cb4x4: Move sub-4X4 TX sizes behind CONFIG_CHROMA_2X2. · fe67ed6a
      Timothy B. Terriberry authored
      cb4x4 itself should not require these sizes.
      This simplifies compatibility with other experiments, since we can
      first make them work with cb4x4 (which is now on by default), and
      then worry about chroma_2x2 (which is not) in separate steps.
      Encoder and decoder output should remain unchanged.
      Change-Id: I4e9fcdae49f238b5099a3c74a398fe993c2545f8
  9. 19 May, 2017 1 commit
  10. 12 Apr, 2017 1 commit
  11. 31 Mar, 2017 1 commit
  12. 28 Mar, 2017 1 commit
  13. 28 Feb, 2017 1 commit
  14. 13 Jan, 2017 1 commit
  15. 09 Jan, 2017 1 commit
  16. 04 Jan, 2017 2 commits
  17. 20 Dec, 2016 1 commit
  18. 30 Nov, 2016 1 commit
    • Add 2x2 fwd transform · 12402227
      Jingning Han authored
      Add a 2x2 forward transform function for the 4x4 coding block unit.
      Change-Id: I44c8f0d55f371db68541e7e5f7cbd340a82cd788
  19. 09 Nov, 2016 1 commit
  20. 03 Nov, 2016 1 commit
  21. 02 Nov, 2016 1 commit
  22. 20 Oct, 2016 1 commit
    • Fix the overflow of av1_fht32x32() in 2D DCT_DCT · 157e45a4
      Yi Luo authored
      - Use range check function to avoid DCT_DCT overflow.
        We need to re-develop the column txfm side scaling/rounding. Now,
        we prefer to maintain the current BDRate level.
      - Encoder user level time reduction <1% owing to av1_fht32x32_avx2.
      - Add MemCheck unit test and fdct32() unit test.
      Change-Id: I1e67030f67bc637859798ebe2f6698afffb8531c
  23. 12 Oct, 2016 1 commit
    • Hybrid forward transform 32x32 AVX2 optimization · fed8e1c0
      Yi Luo authored
      - av1_fht32x32 AVX2 function level time reduction ~89% compared to C.
      - av1_fht32x32_avx2() on DCT_DCT improves 42.62% over aom_fdct32x32_avx2()
        But function replacement must go with the corresponding inverse txfm.
      - No obvious user level time reduction due to 32x32 TX_TYPE selection.
      - Zero high 128b YMM to avoid AVX-SSE transition penalties
        (fix 16x16 case).
      - Added 32x32 AVX2 unit tests to verify bitexact.
      - AVX2 optimization summary:
        On CPU i7-6700, based on 16x16/32x32 fwd txfm optimization results:
        C to AVX2: function level time reduction, ~86-89%.
        SSE2 to AVX2: function level time reduction, ~51%.
      Change-Id: Idd0cd8bf066a61c7117140ef15ab6c1f8eb4b036
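The percentages above are time reductions, which can be easier to compare once converted to speedup factors. The helper below is a hypothetical sketch that assumes "time reduction" means (old time - new time) / old time; the commit message does not define the term. Under that reading, a reduction of about 89% corresponds to roughly a 9x speedup, and about 51% to roughly 2x.

```c
#include <assert.h>

/* Hypothetical helper: converts a time reduction in percent, assumed to
 * mean (old - new) / old * 100, into a speedup factor old / new. */
static double sketch_speedup_from_time_reduction(double reduction_pct) {
  assert(reduction_pct >= 0.0 && reduction_pct < 100.0);
  return 100.0 / (100.0 - reduction_pct);
}
```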
  24. 02 Sep, 2016 1 commit
  25. 01 Sep, 2016 2 commits
  26. 18 Aug, 2016 1 commit
  27. 15 Aug, 2016 1 commit
  28. 12 Aug, 2016 1 commit
  29. 21 Jul, 2016 1 commit
    • Rectangular transforms 4x8 & 8x4 · e5848dea
      Debargha Mukherjee authored
      Added a new experiment, rect-tx, to be used in conjunction with ext-tx.
      [rect-tx is a temporary config flag and will eventually be
      merged into ext-tx once it works correctly with all other
      experiments.]
      Added 4x8 and 8x4 transforms for use initially with rectangular
      sub8x8 y blocks as part of this experiment.
      There is about a -0.2% BDRATE improvement on lowres, others pending.
      When var-tx is on, rectangular transforms are currently not used.
      That will be enabled in a subsequent patch.
      Change-Id: Iaf3f88ede2740ffe6a0ffb1ef5fc01a16cd0283a
  30. 23 Jun, 2016 1 commit
  31. 18 May, 2016 1 commit
    • Integrate HBD row/column flip fwd txfm SSE4.1 optimization · 1d307368
      Yi Luo authored
      - Integrate 5 flip transform types for each of the 4x4, 8x8, and 16x16
        block sizes, for the EXT_TX experiment.
      - Encoder speed improves about 12%-15%.
      - Update the unit tests for bit-exact result against C.
      Change-Id: Idf27c87f1e516ca5b66c7b70142477a115404ccb
  32. 11 May, 2016 1 commit
  33. 10 May, 2016 1 commit
  34. 22 Apr, 2016 1 commit
    • Change hybrid transform function argument from TXFM_2D_CFG* to int · cf7f0069
      Yi Luo authored
      Unit tests show that manually developed SSE4.1 code would perform ~30%
      better if the TXFM_2D_CFG configuration is set at a lower level. This
      change only updates the function signature. There is no performance
      change.
      Change-Id: I62692bd50a21ffc8a944bbd6c155c0a2020ad77b
  35. 14 Apr, 2016 1 commit
  36. 04 Apr, 2016 1 commit
  37. 28 Mar, 2016 1 commit
  38. 25 Mar, 2016 1 commit
    • 8x8/16x16 HT types V_DCT to H_FLIPADST SSE2 optimization · 770bf715
      Yi Luo authored
      - Wrote functions: fidtx8_sse2() and fidtx16_sse2().
      - Turned on vp10_fht8x8_sse2()/vp10_fht16x16_sse2() for new types.
      - Updated 8x8/16x16 unit tests for accuracy/speed.
      - Running 20K times with random numbers and cycling through the
        tx types from V_DCT to H_FLIPADST, SSE2 speed improvement:
        8x8: ~131%
        16x16: ~66%
      Change-Id: Ibbb707e932a08fec3b1f423a7dab280a1d696c9a