1. 15 Sep, 2017 1 commit
    • Force C implementation of 16-point Daala TX's. · 34e1201a
      Nathan E. Egge authored
      This patch fixes a regression introduced in 1d190950 where the encoder
       was using the 16x16 VP9/AV1 transforms for RDO, but then used the Daala
       transforms for encoding.
      
      subset1:
      
      master-daala_dct16@2017-09-13T12:05:18.013Z ->
        master_daala_dct16_use_c@2017-09-13T13:05:02.252Z
      
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -0.3654 | -0.7634 | -0.7407 |  -0.4884 | -0.4699 | -0.4945 |    -0.5104
      
      master-no_rect_tx-no_var_tx@2017-09-12T00:23:18.153Z ->
        master_daala_dct16_use_c@2017-09-13T13:05:02.252Z
      
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -0.0133 |  0.1040 | -0.0440 |  -0.0492 | -0.0151 | -0.0120 |     0.0699
      
      Change-Id: Id1830d0975db4bd0320a47fdf45b4bca20881cfb
  2. 25 Aug, 2017 1 commit
    • Force C implementations when using Daala DCT's. · e030936c
      Nathan E. Egge authored
      This patch fixes a regression introduced in 1d190950 where the encoder
       was using the 4x4 VP9/AV1 transforms for RDO, but then used the Daala
       transforms for encoding.
      The ~2% improvement below comes from forcing the C implementation of the
       4x4 and 8x8 transforms to be used when CONFIG_DAALA_DCT4 and
       CONFIG_DAALA_DCT8 are enabled respectively.
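A minimal sketch of the mismatch being fixed, assuming a run-time dispatch table like the function pointers libaom generates for its SIMD kernels. All names below (`fwd_tx4_fn`, `select_fwd_tx4` and the three kernels) are hypothetical, and the kernel bodies are placeholders rather than real transforms. The idea is that the RDO search and the final encode resolve the kernel through the same selector. Enabling the Daala DCT forces the C kernel because the SIMD kernel still implements the older VP9/AV1 transform.

```c
#include <stdint.h>

/* Hypothetical signature for a 4-point forward transform kernel. */
typedef void (*fwd_tx4_fn)(const int32_t in[4], int32_t out[4]);

/* Placeholder bodies: they only need to differ, they are not real DCTs. */
static void fdct4_daala_c(const int32_t in[4], int32_t out[4]) {
  for (int i = 0; i < 4; ++i) out[i] = in[i];
}

static void fdct4_legacy_c(const int32_t in[4], int32_t out[4]) {
  for (int i = 0; i < 4; ++i) out[i] = 2 * in[i];
}

/* Stands in for a SIMD kernel that still implements the legacy DCT. */
static void fdct4_legacy_simd(const int32_t in[4], int32_t out[4]) {
  fdct4_legacy_c(in, out);
}

/* Resolve the kernel once; RDO and encoding should both use the result. */
static fwd_tx4_fn select_fwd_tx4(int daala_dct4_enabled, int have_simd) {
  if (daala_dct4_enabled) return fdct4_daala_c; /* no SIMD Daala kernel yet */
  return have_simd ? fdct4_legacy_simd : fdct4_legacy_c;
}
```

Resolving the pointer once (for example `fwd_tx4_fn tx = select_fwd_tx4(1, 1);`) and passing the same `tx` to both the RD loop and the encode path keeps their coefficients consistent.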
      
      subset-1 (--enable-experimental --enable-daala_dct4):
      
      master@2017-08-21T21:41:18.302Z ->
       master_daala_dct4_use_c@2017-08-22T02:39:14.457Z
      
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -2.1953 | -1.2044 | -1.1865 |  -1.6173 | -1.7029 | -1.6784 |    -1.7235
      
      Change-Id: I44d2b24094e89b2857ae03d743180e706cef45eb
  3. 26 Jul, 2017 1 commit
  4. 20 Jul, 2017 1 commit
    • Add new MRC_DCT tx type · 53f93dbd
      Sarah Parker authored
      This adds the new transform to the list of possible transforms.
      The impact on performance is in the noise range because the transform
      implementation currently performs DCT as a placeholder. This transform
      will initially only have an implementation for TX_32X32 and it is
      skipped in the tx search for smaller transform sizes.
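The size restriction can be sketched as a filter inside the transform-type search loop. The enums are a hypothetical, trimmed-down subset (the real lists are longer), and `tx_type_allowed` / `num_search_candidates` are illustrative names, not libaom functions.

```c
/* Hypothetical, trimmed-down enums for illustration only. */
typedef enum { TX_4X4, TX_8X8, TX_16X16, TX_32X32, TX_SIZES } TX_SIZE;
typedef enum {
  DCT_DCT, ADST_DCT, DCT_ADST, ADST_ADST, MRC_DCT, TX_TYPES
} TX_TYPE;

/* MRC_DCT only has a 32x32 implementation, so skip it for other sizes. */
static int tx_type_allowed(TX_TYPE tx_type, TX_SIZE tx_size) {
  if (tx_type == MRC_DCT) return tx_size == TX_32X32;
  return 1;
}

/* Number of transform types the RD search would try at a given size. */
static int num_search_candidates(TX_SIZE tx_size) {
  int n = 0;
  for (int t = 0; t < TX_TYPES; ++t)
    if (tx_type_allowed((TX_TYPE)t, tx_size)) ++n;
  return n;
}
```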
      
      Change-Id: Iab2faddc525b478ca06972a753428a4f4ef53ac6
  5. 17 Jul, 2017 1 commit
    • Unify FWD_TXFM_PARAM and INV_TXFM_PARAM · 27319b6e
      Lester Lu authored
      Change two similar structs, FWD_TXFM_PARAM and INV_TXFM_PARAM,
      into a common struct: TxfmParam. Its definition is moved to
      aom_dsp/txfm_common.h to simplify dependency.
      
      This change is made so that, in later changes of the LGT
      experiment, functions requiring FWD_TXFM_PARAM and
      INV_TXFM_PARAM, such as get_fwd_lgt4 and get_inv_lgt4, can
      also be unified.
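A minimal sketch of the idea, assuming a struct that holds only the two fields named in these commit messages (`tx_type` and `bd`); the struct in aom_dsp/txfm_common.h may carry more. The enum and the helper names are hypothetical, as is the convention that the first half of a type name is the column (vertical) transform. Because the forward and inverse paths now take the same type, one helper can serve both, which is the kind of sharing the later LGT changes (get_fwd_lgt4, get_inv_lgt4) are after.

```c
/* Hypothetical subset of transform types: COLUMN_ROW naming assumed. */
enum { DCT_DCT, ADST_DCT, DCT_ADST, ADST_ADST };

/* Minimal sketch of one parameter struct shared by forward and inverse. */
typedef struct {
  int tx_type; /* 2-D transform type */
  int bd;      /* bit depth, e.g. 8, 10 or 12 */
} TxfmParam;

/* One helper answers for both directions, instead of separate forward and
 * inverse variants that take different struct types. */
static int col_uses_adst(const TxfmParam *param) {
  return param->tx_type == ADST_DCT || param->tx_type == ADST_ADST;
}

static int row_uses_adst(const TxfmParam *param) {
  return param->tx_type == DCT_ADST || param->tx_type == ADST_ADST;
}
```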
      
      Change-Id: I756b0176a02314005060adbf8e62386f10eeb344
  6. 07 Jul, 2017 1 commit
    • Signature changes for the LGT experiment · d8b1ddce
      Lester Lu authored
      The input arguments of av1_fht* and av1_iht* functions (and their
      HBD versions) are slightly changed. Input arguments tx_type and
      bd are carried by a struct fwd_txfm_param/inv_txfm_param. This
      struct is meant to later carry other prediction information,
      such as the intra top/left boundaries, down to the transform level,
      so that the choice of transform can adapt better to the
      prediction mode and local video content.
      
      Change-Id: Ia42544248a51845be64b72855b642ef1fe5910a9
  7. 28 Jun, 2017 1 commit
  8. 26 Jun, 2017 1 commit
    • New experiment: LGT · ad8290b8
      Lester Lu authored
      In previous ADSTs, DST-7 and DST-4 are used for length 4 and length
      8/16/32, respectively. In this LGT experiment we explore transforms
      between DST-4 and DST-7. When CONFIG_LGT flag is on, adst4 and adst8
      are replaced by lgt4 and lgt8, the intermediate transforms with
      pre-chosen parameters.
      
      The LGTs applied here are lgt4_160 and lgt8_170, where each number
      is the self-loop weight multiplied by 100. The corresponding values
      for DST-7 and DST-4 are 100 and 200.
      
      ovr_psnr:
      lowres: -0.140
      midres: -0.131
      hdres: -0.078
      
      These changes are not applied to the highbd scenario in the
      current version.
      
      Change-Id: I20600456da8766528b2b6b11aa28801e70af498e
  9. 08 Jun, 2017 1 commit
    • Remove deprecated high-bitdepth functions · 31c66502
      Sarah Parker authored
      This unifies the codepath for high-bitdepth transforms and deletes
      all calls to the old deprecated versions. This required reworking
      the way 1d configurations are combined in order to support rectangular
      transforms.
      
      There is one remaining codepath that calls the deprecated 4x4 hbd
      transform from encoder/encodemb.c. I need to take a closer look
      at what is happening there and will leave that for a followup
      since this change has already gotten so large.
      
      lowres 10 bit: -0.035%
      lowres 12 bit: 0.021%
      
      BUG=aomedia:524
      
      Change-Id: I34cdeaed2461ed7942364147cef10d7d21e3779c
  10. 01 Jun, 2017 1 commit
    • cb4x4: Move sub-4X4 TX sizes behind CONFIG_CHROMA_2X2. · fe67ed6a
      Timothy B. Terriberry authored
      cb4x4 itself should not require these sizes.
      
      This simplifies compatibility with other experiments, since we can
      first make them work with cb4x4 (which is now on by default), and
      then worry about chroma_2x2 (which is not) in separate steps.
      
      Encoder and decoder output should remain unchanged.
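A sketch of gating a sub-4x4 size behind the experiment flag, so that a cb4x4-only build never sees it. The enum and table use libaom-style names but are trimmed and hypothetical. The default of 0 mirrors the note above that chroma_2x2 is not on by default.

```c
/* Default the flag off when the build system does not define it. */
#ifndef CONFIG_CHROMA_2X2
#define CONFIG_CHROMA_2X2 0
#endif

/* Sub-4x4 size exists only when the chroma_2x2 experiment is enabled. */
typedef enum {
#if CONFIG_CHROMA_2X2
  TX_2X2,
#endif
  TX_4X4,
  TX_8X8,
  TX_16X16,
  TX_32X32,
  TX_SIZES
} TX_SIZE;

/* Width in pixels of each square transform size, indexed by TX_SIZE. */
static const int tx_size_wide[TX_SIZES] = {
#if CONFIG_CHROMA_2X2
  2,
#endif
  4, 8, 16, 32
};
```

Because every table indexed by `TX_SIZE` is gated the same way, the same build produces identical output with or without the sub-4x4 entries compiled in, as long as nothing selects them.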
      
      Change-Id: I4e9fcdae49f238b5099a3c74a398fe993c2545f8
  11. 19 May, 2017 1 commit
  12. 12 Apr, 2017 1 commit
  13. 31 Mar, 2017 1 commit
  14. 28 Mar, 2017 1 commit
  15. 28 Feb, 2017 1 commit
  16. 13 Jan, 2017 1 commit
  17. 09 Jan, 2017 1 commit
  18. 04 Jan, 2017 2 commits
  19. 20 Dec, 2016 1 commit
  20. 30 Nov, 2016 1 commit
    • Add 2x2 fwd transform · 12402227
      Jingning Han authored
      Add a 2x2 forward transform function for the 4x4 coding block unit.
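A sketch of a 2x2 forward transform, assuming the simplest choice: a 2-point sum/difference (an unnormalized 2-point DCT, which coincides with a Hadamard transform) applied along rows and then columns. The function name, coefficient order and lack of scaling are assumptions, not the committed implementation.

```c
#include <stdint.h>

/* Unnormalized separable 2x2 transform of a residual block read with the
 * given stride. Coefficients are stored in raster order. */
static void fwd_txfm_2x2(const int16_t *input, int stride, int32_t out[4]) {
  const int32_t a = input[0], b = input[1];
  const int32_t c = input[stride], d = input[stride + 1];
  out[0] = a + b + c + d; /* DC */
  out[1] = a - b + c - d; /* horizontal difference */
  out[2] = a + b - c - d; /* vertical difference */
  out[3] = a - b - c + d; /* diagonal */
}
```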
      
      Change-Id: I44c8f0d55f371db68541e7e5f7cbd340a82cd788
  21. 09 Nov, 2016 1 commit
  22. 03 Nov, 2016 1 commit
  23. 02 Nov, 2016 1 commit
  24. 20 Oct, 2016 1 commit
    • Fix the overflow of av1_fht32x32() in 2D DCT_DCT · 157e45a4
      Yi Luo authored
      - Use a range-check function to avoid DCT_DCT overflow.
        The column-transform scaling/rounding still needs to be reworked;
        for now, we prefer to maintain the current BD-rate level.
      - Encoder user level time reduction <1% owing to av1_fht32x32_avx2.
      - Add MemCheck unit test and fdct32() unit test.
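A minimal sketch of a range check of the kind described, assuming each intermediate transform stage is expected to fit in a signed `bit`-bit integer. `range_check_buf` and its return convention are hypothetical; a debug build might assert on failure instead of returning a flag.

```c
#include <stdint.h>

/* Returns 1 if every value lies in [-(2^(bit-1)), 2^(bit-1) - 1], that is,
 * fits in a signed integer of `bit` bits; returns 0 otherwise. */
static int range_check_buf(const int32_t *buf, int size, int bit) {
  const int64_t max_value = ((int64_t)1 << (bit - 1)) - 1;
  const int64_t min_value = -((int64_t)1 << (bit - 1));
  for (int i = 0; i < size; ++i) {
    if (buf[i] > max_value || buf[i] < min_value) return 0;
  }
  return 1;
}
```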
      
      Change-Id: I1e67030f67bc637859798ebe2f6698afffb8531c
  25. 12 Oct, 2016 1 commit
    • Hybrid forward transform 32x32 AVX2 optimization · fed8e1c0
      Yi Luo authored
      - av1_fht32x32 AVX2 function level time reduction ~89% compared to C.
      
      - av1_fht32x32_avx2() on DCT_DCT improves 42.62% over aom_fdct32x32_avx2(),
        but the function replacement must go with the corresponding inverse txfm.
      
      - No obvious user level time reduction due to 32x32 TX_TYPE selection.
      
      - Zero high 128b YMM to avoid AVX-SSE transition penalties
        (fix 16x16 case).
      
      - Added 32x32 AVX2 unit tests to verify bit-exactness.
      
      - AVX2 optimization summary:
        On CPU i7-6700, based on 16x16/32x32 fwd txfm optimization results:
        C to AVX2: function level time reduction, ~86-89%.
        SSE2 to AVX2: function level time reduction, ~51%.
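The time reductions quoted above convert to speedup factors as follows, assuming r is the fractional reduction in run time on the same workload:

```latex
\text{speedup} = \frac{t_{\text{before}}}{t_{\text{after}}} = \frac{1}{1 - r},
\qquad r = 0.86 \Rightarrow 7.1\times,\quad
r = 0.89 \Rightarrow 9.1\times,\quad
r = 0.51 \Rightarrow 2.0\times
```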
      
      Change-Id: Idd0cd8bf066a61c7117140ef15ab6c1f8eb4b036
  26. 02 Sep, 2016 1 commit
  27. 01 Sep, 2016 2 commits
  28. 18 Aug, 2016 1 commit
  29. 15 Aug, 2016 1 commit
  30. 12 Aug, 2016 1 commit
  31. 21 Jul, 2016 1 commit
    • Rectangular transforms 4x8 & 8x4 · e5848dea
      Debargha Mukherjee authored
      Added a new expt rect-tx to be used in conjunction with ext-tx.
      [rect-tx is a temporary config flag and will eventually be
      merged into ext-tx once it works correctly with all other
      experiments].
      
      Added 4x8 and 8x4 transforms for use initially with rectangular
      sub8x8 y blocks as part of this experiment.
      
      There is about a -0.2% BDRATE improvement on lowres, others pending.
      
      When var-tx is on, rectangular transforms are currently not used.
      That will be enabled in a subsequent patch.
      
      Change-Id: Iaf3f88ede2740ffe6a0ffb1ef5fc01a16cd0283a
  32. 23 Jun, 2016 1 commit
  33. 18 May, 2016 1 commit
    • Integrate HBD row/column flip fwd txfm SSE4.1 optimization · 1d307368
      Yi Luo authored
      - Integrate the 5 flip transform types for 4x4, 8x8, and 16x16
        blocks in the EXT_TX experiment.
      - Encoder speed improves about 12%-15%.
      - Update the unit tests to check bit-exact results against C.
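A FLIPADST can be computed as an ordinary ADST applied to the block read in reverse along one axis. This is a common way to let one ADST kernel cover the flip types. The sketch assumes the convention that the first half of a type name is the vertical (column) transform and the second half the horizontal (row) one. Under that reading, the five 2-D flip types are likely FLIPADST_DCT, DCT_FLIPADST, FLIPADST_FLIPADST, ADST_FLIPADST and FLIPADST_ADST. The enum subset and helper names are hypothetical.

```c
/* Hypothetical subset of EXT_TX types (COLUMN_ROW naming assumed). */
typedef enum {
  DCT_DCT, ADST_DCT, DCT_ADST, ADST_ADST,
  FLIPADST_DCT, DCT_FLIPADST, FLIPADST_FLIPADST, ADST_FLIPADST, FLIPADST_ADST
} TX_TYPE;

/* Which axes to reverse before running the plain ADST. */
static void get_flip(TX_TYPE t, int *ud_flip, int *lr_flip) {
  *ud_flip = (t == FLIPADST_DCT || t == FLIPADST_FLIPADST ||
              t == FLIPADST_ADST);
  *lr_flip = (t == DCT_FLIPADST || t == FLIPADST_FLIPADST ||
              t == ADST_FLIPADST);
}

/* Copies an n x n row-major block, reversing rows (ud) and/or columns (lr). */
static void flip_block(const int *in, int *out, int n, int ud_flip,
                       int lr_flip) {
  for (int r = 0; r < n; ++r)
    for (int c = 0; c < n; ++c) {
      int sr = ud_flip ? n - 1 - r : r;
      int sc = lr_flip ? n - 1 - c : c;
      out[r * n + c] = in[sr * n + sc];
    }
}
```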
      
      Change-Id: Idf27c87f1e516ca5b66c7b70142477a115404ccb
  34. 11 May, 2016 1 commit
  35. 10 May, 2016 1 commit
  36. 22 Apr, 2016 1 commit
    • Change hybrid transform function argument from TXFM_2D_CFG* to int · cf7f0069
      Yi Luo authored
        A unit test shows that hand-written SSE4.1 code would perform ~30%
        better if the TXFM_2D_CFG configuration is set at a lower level. This
        change only updates the function signature; there is no performance
        impact.
      
      Change-Id: I62692bd50a21ffc8a944bbd6c155c0a2020ad77b
  37. 14 Apr, 2016 1 commit
  38. 04 Apr, 2016 1 commit