1. 11 Oct, 2017 1 commit
  2. 10 Oct, 2017 4 commits
    • Format clean-up av1_rtcd_defs.pl · 3ba27237
      Jingning Han authored
      Change-Id: I7a94cdef41e5e451247de939313feb58cd991e7f
      3ba27237
    • lgt-from-pred: transforms based on prediction · 432012f6
      Lester Lu authored
      In this experiment, sharp image discontinuity in the predicted
      block is detected. Based on this discontinuity, we choose
      particular LGTs as row and column transforms.
      
      Bitstream syntax, entropy coding, and RD search for LGT are added.
      One binary symbol is used to signal whether LGT is used. This
      experiment can work independently of the lgt experiment.
      
      lowres: -0.414% for key frames, -0.151% overall
      midres: -0.413% for key frames, -0.161% overall
      
      Change-Id: Iaa2f2c2839c34ca4134fa55e77870dc3f1fa879f
      432012f6
    • Add an SSE4.1 implementation of av1_highbd_convolve_2d_scale · 724d31eb
      Rupert Swarbrick authored
      For large blocks this is about 8x the speed of the C version. The code
      needs SSE 4.1 for the PMULLD instruction that we use to do SIMD 32-bit
      multiplies.
      
      The patch uses av1_convolve_scale_test (written already to test the
      low bit depth path) to make sure the optimised code matches the C
      version.
      
      Change-Id: I9304d6bb3d2cb31390de93ed08ff1a852e3ace86
      724d31eb
    • Add an SSE4.1 implementation of av1_convolve_2d_scale · 98dc22b8
      Rupert Swarbrick authored
      For large blocks this is almost 8x the speed of the C version. The
      code needs SSE 4.1 for the PMULLD instruction that we use to do SIMD
      32-bit multiplies.
      
      This patch also makes av1_convolve_scale_test actually test something,
      making sure the optimised code matches the C version. The slightly
      excessive generality in the test (all the templating) is because of a
      following patch, which is for the high bit depth path and can then use
      most of the same test code.
      
      Change-Id: I6732bc6b2378ffaadae5aa6441100cf660f7ee11
      98dc22b8
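
The two SSE4.1 commits above rely on PMULLD for packed 32-bit multiplies and on a test that checks the optimised code against the C version. A minimal sketch of that pattern follows. It is not the libaom code: the function names mul32_c, mul32_simd, and mul32_outputs_match are hypothetical, and the SIMD path is only compiled when the compiler defines __SSE4_1__ (otherwise the scalar loop does all the work).

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE4_1__)
#include <smmintrin.h> /* SSE4.1 intrinsics, including _mm_mullo_epi32 */
#endif

/* Scalar reference: element-wise product, keeping the low 32 bits. */
void mul32_c(const int32_t *a, const int32_t *b, int32_t *out, int n) {
  assert(n >= 0);
  for (int i = 0; i < n; ++i) {
    out[i] = (int32_t)((uint32_t)a[i] * (uint32_t)b[i]);
  }
}

/* Vector version: four 32-bit multiplies per PMULLD (_mm_mullo_epi32)
 * when SSE4.1 is enabled at compile time. The scalar loop handles the
 * tail, or the whole array when SSE4.1 is not enabled. */
void mul32_simd(const int32_t *a, const int32_t *b, int32_t *out, int n) {
  int i = 0;
  assert(n >= 0);
#if defined(__SSE4_1__)
  for (; i + 4 <= n; i += 4) {
    const __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
    const __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
    _mm_storeu_si128((__m128i *)(out + i), _mm_mullo_epi32(va, vb));
  }
#endif
  mul32_c(a + i, b + i, out + i, n - i);
}

/* Test helper (hypothetical): returns 1 when both versions agree on
 * every element, 0 otherwise. Handles at most 64 elements. */
int mul32_outputs_match(const int32_t *a, const int32_t *b, int n) {
  int32_t ref[64];
  int32_t out[64];
  assert(n >= 0 && n <= 64);
  mul32_c(a, b, ref, n);
  mul32_simd(a, b, out, n);
  for (int i = 0; i < n; ++i) {
    if (ref[i] != out[i]) return 0;
  }
  return 1;
}
```

Calling mul32_outputs_match on a length that is not a multiple of four exercises both the vector loop and the scalar tail.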
  3. 05 Oct, 2017 1 commit
  4. 02 Oct, 2017 3 commits
  5. 01 Oct, 2017 1 commit
  6. 28 Sep, 2017 1 commit
    • Remove dead av1_dct8x8_quant_xxxx functions · 7f7dd08a
      Monty Montgomery authored
      They're unused, disabled in the prototype setup, but still built and
      complicating the already convoluted ifdef mess in TX experiment
      configuration.
      
      Don't leave dead code in the sourcebase.  That's what SCM is for.
      
      Change-Id: Idb2adf597ac064c7b5027df8af1cf65054984aa4
      7f7dd08a
  7. 27 Sep, 2017 1 commit
  8. 20 Sep, 2017 1 commit
    • [intra-edge] Vectorize upsampling · ad0196b8
      Joe Young authored
      Add sse4_1 functions for Intra-edge experiment:
        av1_upsample_intra_edge_sse4_1()
        av1_upsample_intra_edge_high_sse4_1()
      
      Approx cycle reduction at qp 20, 1 kf:
        Enc:  0.5% to 0.3%
        Dec:  0.4% to 0.2%
      
      Change-Id: I97f0eee09b78218b418b484d80c338cec037f1b9
      ad0196b8
  9. 16 Sep, 2017 2 commits
    • [intra-edge] Vectorize edge filtering functions · 89d321f7
      Joe Young authored
      Add sse4_1 functions for Intra-edge experiment:
        av1_filter_intra_edge_sse4_1()
        av1_filter_intra_edge_high_sse4_1()
      
      Approx cycle reduction at qp 20, 1 kf:
        Enc (lbd) 1.4% to 0.3%
        Dec (lbd) 0.4% to 0.1%
        Enc (hbd) 1.1% to 0.2%
        Dec (hbd) 0.6% to 0.1%
      
      No change to bitstream
      
      Change-Id: I176b2d125424d7d226114c807915c33dde5c3720
      89d321f7
    • Fix CMake mips32 build with DSPR2 enabled. · db724cf0
      Tom Finegan authored
      - Add aom_scale dspr2 sources to the correct target (aom).
      - Fix an inverted high bit depth condition.
      - Remove claims from av1_rtcd_defs.pl that the dspr2 functions
        av1_iht16x16_256_add_dspr2, av1_iht8x8_64_add_dspr2, and
        av1_iht4x4_16_add_dspr2 exist in low bit depth configs.
      
      Change-Id: Ibdd42e475b81c2491f02ba10ca0d461f7ff15bc5
      db724cf0
  10. 10 Sep, 2017 1 commit
  11. 18 Aug, 2017 1 commit
    • Remove dpcm-intra experiment · 400bf651
      Hui Su authored
      Coding gain becomes tiny on top of other experiments.
      
      Change-Id: Ia89b1c2a2653f3833dff8ac8bb612eaa3ba18446
      400bf651
  12. 15 Aug, 2017 2 commits
  13. 11 Aug, 2017 2 commits
    • tx64x64: Use C version for DCT/IDCT transform. · 900643be
      Urvang Joshi authored
      The SSE4 function does not support 64x64 size, and was triggering an
      assertion failure when lowbitdepth is disabled.
      
      BUG=aomedia:672
      
      Change-Id: Id14e76b5c180a211a84c2e933b07e8acf72dddbc
      900643be
    • Add experiment CONFIG_CDEF_SINGLEPASS: Make CDEF single pass · 5978212b
      Steinar Midtskogen authored
      Low latency, cpu-used=0:
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -0.3162 | -0.6719 | -0.6535 |   0.0089 | -0.3890 | -0.1515 |    -0.6682
      
      High latency, cpu-used=0:
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -0.0293 | -0.3556 | -0.5505 |   0.0684 | -0.0862 |  0.0513 |    -0.2765
      
      Low latency, cpu-used=4:
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -0.2248 | -0.7764 | -0.6630 |  -0.2109 | -0.3240 | -0.2532 |    -0.6980
      
      High latency, cpu-used=4:
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -0.1118 | -0.5841 | -0.7406 |  -0.0463 | -0.2442 | -0.1064 |    -0.4187
      
      Change-Id: I9ca8399c8f45489541a66f535fb3d771eb1d59ab
      5978212b
  14. 08 Aug, 2017 1 commit
    • Refactor quantization C code. · f3b5ee14
      Thomas Davies authored
      This commit de-duplicates C reference quantization code
      and unifies quantization matrix (QM) and non-QM code
      paths when there is no SIMD.
      
      The reorganisation will also facilitate re-using SIMD quant
      functions for QM when the matrix is flat, as is the
      default when AOM_QM is enabled.
      
      Change-Id: Idbfdac9eb9a31adcffe734aac1877d58b86fab77
      f3b5ee14
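
The idea in the commit above, that a flat quantization matrix lets the QM and non-QM paths share code, can be illustrated with a toy quantizer. This is a sketch under stated assumptions and not the libaom implementation: quantize_sketch, QM_BITS_SKETCH, and the weight scale (32 meaning unity) are all hypothetical, and the rounding is simplified to truncation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical weight precision: a weight of (1 << 5) = 32 is unity. */
#define QM_BITS_SKETCH 5

/* Toy quantizer with a single code path for both cases:
 *   qm == NULL -> every coefficient uses the unity weight (non-QM)
 *   qm != NULL -> coefficient i is scaled by qm[i] first (QM)
 * A flat matrix of unity weights therefore gives the same output as
 * the non-QM case. */
void quantize_sketch(const int32_t *coeff, int n, int32_t step,
                     const uint8_t *qm, int32_t *qcoeff) {
  assert(n >= 0);
  assert(step > 0);
  for (int i = 0; i < n; ++i) {
    const int64_t weight = qm != NULL ? qm[i] : (1 << QM_BITS_SKETCH);
    const int64_t abs_coeff =
        coeff[i] < 0 ? -(int64_t)coeff[i] : (int64_t)coeff[i];
    /* Apply the weight, drop the weight precision, then divide by the
     * quantizer step (truncating). */
    const int64_t q = ((abs_coeff * weight) >> QM_BITS_SKETCH) / step;
    qcoeff[i] = (int32_t)(coeff[i] < 0 ? -q : q);
  }
}
```

Passing NULL and passing a matrix filled with 32 produce identical results here, which is the property that would let a non-QM fast path serve the flat-matrix case.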
  15. 04 Aug, 2017 1 commit
    • CDEF cleanup · 94de0aaa
      Steinar Midtskogen authored
      Name changes and code moves to bring code more in line with the
      design doc and an upcoming single-pass patch.  No functional changes.
      
      Change-Id: I2bccd58c644e534b139f420b623390aa971fbdb0
      94de0aaa
  16. 31 Jul, 2017 1 commit
    • Unified warp_affine and warp_affine_post_round · b6a31753
      Peter de Rivaz authored
      This patch removes the need for a separate warp_affine_post_round
      function by adding the functionality to the warp_affine function.
      
      The encoded output should remain unchanged, but the encoder/decoder
      should operate faster because the sse2 and ssse3 warp implementations
      can now be used when post_rounding is being used.
      
      Change-Id: Ide52cae55de59a9da9c27c5793e17390f6d2c03e
      b6a31753
  17. 24 Jul, 2017 1 commit
    • filter-intra: Support rectangular blocks. · 6a99691d
      Urvang Joshi authored
      - Use 'tx_size' in function signatures.
      - filter_intra_taps_3 and filter_intra_taps_4 updated to support
        TX_SIZES_ALL (thanks to yuec@)
      
      With these changes, filter-intra works correctly with rect-intra-pred.
      So, we remove the temporary workaround for this.
      
      Change-Id: Ide0f593419c21a74c08c61859f8dad918ca169fa
      6a99691d
  18. 17 Jul, 2017 1 commit
    • Unify FWD_TXFM_PARAM and INV_TXFM_PARAM · 27319b6e
      Lester Lu authored
      Change two similar structs, FWD_TXFM_PARAM and INV_TXFM_PARAM,
      into a common struct: TxfmParam. Its definition is moved to
      aom_dsp/txfm_common.h to simplify dependency.
      
      This change is made so that, in later changes of the LGT
      experiment, functions requiring FWD_TXFM_PARAM and
      INV_TXFM_PARAM, such as get_fwd_lgt4 and get_inv_lgt4, can
      also be unified.
      
      Change-Id: I756b0176a02314005060adbf8e62386f10eeb344
      27319b6e
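
A minimal sketch of what sharing one parameter struct between forward and inverse transforms might look like. Only the field names tx_type and bd come from the commit messages in this log. The struct name TxfmParamSketch, both functions, and the placeholder "transform" (a scale by 2) are hypothetical and stand in for the real transforms.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the unified parameter struct. The real
 * struct may carry more fields than these two. */
typedef struct {
  int tx_type; /* transform type selector */
  int bd;      /* bit depth of the pixel data */
} TxfmParamSketch;

/* Placeholder forward "transform": scales the residual by 2. */
void fwd_txfm_sketch(const int16_t *residual, int32_t *coeff, int n,
                     const TxfmParamSketch *param) {
  assert(param->tx_type == 0); /* only one toy transform type exists */
  for (int i = 0; i < n; ++i) {
    coeff[i] = 2 * (int32_t)residual[i];
  }
}

/* Placeholder inverse: undoes the scaling, adds the result to the
 * prediction in dest, and clamps to the range implied by param->bd. */
void inv_txfm_add_sketch(const int32_t *coeff, uint16_t *dest, int n,
                         const TxfmParamSketch *param) {
  const int32_t max_val = (1 << param->bd) - 1;
  assert(param->tx_type == 0);
  for (int i = 0; i < n; ++i) {
    int32_t v = (int32_t)dest[i] + coeff[i] / 2;
    if (v < 0) v = 0;
    if (v > max_val) v = max_val;
    dest[i] = (uint16_t)v;
  }
}
```

Because both directions take the same struct type, a caller can build the parameters once and pass them to either function.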
  19. 13 Jul, 2017 1 commit
    • Speed up convolve_round post-rounding by avx2 · 04cef497
      Yi Luo authored
      - Decoder convolve rounding cycle percentage drops from
        2.75% to 0.91% by using avx2 function on i7-6700.
      
      Change-Id: I34ae48f45c0b4073f8962647d2181365ffe3325b
      04cef497
  20. 07 Jul, 2017 1 commit
    • Signature changes for the LGT experiment · d8b1ddce
      Lester Lu authored
      The input arguments of av1_fht* and av1_iht* functions (and their
      HBD versions) are slightly changed. Input arguments tx_type and
      bd are carried by a struct fwd_txfm_param/inv_txfm_param. This
      struct is meant to later on carry other prediction information,
      such as intra top/left boundaries to the transform level, so
      that the choice of transforms can be more adaptive to the
      prediction mode and local video content.
      
      Change-Id: Ia42544248a51845be64b72855b642ef1fe5910a9
      d8b1ddce
  21. 28 Jun, 2017 1 commit
  22. 27 Jun, 2017 1 commit
    • Fix inv txfm low/high bitdepth selection logic · 51281095
      Yi Luo authored
      We are going to have several commits to set up new low/high
      bitdepth data path selection logic. This patch is for the inverse
      transform. Let me summarize the ideas as follows.
      
      - For low/high bitdepth selection, the encoder depends on
        input configuration, e.g., video sequence bitdepth and
        profile. The decoder depends on the input bitstream. This has
        nothing to do with compiler/build configuration.
      
      - Typical encoder usage for sampling format 4:2:0.
        1) 8-bit video sequence:
         a) --profile=0
         Fastest encoding/decoding pipeline.

         b) --profile=2 --bit-depth=10
         Image pixels are left shifted by 2 bits. It
         employs a 16-bit reference frame buffer and has high
         calculation precision. It usually enjoys higher
         compression performance.
      
        2) 10/12-bit video sequence (HDR):
         --profile=2 --bit-depth=10/12
      
      - Transform coefficient type:
        Lowbitdepth:  int16_t
        Highbitdepth: int32_t
      
      - The type tran_low_t is still used in the codebase.
        It is int32_t, which defines the data path capacity.
        Naturally, it is high bitdepth.
      
      Eventually we shall remove the configuration flags,
      CONFIG_HIGHBITDEPTH/CONFIG_LOWBITDEPTH, and separate the
      low and high bitdepth data paths. The two data paths will
      co-exist in the same build environment.
      
      Change-Id: I35c06d4d4f19ebf80d909168fdddbae57c3cc884
      51281095
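
The selection logic summarized in the commit above (choose the data path from the stream's bit depth at run time, not from a build flag) can be sketched as follows. All names here are hypothetical (DataPath, select_data_path, upshift_8_to_10, coeff_size). Only the facts taken from the commit message are assumed: 8-bit pixels are left shifted by 2 bits for a 10-bit pipeline with a 16-bit buffer, and coefficients are int16_t on the low bitdepth path and int32_t on the high bitdepth path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tag for the two data paths. */
typedef enum { PATH_LOWBD, PATH_HIGHBD } DataPath;

/* Pick the data path from the sequence bit depth at run time. */
DataPath select_data_path(int bit_depth) {
  assert(bit_depth == 8 || bit_depth == 10 || bit_depth == 12);
  return bit_depth > 8 ? PATH_HIGHBD : PATH_LOWBD;
}

/* Promote 8-bit samples into a 16-bit buffer for a 10-bit pipeline by
 * shifting each pixel left by 2 bits. */
void upshift_8_to_10(const uint8_t *src, uint16_t *dst, int n) {
  assert(n >= 0);
  for (int i = 0; i < n; ++i) {
    dst[i] = (uint16_t)(src[i] << 2);
  }
}

/* Size in bytes of one transform coefficient on each path:
 * int16_t for low bitdepth, int32_t for high bitdepth. */
size_t coeff_size(DataPath path) {
  return path == PATH_HIGHBD ? sizeof(int32_t) : sizeof(int16_t);
}
```

In this sketch an 8-bit source encoded with --profile=2 --bit-depth=10 would pass 10 to select_data_path and run upshift_8_to_10 on its input first.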
  23. 21 Jun, 2017 1 commit
  24. 20 Jun, 2017 3 commits
  25. 16 Jun, 2017 1 commit
  26. 13 Jun, 2017 1 commit
    • Add fast path quantizer AVX2 · 2d44b697
      Yi Luo authored
      - Function-level speed improves 36% over sse2.
      - Encoder speeds up 2.6% at user level on i7-6700.
      
      Change-Id: I9e43ce60b1e0de8f532249e5c035851463d75dbb
      2d44b697
  27. 09 Jun, 2017 1 commit
  28. 08 Jun, 2017 1 commit
    • Remove deprecated high-bitdepth functions · 31c66502
      Sarah Parker authored
      This unifies the codepath for high-bitdepth transforms and deletes
      all calls to the old deprecated versions. This required reworking
      the way 1d configurations are combined in order to support rectangular
      transforms.
      
      There is one remaining codepath that calls the deprecated 4x4 hbd
      transform from encoder/encodemb.c. I need to take a closer look
      at what is happening there and will leave that for a followup
      since this change has already gotten so large.
      
      lowres 10 bit: -0.035%
      lowres 12 bit: 0.021%
      
      BUG=aomedia:524
      
      Change-Id: I34cdeaed2461ed7942364147cef10d7d21e3779c
      31c66502
  29. 07 Jun, 2017 1 commit
    • Add HBD data path for av1_block_error_avx2 · d61e608d
      Yi Luo authored
      - Add unit test for av1_block_error.
      - Fix av1_dist_block logic for calling av1_block_error.
      
      Change-Id: Id8a47ee113417360a29fc2334d9ca72b5793e2d7
      d61e608d
  30. 25 May, 2017 1 commit
    • Add HBD build to av1_quantize_fp_sse2 · bf8af7e6
      Yi Luo authored
      - This change turns on the low bit depth data path for
        this function under the default HBD build.
      - Encoder user-level encoding time drops by ~12%
        on i7-6700.
      
      Change-Id: I7ce21e8db1a379f972e51c3b4ab305ca10e41efb
      bf8af7e6