1. 05 Oct, 2017 1 commit
  2. 02 Oct, 2017 3 commits
  3. 01 Oct, 2017 1 commit
  4. 28 Sep, 2017 1 commit
    • Remove dead av1_dct8x8_quant_xxxx functions · 7f7dd08a
      Monty Montgomery authored
      They're unused, disabled in the prototype setup, but still built and
      complicating the already convoluted ifdef mess in TX experiment
      configuration.
      
      Don't leave dead code in the sourcebase.  That's what SCM is for.
      
      Change-Id: Idb2adf597ac064c7b5027df8af1cf65054984aa4
      7f7dd08a
  5. 27 Sep, 2017 1 commit
  6. 20 Sep, 2017 1 commit
    • [intra-edge] Vectorize upsampling · ad0196b8
      Joe Young authored
      Add sse4_1 functions for Intra-edge experiment:
        av1_upsample_intra_edge_sse4_1()
        av1_upsample_intra_edge_high_sse4_1()
      
      Approx cycle reduction at qp 20, 1 kf:
        Enc:  0.5% to 0.3%
        Dec:  0.4% to 0.2%
      
      Change-Id: I97f0eee09b78218b418b484d80c338cec037f1b9
      ad0196b8
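For orientation, a scalar sketch of the kind of routine the sse4_1 functions vectorize. The commit does not state the interpolation kernel; the (-1, 9, 9, -1) / 16 taps, the end-sample replication, and the name `sketch_upsample_edge` are all assumptions of this sketch, not the libaom implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Doubles the resolution of an n-sample 8-bit edge into out[0 .. 2n-1].
 * Even outputs copy the input; odd outputs are interpolated half-sample
 * positions. The (-1, 9, 9, -1) / 16 kernel and the replicate-at-the-ends
 * padding are assumptions of this sketch, not taken from the commit. */
static void sketch_upsample_edge(const uint8_t *in, int n, uint8_t *out) {
  assert(n >= 1);
  for (int i = 0; i < n; ++i) {
    const int p0 = in[i > 0 ? i - 1 : 0];
    const int p1 = in[i];
    const int p2 = in[i + 1 < n ? i + 1 : n - 1];
    const int p3 = in[i + 2 < n ? i + 2 : n - 1];
    const int sum = -p0 + 9 * p1 + 9 * p2 - p3 + 8; /* +8 rounds the >> 4 */
    int v = sum < 0 ? 0 : sum >> 4;                 /* clip low before shifting */
    if (v > 255) v = 255;                           /* clip high to 8 bits */
    out[2 * i] = in[i];
    out[2 * i + 1] = (uint8_t)v;
  }
}
```

Each output depends only on a small fixed neighbourhood of inputs, which is what makes the loop a natural candidate for SIMD.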
  7. 16 Sep, 2017 2 commits
    • [intra-edge] Vectorize edge filtering functions · 89d321f7
      Joe Young authored
      Add sse4_1 functions for Intra-edge experiment:
        av1_filter_intra_edge_sse4_1()
        av1_filter_intra_edge_high_sse4_1()
      
      Approx cycle reduction at qp 20, 1 kf:
        Enc (lbd) 1.4% to 0.3%
        Dec (lbd) 0.4% to 0.1%
        Enc (hbd) 1.1% to 0.2%
        Dec (hbd) 0.6% to 0.1%
      
      No change to bitstream
      
      Change-Id: I176b2d125424d7d226114c807915c33dde5c3720
      89d321f7
    • Fix CMake mips32 build with DSPR2 enabled. · db724cf0
      Tom Finegan authored
      - Add aom_scale dspr2 sources to the correct target (aom).
      - Fix an inverted high bit depth condition.
      - Remove claims from av1_rtcd_defs.pl that the dspr2 variants
        av1_iht16x16_256_add_dspr2, av1_iht8x8_64_add_dspr2, and
        av1_iht4x4_16_add_dspr2 exist in low bit depth configs.
      
      Change-Id: Ibdd42e475b81c2491f02ba10ca0d461f7ff15bc5
      db724cf0
  8. 10 Sep, 2017 1 commit
  9. 18 Aug, 2017 1 commit
    • Remove dpcm-intra experiment · 400bf651
      Hui Su authored
      Coding gain becomes tiny on top of other experiments.
      
      Change-Id: Ia89b1c2a2653f3833dff8ac8bb612eaa3ba18446
      400bf651
  10. 15 Aug, 2017 2 commits
  11. 11 Aug, 2017 2 commits
    • tx64x64: Use C version for DCT/IDCT transform. · 900643be
      Urvang Joshi authored
      The SSE4 function does not support 64x64 size, and was triggering an
      assertion failure when lowbitdepth is disabled.
      
      BUG=aomedia:672
      
      Change-Id: Id14e76b5c180a211a84c2e933b07e8acf72dddbc
      900643be
    • Add experiment CONFIG_CDEF_SINGLEPASS: Make CDEF single pass · 5978212b
      Steinar Midtskogen authored
      Low latency, cpu-used=0:
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -0.3162 | -0.6719 | -0.6535 |   0.0089 | -0.3890 | -0.1515 |    -0.6682
      
      High latency, cpu-used=0:
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -0.0293 | -0.3556 | -0.5505 |   0.0684 | -0.0862 |  0.0513 |    -0.2765
      
      Low latency, cpu-used=4:
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -0.2248 | -0.7764 | -0.6630 |  -0.2109 | -0.3240 | -0.2532 |    -0.6980
      
      High latency, cpu-used=4:
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -0.1118 | -0.5841 | -0.7406 |  -0.0463 | -0.2442 | -0.1064 |    -0.4187
      
      Change-Id: I9ca8399c8f45489541a66f535fb3d771eb1d59ab
      5978212b
  12. 08 Aug, 2017 1 commit
    • Refactor quantization C code. · f3b5ee14
      Thomas Davies authored
      This commit de-duplicates C reference quantization code
      and unifies quantization matrix (QM) and non-QM code
      paths when there is no SIMD.
      
      The reorganisation will also facilitate re-using SIMD quant
      functions for QM when the matrix is flat, as is the
      default when AOM_QM is enabled.
      
      Change-Id: Idbfdac9eb9a31adcffe734aac1877d58b86fab77
      f3b5ee14
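The point about flat matrices can be illustrated with a minimal sketch. One routine serves both paths: a NULL matrix pointer means "no QM", and a matrix of flat weights produces identical output, which is why a non-QM SIMD kernel could be reused in that case. The weight precision (`SKETCH_QM_BITS`), the plain truncating division, and the function name are hypothetical and do not reproduce libaom's actual quantizer.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_QM_BITS 5 /* assumed weight precision; flat weight = 1 << 5 */

/* Quantizes n coefficients with a uniform step. qm may be NULL (no matrix)
 * or point to n per-coefficient weights scaled by 1 << SKETCH_QM_BITS. */
static void sketch_quantize(const int32_t *coeff, int n, int step,
                            const uint8_t *qm, int32_t *qcoeff) {
  assert(step > 0);
  for (int i = 0; i < n; ++i) {
    const int wt = qm ? qm[i] : (1 << SKETCH_QM_BITS);
    const int64_t mag = coeff[i] < 0 ? -(int64_t)coeff[i] : (int64_t)coeff[i];
    /* Weighted magnitude divided by the step, truncating toward zero. */
    const int32_t level =
        (int32_t)((mag * wt) / ((int64_t)step << SKETCH_QM_BITS));
    qcoeff[i] = coeff[i] < 0 ? -level : level;
  }
}
```

With every weight equal to `1 << SKETCH_QM_BITS`, the weight cancels against the shift in the divisor, so the QM and non-QM results coincide.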
  13. 04 Aug, 2017 1 commit
    • CDEF cleanup · 94de0aaa
      Steinar Midtskogen authored
      Name changes and code moves to bring code more in line with the
      design doc and an upcoming single-pass patch.  No functional changes.
      
      Change-Id: I2bccd58c644e534b139f420b623390aa971fbdb0
      94de0aaa
  14. 31 Jul, 2017 1 commit
    • Unified warp_affine and warp_affine_post_round · b6a31753
      Peter de Rivaz authored
      This patch removes the need for a separate warp_affine_post_round
      function by adding the functionality to the warp_affine function.
      
      The encoded output should remain unchanged, but the encoder/decoder
      should operate faster because the sse2 and ssse3 warp implementation
      can now be used when post_rounding is being used.
      
      Change-Id: Ide52cae55de59a9da9c27c5793e17390f6d2c03e
      b6a31753
  15. 24 Jul, 2017 1 commit
    • filter-intra: Support rectangular blocks. · 6a99691d
      Urvang Joshi authored
      - Use 'tx_size' in function signatures.
      - filter_intra_taps_3 and filter_intra_taps_4 updated to support
        TX_SIZES_ALL (thanks to yuec@)
      
      With these changes, filter-intra works correctly with rect-intra-pred.
      So, we remove the temporary workaround for this.
      
      Change-Id: Ide0f593419c21a74c08c61859f8dad918ca169fa
      6a99691d
  16. 17 Jul, 2017 1 commit
    • Unify FWD_TXFM_PARAM and INV_TXFM_PARAM · 27319b6e
      Lester Lu authored
      Change two similar structs, FWD_TXFM_PARAM and INV_TXFM_PARAM,
      into a common struct: TxfmParam. Its definition is moved to
      aom_dsp/txfm_common.h to simplify dependency.
      
      This change is made so that, in later changes of the LGT
      experiment, functions requiring FWD_TXFM_PARAM and
      INV_TXFM_PARAM, such as get_fwd_lgt4 and get_inv_lgt4, can
      also be unified.
      
      Change-Id: I756b0176a02314005060adbf8e62386f10eeb344
      27319b6e
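A minimal sketch of why a single parameter struct helps: the forward and inverse routines can share one type and one call-site setup. Only the two fields mentioned in this log (tx_type and bd) are modelled; the struct name `TxfmParamSketch`, the enum values, and the toy 2-point transforms are hypothetical and are not the real TxfmParam from aom_dsp/txfm_common.h.

```c
#include <assert.h>
#include <stdint.h>

enum { SKETCH_HADAMARD = 0, SKETCH_IDENTITY = 1 }; /* hypothetical tx types */

/* One struct for both directions, instead of separate fwd/inv variants. */
typedef struct {
  int tx_type; /* which toy transform to apply */
  int bd;      /* bit depth: 8, 10 or 12 */
} TxfmParamSketch;

/* Toy 2-point forward transform driven by the shared struct. */
static void sketch_fwd_txfm2(const int16_t *src, int32_t *coeff,
                             const TxfmParamSketch *param) {
  assert(param->bd == 8 || param->bd == 10 || param->bd == 12);
  if (param->tx_type == SKETCH_IDENTITY) {
    coeff[0] = 2 * src[0];
    coeff[1] = 2 * src[1];
  } else {
    coeff[0] = src[0] + src[1];
    coeff[1] = src[0] - src[1];
  }
}

/* Matching inverse; takes the same struct type as the forward path. */
static void sketch_inv_txfm2(const int32_t *coeff, int16_t *dst,
                             const TxfmParamSketch *param) {
  assert(param->bd == 8 || param->bd == 10 || param->bd == 12);
  if (param->tx_type == SKETCH_IDENTITY) {
    dst[0] = (int16_t)(coeff[0] / 2);
    dst[1] = (int16_t)(coeff[1] / 2);
  } else {
    dst[0] = (int16_t)((coeff[0] + coeff[1]) / 2);
    dst[1] = (int16_t)((coeff[0] - coeff[1]) / 2);
  }
}
```

A caller fills in one `TxfmParamSketch` and passes it to both directions, so helpers that previously needed a forward and an inverse variant only because of the parameter type can collapse into one.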
  17. 13 Jul, 2017 1 commit
    • Speed up convolve_round post-rounding by avx2 · 04cef497
      Yi Luo authored
      - Decoder convolve rounding cycle percentage drops from
        2.75% to 0.91% by using avx2 function on i7-6700.
      
      Change-Id: I34ae48f45c0b4073f8962647d2181365ffe3325b
      04cef497
  18. 07 Jul, 2017 1 commit
    • Signature changes for the LGT experiment · d8b1ddce
      Lester Lu authored
      The input arguments of av1_fht* and av1_iht* functions (and their
      HBD versions) are slightly changed. Input arguments tx_type and
      bd are carried by a struct fwd_txfm_param/inv_txfm_param. This
      struct is meant to later on carry other prediction information,
      such as intra top/left boundaries to the transform level, so
      that the choice of transforms can be more adaptive to the
      prediction mode and local video content.
      
      Change-Id: Ia42544248a51845be64b72855b642ef1fe5910a9
      d8b1ddce
  19. 28 Jun, 2017 1 commit
  20. 27 Jun, 2017 1 commit
    • Fix inv txfm low/high bitdepth selection logic · 51281095
      Yi Luo authored
      We are going to have several commits to set up new low/high
      bitdepth data path selection logic. This patch is for the inverse
      transform. Let me summarize the ideas as follows.
      
      - For low/high bitdepth selection, encoder depends on
        input configuration, e.g., video sequence bitdepth,
        profile. Decoder depends on input bitstream. This has
        nothing to do with compiler/build  configuration.
      
      - Typical encoder usage for sampling format 4:2:0.
        1) 8-bit video sequence:
         a) --profile=0
         Fastest encoding/decoding pipeline.
      
         b) --profile=2 --bit-depth=10
         Image pixels are left shifted by 2 bits. It
         employs 16-bit reference frame buffer and has high
         calculation precision. It usually enjoys higher
         compression performance.
      
        2) 10/12-bit video sequence (HDR):
         --profile=2 --bit-depth=10/12
      
      - Transform coefficient type:
        Lowbitdepth:  int16_t
        Highbitdepth: int32_t
      
      - The type tran_low_t is still used in the codebase. It is
        int32_t, which defines the data path capacity.
        Naturally, it is high bitdepth.
      
      Eventually we shall remove the configuration flags,
      CONFIG_HIGHBITDEPTH/CONFIG_LOWBITDEPTH, and separate the
      low and high bitdepth data paths. The two data paths will
      co-exist in the same build environment.
      
      Change-Id: I35c06d4d4f19ebf80d909168fdddbae57c3cc884
      51281095
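The selection idea above can be sketched as a runtime decision on the coding bit depth rather than a compile-time flag, together with the 2-bit left shift described for 8-bit input coded at 10 bits. Function names are hypothetical, and the sketch deliberately ignores profiles and the real libaom buffer types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 when the 16-bit (high bitdepth) data path is needed.
 * The decision is made from the stream/sequence bit depth at run time. */
static int sketch_use_highbd_path(int coding_bit_depth) {
  assert(coding_bit_depth == 8 || coding_bit_depth == 10 ||
         coding_bit_depth == 12);
  return coding_bit_depth > 8;
}

/* Copies 8-bit source samples into a 16-bit buffer, left-shifting them up
 * to the coding bit depth (a shift of 2 for 8-bit input coded at 10 bits). */
static void sketch_upshift_to_16bit(const uint8_t *src, uint16_t *dst,
                                    size_t n, int coding_bit_depth) {
  assert(coding_bit_depth >= 8 && coding_bit_depth <= 12);
  const int shift = coding_bit_depth - 8;
  for (size_t i = 0; i < n; ++i) dst[i] = (uint16_t)(src[i] << shift);
}
```

For example, an 8-bit sample of 255 becomes 1020 in the 10-bit path, leaving two extra low-order bits of precision for intermediate calculations.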
  21. 21 Jun, 2017 1 commit
  22. 20 Jun, 2017 3 commits
  23. 16 Jun, 2017 1 commit
  24. 13 Jun, 2017 1 commit
    • Add fast path quantizer AVX2 · 2d44b697
      Yi Luo authored
      - Function level improves 36% against sse2.
      - Encoder speeds up 2.6% at user level on i7-6700.
      
      Change-Id: I9e43ce60b1e0de8f532249e5c035851463d75dbb
      2d44b697
  25. 09 Jun, 2017 1 commit
  26. 08 Jun, 2017 1 commit
    • Remove deprecated high-bitdepth functions · 31c66502
      Sarah Parker authored
      This unifies the codepath for high-bitdepth transforms and deletes
      all calls to the old deprecated versions. This required reworking
      the way 1d configurations are combined in order to support rectangular
      transforms.
      
      There is one remaining codepath that calls the deprecated 4x4 hbd
      transform from encoder/encodemb.c. I need to take a closer look
      at what is happening there and will leave that for a followup
      since this change has already gotten so large.
      
      lowres 10 bit: -0.035%
      lowres 12 bit: 0.021%
      
      BUG=aomedia:524
      
      Change-Id: I34cdeaed2461ed7942364147cef10d7d21e3779c
      31c66502
  27. 07 Jun, 2017 1 commit
    • Add HBD data path for av1_block_error_avx2 · d61e608d
      Yi Luo authored
      - Add unit test for av1_block_error.
      - Fix av1_dist_block logic for calling av1_block_error.
      
      Change-Id: Id8a47ee113417360a29fc2334d9ca72b5793e2d7
      d61e608d
  28. 25 May, 2017 1 commit
    • Add HBD build to av1_quantize_fp_sse2 · bf8af7e6
      Yi Luo authored
      - This change turns on low bit depth data path for
        this function under default HBD build.
      - Encoder user level encoding time reduces ~12%
        on i7-6700.
      
      Change-Id: I7ce21e8db1a379f972e51c3b4ab305ca10e41efb
      bf8af7e6
  29. 20 May, 2017 1 commit
    • DPCM intra coding experiment · b8a6fd6b
      hui su authored
      Encode a block line by line, horizontally or vertically. In the vertical
      mode, each row is predicted by the reconstructed row above;
      in the horizontal mode, each column is predicted by the reconstructed
      column to the left.
      
      The DPCM modes are enabled automatically for blocks with horizontal or
      vertical prediction mode, and 1D transform types (ext-tx).
      
      Change-Id: I133ab6b537fa24a6e314ee1ef1d2fe9bd9d56c13
      b8a6fd6b
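The vertical mode described above can be sketched as a row-by-row loop in which the predictor for each row is the reconstruction of the previous one. This is a simplified illustration: the experiment pairs DPCM with 1D transform types, whereas the sketch omits the transform and uses a plain scalar quantizer, and all names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Vertical DPCM encode: each row is predicted from the reconstructed row
 * above it (the first row uses the caller-supplied `above` reference).
 * Writes quantized residual levels and the encoder-side reconstruction. */
static void sketch_dpcm_vert_encode(const int16_t *src, const int16_t *above,
                                    int rows, int cols, int step,
                                    int16_t *levels, int16_t *recon) {
  assert(step > 0);
  const int16_t *pred = above;
  for (int r = 0; r < rows; ++r) {
    for (int c = 0; c < cols; ++c) {
      const int resid = src[r * cols + c] - pred[c];
      /* Round-to-nearest quantization, symmetric around zero. */
      const int level = resid >= 0 ? (resid + step / 2) / step
                                   : -((-resid + step / 2) / step);
      levels[r * cols + c] = (int16_t)level;
      recon[r * cols + c] = (int16_t)(pred[c] + level * step);
    }
    pred = &recon[r * cols]; /* next row predicts from this reconstruction */
  }
}

/* Matching decode: rebuilds the same reconstruction from the levels. */
static void sketch_dpcm_vert_decode(const int16_t *levels,
                                    const int16_t *above, int rows, int cols,
                                    int step, int16_t *recon) {
  assert(step > 0);
  const int16_t *pred = above;
  for (int r = 0; r < rows; ++r) {
    for (int c = 0; c < cols; ++c)
      recon[r * cols + c] = (int16_t)(pred[c] + levels[r * cols + c] * step);
    pred = &recon[r * cols];
  }
}
```

Predicting from the reconstruction rather than the source keeps encoder and decoder in lockstep even when quantization is lossy. The horizontal mode is the same loop applied column by column.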
  30. 11 May, 2017 2 commits
    • Fix build with global motion disabled · ea166870
      Alex Converse authored
      Change-Id: I1c00925f83c6a858b0e799ddd90f241570a40575
      ea166870
    • Vectorize corner matching function · ee674323
      David Barker authored
      Add an SSE4 version of compute_cross_correlation() from
      corner_match.c. This function is about 3.4x the speed of
      the scalar code; determine_correspondence as a whole is about
      2.5-3x the speed it was previously.
      
      BUG=aomedia:487
      
      Change-Id: I707b7cfd5c513c025d3ee7fb6a5f1fa335ecd495
      ee674323
  31. 05 May, 2017 1 commit
  32. 04 May, 2017 1 commit
    • Add SSSE3 warp filter + const-ify warp filters · d8a423c6
      David Barker authored
      The SSSE3 filter is very similar to the SSE2 filter, but
      the horizontal pass is sped up by using the 8x8->16
      multiplies added in SSSE3.
      
      Also apply const-correctness to all versions of the filter.
      
      The timings of the existing filters are unchanged, and the
      lowbd SSSE3 filter is ~17% faster than the lowbd SSE2 filter.
      
      Timings per 8x8 block:
      lowbd SSE2: 320ns
      lowbd SSSE3: 273ns
      highbd SSSE3: 300ns
      
      Filter output is unchanged.
      
      Change-Id: Ifb428a33b106d900cde1b080794796c0754ae182
      d8a423c6