  1. 12 Jul, 2017 1 commit
    • Add CONFIG_DAALA_DCT4 experiment. · 02078a38
      Monty Montgomery authored
      This experiment replaces the 4-point Type-II scaled-output vp9 DCT
       transform with the 4-point Type-II orthonormal Daala DCT transform.
      Right now the CONFIG_DAALA_DCT4 experiment depends on CONFIG_DCT_ONLY
       as it does not add an orthonormal 4-point DST.
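
      To illustrate what "orthonormal" means for a 4-point Type-II DCT, here
      is a minimal floating-point sketch. It is not the Daala implementation,
      which uses integer arithmetic; the function names are hypothetical and
      chosen only for this example.

```python
import math

N = 4  # transform size


def orthonormal_dct2_matrix(n=N):
    """Return the n x n orthonormal Type-II DCT matrix (floating point)."""
    rows = []
    for k in range(n):
        # The DC row gets sqrt(1/n), all other rows sqrt(2/n); this scaling
        # is what makes the rows unit length and mutually orthogonal.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * j + 1) * k / (2 * n))
                     for j in range(n)])
    return rows


def forward(matrix, x):
    """Apply the transform: y = C x."""
    return [sum(c * v for c, v in zip(row, x)) for row in matrix]


def inverse(matrix, y):
    """For an orthonormal matrix the inverse is the transpose: x = C^T y."""
    n = len(matrix)
    return [sum(matrix[k][j] * y[k] for k in range(n)) for j in range(n)]


C = orthonormal_dct2_matrix()
x = [10.0, 20.0, 30.0, 40.0]
y = forward(C, x)
x_rec = inverse(C, y)
```

      With this scaling the transform preserves signal energy and is inverted
      by its transpose, whereas a scaled-output transform needs a
      compensating shift or multiply elsewhere in the pipeline.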
      
      subset-1:
      
      monty-baseline-dctonly-squaretx-subset1 ->
        monty-dct4-dctonly-squaretx-subset1-rerun
      
        PSNR | PSNR Cb | PSNR Cr | PSNR HVS |   SSIM | MS SSIM | CIEDE 2000
      0.0055 | -0.0132 | -0.0405 |   0.0261 | 0.0005 |  0.0246 |     0.0226
      
      objective-1-fast:
      
      monty-baseline-dctonly-squaretx-o1f ->
        monty-dct4-dctonly-squaretx-o1f
      
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -0.0215 | -0.1573 |     N/A |  -0.0131 | -0.0347 | -0.0390 |    -0.1121
      
      Change-Id: Idef8f6e5525037d5bbb2d0927675c21d1922d69a
  2. 27 Jun, 2017 1 commit
    • Fix inv txfm low/high bitdepth selection logic · 51281095
      Yi Luo authored
      Several commits will set up the new low/high bitdepth data path
      selection logic. This patch covers the inverse transform. The
      ideas are summarized as follows.
      
      - For low/high bitdepth selection, the encoder depends on the
        input configuration, e.g., video sequence bitdepth and
        profile. The decoder depends on the input bitstream. Neither
        depends on the compiler/build configuration.
      
      - Typical encoder usage for sampling format 4:2:0.
        1) 8-bit video sequence:
         a) --profile=0
         Fastest encoding/decoding pipeline.
      
         b) --profile=2 --bit-depth=10
         Image pixels are left-shifted by 2 bits. This mode
         uses a 16-bit reference frame buffer and higher
         calculation precision, and it usually achieves better
         compression performance.
      
        2) 10/12-bit video sequence (HDR):
         --profile=2 --bit-depth=10/12
      
      - Transform coefficient type:
        Lowbitdepth:  int16_t
        Highbitdepth: int32_t
      
      - The type tran_low_t, which is int32_t, is still used in the
        codebase and defines the data path capacity. It is therefore
        the high bitdepth type.
      
      Eventually we shall remove the configuration flags
      CONFIG_HIGHBITDEPTH/CONFIG_LOWBITDEPTH and separate the
      low and high bitdepth data paths, so that the two data paths
      co-exist in the same build.
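
      The selection rule described above can be sketched as a runtime
      decision on the stream's bit depth. This is only an illustration: the
      function names and the returned dictionary are hypothetical and do not
      correspond to the codec's API.

```python
def select_data_path(bit_depth):
    """Pick the data path from the stream's bit depth, not from a build flag."""
    if bit_depth == 8:
        # Low bitdepth path: coefficients fit in int16_t.
        return {"path": "lowbd", "coeff_bits": 16}
    if bit_depth in (10, 12):
        # High bitdepth path: coefficients are int32_t.
        return {"path": "highbd", "coeff_bits": 32}
    raise ValueError(f"unsupported bit depth: {bit_depth}")


def upshift_pixels(pixels, src_bits=8, dst_bits=10):
    """Left-shift source pixels, as when 8-bit video is coded at 10 bits."""
    shift = dst_bits - src_bits
    return [p << shift for p in pixels]


# 8-bit source coded with --profile=2 --bit-depth=10: pixels are shifted by 2
# bits and the high bitdepth path is used.
shifted = upshift_pixels([0, 1, 255])
path = select_data_path(10)
```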
      
      Change-Id: I35c06d4d4f19ebf80d909168fdddbae57c3cc884
  3. 09 Jun, 2017 1 commit
  4. 25 May, 2017 1 commit
  5. 18 May, 2017 1 commit
    • Refactor hbd txfm configurations to be 1D · eec47e65
      Sarah Parker authored
      The hbd transform configurations were originally written for all possible
      2d transforms. Now that there are many more possible 2d transforms
      due to EXT_TX and RECT_TX, it is simpler to write the cfg for the
      4 1D transform types and compose them to make all new possible transform
      types. This will allow for an easier integration of the identity transform
      for EXT_TX and rectangular transforms for RECT_TX into the current
      hbd transform codepath and facilitate the removal of obsolete transforms.
      This has no impact on performance.
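
      The idea of composing 2D configurations from 1D ones can be sketched
      as follows. The four type names and the dictionary layout are
      assumptions made for this example; the commit message does not list
      them.

```python
from itertools import product

# Hypothetical names for the four 1D transform types (not taken from the commit).
TX_1D_TYPES = ("DCT", "ADST", "FLIPADST", "IDTX")


def cfg_1d(tx_type, size):
    """One 1D configuration: a transform type at a given length."""
    return {"type": tx_type, "size": size}


def cfg_2d(col_type, row_type, width, height):
    """Compose a 2D configuration from a column cfg and a row cfg."""
    return {
        "col": cfg_1d(col_type, height),  # column transforms span the block height
        "row": cfg_1d(row_type, width),   # row transforms span the block width
    }


# Every square 8x8 combination comes from the same four 1D building blocks.
square_8x8 = {(c, r): cfg_2d(c, r, 8, 8)
              for c, r in product(TX_1D_TYPES, repeat=2)}

# A rectangular transform only needs different lengths per direction.
rect_4x8 = cfg_2d("ADST", "DCT", width=4, height=8)
```

      Four 1D entries per size yield 16 2D combinations, so adding a new 1D
      type or a rectangular size does not require writing each 2D
      configuration by hand.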
      
      BUG=aomedia:524
      
      Change-Id: I1e217bcd217fd637b1df94fae62d9c59a0523c1a
  6. 04 May, 2017 1 commit
    • Add SSSE3 warp filter + const-ify warp filters · d8a423c6
      David Barker authored
      The SSSE3 filter is very similar to the SSE2 filter, but
      the horizontal pass is sped up by using the 8x8->16
      multiplies added in SSSE3.
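
      The commit does not name the instruction, but the 8x8->16 multiply
      introduced with SSSE3 is presumably PMADDUBSW (_mm_maddubs_epi16). A
      scalar model of its semantics, written here as a hypothetical helper,
      shows why it suits a horizontal filter pass: each 16-bit output is
      already the sum of two pixel-times-coefficient products.

```python
def maddubs_epi16(a_u8, b_s8):
    """Scalar model of PMADDUBSW.

    a_u8 holds unsigned 8-bit values (e.g. pixels), b_s8 holds signed 8-bit
    values (e.g. filter taps). Adjacent products are summed in pairs and the
    sum is saturated to the signed 16-bit range.
    """
    if len(a_u8) != len(b_s8) or len(a_u8) % 2 != 0:
        raise ValueError("inputs must have the same even length")
    out = []
    for i in range(0, len(a_u8), 2):
        pair_sum = a_u8[i] * b_s8[i] + a_u8[i + 1] * b_s8[i + 1]
        out.append(max(-32768, min(32767, pair_sum)))
    return out


# Two output lanes from four pixel/tap pairs.
lanes = maddubs_epi16([1, 2, 3, 4], [1, -1, 2, 2])
# Large inputs saturate instead of wrapping around.
saturated = maddubs_epi16([255, 255], [127, 127])
```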
      
      Also apply const-correctness to all versions of the filter.
      
      The timings of the existing filters are unchanged, and the
      lowbd SSSE3 filter is ~17% faster than the lowbd SSE2 filter.
      
      Timings per 8x8 block:
      lowbd SSE2: 320ns
      lowbd SSSE3: 273ns
      highbd SSSE3: 300ns
      
      Filter output is unchanged.
      
      Change-Id: Ifb428a33b106d900cde1b080794796c0754ae182
  7. 24 Apr, 2017 1 commit
    • [CFL] Custom block-level DC_PRED · f8164157
      Luc Trudeau authored
      Adds the CfL experiment flag and computes a block-level DC_PRED that is
      required by CfL in order to compute alpha_cb and alpha_cr.
      
      The rate-distortion impact of computing DC_PRED at the prediction block level
      for chroma planes is rather small.
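
      For readers unfamiliar with DC_PRED, a minimal sketch of the
      conventional definition follows: the prediction is the rounded mean of
      the reconstructed neighbours above and to the left of the block. The
      helper name is hypothetical, and the sketch does not show how CfL
      derives alpha_cb and alpha_cr from it.

```python
def dc_pred(above, left):
    """Rounded integer mean of the neighbouring pixels above and left of a block."""
    neighbours = list(above) + list(left)
    n = len(neighbours)
    if n == 0:
        raise ValueError("at least one neighbour is required")
    # Adding n // 2 before the integer division rounds to nearest.
    return (sum(neighbours) + n // 2) // n


# A 4x4 chroma prediction block with flat neighbours on each side.
dc_4x4 = dc_pred(above=[100, 100, 100, 100], left=[104, 104, 104, 104])
```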
      
      Subset 1:
      master_no_cdef@2017-04-18T20:37:05.712Z
        -> block_DCPRED_no_cdef@2017-04-18T20:38:07.381
        PSNR | PSNR Cb | PSNR Cr | PSNR HVS |   SSIM | MS SSIM | CIEDE 2000
      0.0712 |  0.0337 | -0.1692 |   0.0693 | 0.0814 |  0.0710 |    -0.0063
      Note: CDEF was disabled because of problematic asserts.
      
      Change-Id: I44d1cde8605b108366f4bd4cedbf5159dbbb5880
  8. 21 Apr, 2017 1 commit
  9. 12 Apr, 2017 1 commit
  10. 10 Apr, 2017 1 commit
    • frame-superres: Move resize from encoder to common · d0565006
      Fergus Simpson authored
      The resizing functions in resize.h and resize.c are useful for the
      frame super-res experiment. These functions will be needed in both the
      encoder and decoder, so the files have been moved into av1/common.
      
      Change-Id: I66154b7ec0eade0df460c4f4cf8eaa5f663c8904
  11. 06 Apr, 2017 1 commit
  12. 05 Apr, 2017 1 commit
    • CDEF: Add damping to dering · 8ff52fcc
      Steinar Midtskogen authored
      high-latency, cpu-used=0:
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -0.1650 |  0.2545 |  0.2977 |  -0.0423 | -0.0947 | -0.0725 |    -0.0365
      
      low-latency, cpu-used=0:
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -0.4006 |  0.0501 | -0.0108 |  -0.1790 | -0.1660 | -0.1992 |    -0.2135
      
      low-latency, cpu-used=4:
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -0.5508 | -0.2445 | -0.2762 |  -0.1981 | -0.2878 | -0.2228 |    -0.3733
      
      Change-Id: Ia20df28c8bbb6182215b02016053af33bd498145
  13. 01 Apr, 2017 1 commit
  14. 29 Mar, 2017 1 commit
  15. 22 Mar, 2017 1 commit
  16. 17 Mar, 2017 1 commit
    • Merge dering/clpf rdo and filtering · a9d41e88
      Steinar Midtskogen authored
      * Dering and clpf were merged into a single pass.
      * 32x32 and 128x128 filter block sizes for clpf were removed.
      * RDO for dering and clpf merged and improved:
        - "0" no longer required to be in the strength selection
        - Dering strength can now be 0, 1 or 2 bits per block
      
                    LL    HL
      PSNR:       -0.04 -0.01
      PSNR HVS:   -0.27 -0.18
      SSIM:       -0.15 +0.01
      CIEDE 2000: -0.11 -0.03
      APSNR:      -0.03 -0.00
      MS SSIM:    -0.18 -0.11
      
      Change-Id: I9f002a16ad218eab6007f90f1f176232443495f0
  17. 06 Mar, 2017 1 commit
    • Vectorize self-guided filter · ce110cc5
      David Barker authored
      Add an SSE4.1 lowbd version of the self-guided filter for
      loop-restoration, and apply some optimizations to the C
      version.
      
      Approximate times per 128x128 / 256x256 tile on the machine
      this was developed on:
      Previous C:  620us / 2800us
      Optimized C: 500us / 2200us ( 24% /  27% faster)
      SSE4.1:      147us / 600us  (320% / 370% faster)
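
      The "% faster" figures follow from the ratio of old to new time. A
      small sketch of that arithmetic, using only the timings quoted above
      (the helper names are hypothetical):

```python
# Times in microseconds per 128x128 and 256x256 tile, copied from the table above.
TIMINGS_US = {
    "previous_c": (620, 2800),
    "optimized_c": (500, 2200),
    "sse4_1": (147, 600),
}


def percent_faster(old_us, new_us):
    """Speedup expressed as 'percent faster': (old / new - 1) * 100."""
    return (old_us / new_us - 1.0) * 100.0


def report(baseline="previous_c"):
    """Percent-faster figures for every variant relative to the baseline."""
    base = TIMINGS_US[baseline]
    return {
        name: tuple(round(percent_faster(b, t)) for b, t in zip(base, times))
        for name, times in TIMINGS_US.items()
        if name != baseline
    }


speedups = report()
```

      This gives 24% / 27% for the optimized C version and about
      322% / 367% for SSE4.1, consistent with the rounded 320% / 370%
      quoted above.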
      
      Change-Id: I23ff5a5482a191aeb06f9d1f767a9f036bb357fe
  18. 01 Mar, 2017 2 commits
  19. 28 Feb, 2017 1 commit
    • Add SIMD code for PVQ search · 3a88de8f
      Michael Bebenita authored
      This reduces the share of runtime spent in pvq_search_rdo_double from 37%
      to 15% and improves overall encoding speed when PVQ is enabled by ~40%.
      The SIMD code is not bit accurate with the C version and introduces a
      slight PSNR regression on AWCY:
      
        PSNR | PSNR Cb | PSNR Cr | PSNR HVS | SSIM | MS SSIM | CIEDE 2000
      0.0607 |  0.1044 |     N/A |   0.0126 |  N/A | -0.0309 |        N/A
      
      Change-Id: Ie22cebc62df2e72618305f2268668d79167860c6
  20. 24 Feb, 2017 2 commits
    • Add txb_common.h · 971a5963
      Angie Chiang authored
      This file contains the common context-generating functions for lv_map.
      
      Change-Id: I7aea78e48cd5003738445b5635120cbc3825ef05
    • Let hbd conv func be flexible · 0a2c0cbc
      Angie Chiang authored
      This CL allows us to change filter coefficients easily for SIMD
      implementations of high bitdepth convolution functions.
      
      Change-Id: I454a5c76d3ba9e4454118c6a9d87737b3aa24898
  21. 18 Feb, 2017 1 commit
  22. 13 Feb, 2017 1 commit
  23. 10 Feb, 2017 1 commit
    • Retune the CLPF kernel · 4f0b3ed8
      Steinar Midtskogen authored
      CLPF performance had degraded by about 0.5% over the past six months,
      which isn't totally surprising since the codec is a moving target.
      About half of that degradation comes from the improved 7 bit filter
      coefficients.  Therefore, CLPF needs to be retuned for the current
      codec.
      
      This patch makes two (normative) changes to the CLPF kernel:
      
      * The clipping function was changed from clamp(x, -s, s) to
            sign(x) * max(0, abs(x) - max(0, abs(x) - s +
                   (abs(x) >> (bitdepth - 3 - log2(s)))))
        This adds a rampdown to 0 at -32 and 32 (for 8 bit; -128 and 128
        for 10 bit, etc.), so large differences are ignored.
      
      * 8 taps instead of 6 taps:
                     1
          4          3
        13 31  ->  13 31
          4          3
                     1
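
      The new clipping function from the first bullet can be sketched
      directly from the formula given there. The function names are
      hypothetical, and the sketch assumes the strength s is a power of two
      so that log2(s) is an integer.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def clamp(x, lo, hi):
    """Old behaviour: plain clamp to [lo, hi]."""
    return max(lo, min(hi, x))


def clpf_constrain(x, s, bitdepth=8):
    """New clipping: follows clamp(x, -s, s) for small |x|, ramps to 0 for large |x|."""
    ax = abs(x)
    # log2(s) for a power-of-two strength (assumption of this sketch).
    shift = bitdepth - 3 - (s.bit_length() - 1)
    return sign(x) * max(0, ax - max(0, ax - s + (ax >> shift)))


# Strength 4 at 8 bits: identical to the clamp up to |x| = 4, then decreasing.
ramp_8bit = [clpf_constrain(x, 4) for x in (2, 4, 8, 16, 32, 40)]
old_8bit = [clamp(x, -4, 4) for x in (2, 4, 8, 16, 32, 40)]
```

      For s = 4 at 8 bits the new function returns 2, 4, 3, 2, 0, 0 for the
      inputs above, where the old clamp returns 2, 4, 4, 4, 4, 4. Large
      differences, which are more likely real edges than noise, therefore no
      longer contribute to the filter.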
      
      AWCY results: low delay  high delay
      PSNR:           -0.40%     -0.47%
      PSNR HVS:        0.00%     -0.11%
      SSIM:           -0.31%     -0.39%
      CIEDE 2000:     -0.22%     -0.31%
      APSNR:          -0.40%     -0.48%
      MS SSIM:         0.01%     -0.12%
      
      About 3/4 of the gains come from the new clipping function.
      
      Change-Id: Idad9dc4004e71a9c7ec81ba62ebd12fb76fb044a
  24. 12 Jan, 2017 2 commits
    • Add SSE2 vectorized warp filter for lowbd · d5dfa96e
      David Barker authored
      End-to-end speed improvements: (measured on tempete_cif.y4m,
      20 frames for encoder and all 260 frames for decoder)
      
      * GLOBAL_MOTION encoder: ~10% faster
      * GLOBAL_MOTION decoder: 100-200% faster depending on bitrate
      * WARPED_MOTION encoder: ~2.5% faster
      * WARPED_MOTION decoder: ~20-40% faster depending on bitrate
      
      The improvement in the GLOBAL_MOTION decoder is particularly
      large because its runtime is dominated by calls to warp_plane().
      
      This introduces minor changes to the output of the warp filter,
      but these should be rare.
      
      Change-Id: I5813ab9e90311e27587045153c32d400b6b9eb92
    • High bit depth 32x32 inverse DCT_DCT transform, AVX2 · 3bd83775
      Yi Luo authored
      - Observed the following user-level speedups on the AV1 baseline:
       Encoding time reduction: 4.26%
       Decoding time reduction: 25.35%
      
      Change-Id: Ideaf3cd473ad45ed9256c80d5a5daed0a6e098cf
  25. 29 Nov, 2016 1 commit
    • Add av1_convolve_init() · e067de00
      Angie Chiang authored
      Generate the SIMD filter structure in av1_convolve_init().
      This makes it easier to change filter coefficients.
      
      Change-Id: If79f84c56483aa08c894d6b12e2b6ce10147f0ce
  26. 07 Nov, 2016 1 commit
    • New experiment: Perceptual Vector Quantization from Daala · 77bba8d3
      Yushin Cho authored
      PVQ replaces the scalar quantizer and coefficient coding with a new
      design originally developed in Daala. It currently depends on the
      Daala entropy coder although it could be adapted to work with another
      entropy coder if needed:
      ./configure --enable-experimental --enable-daala_ec --enable-pvq
      
      The version of PVQ in this commit is adapted from the following
      revision of Daala:
      https://github.com/xiph/daala/commit/fb51c1ade6a31b668a0157d89de8f0a4493162a8
      
      More information about PVQ:
      - https://people.xiph.org/~jm/daala/pvq_demo/
      - https://jmvalin.ca/papers/spie_pvq.pdf
      
      The following files are copied as-is from Daala with minimal
      adaptations, therefore we disable clang-format on those files
      to make it easier to synchronize the AV1 and Daala codebases in the future:
       av1/common/generic_code.c
       av1/common/generic_code.h
       av1/common/laplace_tables.c
       av1/common/partition.c
       av1/common/partition.h
       av1/common/pvq.c
       av1/common/pvq.h
       av1/common/state.c
       av1/common/state.h
       av1/common/zigzag.h
       av1/common/zigzag16.c
       av1/common/zigzag32.c
       av1/common/zigzag4.c
       av1/common/zigzag64.c
       av1/common/zigzag8.c
       av1/decoder/decint.h
       av1/decoder/generic_decoder.c
       av1/decoder/laplace_decoder.c
       av1/decoder/pvq_decoder.c
       av1/decoder/pvq_decoder.h
       av1/encoder/daala_compat_enc.c
       av1/encoder/encint.h
       av1/encoder/generic_encoder.c
       av1/encoder/laplace_encoder.c
       av1/encoder/pvq_encoder.c
       av1/encoder/pvq_encoder.h
      
      Known issues:
      - Lossless mode is not supported; '--lossless=1' will give the same result as
      '--end-usage=q --cq-level=1'.
      - High bit depth is not supported by PVQ.
      
      Change-Id: I1ae0d6517b87f4c1ccea944b2e12dc906979f25e
  27. 04 Nov, 2016 1 commit
    • New experiment: Perceptual Vector Quantization from Daala · 09705fe7
      Yushin Cho authored
      PVQ replaces the scalar quantizer and coefficient coding with a new
      design originally developed in Daala. It currently depends on the
      Daala entropy coder although it could be adapted to work with another
      entropy coder if needed:
      ./configure --enable-experimental --enable-daala_ec --enable-pvq
      
      The version of PVQ in this commit is adapted from the following
      revision of Daala:
      https://github.com/xiph/daala/commit/fb51c1ade6a31b668a0157d89de8f0a4493162a8
      
      More information about PVQ:
      - https://people.xiph.org/~jm/daala/pvq_demo/
      - https://jmvalin.ca/papers/spie_pvq.pdf
      
      The following files are copied as-is from Daala with minimal
      adaptations, therefore we disable clang-format on those files
      to make it easier to synchronize the AV1 and Daala codebases in the future:
       av1/common/generic_code.c
       av1/common/generic_code.h
       av1/common/laplace_tables.c
       av1/common/partition.c
       av1/common/partition.h
       av1/common/pvq.c
       av1/common/pvq.h
       av1/common/state.c
       av1/common/state.h
       av1/common/zigzag.h
       av1/common/zigzag16.c
       av1/common/zigzag32.c
       av1/common/zigzag4.c
       av1/common/zigzag64.c
       av1/common/zigzag8.c
       av1/decoder/decint.h
       av1/decoder/generic_decoder.c
       av1/decoder/laplace_decoder.c
       av1/decoder/pvq_decoder.c
       av1/decoder/pvq_decoder.h
       av1/encoder/daala_compat_enc.c
       av1/encoder/encint.h
       av1/encoder/generic_encoder.c
       av1/encoder/laplace_encoder.c
       av1/encoder/pvq_encoder.c
       av1/encoder/pvq_encoder.h
      
      Known issues:
      - Lossless mode is not supported; '--lossless=1' will give the same result as
      '--end-usage=q --cq-level=1'.
      - High bit depth is not supported by PVQ.
      
      Change-Id: I1ae0d6517b87f4c1ccea944b2e12dc906979f25e
  28. 01 Nov, 2016 3 commits
  29. 20 Oct, 2016 3 commits
  30. 19 Oct, 2016 2 commits
    • Move clpf_sse4_1.c to clpf_sse4.c in agreement with convention · f250e20d
      Steinar Midtskogen authored
      Change-Id: Ia9adc46b8a4d08c5b8e0089ea1a1526df4f1e1dc
    • Bit accounting. · 6048d052
      Michael Bebenita authored
      This patch adds bit accounting infrastructure to the bit reader API.
      When configured with --enable-accounting, every bit reader API
      function records the number of bits needed to decode a symbol.
      Accounting symbol entries are collected in a global accounting data
      structure that can be used to understand exactly where bits are
      spent (http://aomanalyzer.org). The data structure is cleared and
      reused each frame to reduce memory usage. When configured without
      --enable-accounting, bit accounting does not incur any runtime
      overhead.
      
      All aom_read_xxx functions now have an additional string parameter
      that specifies the symbol name. By default, the ACCT_STR macro is
      used (which expands to __func__). For more precise accounting,
      these should be replaced with more descriptive names.
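
      The mechanism can be sketched with a toy bit reader. This is a
      simplified model, not the aom_read_xxx API: the class and method
      names are hypothetical, and the reader consumes whole bits, whereas an
      entropy decoder spends fractional bits per symbol.

```python
class Accounting:
    """Collects (symbol name, bits) entries; cleared and reused for each frame."""

    def __init__(self):
        self.symbols = []

    def record(self, name, bits):
        self.symbols.append((name, bits))

    def reset(self):
        self.symbols.clear()

    def totals(self):
        """Total bits spent per symbol name."""
        out = {}
        for name, bits in self.symbols:
            out[name] = out.get(name, 0) + bits
        return out


class BitReader:
    """Toy MSB-first bit reader with optional accounting."""

    def __init__(self, data, accounting=None):
        self.data = data
        self.pos = 0  # position in bits
        self.accounting = accounting

    def read_literal(self, nbits, acct_str="read_literal"):
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (7 - (self.pos & 7))) & 1
            value = (value << 1) | bit
            self.pos += 1
        # With accounting disabled, nothing is recorded.
        if self.accounting is not None:
            self.accounting.record(acct_str, nbits)
        return value


acct = Accounting()
reader = BitReader(bytes([0b10110000]), acct)
tx_size = reader.read_literal(3, "tx_size")
skip = reader.read_literal(1, "skip")
frame_totals = acct.totals()
acct.reset()  # reuse the same structure for the next frame
```

      Passing a descriptive name at each call site, rather than a default
      derived from the calling function, is what makes the per-symbol totals
      useful for analysis.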
      
      Change-Id: Ia2e1343cb842c9391b12b77272587dfbe307a56d
  31. 18 Oct, 2016 1 commit
  32. 13 Oct, 2016 1 commit