  1. 12 Apr, 2017 1 commit
  2. 10 Apr, 2017 1 commit
    • frame-superres: Move resize from encoder to common · d0565006
      Fergus Simpson authored
      The resizing functions in resize.h and resize.c are useful for the
      frame super-res experiment. These functions will be needed in both the
      encoder and decoder, so the files have been moved into av1/common.
      
      Change-Id: I66154b7ec0eade0df460c4f4cf8eaa5f663c8904
  3. 06 Apr, 2017 1 commit
  4. 05 Apr, 2017 1 commit
    • CDEF: Add damping to dering · 8ff52fcc
      Steinar Midtskogen authored
      high-latency, cpu-used=0:
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -0.1650 |  0.2545 |  0.2977 |  -0.0423 | -0.0947 | -0.0725 |    -0.0365
      
      low-latency, cpu-used=0:
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -0.4006 |  0.0501 | -0.0108 |  -0.1790 | -0.1660 | -0.1992 |    -0.2135
      
      low-latency, cpu-used=4:
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -0.5508 | -0.2445 | -0.2762 |  -0.1981 | -0.2878 | -0.2228 |    -0.3733
      
      Change-Id: Ia20df28c8bbb6182215b02016053af33bd498145
  5. 01 Apr, 2017 1 commit
  6. 29 Mar, 2017 1 commit
  7. 22 Mar, 2017 1 commit
  8. 17 Mar, 2017 1 commit
    • Merge dering/clpf rdo and filtering · a9d41e88
      Steinar Midtskogen authored
      * Dering and clpf were merged into a single pass.
      * 32x32 and 128x128 filter block sizes for clpf were removed.
      * RDO for dering and clpf merged and improved:
        - "0" no longer required to be in the strength selection
        - Dering strength can now be 0, 1 or 2 bits per block
      
                    LL    HL
      PSNR:       -0.04 -0.01
      PSNR HVS:   -0.27 -0.18
      SSIM:       -0.15 +0.01
      CIEDE 2000: -0.11 -0.03
      APSNR:      -0.03 -0.00
      MS SSIM:    -0.18 -0.11
      
      Change-Id: I9f002a16ad218eab6007f90f1f176232443495f0
  9. 06 Mar, 2017 1 commit
    • Vectorize self-guided filter · ce110cc5
      David Barker authored
      Add an SSE4.1 lowbd version of the self-guided filter for
      loop-restoration, and apply some optimizations to the C
      version.
      
      Approximate times per 128x128 / 256x256 tile on the machine
      this was developed on:
      Previous C:  620us / 2800us
      Optimized C: 500us / 2200us ( 24% /  27% faster)
      SSE4.1:      147us / 600us  (320% / 370% faster)
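
The "faster" figures above are consistent with reading them as (old time / new time - 1) x 100, rounded to about two significant figures. A minimal sketch of that arithmetic, assuming this is how they were derived (the helper name `percent_faster` is hypothetical and not part of the patch):

```c
#include <assert.h>

/* Hypothetical helper, not from the patch.
 * Assumes "X% faster" means (old / new - 1) * 100, i.e. the relative
 * increase in throughput when the time drops from old_us to new_us. */
static double percent_faster(double old_us, double new_us) {
  assert(old_us > 0.0 && new_us > 0.0);
  return (old_us / new_us - 1.0) * 100.0;
}
```

With the 128x128 timings, `percent_faster(620, 500)` gives 24 and `percent_faster(620, 147)` gives roughly 322, which matches the quoted 24% and 320% under this reading.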
      
      Change-Id: I23ff5a5482a191aeb06f9d1f767a9f036bb357fe
  10. 01 Mar, 2017 2 commits
  11. 28 Feb, 2017 1 commit
    • Add SIMD code for PVQ search · 3a88de8f
      Michael Bebenita authored
      This reduces the share of runtime spent in pvq_search_rdo_double from
      37% to 15% and improves overall encoding speed by ~40% when PVQ is
      enabled. The SIMD code is not bit-exact with the C version and
      introduces a slight PSNR regression on AWCY:
      
        PSNR | PSNR Cb | PSNR Cr | PSNR HVS | SSIM | MS SSIM | CIEDE 2000
      0.0607 |  0.1044 |     N/A |   0.0126 |  N/A | -0.0309 |        N/A
      
      Change-Id: Ie22cebc62df2e72618305f2268668d79167860c6
  12. 24 Feb, 2017 2 commits
    • Add txb_common.h · 971a5963
      Angie Chiang authored
      This file contains the common context-generation functions for lv_map.
      
      Change-Id: I7aea78e48cd5003738445b5635120cbc3825ef05
    • Let hbd conv func be flexible · 0a2c0cbc
      Angie Chiang authored
      This CL allows us to change filter coefficients easily for SIMD
      implementations of high bitdepth convolution functions.
      
      Change-Id: I454a5c76d3ba9e4454118c6a9d87737b3aa24898
  13. 18 Feb, 2017 1 commit
  14. 13 Feb, 2017 1 commit
  15. 10 Feb, 2017 1 commit
    • Retune the CLPF kernel · 4f0b3ed8
      Steinar Midtskogen authored
      CLPF performance had degraded by about 0.5% over the past six months,
      which isn't totally surprising since the codec is a moving target.
      About half of that degradation comes from the improved 7 bit filter
      coefficients.  Therefore, CLPF needs to be retuned for the current
      codec.
      
      This patch makes two (normative) changes to the CLPF kernel:
      
      * The clipping function was changed from clamp(x, -s, s) to
            sign(x) * max(0, abs(x) - max(0, abs(x) - s +
                   (abs(x) >> (bitdepth - 3 - log2(s)))))
        This adds a rampdown to 0 at -32 and 32 for 8 bit (-128 and 128
        for 10 bit, etc.), so large differences are ignored.
      
      * 8 taps instead of 6 taps:
                     1
          4          3
        13 31  ->  13 31
          4          3
                     1
      
      AWCY results: low delay  high delay
      PSNR:           -0.40%     -0.47%
      PSNR HVS:        0.00%     -0.11%
      SSIM:           -0.31%     -0.39%
      CIEDE 2000:     -0.22%     -0.31%
      APSNR:          -0.40%     -0.48%
      MS SSIM:         0.01%     -0.12%
      
      About 3/4 of the gains come from the new clipping function.
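
The new clipping function quoted above can be sketched in C as follows. This is only an illustration of the formula as written in this message, not the code from the patch: the names `clpf_constrain`, `ilog2` and `imax` are hypothetical, and the sketch assumes the strength `s` is a positive power of two.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: log2 of a positive power of two. */
static int ilog2(int s) {
  int n = 0;
  while (s > 1) {
    s >>= 1;
    ++n;
  }
  return n;
}

static int imax(int a, int b) { return a > b ? a : b; }

/* Hypothetical name. Evaluates
 *   sign(x) * max(0, abs(x) - max(0, abs(x) - s +
 *                 (abs(x) >> (bitdepth - 3 - log2(s)))))
 * exactly as quoted in the commit message. */
static int clpf_constrain(int x, int s, int bitdepth) {
  assert(s > 0 && (s & (s - 1)) == 0); /* assumed: s is a power of two */
  const int ax = abs(x);
  const int shift = bitdepth - 3 - ilog2(s);
  assert(shift >= 0);
  const int mag = imax(0, ax - imax(0, ax - s + (ax >> shift)));
  return x < 0 ? -mag : mag;
}
```

For 8 bit and s = 4, inputs of 2, 4, 8, 16 and 32 map to 2, 4, 3, 2 and 0, whereas clamp(x, -4, 4) would return 4 for every input of 4 or more. This is the rampdown that makes large differences drop out.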
      
      Change-Id: Idad9dc4004e71a9c7ec81ba62ebd12fb76fb044a
  16. 12 Jan, 2017 2 commits
    • Add SSE2 vectorized warp filter for lowbd · d5dfa96e
      David Barker authored
      End-to-end speed improvements: (measured on tempete_cif.y4m,
      20 frames for encoder and all 260 frames for decoder)
      
      * GLOBAL_MOTION encoder: ~10% faster
      * GLOBAL_MOTION decoder: 100-200% faster depending on bitrate
      * WARPED_MOTION encoder: ~2.5% faster
      * WARPED_MOTION decoder: ~20-40% faster depending on bitrate
      
      The improvement in the GLOBAL_MOTION decoder is particularly
      large because its runtime is dominated by calls to warp_plane().
      
      This introduces minor changes to the output of the warp filter,
      but these should be rare.
      
      Change-Id: I5813ab9e90311e27587045153c32d400b6b9eb92
    • High bit depth 32x32 inverse DCT_DCT transform, AVX2 · 3bd83775
      Yi Luo authored
      - Witness the following user-level speedup on AV1 baseline:
       Encoding time reduction: 4.26%
       Decoding time reduction: 25.35%
      
      Change-Id: Ideaf3cd473ad45ed9256c80d5a5daed0a6e098cf
  17. 29 Nov, 2016 1 commit
    • Add av1_convolve_init() · e067de00
      Angie Chiang authored
      Generate the SIMD filter structure in av1_convolve_init().
      This makes it easier to change filter coefficients.
      
      Change-Id: If79f84c56483aa08c894d6b12e2b6ce10147f0ce
  18. 07 Nov, 2016 1 commit
    • New experiment: Perceptual Vector Quantization from Daala · 77bba8d3
      Yushin Cho authored
      PVQ replaces the scalar quantizer and coefficient coding with a new
      design originally developed in Daala. It currently depends on the
      Daala entropy coder although it could be adapted to work with another
      entropy coder if needed:
      ./configure --enable-experimental --enable-daala_ec --enable-pvq
      
      The version of PVQ in this commit is adapted from the following
      revision of Daala:
      https://github.com/xiph/daala/commit/fb51c1ade6a31b668a0157d89de8f0a4493162a8
      
      More information about PVQ:
      - https://people.xiph.org/~jm/daala/pvq_demo/
      - https://jmvalin.ca/papers/spie_pvq.pdf
      
      The following files are copied from Daala with minimal
      adaptations, so clang-format is disabled on them to make it
      easier to synchronize the AV1 and Daala codebases in the future:
       av1/common/generic_code.c
       av1/common/generic_code.h
       av1/common/laplace_tables.c
       av1/common/partition.c
       av1/common/partition.h
       av1/common/pvq.c
       av1/common/pvq.h
       av1/common/state.c
       av1/common/state.h
       av1/common/zigzag.h
       av1/common/zigzag16.c
       av1/common/zigzag32.c
       av1/common/zigzag4.c
       av1/common/zigzag64.c
       av1/common/zigzag8.c
       av1/decoder/decint.h
       av1/decoder/generic_decoder.c
       av1/decoder/laplace_decoder.c
       av1/decoder/pvq_decoder.c
       av1/decoder/pvq_decoder.h
       av1/encoder/daala_compat_enc.c
       av1/encoder/encint.h
       av1/encoder/generic_encoder.c
       av1/encoder/laplace_encoder.c
       av1/encoder/pvq_encoder.c
       av1/encoder/pvq_encoder.h
      
      Known issues:
      - Lossless mode is not supported; '--lossless=1' gives the same result as
        '--end-usage=q --cq-level=1'.
      - High bit depth is not supported by PVQ.
      
      Change-Id: I1ae0d6517b87f4c1ccea944b2e12dc906979f25e
  19. 04 Nov, 2016 1 commit
    • New experiment: Perceptual Vector Quantization from Daala · 09705fe7
      Yushin Cho authored
      PVQ replaces the scalar quantizer and coefficient coding with a new
      design originally developed in Daala. It currently depends on the
      Daala entropy coder although it could be adapted to work with another
      entropy coder if needed:
      ./configure --enable-experimental --enable-daala_ec --enable-pvq
      
      The version of PVQ in this commit is adapted from the following
      revision of Daala:
      https://github.com/xiph/daala/commit/fb51c1ade6a31b668a0157d89de8f0a4493162a8
      
      More information about PVQ:
      - https://people.xiph.org/~jm/daala/pvq_demo/
      - https://jmvalin.ca/papers/spie_pvq.pdf
      
      The following files are copied from Daala with minimal
      adaptations, so clang-format is disabled on them to make it
      easier to synchronize the AV1 and Daala codebases in the future:
       av1/common/generic_code.c
       av1/common/generic_code.h
       av1/common/laplace_tables.c
       av1/common/partition.c
       av1/common/partition.h
       av1/common/pvq.c
       av1/common/pvq.h
       av1/common/state.c
       av1/common/state.h
       av1/common/zigzag.h
       av1/common/zigzag16.c
       av1/common/zigzag32.c
       av1/common/zigzag4.c
       av1/common/zigzag64.c
       av1/common/zigzag8.c
       av1/decoder/decint.h
       av1/decoder/generic_decoder.c
       av1/decoder/laplace_decoder.c
       av1/decoder/pvq_decoder.c
       av1/decoder/pvq_decoder.h
       av1/encoder/daala_compat_enc.c
       av1/encoder/encint.h
       av1/encoder/generic_encoder.c
       av1/encoder/laplace_encoder.c
       av1/encoder/pvq_encoder.c
       av1/encoder/pvq_encoder.h
      
      Known issues:
      - Lossless mode is not supported; '--lossless=1' gives the same result as
        '--end-usage=q --cq-level=1'.
      - High bit depth is not supported by PVQ.
      
      Change-Id: I1ae0d6517b87f4c1ccea944b2e12dc906979f25e
  20. 01 Nov, 2016 3 commits
  21. 20 Oct, 2016 3 commits
  22. 19 Oct, 2016 2 commits
    • Move clpf_sse4_1.c to clpf_sse4.c in agreement with convention · f250e20d
      Steinar Midtskogen authored
      Change-Id: Ia9adc46b8a4d08c5b8e0089ea1a1526df4f1e1dc
    • Bit accounting. · 6048d052
      Michael Bebenita authored
      This patch adds bit accounting infrastructure to the bit reader API.
      When configured with --enable-accounting, every bit reader API
      function records the number of bits needed to decode a symbol.
      Accounting symbol entries are collected in a global accounting data
      structure that can be used to understand exactly where bits are
      spent (http://aomanalyzer.org). The data structure is cleared and
      reused each frame to reduce memory usage. When configured without
      --enable-accounting, bit accounting does not incur any runtime
      overhead.
      
      All aom_read_xxx functions now have an additional string parameter
      that specifies the symbol name. By default, the ACCT_STR macro is
      used (which expands to __func__). For more precise accounting,
      call sites should pass more descriptive names instead.
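
A minimal sketch of the naming scheme described above, under stated assumptions: only ACCT_STR and its expansion to __func__ come from this message. The toy bit reader, the `Accounting` and `SymbolEntry` structures, and every function name below are hypothetical stand-ins for illustration. They are not the aom_read_xxx API, and the real entropy decoder does not read raw literal bits like this.

```c
#include <assert.h>
#include <string.h>

/* Per the commit message, ACCT_STR expands to __func__. */
#define ACCT_STR __func__

#define MAX_ENTRIES 64 /* hypothetical capacity for this sketch */

/* Hypothetical accounting structures; the real ones are not shown here. */
typedef struct {
  const char *name; /* symbol name passed by the caller */
  int bits;         /* bits consumed while reading the symbol */
} SymbolEntry;

typedef struct {
  SymbolEntry entries[MAX_ENTRIES];
  int count;
} Accounting;

/* Cleared and reused each frame, as the message describes. */
static void accounting_reset(Accounting *a) { a->count = 0; }

static void accounting_record(Accounting *a, const char *name, int bits) {
  if (a->count < MAX_ENTRIES) {
    a->entries[a->count].name = name;
    a->entries[a->count].bits = bits;
    a->count++;
  }
}

/* Total bits recorded under one symbol name. */
static int accounting_bits_for(const Accounting *a, const char *name) {
  int total = 0;
  for (int i = 0; i < a->count; i++) {
    if (strcmp(a->entries[i].name, name) == 0) total += a->entries[i].bits;
  }
  return total;
}

/* Toy MSB-first bit reader standing in for the real entropy decoder. */
typedef struct {
  const unsigned char *buf;
  int bit_pos;
  Accounting *acct; /* NULL disables accounting in this sketch */
} ToyReader;

static int toy_read_literal(ToyReader *r, int nbits, const char *acct_str) {
  assert(nbits > 0 && nbits <= 16);
  int value = 0;
  for (int i = 0; i < nbits; i++) {
    const int byte = r->buf[r->bit_pos >> 3];
    value = (value << 1) | ((byte >> (7 - (r->bit_pos & 7))) & 1);
    r->bit_pos++;
  }
  if (r->acct) accounting_record(r->acct, acct_str, nbits);
  return value;
}

/* Default naming: the entry is labelled with the calling function's name. */
static int read_tx_size(ToyReader *r) {
  return toy_read_literal(r, 2, ACCT_STR);
}

/* More precise naming: an explicit, descriptive symbol name. */
static int read_skip(ToyReader *r) {
  return toy_read_literal(r, 1, "skip_flag");
}
```

In this sketch accounting is switched off at run time with a NULL pointer. The patch instead describes a configure-time switch, so that builds without --enable-accounting carry no overhead at all.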
      
      Change-Id: Ia2e1343cb842c9391b12b77272587dfbe307a56d
  23. 18 Oct, 2016 1 commit
  24. 13 Oct, 2016 1 commit
  25. 10 Oct, 2016 1 commit
  26. 07 Oct, 2016 1 commit
  27. 06 Oct, 2016 1 commit
  28. 04 Oct, 2016 1 commit
  29. 03 Oct, 2016 1 commit
  30. 29 Sep, 2016 1 commit
    • Fix compiler error for GLOBAL_MOTION+WARPED_MOTION · 235133a2
      Yue Chen authored
      Fix the logical OR computation in the .mk file. Otherwise, when both
      experiments are on, the output of $(filter... is two 'yes' values,
      which causes a missing-library issue.
      
      Change-Id: I53c44e925dc9ea77c7467217c20e4f1bc7e20fc3
  31. 19 Sep, 2016 1 commit
    • Move ANS to aom_dsp. · 1ac1ae73
      Alex Converse authored
      That's where it lives in aom/master.
      
      Change-Id: I38f405827d9c2d0b06ef5f3bfd7cadc35d5991ef
  32. 08 Sep, 2016 1 commit