1. 24 Feb, 2017 1 commit
    • Let hbd conv func be flexible · 0a2c0cbc
      Angie Chiang authored
      This CL allows us to change filter coefficients easily for the SIMD
      implementation of high bitdepth convolution functions.
      
      Change-Id: I454a5c76d3ba9e4454118c6a9d87737b3aa24898
  2. 18 Feb, 2017 1 commit
  3. 13 Feb, 2017 1 commit
  4. 10 Feb, 2017 1 commit
    • Retune the CLPF kernel · 4f0b3ed8
      Steinar Midtskogen authored
      CLPF performance had degraded by about 0.5% over the past six months,
      which isn't totally surprising since the codec is a moving target.
      About half of that degradation comes from the improved 7 bit filter
      coefficients.  Therefore, CLPF needs to be retuned for the current
      codec.
      
      This patch makes two (normative) changes to the CLPF kernel:
      
      * The clipping function was changed from clamp(x, -s, s) to
            sign(x) * max(0, abs(x) - max(0, abs(x) - s +
                   (abs(x) >> (bitdepth - 3 - log2(s)))))
        This adds a rampdown to 0 at -32 and 32 (for 8 bit, -128 & 128
        for 10 bit, etc), so large differences are ignored.
      
      * 8 taps instead of 6 taps:
                     1
          4          3
        13 31  ->  13 31
          4          3
                     1
      
      AWCY results:   low delay   high delay
      PSNR:             -0.40%      -0.47%
      PSNR HVS:          0.00%      -0.11%
      SSIM:             -0.31%      -0.39%
      CIEDE 2000:       -0.22%      -0.31%
      APSNR:            -0.40%      -0.48%
      MS SSIM:           0.01%      -0.12%
      
      About 3/4 of the gains come from the new clipping function.
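
The new clipping function quoted above can be sketched in C as follows. This is a minimal illustration transcribed from the formula in the commit message, not the code from the patch itself. The names `clpf_constrain` and `log2_pow2` are hypothetical, and the sketch assumes the strength `s` is a power of two with `log2(s) <= bitdepth - 3`.

```c
#include <stdlib.h>

/* Larger of two ints. */
static int max_int(int a, int b) { return a > b ? a : b; }

/* log2 of a power-of-two strength (assumption: s is 1, 2, 4, ...). */
static int log2_pow2(int s) {
  int n = 0;
  while (s > 1) {
    s >>= 1;
    n++;
  }
  return n;
}

/* sign(x) * max(0, abs(x) - max(0, abs(x) - s +
 *                 (abs(x) >> (bitdepth - 3 - log2(s)))))
 * Small differences pass through unchanged; larger ones are reduced
 * and ramp down to 0, so very large differences are ignored. */
int clpf_constrain(int x, int s, int bitdepth) {
  const int ax = abs(x);
  const int shift = bitdepth - 3 - log2_pow2(s);
  const int excess = max_int(0, ax - s + (ax >> shift));
  const int mag = max_int(0, ax - excess);
  return x < 0 ? -mag : mag;
}
```

With 8-bit input and `s = 4`, a difference of 3 is returned unchanged, a difference of 8 is reduced to 3, and a difference of 32 or more yields 0, which matches the rampdown described in the commit message.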
      
      Change-Id: Idad9dc4004e71a9c7ec81ba62ebd12fb76fb044a
  5. 12 Jan, 2017 2 commits
    • Add SSE2 vectorized warp filter for lowbd · d5dfa96e
      David Barker authored
      End-to-end speed improvements (measured on tempete_cif.y4m,
      20 frames for the encoder and all 260 frames for the decoder):
      
      * GLOBAL_MOTION encoder: ~10% faster
      * GLOBAL_MOTION decoder: 100-200% faster depending on bitrate
      * WARPED_MOTION encoder: ~2.5% faster
      * WARPED_MOTION decoder: ~20-40% faster depending on bitrate
      
      The improvement in the GLOBAL_MOTION decoder is particularly
      large because its runtime is dominated by calls to warp_plane().
      
      This introduces minor changes to the output of the warp filter,
      but these should be rare.
      
      Change-Id: I5813ab9e90311e27587045153c32d400b6b9eb92
    • High bit depth 32x32 inverse DCT_DCT transform, AVX2 · 3bd83775
      Yi Luo authored
      User-level speedup on the AV1 baseline:
      - Encoding time reduction: 4.26%
      - Decoding time reduction: 25.35%
      
      Change-Id: Ideaf3cd473ad45ed9256c80d5a5daed0a6e098cf
  6. 29 Nov, 2016 1 commit
    • Add av1_convolve_init() · e067de00
      Angie Chiang authored
      Generate the SIMD filter structure in av1_convolve_init().
      This provides the flexibility to change filter coefficients.
      
      Change-Id: If79f84c56483aa08c894d6b12e2b6ce10147f0ce
  7. 07 Nov, 2016 1 commit
    • New experiment: Perceptual Vector Quantization from Daala · 77bba8d3
      Yushin Cho authored
      PVQ replaces the scalar quantizer and coefficient coding with a new
      design originally developed in Daala. It currently depends on the
      Daala entropy coder although it could be adapted to work with another
      entropy coder if needed:
      ./configure --enable-experimental --enable-daala_ec --enable-pvq
      
      The version of PVQ in this commit is adapted from the following
      revision of Daala:
      https://github.com/xiph/daala/commit/fb51c1ade6a31b668a0157d89de8f0a4493162a8
      
      More information about PVQ:
      - https://people.xiph.org/~jm/daala/pvq_demo/
      - https://jmvalin.ca/papers/spie_pvq.pdf
      
      The following files are copied as-is from Daala with minimal
      adaptations. We therefore disable clang-format on those files
      to make it easier to synchronize the AV1 and Daala codebases in the future:
       av1/common/generic_code.c
       av1/common/generic_code.h
       av1/common/laplace_tables.c
       av1/common/partition.c
       av1/common/partition.h
       av1/common/pvq.c
       av1/common/pvq.h
       av1/common/state.c
       av1/common/state.h
       av1/common/zigzag.h
       av1/common/zigzag16.c
       av1/common/zigzag32.c
       av1/common/zigzag4.c
       av1/common/zigzag64.c
       av1/common/zigzag8.c
       av1/decoder/decint.h
       av1/decoder/generic_decoder.c
       av1/decoder/laplace_decoder.c
       av1/decoder/pvq_decoder.c
       av1/decoder/pvq_decoder.h
       av1/encoder/daala_compat_enc.c
       av1/encoder/encint.h
       av1/encoder/generic_encoder.c
       av1/encoder/laplace_encoder.c
       av1/encoder/pvq_encoder.c
       av1/encoder/pvq_encoder.h
      
      Known issues:
      - Lossless mode is not supported; '--lossless=1' gives the same result as
        '--end-usage=q --cq-level=1'.
      - High bit depth is not supported by PVQ.
      
      Change-Id: I1ae0d6517b87f4c1ccea944b2e12dc906979f25e
  8. 04 Nov, 2016 1 commit
    • New experiment: Perceptual Vector Quantization from Daala · 09705fe7
      Yushin Cho authored
      PVQ replaces the scalar quantizer and coefficient coding with a new
      design originally developed in Daala. It currently depends on the
      Daala entropy coder although it could be adapted to work with another
      entropy coder if needed:
      ./configure --enable-experimental --enable-daala_ec --enable-pvq
      
      The version of PVQ in this commit is adapted from the following
      revision of Daala:
      https://github.com/xiph/daala/commit/fb51c1ade6a31b668a0157d89de8f0a4493162a8
      
      More information about PVQ:
      - https://people.xiph.org/~jm/daala/pvq_demo/
      - https://jmvalin.ca/papers/spie_pvq.pdf
      
      The following files are copied as-is from Daala with minimal
      adaptations. We therefore disable clang-format on those files
      to make it easier to synchronize the AV1 and Daala codebases in the future:
       av1/common/generic_code.c
       av1/common/generic_code.h
       av1/common/laplace_tables.c
       av1/common/partition.c
       av1/common/partition.h
       av1/common/pvq.c
       av1/common/pvq.h
       av1/common/state.c
       av1/common/state.h
       av1/common/zigzag.h
       av1/common/zigzag16.c
       av1/common/zigzag32.c
       av1/common/zigzag4.c
       av1/common/zigzag64.c
       av1/common/zigzag8.c
       av1/decoder/decint.h
       av1/decoder/generic_decoder.c
       av1/decoder/laplace_decoder.c
       av1/decoder/pvq_decoder.c
       av1/decoder/pvq_decoder.h
       av1/encoder/daala_compat_enc.c
       av1/encoder/encint.h
       av1/encoder/generic_encoder.c
       av1/encoder/laplace_encoder.c
       av1/encoder/pvq_encoder.c
       av1/encoder/pvq_encoder.h
      
      Known issues:
      - Lossless mode is not supported; '--lossless=1' gives the same result as
        '--end-usage=q --cq-level=1'.
      - High bit depth is not supported by PVQ.
      
      Change-Id: I1ae0d6517b87f4c1ccea944b2e12dc906979f25e
  9. 01 Nov, 2016 3 commits
  10. 20 Oct, 2016 3 commits
  11. 19 Oct, 2016 2 commits
    • Move clpf_sse4_1.c to clpf_sse4.c in agreement with convention · f250e20d
      Steinar Midtskogen authored
      Change-Id: Ia9adc46b8a4d08c5b8e0089ea1a1526df4f1e1dc
    • Bit accounting. · 6048d052
      Michael Bebenita authored
      This patch adds bit accounting infrastructure to the bit reader API.
      When configured with --enable-accounting, every bit reader API
      function records the number of bits needed to decode a symbol.
      Accounting symbol entries are collected in a global accounting data
      structure that can be used to understand exactly where bits are
      spent (http://aomanalyzer.org). The data structure is cleared and
      reused each frame to reduce memory usage. When configured without
      --enable-accounting, bit accounting does not incur any runtime
      overhead.
      
      All aom_read_xxx functions now have an additional string parameter
      that specifies the symbol name. By default, the ACCT_STR macro is
      used (which expands to __func__). For more precise accounting,
      these should be replaced with more descriptive names.
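
The naming scheme described above can be illustrated with a toy reader. This is a sketch only: `ToyReader`, `ToyAccounting`, `toy_account`, `toy_read_literal`, and `read_mode` are hypothetical names rather than the aom_read_xxx API, and accounting is switched here with a runtime NULL check, whereas the patch selects it at configure time so that disabled builds pay no runtime cost. Only `ACCT_STR` expanding to `__func__` is taken from the commit message.

```c
#include <stddef.h>
#include <string.h>

#define ACCT_STR __func__   /* default symbol name: the calling function */
#define TOY_MAX_SYMBOLS 64  /* hypothetical capacity of the toy table */

typedef struct {
  const char *name; /* symbol name passed by the caller */
  int bits;         /* total bits read under that name */
} ToySymbol;

typedef struct {
  ToySymbol syms[TOY_MAX_SYMBOLS];
  int count;
} ToyAccounting;

typedef struct {
  const unsigned char *buf;
  size_t bit_pos;
  ToyAccounting *acct; /* NULL disables accounting in this toy */
} ToyReader;

/* Add `bits` to the entry for `name`, creating the entry if needed. */
static void toy_account(ToyAccounting *a, const char *name, int bits) {
  for (int i = 0; i < a->count; i++) {
    if (strcmp(a->syms[i].name, name) == 0) {
      a->syms[i].bits += bits;
      return;
    }
  }
  if (a->count < TOY_MAX_SYMBOLS) {
    a->syms[a->count].name = name;
    a->syms[a->count].bits = bits;
    a->count++;
  }
}

/* Read `bits` raw bits MSB-first; the extra string names the symbol. */
static int toy_read_literal(ToyReader *r, int bits, const char *acct_str) {
  int v = 0;
  for (int i = 0; i < bits; i++) {
    const int bit = (r->buf[r->bit_pos >> 3] >> (7 - (r->bit_pos & 7))) & 1;
    v = (v << 1) | bit;
    r->bit_pos++;
  }
  if (r->acct) toy_account(r->acct, acct_str, bits);
  return v;
}

/* With ACCT_STR, the three bits are recorded under "read_mode". */
static int read_mode(ToyReader *r) { return toy_read_literal(r, 3, ACCT_STR); }
```

Passing an explicit string such as `"tail_bits"` instead of `ACCT_STR` records the bits under that more descriptive name, which is the refinement the commit message recommends.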
      
      Change-Id: Ia2e1343cb842c9391b12b77272587dfbe307a56d
  12. 18 Oct, 2016 1 commit
  13. 13 Oct, 2016 1 commit
  14. 10 Oct, 2016 1 commit
  15. 07 Oct, 2016 1 commit
  16. 06 Oct, 2016 1 commit
  17. 04 Oct, 2016 1 commit
  18. 03 Oct, 2016 1 commit
  19. 29 Sep, 2016 1 commit
    • Fix compiler error for GLOBAL_MOTION+WARPED_MOTION · 235133a2
      Yue Chen authored
      Fix the logical OR computation in the .mk file. Otherwise, when both
      experiments are on, the output of $(filter... is two 'yes' values,
      which causes a missing library issue.
      
      Change-Id: I53c44e925dc9ea77c7467217c20e4f1bc7e20fc3
  20. 19 Sep, 2016 1 commit
    • Move ANS to aom_dsp. · 1ac1ae73
      Alex Converse authored
      That's where it lives in aom/master.
      
      Change-Id: I38f405827d9c2d0b06ef5f3bfd7cadc35d5991ef
  21. 08 Sep, 2016 1 commit
  22. 07 Sep, 2016 1 commit
    • Bit accounting. · e6b12944
      Michael Bebenita authored
      This patch adds bit accounting infrastructure to the bit reader API.
      When configured with --enable-accounting, every bit reader API
      function records the number of bits needed to decode a symbol.
      Accounting symbol entries are collected in a global accounting data
      structure that can be used to understand exactly where bits are
      spent (http://aomanalyzer.org). The data structure is cleared and
      reused each frame to reduce memory usage. When configured without
      --enable-accounting, bit accounting does not incur any runtime
      overhead.
      
      All aom_read_xxx functions now have an additional string parameter
      that specifies the symbol name. By default, the ACCT_STR macro is
      used (which expands to __func__). For more precise accounting,
      these should be replaced with more descriptive names.
      
      Change-Id: Ia2e1343cb842c9391b12b77272587dfbe307a56d
  23. 02 Sep, 2016 1 commit
  24. 01 Sep, 2016 2 commits
  25. 22 Jul, 2016 1 commit
  26. 15 Jul, 2016 1 commit
  27. 09 Jun, 2016 1 commit
  28. 11 May, 2016 1 commit
  29. 29 Mar, 2016 1 commit
  30. 25 Mar, 2016 2 commits
  31. 22 Mar, 2016 1 commit
    • vp10/ -> av1/ · cfea7dd7
      Yaowu Xu authored
      Change-Id: Ia055d03656ad1580447eced8687949583fdf4089
  32. 16 Mar, 2016 1 commit
    • Replace divides by small values with multiplies. · 03122298
      Nathan Egge authored
      This ports the OD_DIVU_SMALL code from Daala to AOM so that divides by
      constants smaller than OD_DIVU_DMAX (1024) are done using a multiply.
      It also adds a unit test for OD_DIVU_SMALL in test/divu_small_test.cc.
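
The general idea of replacing a divide by a small constant with a multiply can be sketched as below. This is a generic reciprocal-multiplication illustration, not the OD_DIVU_SMALL macro or its constant table. The names `DIV_SHIFT`, `DIV_DMAX`, `recip_magic`, and `divu_by_mul` are hypothetical, and the sketch assumes a 16-bit numerator and a divisor between 1 and 1024 so that the rounding error stays below 1/d.

```c
#include <stdint.h>

#define DIV_SHIFT 26  /* hypothetical fixed-point precision */
#define DIV_DMAX 1024 /* largest supported divisor in this sketch */

/* Precompute floor(2^DIV_SHIFT / d) + 1 for a divisor 1 <= d <= DIV_DMAX.
 * In practice this would live in a table indexed by d. */
static uint32_t recip_magic(uint32_t d) {
  return (uint32_t)((UINT64_C(1) << DIV_SHIFT) / d) + 1;
}

/* floor(x / d) computed as a multiply and a shift.
 * Exact here because x < 2^16 keeps the error term below 2^-10 <= 1/d. */
static uint32_t divu_by_mul(uint16_t x, uint32_t magic) {
  return (uint32_t)(((uint64_t)x * magic) >> DIV_SHIFT);
}
```

The multiplier is computed once per divisor, so each later division costs one 64-bit multiply and one shift. Wider numerators would need a larger shift or a different constant scheme than the one assumed here.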
      
      Change-Id: Id9fee172d54477355571c5d6c12c584fb65769e5