1. 30 Mar, 2017 1 commit
  2. 29 Mar, 2017 2 commits
  3. 23 Mar, 2017 1 commit
      Do real chroma RDO search for CDEF · e9f77424
      Jean-Marc Valin authored
      Chroma now has a list of strengths too, with the superblock signalling
      shared between luma and chroma.
      
      low-latency, cpu=4:
      
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |   SSIM | MS SSIM | CIEDE 2000
      -0.0114 | -1.4626 | -1.4745 |  -0.0423 | 0.0430 | -0.0001 |    -0.7416
      
      Change-Id: I389c77f1d80020f810e45f8502c656ad9d397c8c
      e9f77424
  4. 21 Mar, 2017 2 commits
  5. 17 Mar, 2017 1 commit
      Merge dering/clpf rdo and filtering · a9d41e88
      Steinar Midtskogen authored
      * Dering and clpf were merged into a single pass.
      * 32x32 and 128x128 filter block sizes for clpf were removed.
      * RDO for dering and clpf merged and improved:
        - "0" no longer required to be in the strength selection
        - Dering strength can now be 0, 1 or 2 bits per block
      
                    LL    HL
      PSNR:       -0.04 -0.01
      PSNR HVS:   -0.27 -0.18
      SSIM:       -0.15 +0.01
      CIEDE 2000: -0.11 -0.03
      APSNR:      -0.03 -0.00
      MS SSIM:    -0.18 -0.11
      
      Change-Id: I9f002a16ad218eab6007f90f1f176232443495f0
      a9d41e88
  6. 27 Feb, 2017 1 commit
  7. 18 Feb, 2017 1 commit
  8. 10 Feb, 2017 1 commit
      Retune the CLPF kernel · 4f0b3ed8
      Steinar Midtskogen authored
      CLPF performance had degraded by about 0.5% over the past six months,
      which isn't totally surprising since the codec is a moving target.
      About half of that degradation comes from the improved 7 bit filter
      coefficients.  Therefore, CLPF needs to be retuned for the current
      codec.
      
      This patch makes two (normative) changes to the CLPF kernel:
      
      * The clipping function was changed from clamp(x, -s, s) to
            sign(x) * max(0, abs(x) - max(0, abs(x) - s +
                   (abs(x) >> (bitdepth - 3 - log2(s)))))
        This adds a rampdown to 0 at -32 and 32 for 8 bit (-128 and 128
        for 10 bit, etc.), so large differences are ignored.
      
      * 8 taps instead of 6 taps:
                     1
          4          3
        13 31  ->  13 31
          4          3
                     1
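
      The clipping change in the first bullet can be sketched in C as
      follows. This is a minimal illustration transcribed from the formula
      in this message, not the code from the patch: the function names
      (clpf_clamp, clpf_constrain, ilog2, max_int) are hypothetical, and the
      strength s is assumed to be a power of two so that log2(s) is an
      integer.

```c
#include <assert.h>
#include <stdlib.h>

/* Old behaviour: plain clamp of the difference x to [-s, s]. */
static int clpf_clamp(int x, int s) {
  return x < -s ? -s : (x > s ? s : x);
}

/* floor(log2(s)) for s >= 1 (hypothetical helper). */
static int ilog2(int s) {
  int n = 0;
  while (s > 1) {
    s >>= 1;
    n++;
  }
  return n;
}

static int max_int(int a, int b) { return a > b ? a : b; }

/* New behaviour, following the formula in the commit message:
 *   sign(x) * max(0, abs(x) - max(0, abs(x) - s +
 *                                 (abs(x) >> (bitdepth - 3 - log2(s)))))
 * Small differences pass through unchanged, then the output ramps
 * back down to 0 as abs(x) grows. */
static int clpf_constrain(int x, int s, int bitdepth) {
  assert(s >= 1);
  const int shift = bitdepth - 3 - ilog2(s);
  assert(shift >= 0);
  const int ax = abs(x);
  const int mag = max_int(0, ax - max_int(0, ax - s + (ax >> shift)));
  return x < 0 ? -mag : mag;
}
```

      With s = 4 at 8 bit, a difference of 3 is kept as is, a difference of
      10 is reduced to 3, and a difference of 32 or more is ignored, whereas
      the old clamp would have returned 4 for both of the larger inputs.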
      
      AWCY results: low delay  high delay
      PSNR:           -0.40%     -0.47%
      PSNR HVS:        0.00%     -0.11%
      SSIM:           -0.31%     -0.39%
      CIEDE 2000:     -0.22%     -0.31%
      APSNR:          -0.40%     -0.48%
      MS SSIM:         0.01%     -0.12%
      
      About 3/4 of t...
      4f0b3ed8
  9. 08 Feb, 2017 1 commit
  10. 27 Jan, 2017 1 commit
      highbitdepth + loop restoration: fix build on x86 32 bit · cda0b5e4
      Johann authored
      When the functions were added in
      https://aomedia-review.googlesource.com/6545 they were not restricted to
      x86_64 builds.
      
      Fixes "undefined reference to
      `aom_highbd_convolve8_add_src_sse2'" for --target=x86-linux-gcc
      
      Also remove SSE2 specializations from
      `aom_highbd_convolve8_add_src[_horiz/_vert]`, since those functions
      don't actually have SSE2 versions (this was left in by accident
      in the original patch).
      
      Change-Id: I9f7d0c11b58b6f5a0e6a1fdaed0f92175bdeab34
      cda0b5e4
  11. 24 Jan, 2017 1 commit
  12. 07 Jan, 2017 2 commits
  13. 03 Jan, 2017 1 commit
      Add new convolve variant for loop-restoration · be6cc07d
      David Barker authored
      The convolve filters generated by loop_wiener_filter_tile
      are not compatible with some existing convolve implementations
      (they can have coefficients >128, sums of (certain subsets of)
      coefficients >128, etc.)
      
      So we implement a new variant, which takes a filter with 128
      subtracted from its central element and which adds an extra copy
      of the source just before clipping to a pixel (reinstating the
      128 we subtracted). This should be easy to adapt from the existing
      convolve functions, and this patch includes SSE2 highbd and
      SSSE3 lowbd implementations.
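
      The idea can be sketched for a single 8 bit output pixel as below.
      This is an illustration under stated assumptions rather than the code
      added by the patch: the names (convolve_ref, convolve_add_src,
      round_shift), the 8-tap length, and the central tap index of 3 are
      assumptions made for the example. Coefficients are taken to be in
      7 bit precision, so a full filter sums to 128.

```c
#include <assert.h>
#include <stdint.h>

#define FILTER_BITS 7 /* filter taps nominally sum to 1 << 7 = 128 */
#define TAPS 8        /* assumed filter length */
#define CENTER 3      /* assumed index of the central tap */

static uint8_t clip_pixel(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* floor((v + 64) / 128), written without right-shifting negative values. */
static int round_shift(int v) {
  v += 1 << (FILTER_BITS - 1);
  if (v >= 0) return v >> FILTER_BITS;
  return -((-v + (1 << FILTER_BITS) - 1) >> FILTER_BITS);
}

/* Reference: convolve one output pixel with the full filter.
 * src points at the sample aligned with tap 0. */
static uint8_t convolve_ref(const uint8_t *src, const int16_t *filter) {
  int sum = 0;
  for (int k = 0; k < TAPS; ++k) sum += src[k] * filter[k];
  return clip_pixel(round_shift(sum));
}

/* Variant: the filter has had 128 subtracted from its central tap, and
 * one extra copy of the source pixel is added just before clipping,
 * which reinstates the subtracted 128. */
static uint8_t convolve_add_src(const uint8_t *src,
                                const int16_t *filter_minus128) {
  int sum = 0;
  for (int k = 0; k < TAPS; ++k) sum += src[k] * filter_minus128[k];
  return clip_pixel(round_shift(sum) + src[CENTER]);
}
```

      Because 128 * src[CENTER] is an exact multiple of 1 << FILTER_BITS,
      moving it outside the rounding shift does not change the result, so
      both functions return the same pixel while the variant only ever sees
      a central coefficient reduced by 128.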
      
      Change-Id: I0abf4c2915f0665c49d88fe450dbc77b783f69e1
      be6cc07d
  14. 20 Dec, 2016 2 commits
  15. 15 Dec, 2016 1 commit
  16. 09 Dec, 2016 1 commit
      High bit depth motion search SAD optimization on avx2 · e9832584
      Yi Luo authored
      - For all blocks with width >= 16.
      - Add test_count to make the unit tests harder to pass.
      - Speed testing on 1080p, 100 frames, 5 Mbps, CPU, i7-6700
        User level time reduction:
         baseline:                  3.68%
         baseline + ext-partition: 36.12%
      
      Change-Id: I78c5d9ca216f0fd91f1a360dca2190b11fd54a08
      e9832584
  17. 07 Dec, 2016 1 commit
  18. 02 Dec, 2016 1 commit
      Enable 2x2 intra prediction · 7833d2bf
      Jingning Han authored
      Bring 2x2 intra prediction online for chroma components.
      
      Change-Id: Ia56af9101b2a977691bca4156a6dcf89e644b4a7
      7833d2bf
  19. 28 Nov, 2016 2 commits
      SAD avg and 4D avx2 optimization for ext-partition · 9e218747
      Yi Luo authored
      - User level time reduction <1% on i7-6700 cpu
      
      Change-Id: I8f15bde07dddd938df0b065e20ae94109e7b3b5b
      9e218747
      Add a new intra prediction mode "smooth". · 6be4a54b
      Urvang Joshi authored
      This is added as part of ALT_INTRA experiment.
      
      This uses interpolation between top row and estimated bottom row; as
      well as left column and estimated right column to generate the
      predicted block. The interpolation is done using a predefined weight
      array.
      
      Based on experiments, the currently chosen weight array was created
      to represent a quadratic curve, but can be tuned further if needed.
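
      A rough 4x4 sketch of this kind of predictor is shown below. It is
      not the code from this patch. It assumes that the bottom row is
      estimated from the bottom-left neighbour and the right column from
      the top-right neighbour, and the weight values in kWeights are
      illustrative placeholders for a decaying curve, not the tuned array
      the message refers to. The function name smooth_predict_4x4 is
      hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define BS 4                /* block size for this sketch */
#define WEIGHT_SCALE_BITS 8 /* weights are fractions of 256 */

/* Illustrative weights only: largest next to the known edge. */
static const int kWeights[BS] = { 255, 149, 85, 64 };

static void smooth_predict_4x4(uint8_t dst[BS][BS], const uint8_t above[BS],
                               const uint8_t left[BS]) {
  const int bottom = left[BS - 1]; /* assumed estimate of the bottom row */
  const int right = above[BS - 1]; /* assumed estimate of the right column */
  const int scale = 1 << WEIGHT_SCALE_BITS;
  for (int r = 0; r < BS; ++r) {
    for (int c = 0; c < BS; ++c) {
      /* Vertical blend of top row and estimated bottom row, plus
       * horizontal blend of left column and estimated right column. */
      const int v = kWeights[r] * above[c] + (scale - kWeights[r]) * bottom +
                    kWeights[c] * left[r] + (scale - kWeights[c]) * right;
      /* Average the two blends with rounding. */
      dst[r][c] = (uint8_t)((v + scale) >> (WEIGHT_SCALE_BITS + 1));
    }
  }
}
```

      Each output is a convex combination of neighbouring pixels, so a flat
      neighbourhood reproduces itself and no clipping step is needed.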
      
      Improvement from baseline on Derf set:
      ALL Keyframes: 1.279%
      
      Improvement from existing ALT_INTRA:
      ALL Keyframes: 1.146%
      
      Change-Id: I12637fa1b91bd836f1c59b27d6caee2004acbdd4
      6be4a54b
  20. 21 Nov, 2016 1 commit
  21. 10 Nov, 2016 1 commit
  22. 09 Nov, 2016 1 commit
  23. 07 Nov, 2016 1 commit
  24. 04 Nov, 2016 1 commit
      New experiment: Perceptual Vector Quantization from Daala · 09705fe7
      Yushin Cho authored
      PVQ replaces the scalar quantizer and coefficient coding with a new
      design originally developed in Daala. It currently depends on the
      Daala entropy coder although it could be adapted to work with another
      entropy coder if needed:
      ./configure --enable-experimental --enable-daala_ec --enable-pvq
      
      The version of PVQ in this commit is adapted from the following
      revision of Daala:
      https://github.com/xiph/daala/commit/fb51c1ade6a31b668a0157d89de8f0a4493162a8
      
      More information about PVQ:
      - https://people.xiph.org/~jm/daala/pvq_demo/
      - https://jmvalin.ca/papers/spie_pvq.pdf
      
      The following files are copied as-is from Daala with minimal
      adaptations. We therefore disable clang-format on those files
      to make it easier to synchronize the AV1 and Daala codebases in the future:
       av1/common/generic_code.c
       av1/common/generic_code.h
       av1/common/laplace_tables.c
       av1/common/partition.c
       av1/common/partition.h
       av1/common/pvq.c
       av1/common/pvq.h
       av1/common/state.c
       av1/common/state.h
       av1/common/zigzag.h
       av1/common/zigzag16.c
       av1/common/zigzag32.c
       av1/common/zigzag4.c
       av1/common/zigzag64.c
       av1/common/zigzag8.c
       av1/decoder/decint.h
       av1/decoder/generic_decoder.c
       av1/decoder/laplace_decoder.c
       av1/decoder/pvq_decoder.c
       av1/decoder/pvq_decoder.h
       av1/encoder/daala_compat_enc.c
       av1/encoder/encint.h
       av1/encoder/generic_encoder.c
       av1/encoder/laplace_encoder.c
       av1/encoder/pvq_encoder.c
       av1/encoder/pvq_encoder.h
      
      Known issues:
      - Lossless mode is not supported; '--lossless=1' will give the same result as
      '--end-usage=q --cq-level=1'.
      - High bit depth is not supported by PVQ.
      
      Change-Id: I1ae0d6517b87f4c1ccea944b2e12dc906979f25e
      09705fe7
  25. 28 Oct, 2016 1 commit
  26. 26 Oct, 2016 1 commit
  27. 19 Oct, 2016 1 commit
  28. 14 Oct, 2016 1 commit
  29. 13 Oct, 2016 2 commits
      Renamings for OBMC experiment · cb60b185
      Yue Chen authored
      To get ready for pulling AV1 into nextgenv2, replace the experimental
      flag with MOTION_VAR and rename major variables.
      
      Change-Id: If6cf4f37b9319c46d8f90df551cc7295d66ca205
      cb60b185
      Sync 2x2 intra predictors · e3954d83
      Jingning Han authored
      Add 2x2 DC, V, H, TM intra predictors.
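
      For reference, the four predictor types named above can be sketched
      for a 2x2 block as below. This is a generic illustration of DC,
      vertical, horizontal and TM prediction, not the code from this
      patch: the function names and the row-major 4-element output layout
      are assumptions, and the top-left neighbour is passed explicitly for
      simplicity.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t clip_pixel(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* dst holds the 2x2 block in row-major order: dst[2 * row + col]. */

/* DC: every pixel is the rounded mean of the four neighbours. */
static void dc_pred_2x2(uint8_t dst[4], const uint8_t above[2],
                        const uint8_t left[2]) {
  const int dc = (above[0] + above[1] + left[0] + left[1] + 2) >> 2;
  for (int i = 0; i < 4; ++i) dst[i] = (uint8_t)dc;
}

/* V: each column repeats the pixel above it. */
static void v_pred_2x2(uint8_t dst[4], const uint8_t above[2]) {
  for (int r = 0; r < 2; ++r)
    for (int c = 0; c < 2; ++c) dst[2 * r + c] = above[c];
}

/* H: each row repeats the pixel to its left. */
static void h_pred_2x2(uint8_t dst[4], const uint8_t left[2]) {
  for (int r = 0; r < 2; ++r)
    for (int c = 0; c < 2; ++c) dst[2 * r + c] = left[r];
}

/* TM: left + above - top_left, clipped to the 8 bit pixel range. */
static void tm_pred_2x2(uint8_t dst[4], const uint8_t above[2],
                        const uint8_t left[2], uint8_t top_left) {
  for (int r = 0; r < 2; ++r)
    for (int c = 0; c < 2; ++c)
      dst[2 * r + c] = clip_pixel(left[r] + above[c] - top_left);
}
```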
      
      Change-Id: I2a614adde553f821c45bc5a9bf09800a9f0aaa26
      e3954d83
  30. 12 Oct, 2016 3 commits
      Hybrid forward transform 32x32 AVX2 optimization · fed8e1c0
      Yi Luo authored
      - av1_fht32x32 AVX2 function level time reduction ~89% compared to C.
      
      - av1_fht32x32_avx2() on DCT_DCT improves 42.62% over aom_fdct32x32_avx2(),
        but function replacement must go with the corresponding inverse txfm.
      
      - No obvious user level time reduction due to 32x32 TX_TYPE selection.
      
      - Zero high 128b YMM to avoid AVX-SSE transition penalties
        (fix 16x16 case).
      
      - Added 32x32 AVX2 unit tests to verify bit-exactness.
      
      - AVX2 optimization summary:
        On CPU i7-6700, based on 16x16/32x32 fwd txfm optimization results:
        C to AVX2: function level time reduction, ~86-89%.
        SSE2 to AVX2: function level time reduction, ~51%.
      
      Change-Id: Idd0cd8bf066a61c7117140ef15ab6c1f8eb4b036
      fed8e1c0
      port changes on lpf from libvpx/nextgenv2 · 57ad0a05
      Yaowu Xu authored
      Manually cherry-picked the following commits:
      4b5e462d Upgrade vpx_lpf_{vertical,horizontal}_4 mmx to sse2
      3ea537c0 lpf_8_test: remove unneeded function wrapper
      110d3778 remove loopfilter 'count' param TODOs
      9b44d9d0 split vpx_highbd_lpf_horizontal_16 in two
      1b519fb6 split vpx_lpf_horizontal_16 in two
      e7a23d70 vpx_highbd_lpf_horizontal_4: remove unused count param
      51718573 vpx_highbd_lpf_horizontal_8: remove unused count param
      3c1019e4 vpx_highbd_lpf_vertical_4: remove unused count param
      72a9f06a vpx_highbd_lpf_vertical_8: remove unused count param
      b1e97c6a vpx_lpf_horizontal_4: remove unused count param
       ab25e46pgrade vpx_lpf_{vertical,horizontal}_4 mmx to sse2
      bd5a5bb5 vpx_lpf_horizontal_8: remove unused count param
      109a47b3 vpx_lpf_vertical_4: remove unused count param
      37225744 vpx_lpf_vertical_8: remove unused count param
      47dee375 lpf_8_test: add missing dspr2 tests
      4fec4a8e lpf_8_test: add missing vpx_lpf_horizontal_4 tests
      c3f2c8ad lpf_8_test: add missing vpx_lpf_vertical_4 tests
      45a7b5eb lpf_8_test: simplify function wrapper generation
      
      Change-Id: I0e9212497bbf30de37b19cd2d6ea63b505abe06d
      57ad0a05
      minor updates · f36d0b46
      Yaowu Xu authored
      1. vp8->aom
      2. removed no-effect statements and spaces
      
      Change-Id: I367d05ff9bf1b9f3c71c517c45d8049d9d4236ec
      f36d0b46
  31. 10 Oct, 2016 2 commits