1. 30 May, 2017 1 commit
  2. 26 May, 2017 1 commit
    • ext-inter: Vectorize new masked SAD/SSE functions · 0aa39ff0
      David Barker authored
      We would expect that these new functions would be slower than
      the old masked SAD/SSE functions, as they do additional work
      (blending two inputs and comparing to a third, rather than
      just comparing two inputs).
      
      This is true for the SAD functions, which are about 50% slower
      (depending on block size and bit depth). However, the sub-pixel
      SSE functions are comparable to the old speed for the accelerated
      special cases (xoffset or yoffset = 0 or 4), and are
      between 40% and 90% faster for the generic case.
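
The extra work described above (blend two predictors, then compare against the source) can be sketched in scalar C as follows. This is an illustration only, not the libaom code: the function names, the 0..64 mask range, and the rounding are assumptions made for the sketch, and any normalization the real functions apply is omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: mask weights lie in [0, 64]; 64 selects pred_a entirely. */
#define MASK_BITS 6
#define MASK_MAX (1 << MASK_BITS)

/* Rounded blend of two predictor pixels: (m*a + (64 - m)*b + 32) >> 6. */
static uint8_t blend_pixel(int m, uint8_t a, uint8_t b) {
  assert(m >= 0 && m <= MASK_MAX);
  return (uint8_t)((m * a + (MASK_MAX - m) * b + (MASK_MAX >> 1)) >>
                   MASK_BITS);
}

/* Plain SAD (hypothetical name): compares two inputs directly. */
static unsigned int sad_sketch(const uint8_t *src, int src_stride,
                               const uint8_t *ref, int ref_stride,
                               int width, int height) {
  unsigned int sad = 0;
  for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x) {
      sad += (unsigned int)abs(src[y * src_stride + x] -
                               ref[y * ref_stride + x]);
    }
  }
  return sad;
}

/* Masked compound SAD (hypothetical name): blends pred_a and pred_b with
 * the mask, then compares the blended prediction to src. */
static unsigned int masked_compound_sad_sketch(
    const uint8_t *src, int src_stride, const uint8_t *pred_a, int a_stride,
    const uint8_t *pred_b, int b_stride, const uint8_t *mask, int mask_stride,
    int width, int height) {
  unsigned int sad = 0;
  for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x) {
      const uint8_t pred =
          blend_pixel(mask[y * mask_stride + x], pred_a[y * a_stride + x],
                      pred_b[y * b_stride + x]);
      sad += (unsigned int)abs(src[y * src_stride + x] - pred);
    }
  }
  return sad;
}
```

The masked variant reads three pixel buffers plus a mask per output pixel, where the plain SAD reads only two buffers.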
      
      Change-Id: I1a296ed8fc9e3edc313a6add516ff76b17cd3e9f
  3. 24 May, 2017 1 commit
    • ext-inter: Further cleanup · f19f35f7
      David Barker authored
      * Rename the 'masked_compound_*' functions to just 'masked_*'.
        The previous names were intended to be temporary, to distinguish
        the old and new masked motion search pipelines. But now that the
        old pipeline has been removed, we can reuse the old names.
      
      * Simplify the new ext-inter compound motion search pipeline
        a bit.
      
      * Harmonize names: Rename
        aom_highbd_masked_compound_sub_pixel_variance* to
        aom_highbd_8_masked_sub_pixel_variance*, to match the naming of
        the corresponding non-masked functions
      
      Change-Id: I988768ffe2f42a942405b7d8e93a2757a012dca3
  4. 23 May, 2017 2 commits
    • Vectorize high-precision convolve filter · 5d34e6a7
      David Barker authored
      Add SSE2 lowbd and SSSE3 highbd versions of the filters
      introduced in https://aomedia-review.googlesource.com/c/11962/ .
      
      These filters are equivalent in speed to the SSE2 implementations
      of the regular convolve filter. The average time to filter a
      64x64 block is:
      
      lowbd C: 52us
      lowbd SSE2: 5.6us
      highbd C: 53us
      highbd SSSE3: 5.8us
      
      Also add a correctness test based on the warp filter tests.
      
      Change-Id: Ia0d81100e8a414bbfc2b5f664d751cf24765299e
    • ext-inter: Delete dead code · 0f3c94e1
      David Barker authored
      Patches https://aomedia-review.googlesource.com/c/11987/
      and https://aomedia-review.googlesource.com/c/11988/
      replaced the old masked motion search pipeline with
      a new one which uses different SAD/SSE functions.
      This resulted in a lot of dead code.
      
      This patch removes the now-dead code. Note that this
      includes vectorized SAD/SSE functions, which will need
      to be rewritten at some point for the new pipeline. It
      also includes the masked_compound_variance_* functions
      since these turned out not to be used by the new pipeline.
      
      To help with the later addition of vectorized functions, the
      masked_sad/variance_test.cc files are kept but are modified
      to work with the new functions. The tests are then disabled
      until we actually have the vectorized functions.
      
      Change-Id: I61b686abd14bba5280bed94e1be62eb74ea23d89
  5. 22 May, 2017 1 commit
  6. 18 May, 2017 1 commit
    • ext-inter: Use joint_motion_search for masked compounds · c155e018
      David Barker authored
      Add functions which take both components of a masked compound and
      compute the resulting SAD/SSE. Extend joint_motion_search to understand
      masked compounds, and use it to evaluate NEW_NEWMV modes.
      
      Change-Id: I782199a20d119a6c61c6567df157508125ac7ce7
  7. 15 May, 2017 2 commits
    • Experimental high precision convolve for Wiener · 28d15c71
      Debargha Mukherjee authored
      Improves coding efficiency.
      
      Change-Id: I7bb12190cdc4581097809a020355cdc8867fc1ad
    • Remove armv6 media-extension assembly. · be111b38
      Ralph Giles authored
      Libvpx dropped armv6 support sometime after the aom fork.
      
      We don't intend to support this platform, which is likely
      too slow in any case. Remove the assembly- and intrinsics-optimized
      routines, their tests, the CPU feature detection,
      and the rtcd specialization for this instruction set extension.
      
      Change-Id: If44ec28e5ddafc6af179c5d1982ac7e81fe54d5e
  8. 11 May, 2017 1 commit
    • Partial IDCT 32x32 avx2 · 40f22ef8
      Yi Luo authored
      - Function level improvement (ms):
      Functions       ssse3  avx2   Percentage
      idct32x32_1024  794    374    52.9%
      idct32x32_135   354    169    52.2%
      idct32x32_34    197    142    27.9%
      idct32x32_1     n/a     26    n/a
      
      - Integrating in default scan order.
      
      Change-Id: I84815112b26b8a8cb800281a1cfb1706342af57d
  9. 08 May, 2017 2 commits
    • Partial IDCT 16x16 avx2 · f6176abb
      Yi Luo authored
      - Function level improvement:
      functions      sse2  avx2  percentage
      idct16x16_256  365   226   38%
      idct16x16_38   n/a   136   n/a
      idct16x16_10   171   110   35%
      idct16x16_1     34    26   23%
      
      - Integrated in AV1 for default scan order.
      
      Change-Id: Ieb1a8e730bea9c371ebc0e5f4a748640d8f5e921
    • Add a new experiment SMOOTH_HV. · e6ca8e83
      Urvang Joshi authored
      This experiment extends ALT_INTRA by adding two new modes:
      smooth horizontal and smooth vertical.
      
      Improvement on *intra frames* in BDRate (PSNR):
      ===============================================
      
      AWCY (high latency): -0.46%
      (Also, -1.0% or more on PSNR Cb,Cr and APSNR Cb,Cr).
      
      AWCY (low latency): -0.43%
      (Also, -0.88% to -0.94% on PSNR Cb,Cr and APSNR Cb,Cr).
      
      Google sets:
      lowres: -0.454
      midres: -0.484
      hdres:  -0.525
      
      Improvement on *video overall* in BDRate (PSNR):
      ================================================
      
      AWCY (high latency): -0.15%
      
      Google sets:
      lowres: -0.085
      midres: -0.079
      
      Change-Id: I9f4e7c1b8ded1fe244c72838f336103ccc715d50
  10. 27 Apr, 2017 1 commit
    • Cleanup dead high-bitdepth inverse-tx functions · 4fc8df67
      Frederic Barbier authored
      This patch removes dead code and prevents future implementations
      from relying on obsolete transforms. Future optimizations and tests
      should be based on the latest C functions (av1/common/av1_inv_txfm1d.c).
      
      Also clean up the last related unit-test callers.
      BUG=aomedia:442
      
      Change-Id: I24953cc1baf30dd7b720df8a72dd91b356b74cad
  11. 26 Apr, 2017 1 commit
    • Update partial inverse DCT according to VP9 · 3fcb356e
      Yi Luo authored
      - Partial inverse DCT unit tests have been enhanced.
      - IDCT x86_64 assembly code has been removed.
      
      Change-Id: Ic3bed2c0e70abdfd642a4f74fa969cc672d4795f
  12. 25 Apr, 2017 2 commits
    • remove remaining refs to aom_highbd_idct8x8_64_add · 4a2e3b2d
      James Zern authored
      fixes high-bitdepth build:
      ./libaom.a(aom_dsp_rtcd.c.o): In function `setup_rtcd_internal':
        ./aom_dsp_rtcd.h:2614: undefined reference to
        `aom_highbd_idct8x8_64_add_c'
      
      missed in:
      c756e4d0 Cleanup dead high-bitdepth inverse-tx functions
      
      BUG=aomedia:442
      
      Change-Id: I63ee6fc5dbf85fd48efd9ff721868df6fb05eb09
    • Intra prediction: Remove unused variants. · c3bcf3be
      Urvang Joshi authored
      The directional predictors for the 45-, 63- and 207-degree angles
      had 2 or 3 variants each, and only one of each was actually being
      used. This patch removes the C, sse2, ssse3 and neon versions of
      the unused ones.
      
      Updates to the test:
      - test_intra_pred_speed was testing the unused versions, so changed
        it to use the version actually used by code. This meant updating
        some golden MD5 values.
      - test_intra_pred_speed was NOT filling up bottom-left and top-right
        pixels randomly, so the predictors using these pixels weren't tested
        properly. This was fixed.
      
      BUG=aomedia:442
      
      Change-Id: I09725d593408b81e0cd636e70a88c28eea5f2222
  13. 24 Apr, 2017 1 commit
  14. 20 Apr, 2017 1 commit
    • Drop support for CONFIG_EMULATE_HARDWARE · c6a48a25
      Sebastien Alaiwan authored
      This experiment complicates DSP function dispatch without bringing
      any real value (it is non-normative, arbitrary behaviour).
      Moreover, it only affects obsolete transforms; the new ones
      don't implement this mechanism.
      
      Change-Id: Idaccdd0c14ed6b7008cd4f365c7f017ba8ccacf5
  15. 12 Apr, 2017 1 commit
  16. 03 Apr, 2017 1 commit
  17. 31 Mar, 2017 1 commit
    • RTCD defs: Remove empty specialize statements once and for all. · 5ddac0aa
      Urvang Joshi authored
      A similar cleanup happened before, but the empty statements have since
      reappeared. I added a check to the 'specialize' subroutine that dies
      whenever an empty specialize call is found, so that config+make fails.
      
      Change-Id: I300ca0f0b077c0aeca8096d6460d8fb1c364d9b9
  18. 30 Mar, 2017 2 commits
  19. 29 Mar, 2017 2 commits
  20. 23 Mar, 2017 1 commit
    • Do real chroma RDO search for CDEF · e9f77424
      Jean-Marc Valin authored
      Chroma now has a list of strengths too, with the superblock signalling
      shared between luma and chroma.
      
      low-latency, cpu=4:
      
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |   SSIM | MS SSIM | CIEDE 2000
      -0.0114 | -1.4626 | -1.4745 |  -0.0423 | 0.0430 | -0.0001 |    -0.7416
      
      Change-Id: I389c77f1d80020f810e45f8502c656ad9d397c8c
  21. 21 Mar, 2017 2 commits
  22. 17 Mar, 2017 1 commit
    • Merge dering/clpf rdo and filtering · a9d41e88
      Steinar Midtskogen authored
      * Dering and clpf were merged into a single pass.
      * 32x32 and 128x128 filter block sizes for clpf were removed.
      * RDO for dering and clpf merged and improved:
        - "0" no longer required to be in the strength selection
        - Dering strength can now be 0, 1 or 2 bits per block
      
                    LL    HL
      PSNR:       -0.04 -0.01
      PSNR HVS:   -0.27 -0.18
      SSIM:       -0.15 +0.01
      CIEDE 2000: -0.11 -0.03
      APSNR:      -0.03 -0.00
      MS SSIM:    -0.18 -0.11
      
      Change-Id: I9f002a16ad218eab6007f90f1f176232443495f0
  23. 27 Feb, 2017 1 commit
  24. 18 Feb, 2017 1 commit
  25. 10 Feb, 2017 1 commit
    • Retune the CLPF kernel · 4f0b3ed8
      Steinar Midtskogen authored
      CLPF performance had degraded by about 0.5% over the past six months,
      which isn't totally surprising since the codec is a moving target.
      About half of that degradation comes from the improved 7 bit filter
      coefficients.  Therefore, CLPF needs to be retuned for the current
      codec.
      
      This patch makes two (normative) changes to the CLPF kernel:
      
      * The clipping function was changed from clamp(x, -s, s) to
            sign(x) * max(0, abs(x) - max(0, abs(x) - s +
                   (abs(x) >> (bitdepth - 3 - log2(s)))))
        This adds a rampdown to 0 at -32 and 32 (for 8 bit, -128 & 128
        for 10 bit, etc), so large differences are ignored.
      
      * 8 taps instead of 6 taps:
                     1
          4          3
        13 31  ->  13 31
          4          3
                     1
      
      AWCY results: low delay  high delay
      PSNR:           -0.40%     -0.47%
      PSNR HVS:        0.00%     -0.11%
      SSIM:           -0.31%     -0.39%
      CIEDE 2000:     -0.22%     -0.31%
      APSNR:          -0.40%     -0.48%
      MS SSIM:         0.01%     -0.12%
      
      About 3/4 of the gains come from the new clipping function.
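
As a worked illustration of the clipping change, the two formulas quoted above can be transcribed into scalar C. This is a sketch, not the codec source: the function names are hypothetical, and it assumes s is a power of two so that log2(s) is exact.

```c
#include <assert.h>
#include <stdlib.h>

static int sign_of(int x) { return (x > 0) - (x < 0); }
static int max_int(int a, int b) { return a > b ? a : b; }

/* floor(log2(s)) for s >= 1; exact when s is a power of two. */
static int ilog2(int s) {
  int n = 0;
  assert(s >= 1);
  while (s >>= 1) ++n;
  return n;
}

/* Old clipping function: clamp(x, -s, s). */
static int clpf_clip_old(int x, int s) {
  return x < -s ? -s : (x > s ? s : x);
}

/* New clipping function, transcribed from the commit message:
 * sign(x) * max(0, abs(x) - max(0, abs(x) - s +
 *                               (abs(x) >> (bitdepth - 3 - log2(s))))) */
static int clpf_clip_new(int x, int s, int bitdepth) {
  const int ax = abs(x);
  const int shift = bitdepth - 3 - ilog2(s);
  assert(shift >= 0);
  return sign_of(x) * max_int(0, ax - max_int(0, ax - s + (ax >> shift)));
}
```

With s = 4 at 8 bits, differences of 3, 8, 16 and 32 map to 3, 3, 2 and 0 under the new function, while the old clamp returns 3, 4, 4 and 4. The output therefore ramps down to 0 at 32, as described.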
      
      Change-Id: Idad9dc4004e71a9c7ec81ba62ebd12fb76fb044a
  26. 08 Feb, 2017 1 commit
  27. 27 Jan, 2017 1 commit
    • highbitdepth + loop restoration: fix build on x86 32 bit · cda0b5e4
      Johann authored
      When the functions were added in
      https://aomedia-review.googlesource.com/6545 they were not restricted to
      x86_64 builds.
      
      Fixes "undefined reference to
      `aom_highbd_convolve8_add_src_sse2'" for --target=x86-linux-gcc
      
      Also remove SSE2 specializations from
      `aom_highbd_convolve8_add_src[_horiz/_vert]`, since those functions
      don't actually have SSE2 versions (this was left in by accident
      in the original patch).
      
      Change-Id: I9f7d0c11b58b6f5a0e6a1fdaed0f92175bdeab34
  28. 24 Jan, 2017 1 commit
  29. 07 Jan, 2017 2 commits
  30. 03 Jan, 2017 1 commit
    • Add new convolve variant for loop-restoration · be6cc07d
      David Barker authored
      The convolve filters generated by loop_wiener_filter_tile
      are not compatible with some existing convolve implementations
      (they can have coefficients >128, sums of (certain subsets of)
      coefficients >128, etc.)
      
      So we implement a new variant, which takes a filter with 128
      subtracted from its central element and which adds an extra copy
      of the source just before clipping to a pixel (reinstating the
      128 we subtracted). This should be easy to adapt from the existing
      convolve functions, and this patch includes SSE2 highbd and
      SSSE3 lowbd implementations.
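
The "subtract 128, add the source back" idea can be sketched for a single 8-bit output pixel as below. The names, the 8-tap layout with the central element at index 3, and the 7-bit rounding are assumptions made for this illustration; this is not the code added by the patch.

```c
#include <assert.h>
#include <stdint.h>

#define FILTER_BITS 7          /* assumption: unity gain corresponds to 128 */
#define TAPS 8                 /* assumption: 8-tap kernel */
#define CENTER (TAPS / 2 - 1)  /* assumption: index 3 is the central tap */

static uint8_t clip_pixel(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Rounded shift; relies on arithmetic right shift for negative sums,
 * which mainstream compilers provide. */
static int round_shift(int sum) {
  return (sum + (1 << (FILTER_BITS - 1))) >> FILTER_BITS;
}

/* Regular convolve of one pixel; src points at the sample under tap 0. */
static uint8_t convolve_pixel(const uint8_t *src, const int16_t *filter) {
  int sum = 0;
  for (int k = 0; k < TAPS; ++k) sum += src[k] * filter[k];
  return clip_pixel(round_shift(sum));
}

/* "add_src" variant: the filter has 128 subtracted from its central tap,
 * and the source pixel is added back just before clipping. */
static uint8_t convolve_add_src_pixel(const uint8_t *src,
                                      const int16_t *filter_minus_128) {
  int sum = 0;
  for (int k = 0; k < TAPS; ++k) sum += src[k] * filter_minus_128[k];
  return clip_pixel(round_shift(sum) + src[CENTER]);
}
```

Because 128 * src[CENTER] is an exact multiple of 1 << FILTER_BITS, moving it outside the rounded shift leaves the result unchanged, while the taps stored in the reduced filter stay smaller.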
      
      Change-Id: I0abf4c2915f0665c49d88fe450dbc77b783f69e1
  31. 20 Dec, 2016 2 commits