1. 15 Feb, 2017 5 commits
    • Make convolve_round compiled without dual_filter · 1b672d3f
      Zoe Liu authored
      Change-Id: I532e46b3947ca3f5898a2da61fb6b82c2f4bd5c6
    • Add MSVC win32 support to the cmake build. · 1ba9bd89
      Tom Finegan authored
      BUG=https://bugs.chromium.org/p/aomedia/issues/detail?id=76
      
      Change-Id: I3179fe9ec45ff1aab06cc8828d2bb34c141cca55
    • Correct cmake intrinsic flag translation. · 0b3c9052
      Tom Finegan authored
      MSVC only. Use the AVX/AVX2 flags only for AVX and AVX2. Ignore
      the SSE flags since they're not needed with MSVC.
      
      BUG=https://bugs.chromium.org/p/aomedia/issues/detail?id=76
      
      Change-Id: I0f3ac40ffb1f9c53a16272f0781df176317732f6
    • Speed up global motion determination · 15338d5f
      David Barker authored
      When global-motion is enabled, a considerable amount
      of encoder time is spent in the functions in corner_match.c.
      This patch optimizes those functions to be 3.5-4x as fast,
      leading to an end-to-end encoder speed improvement
      (on 20 frames of tempete_cif.y4m) of:
      
       200kbps: ~26% faster
       800kbps: ~19% faster
      2800kbps: ~12% faster
      
      Change-Id: I04d3f87484c36c41eb5a1e86e814f2accbe86297
    • Add flag for RAWBITS to use raw bits with DAALA_EC. · 24f1a904
      Nathan E. Egge authored
      The use of raw bits is now disabled by default and can be turned on with:
       ./configure --enable-experimental --enable-rawbits
      This commit has a negligible impact on rate.
      
      subset1:
      
      master@2017-02-14T18:57:22.282Z -> no_rawbits@2017-02-14T18:57:41.977Z
      
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -0.0000 | -0.0000 | -0.0000 |  -0.0000 | -0.0000 | -0.0000 |    -0.0000
      
      objective-1-fast:
      
      master@2017-02-14T18:52:48.425Z -> no_rawbits@2017-02-14T18:52:04.489Z
      
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -0.0001 | -0.0001 | -0.0001 |  -0.0001 | -0.0001 | -0.0001 |    -0.0001
      
      Change-Id: I01e79e9f314565a64b224ca41047f7bd7fe33f70
  2. 14 Feb, 2017 10 commits
  3. 13 Feb, 2017 18 commits
  4. 12 Feb, 2017 4 commits
    • Make adapt-scan support multi-thread encoding · 5d0b310b
      Jingning Han authored
      This commit makes the adaptive scan order system support multi-thread
      encoding. It fixes the unit test failure associated with
      AV1/AVxEncoderThreadTest.EncoderResultTest/0.
      
      BUG=aomedia:353
      
      Change-Id: I61cbf9531c8deab97fb3bb17428d0b2a63cf309a
    • Separate intra tx_size logic between var-tx and rect-tx · cb512283
      Jingning Han authored
      Skip rectangular transform block size coding for intra-coded blocks
      in var-tx mode when rect-tx is disabled.
      
      Change-Id: If3a091d25f19bf4a67485b5d235bb3d7d0c2cd03
    • Implement shorter-tap first in convolve_round · 118bf67c
      Angie Chiang authored
      The performance change is 0.004% on lowres.
      
      Change-Id: If3702ba6377ac42997e7d49b8959ff16fb182daa
    • Fix segfault with loop-restoration on x86. · befcc425
      David Barker authored
      The WienerInfo struct requires a 16-byte alignment on x86,
      since it contains filter coefficients which are loaded using
      SSE aligned load instructions. But on 32-bit x86, the default
      alignment of aom_malloc/aom_realloc is only 8 bytes, leading
      to occasional segfaults.
      
      To fix this, rather than using aom_realloc to resize WienerInfo
      structures, we always free and re-allocate them using aom_memalign.
      
      BUG=aomedia:345
      
      Change-Id: Ib1b2a42d4a2fa215dcc81ea481c51271ab068a37
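
The free-and-reallocate pattern described in this commit can be sketched roughly as follows. This is not the code from the patch: `FilterInfo`, `filter_info_alloc`, and `filter_info_resize` are hypothetical names, and the sketch uses C11 `aligned_alloc` as a stand-in for `aom_memalign`. It assumes the old contents do not need to be preserved across a resize.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: 16 bytes, the alignment required by SSE aligned loads. */
#define FILTER_ALIGN 16

/* Hypothetical stand-in for a struct that holds filter coefficients. */
typedef struct {
  int16_t vfilter[8];
  int16_t hfilter[8];
} FilterInfo;

/* Allocate n entries on a 16-byte boundary. Returns NULL if n is 0 or
 * the allocation fails. */
FilterInfo *filter_info_alloc(size_t n) {
  if (n == 0) return NULL;
  size_t bytes = n * sizeof(FilterInfo);
  /* C11 aligned_alloc expects a size that is a multiple of the alignment. */
  bytes = (bytes + FILTER_ALIGN - 1) / FILTER_ALIGN * FILTER_ALIGN;
  return aligned_alloc(FILTER_ALIGN, bytes);
}

/* Resize by freeing and allocating again instead of calling realloc,
 * which gives no alignment guarantee beyond the platform default.
 * The old contents are discarded. */
FilterInfo *filter_info_resize(FilterInfo *old, size_t n) {
  free(old);
  FilterInfo *p = filter_info_alloc(n);
  assert(p == NULL || ((uintptr_t)p % FILTER_ALIGN) == 0);
  return p;
}
```

The trade-off is that a resize never copies existing entries, which is acceptable only when the caller refills the buffer afterwards.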
  5. 11 Feb, 2017 1 commit
    • Add a new experiment of REF_ADAPT · b05e5d10
      Zoe Liu authored
      Some ALTREF_FRAMEs could have used compound modes for their prediction
      but were labeled as SINGLE_REFERENCE mode in the frame header. This
      experiment removes the COMPOUND_REFERENCE mode from the frame-level
      reference mode choices, leaving SINGLE_REFERENCE and
      REFERENCE_MODE_SELECT as the only two choices in the frame header.
      
      When both ext-refs and ref-adapt are turned on, a small gain is
      achieved over ext-refs alone. In PSNR terms, the bitrate savings are
      as follows:
      
      lowres: Avg -0.120%; BDRate -0.128%
      midres: Avg -0.155%; BDRate -0.128%
      
      Change-Id: I2cfff8a6b7eaa65ef863dbdbc4dd086d3b586f8c
  6. 10 Feb, 2017 2 commits
    • Speed up CLPF when there's nothing to clip · f844e6ef
      Steinar Midtskogen authored
      Gives 7% speed-up in the CLPF processing (measured on SSE4.2).
      
      Change-Id: I934ad85ef2066086a44387030b42e14301b3d428
    • Retune the CLPF kernel · 4f0b3ed8
      Steinar Midtskogen authored
      CLPF performance had degraded by about 0.5% over the past six months,
      which isn't totally surprising since the codec is a moving target.
      About half of that degradation comes from the improved 7 bit filter
      coefficients.  Therefore, CLPF needs to be retuned for the current
      codec.
      
      This patch makes two (normative) changes to the CLPF kernel:
      
      * The clipping function was changed from clamp(x, -s, s) to
            sign(x) * max(0, abs(x) - max(0, abs(x) - s +
                   (abs(x) >> (bitdepth - 3 - log2(s)))))
        This adds a rampdown to 0 at -32 and 32 for 8 bit (-128 and 128
        for 10 bit, etc.), so large differences are ignored.
      
      * 8 taps instead of 6 taps:
                     1
          4          3
        13 31  ->  13 31
          4          3
                     1
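
The new clipping function from the first bullet can be written out in C as a direct transcription of the formula. This is a sketch, not the patch's code: the names `clamp_old`, `ilog2`, and `constrain_new` are hypothetical, and it assumes the strength `s` is a power of two that is small enough for the shift `bitdepth - 3 - log2(s)` to stay non-negative.

```c
#include <assert.h>
#include <stdlib.h>

/* Previous behaviour as described above: clamp(x, -s, s). */
int clamp_old(int x, int s) { return x < -s ? -s : (x > s ? s : x); }

/* Integer log2, assuming s is a power of two (1, 2, 4, ...). */
int ilog2(int s) {
  int r = 0;
  while (s > 1) {
    s >>= 1;
    r++;
  }
  return r;
}

/* sign(x) * max(0, abs(x) - max(0, abs(x) - s +
 *                               (abs(x) >> (bitdepth - 3 - log2(s))))) */
int constrain_new(int x, int s, int bitdepth) {
  const int shift = bitdepth - 3 - ilog2(s);
  assert(s >= 1 && shift >= 0);
  const int ax = abs(x);
  int excess = ax - s + (ax >> shift); /* grows as |x| moves past s */
  if (excess < 0) excess = 0;
  int mag = ax - excess;
  if (mag < 0) mag = 0;
  return x < 0 ? -mag : mag;
}
```

With `s = 4` at 8 bits, small differences pass through unchanged (`constrain_new(2, 4, 8)` is 2), a difference of 8 is reduced to 3 where the old clamp would give 4, and a difference of 32 or -32 maps to 0, which matches the rampdown described above.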
      
      AWCY results:   low delay  high delay
      PSNR:           -0.40%     -0.47%
      PSNR HVS:        0.00%     -0.11%
      SSIM:           -0.31%     -0.39%
      CIEDE 2000:     -0.22%     -0.31%
      APSNR:          -0.40%     -0.48%
      MS SSIM:         0.01%     -0.12%
      
      About 3/4 of the gains come from the new clipping function.
      
      Change-Id: Idad9dc4004e71a9c7ec81ba62ebd12fb76fb044a