1. 29 Mar, 2017 1 commit
  2. 21 Mar, 2017 2 commits
  3. 19 Mar, 2017 1 commit
  4. 17 Mar, 2017 1 commit
      Merge dering/clpf rdo and filtering · a9d41e88
      Steinar Midtskogen authored
      * Dering and clpf were merged into a single pass.
      * 32x32 and 128x128 filter block sizes for clpf were removed.
      * RDO for dering and clpf merged and improved:
        - "0" no longer required to be in the strength selection
        - Dering strength can now be 0, 1 or 2 bits per block
      
                    LL    HL
      PSNR:       -0.04 -0.01
      PSNR HVS:   -0.27 -0.18
      SSIM:       -0.15 +0.01
      CIEDE 2000: -0.11 -0.03
      APSNR:      -0.03 -0.00
      MS SSIM:    -0.18 -0.11
      
      Change-Id: I9f002a16ad218eab6007f90f1f176232443495f0
  5. 27 Feb, 2017 1 commit
  6. 10 Feb, 2017 2 commits
      Speed up CLPF when there's nothing to clip · f844e6ef
      Steinar Midtskogen authored
      Gives 7% speed-up in the CLPF processing (measured on SSE4.2).
      
      Change-Id: I934ad85ef2066086a44387030b42e14301b3d428
      Retune the CLPF kernel · 4f0b3ed8
      Steinar Midtskogen authored
      CLPF performance had degraded by about 0.5% over the past six months,
      which isn't totally surprising since the codec is a moving target.
      About half of that degradation comes from the improved 7 bit filter
      coefficients.  Therefore, CLPF needs to be retuned for the current
      codec.
      
      This patch makes two (normative) changes to the CLPF kernel:
      
      * The clipping function was changed from clamp(x, -s, s) to
            sign(x) * max(0, abs(x) - max(0, abs(x) - s +
                   (abs(x) >> (bitdepth - 3 - log2(s)))))
        This adds a rampdown to 0 at -32 and 32 (for 8 bit, -128 & 128
        for 10 bit, etc), so large differences are ignored.
      
      * 8 taps instead of 6 taps:
                     1
          4          3
        13 31  ->  13 31
          4          3
                     1
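As a rough illustration of the two changes above, the following C sketch implements the old clamp, the new clipping expression exactly as written in the message, and the weighted sum over the 8 taps. The function names (`clamp_old`, `clip_new`, `clpf_weighted_sum`) are hypothetical, `s` is assumed to be a power of two, and the tap positions reflect one reading of the diagram (weights 1 and 3 at distances 2 and 1 from the centre, both vertically and horizontally). The final normalisation and rounding of the sum are left out because the message does not specify them.

```c
#include <assert.h>
#include <stdlib.h>

/* Integer log2 for a positive power-of-two strength s (assumption). */
static int ilog2(int s) {
  int n = 0;
  while (s >>= 1) n++;
  return n;
}

static int imax(int a, int b) { return a > b ? a : b; }

/* Old behaviour: plain clamp(x, -s, s). */
static int clamp_old(int x, int s) { return x < -s ? -s : x > s ? s : x; }

/* New behaviour, transcribed from the commit message:
 *   sign(x) * max(0, abs(x) - max(0, abs(x) - s +
 *                  (abs(x) >> (bitdepth - 3 - log2(s)))))
 * Small differences pass through, then the output ramps down to 0. */
static int clip_new(int x, int s, int bitdepth) {
  const int shift = bitdepth - 3 - ilog2(s);
  const int ax = abs(x);
  int mag;
  assert(s > 0 && shift >= 0);
  mag = imax(0, ax - imax(0, ax - s + (ax >> shift)));
  return x < 0 ? -mag : mag;
}

/* Weighted sum of clipped neighbour differences for the 8-tap layout.
 * p points at the centre sample; stride is the row pitch in samples.
 * The caller must guarantee two valid samples in every direction. */
static int clpf_weighted_sum(const int *p, int stride, int s, int bitdepth) {
  const int x = p[0];
  return 1 * clip_new(p[-2 * stride] - x, s, bitdepth) +
         3 * clip_new(p[-stride] - x, s, bitdepth) +
         1 * clip_new(p[-2] - x, s, bitdepth) +
         3 * clip_new(p[-1] - x, s, bitdepth) +
         3 * clip_new(p[1] - x, s, bitdepth) +
         1 * clip_new(p[2] - x, s, bitdepth) +
         3 * clip_new(p[stride] - x, s, bitdepth) +
         1 * clip_new(p[2 * stride] - x, s, bitdepth);
}
```

With 8-bit input and `s = 4`, a difference of 16 gives 4 under the old clamp but 2 under the new function, and a difference of 32 or more gives 0, so a strongly deviating neighbour no longer contributes to the sum.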
      
      AWCY results: low delay  high delay
      PSNR:           -0.40%     -0.47%
      PSNR HVS:        0.00%     -0.11%
      SSIM:           -0.31%     -0.39%
      CIEDE 2000:     -0.22%     -0.31%
      APSNR:          -0.40%     -0.48%
      MS SSIM:         0.01%     -0.12%
      
      About 3/4 of the gains come from the new clipping function.
      
      Change-Id: Idad9dc4004e71a9c7ec81ba62ebd12fb76fb044a
  7. 08 Feb, 2017 1 commit
  8. 11 Oct, 2016 1 commit
      Clean up and speed up CLPF clipping · e66fc87c
      Steinar Midtskogen authored
      * Move clipping tests from inside to outside loops
      * Pass the clipped block size to clpf_block() as sizex and sizey rather
        than bs for both
      * Make fallback tests to C more accurate
      
      Change-Id: Icdc57540ce21b41a95403fdcc37988a4ebf546c7
  9. 10 Oct, 2016 5 commits
  10. 28 Sep, 2016 1 commit
      Clean up and speed up CLPF clipping · ae95e6db
      Steinar Midtskogen authored
      * Move clipping tests from inside to outside loops
      * Pass the clipped block size to clpf_block() as sizex and sizey rather
        than bs for both
      * Make fallback tests to C more accurate
      
      Change-Id: Icdc57540ce21b41a95403fdcc37988a4ebf546c7
  11. 16 Sep, 2016 1 commit
      Extend CLPF to chroma. · a25c6c3b
      Steinar Midtskogen authored
      Objective quality impact (low latency):
      
      PSNR YCbCr:      0.13%     -1.37%     -1.79%
         PSNRHVS:      0.03%
            SSIM:      0.24%
          MSSSIM:      0.10%
       CIEDE2000:     -0.83%
      
      Change-Id: I8ddf0def569286775f0f9d4d4005932766a7fc27
  12. 13 Sep, 2016 1 commit
  13. 08 Sep, 2016 1 commit
      Reduce memory footprint for CLPF decoding. · eb5794da
      Steinar Midtskogen authored
      Instead of having CLPF write to an entire new frame and
      copy the result back into the original frame, make the
      filter able to work in-place by keeping a buffer of size
      frame_width*filter_block_size and delay the write-back
      by one filter_block_size row.
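The delayed write-back idea can be sketched in C as below. This is not the code from the patch: `filter_row` is a hypothetical stand-in (a vertical [1 2 1]/4 smoother) used only so the example is complete, and the function names and the `bs >= 2` precondition are assumptions of the sketch. The point it shows is that a delay buffer of `w * bs` samples is enough to keep every row the filter still needs in its original, unfiltered state.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in filter: vertical [1 2 1]/4 smoothing of row y,
 * with the edge rows repeated. It reads rows y-1, y and y+1 of src. */
static void filter_row(const uint8_t *src, int w, int h, int y, uint8_t *out) {
  const uint8_t *above = src + (size_t)(y > 0 ? y - 1 : y) * w;
  const uint8_t *cur = src + (size_t)y * w;
  const uint8_t *below = src + (size_t)(y < h - 1 ? y + 1 : y) * w;
  for (int x = 0; x < w; x++)
    out[x] = (uint8_t)((above[x] + 2 * cur[x] + below[x] + 2) >> 2);
}

/* Reference version: filter into a separate destination frame. */
static void filter_copy(const uint8_t *src, uint8_t *dst, int w, int h) {
  for (int y = 0; y < h; y++) filter_row(src, w, h, y, dst + (size_t)y * w);
}

/* In-place version: filtered rows are parked in a ring of bs rows and
 * written back bs rows late, so rows y-1..y+1 are still unfiltered when
 * row y is processed. bs must exceed the filter's vertical reach (1 here).
 * Returns 0 on success, -1 if the buffer cannot be allocated. */
static int filter_in_place(uint8_t *frame, int w, int h, int bs) {
  uint8_t *delay;
  assert(w > 0 && h > 0 && bs >= 2);
  delay = malloc((size_t)w * bs);
  if (!delay) return -1;
  for (int y = 0; y < h; y++) {
    uint8_t *slot = delay + (size_t)(y % bs) * w;
    /* The slot currently holds filtered row y - bs: write it back first. */
    if (y >= bs) memcpy(frame + (size_t)(y - bs) * w, slot, (size_t)w);
    filter_row(frame, w, h, y, slot);
  }
  /* Flush the rows still waiting in the delay buffer. */
  for (int y = (h > bs ? h - bs : 0); y < h; y++)
    memcpy(frame + (size_t)y * w, delay + (size_t)(y % bs) * w, (size_t)w);
  free(delay);
  return 0;
}
```

Running `filter_in_place` and `filter_copy` on the same input should give identical frames. The in-place variant needs `w * bs` extra samples instead of a second full frame.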
      
      This reduces the cycles spent in the filter to ~75% of the previous count.
      
      Change-Id: I78ca74380c45492daa8935d08d766851edb5fbc1
  14. 07 Sep, 2016 1 commit
  15. 01 Sep, 2016 1 commit