1. 20 Jun, 2017 2 commits
  2. 16 Jun, 2017 1 commit
  3. 13 Jun, 2017 1 commit
    • Add fast path quantizer AVX2 · 2d44b697
      Yi Luo authored
      - Function-level performance improves 36% over SSE2.
      - Encoder speeds up 2.6% at user level on i7-6700.
      
      Change-Id: I9e43ce60b1e0de8f532249e5c035851463d75dbb
  4. 09 Jun, 2017 1 commit
  5. 08 Jun, 2017 1 commit
    • Remove deprecated high-bitdepth functions · 31c66502
      Sarah Parker authored
      This unifies the codepath for high-bitdepth transforms and deletes
      all calls to the old deprecated versions. This required reworking
      the way 1d configurations are combined in order to support rectangular
      transforms.
      
      There is one remaining codepath that calls the deprecated 4x4 hbd
      transform from encoder/encodemb.c. I need to take a closer look
      at what is happening there and will leave that for a followup
      since this change has already gotten so large.
      
      lowres 10 bit: -0.035%
      lowres 12 bit: 0.021%
      
      BUG=aomedia:524
      
      Change-Id: I34cdeaed2461ed7942364147cef10d7d21e3779c
  6. 07 Jun, 2017 1 commit
    • Add HBD data path for av1_block_error_avx2 · d61e608d
      Yi Luo authored
      - Add unit test for av1_block_error.
      - Fix av1_dist_block logic for calling av1_block_error.
      
      Change-Id: Id8a47ee113417360a29fc2334d9ca72b5793e2d7
  7. 25 May, 2017 1 commit
    • Add HBD build to av1_quantize_fp_sse2 · bf8af7e6
      Yi Luo authored
      - This change turns on the low-bit-depth data path for
        this function under the default HBD build.
      - Encoder user-level encoding time drops by ~12%
        on i7-6700.
      
      Change-Id: I7ce21e8db1a379f972e51c3b4ab305ca10e41efb
  8. 20 May, 2017 1 commit
    • DPCM intra coding experiment · b8a6fd6b
      hui su authored
      Encode a block line by line, horizontally or vertically. In the vertical
      mode, each row is predicted by the reconstructed row above;
      in the horizontal mode, each column is predicted by the reconstructed
      column to the left.
      
      The DPCM modes are enabled automatically for blocks with horizontal or
      vertical prediction mode, and 1D transform types (ext-tx).
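
The line-by-line scheme described above can be sketched as follows. This is only an illustration under simplifying assumptions: the residual of each line is quantized directly with a hypothetical uniform `step`, whereas the experiment applies a 1D transform to each line; the function names are made up for this sketch and are not from the codebase.

```python
def quantize(residual, step):
    """Round-to-nearest scalar quantizer (a stand-in for the codec's real one)."""
    sign = -1 if residual < 0 else 1
    return sign * ((abs(residual) + step // 2) // step)


def dpcm_vertical(block, top_row, step):
    """Code `block` row by row, predicting each row from the reconstructed row above."""
    prediction = list(top_row)
    levels, recon = [], []
    for row in block:
        row_levels = [quantize(s - p, step) for s, p in zip(row, prediction)]
        # Reconstruct exactly as a decoder would, then reuse it as the next prediction.
        prediction = [p + q * step for p, q in zip(prediction, row_levels)]
        levels.append(row_levels)
        recon.append(prediction)
    return levels, recon


def dpcm_horizontal(block, left_col, step):
    """Column-by-column variant: transpose, run the vertical mode, transpose back."""
    transposed = [list(col) for col in zip(*block)]
    levels_t, recon_t = dpcm_vertical(transposed, left_col, step)
    return ([list(r) for r in zip(*levels_t)],
            [list(r) for r in zip(*recon_t)])
```

Predicting from the reconstructed line rather than the source line keeps encoder and decoder in sync, since the decoder never sees the source.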
      
      Change-Id: I133ab6b537fa24a6e314ee1ef1d2fe9bd9d56c13
  9. 11 May, 2017 2 commits
    • Fix build with global motion disabled · ea166870
      Alex Converse authored
      Change-Id: I1c00925f83c6a858b0e799ddd90f241570a40575
    • Vectorize corner matching function · ee674323
      David Barker authored
      Add an SSE4 version of compute_cross_correlation() from
      corner_match.c. This function is about 3.4x the speed of
      the scalar code; determine_correspondence as a whole is about
      2.5-3x the speed it was previously.
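
For readers unfamiliar with the operation being vectorized, here is a scalar sketch of a normalized cross-correlation between two flattened pixel patches. It is a generic textbook form, not necessarily the normalization used by compute_cross_correlation() in corner_match.c; the function name and patch layout here are assumptions.

```python
import math


def cross_correlation(patch1, patch2):
    """Normalized cross-correlation of two equal-length pixel lists, in [-1, 1]."""
    n = len(patch1)
    sum1, sum2 = sum(patch1), sum(patch2)
    sumsq1 = sum(a * a for a in patch1)
    sumsq2 = sum(b * b for b in patch2)
    cross = sum(a * b for a, b in zip(patch1, patch2))
    # Scaled variances and covariance (all multiplied by n * n).
    var1 = n * sumsq1 - sum1 * sum1
    var2 = n * sumsq2 - sum2 * sum2
    cov = n * cross - sum1 * sum2
    if var1 == 0 or var2 == 0:
        return 0.0  # a flat patch carries no structure to match
    return cov / math.sqrt(var1 * var2)
```

The five running sums are independent of each other, which is what makes a loop like this a natural fit for SIMD accumulation.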
      
      BUG=aomedia:487
      
      Change-Id: I707b7cfd5c513c025d3ee7fb6a5f1fa335ecd495
  10. 05 May, 2017 1 commit
  11. 04 May, 2017 1 commit
    • Add SSSE3 warp filter + const-ify warp filters · d8a423c6
      David Barker authored
      The SSSE3 filter is very similar to the SSE2 filter, but
      the horizontal pass is sped up by using the 8x8->16
      multiplies added in SSSE3.
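
The commit does not name the instruction, but the 8x8->16 multiply introduced with SSSE3 is presumably PMADDUBSW (`_mm_maddubs_epi16`). Assuming so, a scalar model of what it computes on one 128-bit register looks like this; the Python function name is just for illustration.

```python
def maddubs_epi16(a_u8, b_s8):
    """Model of _mm_maddubs_epi16: unsigned bytes times signed bytes,
    adjacent products summed into eight saturated signed 16-bit lanes."""
    assert len(a_u8) == 16 and len(b_s8) == 16
    out = []
    for i in range(0, 16, 2):
        total = a_u8[i] * b_s8[i] + a_u8[i + 1] * b_s8[i + 1]
        out.append(max(-32768, min(32767, total)))  # signed saturation
    return out
```

With pixels as the unsigned operand and filter taps as the signed operand, one instruction performs two taps of a horizontal filter for eight outputs, which is consistent with the horizontal-pass speedup reported here.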
      
      Also apply const-correctness to all versions of the filter.
      
      The timings of the existing filters are unchanged, and the
      lowbd SSSE3 filter is ~17% faster than the lowbd SSE2 filter.
      
      Timings per 8x8 block:
      lowbd SSE2: 320ns
      lowbd SSSE3: 273ns
      highbd SSSE3: 300ns
      
      Filter output is unchanged.
      
      Change-Id: Ifb428a33b106d900cde1b080794796c0754ae182
  12. 20 Apr, 2017 1 commit
    • Drop support for CONFIG_EMULATE_HARDWARE · c6a48a25
      Sebastien Alaiwan authored
      This experiment complicates DSP function dispatch without bringing
      any real value (it's non-normative, arbitrary behaviour).
      Moreover, it only affects obsolete transforms; the new ones
      don't implement this mechanism.
      
      Change-Id: Idaccdd0c14ed6b7008cd4f365c7f017ba8ccacf5
  13. 12 Apr, 2017 1 commit
  14. 10 Apr, 2017 1 commit
  15. 06 Apr, 2017 2 commits
  16. 05 Apr, 2017 1 commit
    • CDEF: Add damping to dering · 8ff52fcc
      Steinar Midtskogen authored
      high-latency, cpu-used=0:
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -0.1650 |  0.2545 |  0.2977 |  -0.0423 | -0.0947 | -0.0725 |    -0.0365
      
      low-latency, cpu-used=0:
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -0.4006 |  0.0501 | -0.0108 |  -0.1790 | -0.1660 | -0.1992 |    -0.2135
      
      low-latency, cpu-used=4:
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -0.5508 | -0.2445 | -0.2762 |  -0.1981 | -0.2878 | -0.2228 |    -0.3733
      
      Change-Id: Ia20df28c8bbb6182215b02016053af33bd498145
  17. 04 Apr, 2017 1 commit
  18. 01 Apr, 2017 2 commits
  19. 31 Mar, 2017 1 commit
    • RTCD defs: Remove empty specialize statements once and for all. · 5ddac0aa
      Urvang Joshi authored
      A similar cleanup happened before, but the empty statements have since
      reappeared. I added a check in the 'specialize' subroutine to die whenever
      such an empty specialize call is found, so that config+make would fail.
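
The fail-fast idea can be modeled as below. This is a Python sketch of the behaviour described, not the actual RTCD script code; `registry`, `RtcdError` and the function names passed in are hypothetical.

```python
class RtcdError(Exception):
    """Raised when an RTCD definition is malformed."""


def specialize(registry, fn, *extensions):
    """Record the ISA extensions that have a specialized version of `fn`.

    An empty extension list is treated as a hard error, so a stray
    `specialize fn;` line stops the build instead of lingering silently.
    """
    if not extensions:
        raise RtcdError("'specialize %s' lists no extensions" % fn)
    registry.setdefault(fn, []).extend(extensions)
```

Failing at configuration time means the cleanup cannot regress unnoticed, which is the stated goal of the change.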
      
      Change-Id: I300ca0f0b077c0aeca8096d6460d8fb1c364d9b9
  20. 30 Mar, 2017 2 commits
  21. 29 Mar, 2017 1 commit
  22. 21 Mar, 2017 1 commit
  23. 20 Mar, 2017 1 commit
    • Fix two bugs in highbitdepth self-guided filter · 7e08ac3f
      David Barker authored
      This filter was temporarily removed due to test failures.
      This patch reintroduces the filter and fixes two bugs:
      
      * The test cases would occasionally segfault on x86, since
        the highbd filter requires its inputs to be aligned to
        16 bytes. This will always be true when used on real videos,
        so adjust the test cases to match.
      
      * The function calc_block was incorrect for bit_depth > 8,
        due to passing an incorrect argument to _mm_srl_epi32().
        This was the cause of the original test failures.
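
The commit does not say which argument was wrong. As background only, the sketch below models the semantics of `_mm_srl_epi32()`, which are easy to misuse: the shift count is read from the low 64 bits of the second operand as a single integer, and any count above 31 zeroes every lane. The Python function is an illustrative model, with each register represented as four 32-bit lanes, lane 0 first.

```python
def srl_epi32(a, count_vec):
    """Model of _mm_srl_epi32 on four unsigned 32-bit lanes."""
    # The count is the low 64 bits of the count register, i.e. lanes 0 and 1 together.
    count = (count_vec[0] & 0xFFFFFFFF) | ((count_vec[1] & 0xFFFFFFFF) << 32)
    if count > 31:
        return [0, 0, 0, 0]
    return [(x & 0xFFFFFFFF) >> count for x in a]
```

For example, a count register with the shift in lane 0 only behaves as expected, while one with the shift broadcast to all four lanes yields a 64-bit count far above 31 and therefore all zeros.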
      
      BUG=aomedia:392
      
      Change-Id: Ia06b76c3e6122eebadd0995fb62f32c2fcab8b3e
  24. 17 Mar, 2017 1 commit
    • Merge dering/clpf rdo and filtering · a9d41e88
      Steinar Midtskogen authored
      * Dering and clpf were merged into a single pass.
      * 32x32 and 128x128 filter block sizes for clpf were removed.
      * RDO for dering and clpf merged and improved:
        - "0" no longer required to be in the strength selection
        - Dering strength can now be 0, 1 or 2 bits per block
      
                    LL    HL
      PSNR:       -0.04 -0.01
      PSNR HVS:   -0.27 -0.18
      SSIM:       -0.15 +0.01
      CIEDE 2000: -0.11 -0.03
      APSNR:      -0.03 -0.00
      MS SSIM:    -0.18 -0.11
      
      Change-Id: I9f002a16ad218eab6007f90f1f176232443495f0
  25. 13 Mar, 2017 1 commit
    • Remove a sse4_1 function · def28b24
      Yaowu Xu authored
      Function apply_selfguided_restoration_highbd_sse4_1() produces a
      mismatch with the C version, so it is removed for now to allow
      investigation and a fix.
      
      BUG=aomedia:392
      
      Change-Id: Ic55e7a6958112c02930b1d5f3af2e2ea089fe500
  26. 10 Mar, 2017 2 commits
    • Vectorize new highpass filter for loop-restoration · eed824ef
      David Barker authored
      Change-Id: Ibe5d4933f599456cb496f636de244694bc786a4c
    • Replace one self guided filter with highpass · b7bb0976
      Debargha Mukherjee authored
      Adds an option controlled by a macro to replace one of
      the guided filters in the self-guided tool with a simple
      bandpass filtered version generated with a 3x3 kernel.
      By default the macro USE_HIGHPASS_IN_SGRPROJ is 0 (turned
      off), which leaves the dual self-guided filter in use.
      When the macro is turned on, the larger radius guided
      filter is replaced by a simpler filter that is much faster.
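
To give a sense of why a 3x3 kernel is so much cheaper than a guided filter, here is a minimal sketch of a 3x3 filter pass over an image. The kernel below is a generic zero-sum highpass chosen for illustration; the commit does not list the coefficients actually used, and the edge handling (replication) is likewise an assumption.

```python
# Hypothetical zero-sum highpass kernel; NOT taken from the commit.
HYPOTHETICAL_KERNEL = [[-1, -1, -1],
                       [-1, 8, -1],
                       [-1, -1, -1]]


def filter3x3(img, kernel):
    """Apply a 3x3 kernel to a 2D list of pixels, replicating edge pixels."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), h - 1)  # clamp to the image
                    cc = min(max(c + dc, 0), w - 1)
                    acc += kernel[dr + 1][dc + 1] * img[rr][cc]
            out[r][c] = acc
    return out
```

Each output pixel costs nine multiply-adds with no per-pixel statistics, whereas a guided filter needs local means and variances over its window.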
      
      Results (if USE_HIGHPASS_IN_SGRPROJ is on vs. off):
      lowres: +0.14% BDRATE (performance drop)
      midres: +0.27% BDRATE (performance drop)
      
      Further experiments on this variation of guided filters are
      pending.
      
      Change-Id: I7bbcfcad7ee266cd49a8dc6d96795a454feb1a94
  27. 09 Mar, 2017 1 commit
  28. 08 Mar, 2017 1 commit
    • Make encoder use vectorized self-guided filter · 506eb723
      David Barker authored
      By rearranging the code in restoration.c, we can allow the
      encoder to use the SSE4.1 version of the self-guided filter
      while picking the loop-restoration filter.
      
      This also helps us prepare for adding a highbitdepth SSE4.1
      version of the self-guided filter.
      
      No effect on encoder output, but gives an end-to-end speedup
      of 1-2%.
      
      Change-Id: Id17ba4a0963ddce9f70a7cae666e212e138d5f2c
  29. 06 Mar, 2017 1 commit
    • Vectorize self-guided filter · ce110cc5
      David Barker authored
      Add an SSE4.1 lowbd version of the self-guided filter for
      loop-restoration, and apply some optimizations to the C
      version.
      
      Approximate times per 128x128 / 256x256 tile on the machine
      this was developed on:
      Previous C:  620us / 2800us
      Optimized C: 500us / 2200us ( 24% /  27% faster)
      SSE4.1:      147us / 600us  (320% / 370% faster)
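
The "% faster" figures above appear to follow the convention old_time / new_time - 1. A small check under that assumption (the helper name is made up for this sketch):

```python
def percent_faster(old_time, new_time):
    """Speedup expressed as 'percent faster': 100 * (old / new - 1)."""
    return (old_time / new_time - 1.0) * 100.0


# Timings in microseconds, taken from the table above.
optimized_c = (percent_faster(620, 500), percent_faster(2800, 2200))
sse41 = (percent_faster(620, 147), percent_faster(2800, 600))
```

Under this convention the optimized C figures come out at about 24% and 27%, and the SSE4.1 figures at about 322% and 367%, matching the rounded 320% and 370% quoted.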
      
      Change-Id: I23ff5a5482a191aeb06f9d1f767a9f036bb357fe
  30. 02 Mar, 2017 1 commit
    • Use 3-tap spatial filter in FILTER_INTRA experiment · 8d8638a1
      Yue Chen authored
      3-tap recursive intra prediction filters are added.
      Macro USE_3TAP_INTRA_FILTER is set to 1 to use 3-tap by default.
      Coding gain of the FILTER_INTRA experiment in AWCY (high delay, 150 frames):
      3-tap: 0.51%
      4-tap: 0.68%
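
A 3-tap recursive intra predictor can be sketched as below, assuming the three taps are applied to the above, left and above-left neighbours, with already-predicted pixels feeding later ones. The tap values, the fixed-point `shift` and the 8-bit clamp are hypothetical; the experiment's actual coefficients are not given in the commit message.

```python
def filter_intra_3tap(above, left, above_left, taps, shift):
    """Predict an h x w block recursively from its border.

    taps = (c_above, c_left, c_above_left), integers scaled by 1 << shift.
    """
    h, w = len(left), len(above)
    # Grid with one extra row and column holding the known border pixels.
    grid = [[0] * (w + 1) for _ in range(h + 1)]
    grid[0] = [above_left] + list(above)
    for r in range(h):
        grid[r + 1][0] = left[r]
    c_a, c_l, c_al = taps
    rounding = 1 << (shift - 1)
    for r in range(1, h + 1):
        for c in range(1, w + 1):
            v = (c_a * grid[r - 1][c] + c_l * grid[r][c - 1]
                 + c_al * grid[r - 1][c - 1])
            # Each predicted pixel becomes an input for its right and lower neighbours.
            grid[r][c] = min(255, max(0, (v + rounding) >> shift))
    return [row[1:] for row in grid[1:]]
```

For instance, taps of (16, 16, -16) with shift 4 reduce to above + left - above_left, which propagates a planar gradient across the block.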
      
      Change-Id: I44192dd08bfd8155f58a9b0b5cf1de88fceb762e
  31. 28 Feb, 2017 1 commit
    • Add SIMD code for PVQ search · 3a88de8f
      Michael Bebenita authored
      This reduces the share of pvq_search_rdo_double in the runtime profile
      from 37% to 15% and improves overall encoding speed when PVQ is enabled by ~40%.
      The SIMD code is not bit accurate with the C version and introduces a
      slight PSNR regression on AWCY:
      
        PSNR | PSNR Cb | PSNR Cr | PSNR HVS | SSIM | MS SSIM | CIEDE 2000
      0.0607 |  0.1044 |     N/A |   0.0126 |  N/A | -0.0309 |        N/A
      
      Change-Id: Ie22cebc62df2e72618305f2268668d79167860c6
  32. 24 Feb, 2017 1 commit
    • Let hbd conv func be flexible · 0a2c0cbc
      Angie Chiang authored
      This CL allows us to change filter coefficients easily for SIMD
      implementations of high bitdepth convolution functions.
      
      Change-Id: I454a5c76d3ba9e4454118c6a9d87737b3aa24898
  33. 18 Feb, 2017 1 commit
  34. 19 Jan, 2017 1 commit