1. 07 Jul, 2017 1 commit
    • Lester Lu's avatar
      Signature changes for the LGT experiment · d8b1ddce
      Lester Lu authored
      The input arguments of av1_fht* and av1_iht* functions (and their
      HBD versions) are slightly changed. Input arguments tx_type and
      bd are carried by a struct fwd_txfm_param/inv_txfm_param. This
      struct is meant to later on carry other prediction information,
      such as intra top/left boundaries to the transform level, so
      that the choice of transforms can be more adaptive to the
      prediction mode and local video content.
      
      Change-Id: Ia42544248a51845be64b72855b642ef1fe5910a9
      d8b1ddce
  2. 28 Jun, 2017 1 commit
  3. 27 Jun, 2017 1 commit
    • Yi Luo's avatar
      Fix inv txfm low/high bitdepth selection logic · 51281095
      Yi Luo authored
      We are going to have several commits to setup new low/high
      bitdepth data path selection logic. This patch is for inverse
      transform. Let me summarize the ideas as following.
      
      - For low/high bitdepth selection, encoder depends on
        input configuration, e.g., video sequence bitdepth,
        profile. Decoder depends on input bitstream. This has
        nothing to do with compiler/build  configuration.
      
      - Typical encoder usage for sampling format 4:2:0.
        1) 8-bit video sequence:
         a) --profile=0
         Fastest encoding/decoding pipeline on speedup.
      
         b) --profile=2 --bit-depth=10
         Image pixels are left shifted by 2 bits. It
         employs 16-bit reference frame buffer and has high
         calculation precision. It usually enjoys higher
         compression performance.
      
        2) 10/12-bit video sequence (HDR):
         --profile=2 --bit-depth=10/12
      
      - Transform coefficient type:
        Lowbitdepth:  int16_t
        Highbitdepth: int32_t
      
      - The type, tran_low_t is still used in codebase,
        Which is int32_t, defining the data path capacity.
        Naturally, it is high bitdepth.
      
      Eventually we shall remove the configuration flags,
      CONFIG_HIGHBITDEPTH/CONFIG_LOWBITDEPTH, and seperate
      low and high bitdepth data path. Two data paths co-exist
      in the same build environment.
      
      Change-Id: I35c06d4d4f19ebf80d909168fdddbae57c3cc884
      51281095
  4. 21 Jun, 2017 1 commit
  5. 20 Jun, 2017 3 commits
  6. 16 Jun, 2017 1 commit
  7. 13 Jun, 2017 1 commit
    • Yi Luo's avatar
      Add fast path quantizer AVX2 · 2d44b697
      Yi Luo authored
      - Function level improves 36% against sse2.
      - Encoder speeds up 2.6% at user level on i7-6700.
      
      Change-Id: I9e43ce60b1e0de8f532249e5c035851463d75dbb
      2d44b697
  8. 09 Jun, 2017 1 commit
  9. 08 Jun, 2017 1 commit
    • Sarah Parker's avatar
      Remove deprecated high-bitdepth functions · 31c66502
      Sarah Parker authored
      This unifies the codepath for high-bitdepth transforms and deletes
      all calls to the old deprecated versions. This required reworking
      the way 1d configurations are combined in order to support rectangular
      transforms.
      
      There is one remaining codepath that calls the deprecated 4x4 hbd
      transform from encoder/encodemb.c. I need to take a closer look
      at what is happening there and will leave that for a followup
      since this change has already gotten so large.
      
      lowres 10 bit: -0.035%
      lowres 12 bit: 0.021%
      
      BUG=aomedia:524
      
      Change-Id: I34cdeaed2461ed7942364147cef10d7d21e3779c
      31c66502
  10. 07 Jun, 2017 1 commit
    • Yi Luo's avatar
      Add HBD data path for av1_block_error_avx2 · d61e608d
      Yi Luo authored
      - Add unit test for av1_block_error.
      - Fix av1_dist_block logic for calling av1_block_error.
      
      Change-Id: Id8a47ee113417360a29fc2334d9ca72b5793e2d7
      d61e608d
  11. 25 May, 2017 1 commit
    • Yi Luo's avatar
      Add HBD build to av1_quantize_fp_sse2 · bf8af7e6
      Yi Luo authored
      - This change turns on low bit depth data path for
        this function under default HBD build.
      - Encoder user level encoding time reduces ~12%
        on i7-6700.
      
      Change-Id: I7ce21e8db1a379f972e51c3b4ab305ca10e41efb
      bf8af7e6
  12. 20 May, 2017 1 commit
    • hui su's avatar
      DPCM intra coding experiment · b8a6fd6b
      hui su authored
      Encode a block line by line, horizontally or vertically. In the vertical
      mode, each row is predicted by the reconsturcted row above;
      in the horizontal mode, each column is predicted by the reconstructed
      column to the left.
      
      The DPCM modes are enabled automatically for blocks with horizontal or
      vertical prediction mode, and 1D transform types (ext-tx).
      
      Change-Id: I133ab6b537fa24a6e314ee1ef1d2fe9bd9d56c13
      b8a6fd6b
  13. 11 May, 2017 2 commits
    • Alex Converse's avatar
      Fix build with global motion disabled · ea166870
      Alex Converse authored
      Change-Id: I1c00925f83c6a858b0e799ddd90f241570a40575
      ea166870
    • David Barker's avatar
      Vectorize corner matching function · ee674323
      David Barker authored
      Add an SSE4 version of compute_cross_correlation() from
      corner_match.c. This function is about 3.4x the speed of
      the scalar code; determine_correspondence as a whole is about
      2.5-3x the speed it was previously.
      
      BUG=aomedia:487
      
      Change-Id: I707b7cfd5c513c025d3ee7fb6a5f1fa335ecd495
      ee674323
  14. 05 May, 2017 1 commit
  15. 04 May, 2017 1 commit
    • David Barker's avatar
      Add SSSE3 warp filter + const-ify warp filters · d8a423c6
      David Barker authored
      The SSSE3 filter is very similar to the SSE2 filter, but
      the horizontal pass is sped up by using the 8x8->16
      multiplies added in SSSE3.
      
      Also apply const-correctness to all versions of the filter
      
      The timings of the existing filters are unchanged, and the
      lowbd SSSE3 filter is ~17% faster than the lowbd SSE2 filter.
      
      Timings per 8x8 block:
      lowbd SSE2: 320ns
      lowbd SSSE3: 273ns
      highbd SSSE3: 300ns
      
      Filter output is unchanged.
      
      Change-Id: Ifb428a33b106d900cde1b080794796c0754ae182
      d8a423c6
  16. 20 Apr, 2017 1 commit
    • Sebastien Alaiwan's avatar
      Drop support for CONFIG_EMULATE_HARDWARE · c6a48a25
      Sebastien Alaiwan authored
      This experiment complexifies DSP function dispatch, without bringing
      any real value (it's non-normative arbitrary behaviour).
      Moreover, it only has an effect on obsolete transforms, the new ones
      don't implement this mechanism.
      
      Change-Id: Idaccdd0c14ed6b7008cd4f365c7f017ba8ccacf5
      c6a48a25
  17. 12 Apr, 2017 1 commit
  18. 10 Apr, 2017 1 commit
  19. 06 Apr, 2017 2 commits
  20. 05 Apr, 2017 1 commit
    • Steinar Midtskogen's avatar
      CDEF: Add damping to dering · 8ff52fcc
      Steinar Midtskogen authored
      high-latency, cpu-used=0:
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -0.1650 |  0.2545 |  0.2977 |  -0.0423 | -0.0947 | -0.0725 |    -0.0365
      
      low-latency, cpu-used=0:
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -0.4006 |  0.0501 | -0.0108 |  -0.1790 | -0.1660 | -0.1992 |    -0.2135
      
      low latency, cpu-used=4:
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -0.5508 | -0.2445 | -0.2762 |  -0.1981 | -0.2878 | -0.2228 |    -0.3733
      
      Change-Id: Ia20df28c8bbb6182215b02016053af33bd498145
      8ff52fcc
  21. 04 Apr, 2017 1 commit
  22. 01 Apr, 2017 2 commits
  23. 31 Mar, 2017 1 commit
    • Urvang Joshi's avatar
      RTCD defs: Remove empty specialize statements once and for all. · 5ddac0aa
      Urvang Joshi authored
      A similar cleanup happened before, but the empty statements have since
      reappeared. I added a check in 'specialize' subroutine to die whenever
      such an empty specialize call is found, so that config+make would fail.
      
      Change-Id: I300ca0f0b077c0aeca8096d6460d8fb1c364d9b9
      5ddac0aa
  24. 30 Mar, 2017 2 commits
  25. 29 Mar, 2017 1 commit
  26. 21 Mar, 2017 1 commit
  27. 20 Mar, 2017 1 commit
    • David Barker's avatar
      Fix two bugs in highbitdepth self-guided filter · 7e08ac3f
      David Barker authored
      This filter was temporarily removed due to test failures.
      This patch reintroduces the filter and fixes two bugs:
      
      * The test cases would occasionally segfault on x86, since
        the highbd filter requires its inputs to be aligned to
        16 bytes. This will always be true when used on real videos,
        so adjust the test cases to match.
      
      * The function calc_block was incorrect for bit_depth > 8,
        due to passing an incorrect argument to _mm_srl_epi32().
        This was the cause of the original test failures.
      
      BUG=aomedia:392
      
      Change-Id: Ia06b76c3e6122eebadd0995fb62f32c2fcab8b3e
      7e08ac3f
  28. 17 Mar, 2017 1 commit
    • Steinar Midtskogen's avatar
      Merge dering/clpf rdo and filtering · a9d41e88
      Steinar Midtskogen authored
      * Dering and clpf were merged into a single pass.
      * 32x32 and 128x128 filter block sizes for clpf were removed.
      * RDO for dering and clpf merged and improved:
        - "0" no longer required to be in the strength selection
        - Dering strength can now be 0, 1 or 2 bits per block
      
                    LL    HL
      PSNR:       -0.04 -0.01
      PSNR HVS:   -0.27 -0.18
      SSIM:       -0.15 +0.01
      CIEDE 2000: -0.11 -0.03
      APSNR:      -0.03 -0.00
      MS SSIM:    -0.18 -0.11
      
      Change-Id: I9f002a16ad218eab6007f90f1f176232443495f0
      a9d41e88
  29. 13 Mar, 2017 1 commit
    • Yaowu Xu's avatar
      Remove a sse4_1 function · def28b24
      Yaowu Xu authored
      Function apply_selfguided_restoration_highbd_sse4_1() is producing
      mismatch to c version, it is removed for now, allowing investigation
      and fix.
      
      BUG=aomedia:392
      
      Change-Id: Ic55e7a6958112c02930b1d5f3af2e2ea089fe500
      def28b24
  30. 10 Mar, 2017 2 commits
    • David Barker's avatar
      Vectorize new highpass filter for loop-restoration · eed824ef
      David Barker authored
      Change-Id: Ibe5d4933f599456cb496f636de244694bc786a4c
      eed824ef
    • Debargha Mukherjee's avatar
      Replace one self guided filter with highpass · b7bb0976
      Debargha Mukherjee authored
      Adds an option controlled by a macro to replace one of
      the guided filters in the self-guided tool with a simple
      bandpass filtered version generated with a 3x3 kernel.
      By default the macro USE_HIGHPASS_IN_SGRPROJ is 0 (turned
      off), that defaults us to the dual self-guided filter.
      When the macro is turned on, the larger radius guided
      filter is replaced by a simpler filter that is much faster.
      
      Results (if USE_HIGHPASS_IN_SGRPROJ is on vs. off):
      lowres: performance drop by +0.14% (BDRATE)
      midres: performance drop by +0.27% (BDRATE)
      
      Further experiments on this variation of guided filters is
      pending.
      
      Change-Id: I7bbcfcad7ee266cd49a8dc6d96795a454feb1a94
      b7bb0976
  31. 09 Mar, 2017 1 commit
  32. 08 Mar, 2017 1 commit
    • David Barker's avatar
      Make encoder use vectorized self-guided filter · 506eb723
      David Barker authored
      By rearranging the code in restoration.c, we can allow the
      encoder to use the SSE4.1 version of the self-guided filter
      while picking the loop-restoration filter.
      
      This also helps us prepare for adding a highbitdepth SSE4.1
      version of the self-guided filter.
      
      No effect on encoder output, but gives an end-to-end speedup
      of 1-2%.
      
      Change-Id: Id17ba4a0963ddce9f70a7cae666e212e138d5f2c
      506eb723
  33. 06 Mar, 2017 1 commit
    • David Barker's avatar
      Vectorize self-guided filter · ce110cc5
      David Barker authored
      Add an SSE4.1 lowbd version of the self-guided filter for
      loop-restoration, and apply some optimizations to the C
      version.
      
      Approximate times per 128x128 / 256x256 tile on the machine
      this was developed on:
      Previous C:  620us / 2800us
      Optimized C: 500us / 2200us ( 24% /  27% faster)
      SSE4.1:      147us / 600us  (320% / 370% faster)
      
      Change-Id: I23ff5a5482a191aeb06f9d1f767a9f036bb357fe
      ce110cc5