1. 02 Feb, 2018 1 commit
    • AVX2 implementation of the Wiener filter · aab6aee3
      Imdad Sardharwalla authored
      Added an AVX2 version of the Wiener filter, along with associated tests. Speed
      tests have been added for all implementations of the Wiener filter.
      
      Speed Test results
      ==================
      
      GCC
      ---
      
      Low bit-depth filter:
      - SSE2 vs C: SSE2 takes ~92% less time
      - AVX2 vs C: AVX2 takes ~96% less time
      - SSE2 vs AVX2: AVX2 takes ~43% less time (~74% faster)
      
      High bit-depth filter:
      - SSSE3 vs C: SSSE3 takes ~92% less time
      - AVX2  vs C: AVX2  takes ~96% less time
      - SSSE3 vs AVX2: AVX2 takes ~46% less time (~84% faster)
      
      CLANG
      -----
      
      Low bit-depth filter:
      - SSE2 vs C: SSE2 takes ~84% less time
      - AVX2 vs C: AVX2 takes ~88% less time
      - SSE2 vs AVX2: AVX2 takes ~27% less time (~36% faster)
      
      High bit-depth filter:
      - SSSE3 vs C: SSSE3 takes ~85% less time
      - AVX2  vs C: AVX2  takes ~89% less time
      - SSSE3 vs AVX2: AVX2 takes ~24% less time (~31% faster)
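
The "less time" and "faster" figures in the lists above are two views of the same ratio. A minimal sketch of the conversion, assuming "faster" means throughput gain (the helper name `percent_faster` is hypothetical and not part of the commit):

```python
def percent_faster(percent_less_time: float) -> float:
    """Convert "takes p% less time" into "runs q% faster" (throughput gain)."""
    remaining = 1.0 - percent_less_time / 100.0  # fraction of the original time
    if remaining <= 0.0:
        raise ValueError("time reduction must be below 100%")
    return (1.0 / remaining - 1.0) * 100.0


# Rounded inputs taken from the lists above.
for p in (43, 46, 27, 24):
    print(f"{p}% less time -> ~{percent_faster(p):.0f}% faster")
```

Fed the rounded inputs, this lands within about one percentage point of the quoted "faster" figures; the small gap is presumably because the commit computed them from unrounded timings.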
      
      Change-Id: Ide22d7c09c0be61483e9682caf17a39438e4a208
  2. 29 Jan, 2018 1 commit
  3. 17 Jan, 2018 1 commit
    • Add utilities to aom_dsp for modeling correlated noise. · ed25a610
      Neil Birkbeck authored
      The auto-regressive model allows for different window shapes
      and different lag sizes.
      
      Although most likely to be used as a reference for modeling
      noise in AV1, the model is currently parameterized more generally
      than AV1 needs.
      
      I will add an example (hopefully with a denoiser) in future
      commits.
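
For orientation, a minimal sketch of what an auto-regressive model with a configurable lag can look like. It assumes a square causal window in raster order, which is only one possible window shape; the names `causal_offsets` and `ar_predict` are hypothetical and are not the aom_dsp API:

```python
def causal_offsets(lag):
    """(dy, dx) offsets of already-visited neighbours in raster order."""
    offsets = []
    for dy in range(-lag, 1):
        for dx in range(-lag, lag + 1):
            if dy == 0 and dx >= 0:
                break  # stop at the current sample; later samples are not causal
            offsets.append((dy, dx))
    return offsets


def ar_predict(img, y, x, coeffs, lag):
    """Predict img[y][x] as a weighted sum of its causal neighbours."""
    return sum(c * img[y + dy][x + dx]
               for c, (dy, dx) in zip(coeffs, causal_offsets(lag)))
```

With this window shape a lag of `L` yields `2 * L * (L + 1)` coefficients, so the lag size directly controls how many parameters the model has to fit.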
      
      Change-Id: I1ba1067543601c2c01db4970d42766bb35da77f0
  4. 08 Jan, 2018 1 commit
  5. 27 Dec, 2017 1 commit
  6. 05 Dec, 2017 1 commit
  7. 25 Nov, 2017 1 commit
    • Split big file into two · 20dadeae
      Sebastien Alaiwan authored
      The file sad.c alone takes 35 seconds to compile.
      This often happens to be on the build critical path.
      Split it into two source files so they can be compiled in parallel.
      
      Change-Id: I35636d8a3da9d67edb8dbf202fd5e7a687a6aaa9
  8. 22 Nov, 2017 2 commits
    • JNT_COMP: add ssse3 implementations for sad_avg · d0179a6b
      Cheng Chen authored
      Add ssse3 implementations for the sad_avg C function at low bit-depth.
      With this, all aom_jnt_sad C functions can have SIMD implementations.
      This CL follows the existing macro definitions for multiple combinations
      of block sizes.
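
As a scalar reference for what a sad_avg-style function computes, here is a minimal sketch. It assumes an equal-weight rounded average of the two predictions; the weighting the jnt variants use is not stated in this commit message, and the function below is illustrative rather than the libaom code:

```python
def sad_avg(src, ref, second_pred):
    """SAD between src and the rounded average of ref and second_pred."""
    total = 0
    for s_row, r_row, p_row in zip(src, ref, second_pred):
        for s, r, p in zip(s_row, r_row, p_row):
            comp = (r + p + 1) >> 1  # rounded average of the two predictions
            total += abs(s - comp)
    return total
```

A SIMD version performs the same average and absolute-difference accumulation on many pixels per instruction, which is why one implementation is needed per block size.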
      
      Change-Id: I882343684026525f5589a239337cfac2dd411e11
    • JNT_COMP: SIMD implementation for aom_jnt_sub_pixel_avg · d286443c
      Cheng Chen authored
      Change function names and add SIMD implementations for two C functions:
      (1) var_filter_block2d_bil_first_pass
      (2) var_filter_block2d_bil_second_pass
      
      This CL allows aom_jnt_sub_pixel_avg_variance to run in SIMD.
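
The two passes named above form a separable bilinear interpolation. A minimal scalar sketch, assuming 2-tap filters whose taps sum to `1 << FILTER_BITS` with `FILTER_BITS = 7`; the constant, the signatures, and the helper names are assumptions made for illustration, not the libaom definitions:

```python
FILTER_BITS = 7  # assumed: the two taps sum to 1 << FILTER_BITS


def bil_first_pass(src, width, height, taps):
    """Horizontal 2-tap filter; each input row needs width + 1 samples."""
    rnd = 1 << (FILTER_BITS - 1)
    return [[(row[x] * taps[0] + row[x + 1] * taps[1] + rnd) >> FILTER_BITS
             for x in range(width)]
            for row in src[:height]]


def bil_second_pass(mid, width, height, taps):
    """Vertical 2-tap filter; mid must hold height + 1 rows."""
    rnd = 1 << (FILTER_BITS - 1)
    return [[(mid[y][x] * taps[0] + mid[y + 1][x] * taps[1] + rnd) >> FILTER_BITS
             for x in range(width)]
            for y in range(height)]
```

For a `w` x `h` block, the first pass runs over `h + 1` rows so that the second pass has the extra row it needs: `bil_second_pass(bil_first_pass(src, w, h + 1, xtaps), w, h, ytaps)`.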
      
      Change-Id: Ib41ef13d62ae91a0ca481bcebb24568dcd4722c4
  9. 21 Nov, 2017 1 commit
  10. 09 Nov, 2017 1 commit
    • cmake: Silence ranlib warnings when jnt_comp is disabled. · 2ca19f90
      Tom Finegan authored
      Build aom_dsp/x86/variance_ssse3.c and
      av1/common/x86/convolve_2d_sse4.c only when jnt_comp is enabled.
      
      When jnt_comp is disabled the sources define no symbols and
      ranlib complains.
      
      Change-Id: I3de42be6a88bd65459799d3a523e307c40a36a72
  11. 06 Nov, 2017 1 commit
  12. 31 Oct, 2017 1 commit
  13. 10 Oct, 2017 2 commits
    • Highbd D45E intrapred SSE2/AVX2 speedup · 56ad3dd3
      Yi Luo authored
      Function  SSE2 vs C  AVX2 vs C
      4x4       ~4.5x
      4x8       ~4.5x
      8x4       ~11.7x
      8x8       ~12.7x
      8x16      ~14.0x
      16x8                 ~21.7x
      16x16                ~24.0x
      16x32                ~28.7x
      32x16                ~20.5x
      32x32                ~24.4x
      
      Change-Id: Iaca49727d8df17b7f793b774a8d51a401ef8a8d1
    • Migrate some vp9 highbd intrapred x86 speedup to av1 · 71b6e043
      Yi Luo authored
      Function speedup on i7-6700:
      D117   sse2   ssse3
      4x4    ~1.8x
      8x8           ~3.4x
      16x16         ~5.5x
      32x32         ~2.9x
      
      D135   sse2   ssse3
      4x4    ~1.9x
      8x8           ~3.3x
      16x16         ~5.3x
      32x32         ~3.6x
      
      D153   sse2   ssse3
      4x4    ~1.9x
      8x8           ~2.8x
      16x16         ~5.5x
      32x32         ~3.6x
      
      Change-Id: I43ab5fa8dcbcfa51acbde554abf3e5d7d336f391
  14. 08 Oct, 2017 1 commit
  15. 06 Oct, 2017 1 commit
    • Lowbd SMOOTH_PRED intrapred ssse3 optimization · 46ae1ea3
      Yi Luo authored
      On i7-6700:
      Predictor    ssse3 v. C
      4x4          ~1.3x
      4x8          ~1.9x
      8x4          ~2.3x
      8x8          ~3.4x
      8x16         ~4.1x
      16x8         ~4.6x
      16x16        ~5.2x
      16x32        ~5.6x
      32x16        ~4.2x
      32x32        ~4.7x
      
      Change-Id: Ic12383cf9d4446361d6355eb8a480a3c7602060e
  16. 02 Oct, 2017 1 commit
  17. 29 Sep, 2017 2 commits
    • Lowbd TM_PRED intrapred ssse3 optimization · a0f66fc0
      Yi Luo authored
      Function speedup (i7-6700)
      Predictor  ssse3 v. C
      4x4        ~2.1x
      4x8        ~2.4x
      8x4        ~4.1x
      8x8        ~5.4x
      8x16       ~6.1x
      16x8       ~5.9x
      16x16      ~6.4x
      16x32      ~6.7x
      32x16      ~7.4x
      32x32      ~8.0x
      
      Change-Id: I52b8ebf8193e76f4ea1137cbad5ad7fa109d86d8
    • Lowbd intrapred DC/TOP/LEFT/128/V/H avx2 · 23c61903
      Yi Luo authored
      For prediction blocks of width 32, avx2 can further speed up
      the prediction functions (i7-6700):
      
      32x32     avx2 v. sse2
      DC        ~1.4x
      top       ~1.5x
      left      ~1.4x
      128       ~1.5x
      v         ~1.6x
      h         ~1.2x
      
      32x16     avx2 v. sse2
      DC        ~2.2x
      top       ~1.7x
      left      ~1.6x
      128       ~1.8x
      v         ~1.9x
      
      Note: 32x16 H_PRED on avx2 is not yet sufficiently faster than sse2.
      
      Change-Id: I145ed504d1b3ea9df283b94927be66a2c6f81225
  18. 27 Sep, 2017 1 commit
    • Lowbd rect intrapred DC/LEFT/TOP/128 sse2 optimization · 39bdf36a
      Yi Luo authored
      Add lowbd unit test functionality to intrapred_test.cc
      Function speedup against C (i7-6700):
      Predictor   DC     LEFT   TOP    128
      4x8        ~1.4x  ~1.4x  ~1.7x  ~1.9x
      8x4        ~1.2x  ~1.6x  ~1.6x  ~2.6x
      8x16       ~1.4x  ~1.3x  ~1.4x  ~2.1x
      16x8       ~2.0x  ~1.8x  ~2.3x  ~2.1x
      16x32      ~2.0x  ~1.9x  ~1.8x  ~2.2x
      32x16      ~2.0x  ~2.0x  ~1.9x  ~2.2x
      
      Change-Id: I33db512020ca3c6853a9205a8079f3d00134f584
  19. 18 Sep, 2017 1 commit
    • Highbd intra pred H_PRED sse2 optimization · 23b9b317
      Yi Luo authored
      sse2 v. C speedup:
      4x4   ~8.0x
      8x8   ~8.2x
      16x16 ~6.5x
      32x32 ~3.8x
      Blocksize:
      4x4, 4x8, 8x4, 8x8, 8x16, 16x8, 16x16, 16x32, 32x16, 32x32
      The square block-size code is from libvpx:
      "30d9a1916 vpxdsp: [x86] add highbd_h_predictor functions".
      Credit goes to Scott LaVarnway. Speed tests do not yet support
      rectangular block sizes.
      
      Change-Id: I9a1f24aecab8de94f8ea59ec8748fe3537d721ae
  20. 07 Sep, 2017 1 commit
    • Lowbd parallel_deblocking sse2 optimization · ea8a0d52
      Yi Luo authored
      Baseline + parallel_deblocking:
      
      - Passed unit tests *SSE2/Loop8Test6*, *AVX2/Loop8Test6*.
      - 1080p, 25 frames, profile=0, encoding/decoding, output match.
      - Decoder frame rate increases from 54.15 to 65.84.
      
      Change-Id: I55938c94961066594f4b9080192c7268c19d9bf9
  21. 10 Aug, 2017 1 commit
    • Highbd loop filter AVX2 · 6ae0054c
      Yi Luo authored
      - Speed test (ms) on i7-6700, Linux x86_64
        FUNCTION             SSE2    AVX2
        horizontal_edge_16   55      28
        vertical_16_dual     84      47
        horizontal_4_dual    27      13
        horizontal_8_dual    36      15
        vertical_4_dual      38      25
        vertical_8_dual      44      27
      - Decoder frame rate improves around 1.2% - 2.8%.
      
      Change-Id: I9c4123869bac9b6d32e626173c2a8e7eb0cf49e7
  22. 11 Jul, 2017 1 commit
  23. 26 Jun, 2017 1 commit
  24. 20 Jun, 2017 2 commits
    • Build static libaom without internal deps in CMake. · 78516fca
      Tom Finegan authored
      Change the internal lib targets so that external apps
      need to link only libaom instead of all internal library
      targets and libaom.
      
      BUG=aomedia:76,aomedia:609
      
      Change-Id: I38862fcd90cb585300b6b23e8558f78a1934750f
    • Add shared library support to the CMake build. · 84f2d796
      Tom Finegan authored
      This is enabled via:
      $ cmake path/to/aom -DBUILD_SHARED_LIBS=1
      
      Currently supports only Linux and MacOS targets. Symbol visibility
      is handled by exports.cmake and its helpers exports_sources.cmake
      and generate_exports.cmake.
      
      Some sweeping changes were required to properly support shared libs
      and control symbol visibility:
      
      - Object libraries are always linked privately into static
        libraries.
      - Static libraries are always linked privately into each other
        in the many cases where the CMake build merges multiple library
        targets.
      - aom_dsp.cmake now links all its targets into the aom_dsp static
        library target, and privately links aom_dsp into the aom target.
      - av1.cmake now links all its targets into the aom_av1 static library
        target, and privately links in aom_dsp and aom_scale as well. It
        then privately links aom_av1 into the aom target.
      - The aom_mem, aom_ports, aom_scale, and aom_util targets are now
        static libs that are privately linked into the aom target.
      - In CMakeLists.txt libyuv and libwebm are now privately linked into
        app targets.
      - The ASM and intrinsic library functions in aom_optimization.cmake
        now both require a dependent target argument. This facilitates the
        changes noted above regarding new privately linked static library
        targets for ASM and intrinsics sources.
      
      BUG=aomedia:76,aomedia:556
      
      Change-Id: I4892059880c5de0f479da2e9c21d8ba2fa7390c3
  25. 16 Jun, 2017 1 commit
    • Sync CMake build with the configure build. · 3613c517
      Tom Finegan authored
      - Added: CONFIG_COLORSPACE_HEADERS CONFIG_SPEED_REFS
               CONFIG_LGT CONFIG_SBL_SYMBOL
      - Removed: CONFIG_RECT_INTRA_PRED
      - Changed, 0 => 1: CONFIG_EXT_INTER CONFIG_INTERINTRA
                         CONFIG_WEDGE CONFIG_COMPOUND_SEGMENT
                 1 => 0: CONFIG_ONE_SIDED_COMPOUND
      
      BUG=aomedia:76
      
      Change-Id: If9ebd068d0014386ec25d91226a577c591f5a774
  26. 02 Jun, 2017 2 commits
    • Sync CMake build defaults with the configure build. · 6f9dfa51
      Tom Finegan authored
      - Added: CONFIG_ONE_SIDED_COMPOUND CONFIG_VAR_REFS
      - Removed: CONFIG_SUB8X8_MC CONFIG_EC_MULTISYMBOL
                 CONFIG_DAALA_EC CONFIG_LOWDELAY_COMPOUND
      - Changed, 0 => 1: CONFIG_VAR_TX CONFIG_EC_SMALLMUL
                         CONFIG_CHROMA_SUB8X8
                         CONFIG_LOOPFILTERING_ACROSS_TILES
                         CONFIG_TEMPMV_SIGNALING
      
      BUG=aomedia:76
      
      Change-Id: Ia010abeaf079d8c6318a5a540e9354d5455ce826
    • Add include guards to CMake files used as includes. · 17ccaec4
      Tom Finegan authored
      BUG=aomedia:76
      
      Change-Id: Ie34025f31a89f4991d03d5ecf03c6f6f5ab7b0a1
  27. 30 May, 2017 1 commit
  28. 25 May, 2017 2 commits
  29. 23 May, 2017 1 commit
    • ext-inter: Delete dead code · 0f3c94e1
      David Barker authored
      Patches https://aomedia-review.googlesource.com/c/11987/
      and https://aomedia-review.googlesource.com/c/11988/
      replaced the old masked motion search pipeline with
      a new one which uses different SAD/SSE functions.
      This resulted in a lot of dead code.
      
      This patch removes the now-dead code. Note that this
      includes vectorized SAD/SSE functions, which will need
      to be rewritten at some point for the new pipeline. It
      also includes the masked_compound_variance_* functions
      since these turned out not to be used by the new pipeline.
      
      To help with the later addition of vectorized functions, the
      masked_sad/variance_test.cc files are kept but are modified
      to work with the new functions. The tests are then disabled
      until we actually have the vectorized functions.
      
      Change-Id: I61b686abd14bba5280bed94e1be62eb74ea23d89
  30. 22 May, 2017 1 commit
  31. 08 May, 2017 1 commit
    • Partial IDCT 16x16 avx2 · f6176abb
      Yi Luo authored
      - Function level improvement:
      functions      sse2  avx2  reduction vs. sse2
      idct16x16_256  365   226   38%
      idct16x16_38   n/a   136   n/a
      idct16x16_10   171   110   35%
      idct16x16_1     34    26   23%
      
      - Integrated in AV1 for default scan order.
      
      Change-Id: Ieb1a8e730bea9c371ebc0e5f4a748640d8f5e921
  32. 28 Apr, 2017 1 commit
    • Silence warning in CMake mips64 build. · 34939825
      Tom Finegan authored
      Removed source file (aom_dsp/mips/add_noise_msa.c) from aom_dsp.cmake
      to match the configure build.
      
      Change-Id: I7f966afda209cff8949441bf30d757c19bcf65e7
  33. 27 Apr, 2017 1 commit
  34. 14 Apr, 2017 1 commit