1. 18 Dec, 2017 1 commit
    • Cheng Chen's avatar
      JNT_COMP: add SIMD and interface for high bit-depth · bf3d4964
      Cheng Chen authored
      Add high bit-depth macro definitions:
      highbd_jnt_sad
      highbd_8(10/12)_jnt_sub_pixel_avg.
      
      Add SIMD functions:
      aom_highbd_jnt_comp_avg_pred_sse2
      aom_highbd_jnt_comp_avg_upsampled_pred_sse2
      
      This patch also solves the seg fault caused by low bit-depth and
      high bit-depth paths
      
      BUG=aomedia:967
      BUG=aomedia:944
      
      Change-Id: Iea69f114e81ca226a30d84a540ad846f1b94b8d6
      bf3d4964
  2. 15 Dec, 2017 1 commit
    • Johann's avatar
      add copyright to rtcd files · aecbba6d
      Johann authored
      Allows them to pass the license check in chromium.
      
      Based on libvpx e4b3f03
      
      BUG=chromium:795297
      
      Change-Id: I2bb49ecb62f20d7bc5093a1732b6a8228ef5c87f
      aecbba6d
  3. 14 Dec, 2017 1 commit
    • Urvang Joshi's avatar
      round_shift_array: Use SSE4 version everywhere. · 1ac47a7c
      Urvang Joshi authored
      Usage of CPU by round_shift_array goes from 2.01% to 1.04%.
      Overall encoding is slightly faster (~0.05%).
      
      This means some of the intermediate array have to be aligned.
      Also, these functions were moved to common header/source files.
      
      BUG=aomedia:1106
      
      Change-Id: I492c9b1f2e7339c6cb83cfe68a61218642654d1b
      1ac47a7c
  4. 13 Dec, 2017 1 commit
  5. 04 Dec, 2017 1 commit
  6. 29 Nov, 2017 1 commit
    • James Zern's avatar
      Unify highbd loopfilter function names · 684b7bd1
      James Zern authored
      Rename aom_highbd_lpf_horizontal_edge_8() to aom_highbd_lpf_horizontal_16().
      Rename aom_highbd_lpf_horizontal_edge_16() to aom_highbd_lpf_horizontal_16_dual().
      
      based on the same change from libvpx:
      7f1f35183 Unify loopfilter function names
      
      Change-Id: I40cd587e74e0fe02bae23e6c10280c8e269df1d6
      684b7bd1
  7. 27 Nov, 2017 1 commit
    • James Zern's avatar
      Unify loopfilter function names · 1dbe80bc
      James Zern authored
      Rename aom_lpf_horizontal_edge_8() to aom_lpf_horizontal_16().
      Rename aom_lpf_horizontal_edge_16() to aom_lpf_horizontal_16_dual().
      
      based on the same change from libvpx:
      7f1f35183 Unify loopfilter function names
      
      Change-Id: I4fda7a2e3a893fc3dee0779975e2d4145c32f5d2
      1dbe80bc
  8. 25 Nov, 2017 1 commit
  9. 22 Nov, 2017 2 commits
    • Cheng Chen's avatar
      JNT_COMP: add ssse3 implementations for sad_avg · d0179a6b
      Cheng Chen authored
      Add ssse3 implementations for the sad_avg c function at low bit-depth.
      With this, aom_jnt_sad c functions can all have simd implementations.
      This CL follows existing MACRO definitions for multiple combinations
      of block sizes.
      
      Change-Id: I882343684026525f5589a239337cfac2dd411e11
      d0179a6b
    • Cheng Chen's avatar
      JNT_COMP: SIMD implementation for aom_jnt_sub_pixel_avg · d286443c
      Cheng Chen authored
      Change function names and add SIMD implementation for two c functions:
      (1) var_filter_block2d_bil_first_pass
      (2) var_filter_block2d_bil_second_pass
      
      This CL allows aom_jnt_sub_pixel_avg_variance now in SIMD.
      
      Change-Id: Ib41ef13d62ae91a0ca481bcebb24568dcd4722c4
      d286443c
  10. 10 Nov, 2017 1 commit
    • Urvang Joshi's avatar
      Remove smooth_hv experiment flag. · b7301cd6
      Urvang Joshi authored
      This experiment has been cleared by Tapas.
      
      Also, fix a couple of hash signatures in the test while we are at it.
      
      Change-Id: I1658bcb07913cf8bd47cfffadd729e16d5c55fc3
      b7301cd6
  11. 06 Nov, 2017 2 commits
    • Cheng Chen's avatar
      JNT_COMP: add SIMD implementations for c functions · ef34fff7
      Cheng Chen authored
      Add SIMD implementations for c functions for low bit-depth, making
      encoder speed faster by 3~4x than c functions.
      
      Change-Id: Icca0b07b25489759be9504aaec09d1239076fc52
      ef34fff7
    • Cheng Chen's avatar
      JNT_COMP: Refactor code · f78632e0
      Cheng Chen authored
      The refactoring serves two purposes:
      1. Separate code paths for jnt_comp and original compound average
      computation. It provides function interface for jnt_comp while leaving
      original compound average computation unchanged. In near future, SIMD
      functions can be added for jnt_comp using the interface.
      
      2. Previous implementation uses a hack on second_pred. But it may cause
      segmentation fault when the test clip is small. As reported in Issue
      944. This refactoring removes hacking and make it possible to address
      the seg fault problem in the future.
      
      Change-Id: Idd2cb99f6c77dae03d32ccfa1f9cbed1d7eed067
      f78632e0
  12. 31 Oct, 2017 1 commit
  13. 21 Oct, 2017 1 commit
  14. 20 Oct, 2017 1 commit
    • Yi Luo's avatar
      Lowbd D207E/D63E/D45E intrapred x86 optimization · ae676953
      Yi Luo authored
      D207E
      Predictor  SSE2 vs C
      4x4        ~2.6X
      4x8        ~2.5X
      8x4        ~8.0X
      8x8        ~9.1X
      8x16       ~11.7X
      16x8       ~16.9X
      16x16      ~17.3X
      16x32      ~17.2X
      32x16      ~30.2X
      32x32      ~35.5X
      
      D63E
      Predictor  SSE2 vs C
      4x4        ~4.7X
      4x8        ~4.9X
      8x4        ~7.8X
      8x8        ~8.9X
      8x16       ~9.3X
      16x8       ~15.7X
      16x16      ~14.7X
      16x32      ~17.3X
      32x16      ~18.0X
      32x32      ~15.7X
      
      D45E
      Predictor  SSSE3 vs C
      4x4        ~1.8X
      4x8        ~2.9X
      8x4        ~6.7X
      8x8        ~6.5X
      8x16       ~7.4X
      16x8       ~24.4X
      16x16      ~21.5X
      16x32      ~24.2X
      32x16      ~25.4X
      32x32      ~25.2X
      
      Change-Id: I8215de190e2b6314272749761600e389d1ca0fdf
      ae676953
  15. 16 Oct, 2017 1 commit
    • Yi Luo's avatar
      Highbd D207E/D63E intrapred sse2/avx2 optimization · 0b7127b3
      Yi Luo authored
      D207E
      Predictor SSE2 vs C   AVX2 vs C
      4x4       ~2.7x
      4x8       ~3.0x
      8x4       ~7.2x
      8x8       ~8.5x
      8x16      ~9.4x
      16x8      ~12.8x
      16x16     ~13.0x
      16x32     ~14.3x
      32x16                 ~19.9x
      32x32                 ~23.6x
      
      D63E
      Predictor SSE2 vs C   AVX2 vs C
      4x4       ~3.8x
      4x8       ~4.3x
      8x4       ~6.4x
      8x8       ~6.8x
      8x16      ~8.6x
      16x8                  ~9.0x
      16x16                 ~9.6x
      16x32                 ~10.3x
      32x16                 ~9.1x
      32x32                 ~11.0x
      
      Change-Id: I87373804c9d53276bf4d7788c4ae0d13d01c00dc
      0b7127b3
  16. 10 Oct, 2017 2 commits
    • Yi Luo's avatar
      Highbd D45E intrapred SSE2/AVX2 speedup · 56ad3dd3
      Yi Luo authored
      Function  SSE2 vs C  AVX2 vs C
      4x4       ~4.5x
      4x8       ~4.5x
      8x4       ~11.7x
      8x8       ~12.7x
      8x16      ~14.0x
      16x8                 ~21.7x
      16x16                ~24.0x
      16x32                ~28.7x
      32x16                ~20.5x
      32x32                ~24.4x
      
      Change-Id: Iaca49727d8df17b7f793b774a8d51a401ef8a8d1
      56ad3dd3
    • Yi Luo's avatar
      Migrate some vp9 highbd intrapred x86 speedup to av1 · 71b6e043
      Yi Luo authored
      Function speedup on i7-6700:
      D117   sse2   ssse3
      4x4    ~1.8x
      8x8           ~3.4x
      16x16         ~5.5x
      32x32         ~2.9x
      
      D135   sse2   ssse3
      4x4    ~1.9
      8x8           ~3.3x
      16x16         ~5.3x
      32x32         ~3.6x
      
      D153   sse2   ssse3
      4x4    ~1.9x
      8x8           ~2.8x
      16x16         ~5.5x
      32x32         ~3.6x
      
      Change-Id: I43ab5fa8dcbcfa51acbde554abf3e5d7d336f391
      71b6e043
  17. 06 Oct, 2017 1 commit
    • Yi Luo's avatar
      Lowbd SMOOTH_PRED intrapred ssse3 optimization · 46ae1ea3
      Yi Luo authored
      On i7-6700:
      Predictor    ssse3 v. C
      4x4          ~1.3x
      4x8          ~1.9x
      8x4          ~2.3x
      8x8          ~3.4x
      8x16         ~4.1x
      16x8         ~4.6x
      16x16        ~5.2x
      16x32        ~5.6x
      32x16        ~4.2x
      32x32        ~4.7x
      
      Change-Id: Ic12383cf9d4446361d6355eb8a480a3c7602060e
      46ae1ea3
  18. 04 Oct, 2017 1 commit
    • Yi Luo's avatar
      Lowbd TM_PRED intra pred avx2 optimization · 237cf1b2
      Yi Luo authored
      For block width >= 16, avx2 can further speedup the
      TM_PREM intra prediction.
      
      Function speedup on i7-6700:
      Predictor  avx2 v. ssse3
      16x8       ~1.6x
      16x16      ~1.8x
      16x32      ~1.9x
      32x16      ~1.9x
      32x32      ~1.9x
      
      Change-Id: I62c20bd7628f52251b0c051b99a9b738ee44f7e6
      237cf1b2
  19. 02 Oct, 2017 2 commits
  20. 29 Sep, 2017 3 commits
    • Yi Luo's avatar
      Lowbd TM_PRED intrapred ssse3 optimization · a0f66fc0
      Yi Luo authored
      Function speedup (i7-6700)
      Predictor  ssse3 v. C
      4x4        ~2.1x
      4x8        ~2.4x
      8x4        ~4.1x
      8x8        ~5.4x
      8x16       ~6.1x
      16x8       ~5.9x
      16x16      ~6.4x
      16x32      ~6.7x
      32x16      ~7.4x
      32x32      ~8.0x
      
      Change-Id: I52b8ebf8193e76f4ea1137cbad5ad7fa109d86d8
      a0f66fc0
    • Rupert Swarbrick's avatar
      Add 32x128/128x32 block sizes · 2fa6e1ce
      Rupert Swarbrick authored
      Change-Id: Ieb28f40d85e4db4af33648c32c406dd2931ceb89
      2fa6e1ce
    • Yi Luo's avatar
      Lowbd intrapred DC/TOP/LEFT/128/V/H avx2 · 23c61903
      Yi Luo authored
      For prediction block width equal to 32, avx2 can further speedup
      the prediction function (i7-6700):
      
      32x32     avx2 v. sse2
      DC        ~1.4x
      top       ~1.5x
      left      ~1.4x
      128       ~1.5x
      v         ~1.6x
      h         ~1.2x
      
      32x16     avx2 v. sse2
      DC        ~2.2x
      top       ~1.7x
      left      ~1.6x
      128       ~1.8x
      v         ~1.9x
      
      Note: 32x16 H_PRED on avx2 does not run faster enough than sse2 yet.
      
      Change-Id: I145ed504d1b3ea9df283b94927be66a2c6f81225
      23c61903
  21. 28 Sep, 2017 1 commit
    • Yi Luo's avatar
      Lowbd rectangle V/H intra pred sse2 optimization · 0c0fd1e5
      Yi Luo authored
      Function speedup sse2 v. C
      Predictor  V_PRED  H_PRED
      4x8        ~1.7x   ~1.8x
      8x4        ~1.8x   ~2.2x
      8x16       ~1.5x   ~1.4x
      16x8       ~1.9x   ~1.3x
      16x32      ~1.6x   ~1.4x
      32x16      ~2.0x   ~1.9x
      
      This patch disables speed tests to save Jenkins build
      time. Developer can manually enable them by using,
      --gtest_also_run_disabled_test flag in test command line.
      
      Change-Id: I81eaee5e8afc55275c7507c99774f78cc9e49f9a
      0c0fd1e5
  22. 27 Sep, 2017 2 commits
    • James Zern's avatar
      cosmetics,*rtcd*.pl: reindent · 1512fa97
      James Zern authored
      Change-Id: I612517c6218c561ee94888c8c14298964851484a
      1512fa97
    • Yi Luo's avatar
      Lowbd rect intrapred DC/LEFT/TOP/128 sse2 optimization · 39bdf36a
      Yi Luo authored
      Add lowbd unit test functionality to intrapred_test.cc
      Function speedup against C (i7-6700):
      Predictor   DC     LEFT   TOP    128
      4x8        ~1.4x  ~1.4x  ~1.7x  ~1.9x
      8x4        ~1.2x  ~1.6x  ~1.6x  ~2.6x
      8x16       ~1.4x  ~1.3x  ~1.4x  ~2.1x
      16x8       ~2.0x  ~1.8x  ~2.3x  ~2.1x
      16x32      ~2.0x  ~1.9x  ~1.8x  ~2.2x
      32x16      ~2.0x  ~2.0x  ~1.9x  ~2.2x
      
      Change-Id: I33db512020ca3c6853a9205a8079f3d00134f584
      39bdf36a
  23. 22 Sep, 2017 1 commit
    • Yi Luo's avatar
      Highbd rectangle intrapred V/DC sse2 optimization · bdddf33a
      Yi Luo authored
      Function speedup (i7-6700),  sse2 verse C:
      Predictor      V_PRED    DC_PRED
      4x8            ~1.5x     ~4.9x
      8x4            ~2.5x     ~4.8x
      8x16           ~1.9x     ~9.1x
      16x8           ~1.9x     ~4.4x
      16x32          ~2.1x     ~5.8x
      32x16          ~2.0x     ~3.6x
      
      Change-Id: I6deffd0637e57ee5d0bd533502f5705148c4cdd4
      bdddf33a
  24. 19 Sep, 2017 1 commit
    • Yi Luo's avatar
      Highbd intrapred DC_LEFT/TOP/128 sse2 optimization · bbf6186e
      Yi Luo authored
      Also extend intra pred speed test to rectangular block.
      Speedup (i7-6700)
      predictor      sse2 v. C
      left 4x4       ~5.6x
      top  4x4       ~7.2x
      128  4x4       ~6.9x
      left 4x8       ~7.7x
      top  4x8       ~10.1x
      128  4x8       ~10.0x
      
      left 8x4       ~8.1x
      top  8x4       ~9.1x
      128  8x4       ~10.1x
      left 8x8       ~10.3x
      top  8x8       ~13.6x
      128  8x8       ~14.8x
      left 8x16      ~12.6x
      top  8x16      ~14.0x
      128  8x16      ~15.5x
      
      left 16x8      ~6.3x
      top  16x8      ~7.0x
      128  16x8      ~6.5x
      left 16x16     ~6.5x
      top  16x16     ~7.1x
      128  16x16     ~8.2x
      left 16x32     ~5.1x
      top  16x32     ~6.4x
      128  16x32     ~5.6x
      
      left 32x16     ~4.2x
      top  32x16     ~4.3x
      128  32x16     ~4.5x
      left 32x32     ~3.8x
      top  32x32     ~3.7x
      128  32x32     ~3.9x
      
      Change-Id: Ie7fcc85b9ded3030ee904623c40e9edeec1695ae
      bbf6186e
  25. 18 Sep, 2017 1 commit
    • Yi Luo's avatar
      Highbd intra pred H_PRED sse2 optimization · 23b9b317
      Yi Luo authored
      sse2 v. C speedup:
      4x4   ~8.0x
      8x8   ~8.2x
      16x16 ~6.5x
      32x32 ~3.8x
      Blocksize:
      4x4, 4x8, 8x4, 8x8, 8x16, 16x8, 16x16, 16x32, 32x16, 32x32
      Square blocksize code is from libvpx:
      "30d9a1916 vpxdsp: [x86] add highbd_h_predictor functions",
      Credit goes to Scott LaVarnway. Speed tests do not support
      rectangular blocksize yet.
      
      Change-Id: I9a1f24aecab8de94f8ea59ec8748fe3537d721ae
      23b9b317
  26. 07 Sep, 2017 1 commit
    • Yi Luo's avatar
      Lowbd parallel_deblocking sse2 optimization · ea8a0d52
      Yi Luo authored
      Baseline + parallel_deblocking:
      
      - Passed unit tests *SSE2/Loop8Test6*, *AVX2/Loop8Test6*.
      - 1080p, 25 frames, profile=0, encoding/decoding, output match.
      - Decoder frame rate increases from 54.15 to 65.84.
      
      Change-Id: I55938c94961066594f4b9080192c7268c19d9bf9
      ea8a0d52
  27. 15 Aug, 2017 2 commits
    • Ralph Giles's avatar
      aom_dsp: regularize EXT_PARTITION_TYPES handling. · ccfdfce1
      Ralph Giles authored
      aom_dsp_rtcd_defs.pl compares most CONFIG_* keys to "yes"
      to see if they're set. The script was checking just
      
        if (aom_config("CONFIG_EXT_PARTITION_TYPES"))
      
      in some cases. The build system doesn't add disabled
      configuration options to libs.mk so this is effectively
      the same, however it means that setting the config
      key explicitly to 0 or "no" in the config headers
      was treated the same as setting it to 1 or "yes",
      and aom_dsp_rtcd.h would have opposite expections
      from aom_config.h or aom_config.asm.
      
      Treat this key similarly to others for consistency.
      
      Change-Id: I27bd7a5532ba4afc2bb289b43b57a1b1971c0348
      ccfdfce1
    • Urvang Joshi's avatar
      Remove ALT_INTRA flag. · 93b543ab
      Urvang Joshi authored
      This experiment has been adopted as it has been cleared by Tapas.
      
      Change-Id: I0682face60f62dd43091efa0a92d09d846396850
      93b543ab
  28. 10 Aug, 2017 1 commit
    • Yi Luo's avatar
      Highbd loop filter AVX2 · 6ae0054c
      Yi Luo authored
      - Speed test (ms) on i7-6700, Linux x86_64
        FUNCTION             SSE2    AVX2
        horizontal_edge_16   55      28
        vertical_16_dual     84      47
        horizontal_4_dual    27      13
        horizontal_8_dual    36      15
        vertical_4_dual      38      25
        vertical_8_dual      44      27
      - Decoder frame rate improves around 1.2% - 2.8%.
      
      Change-Id: I9c4123869bac9b6d32e626173c2a8e7eb0cf49e7
      6ae0054c
  29. 08 Aug, 2017 1 commit
    • Thomas Davies's avatar
      Refactor quantization C code. · f3b5ee14
      Thomas Davies authored
      This commit de-duplicates C reference quantization code
      and unifies quantization matrix (QM) and non-QM code
      paths when there is no SIMD.
      
      The reorganisation also will facilitate re-using SIMD quant
      functions for QM when the matrix is flat, as is the
      default when AOM_QM is enabled.
      
      Change-Id: Idbfdac9eb9a31adcffe734aac1877d58b86fab77
      f3b5ee14
  30. 04 Aug, 2017 1 commit
  31. 21 Jul, 2017 1 commit
    • Angie Chiang's avatar
      Integrate convolve_round with compound_segment · 7b517095
      Angie Chiang authored
      This integration only covers low bitdepth mode for now
      
      The performance of Convolve_round on top of compound_segment
      revives from 0.475% to 0.612% on lowres
      
      Change-Id: I21606c79d0a22c0834966730358267c082d8071e
      7b517095
  32. 12 Jul, 2017 1 commit
    • Rupert Swarbrick's avatar
      ext-partition-types: Add 4:1 partitions · 93c39e91
      Rupert Swarbrick authored
      This patch adds support for 4:1 rectangular blocks to various common
      data arrays, and adds new partition types to the EXT_PARTITION_TYPES
      experiment which will use them.
      
      This patch has the following restrictions, which can be lifted in
      future patches:
      
        * ext-partition-types is incompatible with fp_mb_stats and supertx
          for the moment
      
        * Currently only 32x32 superblocks can use the new partition types
      
      There's a slightly odd restriction about when we allow
      PARTITION_HORZ_4 or PARTITION_VERT_4. Since these both live in the
      EXT_PARTITION_TYPES CDF, read_partition() can only return them if both
      has_rows and has_cols is true. This means that at least half of the
      width and height of the block must be visible. It might be nice to
      relax that restriction but that would imply a change to how we encode
      partition types, which seems already to be in a state of flux, so
      maybe it's better to wait until that has settled down.
      
      Change-Id: Id7fc3fd0f762f35f63b3d3e3bf4e07c245c7b4fa
      93c39e91