1. 06 Oct, 2017 3 commits
  2. 05 Oct, 2017 3 commits
  3. 04 Oct, 2017 2 commits
    • Jingning Han's avatar
      Experiment probability precision for lv-map coding · 94cea4ac
      Jingning Han authored
      Experiment probability precision for binary coding in the lv-map
      coding system.
      
      Change-Id: I8d9c49eee6dc7ca7970390fa5febe25b80bfab3c
      94cea4ac
    • Yi Luo's avatar
      Lowbd TM_PRED intra pred avx2 optimization · 237cf1b2
      Yi Luo authored
      For block width >= 16, avx2 can further speedup the
      TM_PREM intra prediction.
      
      Function speedup on i7-6700:
      Predictor  avx2 v. ssse3
      16x8       ~1.6x
      16x16      ~1.8x
      16x32      ~1.9x
      32x16      ~1.9x
      32x32      ~1.9x
      
      Change-Id: I62c20bd7628f52251b0c051b99a9b738ee44f7e6
      237cf1b2
  4. 03 Oct, 2017 1 commit
  5. 02 Oct, 2017 3 commits
  6. 01 Oct, 2017 1 commit
  7. 29 Sep, 2017 5 commits
    • Yi Luo's avatar
      Lowbd TM_PRED intrapred ssse3 optimization · a0f66fc0
      Yi Luo authored
      Function speedup (i7-6700)
      Predictor  ssse3 v. C
      4x4        ~2.1x
      4x8        ~2.4x
      8x4        ~4.1x
      8x8        ~5.4x
      8x16       ~6.1x
      16x8       ~5.9x
      16x16      ~6.4x
      16x32      ~6.7x
      32x16      ~7.4x
      32x32      ~8.0x
      
      Change-Id: I52b8ebf8193e76f4ea1137cbad5ad7fa109d86d8
      a0f66fc0
    • Angie Chiang's avatar
      Generate scan order one frame earlier · fabbd7eb
      Angie Chiang authored
      This should relief the concern of latency incurred by generating
      scan order
      
      The performance on lowres and midres remains neutral
      
      Change-Id: If155f055540126ee834f5be1ab4b23013090ee89
      fabbd7eb
    • Yaowu Xu's avatar
      Add clamp32u() function for uint32_t · 63e8db53
      Yaowu Xu authored
      replace clamp64() with clamp32u() where applicable
      
      Change-Id: I3fc97d576b3235eeda5d26a6a9692b5e51e016f3
      63e8db53
    • Rupert Swarbrick's avatar
      Add 32x128/128x32 block sizes · 2fa6e1ce
      Rupert Swarbrick authored
      Change-Id: Ieb28f40d85e4db4af33648c32c406dd2931ceb89
      2fa6e1ce
    • Yi Luo's avatar
      Lowbd intrapred DC/TOP/LEFT/128/V/H avx2 · 23c61903
      Yi Luo authored
      For prediction block width equal to 32, avx2 can further speedup
      the prediction function (i7-6700):
      
      32x32     avx2 v. sse2
      DC        ~1.4x
      top       ~1.5x
      left      ~1.4x
      128       ~1.5x
      v         ~1.6x
      h         ~1.2x
      
      32x16     avx2 v. sse2
      DC        ~2.2x
      top       ~1.7x
      left      ~1.6x
      128       ~1.8x
      v         ~1.9x
      
      Note: 32x16 H_PRED on avx2 does not run faster enough than sse2 yet.
      
      Change-Id: I145ed504d1b3ea9df283b94927be66a2c6f81225
      23c61903
  8. 28 Sep, 2017 3 commits
    • Yi Luo's avatar
      Lowbd rectangle V/H intra pred sse2 optimization · 0c0fd1e5
      Yi Luo authored
      Function speedup sse2 v. C
      Predictor  V_PRED  H_PRED
      4x8        ~1.7x   ~1.8x
      8x4        ~1.8x   ~2.2x
      8x16       ~1.5x   ~1.4x
      16x8       ~1.9x   ~1.3x
      16x32      ~1.6x   ~1.4x
      32x16      ~2.0x   ~1.9x
      
      This patch disables speed tests to save Jenkins build
      time. Developer can manually enable them by using,
      --gtest_also_run_disabled_test flag in test command line.
      
      Change-Id: I81eaee5e8afc55275c7507c99774f78cc9e49f9a
      0c0fd1e5
    • Rupert Swarbrick's avatar
      Make yv12_buffer_config more uniform · 82529d22
      Rupert Swarbrick authored
      This patch slightly reorders the fields in yv12_buffer_config and then
      uses anonymous unions in order to make it possible to write code that
      iterates uniformly over planes.
      
      The patch also ports some code (mostly in yv12extend.c and
      aom_scale.c) to show how this can make things more concise.
      
      This should make no difference to the coded results. I think it's
      unlikely to have any significant performance impact (the reordered
      fields in a yv12_buffer_config only come to 17*4 = 68 bytes in total,
      so almost fit in a normal sized cache line).
      
      Change-Id: Iebb46344500b9df82915f34cfd193e189d712062
      82529d22
    • Ola Hugosson's avatar
      Add deblock_13tap experiment · 4ce85214
      Ola Hugosson authored
      This change enables using 13 taps for luma plane deblocking and 5 taps for
      chroma plane deblocking when pixels are in flat area.
      
      The aim for the experiment is to make sure that luma line 57 and chroma
      line 29 of the current superblock is not changed by the deblocking process
      of the superblock below. Previously this was already the case for luma
      line 56 and chroma line 28 (but not for 57 and 29).
      
      This experiment is part of an effort to reduce the overall line buffer
      size for DEBLOCK+CDEF+LR. With this change it is possible to CDEF line
      -8 to +55 direcly on the output of deblock (which require line +56 and
      +57 to be final).
      
      Change-Id: I7779a08d6ad5683bf35c3372b1526786eaac8472
      4ce85214
  9. 27 Sep, 2017 2 commits
    • James Zern's avatar
      cosmetics,*rtcd*.pl: reindent · 1512fa97
      James Zern authored
      Change-Id: I612517c6218c561ee94888c8c14298964851484a
      1512fa97
    • Yi Luo's avatar
      Lowbd rect intrapred DC/LEFT/TOP/128 sse2 optimization · 39bdf36a
      Yi Luo authored
      Add lowbd unit test functionality to intrapred_test.cc
      Function speedup against C (i7-6700):
      Predictor   DC     LEFT   TOP    128
      4x8        ~1.4x  ~1.4x  ~1.7x  ~1.9x
      8x4        ~1.2x  ~1.6x  ~1.6x  ~2.6x
      8x16       ~1.4x  ~1.3x  ~1.4x  ~2.1x
      16x8       ~2.0x  ~1.8x  ~2.3x  ~2.1x
      16x32      ~2.0x  ~1.9x  ~1.8x  ~2.2x
      32x16      ~2.0x  ~2.0x  ~1.9x  ~2.2x
      
      Change-Id: I33db512020ca3c6853a9205a8079f3d00134f584
      39bdf36a
  10. 22 Sep, 2017 1 commit
    • Yi Luo's avatar
      Highbd rectangle intrapred V/DC sse2 optimization · bdddf33a
      Yi Luo authored
      Function speedup (i7-6700),  sse2 verse C:
      Predictor      V_PRED    DC_PRED
      4x8            ~1.5x     ~4.9x
      8x4            ~2.5x     ~4.8x
      8x16           ~1.9x     ~9.1x
      16x8           ~1.9x     ~4.4x
      16x32          ~2.1x     ~5.8x
      32x16          ~2.0x     ~3.6x
      
      Change-Id: I6deffd0637e57ee5d0bd533502f5705148c4cdd4
      bdddf33a
  11. 20 Sep, 2017 1 commit
    • Jingning Han's avatar
      Customize prob model control for lv-map · b3c189b9
      Jingning Han authored
      Make the probability model update system better customized for the
      level map coding scheme. This improves the level map coding
      performance by 0.2% for lowres and 0.1% for midres.
      
      Change-Id: Ib6d3abb36d50ff7485c4ceb411fe94e8fb060416
      b3c189b9
  12. 19 Sep, 2017 1 commit
    • Yi Luo's avatar
      Highbd intrapred DC_LEFT/TOP/128 sse2 optimization · bbf6186e
      Yi Luo authored
      Also extend intra pred speed test to rectangular block.
      Speedup (i7-6700)
      predictor      sse2 v. C
      left 4x4       ~5.6x
      top  4x4       ~7.2x
      128  4x4       ~6.9x
      left 4x8       ~7.7x
      top  4x8       ~10.1x
      128  4x8       ~10.0x
      
      left 8x4       ~8.1x
      top  8x4       ~9.1x
      128  8x4       ~10.1x
      left 8x8       ~10.3x
      top  8x8       ~13.6x
      128  8x8       ~14.8x
      left 8x16      ~12.6x
      top  8x16      ~14.0x
      128  8x16      ~15.5x
      
      left 16x8      ~6.3x
      top  16x8      ~7.0x
      128  16x8      ~6.5x
      left 16x16     ~6.5x
      top  16x16     ~7.1x
      128  16x16     ~8.2x
      left 16x32     ~5.1x
      top  16x32     ~6.4x
      128  16x32     ~5.6x
      
      left 32x16     ~4.2x
      top  32x16     ~4.3x
      128  32x16     ~4.5x
      left 32x32     ~3.8x
      top  32x32     ~3.7x
      128  32x32     ~3.9x
      
      Change-Id: Ie7fcc85b9ded3030ee904623c40e9edeec1695ae
      bbf6186e
  13. 18 Sep, 2017 9 commits
  14. 16 Sep, 2017 1 commit
  15. 11 Sep, 2017 1 commit
    • Sarah Parker's avatar
      Tokenize and write mrc mask · 99e7daa2
      Sarah Parker authored
      This allows a mask for mrc-tx to be sent in the bitstream for
      inter or intra 32x32 transform blocks. The option to send the mask
      vs build it from the prediction signal is currently controlled with
      a macro. In the future, it is likely the macro will be removed and it
      will be possible for a block to select either method. The mask building
      functions are still placeholders and will be filled in in a followup.
      
      Change-Id: Ie27643ff172cc2b1a9b389fd503fe6bf7c9e21e3
      99e7daa2
  16. 10 Sep, 2017 1 commit
    • Jingning Han's avatar
      Rework base range entropy coding in level map system · 87b01b5a
      Jingning Han authored
      Replace the truncated geometric distribution model with the grouped
      leaves structure for more efficient probability modeling.
      Each group has its own Geometric distribution
      
      This give us 0.2% gain on lowres
      
      Change-Id: If5c73dd429bd5183a8aa81042f8f56937b1d8a6a
      87b01b5a
  17. 09 Sep, 2017 1 commit
  18. 07 Sep, 2017 1 commit
    • Yi Luo's avatar
      Lowbd parallel_deblocking sse2 optimization · ea8a0d52
      Yi Luo authored
      Baseline + parallel_deblocking:
      
      - Passed unit tests *SSE2/Loop8Test6*, *AVX2/Loop8Test6*.
      - 1080p, 25 frames, profile=0, encoding/decoding, output match.
      - Decoder frame rate increases from 54.15 to 65.84.
      
      Change-Id: I55938c94961066594f4b9080192c7268c19d9bf9
      ea8a0d52