1. 02 Oct, 2017 2 commits
  2. 01 Oct, 2017 1 commit
  3. 29 Sep, 2017 5 commits
    • Lowbd TM_PRED intrapred ssse3 optimization · a0f66fc0
      Yi Luo authored
      Function speedup (i7-6700)
      Predictor  ssse3 v. C
      4x4        ~2.1x
      4x8        ~2.4x
      8x4        ~4.1x
      8x8        ~5.4x
      8x16       ~6.1x
      16x8       ~5.9x
      16x16      ~6.4x
      16x32      ~6.7x
      32x16      ~7.4x
      32x32      ~8.0x
      
      Change-Id: I52b8ebf8193e76f4ea1137cbad5ad7fa109d86d8
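
For readers unfamiliar with the predictor being optimized, the following is a minimal scalar sketch of TM ("true motion") prediction as it is commonly described for VP9-style codecs: each output pixel is left + above - top_left, clamped to 8 bits. The names tm_predict_sketch and clip_pixel_u8 are hypothetical, and this is not the code from the patch; the ssse3 version computes the same values with vector instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp an intermediate value to the 8-bit pixel range. */
static uint8_t clip_pixel_u8(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Hypothetical scalar TM predictor for a bw x bh block.
 * above points at the row above the block; above[-1] is assumed to hold
 * the top-left neighbor. left holds the column to the left of the block. */
static void tm_predict_sketch(uint8_t *dst, ptrdiff_t stride, int bw, int bh,
                              const uint8_t *above, const uint8_t *left) {
  assert(bw > 0 && bh > 0);
  const int top_left = above[-1];
  for (int r = 0; r < bh; ++r) {
    for (int c = 0; c < bw; ++c) {
      dst[c] = clip_pixel_u8(left[r] + above[c] - top_left);
    }
    dst += stride;
  }
}
```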
    • Generate scan order one frame earlier · fabbd7eb
      Angie Chiang authored
      This should relieve the concern about the latency incurred by
      generating the scan order.
      
      The performance on lowres and midres remains neutral
      
      Change-Id: If155f055540126ee834f5be1ab4b23013090ee89
    • Add clamp32u() function for uint32_t · 63e8db53
      Yaowu Xu authored
      replace clamp64() with clamp32u() where applicable
      
      Change-Id: I3fc97d576b3235eeda5d26a6a9692b5e51e016f3
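
A clamp helper for uint32_t is small enough to sketch here. The function below is an assumption about its shape, not the code from the patch; the name clamp32u_sketch and the argument order (value, low, high) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical unsigned 32-bit clamp: returns value limited to [low, high].
 * Staying in uint32_t avoids widening to 64 bits when all operands already
 * fit in 32 bits. */
static uint32_t clamp32u_sketch(uint32_t value, uint32_t low, uint32_t high) {
  assert(low <= high);
  return value < low ? low : (value > high ? high : value);
}
```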
    • Add 32x128/128x32 block sizes · 2fa6e1ce
      Rupert Swarbrick authored
      Change-Id: Ieb28f40d85e4db4af33648c32c406dd2931ceb89
    • Lowbd intrapred DC/TOP/LEFT/128/V/H avx2 · 23c61903
      Yi Luo authored
      For prediction blocks of width 32, avx2 can further speed up
      the prediction functions (i7-6700):
      
      32x32     avx2 v. sse2
      DC        ~1.4x
      top       ~1.5x
      left      ~1.4x
      128       ~1.5x
      v         ~1.6x
      h         ~1.2x
      
      32x16     avx2 v. sse2
      DC        ~2.2x
      top       ~1.7x
      left      ~1.6x
      128       ~1.8x
      v         ~1.9x
      
      Note: 32x16 H_PRED on avx2 does not yet run sufficiently faster than sse2.
      
      Change-Id: I145ed504d1b3ea9df283b94927be66a2c6f81225
  4. 28 Sep, 2017 3 commits
    • Lowbd rectangle V/H intra pred sse2 optimization · 0c0fd1e5
      Yi Luo authored
      Function speedup sse2 v. C
      Predictor  V_PRED  H_PRED
      4x8        ~1.7x   ~1.8x
      8x4        ~1.8x   ~2.2x
      8x16       ~1.5x   ~1.4x
      16x8       ~1.9x   ~1.3x
      16x32      ~1.6x   ~1.4x
      32x16      ~2.0x   ~1.9x
      
      This patch disables speed tests to save Jenkins build
      time. Developers can manually enable them by passing the
      --gtest_also_run_disabled_tests flag on the test command line.
      
      Change-Id: I81eaee5e8afc55275c7507c99774f78cc9e49f9a
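
As a reference point for the speedups above, V_PRED copies the row above the block into every row, and H_PRED fills each row with its left neighbor. The scalar sketch below handles rectangular sizes such as 4x8 or 32x16; the function names are hypothetical and this is not the C code the patch was benchmarked against.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical V_PRED: every row is a copy of the above row. */
static void v_predict_sketch(uint8_t *dst, ptrdiff_t stride, int bw, int bh,
                             const uint8_t *above) {
  assert(bw > 0 && bh > 0);
  for (int r = 0; r < bh; ++r) {
    memcpy(dst, above, (size_t)bw);
    dst += stride;
  }
}

/* Hypothetical H_PRED: row r is filled with left[r]. */
static void h_predict_sketch(uint8_t *dst, ptrdiff_t stride, int bw, int bh,
                             const uint8_t *left) {
  assert(bw > 0 && bh > 0);
  for (int r = 0; r < bh; ++r) {
    memset(dst, left[r], (size_t)bw);
    dst += stride;
  }
}
```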
    • Make yv12_buffer_config more uniform · 82529d22
      Rupert Swarbrick authored
      This patch slightly reorders the fields in yv12_buffer_config and then
      uses anonymous unions in order to make it possible to write code that
      iterates uniformly over planes.
      
      The patch also ports some code (mostly in yv12extend.c and
      aom_scale.c) to show how this can make things more concise.
      
      This should make no difference to the coded results. I think it's
      unlikely to have any significant performance impact (the reordered
      fields in a yv12_buffer_config only come to 17*4 = 68 bytes in total,
      so they almost fit in a normal-sized cache line).
      
      Change-Id: Iebb46344500b9df82915f34cfd193e189d712062
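
The anonymous-union technique described above can be illustrated with a reduced struct. This is a sketch under assumptions: the type plane_buffer_sketch, its field set, and the helper total_samples are hypothetical and much smaller than the real yv12_buffer_config. It relies on C11 anonymous structs and unions, and on the named fields and the array sharing the same layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduced buffer description. Each union lets the same storage
 * be read by name (y_width) or by index (widths[0]). */
typedef struct {
  union {
    struct { int y_width; int uv_width; };
    int widths[2];
  };
  union {
    struct { int y_height; int uv_height; };
    int heights[2];
  };
  union {
    struct { uint8_t *y_buffer; uint8_t *u_buffer; uint8_t *v_buffer; };
    uint8_t *buffers[3];
  };
} plane_buffer_sketch;

/* Iterate uniformly over the three planes instead of writing one copy of
 * the loop body per plane. Planes 1 and 2 share the chroma dimensions. */
static size_t total_samples(const plane_buffer_sketch *b) {
  assert(b != NULL);
  size_t total = 0;
  for (int plane = 0; plane < 3; ++plane) {
    const int is_uv = plane > 0;
    total += (size_t)b->widths[is_uv] * (size_t)b->heights[is_uv];
  }
  return total;
}
```

The point of the layout is that per-plane code collapses into a single loop indexed by plane, while existing code that uses the named fields keeps working.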
    • Add deblock_13tap experiment · 4ce85214
      Ola Hugosson authored
      This change enables using 13 taps for luma plane deblocking and 5 taps for
      chroma plane deblocking when pixels are in a flat area.
      
      The aim of the experiment is to make sure that luma line 57 and chroma
      line 29 of the current superblock are not changed by the deblocking process
      of the superblock below. Previously this was already the case for luma
      line 56 and chroma line 28 (but not for 57 and 29).
      
      This experiment is part of an effort to reduce the overall line buffer
      size for DEBLOCK+CDEF+LR. With this change it is possible to CDEF lines
      -8 to +55 directly on the output of deblock (which requires lines +56 and
      +57 to be final).
      
      Change-Id: I7779a08d6ad5683bf35c3372b1526786eaac8472
  5. 27 Sep, 2017 2 commits
    • cosmetics,*rtcd*.pl: reindent · 1512fa97
      James Zern authored
      Change-Id: I612517c6218c561ee94888c8c14298964851484a
    • Lowbd rect intrapred DC/LEFT/TOP/128 sse2 optimization · 39bdf36a
      Yi Luo authored
      Add lowbd unit test functionality to intrapred_test.cc
      Function speedup against C (i7-6700):
      Predictor   DC     LEFT   TOP    128
      4x8        ~1.4x  ~1.4x  ~1.7x  ~1.9x
      8x4        ~1.2x  ~1.6x  ~1.6x  ~2.6x
      8x16       ~1.4x  ~1.3x  ~1.4x  ~2.1x
      16x8       ~2.0x  ~1.8x  ~2.3x  ~2.1x
      16x32      ~2.0x  ~1.9x  ~1.8x  ~2.2x
      32x16      ~2.0x  ~2.0x  ~1.9x  ~2.2x
      
      Change-Id: I33db512020ca3c6853a9205a8079f3d00134f584
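
To show what the DC family of predictors computes on a rectangular block, here is a scalar sketch. It assumes the DC value is the rounded average of the bw above pixels and the bh left pixels, and that the 128 variant fills the block with the mid-gray value 128. The function names and the exact rounding are assumptions, not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical DC_PRED for a bw x bh block: fill with the rounded mean of
 * the above row and the left column. */
static void dc_predict_sketch(uint8_t *dst, ptrdiff_t stride, int bw, int bh,
                              const uint8_t *above, const uint8_t *left) {
  assert(bw > 0 && bh > 0);
  int sum = 0;
  for (int c = 0; c < bw; ++c) sum += above[c];
  for (int r = 0; r < bh; ++r) sum += left[r];
  const int count = bw + bh;
  const uint8_t dc = (uint8_t)((sum + (count >> 1)) / count);
  for (int r = 0; r < bh; ++r) {
    memset(dst, dc, (size_t)bw);
    dst += stride;
  }
}

/* Hypothetical DC_128_PRED: no neighbors are used. */
static void dc_128_predict_sketch(uint8_t *dst, ptrdiff_t stride, int bw,
                                  int bh) {
  assert(bw > 0 && bh > 0);
  for (int r = 0; r < bh; ++r) {
    memset(dst, 128, (size_t)bw);
    dst += stride;
  }
}
```

The LEFT and TOP variants would follow the same pattern while averaging only one of the two neighbor arrays.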
  6. 22 Sep, 2017 1 commit
    • Highbd rectangle intrapred V/DC sse2 optimization · bdddf33a
      Yi Luo authored
      Function speedup (i7-6700), sse2 v. C:
      Predictor      V_PRED    DC_PRED
      4x8            ~1.5x     ~4.9x
      8x4            ~2.5x     ~4.8x
      8x16           ~1.9x     ~9.1x
      16x8           ~1.9x     ~4.4x
      16x32          ~2.1x     ~5.8x
      32x16          ~2.0x     ~3.6x
      
      Change-Id: I6deffd0637e57ee5d0bd533502f5705148c4cdd4
  7. 20 Sep, 2017 1 commit
    • Customize prob model control for lv-map · b3c189b9
      Jingning Han authored
      Make the probability model update system better customized for the
      level map coding scheme. This improves the level map coding
      performance by 0.2% for lowres and 0.1% for midres.
      
      Change-Id: Ib6d3abb36d50ff7485c4ceb411fe94e8fb060416
  8. 19 Sep, 2017 1 commit
    • Highbd intrapred DC_LEFT/TOP/128 sse2 optimization · bbf6186e
      Yi Luo authored
      Also extend the intra pred speed test to rectangular blocks.
      Speedup (i7-6700)
      predictor      sse2 v. C
      left 4x4       ~5.6x
      top  4x4       ~7.2x
      128  4x4       ~6.9x
      left 4x8       ~7.7x
      top  4x8       ~10.1x
      128  4x8       ~10.0x
      
      left 8x4       ~8.1x
      top  8x4       ~9.1x
      128  8x4       ~10.1x
      left 8x8       ~10.3x
      top  8x8       ~13.6x
      128  8x8       ~14.8x
      left 8x16      ~12.6x
      top  8x16      ~14.0x
      128  8x16      ~15.5x
      
      left 16x8      ~6.3x
      top  16x8      ~7.0x
      128  16x8      ~6.5x
      left 16x16     ~6.5x
      top  16x16     ~7.1x
      128  16x16     ~8.2x
      left 16x32     ~5.1x
      top  16x32     ~6.4x
      128  16x32     ~5.6x
      
      left 32x16     ~4.2x
      top  32x16     ~4.3x
      128  32x16     ~4.5x
      left 32x32     ~3.8x
      top  32x32     ~3.7x
      128  32x32     ~3.9x
      
      Change-Id: Ie7fcc85b9ded3030ee904623c40e9edeec1695ae
  9. 18 Sep, 2017 9 commits
  10. 16 Sep, 2017 1 commit
  11. 11 Sep, 2017 1 commit
    • Tokenize and write mrc mask · 99e7daa2
      Sarah Parker authored
      This allows a mask for mrc-tx to be sent in the bitstream for
      inter or intra 32x32 transform blocks. The option to send the mask
      vs build it from the prediction signal is currently controlled with
      a macro. In the future, it is likely the macro will be removed and it
      will be possible for a block to select either method. The mask building
      functions are still placeholders and will be filled in in a followup.
      
      Change-Id: Ie27643ff172cc2b1a9b389fd503fe6bf7c9e21e3
  12. 10 Sep, 2017 1 commit
    • Rework base range entropy coding in level map system · 87b01b5a
      Jingning Han authored
      Replace the truncated geometric distribution model with the grouped
      leaves structure for more efficient probability modeling.
      Each group has its own geometric distribution.
      
      This gives us a 0.2% gain on lowres.
      
      Change-Id: If5c73dd429bd5183a8aa81042f8f56937b1d8a6a
  13. 09 Sep, 2017 1 commit
  14. 07 Sep, 2017 1 commit
    • Lowbd parallel_deblocking sse2 optimization · ea8a0d52
      Yi Luo authored
      Baseline + parallel_deblocking:
      
      - Passed unit tests *SSE2/Loop8Test6*, *AVX2/Loop8Test6*.
      - 1080p, 25 frames, profile=0, encoding/decoding, output match.
      - Decoder frame rate increases from 54.15 to 65.84.
      
      Change-Id: I55938c94961066594f4b9080192c7268c19d9bf9
  15. 06 Sep, 2017 1 commit
    • Remove global motion from compressed header · 3e579a60
      Sarah Parker authored
      This requires making a temporary copy of the functions in
      binary_codes_writer/reader to take in the aom_write_bit_buffer type.
      
      Change-Id: Idb60b29cff69b45224535c6e6a4079a34a2c6871
  16. 05 Sep, 2017 2 commits
    • Remove the EC_SMALLMUL experimental flag. · f9ef4f6b
      Timothy B. Terriberry authored
      This experiment has been fully adopted and is now an integral part
      of the draft AV1 bitstream definition.
      
      objdump -d libaom.a gives identical output before and after this
      patch.
      
      Change-Id: I6f936f4b10de23a9471e0ccadf9cf178fb62be69
    • Define missing subtract_xxx functions in highbd_subtract_sse2.c · 4b5c2bb4
      Rupert Swarbrick authored
      Also, get rid of the boilerplate code using some macros. STACK_V(h,f) means
      "call f twice, stacking vertically at an offset of h". STACK_H(w,f)
      means "call f twice, stacking horizontally at an offset of w".
      
      Note that functions like subtract_128x64 are now only defined when the
      equivalent block sizes (e.g. BLOCK_128x64) are defined. As such, we
      have to fix up subtract_test.cc so it doesn't try to call
      aom_highbd_subtract_block_sse2 with unsupported sizes.
      
      BUG=aomedia:684
      
      Change-Id: I5b0fefe70e4083786d11d25cdd5dcf02823bae7b
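
The STACK_V/STACK_H idea described above can be sketched in plain C as follows. This is an illustration under assumptions, not the macros from the patch: the parameter list SUBTRACT_PARAMS, the scalar subtract_4x4 base case, and the convention that the macros expand inside a function with these exact parameter names are all hypothetical. Block names follow the width x height convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shared parameter list for every subtract_WxH function. */
#define SUBTRACT_PARAMS                                           \
  int16_t *diff, ptrdiff_t diff_stride, const uint16_t *src,      \
      ptrdiff_t src_stride, const uint16_t *pred, ptrdiff_t pred_stride

/* Call f twice, the second time h rows further down. */
#define STACK_V(h, f)                                                      \
  do {                                                                     \
    f(diff, diff_stride, src, src_stride, pred, pred_stride);              \
    f(diff + (h) * diff_stride, diff_stride, src + (h) * src_stride,       \
      src_stride, pred + (h) * pred_stride, pred_stride);                  \
  } while (0)

/* Call f twice, the second time w columns further right. */
#define STACK_H(w, f)                                                      \
  do {                                                                     \
    f(diff, diff_stride, src, src_stride, pred, pred_stride);              \
    f(diff + (w), diff_stride, src + (w), src_stride, pred + (w),          \
      pred_stride);                                                        \
  } while (0)

/* Scalar base case standing in for a hand-written SIMD kernel. */
static void subtract_4x4(SUBTRACT_PARAMS) {
  assert(diff != NULL && src != NULL && pred != NULL);
  for (int r = 0; r < 4; ++r) {
    for (int c = 0; c < 4; ++c) {
      diff[r * diff_stride + c] =
          (int16_t)(src[r * src_stride + c] - pred[r * pred_stride + c]);
    }
  }
}

/* Larger sizes are built by stacking smaller ones. */
static void subtract_4x8(SUBTRACT_PARAMS) { STACK_V(4, subtract_4x4); }
static void subtract_8x8(SUBTRACT_PARAMS) { STACK_H(4, subtract_4x8); }
```

With this pattern, each additional block size costs one line, which is the boilerplate reduction the commit message refers to.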
  17. 30 Aug, 2017 1 commit
    • Highbd parallel_deblocking sse2 optimization · 6f5569f3
      Yi Luo authored
      - Decoder speed improves ~13.7% (baseline + parallel_deblocking).
      - Highbd loopfilter AVX2 version works when this experiment is
        disabled.
      
      Change-Id: I5d56b137a1d52236a4735656c370d57ef71ae043
  18. 22 Aug, 2017 2 commits
    • Refactor lgt · 918fe698
      Lester Lu authored
      Change get_lgt in order to integrate a later experiment
      lgt_from_pred with lgt. There are two main changes.
      
      The main purpose for this change is to unify get_fwd_lgt and
      get_inv_lgt functions into a get_lgt function so the lgt basis
      functions can always be selected through the same function in
      both forward and inverse transform paths. The structure of those
      functions will also be consistent with the get_lgt_from_pred
      functions that will be added in the lgt-from-pred experiment.
      
      These changes have no impact on the bitstream.
      
      Change-Id: Ifd3dfc1a9e1a250495830ddbf42c201e80aa913e
    • Initialize lv-map syntax probability model · fdaa55ed
      Jingning Han authored
      Initialize the cdf model for level map syntax elements.
      
      Change-Id: I3865e07c126eb4c856803c12485b05782dea6526
  19. 15 Aug, 2017 4 commits
    • Add 4-point DST to DAALA_DCT4 experiment · 573cf25f
      Monty Montgomery authored
      CONFIG_DAALA_DCT4 currently force-enables CONFIG_DCT_ONLY due to a
      missing 4-point DST.  The DST had not been included because it was a
      significant coding performance loss; this turned out to be a bug that
      has since been corrected.
      
      This patch adds a 4-point type IV DST to the DAALA_DCT4 experiment.
      There is a small coding performance loss in using the type IV over
      AV1's current type VII.
      
      subset-1:
         monty-newdst4test-baseline-s1-F@2017-07-29T04:58:43.976Z ->
            monty-newdst4test-daala-s1-F@2017-07-29T04:59:56.094Z
      
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -0.0336 |  0.1393 |  0.0491 |   0.4118 | -0.0439 |  0.2084 |     0.0476
      
      objective-1-fast:
         monty-newdst4test-baseline-o1f-F@2017-07-29T04:58:10.439Z ->
            monty-newdst4test-daala-o1f-F@2017-07-29T04:59:04.678Z
      
        PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      0.0064 |  0.1071 | -0.0108 |   0.1133 | -0.0035 |  0.0765 |     0.0502
      
      Change-Id: Ie29835edbe0e41bc86f4b09457e88d924cc9bf7e
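
For reference, the textbook orthonormal type IV DST on N points is X[k] = sqrt(2/N) * sum_n x[n] * sin(pi/N * (n + 1/2) * (k + 1/2)). The sketch below evaluates it for N = 4 in floating point with a precomputed basis table. It is only a mathematical reference under that definition; the name dst4_type4_ref is hypothetical, and the Daala transform added by the patch is an integer implementation that is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* sin(m * pi / 16) for m = 1, 3, 5, 7. */
#define S1 0.19509032201612825
#define S3 0.5555702330196022
#define S5 0.8314696123025452
#define S7 0.9807852804032304

/* Hypothetical floating-point reference for a 4-point type IV DST.
 * Entry [k][n] is sin(pi / 16 * (2n + 1) * (2k + 1)), reduced by symmetry. */
static void dst4_type4_ref(const double in[4], double out[4]) {
  static const double basis[4][4] = {
    { S1, S3, S5, S7 },
    { S3, S7, S1, -S5 },
    { S5, S1, -S7, S3 },
    { S7, -S5, S3, -S1 },
  };
  const double scale = 0.7071067811865476; /* sqrt(2 / 4) */
  assert(in != NULL && out != NULL && in != out);
  for (int k = 0; k < 4; ++k) {
    double sum = 0.0;
    for (int n = 0; n < 4; ++n) sum += basis[k][n] * in[n];
    out[k] = scale * sum;
  }
}
```

With this scaling the type IV DST matrix is symmetric and orthogonal, so applying the function twice returns the input, which makes a convenient self-check.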
    • Add CONFIG_DAALA_DCT64 experiment. · a4e245a9
      Monty Montgomery authored
      This experiment replaces the 64-point Type-II DCT and related
      scaling vp9 transforms with the 64-point orthonormal
      Daala transforms.
      
      subset-1:
      
          monty-square-baseline-s1-F2@2017-07-28T03:35:45.962Z ->
            monty-square-dct64-s1-F2@2017-07-29T04:50:58.412Z
      
             PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
          -0.1930 | -0.2037 | -0.0643 |  -0.1917 | -0.2331 | -0.3510 |    -0.1810
      
      objective-1-fast:
      
          monty-square-baseline-o1f-F2@2017-07-28T03:35:35.533Z ->
            monty-square-dct64-o1f-F2@2017-07-29T04:50:28.542Z
      
             PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
          -0.2557 | -0.1743 | -0.4900 |  -0.3028 | -0.4147 | -0.5764 |    -0.2864
      
      Change-Id: I1f944df29e44d2e350c42555af274f2d75a62a92
    • aom_dsp: regularize EXT_PARTITION_TYPES handling. · ccfdfce1
      Ralph Giles authored
      aom_dsp_rtcd_defs.pl compares most CONFIG_* keys to "yes"
      to see if they're set. The script was checking just
      
        if (aom_config("CONFIG_EXT_PARTITION_TYPES"))
      
      in some cases. The build system doesn't add disabled
      configuration options to libs.mk, so this is effectively
      the same. However, it means that setting the config
      key explicitly to 0 or "no" in the config headers
      was treated the same as setting it to 1 or "yes",
      and aom_dsp_rtcd.h would have opposite expectations
      from aom_config.h or aom_config.asm.
      
      Treat this key similarly to others for consistency.
      
      Change-Id: I27bd7a5532ba4afc2bb289b43b57a1b1971c0348
    • Remove ALT_INTRA flag. · 93b543ab
      Urvang Joshi authored
      This experiment has been adopted as it has been cleared by Tapas.
      
      Change-Id: I0682face60f62dd43091efa0a92d09d846396850