1. 23 Oct, 2017 1 commit
  2. 21 Oct, 2017 2 commits
  3. 20 Oct, 2017 1 commit
    • Lowbd D207E/D63E/D45E intrapred x86 optimization · ae676953
      Yi Luo authored
      Predictor  SSE2 vs C
      4x4        ~2.6X
      4x8        ~2.5X
      8x4        ~8.0X
      8x8        ~9.1X
      8x16       ~11.7X
      16x8       ~16.9X
      16x16      ~17.3X
      16x32      ~17.2X
      32x16      ~30.2X
      32x32      ~35.5X
      Predictor  SSE2 vs C
      4x4        ~4.7X
      4x8        ~4.9X
      8x4        ~7.8X
      8x8        ~8.9X
      8x16       ~9.3X
      16x8       ~15.7X
      16x16      ~14.7X
      16x32      ~17.3X
      32x16      ~18.0X
      32x32      ~15.7X
      Predictor  SSSE3 vs C
      4x4        ~1.8X
      4x8        ~2.9X
      8x4        ~6.7X
      8x8        ~6.5X
      8x16       ~7.4X
      16x8       ~24.4X
      16x16      ~21.5X
      16x32      ~24.2X
      32x16      ~25.4X
      32x32      ~25.2X
      Change-Id: I8215de190e2b6314272749761600e389d1ca0fdf
  4. 19 Oct, 2017 1 commit
  5. 18 Oct, 2017 1 commit
    • Use proper inttypes for variance computations · 9f78e85b
      Yaowu Xu authored
      This commit corrects the integer types used in variance functions. It
      now uses the same integer type when the number of pixels is the same, e.g.:
      16x64 and 64x16 use same integer types as 32x32
      8x32 and 32x8 use same integer types as 16x16
      Change-Id: I1a54ba8d73e09126e680ae5af3ee52395a41df41
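Why the pixel count, rather than the block shape, decides the integer type can be sketched as follows. This is an illustrative sketch, not the code from the commit: `block_variance` and `sum_sq_fits_u32` are hypothetical names, and it assumes 8-bit pixels and the usual `sse - sum^2 / N` variance formula. Under those assumptions the squared sum fits in 32 unsigned bits only up to 256 pixels (16x16, 8x32, 32x8) and needs a wider type from 1024 pixels on (32x32, 16x64, 64x16).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical check: for 8-bit pixels the sum of differences is bounded by
 * 255 * w * h, so whether sum * sum fits in uint32_t depends only on w * h. */
static int sum_sq_fits_u32(int w, int h) {
  assert(w > 0 && h > 0);
  const uint64_t max_sum = 255ull * (uint64_t)w * (uint64_t)h;
  return max_sum * max_sum <= UINT32_MAX;
}

/* Hypothetical reference variance of the difference between two 8-bit
 * blocks. The intermediates are kept in 64 bits so that any block size is
 * safe; a tuned version could pick narrower types per pixel count. */
static uint32_t block_variance(const uint8_t *src, int src_stride,
                               const uint8_t *ref, int ref_stride,
                               int w, int h, uint32_t *sse_out) {
  int64_t sum = 0;
  uint64_t sse = 0;
  for (int r = 0; r < h; ++r) {
    for (int c = 0; c < w; ++c) {
      const int diff = src[r * src_stride + c] - ref[r * ref_stride + c];
      sum += diff;
      sse += (uint64_t)(diff * diff);
    }
  }
  *sse_out = (uint32_t)sse;
  return (uint32_t)(sse - (uint64_t)((sum * sum) / (w * h)));
}
```

With this check, 8x32 and 16x16 land in the same class, as do 16x64 and 32x32, which matches the grouping described in the commit message.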
  6. 16 Oct, 2017 3 commits
    • Align more restoration work buffers · 15269e6e
      Yaowu Xu authored
      Fixes crashes on x86-win32-vs14 build
      Change-Id: I045dd0fe4e9af3bfb80223e291617b717cbcb231
    • Highbd D207E/D63E intrapred sse2/avx2 optimization · 0b7127b3
      Yi Luo authored
      Predictor SSE2 vs C   AVX2 vs C
      4x4       ~2.7x
      4x8       ~3.0x
      8x4       ~7.2x
      8x8       ~8.5x
      8x16      ~9.4x
      16x8      ~12.8x
      16x16     ~13.0x
      16x32     ~14.3x
      32x16                 ~19.9x
      32x32                 ~23.6x
      Predictor SSE2 vs C   AVX2 vs C
      4x4       ~3.8x
      4x8       ~4.3x
      8x4       ~6.4x
      8x8       ~6.8x
      8x16      ~8.6x
      16x8                  ~9.0x
      16x16                 ~9.6x
      16x32                 ~10.3x
      32x16                 ~9.1x
      32x32                 ~11.0x
      Change-Id: I87373804c9d53276bf4d7788c4ae0d13d01c00dc
    • Clamp inverse transform coefficients · 29504172
      Sebastien Alaiwan authored
      When --enable-coefficient-range-checking isn't specified, clamp the
      coefficients at each stage.
      This doesn't change the decoder behavior for existing AV1 streams.
      However, some AV1 bitstreams that would have been rejected by the
      decoder as illegal (range check failure) are now legal bitstreams.
      There is no impact on video quality.
      Change-Id: Ibcf1683e5c2ae9f91a7f37b468c4bc72e98e22fa
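The idea of clamping at each stage, instead of rejecting the stream when a range check fails, can be sketched like this. It is a minimal sketch under stated assumptions: `clamp_to_bits` and `clamp_stage` are hypothetical names, and the per-stage bit width is assumed to be given by the caller. It is not the code from the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: saturate `value` to the signed range representable
 * in `bit` bits, i.e. [-(2^(bit-1)), 2^(bit-1) - 1]. */
static int32_t clamp_to_bits(int64_t value, int bit) {
  assert(bit >= 2 && bit <= 32);
  const int64_t max_value = ((int64_t)1 << (bit - 1)) - 1;
  const int64_t min_value = -((int64_t)1 << (bit - 1));
  if (value < min_value) return (int32_t)min_value;
  if (value > max_value) return (int32_t)max_value;
  return (int32_t)value;
}

/* Hypothetical per-stage pass: after one stage of the inverse transform,
 * every intermediate coefficient is saturated instead of range-checked. */
static void clamp_stage(int32_t *coeffs, int n, int bit) {
  for (int i = 0; i < n; ++i) coeffs[i] = clamp_to_bits(coeffs[i], bit);
}
```

Values that are already in range pass through unchanged, which is consistent with the statement that existing legal streams decode as before.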
  7. 14 Oct, 2017 1 commit
  8. 13 Oct, 2017 1 commit
    • Use 7-bit precision for level-map probability model · 1c077a40
      Jingning Han authored
      Support higher hardware throughput. The coding performance loss
      as compared to 15-bit precision is 0.05% for lowres. The loss is
      smaller as frame size goes up.
      Change-Id: I2e22b156f3178cf63689df306e9da13e0e4d205b
  9. 12 Oct, 2017 2 commits
  10. 11 Oct, 2017 2 commits
  11. 10 Oct, 2017 3 commits
    • Highbd D45E intrapred SSE2/AVX2 speedup · 56ad3dd3
      Yi Luo authored
      Function  SSE2 vs C  AVX2 vs C
      4x4       ~4.5x
      4x8       ~4.5x
      8x4       ~11.7x
      8x8       ~12.7x
      8x16      ~14.0x
      16x8                 ~21.7x
      16x16                ~24.0x
      16x32                ~28.7x
      32x16                ~20.5x
      32x32                ~24.4x
      Change-Id: Iaca49727d8df17b7f793b774a8d51a401ef8a8d1
    • lgt-from-pred: transforms based on prediction · 432012f6
      Lester Lu authored
      In this experiment, sharp image discontinuity in the predicted
      block is detected. Based on this discontinuity, we choose
      particular LGTs as row and column transforms.
      Bitstream syntax, entropy coding, and RD search for LGT are added.
      One binary symbol is used to signal whether LGT is used. This
      experiment can work independently of the lgt experiment.
      lowres: -0.414% for key frames, -0.151% overall
      midres: -0.413% for key frames, -0.161% overall
      Change-Id: Iaa2f2c2839c34ca4134fa55e77870dc3f1fa879f
    • Migrate some vp9 highbd intrapred x86 speedup to av1 · 71b6e043
      Yi Luo authored
      Function speedup on i7-6700:
      D117   sse2   ssse3
      4x4    ~1.8x
      8x8           ~3.4x
      16x16         ~5.5x
      32x32         ~2.9x
      D135   sse2   ssse3
      4x4    ~1.9x
      8x8           ~3.3x
      16x16         ~5.3x
      32x32         ~3.6x
      D153   sse2   ssse3
      4x4    ~1.9x
      8x8           ~2.8x
      16x16         ~5.5x
      32x32         ~3.6x
      Change-Id: I43ab5fa8dcbcfa51acbde554abf3e5d7d336f391
  12. 08 Oct, 2017 1 commit
  13. 06 Oct, 2017 3 commits
  14. 05 Oct, 2017 3 commits
  15. 04 Oct, 2017 2 commits
    • Experiment probability precision for lv-map coding · 94cea4ac
      Jingning Han authored
      Experiment with probability precision for binary coding in the lv-map
      coding system.
      Change-Id: I8d9c49eee6dc7ca7970390fa5febe25b80bfab3c
    • Lowbd TM_PRED intra pred avx2 optimization · 237cf1b2
      Yi Luo authored
      For block width >= 16, avx2 can further speed up the
      TM_PRED intra prediction.
      Function speedup on i7-6700:
      Predictor  avx2 v. ssse3
      16x8       ~1.6x
      16x16      ~1.8x
      16x32      ~1.9x
      32x16      ~1.9x
      32x32      ~1.9x
      Change-Id: I62c20bd7628f52251b0c051b99a9b738ee44f7e6
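For context, the scalar computation that such SIMD versions accelerate can be sketched as below. It assumes the TM predictor as commonly defined in VP8/VP9-era codecs (left + above - top-left, clipped to 8 bits). `tm_predictor` and `clip_pixel8` are hypothetical names, and the top-left pixel is passed explicitly to keep the sketch free of negative indexing. It is not the code from the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clip an integer to the 8-bit pixel range [0, 255]. */
static uint8_t clip_pixel8(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Hypothetical scalar TM predictor for a bw x bh block of 8-bit pixels:
 * dst[r][c] = clip(left[r] + above[c] - top_left). */
static void tm_predictor(uint8_t *dst, ptrdiff_t stride, int bw, int bh,
                         const uint8_t *above, const uint8_t *left,
                         uint8_t top_left) {
  assert(bw > 0 && bh > 0);
  for (int r = 0; r < bh; ++r) {
    for (int c = 0; c < bw; ++c) {
      dst[c] = clip_pixel8(left[r] + above[c] - top_left);
    }
    dst += stride;
  }
}
```

Each output row depends on a single left pixel and the whole above row, so a wider vector register can process more columns per instruction, which is plausibly why the gain only shows up for widths of 16 and above.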
  16. 03 Oct, 2017 1 commit
  17. 02 Oct, 2017 3 commits
  18. 01 Oct, 2017 1 commit
  19. 29 Sep, 2017 5 commits
    • Lowbd TM_PRED intrapred ssse3 optimization · a0f66fc0
      Yi Luo authored
      Function speedup (i7-6700)
      Predictor  ssse3 v. C
      4x4        ~2.1x
      4x8        ~2.4x
      8x4        ~4.1x
      8x8        ~5.4x
      8x16       ~6.1x
      16x8       ~5.9x
      16x16      ~6.4x
      16x32      ~6.7x
      32x16      ~7.4x
      32x32      ~8.0x
      Change-Id: I52b8ebf8193e76f4ea1137cbad5ad7fa109d86d8
    • Generate scan order one frame earlier · fabbd7eb
      Angie Chiang authored
      This should relieve the concern about latency incurred by generating
      the scan order.
      The performance on lowres and midres remains neutral.
      Change-Id: If155f055540126ee834f5be1ab4b23013090ee89
    • Add clamp32u() function for uint32_t · 63e8db53
      Yaowu Xu authored
      replace clamp64() with clamp32u() where applicable
      Change-Id: I3fc97d576b3235eeda5d26a6a9692b5e51e016f3
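A clamp over `uint32_t` could look like the sketch below. The commit message names the function but does not show it, so the signature and body here are assumptions, not a copy of the code in the tree.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed shape of a clamp for unsigned 32-bit values: returns `value`
 * limited to the closed interval [low, high]. */
static uint32_t clamp32u(uint32_t value, uint32_t low, uint32_t high) {
  assert(low <= high);
  return value < low ? low : (value > high ? high : value);
}
```

Using a 32-bit unsigned clamp where the operands are known to fit avoids widening to 64 bits just to call a 64-bit helper.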
    • Add 32x128/128x32 block sizes · 2fa6e1ce
      Rupert Swarbrick authored
      Change-Id: Ieb28f40d85e4db4af33648c32c406dd2931ceb89
    • Lowbd intrapred DC/TOP/LEFT/128/V/H avx2 · 23c61903
      Yi Luo authored
      For prediction block width equal to 32, avx2 can further speed up
      the prediction function (i7-6700):
      32x32     avx2 v. sse2
      DC        ~1.4x
      top       ~1.5x
      left      ~1.4x
      128       ~1.5x
      v         ~1.6x
      h         ~1.2x
      32x16     avx2 v. sse2
      DC        ~2.2x
      top       ~1.7x
      left      ~1.6x
      128       ~1.8x
      v         ~1.9x
      Note: 32x16 H_PRED on avx2 does not yet run sufficiently faster than sse2.
      Change-Id: I145ed504d1b3ea9df283b94927be66a2c6f81225
  20. 28 Sep, 2017 3 commits
    • Lowbd rectangle V/H intra pred sse2 optimization · 0c0fd1e5
      Yi Luo authored
      Function speedup sse2 v. C
      Predictor  V_PRED  H_PRED
      4x8        ~1.7x   ~1.8x
      8x4        ~1.8x   ~2.2x
      8x16       ~1.5x   ~1.4x
      16x8       ~1.9x   ~1.3x
      16x32      ~1.6x   ~1.4x
      32x16      ~2.0x   ~1.9x
      This patch disables speed tests to save Jenkins build
      time. Developers can manually enable them by passing the
      --gtest_also_run_disabled_tests flag on the test command line.
      Change-Id: I81eaee5e8afc55275c7507c99774f78cc9e49f9a
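The two predictors measured above are simple enough to state in scalar form. The sketch below assumes the conventional definitions (V_PRED copies the row above into every row, H_PRED fills each row with its left neighbour). `v_predictor` and `h_predictor` are hypothetical names that take the block size as parameters, so one function covers the rectangular sizes too.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar V_PRED: every row is a copy of the `above` row. */
static void v_predictor(uint8_t *dst, ptrdiff_t stride, int bw, int bh,
                        const uint8_t *above) {
  assert(bw > 0 && bh > 0);
  for (int r = 0; r < bh; ++r) memcpy(dst + r * stride, above, (size_t)bw);
}

/* Hypothetical scalar H_PRED: row r is filled with the pixel left[r]. */
static void h_predictor(uint8_t *dst, ptrdiff_t stride, int bw, int bh,
                        const uint8_t *left) {
  assert(bw > 0 && bh > 0);
  for (int r = 0; r < bh; ++r) memset(dst + r * stride, left[r], (size_t)bw);
}
```

Since the C versions already reduce to `memcpy` and `memset` per row, the modest 1.3x to 2.2x gains in the table are not surprising.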
    • Make yv12_buffer_config more uniform · 82529d22
      Rupert Swarbrick authored
      This patch slightly reorders the fields in yv12_buffer_config and then
      uses anonymous unions in order to make it possible to write code that
      iterates uniformly over planes.
      The patch also ports some code (mostly in yv12extend.c and
      aom_scale.c) to show how this can make things more concise.
      This should make no difference to the coded results. I think it's
      unlikely to have any significant performance impact (the reordered
      fields in a yv12_buffer_config only come to 17*4 = 68 bytes in total,
      so they almost fit in a normal-sized cache line).
      Change-Id: Iebb46344500b9df82915f34cfd193e189d712062
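The anonymous-union technique described in this commit can be illustrated with a much smaller structure. The type and field names below are hypothetical and simplified; this is not the real yv12_buffer_config layout. The sketch needs C11 for anonymous structs and unions, and it relies on the compiler laying out same-typed struct members without padding, which holds on common ABIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified buffer description. Each union exposes the same
 * storage both as named per-plane fields and as an array, so old code keeps
 * using the names while new code can loop over planes. */
typedef struct {
  union {
    struct { int y_width; int uv_width; };
    int widths[2];
  };
  union {
    struct { int y_height; int uv_height; };
    int heights[2];
  };
  union {
    struct { uint8_t *y_buffer; uint8_t *u_buffer; uint8_t *v_buffer; };
    uint8_t *buffers[3];
  };
} plane_buffer_config;

/* Uniform iteration over the three planes: total number of visible pixels.
 * Planes 1 and 2 (U and V) share the chroma dimensions at index 1. */
static size_t total_plane_pixels(const plane_buffer_config *cfg) {
  assert(cfg != NULL);
  size_t total = 0;
  for (int plane = 0; plane < 3; ++plane) {
    const int is_uv = plane > 0;
    total += (size_t)cfg->widths[is_uv] * (size_t)cfg->heights[is_uv];
  }
  return total;
}
```

The design choice is that no caller has to change: `cfg->y_width` and `cfg->widths[0]` name the same storage, so the port to loop-based code can be done file by file.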
    • Add deblock_13tap experiment · 4ce85214
      Ola Hugosson authored
      This change enables using 13 taps for luma plane deblocking and 5 taps for
      chroma plane deblocking when pixels are in a flat area.
      The aim of the experiment is to make sure that luma line 57 and chroma
      line 29 of the current superblock are not changed by the deblocking process
      of the superblock below. Previously this was already the case for luma
      line 56 and chroma line 28 (but not for 57 and 29).
      This experiment is part of an effort to reduce the overall line buffer
      size for DEBLOCK+CDEF+LR. With this change it is possible to CDEF lines
      -8 to +55 directly on the output of deblock (which requires lines +56 and
      +57 to be final).
      Change-Id: I7779a08d6ad5683bf35c3372b1526786eaac8472
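The line numbers in this message can be reproduced with a small calculation. The sketch below rests on assumptions that are not stated in the commit: a 64-line luma superblock, a 32-line chroma superblock (4:2:0), and filters that modify 6 pixels per side (13-tap), 2 pixels per side (5-tap), and 7 and 3 pixels per side for the previous luma and chroma filters. `last_final_line` is a hypothetical helper.

```c
#include <assert.h>

/* Hypothetical helper: index of the last line of a superblock that is NOT
 * modified when the horizontal edge shared with the superblock below is
 * deblocked. `sb_lines` is the superblock height in this plane and
 * `modified_per_side` is how many pixels the filter may change on each
 * side of the edge (both are assumptions supplied by the caller). */
static int last_final_line(int sb_lines, int modified_per_side) {
  assert(sb_lines > 0);
  assert(modified_per_side >= 0 && modified_per_side < sb_lines);
  return sb_lines - modified_per_side - 1;
}
```

Under these assumptions, `last_final_line(64, 6)` gives 57 and `last_final_line(32, 2)` gives 29 for the new filters, while `last_final_line(64, 7)` and `last_final_line(32, 3)` give the previous 56 and 28.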