1. 23 Oct, 2017 1 commit
  2. 21 Oct, 2017 2 commits
  3. 20 Oct, 2017 1 commit
    • Lowbd D207E/D63E/D45E intrapred x86 optimization · ae676953
      Yi Luo authored
      D207E
      Predictor  SSE2 vs C
      4x4        ~2.6X
      4x8        ~2.5X
      8x4        ~8.0X
      8x8        ~9.1X
      8x16       ~11.7X
      16x8       ~16.9X
      16x16      ~17.3X
      16x32      ~17.2X
      32x16      ~30.2X
      32x32      ~35.5X
      
      D63E
      Predictor  SSE2 vs C
      4x4        ~4.7X
      4x8        ~4.9X
      8x4        ~7.8X
      8x8        ~8.9X
      8x16       ~9.3X
      16x8       ~15.7X
      16x16      ~14.7X
      16x32      ~17.3X
      32x16      ~18.0X
      32x32      ~15.7X
      
      D45E
      Predictor  SSSE3 vs C
      4x4        ~1.8X
      4x8        ~2.9X
      8x4        ~6.7X
      8x8        ~6.5X
      8x16       ~7.4X
      16x8       ~24.4X
      16x16      ~21.5X
      16x32      ~24.2X
      32x16      ~25.4X
      32x32      ~25.2X
      
      Change-Id: I8215de190e2b6314272749761600e389d1ca0fdf
  4. 19 Oct, 2017 1 commit
  5. 18 Oct, 2017 1 commit
    • Use proper inttypes for variance computations · 9f78e85b
      Yaowu Xu authored
      This commit corrects the integer types used in the variance functions.
      Block sizes with the same number of pixels now use the same integer
      types, e.g.:
      16x64 and 64x16 use the same integer types as 32x32
      8x32 and 32x8 use the same integer types as 16x16
      
      Change-Id: I1a54ba8d73e09126e680ae5af3ee52395a41df41
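Why the integer type can follow the pixel count is sketched below. The helper names are hypothetical and not taken from the commit. The sketch assumes 8-bit pixels, so every per-pixel difference lies in [-255, 255], and it assumes the usual formulation variance = sse - sum^2 / N.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the commit): largest possible value of
 * sum * sum for a w x h block of 8-bit pixels, where sum is the signed sum
 * of per-pixel differences and each difference lies in [-255, 255]. */
static uint64_t max_sum_squared(int w, int h) {
  assert(w > 0 && h > 0);
  const uint64_t max_abs_sum = 255ull * (uint64_t)w * (uint64_t)h;
  return max_abs_sum * max_abs_sum;
}

/* Returns 1 if sum * sum can exceed 32 bits, so that a 64-bit intermediate
 * is required. The answer depends only on the pixel count w * h. */
static int needs_64bit_square(int w, int h) {
  return max_sum_squared(w, h) > UINT32_MAX;
}

/* Variance from accumulated sums, using a 64-bit intermediate for sum^2. */
static uint32_t variance_from_sums(uint32_t sse, int sum, int w, int h) {
  assert(w > 0 && h > 0);
  const int64_t sq = (int64_t)sum * sum;
  return sse - (uint32_t)(sq / ((int64_t)w * h));
}
```

Under these assumptions, 16x16, 8x32 and 32x8 (256 pixels) all stay within 32 bits for the squared sum, while 32x32, 16x64 and 64x16 (1024 pixels) all need a wider intermediate.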
  6. 16 Oct, 2017 3 commits
    • Align more restoration work buffers · 15269e6e
      Yaowu Xu authored
      Fixes crashes on x86-win32-vs14 build
      
      Change-Id: I045dd0fe4e9af3bfb80223e291617b717cbcb231
    • Highbd D207E/D63E intrapred sse2/avx2 optimization · 0b7127b3
      Yi Luo authored
      D207E
      Predictor SSE2 vs C   AVX2 vs C
      4x4       ~2.7x
      4x8       ~3.0x
      8x4       ~7.2x
      8x8       ~8.5x
      8x16      ~9.4x
      16x8      ~12.8x
      16x16     ~13.0x
      16x32     ~14.3x
      32x16                 ~19.9x
      32x32                 ~23.6x
      
      D63E
      Predictor SSE2 vs C   AVX2 vs C
      4x4       ~3.8x
      4x8       ~4.3x
      8x4       ~6.4x
      8x8       ~6.8x
      8x16      ~8.6x
      16x8                  ~9.0x
      16x16                 ~9.6x
      16x32                 ~10.3x
      32x16                 ~9.1x
      32x32                 ~11.0x
      
      Change-Id: I87373804c9d53276bf4d7788c4ae0d13d01c00dc
    • Clamp inverse transform coefficients · 29504172
      Sebastien Alaiwan authored
      When --enable-coefficient-range-checking isn't specified, clamp the
      coefficients at each stage.
      
      This doesn't change the decoder behavior for existing AV1 streams.
      However, some AV1 bitstreams that would have been rejected by the
      decoder as illegal (range check failure) are now legal bitstreams.
      
      There is no impact on video quality.
      
      BUG=aomedia:30
      
      Change-Id: Ibcf1683e5c2ae9f91a7f37b468c4bc72e98e22fa
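A minimal sketch of per-stage clamping follows. The function names and the choice of bit width are assumptions made for illustration. The commit message does not state which range is used at each stage.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: saturate value to the range of a signed integer
 * with `bits` bits, i.e. [-(2^(bits-1)), 2^(bits-1) - 1]. */
static int32_t clamp_to_signed_bits(int64_t value, int bits) {
  assert(bits >= 2 && bits <= 32);
  const int64_t max_value = ((int64_t)1 << (bits - 1)) - 1;
  const int64_t min_value = -((int64_t)1 << (bits - 1));
  if (value > max_value) return (int32_t)max_value;
  if (value < min_value) return (int32_t)min_value;
  return (int32_t)value;
}

/* Clamp every coefficient produced by one transform stage in place. */
static void clamp_stage(int32_t *coeffs, int n, int bits) {
  for (int i = 0; i < n; ++i) {
    coeffs[i] = clamp_to_signed_bits(coeffs[i], bits);
  }
}
```

With saturation in place of a range check, an out-of-range intermediate value yields a defined result instead of a rejected stream, which matches the behavior change described above.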
  7. 14 Oct, 2017 1 commit
  8. 13 Oct, 2017 1 commit
    • Use 7-bit precision for level-map probability model · 1c077a40
      Jingning Han authored
      Support higher hardware throughput. The coding performance loss
      as compared to 15-bit precision is 0.05% for lowres. The loss is
      smaller as frame size goes up.
      
      Change-Id: I2e22b156f3178cf63689df306e9da13e0e4d205b
  9. 12 Oct, 2017 2 commits
  10. 11 Oct, 2017 2 commits
  11. 10 Oct, 2017 3 commits
    • Highbd D45E intrapred SSE2/AVX2 speedup · 56ad3dd3
      Yi Luo authored
      Function  SSE2 vs C  AVX2 vs C
      4x4       ~4.5x
      4x8       ~4.5x
      8x4       ~11.7x
      8x8       ~12.7x
      8x16      ~14.0x
      16x8                 ~21.7x
      16x16                ~24.0x
      16x32                ~28.7x
      32x16                ~20.5x
      32x32                ~24.4x
      
      Change-Id: Iaca49727d8df17b7f793b774a8d51a401ef8a8d1
    • lgt-from-pred: transforms based on prediction · 432012f6
      Lester Lu authored
      In this experiment, sharp image discontinuity in the predicted
      block is detected. Based on this discontinuity, we choose
      particular LGTs as row and column transforms.
      
      Bitstream syntax, entropy coding, and RD search for LGT are added.
      One binary symbol is used to signal whether LGT is used. This
      experiment can work independently of the lgt experiment.
      
      lowres: -0.414% for key frames, -0.151% overall
      midres: -0.413% for key frames, -0.161% overall
      
      Change-Id: Iaa2f2c2839c34ca4134fa55e77870dc3f1fa879f
    • Migrate some vp9 highbd intrapred x86 speedup to av1 · 71b6e043
      Yi Luo authored
      Function speedup on i7-6700:
      D117   sse2   ssse3
      4x4    ~1.8x
      8x8           ~3.4x
      16x16         ~5.5x
      32x32         ~2.9x
      
      D135   sse2   ssse3
      4x4    ~1.9x
      8x8           ~3.3x
      16x16         ~5.3x
      32x32         ~3.6x
      
      D153   sse2   ssse3
      4x4    ~1.9x
      8x8           ~2.8x
      16x16         ~5.5x
      32x32         ~3.6x
      
      Change-Id: I43ab5fa8dcbcfa51acbde554abf3e5d7d336f391
  12. 08 Oct, 2017 1 commit
  13. 06 Oct, 2017 3 commits
  14. 05 Oct, 2017 3 commits
  15. 04 Oct, 2017 2 commits
    • Experiment probability precision for lv-map coding · 94cea4ac
      Jingning Han authored
      Experiment probability precision for binary coding in the lv-map
      coding system.
      
      Change-Id: I8d9c49eee6dc7ca7970390fa5febe25b80bfab3c
    • Lowbd TM_PRED intra pred avx2 optimization · 237cf1b2
      Yi Luo authored
      For block width >= 16, avx2 can further speed up the
      TM_PRED intra prediction.
      
      Function speedup on i7-6700:
      Predictor  avx2 v. ssse3
      16x8       ~1.6x
      16x16      ~1.8x
      16x32      ~1.9x
      32x16      ~1.9x
      32x32      ~1.9x
      
      Change-Id: I62c20bd7628f52251b0c051b99a9b738ee44f7e6
  16. 03 Oct, 2017 1 commit
  17. 02 Oct, 2017 3 commits
  18. 01 Oct, 2017 1 commit
  19. 29 Sep, 2017 5 commits
    • Lowbd TM_PRED intrapred ssse3 optimization · a0f66fc0
      Yi Luo authored
      Function speedup (i7-6700)
      Predictor  ssse3 v. C
      4x4        ~2.1x
      4x8        ~2.4x
      8x4        ~4.1x
      8x8        ~5.4x
      8x16       ~6.1x
      16x8       ~5.9x
      16x16      ~6.4x
      16x32      ~6.7x
      32x16      ~7.4x
      32x32      ~8.0x
      
      Change-Id: I52b8ebf8193e76f4ea1137cbad5ad7fa109d86d8
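For context, a scalar TM predictor of the kind the "C" column measures against can be sketched as follows. This uses the TM_PRED formula as defined in VP9, pred(r, c) = clip(left[r] + above[c] - top_left). It is an assumption that the C baseline in this commit follows the same formula, and the function names and signature here are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Saturate an intermediate value to the 8-bit pixel range. */
static uint8_t clip_pixel(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Scalar TM predictor sketch for a bw x bh block of 8-bit pixels.
 * above: bw pixels from the row above the block.
 * left: bh pixels from the column left of the block.
 * top_left: the pixel above and to the left of the block. */
static void tm_predictor(uint8_t *dst, ptrdiff_t stride, int bw, int bh,
                         const uint8_t *above, const uint8_t *left,
                         uint8_t top_left) {
  assert(bw > 0 && bh > 0);
  for (int r = 0; r < bh; ++r) {
    for (int c = 0; c < bw; ++c) {
      dst[c] = clip_pixel(left[r] + above[c] - top_left);
    }
    dst += stride;
  }
}
```

Every output pixel is independent of the others, which is one reason the inner loop maps well onto SIMD lanes.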
    • Generate scan order one frame earlier · fabbd7eb
      Angie Chiang authored
      This should relieve the concern about the latency incurred by
      generating the scan order.

      The performance on lowres and midres remains neutral.
      
      Change-Id: If155f055540126ee834f5be1ab4b23013090ee89
    • Add clamp32u() function for uint32_t · 63e8db53
      Yaowu Xu authored
      replace clamp64() with clamp32u() where applicable
      
      Change-Id: I3fc97d576b3235eeda5d26a6a9692b5e51e016f3
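A clamp helper for uint32_t might look like the sketch below. Only the name clamp32u() comes from the commit title. The parameter list is an assumption modeled on a conventional (value, low, high) clamp.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed signature: restrict value to the closed range [low, high]. */
static uint32_t clamp32u(uint32_t value, uint32_t low, uint32_t high) {
  assert(low <= high);
  return value < low ? low : (value > high ? high : value);
}
```

Staying in uint32_t avoids widening to 64 bits when every operand is already known to fit in 32 unsigned bits.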
    • Add 32x128/128x32 block sizes · 2fa6e1ce
      Rupert Swarbrick authored
      Change-Id: Ieb28f40d85e4db4af33648c32c406dd2931ceb89
    • Lowbd intrapred DC/TOP/LEFT/128/V/H avx2 · 23c61903
      Yi Luo authored
      For prediction block width equal to 32, avx2 can further speed up
      the prediction function (i7-6700):
      
      32x32     avx2 v. sse2
      DC        ~1.4x
      top       ~1.5x
      left      ~1.4x
      128       ~1.5x
      v         ~1.6x
      h         ~1.2x
      
      32x16     avx2 v. sse2
      DC        ~2.2x
      top       ~1.7x
      left      ~1.6x
      128       ~1.8x
      v         ~1.9x
      
      Note: 32x16 H_PRED on avx2 is not yet sufficiently faster than sse2.
      
      Change-Id: I145ed504d1b3ea9df283b94927be66a2c6f81225
  20. 28 Sep, 2017 3 commits
    • Lowbd rectangle V/H intra pred sse2 optimization · 0c0fd1e5
      Yi Luo authored
      Function speedup sse2 v. C
      Predictor  V_PRED  H_PRED
      4x8        ~1.7x   ~1.8x
      8x4        ~1.8x   ~2.2x
      8x16       ~1.5x   ~1.4x
      16x8       ~1.9x   ~1.3x
      16x32      ~1.6x   ~1.4x
      32x16      ~2.0x   ~1.9x
      
      This patch disables the speed tests to save Jenkins build
      time. Developers can enable them manually by passing the
      --gtest_also_run_disabled_tests flag on the test command line.
      
      Change-Id: I81eaee5e8afc55275c7507c99774f78cc9e49f9a
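As a reference point for the table above, scalar V_PRED and H_PRED for rectangular blocks can be sketched as follows. The function names and signatures are illustrative assumptions, not the ones used in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* V_PRED sketch: every row of the block is a copy of the row above it. */
static void v_predictor(uint8_t *dst, ptrdiff_t stride, int bw, int bh,
                        const uint8_t *above) {
  assert(bw > 0 && bh > 0);
  for (int r = 0; r < bh; ++r) {
    memcpy(dst, above, (size_t)bw);
    dst += stride;
  }
}

/* H_PRED sketch: every row is filled with its left-neighbor pixel. */
static void h_predictor(uint8_t *dst, ptrdiff_t stride, int bw, int bh,
                        const uint8_t *left) {
  assert(bw > 0 && bh > 0);
  for (int r = 0; r < bh; ++r) {
    memset(dst, left[r], (size_t)bw);
    dst += stride;
  }
}
```

Both predictors only copy or broadcast bytes, so the scalar versions are already cheap, which may help explain why the measured gains are smaller than for the directional predictors.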
    • Make yv12_buffer_config more uniform · 82529d22
      Rupert Swarbrick authored
      This patch slightly reorders the fields in yv12_buffer_config and then
      uses anonymous unions in order to make it possible to write code that
      iterates uniformly over planes.
      
      The patch also ports some code (mostly in yv12extend.c and
      aom_scale.c) to show how this can make things more concise.
      
      This should make no difference to the coded results. I think it's
      unlikely to have any significant performance impact (the reordered
      fields in a yv12_buffer_config only come to 17*4 = 68 bytes in total,
      so they almost fit in a normal-sized cache line).
      
      Change-Id: Iebb46344500b9df82915f34cfd193e189d712062
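The anonymous-union idea can be sketched with a simplified, hypothetical struct. This is not the real yv12_buffer_config layout and the field set is an assumption. It relies on C11 anonymous structs and unions, and on the named fields having no padding between them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified buffer description. Each group of per-plane
 * fields is overlaid with an array so code can index by plane. */
typedef struct {
  union {
    struct { int y_width; int uv_width; };
    int widths[2];
  };
  union {
    struct { int y_height; int uv_height; };
    int heights[2];
  };
  union {
    struct { uint8_t *y_buffer; uint8_t *u_buffer; uint8_t *v_buffer; };
    uint8_t *buffers[3];
  };
} plane_buffers;

/* Total number of samples, written as one uniform loop over planes. */
static long total_samples(const plane_buffers *b) {
  assert(b != NULL);
  long total = 0;
  for (int plane = 0; plane < 3; ++plane) {
    const int is_uv = plane > 0; /* planes 1 and 2 share the uv sizes */
    total += (long)b->widths[is_uv] * b->heights[is_uv];
  }
  return total;
}
```

Existing code can keep using the named fields such as y_width, while new code can loop over widths[] and buffers[] without per-plane special cases.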
    • Add deblock_13tap experiment · 4ce85214
      Ola Hugosson authored
      This change enables using 13 taps for luma plane deblocking and 5 taps for
      chroma plane deblocking when pixels are in a flat area.
      
      The aim of the experiment is to make sure that luma line 57 and chroma
      line 29 of the current superblock are not changed by the deblocking process
      of the superblock below. Previously this was already the case for luma
      line 56 and chroma line 28 (but not for 57 and 29).
      
      This experiment is part of an effort to reduce the overall line buffer
      size for DEBLOCK+CDEF+LR. With this change it is possible to CDEF lines
      -8 to +55 directly on the output of deblock (which requires lines +56
      and +57 to be final).
      
      Change-Id: I7779a08d6ad5683bf35c3372b1526786eaac8472
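The line numbers quoted above can be reproduced with a little arithmetic. The sketch below rests on assumptions that the commit message does not spell out: a symmetric filter with N taps modifies (N - 1) / 2 pixels on each side of the edge, the luma superblock has 64 lines, the chroma superblock has 32 lines (4:2:0), and the previous filters had 15 taps for luma and 7 taps for chroma. The helper names are hypothetical.

```c
#include <assert.h>

/* Assumption: a symmetric filter with an odd number of taps modifies
 * (taps - 1) / 2 pixels on each side of the filtered edge. */
static int modified_per_side(int taps) {
  assert(taps >= 1 && (taps & 1));
  return (taps - 1) / 2;
}

/* Last line (0-based) of a superblock with sb_lines lines that is NOT
 * modified when the top edge of the superblock below is deblocked. */
static int last_final_line(int sb_lines, int taps) {
  return sb_lines - modified_per_side(taps) - 1;
}
```

Under these assumptions, last_final_line(64, 13) gives 57 and last_final_line(32, 5) gives 29, while the assumed previous tap counts give 56 and 28, in line with the figures in the message.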