1. 09 Nov, 2017 4 commits
  2. 08 Nov, 2017 1 commit
    • Use AOM_CDF* macros instead of bare AOM_ICDF macros. · e82e5774
      Thomas Daede authored
      This will facilitate later experiments reducing the precision
      of probabilities.
      
      after_cdf_table_rewrite-3 -> before_cdf_table_rewrite-3
      
        PSNR | PSNR Cb | PSNR Cr | PSNR HVS |   SSIM | MS SSIM | CIEDE 2000
      0.0000 |  0.0000 |  0.0000 |   0.0000 | 0.0000 |  0.0000 |     0.0000
      
      Change-Id: Ief01b4d7fdca075c41e9add079f7ac836dafcfbe
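The idea of wrapping table rows in a macro can be sketched as follows. This is a minimal illustration, not libaom's actual definitions: all `EX_`-prefixed names, the 15-bit top value, and the trailing zero slot are assumptions made for the example. Because each row goes through one row-level macro, the per-entry mapping can later be changed in a single place, for instance to reduce precision.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names modelled on the commit title; not libaom's definitions. */
#define EX_CDF_PROB_TOP 32768

/* Per-entry mapping: store a cumulative probability in inverse form. */
#define EX_ICDF(x) (EX_CDF_PROB_TOP - (x))

/* Row-level wrappers: call sites list plain cumulative probabilities and
 * never spell out EX_ICDF themselves. The trailing 0 is an extra slot
 * assumed here for illustration (for example, an adaptation counter). */
#define EX_CDF2(a0) EX_ICDF(a0), EX_ICDF(EX_CDF_PROB_TOP), 0
#define EX_CDF3(a0, a1) \
  EX_ICDF(a0), EX_ICDF(a1), EX_ICDF(EX_CDF_PROB_TOP), 0

typedef uint16_t ex_cdf_prob;

/* Two example tables written with the row-level macros. */
static const ex_cdf_prob ex_binary_cdf[3] = { EX_CDF2(16384) };
static const ex_cdf_prob ex_ternary_cdf[4] = { EX_CDF3(8192, 24576) };
```

With these definitions, `ex_binary_cdf` holds `{16384, 0, 0}` and `ex_ternary_cdf` holds `{24576, 8192, 0, 0}`.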
  3. 06 Nov, 2017 2 commits
    • JNT_COMP: add SIMD implementations for C functions · ef34fff7
      Cheng Chen authored
      Add SIMD implementations of the C functions for low bit depth, which
      makes encoding 3~4x faster than with the C functions.
      
      Change-Id: Icca0b07b25489759be9504aaec09d1239076fc52
    • JNT_COMP: Refactor code · f78632e0
      Cheng Chen authored
      The refactoring serves two purposes:
      1. Separate the code paths for jnt_comp and the original compound
      average computation. It provides a function interface for jnt_comp
      while leaving the original compound average computation unchanged.
      In the near future, SIMD functions can be added for jnt_comp using
      this interface.
      
      2. The previous implementation used a hack on second_pred, which may
      cause a segmentation fault when the test clip is small, as reported
      in Issue 944. This refactoring removes the hack and makes it possible
      to address the seg fault problem in the future.
      
      Change-Id: Idd2cb99f6c77dae03d32ccfa1f9cbed1d7eed067
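The separation described in point 1 can be pictured roughly as two independent functions behind similar interfaces. The sketch below is an assumption-laden illustration: the function names, the 4-bit weight precision, and the flat buffer layout are all hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Original path: plain rounded average of two predictions. */
static void ex_comp_avg(uint8_t *dst, const uint8_t *p0, const uint8_t *p1,
                        int n) {
  for (int i = 0; i < n; ++i) {
    dst[i] = (uint8_t)((p0[i] + p1[i] + 1) >> 1);
  }
}

/* Hypothetical weight precision: w0 + w1 must equal 1 << EX_WEIGHT_BITS. */
#define EX_WEIGHT_BITS 4

/* Separate path: weighted average with its own entry point, so a SIMD
 * version could replace it without touching ex_comp_avg. */
static void ex_jnt_comp_avg(uint8_t *dst, const uint8_t *p0,
                            const uint8_t *p1, int n, int w0, int w1) {
  assert(w0 >= 0 && w1 >= 0 && w0 + w1 == (1 << EX_WEIGHT_BITS));
  const int round = 1 << (EX_WEIGHT_BITS - 1);
  for (int i = 0; i < n; ++i) {
    dst[i] = (uint8_t)((p0[i] * w0 + p1[i] * w1 + round) >> EX_WEIGHT_BITS);
  }
}
```

Keeping the two paths apart means the weighted version can change, or gain an optimized variant, while the plain average stays exactly as it was.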
  4. 05 Nov, 2017 1 commit
  5. 03 Nov, 2017 1 commit
    • Allow disabling the probability update · 0e141b56
      Yunqing Wang authored
      Added the ability to disable the probability update when needed.
      This is needed when encoding with multiple tiles, where the
      probability update can be enabled or disabled separately for each
      individual tile.
      
      Change-Id: Ic3c64e6cebac89c483d48b874761bd2e902d81e6
  6. 02 Nov, 2017 1 commit
    • Remove experimental flag of EXT_TX · 3bac9928
      Sebastien Alaiwan authored
      This experiment has been adopted, so we can simplify the code
      by dropping the associated preprocessor conditionals.
      
      Change-Id: I02ed47186bbc32400ee9bfadda17659d859c0ef7
  7. 01 Nov, 2017 3 commits
    • Move and rename some macros for resize/upscale · d365d3cc
      Debargha Mukherjee authored
      This is the first step towards moving the upscale convolve function
      to convolve.c
      
      Change-Id: I916a974a881d104b0b3cde861fa8bb898883af01
    • Use tx_size 1 level down for transform type search · 90024e44
      Sarah Parker authored
      This addresses an inconsistency between the set used
      to decode the tx_type in the bitstream and the set used
      for the tx_type search. Previously, the set used to
      read/write the tx_type was based on the smallest tx_size
      in the vartx partitioning, but the search uses a set
      based on the largest possible tx_size. This patch
      changes the tx_type search to use the transform type
      set associated with the tx_size 1 recursive level down from
      the max square tx_size to make the search more consistent
      with the bitstream syntax. If a tx_size is selected for an
      invalid tx_type, DCT_DCT is used for that partition instead.
      
      This patch also adds assertions to all exposed transform
      functions to ensure that no illegal transform type/size
      combinations occur.
      
      This currently gets a 0.1% drop in performance on lowres.
      The drop is due to the reduction of the tx_types available
      for 32x16 and 16x32 transform sizes. Before this patch,
      32x16 and 16x32 transforms were getting assigned a
      set of 12 tx_types, some of which we did not intend to
      support for these sizes.
      
      Change-Id: I44aca4876b261c345623cd04ad6235bca4532701
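The fallback-plus-assertion pattern mentioned above might look roughly like this. The enums and the validity rule are invented for the example (a hypothetical rule where the larger size allows only DCT_DCT); the real transform sets in the codec are different and larger.

```c
#include <assert.h>

/* Hypothetical, heavily reduced type and size lists for illustration. */
typedef enum {
  EX_DCT_DCT,
  EX_ADST_DCT,
  EX_DCT_ADST,
  EX_ADST_ADST,
  EX_TX_TYPES
} ex_tx_type;

typedef enum { EX_TX_16X16, EX_TX_32X16, EX_TX_SIZES } ex_tx_size;

/* Assumed rule: the 32x16 size supports only DCT_DCT. */
static int ex_is_valid_tx(ex_tx_size size, ex_tx_type type) {
  if (size == EX_TX_32X16) return type == EX_DCT_DCT;
  return type < EX_TX_TYPES;
}

/* If the searched type is not legal for the chosen size, use DCT_DCT. */
static ex_tx_type ex_resolve_tx_type(ex_tx_size size, ex_tx_type searched) {
  return ex_is_valid_tx(size, searched) ? searched : EX_DCT_DCT;
}

/* Exposed transform entry point: assert that the combination is legal. */
static void ex_fwd_txfm(ex_tx_size size, ex_tx_type type) {
  assert(ex_is_valid_tx(size, type));
  (void)size;
  (void)type;
  /* Transform body omitted in this sketch. */
}
```

A caller would resolve the type first, for example `ex_fwd_txfm(size, ex_resolve_tx_type(size, searched))`, so the assertion only fires on a genuine logic error.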
    • Help the MSVC 2017 compiler generate correct code · 64620cd2
      Yaowu Xu authored
      This loop is wrongly vectorized by the MSVC2017 compiler; this change
      is a work-around for the compiler bug.
      
      Change-Id: Ie4c8403965c3e4cd6d70eb3dbc92148f5272f0ab
      (cherry picked from commit 5b33d7184bf319d9c10e34ef0fdcdd244d2fdb56)
  8. 31 Oct, 2017 1 commit
  9. 27 Oct, 2017 1 commit
  10. 25 Oct, 2017 4 commits
    • JNT_COMP: 2. assign proper weights in rdopt · efc55fd9
      Cheng Chen authored
      Change-Id: I255be6e0193dd6b91424ce53ed41aeaaeb1c01a7
    • Avoid UB from misaligned loads/stores in loopfilter code · 129afee7
      Rupert Swarbrick authored
      This patch changes 32 bit loads and stores (which did trigger
      undefined behaviour when the pointer wasn't aligned) to use the
      xx_storel_32 synonym. This should also just generate a MOVD and is
      less verbose to boot!
      
      The patch also changes store_buffer_horz_8 to take its SSE register by
      value rather than by pointer. The most restrictive ABI for passing SSE
      registers by value is win32, where you can pass at most 3. There's
      only one here, so it should be fine.
      
      BUG=aomedia:912
      
      Change-Id: I6d75803e57da090db59eedad902bd27908eb5118
    • Avoid UB from misaligned loads in variance_sse2.c · d2dea66b
      Rupert Swarbrick authored
      The undefined behaviour came from READ64, whose loads compile to a
      MOVD but which is technically incorrect if p is misaligned. This patch
      rewrites it, and the other loads and stores in the file, to use the
      xx_* functions from synonyms.h
      
      BUG=aomedia:912
      
      Change-Id: Ic2fae623ef3b609dacd0a830a7cc63653291202f
    • Avoid UB in xx_loadl/storel_32 helper functions · be0aa4ad
      Rupert Swarbrick authored
      The previous code dereferenced a uint32_t * that might be misaligned,
      which is technically undefined behaviour in C. This version uses the
      right (cryptically named) Intel intrinsics to generate a MOVD without
      making any claims about the alignment of the pointer.
      
      BUG=aomedia:912
      
      Change-Id: Ic51679b9f9ed4d2476e69da70f40b2d599cbc6b0
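The alignment issue behind these three patches can be shown without any SSE code. The sketch below is a portable stand-in that uses `memcpy` rather than the Intel intrinsics the commit refers to; the helper names are hypothetical. Compilers typically lower a fixed-size `memcpy` like this to a single move instruction.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Undefined behaviour when p is not suitably aligned for uint32_t:
 *   uint32_t v = *(const uint32_t *)p;
 */

/* Load 32 bits from any address without claiming anything about alignment. */
static uint32_t ex_load_u32(const void *p) {
  assert(p != NULL);
  uint32_t v;
  memcpy(&v, p, sizeof(v));
  return v;
}

/* Store 32 bits to any address, again with no alignment requirement. */
static void ex_store_u32(void *p, uint32_t v) {
  assert(p != NULL);
  memcpy(p, &v, sizeof(v));
}
```

For example, storing to `buf + 1` in a byte buffer and loading from the same address round-trips the value, even though that address is generally not 4-byte aligned.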
  11. 24 Oct, 2017 1 commit
  12. 23 Oct, 2017 1 commit
  13. 21 Oct, 2017 2 commits
  14. 20 Oct, 2017 1 commit
    • Lowbd D207E/D63E/D45E intrapred x86 optimization · ae676953
      Yi Luo authored
      D207E
      Predictor  SSE2 vs C
      4x4        ~2.6X
      4x8        ~2.5X
      8x4        ~8.0X
      8x8        ~9.1X
      8x16       ~11.7X
      16x8       ~16.9X
      16x16      ~17.3X
      16x32      ~17.2X
      32x16      ~30.2X
      32x32      ~35.5X
      
      D63E
      Predictor  SSE2 vs C
      4x4        ~4.7X
      4x8        ~4.9X
      8x4        ~7.8X
      8x8        ~8.9X
      8x16       ~9.3X
      16x8       ~15.7X
      16x16      ~14.7X
      16x32      ~17.3X
      32x16      ~18.0X
      32x32      ~15.7X
      
      D45E
      Predictor  SSSE3 vs C
      4x4        ~1.8X
      4x8        ~2.9X
      8x4        ~6.7X
      8x8        ~6.5X
      8x16       ~7.4X
      16x8       ~24.4X
      16x16      ~21.5X
      16x32      ~24.2X
      32x16      ~25.4X
      32x32      ~25.2X
      
      Change-Id: I8215de190e2b6314272749761600e389d1ca0fdf
  15. 19 Oct, 2017 1 commit
  16. 18 Oct, 2017 1 commit
    • Use proper inttypes for variance computations · 9f78e85b
      Yaowu Xu authored
      This commit corrects the integer types used in variance functions. It
      now uses the same integer type when the number of pixels is the same, e.g.
      16x64 and 64x16 use the same integer types as 32x32
      8x32 and 32x8 use the same integer types as 16x16
      
      Change-Id: I1a54ba8d73e09126e680ae5af3ee52395a41df41
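Why the integer type depends only on the pixel count can be seen from the usual variance formula, sse - sum^2 / N. For 8-bit input, |sum| is at most 255 * N, so the width needed for sum^2 is a function of N alone. The following sketch assumes contiguous 8-bit blocks and uses hypothetical function names; it is not the code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Variance of the difference between two contiguous 8-bit blocks,
 * computed with wide types so it is correct for any block size. */
static uint32_t ex_variance(const uint8_t *a, const uint8_t *b, int w, int h) {
  assert(w > 0 && h > 0);
  const int n = w * h;
  int64_t sum = 0;
  uint64_t sse = 0;
  for (int i = 0; i < n; ++i) {
    const int d = a[i] - b[i];
    sum += d;
    sse += (uint64_t)(d * d);
  }
  return (uint32_t)(sse - (uint64_t)((sum * sum) / n));
}

/* Returns 1 if the largest possible sum^2 for a w x h block of 8-bit
 * differences fits in 32 bits. Only w * h matters, not the shape. */
static int ex_sum_sq_fits_u32(int w, int h) {
  const uint64_t max_sum = 255u * (uint64_t)(w * h);
  return max_sum * max_sum <= UINT32_MAX;
}
```

Under these assumptions, 16x16, 8x32 and 32x8 (256 pixels each) all fit in 32 bits, while 32x32, 16x64 and 64x16 (1024 pixels each) all need a wider type.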
  17. 16 Oct, 2017 3 commits
    • Align more restoration work buffers · 15269e6e
      Yaowu Xu authored
      Fixes crashes on x86-win32-vs14 build
      
      Change-Id: I045dd0fe4e9af3bfb80223e291617b717cbcb231
    • Highbd D207E/D63E intrapred sse2/avx2 optimization · 0b7127b3
      Yi Luo authored
      D207E
      Predictor SSE2 vs C   AVX2 vs C
      4x4       ~2.7x
      4x8       ~3.0x
      8x4       ~7.2x
      8x8       ~8.5x
      8x16      ~9.4x
      16x8      ~12.8x
      16x16     ~13.0x
      16x32     ~14.3x
      32x16                 ~19.9x
      32x32                 ~23.6x
      
      D63E
      Predictor SSE2 vs C   AVX2 vs C
      4x4       ~3.8x
      4x8       ~4.3x
      8x4       ~6.4x
      8x8       ~6.8x
      8x16      ~8.6x
      16x8                  ~9.0x
      16x16                 ~9.6x
      16x32                 ~10.3x
      32x16                 ~9.1x
      32x32                 ~11.0x
      
      Change-Id: I87373804c9d53276bf4d7788c4ae0d13d01c00dc
    • Clamp inverse transform coefficients · 29504172
      Sebastien Alaiwan authored
      When --enable-coefficient-range-checking isn't specified, clamp the
      coefficients at each stage.
      
      This doesn't change the decoder behavior for existing AV1 streams.
      However, some AV1 bitstreams that would have been rejected by the
      decoder as illegal (range check failure) are now legal bitstreams.
      
      There is no impact on video quality.
      
      BUG=aomedia:30
      
      Change-Id: Ibcf1683e5c2ae9f91a7f37b468c4bc72e98e22fa
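Clamping at each stage, as opposed to rejecting out-of-range values, can be sketched like this. The helper names and the way the bit width is passed are assumptions for illustration; the actual per-stage ranges used by the codec are not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a value to the range of a signed integer that is `bits` wide. */
static int32_t ex_clamp_to_bits(int64_t value, int bits) {
  assert(bits >= 2 && bits <= 32);
  const int64_t max = ((int64_t)1 << (bits - 1)) - 1;
  const int64_t min = -((int64_t)1 << (bits - 1));
  if (value < min) return (int32_t)min;
  if (value > max) return (int32_t)max;
  return (int32_t)value;
}

/* Apply the clamp to every coefficient after one transform stage. */
static void ex_clamp_stage(int32_t *coeffs, int n, int bits) {
  for (int i = 0; i < n; ++i) {
    coeffs[i] = ex_clamp_to_bits(coeffs[i], bits);
  }
}
```

Values already inside the range pass through unchanged, which is consistent with the statement that decoding of existing in-range streams is not affected.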
  18. 14 Oct, 2017 1 commit
  19. 13 Oct, 2017 1 commit
    • Use 7-bit precision for level-map probability model · 1c077a40
      Jingning Han authored
      Support higher hardware throughput. The coding performance loss
      as compared to 15-bit precision is 0.05% for lowres. The loss is
      smaller as frame size goes up.
      
      Change-Id: I2e22b156f3178cf63689df306e9da13e0e4d205b
  20. 12 Oct, 2017 2 commits
  21. 11 Oct, 2017 2 commits
  22. 10 Oct, 2017 3 commits
    • Highbd D45E intrapred SSE2/AVX2 speedup · 56ad3dd3
      Yi Luo authored
      Function  SSE2 vs C  AVX2 vs C
      4x4       ~4.5x
      4x8       ~4.5x
      8x4       ~11.7x
      8x8       ~12.7x
      8x16      ~14.0x
      16x8                 ~21.7x
      16x16                ~24.0x
      16x32                ~28.7x
      32x16                ~20.5x
      32x32                ~24.4x
      
      Change-Id: Iaca49727d8df17b7f793b774a8d51a401ef8a8d1
    • lgt-from-pred: transforms based on prediction · 432012f6
      Lester Lu authored
      In this experiment, sharp image discontinuities in the predicted
      block are detected. Based on such a discontinuity, we choose
      particular LGTs as row and column transforms.
      
      Bitstream syntax, entropy coding, and RD search for LGT are added.
      One binary symbol is used to signal whether LGT is used. This
      experiment can work independently of the lgt experiment.
      
      lowres: -0.414% for key frames, -0.151% overall
      midres: -0.413% for key frames, -0.161% overall
      
      Change-Id: Iaa2f2c2839c34ca4134fa55e77870dc3f1fa879f
    • Migrate some vp9 highbd intrapred x86 speedup to av1 · 71b6e043
      Yi Luo authored
      Function speedup on i7-6700:
      D117   sse2   ssse3
      4x4    ~1.8x
      8x8           ~3.4x
      16x16         ~5.5x
      32x32         ~2.9x
      
      D135   sse2   ssse3
      4x4    ~1.9x
      8x8           ~3.3x
      16x16         ~5.3x
      32x32         ~3.6x
      
      D153   sse2   ssse3
      4x4    ~1.9x
      8x8           ~2.8x
      16x16         ~5.5x
      32x32         ~3.6x
      
      Change-Id: I43ab5fa8dcbcfa51acbde554abf3e5d7d336f391
  23. 08 Oct, 2017 1 commit
  24. 06 Oct, 2017 1 commit
    • Lowbd SMOOTH_PRED intrapred ssse3 optimization · 46ae1ea3
      Yi Luo authored
      On i7-6700:
      Predictor    ssse3 v. C
      4x4          ~1.3x
      4x8          ~1.9x
      8x4          ~2.3x
      8x8          ~3.4x
      8x16         ~4.1x
      16x8         ~4.6x
      16x16        ~5.2x
      16x32        ~5.6x
      32x16        ~4.2x
      32x32        ~4.7x
      
      Change-Id: Ic12383cf9d4446361d6355eb8a480a3c7602060e