1. 21 Oct, 2015 1 commit
    • Optimize vp9_highbd_block_error_8bit assembly. · aa8f8522
      Geza Lore authored
      A new version of vp9_highbd_block_error_8bit is now available which is
      optimized with AVX assembly. AVX itself does not buy us too much, but
      the non-destructive 3 operand format encoding of the 128bit SSEn integer
      instructions helps to eliminate move instructions. The Sandy Bridge
      micro-architecture cannot eliminate move instructions in the processor
      front end, so AVX will help on these machines.
      
      Two further optimizations are applied:
      
      1. The common case of computing block error on 4x4 blocks is optimized
      as a special case.
      2. All arithmetic is speculatively done on 32 bits only. At the end of
      the loop, the code detects if overflow might have happened and if so,
      the whole computation is re-executed using higher precision arithmetic.
      This case however is extremely rare in real use, so we can achieve a
      large net gain here.
      
      The optimizations rely on the fact that the coefficients are in the
      range [-(2^15-1), 2^15-1], and that the quantized coefficients always
      have the same sign as the input coefficients (in the worst case they are
      0). These are the same assumptions that the old SSE2 assembly code for
      the non high bitdepth configuration relied on. The unit tests have been
      updated to take this constraint into consideration when generating test
      input data.
      
      Change-Id: I57d9888a74715e7145a5d9987d67891ef68f39b7
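The speculative scheme in this commit message can be sketched in plain C. This is only an illustration under stated assumptions, not the AVX assembly from the commit: the names `block_error_wide` and `block_error_speculative` are hypothetical, and the overflow test used here (bounding the sum by `n` times the square of an OR-ed magnitude) is just one conservative way to decide that overflow might have happened. The real code may detect it differently.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Exact path: 64-bit accumulators, never overflows for 16-bit inputs. */
static int64_t block_error_wide(const int16_t *coeff, const int16_t *dqcoeff,
                                int n, int64_t *ssz) {
  int64_t err = 0, sq = 0;
  assert(n > 0);
  for (int i = 0; i < n; ++i) {
    const int64_t d = (int64_t)coeff[i] - dqcoeff[i];
    err += d * d;
    sq += (int64_t)coeff[i] * coeff[i];
  }
  *ssz = sq;
  return err;
}

/* Speculative path: accumulate in 32 bits, then check after the loop
 * whether the 32-bit sums could have wrapped. If so, redo the whole
 * computation with the exact 64-bit path. */
static int64_t block_error_speculative(const int16_t *coeff,
                                       const int16_t *dqcoeff, int n,
                                       int64_t *ssz) {
  uint32_t err = 0, sq = 0, mag = 0;
  assert(n > 0);
  for (int i = 0; i < n; ++i) {
    const int32_t c = coeff[i];
    const int32_t d = c - dqcoeff[i];
    const uint32_t ac = (uint32_t)abs(c);
    const uint32_t ad = (uint32_t)abs(d);
    err += ad * ad; /* unsigned arithmetic: wrapping is well defined */
    sq += ac * ac;
    mag |= ac | ad; /* mag >= every |c| and |d| seen so far */
  }
  /* Every squared term is <= mag * mag, so each sum is <= n * mag * mag.
   * If that bound fits in 32 bits, the fast result is exact. */
  if ((uint64_t)n * mag * mag > UINT32_MAX) {
    return block_error_wide(coeff, dqcoeff, n, ssz);
  }
  *ssz = (int64_t)sq;
  return (int64_t)err;
}
```

For small residuals, which the commit message describes as the common case, the check passes and only the 32-bit loop runs. A block full of near-maximal coefficients fails the check and takes the 64-bit path.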
  2. 08 Oct, 2015 1 commit
    • Optimization of 8bit block error for high bitdepth · 0134764f
      Geza Lore authored
      If high bit depth configuration is enabled, but encoding in profile 0,
      the code now falls back on optimized SSE2 assembler to compute the
      block errors, similar to when high bit depth is not enabled.
      
      Change-Id: I471d1494e541de61a4008f852dbc0d548856484f
  3. 07 Aug, 2015 1 commit
  4. 28 Jul, 2015 4 commits
  5. 27 Jul, 2015 1 commit
  6. 22 Jul, 2015 1 commit
  7. 20 Jul, 2015 1 commit
  8. 17 Jul, 2015 1 commit
    • Migrate quantization functions from vp9/ to vpx_dsp/ · 38f1fbbb
      Yunqing Wang authored
      The following quantization functions were moved:
      vp9_quantize_b
      vp9_quantize_b_32x32
      vp9_highbd_quantize_b
      vp9_highbd_quantize_b_32x32
      
      vp9_quantize_dc
      vp9_quantize_dc_32x32
      vp9_highbd_quantize_dc
      vp9_highbd_quantize_dc_32x32
      
      This allows these functions to be shared by multiple codecs.
      
      Change-Id: Id8ab939f283353cdd07bd930d47db3d932a5d87f
  9. 07 Jul, 2015 1 commit
  10. 06 Jul, 2015 2 commits
  11. 02 Jul, 2015 1 commit
    • Revert "mips msa vp9 subpel variance optimization" · 97946622
      James Zern authored
      This reverts commit a42df86c.
      
      this change causes MSA/VP9SubpelVarianceTest.Ref and
      MSA/VP9SubpelVarianceTest.ExtremeRef failures under
      mips32r5el-msa-linux-gnu and mips64r6el-msa-linux-gnu
      
      Change-Id: I40b71a0b774eaeb31f66f795733f95cf360909f7
  12. 01 Jul, 2015 2 commits
  13. 26 Jun, 2015 3 commits
  14. 23 Jun, 2015 1 commit
  15. 22 Jun, 2015 1 commit
  16. 20 Jun, 2015 1 commit
  17. 17 Jun, 2015 1 commit
  18. 16 Jun, 2015 1 commit
  19. 26 May, 2015 1 commit
  20. 16 May, 2015 1 commit
    • rename vp9_dct_impl_sse2.c to vp9_dct_sse2_impl.h · a989c66b
      James Zern authored
      this file shouldn't be built directly, it is included in vp9_dct_sse2.c
      to create a non-high-bitdepth and a high-bitdepth version
      
      silences missing prototype warnings for the unused FDCT* functions
      
      Change-Id: Ide6ff8c24ab31bdb0f833260505ae33660a1ad5b
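The idea of building a non-high-bitdepth and a high-bitdepth version from one shared body can be sketched in a single file. This is a hedged illustration, not the libvpx source: a function-generating macro stands in for the re-included `_impl.h` header, and `DEFINE_SUM_SQUARES`, `sum_squares_lowbd` and `sum_squares_highbd` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the shared body that would live in an *_impl.h file.
 * Each expansion produces one concrete function for one coefficient type. */
#define DEFINE_SUM_SQUARES(NAME, COEFF_T)               \
  static int64_t NAME(const COEFF_T *coeff, size_t n) { \
    int64_t sum = 0;                                    \
    assert(coeff != NULL);                              \
    for (size_t i = 0; i < n; ++i) {                    \
      sum += (int64_t)coeff[i] * coeff[i];              \
    }                                                   \
    return sum;                                         \
  }

/* Two variants generated from the same body, analogous to including the
 * implementation header twice with different macro settings. */
DEFINE_SUM_SQUARES(sum_squares_lowbd, int16_t)
DEFINE_SUM_SQUARES(sum_squares_highbd, int32_t)
```

Because the shared body only makes sense once its parameters are defined, it is not a translation unit of its own, which is consistent with the commit's point that such a file should not be built directly.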
  21. 15 May, 2015 2 commits
    • rename vp9_dct32x32_sse2.c to vp9_dct32x32_sse2_impl.h · 587a71f1
      James Zern authored
      this file shouldn't be built directly, it is included in vp9_dct_sse2.c
      to create a non-high-bitdepth and a high-bitdepth version
      
      silences missing prototype warnings for the unused FDCT32x32* functions
      
      Change-Id: I0e38f16dae5ea1728de184ee2c89287d48675c51
    • rename vp9_dct32x32_avx2.c to vp9_dct32x32_avx2_impl.h · 4ec47249
      James Zern authored
      this file shouldn't be built directly, it is included in vp9_dct_avx2.c
      to create a non-high-bitdepth and a high-bitdepth version
      
      silences missing prototype warnings for the unused FDCT32x32* functions
      
      Change-Id: I4c19935c0e035b393be513bde735e9a78064a494
  22. 06 May, 2015 1 commit
    • Move shared SAD code to vpx_dsp · d5d92898
      Johann authored
      Create a new component, vpx_dsp, for code that can be shared
      between codecs. Move the SAD code into the component.
      
      This reduces the size of vpxenc/dec by 36k on x86_64 builds.
      
      Change-Id: I73f837ddaecac6b350bf757af0cfe19c4ab9327a
  23. 17 Apr, 2015 3 commits
  24. 01 Apr, 2015 1 commit
  25. 12 Feb, 2015 1 commit
    • Add skin detection. · 56435bb7
      Marco authored
      Simple skin detection, from vp8; works reasonably well on most of the
      RTC clips, but can miss at times.
      
      Added debug flag to write out skin map over source input.
      
      Change-Id: I2caea7592f1c459047aac46627eeb24a94946464
  26. 27 Jan, 2015 1 commit
  27. 15 Jan, 2015 1 commit
    • Add Neon intrinsics for vp9_avg_8x8_neon · 6e7e1cf3
      Frank Galligan authored
      On Nexus 7 speed -5, -6, -7, and -8 saw about a 1% increase
      in perf for 480p. Speeds -5, -6, -7, and -8 saw about a 1.5%
      increase in perf for 720p.
      
      Tested on Nexus 7, built with ndk r10d, gcc 4.9.
      
      Change-Id: Ibf17ebfd952a6aec941719bd8306df8ec4574bee
  28. 04 Dec, 2014 1 commit
    • vp9_ethread: the tile-based multi-threaded encoder · eba9c762
      Yunqing Wang authored
      Currently, VP9 supports column-tile encoding, which allows a frame
      to be encoded in multiple column tiles independently. The number of
      column tiles is set by the encoder option "--tile-columns". This
      provides a way to encode a frame in parallel.
      
      Based on the previous set of patches, this patch implements the
      tile-based multi-threaded encoder. Each thread processes one or more
      tiles.
      
      Usage:
      For HD clips:
      --tile-columns=2 --threads=1/2/3/4
      
      While using 4 threads, tests showed that the encoder achieved
      2.3X - 2.5X speedup at good-quality speed 3, and 2X speedup at
      realtime speed 5.
      
      Change-Id: Ied987f8f2618b1283a8643ad255e88341733c9d4
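How "each thread processes one or more tiles" might look in practice can be sketched as a tile-to-worker mapping. The helpers below are hypothetical and are not taken from vp9_ethread. The sketch assumes that "--tile-columns" is given as a log2 value (so 2 means 4 tile columns) and that tiles are handed to workers round-robin; the actual encoder may distribute them differently.

```c
#include <assert.h>

/* Assumption: the option value is log2 of the tile-column count. */
static int tile_columns_from_log2(int log2_cols) {
  assert(log2_cols >= 0 && log2_cols < 16);
  return 1 << log2_cols;
}

/* Hypothetical round-robin mapping: tile t goes to worker t % num_workers. */
static int worker_for_tile(int tile, int num_workers) {
  assert(tile >= 0 && num_workers > 0);
  return tile % num_workers;
}

/* Number of tiles a given worker ends up with under that mapping. */
static int tiles_for_worker(int worker, int num_tiles, int num_workers) {
  int count = 0;
  for (int t = 0; t < num_tiles; ++t) {
    if (worker_for_tile(t, num_workers) == worker) ++count;
  }
  return count;
}
```

Under these assumptions, `--tile-columns=2 --threads=3` gives four tiles shared by three workers, so one worker handles two tiles while the others handle one each. Uneven splits like this would be one reason speedup does not scale linearly with thread count.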
  29. 02 Dec, 2014 1 commit
    • Added high bitdepth sse2 transform functions · 7e40a55e
      Peter de Rivaz authored
      Also removes some spurious changes in common/vp9_blockd.h which
      were introduced by a rebase issue between nextgen and master branches.
      
      Change-Id: If359f0e9a71bca9c2ba685a87a355873536bb282
      (cherry picked from commit 005d80cd05269a299cd2f7ddbc3d4d8b791aebba)
      (cherry picked from commit 08d2f548007fd8d6fd41da8ef7fdb488b6485af3)
      (cherry picked from commit 4230c2306c194c058f56433a5275aa02a2e71d56)
  30. 24 Nov, 2014 1 commit
    • Refactored idct routines and headers · 3a8c43a4
      Peter de Rivaz authored
      This change is made in preparation for a
      subsequent patch which adds acceleration
      for the highbitdepth transform functions.
      
      The highbitdepth transform functions attempt
      to use 16/32bit sse instructions where possible,
      but fall back to the C implementations if
      potential overflow is detected. For this reason
      the dct routines are made global so they can be
      called from the acceleration functions in the
      subsequent patch.
      
      Change-Id: Ia921f191bf6936ccba4f13e8461624b120c1f665
      (cherry picked from commit 454342d4e77dbb67f4a3c10f97a57a6fcb46d9a0)
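The "narrow arithmetic where possible, wide fallback otherwise" pattern described in this message can be sketched in scalar C. This is an assumption-laden illustration rather than the SSE code: `butterfly`, `butterfly_narrow` and `butterfly_wide` are hypothetical names, the constant 11585 is assumed to be the 14-bit fixed-point value of cos(pi/4), and right shifts of negative values are assumed to be arithmetic.

```c
#include <assert.h>
#include <stdint.h>

#define CONST_BITS 14
static const int32_t kCos16 = 11585; /* assumed: round(cos(pi/4) * 2^14) */

/* True when v fits the range the narrow path can handle safely. */
static int fits_16bit(int32_t v) { return v >= -32767 && v <= 32767; }

/* Narrow path: 32-bit products. Safe because |a + b| <= 65534 and
 * 65534 * 11585 plus the rounding term stays below 2^31. */
static void butterfly_narrow(int32_t a, int32_t b, int32_t *sum,
                             int32_t *diff) {
  *sum = ((a + b) * kCos16 + (1 << (CONST_BITS - 1))) >> CONST_BITS;
  *diff = ((a - b) * kCos16 + (1 << (CONST_BITS - 1))) >> CONST_BITS;
}

/* Wide path: 64-bit products, standing in for the C implementation. */
static void butterfly_wide(int32_t a, int32_t b, int32_t *sum, int32_t *diff) {
  const int64_t s = ((int64_t)a + b) * kCos16 + (1 << (CONST_BITS - 1));
  const int64_t d = ((int64_t)a - b) * kCos16 + (1 << (CONST_BITS - 1));
  *sum = (int32_t)(s >> CONST_BITS);
  *diff = (int32_t)(d >> CONST_BITS);
}

/* Dispatcher: use the narrow path only when overflow is impossible. */
static void butterfly(int32_t a, int32_t b, int32_t *sum, int32_t *diff) {
  assert(sum != 0 && diff != 0);
  if (fits_16bit(a) && fits_16bit(b)) {
    butterfly_narrow(a, b, sum, diff);
  } else {
    butterfly_wide(a, b, sum, diff);
  }
}
```

The range check here runs before the arithmetic, so both paths return identical results whenever the narrow one is taken.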