1. 06 Nov, 2015 1 commit
      Revert "Add AVX vectorized vp9_diamond_search_sad" · 30466f26
      James Zern authored
      This reverts commit f1342a7b.
      This breaks 32-bit builds:
       runtime error: load of misaligned address 0xf72fdd48 for type 'const
      __m128i' (vector of 2 'long long' values), which requires 16 byte
      alignment
      + _mm_set1_epi64x is incompatible with some versions of Visual Studio
      Change-Id: I6f6fc3c11403344cef78d1c432cdc9147e5c1673
  2. 05 Nov, 2015 1 commit
      Add AVX vectorized vp9_diamond_search_sad · f1342a7b
      Geza Lore authored
      This function now has an AVX intrinsics version which is about 80%
      faster compared to the C implementation. This provides a 2-4% total
      speed-up for encode, depending on encoding parameters. The function
      utilizes 3 properties of the cost function lookup table, constructed
      in 'cal_nmvjointsadcost' and 'cal_nmvsadcosts'.
      For the joint cost:
        - mvjointsadcost[1] == mvjointsadcost[2] == mvjointsadcost[3]
      For the component costs:
        - For all i: mvsadcost[0][i] == mvsadcost[1][i]
              (equal per component cost)
        - For all i: mvsadcost[0][i] == mvsadcost[0][-i]
              (Cost function is even)
      These must hold, otherwise the AVX version of the function cannot be used.
      Change-Id: I184055b864c5a2dc37b2d8c5c9012eb801e9daf6
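The three table properties listed in this commit message can be expressed as a runtime check. The following is a minimal sketch, not code from libvpx: the function name `sad_cost_tables_allow_simd`, its parameters, and the `max_mv` bound are hypothetical and chosen only for illustration. It assumes each component cost pointer refers to the centre (index 0) of its table, so that negative indices are valid.

```c
#include <assert.h>

/* Hypothetical helper (not part of libvpx): returns 1 when the three
 * properties from the commit message hold, 0 otherwise.
 *
 * joint_cost   - 4 joint cost entries.
 * cost0, cost1 - per-component cost tables; each pointer is assumed to
 *                point at the centre entry, so indices -max_mv..max_mv
 *                are readable.
 */
static int sad_cost_tables_allow_simd(const int joint_cost[4],
                                      const int *cost0, const int *cost1,
                                      int max_mv) {
  int i;
  assert(max_mv >= 0);

  /* Joint cost: entries 1, 2 and 3 must be equal. */
  if (joint_cost[1] != joint_cost[2] || joint_cost[2] != joint_cost[3])
    return 0;

  for (i = -max_mv; i <= max_mv; ++i) {
    /* Equal per-component cost. */
    if (cost0[i] != cost1[i]) return 0;
    /* Cost function is even. */
    if (cost0[i] != cost0[-i]) return 0;
  }
  return 1;
}
```

A caller could fall back to the C implementation whenever such a check returns 0.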
  3. 02 Nov, 2015 1 commit
      Move noise level estimate outside denoiser. · c7da053d
      Marco authored
      Source noise level estimate is also useful for
      setting various encoder parameters (variance thresholds,
      qp-delta, mode selection, etc.), so allow it to be used also
      if denoising is not on.
      Change-Id: I4fe23d47607b4e17a35287057f489c29114beed1
  4. 21 Oct, 2015 1 commit
      Optimize vp9_highbd_block_error_8bit assembly. · aa8f8522
      Geza Lore authored
      A new version of vp9_highbd_block_error_8bit is now available which is
      optimized with AVX assembly. AVX itself does not buy us too much, but
      the non-destructive 3 operand format encoding of the 128bit SSEn integer
      instructions helps to eliminate move instructions. The Sandy Bridge
      micro-architecture cannot eliminate move instructions in the processor
      front end, so AVX will help on these machines.
      Two further optimizations are applied:
      1. The common case of computing block error on 4x4 blocks is optimized
      as a special case.
      2. All arithmetic is speculatively done on 32 bits only. At the end of
      the loop, the code detects if overflow might have happened and if so,
      the whole computation is re-executed using higher precision arithmetic.
      This case however is extremely rare in real use, so we can achieve a
      large net gain here.
      The optimizations rely on the fact that the coefficients are in the
      range [-(2^15-1), 2^15-1], and that the quantized coefficients always
      have the same sign as the input coefficients (in the worst case they are
      0). These are the same assumptions that the old SSE2 assembly code for
      the non high bitdepth configuration relied on. The unit tests have been
      updated to take this constraint into consideration when generating test
      input data.
      Change-Id: I57d9888a74715e7145a5d9987d67891ef68f39b7
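The speculative 32-bit strategy described in this commit message can be sketched in plain C. This is an illustration only, not the AVX assembly from the commit: the function name `block_error_sketch` is hypothetical, and overflow is detected here with a sticky flag updated on every addition, whereas the commit message says the assembly checks at the end of the loop. The sketch assumes the stated preconditions, namely coefficients in [-(2^15-1), 2^15-1] and quantized coefficients with the same sign as the input (or 0), so each squared term fits in 32 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: returns the sum of squared differences and stores
 * the sum of squared input coefficients in *ssz. */
static int64_t block_error_sketch(const int32_t *coeff,
                                  const int32_t *dqcoeff, size_t n,
                                  int64_t *ssz) {
  uint32_t err32 = 0, sqcoeff32 = 0;
  int wrapped = 0;
  size_t i;

  /* Fast path: accumulate in 32 bits. Unsigned addition wraps modulo
   * 2^32, so a sum smaller than the term just added means it wrapped. */
  for (i = 0; i < n; ++i) {
    const int32_t diff = coeff[i] - dqcoeff[i];
    const uint32_t d2 = (uint32_t)(diff * diff);
    const uint32_t c2 = (uint32_t)(coeff[i] * coeff[i]);
    err32 += d2;
    wrapped |= (err32 < d2);
    sqcoeff32 += c2;
    wrapped |= (sqcoeff32 < c2);
  }
  if (!wrapped) {
    *ssz = (int64_t)sqcoeff32;
    return (int64_t)err32;
  }

  /* Slow path: redo the whole computation with 64-bit accumulators. */
  {
    int64_t err = 0, sq = 0;
    for (i = 0; i < n; ++i) {
      const int64_t diff = (int64_t)coeff[i] - dqcoeff[i];
      err += diff * diff;
      sq += (int64_t)coeff[i] * coeff[i];
    }
    *ssz = sq;
    return err;
  }
}
```

The slow path costs a second pass over the block, which is acceptable only because, as the commit message notes, overflow is extremely rare in real use.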
  5. 08 Oct, 2015 1 commit
      Optimization of 8bit block error for high bitdepth · 0134764f
      Geza Lore authored
      If high bit depth configuration is enabled, but encoding in profile 0,
      the code now falls back on optimized SSE2 assembler to compute the
      block errors, similar to when high bit depth is not enabled.
      Change-Id: I471d1494e541de61a4008f852dbc0d548856484f
  6. 07 Aug, 2015 1 commit
  7. 28 Jul, 2015 4 commits
  8. 27 Jul, 2015 1 commit
  9. 22 Jul, 2015 1 commit
  10. 20 Jul, 2015 1 commit
  11. 17 Jul, 2015 1 commit
      Migrate quantization functions from vp9/ to vpx_dsp/ · 38f1fbbb
      Yunqing Wang authored
      The following quantization functions were moved:
      The purpose of doing that was to allow these functions to be shared
      by multiple codecs.
      Change-Id: Id8ab939f283353cdd07bd930d47db3d932a5d87f
  12. 07 Jul, 2015 1 commit
  13. 06 Jul, 2015 2 commits
  14. 02 Jul, 2015 1 commit
      Revert "mips msa vp9 subpel variance optimization" · 97946622
      James Zern authored
      This reverts commit a42df86c.
      this change causes MSA/VP9SubpelVarianceTest.Ref and
      MSA/VP9SubpelVarianceTest.ExtremeRef failures under
      mips32r5el-msa-linux-gnu and mips64r6el-msa-linux-gnu
      Change-Id: I40b71a0b774eaeb31f66f795733f95cf360909f7
  15. 01 Jul, 2015 2 commits
  16. 26 Jun, 2015 3 commits
  17. 23 Jun, 2015 1 commit
  18. 22 Jun, 2015 1 commit
  19. 20 Jun, 2015 1 commit
  20. 17 Jun, 2015 1 commit
  21. 16 Jun, 2015 1 commit
  22. 26 May, 2015 1 commit
  23. 16 May, 2015 1 commit
      rename vp9_dct_impl_sse2.c to vp9_dct_sse2_impl.h · a989c66b
      James Zern authored
      this file shouldn't be built directly, it is included in vp9_dct_sse2.c
      to create a non-high-bitdepth and a high-bitdepth version
      silences missing prototype warnings for the unused FDCT* functions
      Change-Id: Ide6ff8c24ab31bdb0f833260505ae33660a1ad5b
  24. 15 May, 2015 2 commits
      rename vp9_dct32x32_sse2.c to vp9_dct32x32_sse2_impl.h · 587a71f1
      James Zern authored
      this file shouldn't be built directly, it is included in vp9_dct_sse2.c
      to create a non-high-bitdepth and a high-bitdepth version
      silences missing prototype warnings for the unused FDCT32x32* functions
      Change-Id: I0e38f16dae5ea1728de184ee2c89287d48675c51
      rename vp9_dct32x32_avx2.c to vp9_dct32x32_avx2_impl.h · 4ec47249
      James Zern authored
      this file shouldn't be built directly, it is included in vp9_dct_avx2.c
      to create a non-high-bitdepth and a high-bitdepth version
      silences missing prototype warnings for the unused FDCT32x32* functions
      Change-Id: I4c19935c0e035b393be513bde735e9a78064a494
  25. 06 May, 2015 1 commit
      Move shared SAD code to vpx_dsp · d5d92898
      Johann authored
      Create a new component, vpx_dsp, for code that can be shared
      between codecs. Move the SAD code into the component.
      This reduces the size of vpxenc/dec by 36k on x86_64 builds.
      Change-Id: I73f837ddaecac6b350bf757af0cfe19c4ab9327a
  26. 17 Apr, 2015 3 commits
  27. 01 Apr, 2015 1 commit
  28. 12 Feb, 2015 1 commit
      Add skin detection. · 56435bb7
      Marco authored
      Simple skin detection, from vp8; works reasonably well on most of the
      RTC clips, but can occasionally miss.
      Added debug flag to write out skin map over source input.
      Change-Id: I2caea7592f1c459047aac46627eeb24a94946464
  29. 27 Jan, 2015 1 commit
  30. 15 Jan, 2015 1 commit
      Add Neon intrinsics for vp9_avg_8x8_neon · 6e7e1cf3
      Frank Galligan authored
      On Nexus 7 speed -5, -6, -7, and -8 saw about a 1% increase
      in perf for 480p. Speeds -5, -6, -7, and -8 saw about a 1.5%
      increase in perf for 720p.
      Tested on Nexus 7, built with ndk r10d, gcc 4.9.
      Change-Id: Ibf17ebfd952a6aec941719bd8306df8ec4574bee