1. 08 Sep, 2014 1 commit
  2. 05 Sep, 2014 2 commits
    • Removing postproc mmx code. · 1100e262
      Dmitry Kovalev authored
      Removed functions:
      * vp9_post_proc_down_and_across_mmx
      * vp9_mbpost_proc_down_mmx
      * vp9_plane_add_noise_mmx
      
      They all have SSE2 equivalents.
      
      Change-Id: I59c1fac12b7c96ca4538d455e4400c2b7875feff
    • fix x86-darwin* build · a8083449
      James Zern authored
      vp9_variance_sse2.c contains a mix of intrinsics and references to
      assembly which uses x86inc.asm; it's conditionally included as a result.
      
      Change-Id: I254451483a65881c0b8e18e27bf0c3ddef60c4ec
  3. 04 Sep, 2014 2 commits
  4. 02 Sep, 2014 1 commit
    • Removing MMX SAD calculation code. · 318fc0c3
      Dmitry Kovalev authored
      Removed functions:
      * vp9_sad_16x16_mmx
      * vp9_sad_8x16_mmx
      * vp9_sad_16x8_mmx
      * vp9_sad_8x8_mmx
      * vp9_sad_4x4_mmx
      
      Change-Id: Ic5174b93b64d65d846f0c11e72cab149e9472bc3
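
For readers unfamiliar with the removed functions, the following is a minimal scalar sketch of what a SAD (sum of absolute differences) routine computes for a block. The name `sad_block_ref` and its signature are hypothetical and are not taken from the commit; the SIMD versions are assumed to produce the same result, only faster.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical reference helper (not the library's API): sum of absolute
 * differences between a width x height source block and a reference block.
 * Strides are in bytes and may be larger than the block width. */
unsigned int sad_block_ref(const uint8_t *src, int src_stride,
                           const uint8_t *ref, int ref_stride,
                           int width, int height) {
  unsigned int sad = 0;
  assert(width > 0 && height > 0);
  for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x) {
      /* uint8_t operands are promoted to int, so the difference is signed. */
      sad += (unsigned int)abs(src[x] - ref[x]);
    }
    src += src_stride;
    ref += ref_stride;
  }
  return sad;
}
```

A 16x16 call would pass `width = 16, height = 16`; the per-size functions listed above presumably fix those dimensions at compile time.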
  5. 29 Aug, 2014 1 commit
    • Removing variance MMX code. · 12cd6f42
      Dmitry Kovalev authored
      Removed functions:
      * vp9_mse16x16_mmx
      * vp9_get_mb_ss_mmx
      * vp9_get4x4var_mmx
      * vp9_get8x8var_mmx
      * vp9_variance4x4_mmx
      * vp9_variance8x8_mmx
      * vp9_variance16x16_mmx
      * vp9_variance16x8_mmx
      * vp9_variance8x16_mmx
      
      They all have SSE2 equivalents.
      
      Change-Id: I3796f2477c4f59b35b4828f46a300c16e62a2615
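
As a rough illustration of what the variance family computes, here is a scalar sketch under the assumption that "variance" means the sum of squared differences minus the squared sum of differences divided by the pixel count. The helper name `variance_block_ref` and its parameters are hypothetical and not taken from the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference helper: returns sse - sum^2 / N for the pixel
 * differences of a width x height block, and writes the raw sum of squared
 * differences to *sse. */
unsigned int variance_block_ref(const uint8_t *src, int src_stride,
                                const uint8_t *ref, int ref_stride,
                                int width, int height, unsigned int *sse) {
  int64_t sum = 0;  /* signed sum of differences */
  uint64_t sq = 0;  /* sum of squared differences */
  assert(width > 0 && height > 0 && sse != 0);
  for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x) {
      const int d = src[x] - ref[x];
      sum += d;
      sq += (uint64_t)(d * d);
    }
    src += src_stride;
    ref += ref_stride;
  }
  *sse = (unsigned int)sq;
  return (unsigned int)(sq - (uint64_t)((sum * sum) / (width * height)));
}
```

With this definition a constant offset between the two blocks yields a variance of zero, because all of the error energy sits in the mean.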
  6. 08 Aug, 2014 1 commit
    • Fix bug 807 · 69a5f5ec
      levytamar82 authored and James Zern committed
      In the sub_pixel_*variance* functions, dst is aligned to 16 bytes,
      not 32 bytes, so the data is now loaded unaligned.
      
      Change-Id: I2e0b9745543697efc56fefa32857ea10117af135
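
The alignment issue can be shown with a small portable check. A pointer that is 16-byte aligned is not necessarily 32-byte aligned, which matters because an aligned 256-bit load (for example `_mm256_load_si256`) requires a 32-byte-aligned address, whereas `_mm256_loadu_si256` does not. The helper name `is_aligned` is hypothetical, and the commit itself does not say which intrinsics were changed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if ptr is a multiple of alignment,
 * 0 otherwise. alignment must be a non-zero power of two. */
int is_aligned(const void *ptr, size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return ((uintptr_t)ptr & (alignment - 1)) == 0;
}
```

For a buffer that starts on a 32-byte boundary, `buf + 16` passes the 16-byte check but fails the 32-byte one, which is the situation an unaligned load tolerates.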
  7. 07 Aug, 2014 2 commits
    • Fix bug 804 · 839911fb
      levytamar82 authored
      A bug in the Microsoft compiler was found in the function
      vp9_filter_block1d16_v8_avx2, and a workaround was applied.
      The bug occurs when there are 4 consecutive maddubs + min + adds
      intrinsic instructions.
      
      Change-Id: I83499faeb70971e650e5663fd2490360ddb1a51b
    • Fix bug 806 · af10457e
      levytamar82 authored
      In the functions sad32x32x4d and sad64x64x4d, the source is aligned to
      16 bytes, not 32 bytes, so the load is now unaligned.
      
      Change-Id: I922fdba56d0936b5cf72e4503519f185645a168c
  8. 01 Aug, 2014 2 commits
  9. 31 Jul, 2014 2 commits
  10. 30 Jul, 2014 2 commits
  11. 29 Jul, 2014 1 commit
  12. 28 Jul, 2014 1 commit
    • Fix bug 805 · 4ba92dc5
      levytamar82 authored and James Zern committed
      Remove all the redundant dct functions (dct4x4, dct8x8) in avx2
      except dct32x32. Those functions were originally copied from dct_sse2.
      
      Change-Id: I742576fbf5175f3ac09f2076976a9247b259323e
  13. 16 Jul, 2014 1 commit
  14. 10 Jul, 2014 1 commit
    • Refactor vp9_diamond_search_sad function · 75cd5750
      Yunqing Wang authored
      Currently, vp9_diamond_search_sadx4() is only called when sse3 is
      enabled, which is improper since sse2 optimizations of the sdx4df
      functions are available. Changed to always use
      vp9_diamond_search_sadx4().
      
      Change-Id: I4b95d6b7a3c6c645783c373f0ba8d645ece24717
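
The "x4" search variants rely on a SAD routine that scores one source block against four candidate reference blocks in a single call. The following scalar sketch shows that calling pattern only; the name `sad_block_x4d_ref` and its signature are hypothetical and are not the library's API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical reference helper: computes the SAD of one source block
 * against four reference blocks that share a stride, writing one result
 * per reference into sad[0..3]. */
void sad_block_x4d_ref(const uint8_t *src, int src_stride,
                       const uint8_t *const ref[4], int ref_stride,
                       int width, int height, unsigned int sad[4]) {
  assert(width > 0 && height > 0);
  for (int i = 0; i < 4; ++i) sad[i] = 0;
  for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x) {
      /* Each source pixel is read once and reused for all four candidates. */
      const int s = src[y * src_stride + x];
      for (int i = 0; i < 4; ++i) {
        sad[i] += (unsigned int)abs(s - ref[i][y * ref_stride + x]);
      }
    }
  }
}
```

A motion search can then compare the four results and step toward the candidate with the smallest SAD.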
  15. 09 Jul, 2014 1 commit
    • Refactor refining_search_sad code · 30117a57
      Yunqing Wang authored
      There are sse2 optimizations of the sdx4df functions. Instead of calling
      vp9_refining_search_sadx4 only when sse3 is enabled, call it always.
      
      Change-Id: I24f93818f7d4209d1425039e0eb099ff9ff08fe9
  16. 08 Jul, 2014 1 commit
    • Re-design quantization process for 32x32 transform block · 9ad1b9fc
      Jingning Han authored
      This commit enables a new quantization process for 32x32 2D-DCT
      transform coefficient blocks. It improves the compression
      performance of speed 5 by 1.4%. The overall compression gain of
      speed 5 due to the new quantization scheme is 4.7%. It also includes
      the SSSE3 implementation of the 32x32 quantization process.
      
      Change-Id: I0855b124fd6462418683f783f5bcb44255c9993b
  17. 02 Jul, 2014 1 commit
    • Re-design quantization process · 9ac2f663
      Jingning Han authored
      This commit re-designs the quantization process for transform
      coefficient blocks of size 4x4 to 16x16. It improves compression
      performance for speed 7 by 3.85%. The SSSE3 version for the
      new quantization process is included.
      
      The average runtime of the 8x8 block quantization is reduced
      from 285 cycles to 255 cycles, i.e., over 10% faster.
      
      Change-Id: I61278aa02efc70599b962d3314671db5b0446a50
  18. 12 Jun, 2014 1 commit
    • Fast computation path for forward transform and quantization · ccba289f
      Jingning Han authored
      This commit enables a fast path computational flow for forward
      transformation. It checks the sse and variance of prediction
      residuals and decides if the quantized coefficients are all
      zero, dc only, or more. It then selects the corresponding coding
      path in the forward transformation and quantization stage.
      
      It is currently enabled in rtc coding mode. Will do it for rd
      coding mode next.
      
      In speed -6, the runtime for pedestrian_area 1080p at 1000 kbps
      goes down from 14234 ms to 13704 ms, i.e., about a 4% speed-up.
      Overall coding performance for rtc set is changed by -0.18%.
      
      Change-Id: I0452da1786d59bc8bcbe0a35fdae9f623d1d44e1
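
The decision described in this commit can be sketched as a three-way classification driven by the residual's sse and variance. Everything below is an assumption made for illustration: the enum, the function name `select_tx_path`, and both thresholds are hypothetical, and the commit does not state how the actual thresholds are derived. The sketch relies on the identity that `sse - var` is the energy carried by the residual mean (the DC term) when variance is defined as sse minus squared sum over pixel count.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical labels for the three coding paths named in the commit. */
typedef enum { TX_PATH_ALL_ZERO, TX_PATH_DC_ONLY, TX_PATH_FULL } TxPath;

/* Hypothetical classifier: ac_thresh and dc_thresh stand in for whatever
 * quantizer-dependent thresholds the encoder would use. */
TxPath select_tx_path(uint32_t sse, uint32_t var,
                      uint32_t dc_thresh, uint32_t ac_thresh) {
  assert(sse >= var);
  /* Enough AC energy: run the full transform and quantization. */
  if (var >= ac_thresh) return TX_PATH_FULL;
  /* AC assumed to quantize to zero; sse - var is the DC energy. */
  if (sse - var < dc_thresh) return TX_PATH_ALL_ZERO;
  return TX_PATH_DC_ONLY;
}
```

The cheaper paths skip work: an all-zero block needs no transform at all, and a DC-only block needs just the mean of the residual.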
  19. 10 Jun, 2014 5 commits
    • vp9_rtcd: correct avx2 references · 9f3a0dbb
      James Zern authored
      s/"\$avx2_x86inc"/"avx2"/
      
      avx2 code is all intrinsics and as a result doesn't rely on x86inc.asm
      
      Change-Id: I76ad39474d8a00658f3e43131830ef0f4f34772a
    • vp9_sub_pixel_*variance*: disable avx2 variants · 520cb3f3
      James Zern authored
      tests failing under Win32/Win64
      
      + variance_test: add missing avx2 functions (partially disabled)
      
      Change-Id: I6abc0657ea076379ab9ca65c12678b9ea199849d
    • vp9_sad*x4d: disable avx2 variants · d3ff009d
      James Zern authored
      tests failing under Win32/Win64
      
      + sad_test: add missing avx2 functions (disabled)
      
      Change-Id: I8224fba2b270f6039ab1877d71e1e512f0081856
    • vp9_f(dct|ht): disable avx2 variants · dd9f5029
      James Zern authored
      tests failing under Win32/Win64
      
      + dct16x16_test: add missing avx2 functions (partially disabled)
      
      exercises the forward transforms
      no idct/iht implementations, so the c-code is used
      
      Change-Id: I04f64a457fa0828a00f32b5c9fe4f55294f21f61
    • convolve: disable avx2 variants · 5704578f
      James Zern authored
      tests failing under Win32/Win64
      
      Change-Id: I5d49d11911bcda3a832b14efe5500d22597bedcf
  20. 02 Jun, 2014 1 commit
  21. 01 Jun, 2014 1 commit
  22. 29 May, 2014 1 commit
  23. 28 May, 2014 1 commit
    • Enable SSSE3 inverse 2D-DCT with 10 non-zero coeffs · 6d21cbd2
      Jingning Han authored
      This commit enables the SSSE3 implementation of the inverse 2D-DCT
      with only the first 10 coefficients non-zero. It reduces the runtime
      from 745 cycles for the SSE2 version to 538 cycles, i.e., a 27% speed-up.
      
      Change-Id: I18ba4128859b09c704a6ee361d69a86c09fe8dfe
  24. 27 May, 2014 1 commit
  25. 23 May, 2014 2 commits
    • Inverse 16x16 2D-DCT SSSE3 implementation · 48b08913
      Jingning Han authored
      This commit enables the SSSE3 implementation of full inverse 16x16
      2D-DCT. The unit runtime goes down from 1642 cycles to 1519 cycles,
      about a 7% speed-up.
      
      Change-Id: I14d2fdf9da1fb4ed1e5db7ce24f77a1bfc8ea90d
    • Remove Wextra warnings from vp9_sad.c · 91655042
      Deb Mukherjee authored
      As a side-effect, the sad unit tests for VP8 and VP9
      had to be separated.
      
      Change-Id: I068cc2391eed51e9b140ea6aba78338c5fec8d71
  26. 20 May, 2014 1 commit
  27. 15 May, 2014 1 commit
  28. 14 May, 2014 2 commits
    • AVX2 To VP9 Block Error Optimization · 1fbab853
      levytamar82 authored
      vp9_block_error_sse2 can only handle 16 bytes at a time, but
      the function needs to process a sequence of 32 bytes at a time,
      so each 16-byte half is handled in a different register.
      With the AVX2 optimization, the 32 bytes can be handled in one register
      instead of the two needed by the SSE2 version.
      vp9_block_error was optimized by 85%.
      The user level was optimized by 1.2%.
      
      Change-Id: Ia8fffe60e61eff7432a5fbd538757894f6c319fd
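
The commit message does not spell out what the block error function computes. The scalar sketch below assumes it returns the sum of squared differences between the original and dequantized coefficients and also reports the sum of squared original coefficients. The name `block_error_ref` and the 16-bit coefficient type are assumptions made for illustration; with 16-bit coefficients, the 32 bytes mentioned above correspond to 16 coefficients per step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference helper: returns sum((coeff - dqcoeff)^2) over n
 * coefficients and stores sum(coeff^2) in *ssz. */
int64_t block_error_ref(const int16_t *coeff, const int16_t *dqcoeff,
                        size_t n, int64_t *ssz) {
  int64_t error = 0;
  int64_t sqcoeff = 0;
  assert(ssz != 0);
  for (size_t i = 0; i < n; ++i) {
    const int diff = coeff[i] - dqcoeff[i];
    /* Widen before multiplying so the products cannot overflow int. */
    error += (int64_t)diff * diff;
    sqcoeff += (int64_t)coeff[i] * coeff[i];
  }
  *ssz = sqcoeff;
  return error;
}
```

A SIMD version would process a fixed number of coefficients per iteration and reduce the partial sums at the end, producing the same two totals.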
    • Remove Wextra warnings from vp9_sad.c · 7ab9a958
      Deb Mukherjee authored
      As a side-effect, the max_sad check is removed from the
      C-implementation of VP8, for consistency with VP9, and to
      ensure that the SAD tests common to VP8/VP9 pass.
      That will make the VP8 C implementation of sad a little slower,
      but given that it is rarely used in practice, the impact will be
      minimal.
      
      Change-Id: I7f43089fdea047fbf1862e40c21e4715c30f07ca