  1. 09 Oct, 2014 1 commit
  2. 18 Sep, 2014 1 commit
  3. 11 Sep, 2014 1 commit
  4. 23 May, 2014 2 commits
    • Fix decoder mismatch in sub-pixel SSSE3 intrinsic filters · c5443fc8
      Yunqing Wang authored
      In 8-tap filtering, the products must be accumulated in the
      correct order, with the largest product added last, to guarantee
      that the intermediate results fit in 16 bits. This patch fixes
      the problem using the method from commit "Correct ssse3
      8/16-pixel wide sub-pixel filter calculation".
      
      Change-Id: I79d0ad60c057b15011ece84cda9648eee0809423
      c5443fc8
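The ordering issue described in this commit can be illustrated with a scalar model of saturating 16-bit arithmetic. This is only a sketch under stated assumptions: the helper names (`sat16`, `adds16`, `pair`, `filter_in_order`, `filter_largest_last`) are hypothetical, the kernel used in the usage note is an illustrative smooth-style kernel whose taps sum to 128, and the code is not the actual intrinsic implementation. It assumes that tap pairs are multiplied and summed with saturation, that the four pair sums are combined with saturating adds, and that the result is rounded and shifted by 7.

```c
#include <stdint.h>

/* Clamp a 32-bit value to the int16_t range, as a saturating SIMD op would. */
static int16_t sat16(int32_t v) {
  if (v > INT16_MAX) return INT16_MAX;
  if (v < INT16_MIN) return INT16_MIN;
  return (int16_t)v;
}

/* Saturating 16-bit add (scalar stand-in for a packed saturating add). */
static int16_t adds16(int16_t a, int16_t b) {
  return sat16((int32_t)a + (int32_t)b);
}

/* One tap pair: unsigned pixel times signed coefficient, summed, saturated. */
static int16_t pair(const uint8_t *px, const int8_t *k) {
  return sat16((int32_t)px[0] * k[0] + (int32_t)px[1] * k[1]);
}

/* Round (add 64), shift right by 7, and clamp to the 8-bit pixel range. */
static uint8_t round_clip(int16_t sum) {
  const int16_t v = adds16(sum, 64);
  if (v < 0) return 0;
  return (uint8_t)(v >> 7); /* v <= 32767, so v >> 7 <= 255 */
}

/* Naive order: accumulate the four pair sums left to right. */
static uint8_t filter_in_order(const uint8_t px[8], const int8_t k[8]) {
  int16_t acc = pair(px, k);
  acc = adds16(acc, pair(px + 2, k + 2));
  acc = adds16(acc, pair(px + 4, k + 4)); /* may saturate too early */
  acc = adds16(acc, pair(px + 6, k + 6));
  return round_clip(acc);
}

/* Safer order: outer pairs first, then the smaller middle pair,
 * and the largest middle pair last. */
static uint8_t filter_largest_last(const uint8_t px[8], const int8_t k[8]) {
  const int16_t p0 = pair(px, k);
  const int16_t p1 = pair(px + 2, k + 2);
  const int16_t p2 = pair(px + 4, k + 4);
  const int16_t p3 = pair(px + 6, k + 6);
  const int16_t lo = p1 < p2 ? p1 : p2;
  const int16_t hi = p1 < p2 ? p2 : p1;
  int16_t acc = adds16(p0, p3);
  acc = adds16(acc, lo);
  acc = adds16(acc, hi);
  return round_clip(acc);
}
```

With all eight pixels set to 255 and the illustrative kernel {-3, -1, 32, 64, 38, 1, -3, 0}, the exact result is 255. In this model `filter_in_order` saturates before the last negative pair is added and returns 250, while `filter_largest_last` returns 255.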
    • change to use assembly version of ssse3 filter code · 7a0c9b82
      Yaowu Xu authored
      Mismatches were found between the intrinsic version and the C-only
      version. This commit temporarily reverts to the matching assembly
      version to allow further investigation.
      
      Change-Id: I08436c47d4888b562c0eac8e8856d90a831442df
      7a0c9b82
  5. 14 Feb, 2014 1 commit
    • SSSE3 convolution optimization · 3068d7d9
      levytamar82 authored
      Optimizing all SSSE3 assembly for convolution:
      1. vp9_filter_block1d4_h8_sse2
      2. vp9_filter_block1d8_h8_sse2
      3. vp9_filter_block1d16_h8_sse2
      4. vp9_filter_block1d4_v8_sse2
      5. vp9_filter_block1d8_v8_sse2
      6. vp9_filter_block1d16_v8_sse2
      The optimizations include:
      -processing 2x8 elements in one 128-bit register instead of processing
      8 elements in one 128-bit register.
      -removing unnecessary loads.
      This optimization gives a user-level gain of 2.4% for 480p input
      and 1.6% for 720p.
      This optimization is done only for 64-bit.
      
      Change-Id: Ic07fce2f9360329b4f2d956efda1480ae958766b
      3068d7d9
  6. 13 Feb, 2014 1 commit
    • AVX2 Convolve Optimization · 876c72a0
      levytamar82 authored
      Two convolve functions were optimized for AVX2:
      1. vp9_filter_block1d16_h8
      2. vp9_filter_block1d16_v8
      vp9_filter_block1d16_v8 was optimized for AVX2 by reducing the number of
      loop strides by half: two strides are processed in parallel.
      vp9_filter_block1d16_v8 was also optimized in the same way; in addition,
      some of the loads are now done outside of the loop, which prevents
      redundant loads.
      This optimization gives a 43% function-level gain and a 1.3% user-level
      gain. It can now be compiled on Windows.
      
      Change-Id: I2714124cfb0c14a77d7a0ce126a20db92ffbf92c
      876c72a0
  7. 10 Feb, 2014 1 commit
  8. 04 Feb, 2014 1 commit
  9. 03 Feb, 2014 1 commit
    • Optimize bilinear sub-pixel filters in sse2 · 2488cb34
      Yunqing Wang authored
      Using bilinear filters could speed up the codec in real-time mode.
      This patch added sse2 optimizations of bilinear filters that
      operate on different-sized blocks.
      
      Tests showed that the real-time encoder was sped up by 3%.
      
      Change-Id: If99a7ee4385fcc225c3ee7445d962d5752e57c3f
      2488cb34
  10. 29 Jan, 2014 1 commit
  11. 17 Jan, 2014 1 commit
  12. 13 Jan, 2014 1 commit
  13. 10 Jan, 2014 1 commit
  14. 09 Jan, 2014 1 commit
    • SSSE3 convolution optimization · 511d218c
      levytamar82 authored
      Optimizing all SSSE3 assembly for convolution:
      1. vp9_filter_block1d4_h8_sse2
      2. vp9_filter_block1d8_h8_sse2
      3. vp9_filter_block1d16_h8_sse2
      4. vp9_filter_block1d4_v8_sse2
      5. vp9_filter_block1d8_v8_sse2
      6. vp9_filter_block1d16_v8_sse2
      The optimizations include:
      -processing 2x8 elements in one 128-bit register instead of processing
      8 elements in one 128-bit register.
      -removing unnecessary loads.
      This optimization gives a user-level gain of 2.4% for 480p input
      and 1.6% for 720p.
      This optimization is done only for 64-bit.
      
      Change-Id: Icb586dc0c938b56699864fcee6c52fd43b36b969
      511d218c
  15. 19 Dec, 2013 1 commit
    • Code clean up · 09faf559
      Yunqing Wang authored
      Removed unused filter coefficients.
      
      Change-Id: Ib395a51305e23ff41ab69c1808d56946d25961cd
      09faf559
  16. 15 Oct, 2013 1 commit
  17. 10 Oct, 2013 1 commit
    • SSE2 8-tap sub-pixel filter optimization · 3fb728c7
      Yunqing Wang authored
      To ensure fast encoding/decoding on devices without ssse3 support,
      SSE2 optimization of sub-pixel filters was done. A test using a 1080p
      clip showed decoder speeds of ~70fps with ssse3 filters, ~60fps
      with sse2 filters, and ~15fps with C filters.
      
      Change-Id: Ie2088f87d83a889fba80a613e4d0e287aadd785c
      3fb728c7
  18. 13 Sep, 2013 1 commit
    • Revert "Improved 8t filters" · 2d587619
      James Zern authored
      This is incompatible with most toolchains other than gcc.
      
      Revert "Deleted #include <inttypes.h>"
      
      This reverts commit 4d018be9.
      
      This reverts commit d22a504d.
      
      Change-Id: I1751dc6831f4395ee064e6748281418e967e1dcf
      2d587619
  19. 11 Sep, 2013 1 commit
    • Improved 8t filters · d22a504d
      Scott LaVarnway authored
      Reformatted version of a patch submitted by Erik/Tamar
      from Intel.  For the test clips used, the decoder
      performance improved by ~2%.
      
      Change-Id: Ifbc37ac6311bca9ff1cfefe3f2e9b7f13a4a511b
      d22a504d
  20. 11 Jul, 2013 2 commits
  21. 19 Apr, 2013 1 commit
  22. 18 Apr, 2013 1 commit
    • convolve: support larger blocks, fix asm saturation bug · a9ebbcc3
      John Koleszar authored
      Updates the common convolution code to support blocks larger than
      16x16, and rectangular blocks. This uncovered a bug in the SSSE3
      filtering routines due to the order of application of saturation.
      This commit fixes that bug, adjusts the unit test to bias its
      random values towards the extremes, and adds a test to ensure that
      all filters conform to the expected pairwise addition structure.
      
      Change-Id: I81f69668b1de0de5a8ed43f0643845641525c8f0
      a9ebbcc3
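The commit message does not spell out what the "expected pairwise addition structure" test checks. The following is a hypothetical sketch of one such check, assuming 8-tap kernels whose taps sum to 128 and an accumulation order of outer pairs first, then the two middle pairs. The function name and the exact conditions are assumptions, not the actual unit test. The check bounds the tap sums at each accumulation step, which is a necessary condition for flat (constant-valued) 8-bit input to stay within 16 bits. It is not a proof that arbitrary input cannot overflow.

```c
#include <stdint.h>

/* Hypothetical check: returns 1 if the kernel's tap pairs and their
 * running sums (in the assumed accumulation order p0, p3, p1, p2)
 * never exceed 128, and the taps sum to exactly 128. Returns 0 otherwise. */
static int kernel_has_pairwise_structure(const int16_t k[8]) {
  const int p0 = k[0] + k[1];
  const int p1 = k[2] + k[3];
  const int p2 = k[4] + k[5];
  const int p3 = k[6] + k[7];

  /* Each pair alone must fit: 255 * 128 = 32640 <= 32767. */
  if (p0 > 128 || p1 > 128 || p2 > 128 || p3 > 128) return 0;

  /* Running sums in the assumed accumulation order must also fit. */
  if (p0 + p3 > 128) return 0;
  if (p0 + p3 + p1 > 128) return 0;

  /* The kernel must be normalized to 128 (7 fractional bits). */
  return p0 + p1 + p2 + p3 == 128;
}
```

A kernel such as {-3, -1, 32, 64, 38, 1, -3, 0} passes this check, while {0, 0, 100, 100, -72, 0, 0, 0} fails because its second pair sums to 200.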
  23. 13 Feb, 2013 1 commit
    • WIP: ssse3 version of convolve avg functions · 30f866f4
      Scott LaVarnway authored
      Initial ssse3 convolve avg functions; this is one step closer
      to using x86inc.asm.  The decoder performance improved by 8% for
      the test clip used.  This should be revisited later to see if
      averaging outside the loop is better than having many similar
      filter functions.
      
      Change-Id: Ice3fafb423b02710b0448ffca18b296bcac649e9
      30f866f4
  24. 09 Feb, 2013 1 commit
    • Bug fix: ssse3 version of subpixel did not match C code · eda30b41
      Scott LaVarnway authored
      A 16-bit overflow condition occurs when using the EIGHTTAP_SMOOTH filters
      (vp9_sub_pel_filters_8lp). Changed the order of the adds to fix this problem.
      Also added ssse3 support for 4x4 subpixel filtering.
      
      Change-Id: I475eaadae920794c2de5e01e9735c059a856518e
      eda30b41
  25. 08 Feb, 2013 1 commit
    • Restore SSSE3 subpixel filters in new convolve framework · 29d47ac8
      John Koleszar authored
      This commit adds the 8 tap SSSE3 subpixel filters back into the code
      underneath the convolve API. The C code is still called for 4x4
      blocks, as well as compound prediction modes. This restores the
      encode performance to be within about 8% of the baseline.
      
      Change-Id: Ife0d81477075ae33c05b53c65003951efdc8b09c
      29d47ac8
  26. 05 Feb, 2013 1 commit
    • Convert subpixel filters to use convolve framework · 7a07eea1
      John Koleszar authored
      Update the code to call the new convolution functions to do subpixel
      prediction rather than the existing functions. Remove the old C and
      assembly code, since it is unused. This causes a 50% performance
      reduction on the decoder, but that will be resolved when the asm for
      the new functions is available.
      
      There is no consensus for whether 6-tap or 2-tap predictors will be
      supported in the final codec, so these filters are implemented in
      terms of the 8-tap code, so that quality testing of these modes
      can continue. Implementing the lower complexity algorithms is a
      simple exercise, should it be necessary.
      
      This code produces slightly better results in the EIGHTTAP_SMOOTH
      case, since the filter is now applied in only one direction when
      the subpel motion is only in one direction. Like the previous code,
      the filtering is skipped entirely on full-pel MVs. This combination
      seems to give the best quality gains, but this may be indicative of a
      bug in the encoder's filter selection, since the encoder could
      achieve the result of skipping the filtering on full-pel by selecting
      one of the other filters. This should be revisited.
      
      Quality gains on derf are positive on almost all clips. The only clip
      that seemed to be hurt at all datarates was football
      (-0.115% PSNR average, -0.587% min). Overall averages are 0.375% PSNR,
      0.347% SSIM.
      
      Change-Id: I7d469716091b1d89b4b08adde5863999319d69ff
      7a07eea1
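Two ideas from this commit message can be sketched in a few lines: expressing a shorter filter in terms of the 8-tap code, and choosing which filter passes to run based on the fractional part of the motion vector. This is a minimal sketch under assumptions that the commit does not state: 1/16-pel motion vector precision, 7-bit filter normalization (taps sum to 128), and hypothetical names (`bilinear_as_8tap`, `convolve8_pixel`, `select_pass`).

```c
#include <stdint.h>

/* Assumed: 4 fractional MV bits, i.e. 1/16-pel precision. */
enum { SUBPEL_MASK = 15 };

enum pass { PASS_COPY, PASS_HORIZ, PASS_VERT, PASS_BOTH };

/* Hypothetical helper: embed a 2-tap bilinear filter for a 1/16-pel
 * fraction (0..15) in an 8-tap kernel. Unused taps are zero, so the
 * generic 8-tap code computes the 2-tap result. */
static void bilinear_as_8tap(int frac, int16_t k[8]) {
  for (int i = 0; i < 8; ++i) k[i] = 0;
  k[3] = (int16_t)(128 - 8 * frac);
  k[4] = (int16_t)(8 * frac);
}

/* Generic 8-tap filter for one output pixel; px points at the first of
 * the 8 input samples. Rounds, shifts by 7, and clamps to 0..255. */
static uint8_t convolve8_pixel(const uint8_t px[8], const int16_t k[8]) {
  int sum = 64; /* rounding offset */
  for (int i = 0; i < 8; ++i) sum += px[i] * k[i];
  if (sum < 0) return 0;
  sum >>= 7;
  return (uint8_t)(sum > 255 ? 255 : sum);
}

/* Pick the passes to run: filter only along axes with sub-pel motion,
 * and skip filtering entirely for full-pel motion vectors. */
static enum pass select_pass(int mv_x_q4, int mv_y_q4) {
  const int fx = mv_x_q4 & SUBPEL_MASK;
  const int fy = mv_y_q4 & SUBPEL_MASK;
  if (fx == 0 && fy == 0) return PASS_COPY;
  if (fy == 0) return PASS_HORIZ;
  if (fx == 0) return PASS_VERT;
  return PASS_BOTH;
}
```

For example, a half-pel bilinear kernel (`frac` of 8) applied to samples whose two center values are 100 and 200 yields 150, and a motion vector of (16, 0) in 1/16-pel units selects `PASS_COPY`.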
  27. 26 Dec, 2012 1 commit
  28. 21 Dec, 2012 2 commits
  29. 03 Dec, 2012 1 commit
  30. 29 Nov, 2012 1 commit
  31. 27 Nov, 2012 1 commit
    • Add vp9_ prefix to all vp9 files · fcccbcbb
      John Koleszar authored
      Support for gyp which doesn't support multiple objects in the same
      static library having the same basename.
      
      Change-Id: Ib947eefbaf68f8b177a796d23f875ccdfa6bc9dc
      fcccbcbb
  32. 01 Nov, 2012 3 commits
  33. 31 Oct, 2012 2 commits
  34. 26 Oct, 2012 1 commit
    • Faster 8t filtering · ce811f87
      Scott LaVarnway authored
      Quickly modified the ssse3 sixtap filters to support eight taps.  For the test
      clip used, a 23+% boost in decoder performance was seen.  We can
      revisit later and improve further.
      
      Change-Id: I5f59860459e80d6fa23e6cc0fd91296a969f5240
      ce811f87