1. 19 Nov, 2013 1 commit
  2. 18 Nov, 2013 1 commit
  3. 16 Nov, 2013 1 commit
    • Do horizontal loopfiltering in parallel · 64f728ca
      Yunqing Wang authored
      This patch follows the "Rewrite filter_selectively_horiz for parallel
      loopfiltering" commit and adds an x86 SSE2 optimization that filters
      16 pixels in parallel. It also corrects the declaration of aligned
      arrays. For the 8-pixel-in-parallel case, it improves the calculation
      of the masks and filters. Threshold loading is updated because the
      thresholds were already duplicated. The neon C functions now call the
      neon loopfilters twice.
      
      Tests on the tulip clip showed a ~1.5% decoder speed gain.
      
      Change-Id: Id02638626ac27a4b0e0b09d71792a24c0499bd35
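The "call the 8-wide filter twice" idea from the message above can be sketched in plain C. This is a minimal illustration only: `toy_lpf_horizontal_8` and `toy_lpf_horizontal_16` are hypothetical names, and the smoothing rule is a toy stand-in, not the VP9 loop filter or the actual neon code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy stand-in for an 8-pixel-wide filter across a horizontal edge.
 * s points at the first pixel below the edge; s - pitch is the row above.
 * NOT the VP9 filter: it only pulls the two edge pixels toward each other
 * when their difference is within thresh. */
static void toy_lpf_horizontal_8(uint8_t *s, int pitch, int thresh) {
  for (int i = 0; i < 8; ++i) {
    const int p0 = s[i - pitch]; /* pixel above the edge */
    const int q0 = s[i];         /* pixel below the edge */
    if (abs(q0 - p0) <= thresh) {
      const int d = (q0 - p0) / 4; /* small step toward the other side */
      s[i - pitch] = (uint8_t)(p0 + d);
      s[i] = (uint8_t)(q0 - d);
    }
  }
}

/* 16-pixel-wide wrapper: run the 8-wide filter on each half of the edge.
 * Both halves share the same threshold. */
void toy_lpf_horizontal_16(uint8_t *s, int pitch, int thresh) {
  assert(pitch >= 16);
  toy_lpf_horizontal_8(s, pitch, thresh);
  toy_lpf_horizontal_8(s + 8, pitch, thresh);
}
```

A caller would pass a pointer to the row just below the edge, for example `toy_lpf_horizontal_16(buf + pitch, pitch, 10)` for a two-row buffer. A SIMD version would presumably process all 16 columns in one register instead of looping.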
  4. 13 Nov, 2013 2 commits
  5. 05 Nov, 2013 1 commit
  6. 31 Oct, 2013 1 commit
    • mb_lpf_horizontal_edge AVX2 optimization · 54f92056
      Tamar Levy authored
      This CL contains two AVX2 optimized loop filter functions,
      mb_lpf_horizontal_edge_w_avx2_8 and mb_lpf_horizontal_edge_w_avx2_16.
      
      Change-Id: I604e4fe6e99752b7800c2ea98721d97f7e0b931b
  7. 24 Oct, 2013 1 commit
  8. 10 Oct, 2013 1 commit
    • SSE2 8-tap sub-pixel filter optimization · 3fb728c7
      Yunqing Wang authored
      To ensure fast encoding/decoding on devices without ssse3 support,
      SSE2 optimization of the sub-pixel filters was done. A test using a
      1080p clip showed decoder speeds of ~70fps with ssse3 filters, ~60fps
      with sse2 filters, and ~15fps with C filters.
      
      Change-Id: Ie2088f87d83a889fba80a613e4d0e287aadd785c
  9. 09 Oct, 2013 1 commit
  10. 07 Oct, 2013 1 commit
  11. 02 Oct, 2013 1 commit
  12. 27 Sep, 2013 1 commit
    • Properly save neon registers. · b1b4ba1b
      Christian Duvivier authored
      Replace the current code, which corrupts the stack, with a
      duplicate of the vp8 code that saves and restores neon
      registers.
      
      Change-Id: Ibb0220b9aa985d10533befa0a455ebce57a2891a
  13. 26 Sep, 2013 1 commit
  14. 25 Sep, 2013 1 commit
  15. 13 Sep, 2013 1 commit
    • Revert "Improved 8t filters" · 2d587619
      James Zern authored
      This is incompatible with most toolchains other than gcc.
      
      Revert "Deleted #include <inttypes.h>"
      
      This reverts commit 4d018be9.
      
      This reverts commit d22a504d.
      
      Change-Id: I1751dc6831f4395ee064e6748281418e967e1dcf
  16. 12 Sep, 2013 1 commit
  17. 11 Sep, 2013 2 commits
    • First draft of vp9_short_idct32x32_add_neon. · 6a501462
      Christian Duvivier authored
      Lots of TODOs, which will be taken care of in upcoming changes. As is,
      about 6x faster than the C version.
      
      Change-Id: Ie2557b72fd2d8edca376dbf400a4d173aa5e63e0
    • Improved 8t filters · d22a504d
      Scott LaVarnway authored
      Reformatted version of a patch submitted by Erik/Tamar
      from Intel.  For the test clips used, the decoder
      performance improved by ~2%.
      
      Change-Id: Ifbc37ac6311bca9ff1cfefe3f2e9b7f13a4a511b
  18. 04 Sep, 2013 2 commits
  19. 27 Aug, 2013 1 commit
  20. 26 Aug, 2013 2 commits
  21. 14 Aug, 2013 3 commits
  22. 09 Aug, 2013 1 commit
  23. 07 Aug, 2013 1 commit
  24. 06 Aug, 2013 2 commits
  25. 05 Aug, 2013 1 commit
    • Begin to restrict x86inc.asm usage · c3809f3d
      Jim Bankoski authored
      Chromium does not support 32bit builds for Mac which use x86inc.asm.
      Make the files which include it work when the build is 64bit or PIC
      is not enabled, starting with vp9_copy_sse2.asm.
      
      Consolidate these targets in vp9_rtcd_defs.sh
      
      Change-Id: If18f0b957a611efd085a3ee7d245cf1eb91e8248
  26. 02 Aug, 2013 1 commit
  27. 18 Jul, 2013 1 commit
  28. 17 Jul, 2013 1 commit
    • vp9_convolve8_neon placeholder · 59dc4e9c
      Johann authored
      Call the individually optimized horizontal and vertical functions. This
      implementation abuses the temp buffer.
      
      This will be replaced with a custom optimized function.
      
      Over 2x speedup.
      
      Change-Id: I5b908d2a73d264e9810d6022bbff73207a3055dd
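The approach described above, running a horizontal pass into a temporary buffer and then a vertical pass over it, can be sketched in portable C. This is a minimal sketch of a generic separable 8-tap convolution, not the neon placeholder itself: the function names, the 64x64 block limit, and the 7-bit filter precision (taps summing to 128) are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TAPS 8         /* 8-tap filter, as in the commit title */
#define FILTER_BITS 7  /* assumption: taps sum to 1 << 7 = 128 */
#define MAX_DIM 64     /* assumption: blocks are at most 64x64 */

static uint8_t clip_pixel(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Horizontal pass: each output is a weighted sum of 8 pixels in its row,
 * starting 3 pixels to the left of the output position. */
static void convolve_horiz(const uint8_t *src, ptrdiff_t src_stride,
                           uint8_t *dst, ptrdiff_t dst_stride,
                           const int16_t *filter, int w, int h) {
  for (int y = 0; y < h; ++y) {
    for (int x = 0; x < w; ++x) {
      int sum = 0;
      for (int k = 0; k < TAPS; ++k)
        sum += src[y * src_stride + x + k - 3] * filter[k];
      dst[y * dst_stride + x] = clip_pixel((sum + 64) >> FILTER_BITS);
    }
  }
}

/* Vertical pass: the same sum, taken down a column instead of along a row. */
static void convolve_vert(const uint8_t *src, ptrdiff_t src_stride,
                          uint8_t *dst, ptrdiff_t dst_stride,
                          const int16_t *filter, int w, int h) {
  for (int y = 0; y < h; ++y) {
    for (int x = 0; x < w; ++x) {
      int sum = 0;
      for (int k = 0; k < TAPS; ++k)
        sum += src[(y + k - 3) * src_stride + x] * filter[k];
      dst[y * dst_stride + x] = clip_pixel((sum + 64) >> FILTER_BITS);
    }
  }
}

/* Two-pass 2-D convolution through an intermediate buffer. */
void convolve8_two_pass(const uint8_t *src, ptrdiff_t src_stride,
                        uint8_t *dst, ptrdiff_t dst_stride,
                        const int16_t *filter_x, const int16_t *filter_y,
                        int w, int h) {
  uint8_t temp[MAX_DIM * (MAX_DIM + TAPS - 1)];
  assert(w > 0 && h > 0 && w <= MAX_DIM && h <= MAX_DIM);
  /* Filter h + 7 rows, starting 3 rows above the block, so that the
   * vertical pass has the rows it needs above and below. */
  convolve_horiz(src - 3 * src_stride, src_stride, temp, MAX_DIM,
                 filter_x, w, h + TAPS - 1);
  /* Row 3 of temp corresponds to the first row of the block. */
  convolve_vert(temp + 3 * MAX_DIM, MAX_DIM, dst, dst_stride,
                filter_y, w, h);
}
```

The caller must guarantee at least 3 readable pixels to the left of and above the block, and 4 to the right of and below it. With the identity filter `{0, 0, 0, 128, 0, 0, 0, 0}` in both directions, the output block equals the input block, which makes a convenient sanity check.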
  29. 16 Jul, 2013 3 commits
  30. 12 Jul, 2013 1 commit
    • vp9_convolve8_[horiz|vert]_avg · a15bebfc
      Johann authored
      Super basic conversion from the other implementations. Any changes to
      one should be trivial to copy over to keep them in sync.
      
      Change-Id: I1720b4128e0aba4b2779e3761f6494f8a09d3ea8
  31. 11 Jul, 2013 1 commit
    • convolve8 optimizations for neon · 158c80cb
      Johann authored
      Independent horizontal and vertical implementations.
      
      Requires that blocks be built from 4x4 and [xy]_step_q4 == 16
      
      6-10% improvement. CIF improved the least.
      
      Change-Id: I137f5ceae4440adc0960bf88e4453e55a618bcda
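The two requirements stated above (blocks built from 4x4, and `[xy]_step_q4 == 16`) can be read as a precondition check before taking an optimized path. The sketch below assumes that the `_q4` suffix means steps are in 1/16-pixel units, so a step of 16 is one full pixel per output sample (no scaling). `choose_convolve_path` and the enum are hypothetical names, and whether the real code falls back to C or simply asserts is not stated in the commit.

```c
#include <assert.h>

typedef enum { CONVOLVE_GENERIC_C, CONVOLVE_FAST_PATH } convolve_path;

/* Hypothetical helper: decide whether a block meets the stated
 * requirements for the optimized convolve8 path. */
convolve_path choose_convolve_path(int x_step_q4, int y_step_q4,
                                   int w, int h) {
  assert(w > 0 && h > 0);
  /* Step of 16 in 1/16-pel units: source and destination are unscaled. */
  const int unscaled = (x_step_q4 == 16) && (y_step_q4 == 16);
  /* "Built from 4x4": both dimensions are multiples of 4. */
  const int built_from_4x4 = (w % 4 == 0) && (h % 4 == 0);
  return (unscaled && built_from_4x4) ? CONVOLVE_FAST_PATH
                                      : CONVOLVE_GENERIC_C;
}
```

For example, `choose_convolve_path(16, 16, 8, 8)` selects the fast path, while a scaled reference frame with `x_step_q4 == 32` would take the generic one.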