  1. 27 Jan, 2015 1 commit
  2. 15 Jan, 2015 1 commit
    • Add Neon intrinsics for vp9_avg_8x8_neon · 6e7e1cf3
      Frank Galligan authored
      On a Nexus 7, speeds -5, -6, -7, and -8 saw about a 1% performance
      increase at 480p and about a 1.5% increase at 720p.
      
      Tested on a Nexus 7, built with NDK r10d and GCC 4.9.
      
      Change-Id: Ibf17ebfd952a6aec941719bd8306df8ec4574bee
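      For context, the quantity being vectorized is the rounded mean of an
      8x8 pixel block. The following is a minimal scalar C sketch, assuming
      8-bit pixels addressed with a row stride; the name avg_8x8_ref is
      hypothetical and this is not the NEON code from the commit.

```c
#include <stdint.h>

/* Hypothetical scalar reference: rounded mean of an 8x8 block of 8-bit
 * pixels. `stride` is the distance in bytes between successive rows. */
unsigned int avg_8x8_ref(const uint8_t *src, int stride) {
  unsigned int sum = 0;
  for (int row = 0; row < 8; ++row, src += stride) {
    for (int col = 0; col < 8; ++col) sum += src[col];
  }
  /* 64 pixels: add half the divisor (32) before shifting to round. */
  return (sum + 32) >> 6;
}
```

      A NEON version computes the same value but loads and accumulates
      a full 8-pixel row per instruction instead of one pixel at a time.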
  3. 04 Dec, 2014 1 commit
    • vp9_ethread: the tile-based multi-threaded encoder · eba9c762
      Yunqing Wang authored
      Currently, VP9 supports column-tile encoding, which allows a frame
      to be encoded in multiple column tiles independently. The number of
      column tiles is set by the encoder option "--tile-columns". This
      provides a way to encode a frame in parallel.
      
      Building on a previous set of patches, this patch implements the
      tile-based multi-threaded encoder. Each thread processes one or more
      tiles.
      
      Usage:
      For HD clips:
      --tile-columns=2 --threads=1/2/3/4
      
      With 4 threads, tests showed a 2.3x-2.5x encoder speedup at
      good-quality speed 3 and a 2x speedup at realtime speed 5.
      
      Change-Id: Ied987f8f2618b1283a8643ad255e88341733c9d4
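      vpxenc takes --tile-columns as a log2 value, so --tile-columns=2
      requests 4 column tiles (the encoder can clamp this based on frame
      width), which is why 1 to 4 threads are suggested. The commit does not
      say how tiles are divided among threads; this sketch assumes a simple
      round-robin schedule in which thread t takes tiles t, t + n, t + 2n,
      and so on. The function names are hypothetical, and the code only
      computes the schedule; it does not start any threads.

```c
/* Hypothetical helper: number of column tiles for a log2 tile setting. */
int tile_cols_from_log2(int log2_tile_cols) { return 1 << log2_tile_cols; }

/* Hypothetical round-robin schedule: writes the indices of the tiles that
 * `thread_id` (out of `num_threads`) would encode into `out` and returns
 * how many there are. */
int tiles_for_thread(int thread_id, int num_threads, int tile_cols, int *out) {
  int count = 0;
  for (int t = thread_id; t < tile_cols; t += num_threads) out[count++] = t;
  return count;
}
```

      With 4 tiles and 3 threads, for example, thread 0 gets tiles 0 and 3
      while threads 1 and 2 get one tile each.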
  4. 02 Dec, 2014 1 commit
    • Added high bitdepth sse2 transform functions · 7e40a55e
      Peter de Rivaz authored
      Also removes some spurious changes in common/vp9_blockd.h that
      were introduced by a rebase issue between the nextgen and master branches.
      
      Change-Id: If359f0e9a71bca9c2ba685a87a355873536bb282
      (cherry picked from commit 005d80cd05269a299cd2f7ddbc3d4d8b791aebba)
      (cherry picked from commit 08d2f548007fd8d6fd41da8ef7fdb488b6485af3)
      (cherry picked from commit 4230c2306c194c058f56433a5275aa02a2e71d56)
  5. 24 Nov, 2014 1 commit
    • Refactored idct routines and headers · 3a8c43a4
      Peter de Rivaz authored
      This change is made in preparation for a
      subsequent patch that adds acceleration
      for the high-bitdepth transform functions.

      The high-bitdepth transform functions attempt
      to use 16/32-bit SSE instructions where possible,
      but fall back to the C implementations if
      potential overflow is detected. For this reason
      the DCT routines are made global so they can be
      called from the acceleration functions in the
      subsequent patch.
      
      Change-Id: Ia921f191bf6936ccba4f13e8461624b120c1f665
      (cherry picked from commit 454342d4e77dbb67f4a3c10f97a57a6fcb46d9a0)
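      The fallback strategy described above amounts to a range check
      followed by a dispatch: if every input is small enough that no
      intermediate can overflow 16 bits, a narrow (SIMD-friendly) path runs;
      otherwise the wide C routine runs. Everything in this sketch is
      hypothetical, including the names, the int32_t coefficient type, the
      toy 4-point add/subtract butterfly standing in for a real DCT stage,
      and the 2^14 bound used as the overflow test.

```c
#include <stdint.h>

/* Hypothetical wide path: 4-point add/subtract butterfly in 32-bit ints. */
static void butterfly4_wide(const int32_t *in, int32_t *out) {
  out[0] = in[0] + in[3];
  out[1] = in[1] + in[2];
  out[2] = in[1] - in[2];
  out[3] = in[0] - in[3];
}

/* Hypothetical narrow path standing in for a 16-bit SIMD routine. The
 * int16_t casts model 16-bit lanes; results outside the int16_t range
 * would not survive, which is what the range check below prevents. */
static void butterfly4_narrow(const int32_t *in, int32_t *out) {
  const int16_t a = (int16_t)in[0], b = (int16_t)in[1];
  const int16_t c = (int16_t)in[2], d = (int16_t)in[3];
  out[0] = (int16_t)(a + d);
  out[1] = (int16_t)(b + c);
  out[2] = (int16_t)(b - c);
  out[3] = (int16_t)(a - d);
}

/* Inputs with magnitude below 2^14 keep every sum and difference of two
 * inputs below 2^15, so the narrow path cannot overflow. */
static int fits_narrow(const int32_t *in, int n) {
  for (int i = 0; i < n; ++i)
    if (in[i] >= (1 << 14) || in[i] <= -(1 << 14)) return 0;
  return 1;
}

/* Returns 1 if the narrow path was used, 0 if it fell back to the wide
 * C path. Both paths produce the same output for in-range inputs. */
int butterfly4(const int32_t *in, int32_t *out) {
  if (fits_narrow(in, 4)) {
    butterfly4_narrow(in, out);
    return 1;
  }
  butterfly4_wide(in, out);
  return 0;
}
```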
  6. 20 Nov, 2014 2 commits
  7. 14 Nov, 2014 1 commit
  8. 12 Nov, 2014 1 commit
  9. 19 Oct, 2014 1 commit
    • SAD32xh and SAD64xh for AVX2 · 7045aec0
      levytamar82 authored
      All SAD functions that process 32 or more consecutive elements are
      optimized for AVX2:
      vp9_sad64x64
      vp9_sad64x32
      vp9_sad32x64
      vp9_sad32x32
      vp9_sad32x16
      vp9_sad64x64_avg
      vp9_sad64x32_avg
      vp9_sad32x64_avg
      vp9_sad32x32_avg
      vp9_sad32x16_avg
      The functions that appeared as hotspots are vp9_sad32x32 and vp9_sad64x64.
      vp9_sad32x32 was optimized by 68% and vp9_sad64x64 by 90%; together
      they gave an overall user-level gain of ~2.3%.
      
      Change-Id: Iccf86b375a2b54c5fbbe685902ead0c9a561b9fd
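      As a reference for what these kernels compute: SAD sums the absolute
      pixel differences over a w x h block, and each _avg variant first
      averages the reference with a second predictor (rounding up) before
      taking the SAD. This scalar sketch follows that general approach, but
      the names and the assumption that second_pred is stored contiguously
      with stride w are illustrative, not the AVX2 code.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar SAD over a w x h block. */
unsigned int sad_ref(const uint8_t *src, int src_stride,
                     const uint8_t *ref, int ref_stride, int w, int h) {
  unsigned int sad = 0;
  for (int y = 0; y < h; ++y) {
    for (int x = 0; x < w; ++x) sad += (unsigned int)abs(src[x] - ref[x]);
    src += src_stride;
    ref += ref_stride;
  }
  return sad;
}

/* Hypothetical _avg variant: SAD against the rounded average of `ref` and
 * `second_pred` (w x h, stride w). Supports blocks up to 64x64. */
unsigned int sad_avg_ref(const uint8_t *src, int src_stride,
                         const uint8_t *ref, int ref_stride,
                         const uint8_t *second_pred, int w, int h) {
  uint8_t avg[64 * 64];
  for (int y = 0; y < h; ++y)
    for (int x = 0; x < w; ++x)
      avg[y * w + x] =
          (uint8_t)((ref[y * ref_stride + x] + second_pred[y * w + x] + 1) >> 1);
  return sad_ref(src, src_stride, avg, w, w, h);
}
```

      The 32xh and 64xh shapes suit AVX2 because a 32-pixel row of 8-bit
      samples fills exactly one 256-bit register.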
  10. 14 Oct, 2014 1 commit
  11. 07 Oct, 2014 1 commit
    • experimental : partition using 1/8 x 1/8 image · 0ce51d82
      Jim Bankoski authored
      The concept:
      
      Source pixels are too noisy for variance calculations, and at low
      bitrates the reconstruction looks nothing like the source, so it is
      hard to get good partitionings from either. This skirts the issue by
      using a box-blurred, scaled-down version of the image for variance
      calculations. To compare against source_var_, keyframes were moved to
      be RD-based like source_var.
      
      Change-Id: Ie3babdbfadae324b7b5a76bea192893af27f0624
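      One way to read "a box blur scaled down version" is to replace each
      8x8 source block with its rounded mean, giving an image 1/8 the width
      and height, and to compute variance on that image instead of on the
      noisy source pixels. This sketch assumes dimensions that are multiples
      of 8 and uses hypothetical names; it is not the experiment's code.

```c
#include <stdint.h>

/* Hypothetical 1/8 x 1/8 downscale: each output pixel is the rounded mean
 * of one 8x8 source block (a box filter followed by decimation).
 * Assumes width and height are multiples of 8. */
void downscale_8x8_box(const uint8_t *src, int src_stride, int width,
                       int height, uint8_t *dst, int dst_stride) {
  for (int by = 0; by < height / 8; ++by) {
    for (int bx = 0; bx < width / 8; ++bx) {
      const uint8_t *blk = src + by * 8 * src_stride + bx * 8;
      unsigned int sum = 0;
      for (int y = 0; y < 8; ++y)
        for (int x = 0; x < 8; ++x) sum += blk[y * src_stride + x];
      dst[by * dst_stride + bx] = (uint8_t)((sum + 32) >> 6);
    }
  }
}

/* Integer variance of a w x h region: (sum of squares - sum^2 / n) / n. */
unsigned int region_variance(const uint8_t *p, int stride, int w, int h) {
  uint64_t sum = 0, sse = 0;
  const uint64_t n = (uint64_t)w * (uint64_t)h;
  for (int y = 0; y < h; ++y)
    for (int x = 0; x < w; ++x) {
      const unsigned int v = p[y * stride + x];
      sum += v;
      sse += v * v;
    }
  return (unsigned int)((sse - sum * sum / n) / n);
}
```

      Pixel-level noise that averages out within an 8x8 block no longer
      contributes to the variance of the downscaled image.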
  12. 06 Oct, 2014 1 commit
    • Add SSE2 code and unit test for VP9 denoiser. · 80465dae
      JackyChen authored
      This SSE2 code is based on the VP8 denoiser's SSE2 code. In VP8, the
      denoiser only uses 16x16 blocks, while in VP9 there are 13 different
      block sizes.

      Adding this SSE2 code improves encoder speed by around 20% (C code vs.
      SSE2 code); the gain varies between clips.

      The unit test for the VP9 denoiser confirms that the SSE2 code is
      bit-exact with the C code. The unit test covers all block sizes.
      
      Change-Id: Ic8d8ac26db4ea40a5f146b5678a065af07eaaa3d
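      A bit-exactness test of this kind typically runs the C reference and
      the optimized version on the same random input for every block size
      and requires identical output. The sketch below lists VP9's 13 block
      sizes and applies that pattern to a hypothetical block-sum kernel, with
      an unrolled function standing in for the SIMD version; it is not the
      denoiser or its actual unit test.

```c
#include <stdint.h>
#include <stdlib.h>

/* The 13 VP9 block sizes as {width, height}, 4x4 through 64x64. */
static const int kBlockSizes[13][2] = {
  {4, 4},   {4, 8},   {8, 4},   {8, 8},   {8, 16},  {16, 8}, {16, 16},
  {16, 32}, {32, 16}, {32, 32}, {32, 64}, {64, 32}, {64, 64}
};

typedef uint32_t (*BlockFn)(const uint8_t *p, int stride, int w, int h);

/* Hypothetical reference kernel: sum of all pixels in a w x h block. */
uint32_t block_sum_c(const uint8_t *p, int stride, int w, int h) {
  uint32_t sum = 0;
  for (int y = 0; y < h; ++y)
    for (int x = 0; x < w; ++x) sum += p[y * stride + x];
  return sum;
}

/* Hypothetical stand-in for a SIMD version: two pixels per step with two
 * partial sums (every VP9 block width is even). */
uint32_t block_sum_unrolled(const uint8_t *p, int stride, int w, int h) {
  uint32_t even = 0, odd = 0;
  for (int y = 0; y < h; ++y)
    for (int x = 0; x < w; x += 2) {
      even += p[y * stride + x];
      odd += p[y * stride + x + 1];
    }
  return even + odd;
}

/* Runs both kernels on the same random 64x64 buffer for every block size
 * and returns how many sizes produced different results. */
int count_mismatches(BlockFn ref, BlockFn opt, unsigned int seed) {
  static uint8_t buf[64 * 64];
  int mismatches = 0;
  srand(seed);
  for (int i = 0; i < 13; ++i) {
    const int w = kBlockSizes[i][0], h = kBlockSizes[i][1];
    for (int j = 0; j < 64 * 64; ++j) buf[j] = (uint8_t)(rand() & 0xff);
    if (ref(buf, 64, w, h) != opt(buf, 64, w, h)) ++mismatches;
  }
  return mismatches;
}
```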
  13. 06 Sep, 2014 1 commit
  14. 02 Sep, 2014 1 commit
    • Removing MMX SAD calculation code. · 318fc0c3
      Dmitry Kovalev authored
      Removed functions:
      * vp9_sad_16x16_mmx
      * vp9_sad_8x16_mmx
      * vp9_sad_16x8_mmx
      * vp9_sad_8x8_mmx
      * vp9_sad_4x4_mmx
      
      Change-Id: Ic5174b93b64d65d846f0c11e72cab149e9472bc3
  15. 29 Aug, 2014 1 commit
    • Removing variance MMX code. · 12cd6f42
      Dmitry Kovalev authored
      Removed functions:
      * vp9_mse16x16_mmx
      * vp9_get_mb_ss_mmx
      * vp9_get4x4var_mmx
      * vp9_get8x8var_mmx
      * vp9_variance4x4_mmx
      * vp9_variance8x8_mmx
      * vp9_variance16x16_mmx
      * vp9_variance16x8_mmx
      * vp9_variance8x16_mmx
      
      They all have SSE2 equivalents.
      
      Change-Id: I3796f2477c4f59b35b4828f46a300c16e62a2615
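      For reference, the variance functions compute the sum of squared
      differences between two blocks minus the squared sum of differences
      divided by the pixel count; the SSE2 replacements compute the same
      value. This scalar sketch uses a hypothetical name, and the sse
      out-parameter is an assumption modeled on the usual shape of these
      functions.

```c
#include <stdint.h>

/* Hypothetical scalar variance of the difference between two w x h blocks.
 * Writes the sum of squared differences to *sse and returns
 * sse - sum^2 / (w * h). */
unsigned int variance_ref(const uint8_t *a, int a_stride,
                          const uint8_t *b, int b_stride,
                          int w, int h, unsigned int *sse) {
  int64_t sum = 0;
  uint64_t sq = 0;
  for (int y = 0; y < h; ++y) {
    for (int x = 0; x < w; ++x) {
      const int diff = a[y * a_stride + x] - b[y * b_stride + x];
      sum += diff;
      sq += (uint64_t)(diff * diff);
    }
  }
  *sse = (unsigned int)sq;
  return (unsigned int)(sq - (uint64_t)(sum * sum) / (uint64_t)(w * h));
}
```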
  16. 31 Jul, 2014 1 commit
  17. 30 Jul, 2014 2 commits
  18. 29 Jul, 2014 1 commit
  19. 24 Jul, 2014 1 commit
  20. 16 Jul, 2014 1 commit
  21. 02 Jul, 2014 1 commit
    • Split vp9_rdopt into vp9_rdopt and vp9_rd. · 03c276ea
      Alex Converse authored
      vp9_rdopt is for making RD-optimal mode decisions. vp9_rd is for all
      other RD-related routines. Anything used outside of making an RD-optimal
      decision belongs in rd.
      
      Change-Id: I772a3073f7588bdf139f551fb9810b6864d8e64b
  22. 25 Jun, 2014 1 commit
  23. 12 Jun, 2014 1 commit
  24. 21 May, 2014 1 commit
    • Renames x86_64 specific asm files · e2722734
      Deb Mukherjee authored
      Renames all x86_64 specific assembly files to consistently
      end in _x86_64.asm. This will be useful for build systems to
      handle these files differently.
      All new 64-bit specific assembly files should use the new
      naming convention.
      
      Change-Id: I36c89584967c82ffc4088b1b5044ac15d2bb7536
  25. 14 May, 2014 1 commit
    • AVX2 To VP9 Block Error Optimization · 1fbab853
      levytamar82 authored
      vp9_block_error_sse2 can only handle 16 bytes at a time, but the
      function needs to process a sequence of 32 bytes at a time, so each
      16 bytes is handled in a different register. With AVX2 the 32 bytes
      can be handled in one register instead of the two used by SSE2.
      vp9_block_error was optimized by 85%, and overall user-level
      performance improved by 1.2%.
      
      Change-Id: Ia8fffe60e61eff7432a5fbd538757894f6c319fd
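      For reference, the block error is the sum of squared differences
      between the original and dequantized transform coefficients, and the
      function also reports the sum of squared original coefficients. This
      scalar sketch assumes 16-bit coefficients; the name and signature are
      hypothetical. At 16 bits per coefficient, a 32-byte AVX2 register holds
      16 coefficients versus 8 in a 16-byte SSE2 register, which is the width
      difference described above.

```c
#include <stdint.h>

/* Hypothetical scalar block error: returns sum((coeff - dqcoeff)^2) and
 * stores sum(coeff^2) in *ssz. 64-bit accumulators avoid overflow. */
int64_t block_error_ref(const int16_t *coeff, const int16_t *dqcoeff,
                        int block_size, int64_t *ssz) {
  int64_t error = 0, sqcoeff = 0;
  for (int i = 0; i < block_size; ++i) {
    const int diff = coeff[i] - dqcoeff[i];
    error += (int64_t)diff * diff;
    sqcoeff += (int64_t)coeff[i] * coeff[i];
  }
  *ssz = sqcoeff;
  return error;
}
```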
  26. 08 May, 2014 1 commit
  27. 07 May, 2014 1 commit
    • Revert "Add an MMX fwht4x4" · 33b1c457
      Paul Wilkins authored
      Includes changes that are not compatible with Visual Studio Windows
      builds. Among other things, stdint.h is not supported in VS.
      
      This reverts commit 89fbf3de.
      
      Change-Id: Ifa86d7df250578d1ada9b539c9ff12ed0c523cdd
  28. 05 May, 2014 1 commit
    • Add an MMX fwht4x4 · 89fbf3de
      Alex Converse authored
      7% faster when losslessly encoding a desktop clip at RT speed 4.
      
      Change-Id: I41627f5b737752616b6512bb91a36ec45995bf64
  29. 30 Apr, 2014 1 commit
  30. 29 Apr, 2014 1 commit
    • Enable SSSE3 implementation of 8x8 forward 2D-DCT · 1eaa3a76
      Jingning Han authored
      Assembly implementation of the SSSE3 8x8 forward 2D-DCT. The current
      version is turned on only for x86_64. The average unit runtime
      goes from 157 cycles down to 136 cycles, i.e., about 12.8% faster.
      This translates into about 1.5% speed-up for pedestrian_area 1080p
      at speed 2.
      
      Change-Id: I0f12435857e9425ed7ce12541344dfa16837f4f4
  31. 22 Apr, 2014 1 commit
    • Renaming "onyx" to "encoder". · ef003078
      Dmitry Kovalev authored
      Actual renames:
        vp9_onyx_if.c -> vp9_encoder.c
        vp9_onyx_int.h -> vp9_encoder.h
      
      Change-Id: I80532a80b118d0060518e6c6a0d640e3f411783c
  32. 17 Apr, 2014 1 commit
    • add a context tree structure to encoder · e890c257
      Jim Bankoski authored
      This patch sets up a quad-tree structure (pc_tree) for holding all of
      the pick_mode_context data we use at any square block size during
      encoding or picking modes. That includes contexts for the 2 horizontal
      and 2 vertical splits, one for none, and pointers to 4 sub pc_tree nodes
      corresponding to split. It also includes a pointer to the currently
      chosen partitioning.
      
      This replaces code that held an index for every level in the pick
      modes array, including: sb_index, mb_index, b_index, ab_index.
      
      These were used as stateful indexes that pointed to the current pick mode
      contexts at each level, stored in the following arrays:
      
      array ab4x4_context[][][],
      sb8x4_context[][][], sb4x8_context[][][], sb8x8_context[][][],
      sb8x16_context[][][], sb16x8_context[][][], mb_context[][], sb32x16[][],
      sb16x32[],  sb32_context[], sb32x64_context[], sb64x32_context[],
      sb64_context
      
      and the partitioning that had been stored in the following:
      b_partitioning, mb_partitioning, sb_partitioning, and sb64_partitioning.
      
      Prior to this patch, before doing an encode you had to set the
      appropriate index for your block size (a switch statement), update it
      (up to 3 lookups for the index array value), and then make your call
      into a recursive function, at which point you'd have to call
      get_context, which then had to do a switch statement based on the
      block size and then up to 3 lookups based on the block size to find
      the context to use.
      
      With the new code, the context for the block size is passed around
      directly, avoiding the extraneous switch statements and
      multidimensional array lookups listed above. At any level in the
      search, all of the contexts are local to the pc_tree you are working in.
      
      In addition, in most places, code that used to call sub-functions and
      then return if the block size was 4x4 and the index was > 0 no longer
      does so, preferring instead to call the right none function on the inside.

      Change-Id: I06e39318269d9af2ce37961b3f95e181b57f5ed9
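      A simplified sketch of the quad-tree layout described above. The type
      and field names are hypothetical stand-ins (the real pick-mode context
      holds far more state), and the tree is built only for square sizes
      from 64x64 down to 8x8.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the per-block mode-decision context. */
typedef struct {
  int best_mode;
} ModeContext;

typedef enum { PART_NONE, PART_HORZ, PART_VERT, PART_SPLIT } PartitionType;

/* One node per square block: contexts for the unsplit block, the two
 * horizontal and two vertical halves, plus four children for a split. */
typedef struct PcTree {
  int block_size;              /* width in pixels of this square block */
  PartitionType partitioning;  /* currently chosen partitioning */
  ModeContext none;
  ModeContext horizontal[2];
  ModeContext vertical[2];
  struct PcTree *split[4];     /* NULL at the smallest size */
} PcTree;

/* Frees a node and all of its descendants. */
void pc_tree_free(PcTree *node) {
  if (!node) return;
  for (int i = 0; i < 4; ++i) pc_tree_free(node->split[i]);
  free(node);
}

/* Recursively allocates the tree down to `min_size`; returns NULL on
 * allocation failure after releasing anything already allocated. */
PcTree *pc_tree_alloc(int block_size, int min_size) {
  PcTree *node = calloc(1, sizeof(*node));
  if (!node) return NULL;
  node->block_size = block_size;
  node->partitioning = PART_NONE;
  if (block_size > min_size) {
    for (int i = 0; i < 4; ++i) {
      node->split[i] = pc_tree_alloc(block_size / 2, min_size);
      if (!node->split[i]) {
        pc_tree_free(node);
        return NULL;
      }
    }
  }
  return node;
}

/* Counts the nodes in the tree. */
int pc_tree_count(const PcTree *node) {
  if (!node) return 0;
  int n = 1;
  for (int i = 0; i < 4; ++i) n += pc_tree_count(node->split[i]);
  return n;
}
```

      Because each recursion level receives its own node, a search function
      can take a PcTree pointer and use node->none or node->split[i] directly,
      instead of looking contexts up through stateful indexes.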
  33. 14 Apr, 2014 1 commit
    • Removing unused vp9_mcomp_x86.h file. · 2fc3a186
      Dmitry Kovalev authored
      We don't use declarations from this file. The real declarations
      (differently named) are in vp9_rtcd_defs.pl, e.g. vp9_full_search_sad.
      
      Change-Id: I73cbf064305710ba20747233cfdbe67366f069a0
  34. 08 Apr, 2014 1 commit
  35. 28 Mar, 2014 1 commit
  36. 27 Mar, 2014 1 commit
  37. 24 Mar, 2014 1 commit
  38. 21 Mar, 2014 1 commit