1. 16 Jun, 2016 1 commit
    • Use correct size load in vpx_avg_4x4_sse2. · ffa91733
      Geza Lore authored
      The old version used 64 bit loads, and then ignored the top half
      of the result. This can cause asan failures if we read past the end
      of a buffer. Switched to using 32 bit loads instead.
      
      Change-Id: I57da127a26f869fb4b4f700b55408f6dc2fbbc1a
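The load-width issue described in this commit can be illustrated with a small scalar sketch. This is not the SSE2 code from the commit: `avg_4x4_sketch` is a hypothetical name, and the rounding `(sum + 8) >> 4` is an assumption about how a 4x4 average is formed. What it shows is that each row is read with exactly 4 bytes, so the last row of a tightly sized buffer is never over-read.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar stand-in for a 4x4 block average. Each row is read
 * with a single 4-byte copy, so no byte beyond the row's fourth pixel is
 * ever touched (an 8-byte read here would run past a 16-byte buffer). */
static unsigned int avg_4x4_sketch(const uint8_t *src, int stride) {
  unsigned int sum = 0;
  assert(src != NULL);
  for (int row = 0; row < 4; ++row) {
    uint8_t px[4];
    memcpy(px, src + row * stride, sizeof(px)); /* 32 bits, not 64 */
    sum += px[0] + px[1] + px[2] + px[3];
  }
  /* Assumed rounding: add half the divisor, then divide by 16 pixels. */
  return (sum + 8) >> 4;
}
```

With a 16-byte buffer and a stride of 4, the final row ends exactly at the end of the allocation, which is the situation in which a wider load would trip asan.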
  2. 23 May, 2016 1 commit
    • Add optimized vpx_blend_mask6 · a661bc87
      Geza Lore authored
      This is to replace vp10/common/reconinter.c:build_masked_compound.
      Functionality is equivalent, but the interface is slightly more
      generic.
      
      Total encoder speedup with ext-inter: ~7.5%
      
      Change-Id: Iee18b83ae324ffc9c7f7dc16d4b2b06adb4d4305
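As a rough illustration of what a 6-bit mask blend computes, here is a minimal scalar sketch. The commit message does not spell out the formula or the signature, so everything below is an assumption: the name `blend_mask6_sketch`, the parameter list, the mask range of 0..64, and the rounding are all hypothetical and are not taken from the libvpx source.

```c
#include <assert.h>
#include <stdint.h>

#define MASK_BITS 6 /* assumed: mask weights span 0..64 */

/* Hypothetical per-pixel blend of two predictors under a weight mask:
 * dst = round((m * src0 + (64 - m) * src1) / 64). */
static void blend_mask6_sketch(uint8_t *dst, int dst_stride,
                               const uint8_t *src0, int src0_stride,
                               const uint8_t *src1, int src1_stride,
                               const uint8_t *mask, int mask_stride,
                               int w, int h) {
  const int max_m = 1 << MASK_BITS;
  for (int i = 0; i < h; ++i) {
    for (int j = 0; j < w; ++j) {
      const int m = mask[i * mask_stride + j];
      const int a = src0[i * src0_stride + j];
      const int b = src1[i * src1_stride + j];
      assert(m <= max_m);
      /* Weighted sum plus half the divisor, then shift down by 6 bits. */
      dst[i * dst_stride + j] =
          (uint8_t)((m * a + (max_m - m) * b + (max_m >> 1)) >> MASK_BITS);
    }
  }
}
```

A mask value of 64 selects the first source outright and 0 selects the second; intermediate values mix the two.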
  3. 12 Apr, 2016 1 commit
    • Optimized HBD block subtraction for all block sizes · 0f80b1f7
      Yi Luo authored
      - Interface function takes a local MxN function to call based on the
        block size.
      - Repeated calls (w/o cache line misses) show an improvement of
        ~63% to ~340%.
      - Overall encoder speed improvement: ~0.9%.
      
      Change-Id: Ieff8f3d192415c61d6d58d8b99bb2a722004823f
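The dispatch idea in the first bullet (an interface function that picks a fixed-size MxN routine from the block size) might look roughly like the sketch below. All names here (`subtract_fn`, `subtract_wxh`, `pick_subtract`, `highbd_subtract_block_sketch`) are hypothetical, only two sizes are wired up, and the per-size routines are plain C loops rather than the optimized kernels the commit adds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*subtract_fn)(int16_t *diff, ptrdiff_t diff_stride,
                            const uint16_t *src, ptrdiff_t src_stride,
                            const uint16_t *pred, ptrdiff_t pred_stride);

/* Generic loop: diff = src - pred for a rows x cols block of 16-bit pixels. */
static void subtract_wxh(int rows, int cols,
                         int16_t *diff, ptrdiff_t diff_stride,
                         const uint16_t *src, ptrdiff_t src_stride,
                         const uint16_t *pred, ptrdiff_t pred_stride) {
  for (int r = 0; r < rows; ++r) {
    for (int c = 0; c < cols; ++c) diff[c] = (int16_t)(src[c] - pred[c]);
    diff += diff_stride;
    src += src_stride;
    pred += pred_stride;
  }
}

/* Stamp out one fixed-size routine per block size (stand-ins for SIMD). */
#define DEFINE_SUBTRACT(ROWS, COLS)                                        \
  static void subtract_##ROWS##x##COLS(                                    \
      int16_t *diff, ptrdiff_t diff_stride, const uint16_t *src,           \
      ptrdiff_t src_stride, const uint16_t *pred, ptrdiff_t pred_stride) { \
    subtract_wxh(ROWS, COLS, diff, diff_stride, src, src_stride, pred,     \
                 pred_stride);                                             \
  }

DEFINE_SUBTRACT(4, 4)
DEFINE_SUBTRACT(8, 8)

/* Map a block size to its fixed-size routine, or NULL if none is wired up. */
static subtract_fn pick_subtract(int rows, int cols) {
  if (rows == 4 && cols == 4) return subtract_4x4;
  if (rows == 8 && cols == 8) return subtract_8x8;
  return NULL;
}

/* Interface function: dispatch on block size, fall back to the generic loop. */
static void highbd_subtract_block_sketch(
    int rows, int cols, int16_t *diff, ptrdiff_t diff_stride,
    const uint16_t *src, ptrdiff_t src_stride, const uint16_t *pred,
    ptrdiff_t pred_stride) {
  const subtract_fn fn = pick_subtract(rows, cols);
  assert(rows > 0 && cols > 0);
  if (fn != NULL) {
    fn(diff, diff_stride, src, src_stride, pred, pred_stride);
  } else {
    subtract_wxh(rows, cols, diff, diff_stride, src, src_stride, pred,
                 pred_stride);
  }
}
```

Fixing the dimensions per routine lets each one be unrolled or vectorized for its size, which is presumably where the per-call gains quoted above come from.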
  4. 04 Apr, 2016 1 commit
  5. 08 Mar, 2016 1 commit
  6. 05 Mar, 2016 1 commit
  7. 02 Mar, 2016 1 commit
  8. 18 Feb, 2016 1 commit
  9. 17 Feb, 2016 1 commit
  10. 15 Feb, 2016 1 commit
    • Add optimized vpx_sum_squares_2d_i16 for vp10. · abd00505
      Geza Lore authored
      Using this we can eliminate large numbers of calls to predict intra,
      and it is also faster than most of the variance functions it replaces.
      This is an equivalence transform so coding performance is unaffected.
      
      Encoder speedup is approx 7% when var_tx, super_tx and ext_tx are all
      enabled.
      
      Change-Id: I0d4c83afc4a97a1826f3abd864bd68e41bb504fb
  11. 14 Dec, 2015 1 commit
  12. 20 Oct, 2015 1 commit
    • Optimize vpx_quantize_{b,b_32x32} assembler. · 9cfba09a
      Geza Lore authored
      Added optimization of the 8 bit assembly quantizer routines. This makes
      these functions up to 100% faster, depending on encoding parameters.
      
      This patch makes the encoder faster in both the high bitdepth and 8bit
      configurations. In the high bitdepth configuration, it affects profile 0
      only.
      
      Based on my profiling using 1080p input the net gain is between 1-3% for
      the 8 bit config, and around 2.5-4.5% for the high bitdepth config,
      depending on target bitrate. The difference between the 8 bit and high
      bitdepth configurations for the same encoder run is reduced by 1% in all
      cases I have profiled.
      
      Change-Id: I86714a6b7364da20cd468cd784247009663a5140
  13. 30 Sep, 2015 1 commit
  14. 04 Sep, 2015 1 commit
    • VPX: subpixel_8t_ssse3 asm using x86inc · 19588302
      Scott LaVarnway authored
      This is based on the original patch optimized for 32bit
      platforms by Tamar/Ilya and now uses the x86inc style asm.
      The assembly was also modified to support 64bit platforms.
      
      Change-Id: Ice12f249bbbc162a7427e3d23fbf0cbe4135aff2
  15. 27 Aug, 2015 1 commit
    • Add sse2 versions of halfpix variance · a28b2c6f
      Johann authored
      These were lost in the great sub pixel variance move of
      6a82f0d7
      
      Not having these functions caused a ~10% performance regression in
      some realtime vp8 encodes.
      
      Change-Id: I50658483d9198391806b27899f2c0d309233c4b5
  16. 19 Aug, 2015 1 commit
  17. 17 Aug, 2015 1 commit
  18. 12 Aug, 2015 1 commit
    • Fork VP9 and VP10 codebase · 3ee6db6c
      Jingning Han authored
      This commit forks the VP9 and VP10 codebase and makes libvpx
      support VP8, VP9, and VP10.
      
      Change-Id: I81782e0b809acb3c9844bee8c8ec8f4d5e8fa356
  19. 11 Aug, 2015 1 commit
  20. 07 Aug, 2015 1 commit
  21. 04 Aug, 2015 2 commits
  22. 03 Aug, 2015 3 commits
  23. 02 Aug, 2015 1 commit
  24. 01 Aug, 2015 1 commit
  25. 31 Jul, 2015 3 commits
    • Factor inverse transform functions into vpx_dsp · e8b133c7
      Jingning Han authored
      This commit moves the module inverse transform functions from vp9
      to vpx_dsp folder. The hybrid transform wrapper functions stay in
      the vp9 folder, since it involves codec-specific data structures.
      
      Change-Id: Ib066367c953d3d024c73ba65157bbd70a95c9ef8
    • Refactor mips/dspr2 on convolution. · 7cfdc003
      Zoe Liu authored
      Change-Id: If59a39d5a92c261537342726f94bb7f7f26dfff3
    • Code refactor on InterpKernel · 7186a2dd
      Zoe Liu authored
      It in essence refactors the code for both the interpolation
      filtering and the convolution. This change includes the moving
      of all the files as well as the changing of the code from vp9_
      prefix to vpx_ prefix accordingly, for underneath architectures:
      (1) x86;
      (2) arm/neon; and
      (3) mips/msa.
      The work on mips/dspr2 will be done in a separate change list.
      
      Change-Id: Ic3ce7fb7f81210db7628b373c73553db68793c46
  26. 30 Jul, 2015 1 commit
  27. 28 Jul, 2015 1 commit
  28. 27 Jul, 2015 2 commits
  29. 26 Jul, 2015 1 commit
    • Refactor vp9_idct.h file · 5ebc8feb
      Jingning Han authored
      Separate the common coefficient constant into vpx_dsp/txfm_common.h.
      Move the SSE2 macro definitions to vpx_dsp/x86/txfm_common_sse2.h.
      This clears the use case of vp9_idct.h in vpx_dsp folder.
      
      Change-Id: I319735a2abf42888e5080ac14cfbcde34be7b121
  30. 24 Jul, 2015 1 commit
  31. 23 Jul, 2015 2 commits
  32. 22 Jul, 2015 1 commit
  33. 20 Jul, 2015 1 commit