1. 20 Feb, 2018 1 commit
  2. 19 Feb, 2018 1 commit
    • SSE2 optimization for lpf 16_dual implementations · d6a7dd19
      Maxym Dmytrychenko authored
      Covers horizontal and vertical variants, for both low and
      high bitdepth types.
      
      Appropriate tests are enabled
      
      Performance changes, SSE2 over C:
      Horizontal methods: up to  3x
      Vertical   methods: up to  2x
      
      Change-Id: If430a916394c7befa743e4fbaa9913fd37c535ed
  3. 14 Feb, 2018 2 commits
  4. 07 Feb, 2018 1 commit
    • SSE2 optimizations for _16 highbd lpf functions · e33f5819
      Maxym Dmytrychenko authored
      Includes vertical and horizontal implementations,
      and fixes 13 TAPs/Parallel deblocking support.
      
      Appropriate tests are enabled
      
      Performance changes, SSE2 over C:
      Horizontal methods: up to    2x
      Vertical   methods: up to  1.5x
      
      Change-Id: Icbdc217a55353eb33417b81847b73005e043262d
  5. 29 Nov, 2017 1 commit
    • Unify highbd loopfilter function names · 684b7bd1
      James Zern authored
      Rename aom_highbd_lpf_horizontal_edge_8() to aom_highbd_lpf_horizontal_16().
      Rename aom_highbd_lpf_horizontal_edge_16() to aom_highbd_lpf_horizontal_16_dual().
      
      based on the same change from libvpx:
      7f1f35183 Unify loopfilter function names
      
      Change-Id: I40cd587e74e0fe02bae23e6c10280c8e269df1d6
  6. 28 Nov, 2017 1 commit
    • Fix the dual loopfilter for cb4x4 · 771a80ab
      Yi Luo authored
      In cb4x4, the dual loopfilter filters 2 * 4 = 8 pixels.
      This patch does not affect the encoder/decoder, since the
      dual filters are not used in the bit mask implementation.
      
      Change-Id: Ifdeb8990127de39143971156db69a69ee3bd3136
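The 2 * 4 = 8 arithmetic in the cb4x4 fix can be illustrated with a minimal sketch, assuming a dual filter is simply two single-edge calls on adjacent segments. `toy_filter_segment`, `toy_filter_dual`, and `toy_count_touched` are hypothetical stand-ins, not libaom functions, and the "filter" only marks the pixels it would process.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a single-edge loop filter: it only marks the
 * `width` pixels it would process, so the covered span can be inspected. */
static void toy_filter_segment(uint8_t *s, size_t width) {
  for (size_t i = 0; i < width; ++i) s[i] = 1;
}

/* Hypothetical dual filter: two single-edge calls on adjacent segments.
 * With 4-pixel segments (the cb4x4 case) this covers 2 * 4 = 8 pixels. */
static void toy_filter_dual(uint8_t *s, size_t segment_width) {
  toy_filter_segment(s, segment_width);
  toy_filter_segment(s + segment_width, segment_width);
}

/* Counts how many pixels in a row of length n were marked. */
static size_t toy_count_touched(const uint8_t *s, size_t n) {
  size_t count = 0;
  for (size_t i = 0; i < n; ++i) count += (s[i] != 0);
  return count;
}
```

Calling `toy_filter_dual(row, 4)` on a zeroed 16-pixel row marks the first 8 pixels and leaves the rest untouched. The real functions take limit and threshold parameters that are omitted here.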
  7. 21 Oct, 2017 1 commit
  8. 30 Aug, 2017 1 commit
    • Highbd parallel_deblocking sse2 optimization · 6f5569f3
      Yi Luo authored
      - Decoder speed improves ~13.7% (baseline + parallel_deblocking).
      - Highbd loopfilter AVX2 version works when this experiment is
        disabled.
      
      Change-Id: I5d56b137a1d52236a4735656c370d57ef71ae043
  9. 11 Aug, 2017 1 commit
    • Simplify pixel clamping in highbitdepth loop filter · 099b1221
      Yi Luo authored
      The constants used in pixel clamping are based on bitdepth.
      Their calculation is moved outside the pixel clamping and done
      only once. This achieves a speed improvement of less than 2%
      on the decoder.
      
      Change-Id: I48dcaebe04a3478962c3b6568d247a23b47a89d4
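The idea of computing the bitdepth-dependent clamp constants once can be sketched in plain C. This is an illustration only: the actual patch works on SIMD code, the names `clamp_range`, `make_clamp_range`, `clamp_with_range`, and `clamp_row` are hypothetical, and the range formula (the signed 8-bit range scaled by `bd - 8` bits) is an assumption about how the limits depend on bitdepth.

```c
/* Hypothetical container for the precomputed clamp limits. */
typedef struct {
  int lo;
  int hi;
} clamp_range;

/* Assumption: the signed 8-bit range [-128, 127] scaled up by (bd - 8) bits,
 * e.g. [-512, 511] for 10-bit. Expects bd >= 8. */
static clamp_range make_clamp_range(int bd) {
  const int shift = bd - 8;
  clamp_range r;
  r.lo = -(128 << shift);
  r.hi = (128 << shift) - 1;
  return r;
}

/* Clamps one value using limits that were computed elsewhere. */
static int clamp_with_range(int v, clamp_range r) {
  return v < r.lo ? r.lo : (v > r.hi ? r.hi : v);
}

/* The limits are derived once per call, outside the per-pixel loop,
 * instead of being recomputed from bd for every pixel. */
static void clamp_row(int *values, int n, int bd) {
  const clamp_range r = make_clamp_range(bd);
  for (int i = 0; i < n; ++i) values[i] = clamp_with_range(values[i], r);
}
```

Under the assumed formula, `clamp_row` with `bd = 10` limits values to [-512, 511]. The benefit comes from moving `make_clamp_range` out of the inner loop.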
  10. 10 Aug, 2017 1 commit
    • Highbd loop filter AVX2 · 6ae0054c
      Yi Luo authored
      - Speed test (ms) on i7-6700, Linux x86_64
        FUNCTION             SSE2    AVX2
        horizontal_edge_16   55      28
        vertical_16_dual     84      47
        horizontal_4_dual    27      13
        horizontal_8_dual    36      15
        vertical_4_dual      38      25
        vertical_8_dual      44      27
      - Decoder frame rate improves around 1.2% - 2.8%.
      
      Change-Id: I9c4123869bac9b6d32e626173c2a8e7eb0cf49e7
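The speed table in the AVX2 commit can be turned into per-function speedup ratios (SSE2 time divided by AVX2 time). The sketch below uses only the timings quoted in that commit message. The type `lpf_timing` and the helpers `speedup` and `print_speedups` are hypothetical names introduced for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* One row of the timing table (milliseconds, as quoted in the commit). */
typedef struct {
  const char *name;
  int sse2_ms;
  int avx2_ms;
} lpf_timing;

static const lpf_timing kTimings[] = {
  { "horizontal_edge_16", 55, 28 },
  { "vertical_16_dual", 84, 47 },
  { "horizontal_4_dual", 27, 13 },
  { "horizontal_8_dual", 36, 15 },
  { "vertical_4_dual", 38, 25 },
  { "vertical_8_dual", 44, 27 },
};
static const size_t kNumTimings = sizeof(kTimings) / sizeof(kTimings[0]);

/* Speedup of AVX2 over SSE2 for one function. */
static double speedup(const lpf_timing *t) {
  return (double)t->sse2_ms / (double)t->avx2_ms;
}

/* Prints one "name ratio" line per table row. */
static void print_speedups(void) {
  for (size_t i = 0; i < kNumTimings; ++i) {
    printf("%-20s %.2fx\n", kTimings[i].name, speedup(&kTimings[i]));
  }
}
```

For example, horizontal_8_dual comes out at 36 / 15 = 2.4x and vertical_4_dual at 38 / 25 = 1.52x, so the horizontal variants gain more from AVX2 than the vertical ones in this measurement.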
  11. 12 Oct, 2016 1 commit
    • port changes on lpf from libvpx/nextgenv2 · 57ad0a05
      Yaowu Xu authored
      Manually cherry-picked the following commits:
      4b5e462d Upgrade vpx_lpf_{vertical,horizontal}_4 mmx to sse2
      3ea537c0 lpf_8_test: remove unneeded function wrapper
      110d3778 remove loopfilter 'count' param TODOs
      9b44d9d0 split vpx_highbd_lpf_horizontal_16 in two
      1b519fb6 split vpx_lpf_horizontal_16 in two
      e7a23d70 vpx_highbd_lpf_horizontal_4: remove unused count param
      51718573 vpx_highbd_lpf_horizontal_8: remove unused count param
      3c1019e4 vpx_highbd_lpf_vertical_4: remove unused count param
      72a9f06a vpx_highbd_lpf_vertical_8: remove unused count param
      b1e97c6a vpx_lpf_horizontal_4: remove unused count param
      ab25e46 Upgrade vpx_lpf_{vertical,horizontal}_4 mmx to sse2
      bd5a5bb5 vpx_lpf_horizontal_8: remove unused count param
      109a47b3 vpx_lpf_vertical_4: remove unused count param
      37225744 vpx_lpf_vertical_8: remove unused count param
      47dee375 lpf_8_test: add missing dspr2 tests
      4fec4a8e lpf_8_test: add missing vpx_lpf_horizontal_4 tests
      c3f2c8ad lpf_8_test: add missing vpx_lpf_vertical_4 tests
      45a7b5eb lpf_8_test: simplify function wrapper generation
      
      Change-Id: I0e9212497bbf30de37b19cd2d6ea63b505abe06d
  12. 02 Sep, 2016 1 commit
  13. 01 Sep, 2016 2 commits
  14. 10 Aug, 2016 1 commit
  15. 22 Mar, 2016 1 commit
  16. 17 Feb, 2016 5 commits
  17. 28 Jan, 2016 1 commit
  18. 17 Jul, 2015 2 commits
  19. 16 Jul, 2015 1 commit
  20. 13 May, 2015 1 commit
    • Relocate memory operations for common code · 1d7ccd53
      Johann authored
      With the sad functions, and hopefully the variance functions soon,
      moving to the vpx_dsp location, place the defines used in the
      reference C code in a common location.
      
      Change-Id: I4c8ce7778eb38a0a3ee674d2f1c488eda01cfeca
  21. 07 May, 2015 1 commit
    • replace DECLARE_ALIGNED_ARRAY w/DECLARE_ALIGNED · fd3658b0
      James Zern authored
      this macro was used inconsistently and only differs in behavior from
      DECLARE_ALIGNED when an alignment attribute is unavailable. this macro
      is used with calls to assembly, while generic c-code doesn't rely on it,
      so in a c-only build without an alignment attribute the code will
      function as expected.
      
      Change-Id: Ie9d06d4028c0de17c63b3a27e6c1b0491cc4ea79
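A minimal sketch of what a `DECLARE_ALIGNED`-style macro can look like, including the fallback for compilers without an alignment attribute that the commit message refers to. The definition below follows a common pattern and is an assumption, so the exact definition in the tree may differ. `is_aligned` and `demo_buffer_is_aligned` are hypothetical helpers added for the example.

```c
#include <stdint.h>

/* Assumed definition: use the compiler's alignment attribute when one is
 * available, otherwise fall back to a plain declaration. */
#if defined(__GNUC__)
#define DECLARE_ALIGNED(n, typ, val) typ val __attribute__((aligned(n)))
#elif defined(_MSC_VER)
#define DECLARE_ALIGNED(n, typ, val) __declspec(align(n)) typ val
#else
#define DECLARE_ALIGNED(n, typ, val) typ val /* no alignment attribute */
#endif

/* Returns nonzero if p is a multiple of n bytes. */
static int is_aligned(const void *p, uintptr_t n) {
  return ((uintptr_t)p % n) == 0;
}

/* Declares a 16-byte aligned scratch buffer and reports its alignment. */
static int demo_buffer_is_aligned(void) {
  DECLARE_ALIGNED(16, uint8_t, buf[64]);
  buf[0] = 0;
  return is_aligned(buf, 16);
}
```

With GCC or Clang the buffer is 16-byte aligned, which is what SIMD assembly callers typically need. In the fallback branch the declaration still compiles but carries no alignment guarantee, which is acceptable only for C-only builds as described above.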
  22. 26 Feb, 2015 2 commits
  23. 25 Feb, 2015 1 commit
  24. 24 Feb, 2015 1 commit
  25. 09 Oct, 2014 1 commit
  26. 23 Sep, 2014 1 commit