1. 30 Jan, 2018 1 commit
  2. 26 Jan, 2018 1 commit
    • Maxym Dmytrychenko's avatar
      SSE2 optimizations for _6/_16 lowbd lpf functions · ae6e6bc1
      Maxym Dmytrychenko authored
      Includes vertical and horizontal implementations
      and to fix 5/13 TAPs/Parallel deblocking support.
      
      Re-working internals of the filters for better
      re-usage across different sizes.
      
      Tests are enabled.
      
      Performance changes, SSE2 over C:
      Horizontal methods: up to    3-4x
      Vertical   methods: up to 1.5x-2x
      
      Change-Id: I2e36035355d8c23c1d4b0d59d0e23f598e9d0e3f
      ae6e6bc1
  3. 27 Nov, 2017 1 commit
    • James Zern's avatar
      Unify loopfilter function names · 1dbe80bc
      James Zern authored
      Rename aom_lpf_horizontal_edge_8() to aom_lpf_horizontal_16().
      Rename aom_lpf_horizontal_edge_16() to aom_lpf_horizontal_16_dual().
      
      based on the same change from libvpx:
      7f1f35183 Unify loopfilter function names
      
      Change-Id: I4fda7a2e3a893fc3dee0779975e2d4145c32f5d2
      1dbe80bc
  4. 25 Oct, 2017 1 commit
    • Rupert Swarbrick's avatar
      Avoid UB from misaligned loads/stores in loopfilter code · 129afee7
      Rupert Swarbrick authored
      This patch changes 32 bit loads and stores (which did trigger
      undefined behaviour when the pointer wasn't aligned) to use the
      xx_storel_32 synonym. This should also just generate a MOVD and is
      less verbose to boot!
      
      The patch also changes store_buffer_horz_8 to take its SSE register by
      value rather than by pointer. The most restrictive ABI for passing SSE
      registers by value is win32, where you can pass at most 3. There's
      only one here, so it should be fine.
      
      BUG=aomedia:912
      
      Change-Id: I6d75803e57da090db59eedad902bd27908eb5118
      129afee7
  5. 07 Sep, 2017 1 commit
    • Yi Luo's avatar
      Lowbd parallel_deblocking sse2 optimization · ea8a0d52
      Yi Luo authored
      Baseline + parallel_deblocking:
      
      - Passed unit tests *SSE2/Loop8Test6*, *AVX2/Loop8Test6*.
      - 1080p, 25 frames, profile=0, encoding/decoding, output match.
      - Decoder frame rate increases from 54.15 to 65.84.
      
      Change-Id: I55938c94961066594f4b9080192c7268c19d9bf9
      ea8a0d52
  6. 01 Mar, 2017 1 commit
    • Ryan Lei's avatar
      implement combined parallel_deblocking experiment · 392d0ff7
      Ryan Lei authored
      The parallel_deblocking experiment is proposed jointly by Intel
      and Microsoft. The following changes are implemented in this
      experiment:
      
      - deblocking filter order is changed to filter all vertical edges
        of the whole frame followed by filtering all horizontal edges
        of the whole frame
      
      - filter length decision is made based on the transform block size
        on both sides of the edge. block with smaller transform size
        determines the final filter length.
      
      - transform blocks on both sides of the edge are checked, only when
        both blocks are skipped and they belong to the same prediction
        block, filtering of that edge can be skipped.
      
      - 15-tap filter and extended flat area detection are removed.
      
      - special rule for handling 4x4 transform block on the super block
        boundary in VP9 is removed.
      
      Change-Id: I1aa82c6b5335d47c2f73eec8fc8bee2c08a1cf74
      392d0ff7
  7. 12 Oct, 2016 1 commit
    • Yaowu Xu's avatar
      port changes on lpf from libvpx/nextgenv2 · 57ad0a05
      Yaowu Xu authored
      Manually cherry-picked the following commits:
      4b5e462d Upgrade vpx_lpf_{vertical,horizontal}_4 mmx to sse2
      3ea537c0 lpf_8_test: remove unneeded function wrapper
      110d3778 remove loopfilter 'count' param TODOs
      9b44d9d0 split vpx_highbd_lpf_horizontal_16 in two
      1b519fb6 split vpx_lpf_horizontal_16 in two
      e7a23d70 vpx_highbd_lpf_horizontal_4: remove unused count param
      51718573 vpx_highbd_lpf_horizontal_8: remove unused count param
      3c1019e4 vpx_highbd_lpf_vertical_4: remove unused count param
      72a9f06a vpx_highbd_lpf_vertical_8: remove unused count param
      b1e97c6a vpx_lpf_horizontal_4: remove unused count param
       ab25e46pgrade vpx_lpf_{vertical,horizontal}_4 mmx to sse2
      bd5a5bb5 vpx_lpf_horizontal_8: remove unused count param
      109a47b3 vpx_lpf_vertical_4: remove unused count param
      37225744 vpx_lpf_vertical_8: remove unused count param
      47dee375 lpf_8_test: add missing dspr2 tests
      4fec4a8e lpf_8_test: add missing vpx_lpf_horizontal_4 tests
      c3f2c8ad lpf_8_test: add missing vpx_lpf_vertical_4 tests
      45a7b5eb lpf_8_test: simplify function wrapper generation
      
      Change-Id: I0e9212497bbf30de37b19cd2d6ea63b505abe06d
      57ad0a05
  8. 02 Sep, 2016 1 commit
  9. 01 Sep, 2016 2 commits
  10. 10 Aug, 2016 1 commit
  11. 18 Jul, 2016 1 commit
    • Johann's avatar
      Merge changes from libvpx/master by cherry-pick · 2967bf35
      Johann authored
      This commit bring all up-to-date changes from master that are
      applicable to nextgenv2. Due to the remove VP10 code in master,
      we had to cherry pick the following commits to get those changes:
      
      Add default flags for arm64/armv8 builds
      
      Allows building simple targets with sane default flags.
      
      For example, using the Android arm64 toolchain from the NDK:
      https://developer.android.com/ndk/guides/standalone_toolchain.html
      ./build/tools/make-standalone-toolchain.sh --arch=arm64 \
        --platform=android-24 --install-dir=/tmp/arm64
      CROSS=/tmp/arm64/bin/aarch64-linux-android- \
        ~/libvpx/configure --target=arm64-linux-gcc --disable-multithread
      
      BUG=webm:1143
      
      vpx_lpf_horizontal_4_sse2: Remove dead load.
      
      Change-Id: I51026c52baa1f0881fcd5b68e1fdf08a2dc0916e
      
      Fail early when android target does not include --sdk-path
      
      Change-Id: I07e7e63476a2e32e3aae123abdee8b7bbbdc6a8c
      
      configure: clean up var style and set_all usage
      
      Use quotes whenever possible and {} always for variables.
      
      Replace multiple set_all calls with *able_feature().
      
      Conflicts:
      	build/make/configure.sh
      
      vp9-svc: Remove some unneeded code/comment.
      
      datarate_test,DatarateTestLarge: normalize bits type
      
      quiets a msvc warning:
      conversion from 'const int64_t' to 'size_t', possible loss of data
      
      mips added p6600 cpu support
      
      Removed -funroll-loops
      
      psnr.c: use int64_t for sum of differences
      
      Since the values can be negative.
      
      *.asm: normalize label format
      
      add a trailing ':', though it's optional with the tools we support, it's
      more common to use it to mark a label. this also quiets the
      orphan-labels warning with nasm/yasm.
      
      BUG=b/29583530
      
      Prevent negative variance
      
      Due to rounding, hbd variance may become negative. This commit put in
      check and clamp of negative values to 0.
      
      configure: remove old visual studio support (<2010)
      
      BUG=b/29583530
      
      Conflicts:
      	configure
      
      configure: restore vs_version variable
      
      inadvertently lost in the final patchset of:
      078dff7 configure: remove old visual studio support (<2010)
      
      this prevents an empty CONFIG_VS_VERSION and avoids make failure
      
      Require x86inc.asm
      
      Force enable x86inc.asm when building for x86. Previously there were
      compatibility issues so a flag was added to simplify disabling this
      code.
      
      The known issues have been resolved and x86inc.asm is the preferred
      abstraction layer (over x86_abi_support.asm).
      
      BUG=b:29583530
      
      convolve_test: fix byte offsets in hbd build
      
      CONVERT_TO_BYTEPTR(x) was corrected in:
      003a9d2 Port metric computation changes from nextgenv2
      to use the more common (x) within the expansion. offsets should occur
      after converting the pointer to the desired type.
      
      + factorized some common expressions
      
      Conflicts:
      	test/convolve_test.cc
      
      vpx_dsp: remove x86inc.asm distinction
      
      BUG=b:29583530
      
      Conflicts:
      	vpx_dsp/vpx_dsp.mk
      	vpx_dsp/vpx_dsp_rtcd_defs.pl
      	vpx_dsp/x86/highbd_variance_sse2.c
      	vpx_dsp/x86/variance_sse2.c
      
      test: remove x86inc.asm distinction
      
      BUG=b:29583530
      
      Conflicts:
      	test/vp9_subtract_test.cc
      
      configure: remove x86inc.asm distinction
      
      BUG=b:29583530
      
      Change-Id: I59a1192142e89a6a36b906f65a491a734e603617
      
      Update vpx subpixel 1d filter ssse3 asm
      
      Speed test shows the new vertical filters have degradation on Celeron
      Chromebook. Added "X86_SUBPIX_VFILTER_PREFER_SLOW_CELERON" to control
      the vertical filters activated code. Now just simply active the code
      without degradation on Celeron. Later there should be 2 set of vertical
      filters ssse3 functions, and let jump table to choose based on CPU type.
      
      improve vpx_filter_block1d* based on replace paddsw+psrlw to pmulhrsw
      
      Make set_reference control API work in VP9
      
      Moved the API patch from NextGenv2. An example was included.
      To try it, for example, run the following command:
      $ examples/vpx_cx_set_ref vp9 352 288 in.yuv out.ivf 4 30
      
      Conflicts:
      	examples.mk
      	examples/vpx_cx_set_ref.c
      	test/cx_set_ref.sh
      	vp9/decoder/vp9_decoder.c
      
      deblock filter : moved from vp8 code branch
      
      The deblocking filters used in vp8 have been moved to vpx_dsp for
      use by both vp8 and vp9.
      
      vpx_thread.[hc]: update webp source reference
      
      + drop the blob hash, the updated reference will be updated in the
      commit message
      
      BUG=b/29583578
      
      vpx_thread: use native windows cond var if available
      
      BUG=b/29583578
      
      original webp change:
      
      commit 110ad5835ecd66995d0e7f66dca1b90dea595f5a
      Author: James Zern <jzern@google.com>
      Date:   Mon Nov 23 19:49:58 2015 -0800
      
          thread: use native windows cond var if available
      
          Vista / Server 2008 and up. no speed difference observed.
      
      100644 blob 4fc372b7bc6980a9ed3618c8cce5b67ed7b0f412 src/utils/thread.c
      100644 blob 840831185502d42a3246e4b7ff870121c8064791 src/utils/thread.h
      
      vpx_thread: use InitializeCriticalSectionEx if available
      
      BUG=b/29583578
      
      original webp change:
      
      commit 63fadc9ffacc77d4617526a50c696d21d558a70b
      Author: James Zern <jzern@google.com>
      Date:   Mon Nov 23 20:38:46 2015 -0800
      
          thread: use InitializeCriticalSectionEx if available
      
          Windows Vista / Server 2008 and up
      
      100644 blob f84207d89b3a6bb98bfe8f3fa55cad72dfd061ff src/utils/thread.c
      100644 blob 840831185502d42a3246e4b7ff870121c8064791 src/utils/thread.h
      
      vpx_thread: use WaitForSingleObjectEx if available
      
      BUG=b/29583578
      
      original webp change:
      
      commit 0fd0e12bfe83f16ce4f1c038b251ccbc13c62ac2
      Author: James Zern <jzern@google.com>
      Date:   Mon Nov 23 20:40:26 2015 -0800
      
          thread: use WaitForSingleObjectEx if available
      
          Windows XP and up
      
      100644 blob d58f74e5523dbc985fc531cf5f0833f1e9157cf0 src/utils/thread.c
      100644 blob 840831185502d42a3246e4b7ff870121c8064791 src/utils/thread.h
      
      vpx_thread: use CreateThread for windows phone
      
      BUG=b/29583578
      
      original webp change:
      
      commit d2afe974f9d751de144ef09d31255aea13b442c0
      Author: James Zern <jzern@google.com>
      Date:   Mon Nov 23 20:41:26 2015 -0800
      
          thread: use CreateThread for windows phone
      
          _beginthreadex is unavailable for winrt/uwp
      
          Change-Id: Ie7412a568278ac67f0047f1764e2521193d74d4d
      
      100644 blob 93f7622797f05f6acc1126e8296c481d276e4047 src/utils/thread.c
      100644 blob 840831185502d42a3246e4b7ff870121c8064791 src/utils/thread.h
      
      vp9_postproc.c missing extern.
      
      BUG=webm:1256
      
      deblock: missing const on extern const.
      
      postproc - move filling of noise buffer to vpx_dsp.
      
      Fix encoder crashes for odd size input
      
      clean-up vp9_intrapred_test
      
      remove tuple and overkill VP9IntraPredBase class.
      
      postproc: noise style fixes.
      
      gtest-all.cc: quiet an unused variable warning
      
      under windows / mingw builds
      
      vp9_intrapred_test: follow-up cleanup
      
      address few comments from ce050afaf3e288895c3bee4160336e2d2133b6ea
      
      Change-Id: I3eece7efa9335f4210303993ef6c1857ad5c29c8
      2967bf35
  12. 26 May, 2016 1 commit
    • Linfeng Zhang's avatar
      Upgrade vpx_lpf_{vertical,horizontal}_4 mmx to sse2 · 4b5e462d
      Linfeng Zhang authored
      Followed the code style of other lpf fuctions.
      These 2 functions put 2 rows of data in a single xmm register,
      so they have similar but not identical filter operations,
      and cannot share the same macros.
      
      Change-Id: I3bab55a5d1a1232926ac8fd1f03251acc38302bc
      4b5e462d
  13. 22 Mar, 2016 1 commit
  14. 17 Feb, 2016 2 commits
  15. 16 Feb, 2016 1 commit
  16. 28 Jan, 2016 1 commit
  17. 17 Jul, 2015 1 commit
  18. 16 Jul, 2015 1 commit
  19. 02 Jul, 2015 1 commit
    • levytamar82's avatar
      VP9_LPF_VERTICAL_16_DUAL_SSE2 optimization · 3c5256d5
      levytamar82 authored
      The vp9_lpf_vertical_16_dual function optimized for x86 32bit target. The hot code in that function was caused by the call to the transpose8x16.
      The gcc generated assembly created uneeded fills and spills to the stack. By interleaving 2 loads and unpack instructions, in addition to hoisting the consumer
      instruction closer to the producer instructions, we eliminated most of the fills and spills and improve the function-level performance by 17%.
      credit for writing the function as well as finding the root cause goes to Erik Niemeyer (erik.a.niemeyer@intel.com)
      
      Change-Id: I6173cf53956d52918a047d1c53d9a673f952ec46
      3c5256d5
  20. 15 May, 2015 1 commit
  21. 07 May, 2015 1 commit
    • James Zern's avatar
      replace DECLARE_ALIGNED_ARRAY w/DECLARE_ALIGNED · fd3658b0
      James Zern authored
      this macro was used inconsistently and only differs in behavior from
      DECLARE_ALIGNED when an alignment attribute is unavailable. this macro
      is used with calls to assembly, while generic c-code doesn't rely on it,
      so in a c-only build without an alignment attribute the code will
      function as expected.
      
      Change-Id: Ie9d06d4028c0de17c63b3a27e6c1b0491cc4ea79
      fd3658b0
  22. 18 Sep, 2014 2 commits
  23. 18 Dec, 2013 1 commit
    • Jim Bankoski's avatar
      rename loop filter functions · b720ba16
      Jim Bankoski authored
      This renames all the loop filter functions so that they no
      longer refer to mb
      
      Change-Id: I8a58a8c7fd253d835cb619bde13913e896ece90b
      b720ba16
  24. 22 Nov, 2013 1 commit
    • Yunqing Wang's avatar
      Do vertical loopfiltering in parallel · ed36720b
      Yunqing Wang authored
      This patch followed "Add filter_selectively_vert_row2 to enable
      parallel loopfiltering" commit, and added x86 SSE2 optimization
      to do 16-pixel filtering in parallel. For other optimizations
      (neon and dspr2), current 16-pixel functions were done by calling
      8-pixel functions twice, and real 16-pixel functions could be added
      later.
      
      Decoder speedup:
      tulip clip:     2% speed gain;
      old_town_cross: 1.2% speed gain;
      bus:            2% speed gain.
      
      Change-Id: I4818a0c72f84b34f5fe678e496cf4a10238574b7
      ed36720b
  25. 16 Nov, 2013 1 commit
    • Yunqing Wang's avatar
      Do horizontal loopfiltering in parallel · 64f728ca
      Yunqing Wang authored
      This patch followed "Rewrite filter_selectively_horiz for parallel
      loopfiltering" commit, and added x86 SSE2 optimization to do
      16-pixel filtering in parallel. Also, corrected the declaration
      of aligned arrays. For 8-pixel-in-parallel case, improved the
      calculation of the masks and filters. Updated the threshold loading
      since the thresholds were already duplicated. Updated neon C functions
      to call neon loopfilters twice.
      
      Using tulip clip, tests showed it gave a ~1.5% decoder speed gain.
      
      Change-Id: Id02638626ac27a4b0e0b09d71792a24c0499bd35
      64f728ca
  26. 29 Aug, 2013 1 commit
    • Scott LaVarnway's avatar
      Improved mb_lpf_horizontal_edge_w_sse2_8 · 22dc946a
      Scott LaVarnway authored
      This patch is a reformatted version of optimizations done by
      engineers at Intel (Erik/Tamar) who have been providing
      performance feedback for VP9.  For the test clips used (720p, 1080p),
      up to 1.2% performance improvement was seen.
      
      Change-Id: Ic1a7149098740079d5453b564da6fbfdd0b2f3d2
      22dc946a
  27. 16 Jul, 2013 2 commits
  28. 14 Jul, 2013 3 commits
  29. 10 Jul, 2013 1 commit
    • John Koleszar's avatar
      Wide loopfilter 16 pix at a time · 64f7a4d8
      John Koleszar authored
      Where possible, do the 16 pixel wide filter while doing the horizontal
      filtering pass. The same approach can be taken for the mbloop_filter
      when that's implemented. Doing so on the vertical pass is a little more
      involved, but possible.
      
      Change-Id: I010cb505e623464247ae8f67fa25a0cdac091320
      64f7a4d8
  30. 12 Jun, 2013 3 commits
  31. 10 May, 2013 1 commit
  32. 02 Apr, 2013 1 commit
    • Johann's avatar
      Demux vp9_loopfilter_x86.c · 3db60c8c
      Johann authored
      Allow more careful targeting of compiler flags.
      
      Change-Id: I963ab4a6479dedb165419310dfca52a58a9877b8
      3db60c8c