1. 13 Oct, 2016 2 commits
  2. 12 Oct, 2016 1 commit
    • minor updates · f36d0b46
      Yaowu Xu authored
      1. vp8->aom
      2. removed no-effect statements and spaces
      
      Change-Id: I367d05ff9bf1b9f3c71c517c45d8049d9d4236ec
  3. 11 Oct, 2016 1 commit
  4. 06 Oct, 2016 1 commit
    • Hybrid forward transforms 16x16 AVX2 optimization · e8e8cd8f
      Yi Luo authored
      - Unit tests are added for AVX2 SIMD.
      - Encoder speed improvement:
        AV1 baseline and EXT_TX, three 1080p sequences at bitrate:
        800 Kbps, 2 Mbps, 6 Mbps, on i7-6700 CPU, average
        user level time reduction: 3.86%.
      
      Change-Id: Ibbd7837ee3a831c6b1e4e471bf6c8d3fa3a19ff4
  5. 28 Sep, 2016 1 commit
  6. 19 Sep, 2016 1 commit
    • Move ANS to aom_dsp. · 1ac1ae73
      Alex Converse authored
      That's where it lives in aom/master.
      
      Change-Id: I38f405827d9c2d0b06ef5f3bfd7cadc35d5991ef
  7. 17 Sep, 2016 1 commit
  8. 02 Sep, 2016 1 commit
  9. 01 Sep, 2016 2 commits
  10. 03 Aug, 2016 1 commit
  11. 29 Jul, 2016 1 commit
  12. 28 Jul, 2016 1 commit
  13. 18 Jul, 2016 1 commit
    • Merge changes from libvpx/master by cherry-pick · 2967bf35
      Johann authored
      This commit brings in all up-to-date changes from master that are
      applicable to nextgenv2. Due to the removal of VP10 code in master,
      we had to cherry-pick the following commits to get those changes:
      
      Add default flags for arm64/armv8 builds
      
      Allows building simple targets with sane default flags.
      
      For example, using the Android arm64 toolchain from the NDK:
      https://developer.android.com/ndk/guides/standalone_toolchain.html
      ./build/tools/make-standalone-toolchain.sh --arch=arm64 \
        --platform=android-24 --install-dir=/tmp/arm64
      CROSS=/tmp/arm64/bin/aarch64-linux-android- \
        ~/libvpx/configure --target=arm64-linux-gcc --disable-multithread
      
      BUG=webm:1143
      
      vpx_lpf_horizontal_4_sse2: Remove dead load.
      
      Change-Id: I51026c52baa1f0881fcd5b68e1fdf08a2dc0916e
      
      Fail early when android target does not include --sdk-path
      
      Change-Id: I07e7e63476a2e32e3aae123abdee8b7bbbdc6a8c
      
      configure: clean up var style and set_all usage
      
      Use quotes whenever possible and {} always for variables.
      
      Replace multiple set_all calls with *able_feature().
      
      Conflicts:
      	build/make/configure.sh
      
      vp9-svc: Remove some unneeded code/comment.
      
      datarate_test,DatarateTestLarge: normalize bits type
      
      quiets a msvc warning:
      conversion from 'const int64_t' to 'size_t', possible loss of data
      
      mips added p6600 cpu support
      
      Removed -funroll-loops
      
      psnr.c: use int64_t for sum of differences
      
      Since the values can be negative.
      
      *.asm: normalize label format
      
      add a trailing ':', though it's optional with the tools we support, it's
      more common to use it to mark a label. this also quiets the
      orphan-labels warning with nasm/yasm.
      
      BUG=b/29583530
      
      Prevent negative variance
      
      Due to rounding, hbd variance may become negative. This commit adds a
      check and clamps negative values to 0.
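
The clamp described above can be sketched in scalar C as follows. This is a minimal illustration, not the libvpx code: the name `hbd_variance_clamped` and its parameters are hypothetical, and it assumes the variance is formed as `sse - sum * sum / n` from independently rounded `sse` and `sum` terms, which is how the difference can dip below zero.

```c
#include <stdint.h>

/* Hypothetical helper: variance from a rounded sum of squared errors (sse)
 * and a rounded sum of errors (sum) over n pixels. Because the two inputs
 * may have been rounded independently, the subtraction can go negative,
 * so the result is clamped to 0 before being returned as unsigned. */
static uint32_t hbd_variance_clamped(uint32_t sse, int64_t sum, int n) {
  const int64_t var = (int64_t)sse - (sum * sum) / n;
  return var >= 0 ? (uint32_t)var : 0;
}
```

Without the clamp, a slightly negative `var` converted to an unsigned type would wrap to a very large value.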
      
      configure: remove old visual studio support (<2010)
      
      BUG=b/29583530
      
      Conflicts:
      	configure
      
      configure: restore vs_version variable
      
      inadvertently lost in the final patchset of:
      078dff7 configure: remove old visual studio support (<2010)
      
      this prevents an empty CONFIG_VS_VERSION and avoids make failure
      
      Require x86inc.asm
      
      Force enable x86inc.asm when building for x86. Previously there were
      compatibility issues so a flag was added to simplify disabling this
      code.
      
      The known issues have been resolved and x86inc.asm is the preferred
      abstraction layer (over x86_abi_support.asm).
      
      BUG=b:29583530
      
      convolve_test: fix byte offsets in hbd build
      
      CONVERT_TO_BYTEPTR(x) was corrected in:
      003a9d2 Port metric computation changes from nextgenv2
      to use the more common (x) within the expansion. offsets should occur
      after converting the pointer to the desired type.
      
      + factorized some common expressions
      
      Conflicts:
      	test/convolve_test.cc
      
      vpx_dsp: remove x86inc.asm distinction
      
      BUG=b:29583530
      
      Conflicts:
      	vpx_dsp/vpx_dsp.mk
      	vpx_dsp/vpx_dsp_rtcd_defs.pl
      	vpx_dsp/x86/highbd_variance_sse2.c
      	vpx_dsp/x86/variance_sse2.c
      
      test: remove x86inc.asm distinction
      
      BUG=b:29583530
      
      Conflicts:
      	test/vp9_subtract_test.cc
      
      configure: remove x86inc.asm distinction
      
      BUG=b:29583530
      
      Change-Id: I59a1192142e89a6a36b906f65a491a734e603617
      
      Update vpx subpixel 1d filter ssse3 asm
      
      Speed tests show that the new vertical filters are slower on Celeron
      Chromebooks. Added "X86_SUBPIX_VFILTER_PREFER_SLOW_CELERON" to control
      which vertical filter code is activated. For now, simply activate the
      code that does not degrade on Celeron. Later there should be 2 sets of
      ssse3 vertical filter functions, with a jump table choosing between
      them based on CPU type.
      
      Improve vpx_filter_block1d* by replacing paddsw+psrlw with pmulhrsw
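
Why a single pmulhrsw can stand in for an add-then-shift pair can be shown with a scalar model. Per 16-bit lane, pmulhrsw computes `(a * b + 0x4000) >> 15`, so multiplying by `1 << (15 - n)` yields a rounding right shift by `n`. The sketch below assumes `n = 7` (the 7-bit filter precision) and models the shift as arithmetic; all function names are hypothetical, and the saturation performed by paddsw is deliberately left out.

```c
#include <stdint.h>

/* Flooring right shift that is well defined for negative values. */
static int32_t floor_shift(int32_t v, int bits) {
  return v >= 0 ? v >> bits : -((-v + (1 << bits) - 1) >> bits);
}

/* Scalar model of one pmulhrsw lane: (a * b + 0x4000) >> 15. */
static int16_t mulhrs_lane(int16_t a, int16_t b) {
  return (int16_t)floor_shift((int32_t)a * b + 0x4000, 15);
}

/* Two-step form: add half of 2^7, then shift right by 7.
 * Saturation of the add is ignored (assumes x <= 32767 - 64). */
static int16_t add_then_shift7(int16_t x) {
  return (int16_t)floor_shift((int32_t)x + 64, 7);
}

/* One-step form: multiplying by 2^(15 - 7) = 256 rounds the same way. */
static int16_t mulhrs_shift7(int16_t x) {
  return mulhrs_lane(x, 1 << 8);
}
```

For inputs where the add does not saturate, both forms return the same value, which is what makes the one-instruction replacement possible.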
      
      Make set_reference control API work in VP9
      
      Moved the API patch from NextGenv2. An example was included.
      To try it, for example, run the following command:
      $ examples/vpx_cx_set_ref vp9 352 288 in.yuv out.ivf 4 30
      
      Conflicts:
      	examples.mk
      	examples/vpx_cx_set_ref.c
      	test/cx_set_ref.sh
      	vp9/decoder/vp9_decoder.c
      
      deblock filter : moved from vp8 code branch
      
      The deblocking filters used in vp8 have been moved to vpx_dsp for
      use by both vp8 and vp9.
      
      vpx_thread.[hc]: update webp source reference
      
      + drop the blob hash, the updated reference will be updated in the
      commit message
      
      BUG=b/29583578
      
      vpx_thread: use native windows cond var if available
      
      BUG=b/29583578
      
      original webp change:
      
      commit 110ad5835ecd66995d0e7f66dca1b90dea595f5a
      Author: James Zern <jzern@google.com>
      Date:   Mon Nov 23 19:49:58 2015 -0800
      
          thread: use native windows cond var if available
      
          Vista / Server 2008 and up. no speed difference observed.
      
      100644 blob 4fc372b7bc6980a9ed3618c8cce5b67ed7b0f412 src/utils/thread.c
      100644 blob 840831185502d42a3246e4b7ff870121c8064791 src/utils/thread.h
      
      vpx_thread: use InitializeCriticalSectionEx if available
      
      BUG=b/29583578
      
      original webp change:
      
      commit 63fadc9ffacc77d4617526a50c696d21d558a70b
      Author: James Zern <jzern@google.com>
      Date:   Mon Nov 23 20:38:46 2015 -0800
      
          thread: use InitializeCriticalSectionEx if available
      
          Windows Vista / Server 2008 and up
      
      100644 blob f84207d89b3a6bb98bfe8f3fa55cad72dfd061ff src/utils/thread.c
      100644 blob 840831185502d42a3246e4b7ff870121c8064791 src/utils/thread.h
      
      vpx_thread: use WaitForSingleObjectEx if available
      
      BUG=b/29583578
      
      original webp change:
      
      commit 0fd0e12bfe83f16ce4f1c038b251ccbc13c62ac2
      Author: James Zern <jzern@google.com>
      Date:   Mon Nov 23 20:40:26 2015 -0800
      
          thread: use WaitForSingleObjectEx if available
      
          Windows XP and up
      
      100644 blob d58f74e5523dbc985fc531cf5f0833f1e9157cf0 src/utils/thread.c
      100644 blob 840831185502d42a3246e4b7ff870121c8064791 src/utils/thread.h
      
      vpx_thread: use CreateThread for windows phone
      
      BUG=b/29583578
      
      original webp change:
      
      commit d2afe974f9d751de144ef09d31255aea13b442c0
      Author: James Zern <jzern@google.com>
      Date:   Mon Nov 23 20:41:26 2015 -0800
      
          thread: use CreateThread for windows phone
      
          _beginthreadex is unavailable for winrt/uwp
      
          Change-Id: Ie7412a568278ac67f0047f1764e2521193d74d4d
      
      100644 blob 93f7622797f05f6acc1126e8296c481d276e4047 src/utils/thread.c
      100644 blob 840831185502d42a3246e4b7ff870121c8064791 src/utils/thread.h
      
      vp9_postproc.c missing extern.
      
      BUG=webm:1256
      
      deblock: missing const on extern const.
      
      postproc - move filling of noise buffer to vpx_dsp.
      
      Fix encoder crashes for odd size input
      
      clean-up vp9_intrapred_test
      
      remove tuple and overkill VP9IntraPredBase class.
      
      postproc: noise style fixes.
      
      gtest-all.cc: quiet an unused variable warning
      
      under windows / mingw builds
      
      vp9_intrapred_test: follow-up cleanup
      
      address a few comments from ce050afaf3e288895c3bee4160336e2d2133b6ea
      
      Change-Id: I3eece7efa9335f4210303993ef6c1857ad5c29c8
  14. 13 Jul, 2016 1 commit
  15. 11 Jul, 2016 1 commit
    • Improve vpx_blend_* functions. · bfa59b4a
      Geza Lore authored
      - Made source buffers pointers to const.
      - Renamed vpx_blend_mask6b to vpx_blend_a64_mask. This is more
        indicative that the function does alpha blending. The 6, or 6b
        suffix was misleading, as the max mask value (64) does not fit into
        6 bits.
      - Added VPX_BLEND_* macros to use when needing to blend scalars.
      - Use VPX_BLEND_A256 in combine_interintra to be more explicit about
        the operation being done.
      - Added versions of vpx_blend_a64_* which take 1D horizontal/vertical
        masks directly and apply them to all rows/columns
        (vpx_blend_a64_hmask and vpx_blend_a64_vmask). The SSE4.1 optimized
        horizontal version now falls back on the 2D version. This can be
        improved upon if it shows up high enough in a profile.
      - All vpx_blend_a64_* functions now support block sizes down to 1x1
        (ie: a single pixel). This is for usage convenience. The SSE4.1
        optimized versions fall back on the C implementation if
        w <= 2 or h <= 2. This can again be improved if it becomes hot code.
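
The blending arithmetic behind these names can be sketched as below. This is a scalar illustration under stated assumptions, not the library source: the `SKETCH_` macros and the `blend_a64*` functions are hypothetical stand-ins, with alpha taken in the inclusive range 0..64 (65 values, which is why it does not fit in 6 bits) and rounding to nearest.

```c
#include <stdint.h>

#define SKETCH_BLEND_A64_ROUND_BITS 6
#define SKETCH_BLEND_A64_MAX_ALPHA (1 << SKETCH_BLEND_A64_ROUND_BITS) /* 64 */

/* Blend two scalars with alpha a in [0, 64]; a == 64 selects v0 entirely. */
static int blend_a64(int a, int v0, int v1) {
  return (a * v0 + (SKETCH_BLEND_A64_MAX_ALPHA - a) * v1 +
          (1 << (SKETCH_BLEND_A64_ROUND_BITS - 1))) >>
         SKETCH_BLEND_A64_ROUND_BITS;
}

/* 2D blend with one mask value per pixel. Sources are pointers to const,
 * and any block size down to 1x1 is accepted. */
static void blend_a64_mask(uint8_t *dst, int dst_stride,
                           const uint8_t *src0, int src0_stride,
                           const uint8_t *src1, int src1_stride,
                           const uint8_t *mask, int mask_stride,
                           int w, int h) {
  for (int i = 0; i < h; ++i) {
    for (int j = 0; j < w; ++j) {
      dst[i * dst_stride + j] =
          (uint8_t)blend_a64(mask[i * mask_stride + j],
                             src0[i * src0_stride + j],
                             src1[i * src1_stride + j]);
    }
  }
}

/* 1D horizontal mask: mask[j] is applied to column j of every row. */
static void blend_a64_hmask(uint8_t *dst, int dst_stride,
                            const uint8_t *src0, int src0_stride,
                            const uint8_t *src1, int src1_stride,
                            const uint8_t *mask, int w, int h) {
  for (int i = 0; i < h; ++i) {
    for (int j = 0; j < w; ++j) {
      dst[i * dst_stride + j] = (uint8_t)blend_a64(
          mask[j], src0[i * src0_stride + j], src1[i * src1_stride + j]);
    }
  }
}
```

A vertical-mask variant would index the mask by row (`mask[i]`) instead of by column.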
      
      Change-Id: I13ab3835146ffafe3e1d74d8e9cf64a5abe4144d
  16. 08 Jul, 2016 1 commit
  17. 06 Jul, 2016 1 commit
  18. 16 Jun, 2016 1 commit
    • Use correct size load in vpx_avg_4x4_sse2. · ffa91733
      Geza Lore authored
      The old version used 64 bit loads, and then ignored the top half
      of the result. This can cause asan failures if we read past the end
      of a buffer. Switched to using 32 bit loads instead.
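
A scalar model of the fix, assuming the 4x4 average is the rounded mean of 16 pixels: each row is fetched with exactly one 4-byte load, so nothing beyond the block's last pixel is read. The function name is hypothetical and the code is a sketch, not the SSE2 implementation.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of a 4x4 average that reads exactly 4 bytes
 * (one 32-bit load) per row, so it never touches memory past the block. */
static unsigned int avg_4x4_32bit_loads(const uint8_t *s, int stride) {
  unsigned int sum = 0;
  for (int r = 0; r < 4; ++r) {
    uint32_t row;
    memcpy(&row, s + r * stride, sizeof(row)); /* 32-bit load, not 64-bit */
    /* Summing all four bytes is independent of byte order. */
    sum += (row & 0xff) + ((row >> 8) & 0xff) + ((row >> 16) & 0xff) +
           (row >> 24);
  }
  return (sum + 8) >> 4; /* rounded mean of 16 pixels */
}
```

With a stride of 4 and a 16-byte buffer, an 8-byte load on the last row would read 4 bytes past the end, which is the kind of access asan reports.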
      
      Change-Id: I57da127a26f869fb4b4f700b55408f6dc2fbbc1a
  19. 03 Jun, 2016 2 commits
  20. 26 May, 2016 1 commit
    • Upgrade vpx_lpf_{vertical,horizontal}_4 mmx to sse2 · 4b5e462d
      Linfeng Zhang authored
      Followed the code style of other lpf functions.
      These 2 functions put 2 rows of data in a single xmm register,
      so they have similar but not identical filter operations,
      and cannot share the same macros.
      
      Change-Id: I3bab55a5d1a1232926ac8fd1f03251acc38302bc
  21. 23 May, 2016 1 commit
    • Add optimized vpx_blend_mask6 · a661bc87
      Geza Lore authored
      This is to replace vp10/common/reconinter.c:build_masked_compound.
      Functionality is equivalent, but the interface is slightly more
      generic.
      
      Total encoder speedup with ext-inter: ~7.5%
      
      Change-Id: Iee18b83ae324ffc9c7f7dc16d4b2b06adb4d4305
  22. 16 May, 2016 1 commit
    • neon hadamard 8x8 · 9b54e812
      Johann authored
      Runs about 30% faster than the C version.
      
      BUG=webm:1021
      
      Change-Id: I6809d6d84c3077ab619c53298296950e976bdaba
  23. 11 May, 2016 1 commit
    • remove mmx sad functions · d0e687bf
      Linfeng Zhang authored
      There are sse2 equivalents, and sse2 is a reasonable modern baseline.
      
      Change-Id: Ibbe536a5ad1c2cccef6bdcc75c13b3dde35a56ba
  24. 10 May, 2016 1 commit
  25. 02 May, 2016 1 commit
  26. 12 Apr, 2016 1 commit
    • Optimized HBD block subtraction for all block sizes · 0f80b1f7
      Yi Luo authored
      - Interface function takes a local MxN function to call based on the
        block size.
      - Repeated calls (without cache line misses) show an improvement of
        ~63% to ~340%.
      - Overall encoder speed improvement: ~0.9%.
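
The dispatch pattern in the first bullet can be sketched as follows. This is an illustration under stated assumptions rather than the committed code: kernels are specialized by width only, and `subtract_fn`, `pick_subtract` and `highbd_subtract_block_sketch` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*subtract_fn)(int rows, int16_t *diff, ptrdiff_t diff_stride,
                            const uint16_t *src, ptrdiff_t src_stride,
                            const uint16_t *pred, ptrdiff_t pred_stride);

/* Stamps out one kernel per block width so the inner loop bound is a
 * compile-time constant. Assumes samples of at most 12 bits, so each
 * difference fits in int16_t. */
#define DEFINE_SUBTRACT_W(W)                                             \
  static void subtract_w##W(int rows, int16_t *diff,                     \
                            ptrdiff_t diff_stride, const uint16_t *src,  \
                            ptrdiff_t src_stride, const uint16_t *pred,  \
                            ptrdiff_t pred_stride) {                     \
    for (int r = 0; r < rows; ++r) {                                     \
      for (int c = 0; c < W; ++c) diff[c] = (int16_t)(src[c] - pred[c]); \
      diff += diff_stride;                                               \
      src += src_stride;                                                 \
      pred += pred_stride;                                               \
    }                                                                    \
  }

DEFINE_SUBTRACT_W(4)
DEFINE_SUBTRACT_W(8)
DEFINE_SUBTRACT_W(16)
DEFINE_SUBTRACT_W(32)
DEFINE_SUBTRACT_W(64)

/* Selects the kernel for a block width; NULL if the width is unsupported. */
static subtract_fn pick_subtract(int cols) {
  switch (cols) {
    case 4: return subtract_w4;
    case 8: return subtract_w8;
    case 16: return subtract_w16;
    case 32: return subtract_w32;
    case 64: return subtract_w64;
    default: return NULL;
  }
}

/* Interface function: picks a local kernel by block size, then calls it.
 * Returns 0 on success and -1 for an unsupported width. */
static int highbd_subtract_block_sketch(int rows, int cols, int16_t *diff,
                                        ptrdiff_t diff_stride,
                                        const uint16_t *src,
                                        ptrdiff_t src_stride,
                                        const uint16_t *pred,
                                        ptrdiff_t pred_stride) {
  const subtract_fn fn = pick_subtract(cols);
  if (fn == NULL) return -1;
  fn(rows, diff, diff_stride, src, src_stride, pred, pred_stride);
  return 0;
}
```

In a SIMD version each kernel would be a hand-written routine for its block size; the selection step stays the same.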
      
      Change-Id: Ieff8f3d192415c61d6d58d8b99bb2a722004823f
  27. 04 Apr, 2016 1 commit
  28. 08 Mar, 2016 1 commit
  29. 05 Mar, 2016 1 commit
  30. 02 Mar, 2016 1 commit
  31. 18 Feb, 2016 1 commit
  32. 17 Feb, 2016 1 commit
  33. 15 Feb, 2016 1 commit
    • Add optimized vpx_sum_squares_2d_i16 for vp10. · abd00505
      Geza Lore authored
      Using this, we can eliminate a large number of calls to predict intra,
      and it is also faster than most of the variance functions it replaces.
      This is an equivalence transform so coding performance is unaffected.
      
      Encoder speedup is approx 7% when var_tx, super_tx and ext_tx are all
      enabled.
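
As a reference for what such a function computes, here is a plain C sketch of a 2D sum of squares over a square block of 16-bit values. The name `sum_squares_2d_i16` and its parameter list are assumptions made for illustration; the optimized routine in the commit is not reproduced here.

```c
#include <stdint.h>

/* Hypothetical reference: sum of squared int16_t values over a
 * size x size block, where stride is the distance between rows. */
static uint64_t sum_squares_2d_i16(const int16_t *src, int stride, int size) {
  uint64_t ss = 0;
  for (int r = 0; r < size; ++r) {
    for (int c = 0; c < size; ++c) {
      const int32_t v = src[r * stride + c];
      ss += (uint64_t)(v * v); /* v * v is at most 2^30, fits in int32_t */
    }
  }
  return ss;
}
```

The 64-bit accumulator keeps the total from overflowing even for large blocks of full-range values.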
      
      Change-Id: I0d4c83afc4a97a1826f3abd864bd68e41bb504fb
  34. 14 Dec, 2015 1 commit
  35. 20 Oct, 2015 1 commit
    • Optimize vpx_quantize_{b,b_32x32} assembler. · 9cfba09a
      Geza Lore authored
      Added optimization of the 8 bit assembly quantizer routines. This makes
      these functions up to 100% faster, depending on encoding parameters.
      
      This patch makes the encoder faster in both the high bitdepth and 8bit
      configurations. In the high bitdepth configuration, it affects profile 0
      only.
      
      Based on my profiling using 1080p input the net gain is between 1-3% for
      the 8 bit config, and around 2.5-4.5% for the high bitdepth config,
      depending on target bitrate. The difference between the 8 bit and high
      bitdepth configurations for the same encoder run is reduced by 1% in all
      cases I have profiled.
      
      Change-Id: I86714a6b7364da20cd468cd784247009663a5140
  36. 30 Sep, 2015 1 commit
  37. 04 Sep, 2015 1 commit
    • VPX: subpixel_8t_ssse3 asm using x86inc · 19588302
      Scott LaVarnway authored
      This is based on the original patch optimized for 32bit
      platforms by Tamar/Ilya and now uses the x86inc style asm.
      The assembly was also modified to support 64bit platforms.
      
      Change-Id: Ice12f249bbbc162a7427e3d23fbf0cbe4135aff2