- 12 Jun, 2014 1 commit
-
-
Jingning Han authored
This commit enables a fast path computational flow for forward transformation. It checks the sse and variance of prediction residuals and decides if the quantized coefficients are all zero, dc only, or more. It then selects the corresponding coding path in the forward transformation and quantization stage. It is currently enabled in rtc coding mode. Will do it for rd coding mode next. In speed -6, the runtime for pedestrian_area 1080p at 1000 kbps goes down from 14234 ms to 13704 ms, i.e., about 4% speed-up. Overall coding performance for rtc set is changed by -0.18%. Change-Id: I0452da1786d59bc8bcbe0a35fdae9f623d1d44e1
-
- 10 Jun, 2014 5 commits
-
-
James Zern authored
s/"\$avx2_x86inc"/"avx2"/ avx2 code is all intrinsics and as a result doesn't rely on x86inc.asm Change-Id: I76ad39474d8a00658f3e43131830ef0f4f34772a
-
James Zern authored
tests failing under Win32/Win64 + variance_test: add missing avx2 functions (partially disabled) Change-Id: I6abc0657ea076379ab9ca65c12678b9ea199849d
-
James Zern authored
tests failing under Win32/Win64 + sad_test: add missing avx2 functions (disabled) Change-Id: I8224fba2b270f6039ab1877d71e1e512f0081856
-
James Zern authored
tests failing under Win32/Win64 + dct16x16_test: add missing avx2 functions (partially disabled) exercises the forward transforms no idct/iht implementations, so the c-code is used Change-Id: I04f64a457fa0828a00f32b5c9fe4f55294f21f61
-
James Zern authored
tests failing under Win32/Win64 Change-Id: I5d49d11911bcda3a832b14efe5500d22597bedcf
-
- 02 Jun, 2014 1 commit
-
-
Deb Mukherjee authored
As a side-effect, the sad unit tests for VP8 and VP9 had to be separated. Fixes a bug in original patch: (https://gerrit.chromium.org/gerrit/#/c/70163/8) that was reverted due to a nightly test failure. Change-Id: Ia2a4e9e278fd3c89d6c3c82fcc6381320ec2a8a6
-
- 01 Jun, 2014 1 commit
-
-
Frank Galligan authored
This reverts commit 91655042 Change-Id: I500822b03f09c64ff6ec5396c68edee9ca3b75cb
-
- 29 May, 2014 1 commit
-
-
Dmitry Kovalev authored
Change-Id: I87b7c657d8813d7fb383ab519d150c0ffb1dd377
-
- 28 May, 2014 1 commit
-
-
Jingning Han authored
This commit enables SSSE3 implementation of the inverse 2D-DCT with only first 10 coefficients non-zero. It reduces the runtime of SSE2 version from 745 cycles to 538 cycles, i.e., 27% speed-up. Change-Id: I18ba4128859b09c704a6ee361d69a86c09fe8dfe
-
- 27 May, 2014 1 commit
-
-
Yunqing Wang authored
This reverts commit e8bbb3d9. Change-Id: Ie368d36fd249d323d859d208609c711f04537bbc
-
- 23 May, 2014 2 commits
-
-
Jingning Han authored
This commit enables the SSSE3 implementation of full inverse 16x16 2D-DCT. The unit runtime goes down from 1642 cycles to 1519 cycles, about 7% speed-up. Change-Id: I14d2fdf9da1fb4ed1e5db7ce24f77a1bfc8ea90d
-
Deb Mukherjee authored
As a side-effect, the sad unit tests for VP8 and VP9 had to be separated. Change-Id: I068cc2391eed51e9b140ea6aba78338c5fec8d71
-
- 20 May, 2014 1 commit
-
-
Deb Mukherjee authored
This is needed for profiles 1 and 2. Change-Id: I5dd7644c2932d055ab89e050d4be7d4117cd1028
-
- 15 May, 2014 1 commit
-
-
Jim Bankoski authored
This reverts commit 7ab9a958 Nightly test http://build.webmproject.org/jenkins/view/libvpx-nightly-tests/job/libvpx%20unit%20tests%20(valgrind-2)/arch=x86_64-linux-gcc,filter=-*VP8*:*Large.*/276/console Failed This patch did not address all the assembly issues some of the vp8 assembly counts on 5 arguments being passed in to this function: one example : vp8_sad8x16_wmt Please address or split this into vp9 and vp8 patches. Change-Id: I78afcc171649894f887bb8ee3c66de24aaddc7ca
-
- 14 May, 2014 2 commits
-
-
levytamar82 authored
vp9_block_error_sse2 can only handle 16 bytes at a time but the function requires to handle a sequence of 32 bytes at a time so each 16 bytes is handled in a different register. With AVX2 optimization the 32 bytes can be handled in one register instead of two in the SSE2 The vp9_block_error was optimized by 85%. The user level was optimized by 1.2% Change-Id: Ia8fffe60e61eff7432a5fbd538757894f6c319fd
-
Deb Mukherjee authored
As a side-effect, the max_sad check is removed from the C-implementation of VP8, for consistency with VP9, and to ensure that the SAD tests common to VP8/VP9 pass. That will make the VP8 C implementation of sad a little slower but given that is rarely used in practice, the impact will be minimal. Change-Id: I7f43089fdea047fbf1862e40c21e4715c30f07ca
-
- 12 May, 2014 1 commit
-
-
Johann authored
Allow selectively building just the intrinsics for armv8 Change-Id: I2f29b2e4508b8b8e5649c2906b3159ad1d4ec477
-
- 08 May, 2014 3 commits
-
-
Alex Converse authored
Change-Id: Ib0a73d4863478f9b8a00976379d25d2f6ebbb197
-
Jingning Han authored
The scanning order has the first 12 coefficients of the 8x8 2D-DCT sitting in the top left 4x4 block. Hence the partial inverse 8x8 2D-DCT allows to handle cases with eob below 12. The overall runtime of the inverse 8x8 2D-DCT unit is reduced from 166 cycles (using SSE2) to 150 cycles (using SSSE3). Change-Id: I4514f9748042809ac84df4c14382c00f313f1cd2
-
Jingning Han authored
This commit enables ssse3 assembly implementation of the 8x8 inverse 2D-DCT with only first 10 coefficients non-zero. The average runtime for this unit goes down from 198 cycles to 129 cycles (34.8% faster). Change-Id: Ie7fa4386f6d3a2fe0d47a2eb26fc2a6bbc592ac7
-
- 07 May, 2014 1 commit
-
-
Paul Wilkins authored
Includes changes that are not compatible with VS windows builds. Amongst other things stdint.h is not supported in VS. This reverts commit 89fbf3de. Change-Id: Ifa86d7df250578d1ada9b539c9ff12ed0c523cdd
-
- 06 May, 2014 1 commit
-
-
Dmitry Kovalev authored
Change-Id: Ifb7937c977308c682986f0ce9645a0807d2aa46a
-
- 05 May, 2014 2 commits
-
-
Alex Converse authored
7% faster encoding a desktop lossless at RT speed 4. Change-Id: I41627f5b737752616b6512bb91a36ec45995bf64
-
Jingning Han authored
This commit enables SSSE3 version full inverse 8x8 2D-DCT and reconstruction. It makes the runtime of vp9_idct8x8_64_add down from 256 cycles (SSE2) to 246 cycles. Change-Id: I0600feac894d6a443a3c9d18daf34156d4e225c3
-
- 29 Apr, 2014 2 commits
-
-
Jingning Han authored
Assembly implementation of ssse3 8x8 forward 2D-DCT. The current version is turned on only for x86_64. The average unit runtime goes from 157 cycles down to 136 cycles, i.e., about 12.8% faster. This translates into about 1.5% speed-up for pedestrian_area 1080p at speed 2. Change-Id: I0f12435857e9425ed7ce12541344dfa16837f4f4
-
Dmitry Kovalev authored
Change-Id: I2ad333553e673dbabcdc0f0366aea311e90849bf
-
- 25 Apr, 2014 1 commit
-
-
Dmitry Kovalev authored
Change-Id: I99695564a3aa9bc8c79ac0a551d257e2ff3ad3c3
-
- 24 Apr, 2014 1 commit
-
-
Dmitry Kovalev authored
Change-Id: I8d906da3bd6de0d3042676846f61a8b2a3444508
-
- 11 Apr, 2014 1 commit
-
-
Dmitry Kovalev authored
Change-Id: Id81a76d18be6b2de69f81bb563d74c3bb356d434
-
- 09 Apr, 2014 1 commit
-
-
Yunqing Wang authored
Calculate the difference variance between last source frame and current source frame. The variance is calculated at 16x16 block level. The variances are compared to several thresholds to decide final partition sizes. An adaptive strategy is implemented to decide using SOURCE_VAR_BASED_PARTITION or FIXED_PARTITION based on motions in the video. The switching test is done once every search_type_check_frequency frames. The selection of source_var_thresh needs to be investigated further later. RTC set Borg test showed 0.424% overall psnr gain, and 0.357% ssim gain. For clips with large enough static area, the encoding speedup is around 2% to 15%. Change-Id: Id7d268f1d8cbca7fb8026aa4a53b3c77459dc156
-
- 21 Mar, 2014 1 commit
-
-
levytamar82 authored
2 functions were optimized for avx2 by using full 256 bit register In order to handle 32 elements in parallel instead of only 16 in parallel: 1. vp9_sad32x32x4d 2. vp9_sad64x64x4d The function level gain is 66% and the user level gain is ~1%. Change-Id: I4efbb3bc7d8bc03b64b6c98f5cd5c4a9dd3212cb
-
- 03 Mar, 2014 1 commit
-
-
James Zern authored
significantly speeds up file generation. the goal of this change is to convert rtcd.sh to perl as directly as possible to allow for simple comparison. future changes can make it more perl-like. --- Linux [CREATE] vpx_scale_rtcd.h real 0m0.485s -> 0m0.022s [CREATE] vp8_rtcd.h real 0m4.619s -> 0m0.060s [CREATE] vp9_rtcd.h real 0m10.102s -> 0m0.087s Windows [CREATE] vpx_scale_rtcd.h real 0m8.360s -> 0m0.080s [CREATE] vp8_rtcd.h real 1m8.083s -> 0m0.160s [CREATE] vp9_rtcd.h real 2m6.489s -> 0m0.233s Change-Id: Idfb71188206c91237d6a3c3a81dfe00d103f11ee
-