- 14 Dec, 2015 1 commit
-
-
James Zern authored
Change-Id: I7bc991abea383db1f86c1bb0f2e849837b54d90f
-
- 11 Nov, 2015 1 commit
-
-
Geza Lore authored
This function now has an AVX intrinsics version which is about 80% faster compared to the C implementation. This provides a 2-4% total speed-up for encode, depending on encoding parameters.

The function utilizes 3 properties of the cost function lookup table, constructed in 'cal_nmvjointsadcost' and 'cal_nmvsadcosts'.

For the joint cost:
  - mvjointsadcost[1] == mvjointsadcost[2] == mvjointsadcost[3]

For the component costs:
  - For all i: mvsadcost[0][i] == mvsadcost[1][i] (equal per component cost)
  - For all i: mvsadcost[0][i] == mvsadcost[0][-i] (Cost function is even)

These must hold, otherwise the AVX version of the function cannot be used.

Change-Id: I6c2791d43022822a9e6ab43cd124a773946d0bdc
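The three table properties named in the commit message above can be expressed as a small check. This is only a sketch under stated assumptions: the helper name `mv_sad_tables_are_symmetric`, its parameters, and the idea of passing pointers to the centre of each component table are hypothetical and are not taken from the libvpx source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not part of libvpx. Returns true when the SAD cost
 * tables satisfy the three properties listed in the commit message.
 * comp0 and comp1 are assumed to point at the centre (index 0) of each
 * per-component table, so indices -max_mv..max_mv are valid. */
bool mv_sad_tables_are_symmetric(const int joint_cost[4], const int *comp0,
                                 const int *comp1, int max_mv) {
  assert(max_mv >= 0);

  /* Property 1: joint costs 1, 2 and 3 are all equal. */
  if (joint_cost[1] != joint_cost[2] || joint_cost[2] != joint_cost[3])
    return false;

  for (int i = -max_mv; i <= max_mv; ++i) {
    /* Property 2: both components share the same cost table. */
    if (comp0[i] != comp1[i]) return false;
    /* Property 3: the cost is an even function of the component value. */
    if (comp0[i] != comp0[-i]) return false;
  }
  return true;
}
```

A caller could run such a check once after building the tables and fall back to the C implementation when it fails.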
-
- 06 Nov, 2015 1 commit
-
-
James Zern authored
This reverts commit f1342a7b.

This breaks 32-bit builds:
runtime error: load of misaligned address 0xf72fdd48 for type 'const __m128i' (vector of 2 'long long' values), which requires 16 byte alignment

+ _mm_set1_epi64x is incompatible with some versions of Visual Studio

Change-Id: I6f6fc3c11403344cef78d1c432cdc9147e5c1673
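The two problems named in the revert message have common workarounds in SSE2 code, sketched below. The helper names `load_128_unaligned` and `broadcast_epi64` are hypothetical, and nothing here is claimed to be what the re-landed commit actually does. The sketch assumes an x86 target with SSE2 available.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: loads 16 bytes from a pointer that carries no
 * alignment guarantee. _mm_loadu_si128 does not require 16 byte alignment,
 * unlike dereferencing a 'const __m128i *' directly. */
static __m128i load_128_unaligned(const void *p) {
  return _mm_loadu_si128((const __m128i *)p);
}

/* Hypothetical helper: broadcasts a 64-bit value to both lanes using only
 * _mm_set_epi32, avoiding _mm_set1_epi64x. */
static __m128i broadcast_epi64(int64_t v) {
  const int lo = (int)(uint32_t)((uint64_t)v & 0xffffffffu);
  const int hi = (int)(uint32_t)((uint64_t)v >> 32);
  /* Arguments are ordered from the highest 32-bit element to the lowest. */
  return _mm_set_epi32(hi, lo, hi, lo);
}
```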
-
- 05 Nov, 2015 1 commit
-
-
Geza Lore authored
This function now has an AVX intrinsics version which is about 80% faster compared to the C implementation. This provides a 2-4% total speed-up for encode, depending on encoding parameters.

The function utilizes 3 properties of the cost function lookup table, constructed in 'cal_nmvjointsadcost' and 'cal_nmvsadcosts'.

For the joint cost:
  - mvjointsadcost[1] == mvjointsadcost[2] == mvjointsadcost[3]

For the component costs:
  - For all i: mvsadcost[0][i] == mvsadcost[1][i] (equal per component cost)
  - For all i: mvsadcost[0][i] == mvsadcost[0][-i] (Cost function is even)

These must hold, otherwise the AVX version of the function cannot be used.

Change-Id: I184055b864c5a2dc37b2d8c5c9012eb801e9daf6
-
- 02 Nov, 2015 1 commit
-
-
Marco authored
Source noise level estimate is also useful for setting variance encoder parameters (variance thresholds, qp-delta, mode selection, etc), so allow it to be used also if denoising is not on. Change-Id: I4fe23d47607b4e17a35287057f489c29114beed1
-
- 21 Oct, 2015 1 commit
-
-
Geza Lore authored
A new version of vp9_highbd_error_8bit is now available which is optimized with AVX assembly. AVX itself does not buy us too much, but the non-destructive 3 operand format encoding of the 128bit SSEn integer instructions helps to eliminate move instructions. The Sandy Bridge micro-architecture cannot eliminate move instructions in the processor front end, so AVX will help on these machines.

Further 2 optimizations are applied:

1. The common case of computing block error on 4x4 blocks is optimized as a special case.
2. All arithmetic is speculatively done on 32 bits only. At the end of the loop, the code detects if overflow might have happened and if so, the whole computation is re-executed using higher precision arithmetic. This case however is extremely rare in real use, so we can achieve a large net gain here.

The optimizations rely on the fact that the coefficients are in the range [-(2^15-1), 2^15-1], and that the quantized coefficients always have the same sign as the input coefficients (in the worst case they are 0). These are the same assumptions that the old SSE2 assembly code for the non high bitdepth configuration relied on. The unit tests have been updated to take this constraint into consideration when generating test input data.

Change-Id: I57d9888a74715e7145a5d9987d67891ef68f39b7
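The speculative 32-bit strategy described in the commit message above can be illustrated in scalar C. This is a sketch only: the function names, the `int16_t` input type, and the per-step carry check are assumptions made for illustration. The AVX assembly detects possible overflow at the end of the loop, and its exact method is not described in the message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical precise path: 64-bit accumulation of squared differences. */
static uint64_t block_error_wide(const int16_t *coeff, const int16_t *dqcoeff,
                                 size_t n) {
  uint64_t err = 0;
  for (size_t i = 0; i < n; ++i) {
    const int64_t d = (int64_t)coeff[i] - dqcoeff[i];
    err += (uint64_t)(d * d);
  }
  return err;
}

/* Hypothetical fast path: accumulate in 32 bits, remember whether the
 * accumulator ever wrapped, and redo the work in 64 bits only if it did. */
uint64_t block_error_speculative(const int16_t *coeff, const int16_t *dqcoeff,
                                 size_t n) {
  uint32_t err = 0;
  int overflowed = 0;
  for (size_t i = 0; i < n; ++i) {
    const int32_t d = (int32_t)coeff[i] - dqcoeff[i];
    /* Unsigned multiply: well defined, and exact because |d| < 2^16. */
    const uint32_t sq = (uint32_t)d * (uint32_t)d;
    err += sq;
    /* An unsigned sum smaller than its addend means the add wrapped. */
    overflowed |= (err < sq);
  }
  return overflowed ? block_error_wide(coeff, dqcoeff, n) : err;
}
```

The fallback costs a second pass over the block, which is acceptable only because, as the message notes, overflow is extremely rare in real use.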
-
- 08 Oct, 2015 1 commit
-
-
Geza Lore authored
If high bit depth configuration is enabled, but encoding in profile 0, the code now falls back on optimized SSE2 assembler to compute the block errors, similar to when high bit depth is not enabled. Change-Id: I471d1494e541de61a4008f852dbc0d548856484f
-
- 07 Aug, 2015 1 commit
-
-
Alex Converse authored
Change-Id: I20c7b42631b579fade6cf7ebf6d4c69b2fcb5e5e
-
- 28 Jul, 2015 4 commits
-
-
Jingning Han authored
The forward 32x32 2D-DCT functions are aligned in vpx_dsp folder. The vp9_dct.h file is not effectively used now. Change-Id: Ie7946b6fdd784b8e91496242337bc9002c75c281
-
Jingning Han authored
This completes the forward transform functions layout refactoring. Change-Id: I996fb0fb795f41e2040f7b21db985774098aedbd
-
Jingning Han authored
Move the 32x32 2D-DCT implementations from vp9/ to vpx_dsp/. Change-Id: Id3980696f8b69906ff7a59ff9fb2b9013d60047d
-
James Zern authored
~60-70% faster depending on the block size Change-Id: Icdbaa9977a91a63cbcc6ead0cf19d5a2af7f27e1
-
- 27 Jul, 2015 1 commit
-
-
Jingning Han authored
Change-Id: Iba03852ce778c956200818e3473cfb2b48cf8d8e
-
- 22 Jul, 2015 1 commit
-
-
Jingning Han authored
This commit factors the 4x4, 8x8, and 16x16 2D-DCT forward transform operations into vpx_dsp folder. Change-Id: I084b117b79c0925edcbcabb93f62b9f4bf8dbe7d
-
- 20 Jul, 2015 1 commit
-
-
Yaowu Xu authored
Change-Id: Id27e0007a0feac821ca66bcecbf3a723305da82d
-
- 17 Jul, 2015 1 commit
-
-
Yunqing Wang authored
The following quantization functions were moved:
  vp9_quantize_b
  vp9_quantize_b_32x32
  vp9_highbd_quantize_b
  vp9_highbd_quantize_b_32x32
  vp9_quantize_dc
  vp9_quantize_dc_32x32
  vp9_highbd_quantize_dc
  vp9_highbd_quantize_dc_32x32

The purpose of doing that was to allow these functions to be shared by multiple codecs.

Change-Id: Id8ab939f283353cdd07bd930d47db3d932a5d87f
-
- 07 Jul, 2015 1 commit
-
-
Johann authored
Change-Id: I66bf6720c396c89aa2d1fd26d5d52bf5d5e3dff1
-
- 06 Jul, 2015 2 commits
-
-
Change-Id: If88401bf8c5d8ee58200278734d7a5058d1585d0
-
Jingning Han authored
Factor out the subtraction operator as common function. Change-Id: I526e703477c6a290e0e3e3c8898f8bb1ca82779b
-
- 02 Jul, 2015 1 commit
-
-
James Zern authored
This reverts commit a42df86c.

this change causes MSA/VP9SubpelVarianceTest.Ref and MSA/VP9SubpelVarianceTest.ExtremeRef failures under mips32r5el-msa-linux-gnu and mips64r6el-msa-linux-gnu

Change-Id: I40b71a0b774eaeb31f66f795733f95cf360909f7
-
- 01 Jul, 2015 2 commits
-
-
Johann authored
Change-Id: I374fcd8fb45a6893dcdeac6896671be142a99f06
-
Parag Salasakar authored
average improvement ~3x-5x Change-Id: I4cbba2711467b0e205904769ebbb4a1fcbb1a311
-
- 26 Jun, 2015 3 commits
-
-
Parag Salasakar authored
average improvement ~4x-5x Change-Id: Iad9c0a296dbc2ea96d000bd009077999ed58a3c5
-
Parag Salasakar authored
average improvement ~3x-4x Change-Id: Idbe4d13a00d05ff8be6559b116f416e42c3b4097
-
Parag Salasakar authored
average improvement ~3x-4x Change-Id: If0fdcc34b17437a7e3e7fb4caaf1067bc175f291
-
- 23 Jun, 2015 1 commit
-
-
Parag Salasakar authored
average improvement ~2x-3x Change-Id: I76f7fc00c0ffdf2b4ba41bf3819f3b6044bcdeff
-
- 22 Jun, 2015 1 commit
-
-
Parag Salasakar authored
average improvement ~2x-3x Change-Id: Idf8be780b8b4228fc91f110a94e4ee1fd9af0163
-
- 20 Jun, 2015 1 commit
-
-
Parag Salasakar authored
average improvement ~4x-5x Change-Id: I37582efc2622bc20b2bf99617a76110ab24e9f6a
-
- 17 Jun, 2015 1 commit
-
-
Parag Salasakar authored
average improvement ~4x-6x Change-Id: Ibcac3ef8ed5e207cf8c121e696570e6b63d3c0f4
-
- 16 Jun, 2015 1 commit
-
-
Parag Salasakar authored
average improvement ~4x-6x Change-Id: Id3b2243e5b3c7844c90c4231a5e75fa69911362c
-
- 26 May, 2015 1 commit
-
-
Johann authored
subpel functions will be moved in another patch. Change-Id: Idb2e049bad0b9b32ac42cc7731cd6903de2826ce
-
- 16 May, 2015 1 commit
-
-
James Zern authored
this file shouldn't be built directly, it is included in vp9_dct_sse2.c to create a non-high-bitdepth and a high-bitdepth version

silences missing prototype warnings for the unused FDCT* functions

Change-Id: Ide6ff8c24ab31bdb0f833260505ae33660a1ad5b
-
- 15 May, 2015 2 commits
-
-
James Zern authored
this file shouldn't be built directly, it is included in vp9_dct_sse2.c to create a non-high-bitdepth and a high-bitdepth version

silences missing prototype warnings for the unused FDCT32x32* functions

Change-Id: I0e38f16dae5ea1728de184ee2c89287d48675c51
-
James Zern authored
this file shouldn't be built directly, it is included in vp9_dct_avx2.c to create a non-high-bitdepth and a high-bitdepth version

silences missing prototype warnings for the unused FDCT32x32* functions

Change-Id: I4c19935c0e035b393be513bde735e9a78064a494
-
- 06 May, 2015 1 commit
-
-
Johann authored
Create a new component, vpx_dsp, for code that can be shared between codecs. Move the SAD code into the component. This reduces the size of vpxenc/dec by 36k on x86_64 builds. Change-Id: I73f837ddaecac6b350bf757af0cfe19c4ab9327a
-
- 17 Apr, 2015 3 commits
-
-
Jim Bankoski authored
Change-Id: Iedceeb020492050063acf3fd2326f96c29db9ae5
-
Jim Bankoski authored
PSNR HVS is a human visual system weighted version of SNR that's gained some popularity from academia and apparently better matches MOS testing. This code is borrowed from the Daala Project but uses our FDCT code. Change-Id: Idd10fbc93129f7f4734946f6009f87d0f44cd2d7
-
Jim Bankoski authored
This code appeared in the Daala project first and was originally committed by Nathan Egge. Change-Id: Iadce416a091929c51b46637ebdec984cddcaf18c
-
- 01 Apr, 2015 1 commit
-
-
James Zern authored
exclude files that only contain functions for non-high-bitdepth builds. this removes some warnings related to missing prototypes Change-Id: Ic6642998c46a7b808c6c53b2f9c34bcd4d037abe
-
- 12 Feb, 2015 1 commit
-
-
Marco authored
Simple skin detection, from vp8; works reasonably well on most of the RTC clips, but can miss at times. Added debug flag to write out skin map over source input. Change-Id: I2caea7592f1c459047aac46627eeb24a94946464
-