- 25 Nov, 2015 1 commit
-
-
James Zern authored
~60-65% faster at the function level across block sizes Change-Id: Iaf8cbe95731c43fdcbf68256e44284ba51a93893
-
- 20 Nov, 2015 2 commits
-
-
James Zern authored
accumulate satd in 32-bits + add unit test Change-Id: I6748183df3662ddb9d635f9641f9586f2fd38ad5
-
James Zern authored
the final sum may use up to 26 bits + add a unit test + disable the sse2 as the result will rollover; this will be fixed in a future commit Change-Id: I2a49811dfaa06abfd9fa1e1e65ed7cd68e4c97ce
-
- 13 Nov, 2015 1 commit
-
-
paulwilkins authored
This change alters the nature and use of exhaustive motion search. Firstly any exhaustive search is preceded by a normal step search. The exhaustive search is only carried out if the distortion resulting from the step search is above a threshold value. Secondly the simple +/- 64 exhaustive search is replaced by a multi stage mesh based search where each stage has a range and step/interval size. Subsequent stages use the best position from the previous stage as the center of the search but use a reduced range and interval size. For example: stage 1: Range +/- 64 interval 4 stage 2: Range +/- 32 interval 2 stage 3: Range +/- 15 interval 1 This process, especially when it follows on from a normal step search, has shown itself to be almost as effective as a full range exhaustive search with step 1 but greatly lowers the computational complexity such that it can be used in some cases for speeds 0-2. This patch also removes a double exhaustive search for sub 8x8 blocks which also contained a bug (the two searches used different distortion metrics). For best quality in my test animation sequence this patch has almost no impact on quality but improves encode speed by more than 5X. Restricted use in good quality speeds 0-2 yields significant quality gains on the animation test of 0.2 - 0.5 db with only a small impact on encode speed. On most clips though the quality gain and speed impact are small. Change-Id: Id22967a840e996e1db273f6ac4ff03f4f52d49aa
-
- 11 Nov, 2015 1 commit
-
-
Geza Lore authored
This function now has an AVX intrinsics version which is about 80% faster compared to the C implementation. This provides a 2-4% total speed-up for encode, depending on encoding parameters. The function utilizes 3 properties of the cost function lookup table, constructed in 'cal_nmvjointsadcost' and 'cal_nmvsadcosts'. For the joint cost: - mvjointsadcost[1] == mvjointsadcost[2] == mvjointsadcost[3] For the component costs: - For all i: mvsadcost[0][i] == mvsadcost[1][i] (equal per component cost) - For all i: mvsadcost[0][i] == mvsadcost[0][-i] (Cost function is even) These must hold, otherwise the AVX version of the function cannot be used. Change-Id: I6c2791d43022822a9e6ab43cd124a773946d0bdc
-
- 06 Nov, 2015 1 commit
-
-
James Zern authored
This reverts commit f1342a7b. This breaks 32-bit builds: runtime error: load of misaligned address 0xf72fdd48 for type 'const __m128i' (vector of 2 'long long' values), which requires 16 byte alignment + _mm_set1_epi64x is incompatible with some versions of visual studio Change-Id: I6f6fc3c11403344cef78d1c432cdc9147e5c1673
-
- 05 Nov, 2015 1 commit
-
-
Geza Lore authored
This function now has an AVX intrinsics version which is about 80% faster compared to the C implementation. This provides a 2-4% total speed-up for encode, depending on encoding parameters. The function utilizes 3 properties of the cost function lookup table, constructed in 'cal_nmvjointsadcost' and 'cal_nmvsadcosts'. For the joint cost: - mvjointsadcost[1] == mvjointsadcost[2] == mvjointsadcost[3] For the component costs: - For all i: mvsadcost[0][i] == mvsadcost[1][i] (equal per component cost) - For all i: mvsadcost[0][i] == mvsadcost[0][-i] (Cost function is even) These must hold, otherwise the AVX version of the function cannot be used. Change-Id: I184055b864c5a2dc37b2d8c5c9012eb801e9daf6
-
- 21 Oct, 2015 1 commit
-
-
Geza Lore authored
A new version of vp9_highbd_error_8bit is now available which is optimized with AVX assembly. AVX itself does not buy us too much, but the non-destructive 3 operand format encoding of the 128bit SSEn integer instructions helps to eliminate move instructions. The Sandy Bridge micro-architecture cannot eliminate move instructions in the processor front end, so AVX will help on these machines. Further 2 optimizations are applied: 1. The common case of computing block error on 4x4 blocks is optimized as a special case. 2. All arithmetic is speculatively done on 32 bits only. At the end of the loop, the code detects if overflow might have happened and if so, the whole computation is re-executed using higher precision arithmetic. This case however is extremely rare in real use, so we can achieve a large net gain here. The optimizations rely on the fact that the coefficients are in the range [-(2^15-1), 2^15-1], and that the quantized coefficients always have the same sign as the input coefficients (in the worst case they are 0). These are the same assumptions that the old SSE2 assembly code for the non high bitdepth configuration relied on. The unit tests have been updated to take this constraint into consideration when generating test input data. Change-Id: I57d9888a74715e7145a5d9987d67891ef68f39b7
-
- 08 Oct, 2015 1 commit
-
-
Geza Lore authored
If high bit depth configuration is enabled, but encoding in profile 0, the code now falls back on optimized SSE2 assembler to compute the block errors, similar to when high bit depth is not enabled. Change-Id: I471d1494e541de61a4008f852dbc0d548856484f
-
- 29 Sep, 2015 1 commit
-
-
Julia Robson authored
When configured with high bitdepth enabled, the 8bit transform stopped using optimised code. This made 8bit content decode slowly. Change-Id: I67d91f9b212921d5320f949fc0a0d3f32f90c0ea
-
- 07 Aug, 2015 1 commit
-
-
Alex Converse authored
Change-Id: I20c7b42631b579fade6cf7ebf6d4c69b2fcb5e5e
-
- 01 Aug, 2015 1 commit
-
-
James Zern authored
~50-60% faster depending on the width Change-Id: I9d007cfa10b9aaa2169c8c009d95522df6123a92
-
- 31 Jul, 2015 2 commits
-
-
Jingning Han authored
This commit moves the module inverse transform functions from vp9 to vpx_dsp folder. The hybrid transform wrapper functions stay in the vp9 folder, since it involves codec-specific data structures. Change-Id: Ib066367c953d3d024c73ba65157bbd70a95c9ef8
-
Zoe Liu authored
It in essence refactors the code for both the interpolation filtering and the convolution. This change includes the moving of all the files as well as the changing of the code from vp9_ prefix to vpx_ prefix accordingly, for underneath architectures: (1) x86; (2) arm/neon; and (3) mips/msa. The work on mips/drsp2 will be done in a separate change list. Change-Id: Ic3ce7fb7f81210db7628b373c73553db68793c46
-
- 28 Jul, 2015 3 commits
-
-
Jingning Han authored
This completes the forward transform functions layout refactoring. Change-Id: I996fb0fb795f41e2040f7b21db985774098aedbd
-
Jingning Han authored
Move the 32x32 2D-DCT implementations from vp9/ to vpx_dsp/. Change-Id: Id3980696f8b69906ff7a59ff9fb2b9013d60047d
-
James Zern authored
~60-70% faster depending on the block size Change-Id: Icdbaa9977a91a63cbcc6ead0cf19d5a2af7f27e1
-
- 27 Jul, 2015 1 commit
-
-
hui su authored
Change-Id: I64edc26cf4aab050c83f2d393df6250628ad43b8
-
- 22 Jul, 2015 1 commit
-
-
Jingning Han authored
This commit factors the 4x4, 8x8, and 16x16 2D-DCT forward transform operations into vpx_dsp folder. Change-Id: I084b117b79c0925edcbcabb93f62b9f4bf8dbe7d
-
- 20 Jul, 2015 1 commit
-
-
Jingning Han authored
The SSE2 version high bit-depth forward hybrid transforms are essentially using the C functions via cross referencing to 1-D functions in vp9_dct.c. This commit unifies the two versions and removes the unnecessary dependency. Change-Id: Ib4d0702a138f8daf7d0bd97c141ee7088f293765
-
- 17 Jul, 2015 1 commit
-
-
Yunqing Wang authored
The following quantization functions were moved: vp9_quantize_b vp9_quantize_b_32x32 vp9_highbd_quantize_b vp9_highbd_quantize_b_32x32 vp9_quantize_dc vp9_quantize_dc_32x32 vp9_highbd_quantize_dc vp9_highbd_quantize_dc_32x32 The purpose of doing that was to allow these functions to be shared by multiple codecs. Change-Id: Id8ab939f283353cdd07bd930d47db3d932a5d87f
-
- 16 Jul, 2015 1 commit
-
-
Jingning Han authored
The various tap loop filter operations are common functions across codec. This commit moves them along with SIMD optimizations to vpx_dsp folder. Change-Id: Ia5fa0b2e5289cdb98467502a549c380b9c60e92c
-
- 15 Jul, 2015 1 commit
-
-
Frank Galligan authored
BUG=https://code.google.com/p/webm/issues/detail?id=1023 Change-Id: I212a1d67b23ce3b5ce08800de369b25b9e375e7d
-
- 14 Jul, 2015 1 commit
-
-
Alex Converse authored
Roughly half as many cycles as plain C. Change-Id: I8c16c29940b76d54ee7e4fb874c328ce90bff5d4
-
- 13 Jul, 2015 1 commit
-
- 08 Jul, 2015 1 commit
-
-
Alex Converse authored
80% fewer cycles than C Change-Id: I841bde1e268ddd33ae2ee75eee94737a400e2cde
-
- 07 Jul, 2015 1 commit
-
-
Johann authored
Change-Id: I66bf6720c396c89aa2d1fd26d5d52bf5d5e3dff1
-
- 06 Jul, 2015 2 commits
-
-
Change-Id: If88401bf8c5d8ee58200278734d7a5058d1585d0
-
Jingning Han authored
Factor out the subtraction operator as common function. Change-Id: I526e703477c6a290e0e3e3c8898f8bb1ca82779b
-
- 02 Jul, 2015 2 commits
-
-
James Zern authored
This reverts commit a42df86c. this change causes MSA/VP9SubpelVarianceTest.Ref and MSA/VP9SubpelVarianceTest.ExtremeRef failures under mips32r5el-msa-linux-gnu and mips64r6el-msa-linux-gnu Change-Id: I40b71a0b774eaeb31f66f795733f95cf360909f7
-
James Zern authored
This reverts commit 61774ad1. this change causes MSA/VP9SubpelAvgVarianceTest.Ref failures under mips32r5el-msa-linux-gnu and mips64r6el-msa-linux-gnu Change-Id: I7fb520c12b2a3b212d5e84b7619a380a48e49bb0
-
- 01 Jul, 2015 4 commits
-
-
Johann authored
Change-Id: I0ed6de72dc0bb99fc9c5b1f6500399b16754ffb3
-
Johann authored
Change-Id: I374fcd8fb45a6893dcdeac6896671be142a99f06
-
Parag Salasakar authored
average improvement ~3x-5x Change-Id: Iefbcafc05daab77b38a4e63b551e427867a501a4
-
Parag Salasakar authored
average improvement ~3x-5x Change-Id: I4cbba2711467b0e205904769ebbb4a1fcbb1a311
-
- 26 Jun, 2015 3 commits
-
-
Parag Salasakar authored
average improvement ~4x-5x Change-Id: Iad9c0a296dbc2ea96d000bd009077999ed58a3c5
-
Parag Salasakar authored
average improvement ~3x-4x Change-Id: Idbe4d13a00d05ff8be6559b116f416e42c3b4097
-
Parag Salasakar authored
average improvement ~3x-4x Change-Id: If0fdcc34b17437a7e3e7fb4caaf1067bc175f291
-
- 23 Jun, 2015 2 commits
-
-
Frank Galligan authored
BUG=https://code.google.com/p/webm/issues/detail?id=1022 Change-Id: I510c3b0a70158fa2e4da554f7c5d7558021a6ddf
-
James Zern authored
~90% faster over 20M pixels Change-Id: I92d80f66e91e0a870a672cfb5dd29bf1a17cb11a
-