- 25 Jul, 2013 1 commit
Dmitry Kovalev authored
Removing unused constants, macros, and function declarations. Using ROUND_POWER_OF_TWO macro, vp9_zero, vp9_copy where possible. Moving #include from *.h to *.c. Merging for loops for motion vectors.

Change-Id: Ic3bf841764a2bb177128bb3a6d7aa8f68229cd13
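For reference, a minimal sketch of the helpers this message refers to, in their usual libvpx form (reproduced from memory as an illustration, not taken from this commit):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Round `value` to the nearest multiple of 2^n, then divide by 2^n. */
#define ROUND_POWER_OF_TWO(value, n) (((value) + (1 << ((n) - 1))) >> (n))

/* Zero or copy a whole statically-sized object. */
#define vp9_zero(dest) memset(&(dest), 0, sizeof(dest))
#define vp9_copy(dest, src)              \
  {                                      \
    assert(sizeof(dest) == sizeof(src)); \
    memcpy(dest, src, sizeof(src));      \
  }

int main(void) {
  int a[4] = { 1, 2, 3, 4 }, b[4];
  vp9_copy(b, a);
  vp9_zero(a);
  /* (35 + 8) >> 4 = 2 (35/16 rounds down); (40 + 8) >> 4 = 3 (40/16 rounds up). */
  printf("%d %d %d %d\n", ROUND_POWER_OF_TWO(35, 4), ROUND_POWER_OF_TWO(40, 4),
         a[0], b[3]);
  return 0;
}
```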

- 22 Jul, 2013 1 commit

Ronald S. Bultje authored
4x4: 163 -> 123 cycles (33% faster)
8x8: 491 -> 399 cycles (23% faster)
16x16: 1889 -> 1763 cycles (7% faster)
32x32: 8311 -> 8180 cycles (1.6% faster)

Overall encoding time of first 50 frames of bus (speed 0) @ 1500kbps goes from 1min4.33 to 1min3.00, i.e. 2.11% faster.

Change-Id: Ib52d1dbb5649b14de769d3e7a74af67440b5284f
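For clarity on how the percentages are derived: "faster" here is the throughput ratio old/new - 1, not the reduction in cycles. E.g. for the 4x4 case, 163 / 123 ≈ 1.33, i.e. 33% faster, even though the cycle count itself drops by only about 25%.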

- 16 Jul, 2013 1 commit

Dmitry Kovalev authored
Removing unused and duplicated constants, moving them from *.h to *.c if possible.

Change-Id: Ief4d6b984a3ca2e9b38504f0d855ed072cf7133f

- 01 Jul, 2013 2 commits

Ronald S. Bultje authored
This should significantly speed up cost_coeffs(). Basically, the patch pads the neighbour arrays by one item to prevent an eob check in get_coef_context(), then populates each col/row scan and left/top edge coefficient with the same neighbour twice - this prevents a single/double context branch in get_coef_context(). Lastly, it populates the neighbour arrays in pixel order (rather than scan order), so we don't have to dereference the scan table to get the correct neighbours.

Total encoding time of first 50 frames of bus (speed 0) at 1500kbps goes from 2min10.1 to 2min5.3, i.e. a 2.6% overall speed increase.

Change-Id: I42bcd2210fd7bec03767ef0e2945a665b851df56
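A minimal sketch of the context lookup being optimized; this matches the usual libvpx shape of get_coef_context(), though the exact code of this commit may differ:

```c
#include <stdint.h>

#define MAX_NEIGHBORS 2

/* Context = rounded average of the two neighbours' token-cache values.
   Because edge positions store the same neighbour twice, the average
   degenerates to that single value without a one-vs-two-neighbour branch,
   and the one-item padding removes the eob bounds check on c. */
static int get_coef_context(const int16_t *neighbors,
                            const uint8_t *token_cache, int c) {
  return (1 + token_cache[neighbors[MAX_NEIGHBORS * c + 0]] +
          token_cache[neighbors[MAX_NEIGHBORS * c + 1]]) >> 1;
}
```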
Ronald S. Bultje authored
Total encoding time for first 50 frames of bus (speed 0) @ 1500kbps goes from 2min34.8 to 2min14.4, i.e. a 10.4% overall speedup. The code is x86-64 only; it needs some minor modifications to be 32-bit compatible, because it uses 15 xmm registers, whereas 32-bit only has 8.

Change-Id: I2df53770c2e850813ffa713e1a91b45b0082b904

- 28 Jun, 2013 1 commit

Ronald S. Bultje authored
Makes cost_coeffs() a lot faster:

4x4: 236 -> 181 cycles
8x8: 888 -> 588 cycles
16x16: 3550 -> 2483 cycles
32x32: 17392 -> 12010 cycles

Total encode time of first 50 frames of bus (speed 0) @ 1500kbps goes from 2min51.6 to 2min43.9, i.e. 4.7% overall speedup.

Change-Id: I16b8d595946393c8dc661599550b3f37f5718896

- 24 Jun, 2013 1 commit

John Koleszar authored
This function is not called from the decoder, so it doesn't need to be in common/.

Change-Id: I6977dd462a25b4ff39c9c7e1b0b5b16aa58ee733

- 21 Jun, 2013 1 commit

John Koleszar authored
This function is never referenced.

Change-Id: I1c42cd355bfa88e17d169f7335a44be682af58cc

- 14 Jun, 2013 1 commit

John Koleszar authored
All elements of this table are equal to 252, so replace it with a single constant VP9_COEF_UPDATE_PROB.

Change-Id: I1e2d1d284326ce6df9899a740c2fc344b3ec81c9
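A sketch of the shape of this change; the table name and size below are hypothetical (only VP9_COEF_UPDATE_PROB is named in the message):

```c
/* Before: a per-context table whose entries were all the same value
   (hypothetical name and size). */
static const unsigned char coef_update_probs[8] = {
  252, 252, 252, 252, 252, 252, 252, 252
};

/* After: every table lookup is replaced by a single constant. */
#define VP9_COEF_UPDATE_PROB 252
```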

- 29 May, 2013 1 commit

Deb Mukherjee authored
This patch changes the coefficient tree to move the EOB to below the ZERO node in order to save on the number of bool decodes. The advantages of moving the EOB one step down, as opposed to two steps down in the other parallel patch, are: 1. the coef modeling based on the One node becomes independent of the tree structure above it, and 2. fewer context/counter increases are needed. The drawback is that the potential savings in bool decodes will be less, but assuming that 0s are much more predominant than 1s, the potential savings is still likely to be substantial.

Results on derf300: -0.237%

Change-Id: Ie784be13dc98291306b338e8228703a4c2ea2242
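A sketch of why the reordering saves bool decodes; read_bit() and the token names are illustrative stand-ins, not the actual libvpx tree-decoder API:

```c
#include <stdio.h>

enum { EOB, ZERO, OTHER };

/* Toy stand-in for the binary (bool) arithmetic decoder. */
static int read_bit(const int *bits, int *pos) { return bits[(*pos)++]; }

/* Old tree: EOB is tested first, so a ZERO token costs two bool decodes. */
static int decode_old(const int *bits, int *pos) {
  if (!read_bit(bits, pos)) return EOB;
  if (!read_bit(bits, pos)) return ZERO;
  return OTHER;
}

/* New tree: ZERO is tested first; since zeros dominate the token stream,
   most symbols now cost a single bool decode (EOB pays one extra). */
static int decode_new(const int *bits, int *pos) {
  if (!read_bit(bits, pos)) return ZERO;
  if (!read_bit(bits, pos)) return EOB;
  return OTHER;
}

int main(void) {
  const int zero_old[] = { 1, 0 }; /* a ZERO token under the old tree */
  const int zero_new[] = { 0 };    /* the same ZERO under the new tree */
  int n_old = 0, n_new = 0;
  decode_old(zero_old, &n_old);
  decode_new(zero_new, &n_new);
  printf("ZERO costs %d decodes (old) vs %d (new)\n", n_old, n_new); /* 2 vs 1 */
  return 0;
}
```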

- 28 May, 2013 1 commit

Deb Mukherjee authored
Uses reduced arrays for probabilities and branch counts in the encoder. No change in bitstream.

Change-Id: Iec605446f44db4cd325eb45fa12a3003a6ee29db

- 24 May, 2013 1 commit

Paul Wilkins authored
Also some unused data structures/references removed.

Change-Id: I295809e887173543e794250cb60ddaf1475ffd24

- 23 May, 2013 1 commit

Paul Wilkins authored
Removal from under the configure flag, plus a bit of renaming.

Change-Id: I2213229dfe852001dfec16b149f47c52ce88f3aa

- 22 May, 2013 1 commit

Deb Mukherjee authored
Reverts to using a 128-entry LUT for the coef models rather than 48, to ease hardware implementation. Also incorporates some cleanups, including removing various hooks to support different lookup tables based on block_type and ref_type.

Change-Id: I54100c120cca07a2ebd3a7776bc4630fa6a153f6

- 21 May, 2013 2 commits

Deb Mukherjee authored
Merges the experiment.

Change-Id: I4eb19af6de6df6aa3a96a2e82f231d47ed9b3ae9
Deb Mukherjee authored
Uses more aggressive interpolation to reduce storage for the model tables by more than half. Only 48 lists of probs are stored (as opposed to 128 before), corresponding to ONE_NODE probabilities of: 1, 3, 7, 11, ..., 115, 119, 127, 135, ..., 247, 255. Besides, only 1 table is used as opposed to 2 before, so the overall memory needed for the tables is just 48 * 8 = 384 bytes.

The table currently used is based on a new Pareto distribution with a heavier tail than a generalized Gaussian, which improves results on derf by about 0.1% over a single-table generalized Gaussian. Results overall on derfraw300: -0.14%.

Change-Id: I19bd03559cbf5894a9f8594b8023dcc3e546f6bd
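A self-contained check of the figures quoted here; this just regenerates the 48-entry probability list from the message, it is not the libvpx table code:

```c
#include <stdio.h>

int main(void) {
  unsigned char probs[48];
  int n = 0;
  probs[n++] = 1;
  for (int p = 3; p <= 119; p += 4) probs[n++] = (unsigned char)p;   /* 3, 7, ..., 119 */
  for (int p = 127; p <= 255; p += 8) probs[n++] = (unsigned char)p; /* 127, ..., 255 */
  /* 1 + 30 + 17 = 48 entries; one list of 8 probs each -> 48 * 8 = 384 bytes. */
  printf("%d entries -> %d bytes\n", n, n * 8);
  return 0;
}
```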

- 20 May, 2013 1 commit

Deb Mukherjee authored
Cleans up the experiment. Actually uses reduced counts for backward updates, and a reduced number of probabilities in the context. No change in bitstream when the experiment is on. Between expt on and off: derfraw300 is down only -0.062% (which is better than when the expts were run previously).

Change-Id: I55285a049a0c22810bdb42914212ab5a4f8521b5

- 13 May, 2013 1 commit

Paul Wilkins authored
Change band calculation back to a simpler model based on the order in which coefficients are coded in scan order, not the absolute coefficient positions. With the scatter-scan experiment enabled, the results appear broadly neutral on derf (-0.028%) but up a little on std-hd (+0.134%). Without the scatter-scan experiment on, the results were up on derf as well.

Change-Id: Ie9ef03ce42a6b24b849a4bebe950d4a5dffa6791
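A minimal sketch of what banding by coding order means: the band is looked up from the coefficient's index in the scan, not from its (row, col) position. The cut-offs below are illustrative, not the exact libvpx tables:

```c
/* Band from position in the scan (coding order); illustrative boundaries. */
static int coef_band_from_scan_index(int c) {
  if (c == 0) return 0; /* first coded (DC) coefficient */
  if (c < 3) return 1;
  if (c < 6) return 2;
  if (c < 10) return 3;
  if (c < 22) return 4;
  return 5;
}
```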

- 07 May, 2013 2 commits

Dmitry Kovalev authored
Change-Id: I81c19a8f19cfb5c7183609656ade833d72feb500
Paul Wilkins authored
Delete code under the CONFIG_CODE_ZEROGROUP flag.

Change-Id: I5fe6c7b42a5da9b73118e33594301da4129f320a

- 29 Apr, 2013 2 commits

Ronald S. Bultje authored
Output changes slightly because of a minor bug in (at least) the sb32x16 block2above tx16x16 tables that previously existed in vp9_blockd.c.

Change-Id: I624af28ac200a8322d64454cf05c79e9502968cc
Deb Mukherjee authored
Turns model-based reverse updates on for coefficients, in an effort to reduce the memory requirement for counters. With this patch the counters needed will be reduced by about 75%, since only 3 counts are needed instead of 12. The impact on performance is:

derf300: -0.252%
stdhd250: -0.046%

However, retraining should alleviate some of the drop in performance.

Change-Id: I6f2b3e13f6d5520aa3400b0b228fb5e8b4a43caa

- 26 Apr, 2013 1 commit

Ronald S. Bultje authored
Change-Id: I087e08e7909a406b71715b8525c104208daa6889

- 22 Apr, 2013 3 commits

Dmitry Kovalev authored
Change-Id: Id4306ef6d65d4a3984aed50b775bdf48d4f6c438
Deb Mukherjee authored
This patch does not seem to give any benefits.

Change-Id: I9d2b4091d6af3dfc0875f24db86c01e2de57f8db
Deb Mukherjee authored
Adds an experiment that codes an end-of-orientation symbol for every eligible zero encountered in scan order. This cleans out various other sub-experiments that were part of the original patch, which will be included later if found useful. Results are slightly positive on all sets (in the 0.1-0.2% range).

Change-Id: I57765c605fefc7fb9d1b57f1b356843602abefaf

- 19 Apr, 2013 1 commit

Dmitry Kovalev authored
Change-Id: Ie4713da125e954c1d30e1d4cbeb38666fce90ccc

- 11 Apr, 2013 3 commits

Deb Mukherjee authored
This patch changes the default with the modecoefprob expt to use mode-based forward updates with one-node pegged modeling. The maximum difference with fully trained tables is now less than 0.1%.

Change-Id: I06b44322e10c6703f93f3c1d48d973b1136a0618
Dmitry Kovalev authored
Change-Id: If69c3d795f87af5cc7bfdfe70ef733c41b4d55c8
John Koleszar authored
Use sb-common version instead.

Change-Id: If2552b5a39fd2e5272f66a41c5667dda85fd3939

- 10 Apr, 2013 1 commit

Ronald S. Bultje authored
Merge sb32x32 and sb64x64 functions; allow for rectangular sizes. Code gives identical encoder results before and after. There are a few macros for rectangular block sizes under the sbsegment experiment; this experiment is not yet functional and should not yet be used.

Change-Id: I71f93b5d2a1596e99a6f01f29c3f0a456694d728

- 26 Mar, 2013 4 commits

Ronald S. Bultje authored
These are mostly just for experimental purposes. I saw small gains (in the 0.1% range) when playing with this on derf.

Change-Id: Ib21eed477bbb46bddcd73b21c5c708a5b46abedc
Ronald S. Bultje authored
Now that the first AC coefficient in both directions uses the same DC as its context, there no longer is a purpose in letting both have their own band. Merging these two bands allows us to split bands for some of the very high-frequency AC bands.

In addition, I'm redoing the banding for the 1D-ADST col/row scans. I don't think the old banding made any sense at all (it merged the last coefficient of the first row/col into the same band as the first two of the second row/col), which was clearly an oversight from the band being applied in scan order (rather than by actual position). Now, coefficients at the same position will be in the same band, regardless of what scan order is used. I think this makes the most sense for the purpose of banding, which is basically "predict energy for this coefficient depending on the energy of context coefficients" (i.e. pt).

After full re-training, together with the previous patch, derf gains about 1.2-1.3%, and hd/stdhd gain about 0.9-1.0%.

Change-Id: I7a0cc12ba724e88b278034113cb4adaaebf87e0c
Ronald S. Bultje authored
Pearson correlation for above or left is significantly higher than for previous-in-scan-order (absolute values depend on the position in the scan, but in general we gain about 0.1-0.2 by using either above or left; using both basically just makes this even better). For eob branch skipping, we continue to use the previous token in scan order.

This helps about 0.9% on derf after re-training on a limited data set. Full re-training and results on larger-resolution clips are pending.

Note that this commit breaks trellis, so we can probably get further gains out of it by fixing trellis at some later point.

Change-Id: Iead68e296fc3a105cca746b5e3da9555d6010cfe
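A minimal sketch of the context change described here, assuming the usual rounded-average form (illustrative, not the exact code of this commit):

```c
#include <stdint.h>

/* Old: context taken from the previous token in scan order (c > 0;
   c == 0 uses the block-edge context instead). */
static int context_from_prev_in_scan(const uint8_t *token_cache,
                                     const int16_t *scan, int c) {
  return token_cache[scan[c - 1]];
}

/* New: context is the rounded average of the above and left tokens, which
   correlate better (per the Pearson measurements above) with the current
   coefficient's energy. */
static int context_from_above_left(uint8_t above_token, uint8_t left_token) {
  return (above_token + left_token + 1) >> 1;
}
```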
Deb Mukherjee authored
Replaces the default tables for single coefficient magnitudes with those obtained from an appropriate distribution. The EOB node is left unchanged. The model is represented as a 256-entry codebook where the index corresponds to the probability of the Zero or the One node. Two variations are implemented, corresponding to whether the Zero node or the One node is used as the peg. The main advantage is that the default prob tables will become considerably smaller and more manageable. Besides, there is substantially less risk of over-fitting for a training set.

Various distributions were tried, and the one that gives the best results is the family of generalized Gaussian distributions with shape parameter 0.75. The results are within about 0.2% of fully trained tables for the Zero-peg variant, and within 0.1% for the One-peg variant.

The forward updates are optionally (controlled by a macro) model-based, i.e. restricted to only convey probabilities from the codebook. Backward updates can also optionally (controlled by another macro) be model-based, but this is turned off by default. Currently model-based forward updates work about the same as unconstrained updates, but there is a drop in performance with backward updates being model-based.

The model-based approach also allows the probabilities for the key frames to be adjusted from the defaults based on the base_qindex of the frame. Currently the adjustment function is a placeholder that adjusts the prob of EOB and the Zero node from the nominal one at the higher-quality (lower qindex) or lower-quality (higher qindex) ends of the range. The rest of the probabilities are then derived based on the model from the adjusted prob of zero.

Change-Id: Iae050f3cbcc6d8b3f204e8dc395ae47b3b2192c9
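A minimal sketch of the codebook idea, assuming the Zero node as the peg; the names and sizes are illustrative, not the tables this commit actually adds:

```c
#include <stdint.h>
#include <string.h>

#define ENTROPY_NODES 11 /* nodes in the coefficient token tree */
#define PEGGED_NODES 2   /* EOB and ZERO are stored explicitly */

/* 256-row codebook: row i holds the remaining node probabilities implied by
   a Zero-node probability of i. Filled offline from the fitted distribution
   (generalized Gaussian, shape 0.75); zeroed here as a stub. */
static uint8_t codebook[256][ENTROPY_NODES - PEGGED_NODES];

/* Expand a compact 2-prob model into the full set of node probabilities. */
static void model_to_full_probs(const uint8_t model[PEGGED_NODES],
                                uint8_t full[ENTROPY_NODES]) {
  full[0] = model[0]; /* EOB node, left unchanged by the model */
  full[1] = model[1]; /* Zero node: the peg indexing the codebook */
  memcpy(full + 2, codebook[model[1]], ENTROPY_NODES - PEGGED_NODES);
}
```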

- 12 Mar, 2013 1 commit

Ronald S. Bultje authored
Change-Id: I07ddf3be8bc5d6c2eb561d4241879777c315b183

- 09 Mar, 2013 1 commit

Deb Mukherjee authored
Adds probability updates for extra bits for the nzcs, code for getting nzc stats, plus some minor cleanups and fixes.

Change-Id: If2814e7f04fb52f5025ad9f400f3e6c50a00b543

- 07 Mar, 2013 1 commit

Deb Mukherjee authored
This patch revamps the entropy coding of coefficients to code first a non-zero count (nzc) per coded block and correspondingly remove the EOB token from the token set.

STATUS:
Main encode/decode code achieving encode/decode sync - done.
Forward and backward probability updates to the nzcs - done.
Rd costing updates for nzcs - done.

Note: the dynamic programming approach used in trellis quantization is not exactly compatible with nzcs. A suboptimal approach has been used instead, where branch costs are updated to account for changes in the nzcs.

TODO: training the default probs/counts for nzcs.

Change-Id: I951bc1e22f47885077a7453a09b0493daa77883d
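A sketch of the bitstream-level difference, with write_nzc()/write_token() as illustrative stand-ins (the real patch touches far more, e.g. probability contexts and RD costing):

```c
extern void write_nzc(int nzc);     /* code the non-zero count */
extern void write_token(int token); /* code one coefficient token */
#define EOB_TOKEN 0

/* Old scheme: the token stream is terminated by an explicit EOB token. */
void code_block_old(const int *tokens, int num_tokens) {
  for (int i = 0; i < num_tokens; i++) write_token(tokens[i]);
  write_token(EOB_TOKEN);
}

/* New scheme: the non-zero count is coded up front, so no EOB token is
   needed - the decoder stops once it has seen nzc non-zero coefficients. */
void code_block_new(const int *tokens, int num_tokens, int nzc) {
  write_nzc(nzc);
  for (int i = 0; i < num_tokens; i++) write_token(tokens[i]);
}
```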

- 05 Mar, 2013 1 commit

Ronald S. Bultje authored
Split macroblock and superblock tokenization and detokenization functions and coefficient-related data structs so that the bitstream layout and related code of superblock coefficients looks less like it's a hack to fit macroblocks in superblocks.

In addition, unify chroma transform size selection with the luma transform size (i.e. always use the same size, as long as it fits the predictor); in practice, this means 32x32 and 64x64 superblocks using the 16x16 luma transform will now use the 16x16 (instead of the 8x8) chroma transform, and 64x64 superblocks using the 32x32 luma transform will now use the 32x32 (instead of the 16x16) chroma transform.

Lastly, add a trellis optimize function for 32x32 transform blocks.

HD gains about 0.3%, STDHD about 0.15% and derf about 0.1%. There are a few negative points here and there that I might want to analyze a little closer.

Change-Id: Ibad7c3ddfe1acfc52771dfc27c03e9783e054430
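A minimal sketch of the chroma selection rule described here, assuming 4:2:0 (chroma blocks are half the luma size in each dimension); names are illustrative:

```c
typedef enum { TX_4X4, TX_8X8, TX_16X16, TX_32X32 } TX_SIZE;

/* Chroma reuses the luma transform size, clamped so that the transform
   still fits inside the half-sized chroma predictor block. */
static TX_SIZE chroma_tx_size(TX_SIZE luma_tx, TX_SIZE largest_fitting_uv_tx) {
  return luma_tx < largest_fitting_uv_tx ? luma_tx : largest_fitting_uv_tx;
}
```

E.g. for a 64x64 superblock with a 32x32 luma transform, the chroma plane block is 32x32, so largest_fitting_uv_tx is TX_32X32 and chroma now also uses the 32x32 transform (instead of 16x16).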

- 23 Feb, 2013 1 commit

Ronald S. Bultje authored
Change-Id: I5416455f8f129ca0f450d00e48358d2012605072