- 10 Mar, 2013 1 commit
-
-
John Koleszar authored
The previous implementation visited each node in the tree multiple times because it used each symbol's encoding to revisit the branches taken and increment its count. Instead, we can traverse the tree depth first and calculate the probabilities and branch counts as we walk back up. The complexity goes from somewhere between O(nlogn) and O(n^2) (depending on how balanced the tree is) to O(n). Only tested one clip (256kbps, CIF), saw 13% decoding perf improvement. Note that this optimization should port trivially to VP8 as well. In VP8, the decoder doesn't use this function, but it does routinely show up on the profile for realtime encoding. Change-Id: I4f2848e4f41dc9a7694f73f3e75034bce08d1b12
-
- 05 Mar, 2013 1 commit
-
-
Ronald S. Bultje authored
Split macroblock and superblock tokenization and detokenization functions and coefficient-related data structs so that the bitstream layout and related code of superblock coefficients looks less like it's a hack to fit macroblocks in superblocks. In addition, unify chroma transform size selection from luma transform size (i.e. always use the same size, as long as it fits the predictor); in practice, this means 32x32 and 64x64 superblocks using the 16x16 luma transform will now use the 16x16 (instead of the 8x8) chroma transform, and 64x64 superblocks using the 32x32 luma transform will now use the 32x32 (instead of the 16x16) chroma transform. Lastly, add a trellis optimize function for 32x32 transform blocks. HD gains about 0.3%, STDHD about 0.15% and derf about 0.1%. There's a few negative points here and there that I might want to analyze a little closer. Change-Id: Ibad7c3ddfe1acfc52771dfc27c03e9783e054430
-
- 04 Mar, 2013 2 commits
-
-
Yunqing Wang authored
Wrote a SSE2 vp9_short_idct4x4llm to improve the decoder performance. Change-Id: I90b9d48c4bf37aaf47995bffe7e584e6d4a2c000
-
Jingning Han authored
Fixed a couple of variable/function definitions, as well as header handling to support 16K sequence coding at high bit-rates. The width and height are each specified by two bytes in the header. Use an extra byte to explicitly indicate the scaling factors in both directions, each ranging from 0 to 15. Tested coding up to 16400x16400 dimension. Change-Id: Ibc2225c6036620270f2c0cf5172d1760aaec10ec
-
- 02 Mar, 2013 2 commits
-
-
John Koleszar authored
Update the function prototypes to match between VP9 and VP8. Change-Id: If58965073989e87df3b62b67a030ec6ce23ca04f
-
Dmitry Kovalev authored
Change-Id: Iab0176f058045181821ded95ff1cf423af1625f9
-
- 01 Mar, 2013 1 commit
-
-
Yunqing Wang authored
Simplified idct32x32 calculation when there are only 10 or less non-zero coefficients in 32x32 block. This helps the decoder performance. Change-Id: If7f8893d27b64a9892b4b2621a37fdf4ac0c2a6d
-
- 28 Feb, 2013 5 commits
-
-
Yunqing Wang authored
Provided a wrapper and removed duplicate code. Change-Id: Iaef842226ec348422e459202793b001d0983ea30
-
Scott LaVarnway authored
Change-Id: Ie89bd00d58e30bf4094cb748a282f1dfa81a31d8
-
John Koleszar authored
The ABOVESPREFMV experiment uses four pixels to the left of the current block, which don't exist for the left-most column. Change-Id: I4cf0b42ae8f54c0b3e7b1ed8755704b74fafc39c
-
Jim Bankoski authored
sse4_1 code used uint16_t for returning sad, but that won't work for 32x32 or 64x64. This code fixes the assembly for those and also reenables sse4_1 on linux Change-Id: I5ce7288d581db870a148e5f7c5092826f59edd81
-
Christian Duvivier authored
Scalar path is about 1.4x faster (4% overall encoder speedup). SSE2 path is about 7x faster (13% overall encoder speedup). Change-Id: I7e85d8225a914a74c61ea370210414696560094d
-
- 27 Feb, 2013 10 commits
-
-
Dmitry Kovalev authored
Fixing code style, using array lookup instead of switch statements for forward hybrid transforms (in the same way as for their inverses). Consistent usage of ROUND_POWER_OF_TWO macro in appropriate places. Change-Id: I0d3822ae11f928905fdbfbe4158f91d97c71015f
-
Yunqing Wang authored
Removed vp9_idctllm_mmx.asm Change-Id: I7152756f23a5a09ed69e8fb40edb2ab3237290fe
-
Ronald S. Bultje authored
Consistent with VP8. Change-Id: I8c316ee49f072e15abbb033a80e9c36617891f07
-
John Koleszar authored
This function was part of an optimization used in VP8 that required caching two macroblocks. This is unused in VP9, and might not survive refactoring to support superblocks, so removing it for now. Change-Id: I744e585206ccc1ef9a402665c33863fc9fb46f0d
-
Jan Kratochvil authored
s/movd/movq/ Change-Id: Id1a56de91551f8dc796f14f1056c565dfc1ba626
-
John Koleszar authored
This avoids duplicating all the filters twice. Includes fixups to the convolve routines and associated tests to make this work. Change-Id: I922f86021594e55072ddb63b42b2313605db6e00
-
John Koleszar authored
This patch extends the previous support for using references of a different resolution in ZEROMV mode to all inter prediction modes. Subpixel based best-mv scoring is disabled when the reference frame differs in resolution from the current frame. Change-Id: Id4dc3e5e6692de98d9857fd56bfad3ac57e944ac
-
John Koleszar authored
This commit updates the 4x4 prediction to consistently use the build_2x1_inter_predictor() method. That function is updated to calculate the scale offset, rather than relying on the caller to calculate it. In the case that the 2x1 prediction can not be used, the scale offset is recalculated for each 1x1 block. The idea here is that the offsets are calculated before each call to vp9_build_scaled_inter_predictor(). Change-Id: I0ac3343dd54e2846efa3c4195fcd328b709ca04d
-
John Koleszar authored
This patch allows coding frames using references of different resolution, in ZEROMV mode. For compound prediction, either reference may be scaled. To test, I use the resize_test and enable WRITE_RECON_BUFFER in vp9_onyxd_if.c. It's also useful to apply this patch to test/i420_video_source.h: --- a/test/i420_video_source.h +++ b/test/i420_video_source.h @@ -93,6 +93,7 @@ class I420VideoSource : public VideoSource { virtual void FillFrame() { // Read a frame from input_file. + if (frame_ != 3) if (fread(img_->img_data, raw_sz_, 1, input_file_) == 0) { limit_ = frame_; } This forces the frame that the resolution changes on to be coded with no motion, only scaling, and improves the quality of the result. Change-Id: I1ee75d19a437ff801192f767fd02a36bcbd1d496
-
Yunqing Wang authored
Wrote SSE2 version of vp9_dc_only_idct_add_c function. In order to improve performance, clipped the absolute diff values to [0, 255]. This allowed us to keep the additions/subtractions in 8 bits. Test showed an over 2% decoder performance increase. Change-Id: Ie1a236d23d207e4ffcd1fc9f3d77462a9c7fe09d
-
- 26 Feb, 2013 4 commits
-
-
Dmitry Kovalev authored
Change-Id: I893fa36297b9bd9cff93d082f1736f6860b15c0d
-
John Koleszar authored
Ensure that all inter prediction goes through a common code path that takes scaling into account. Removes a bunch of duplicate 1st/2nd predictor code. Also introduces a 16x8 mode for 8x8 MVs, similar to the 8x4 trick we were doing before. This has an unexpected effect with EIGHTTAP_SMOOTH, so it's disabled in that case for now. Change-Id: Ia053e823a8bc616a988a0af30452e1e75a739cba
-
Yaowu Xu authored
The commit improves the 32x32 forward dct implementation: 1. change to use same constants and rounding as other forward dcts 2. select rounding to specifically minimize the roundtrip error, which improved average 19/block to .77/block using 100000 random input. Test showed a small but consistent gain on all test sets, about .15% Change-Id: If0afd6a71880a522f60c1c234be0462092c2eb53
-
Dmitry Kovalev authored
Pitch now means the number of elements, not the number of bytes. Change-Id: Idb9f2f012e39b09d596a3cc1802305a80b7c13af
-
- 25 Feb, 2013 3 commits
-
-
Dmitry Kovalev authored
Removing switch statements for inverse hybrid transforms. Making code style consistent for all similar transform implementations. Renaming shortpitch and short_pitch variables to half_pitch. Change-Id: I875f7a82aae4e8063a58777bf1cc3f1e67b48582
-
Dmitry Kovalev authored
Removing redundant parentheses, better code formatting, introducing ROUND_POWER_OF_TWO macro to replace repeated expression. Change-Id: I91aad7a53ed03482428b2419de4bb99fd92c6771
-
Jingning Han authored
Rebased. Remove the old matrix multiplication transform computation. The 16x16 ADST/DCT can be switched on/off and evaluated by setting ACTIVE_HT16 300/0 in vp9/common/vp9_blockd.h. Change-Id: Icab2dbd18538987e1dc4e88c45abfc4cfc6e133f
-
- 23 Feb, 2013 3 commits
-
-
Ronald S. Bultje authored
Change-Id: I5416455f8f129ca0f450d00e48358d2012605072
-
Paul Wilkins authored
This patch alters the balance of context between the coefficient bands (reflecting the position of coefficients within a transform blocks) and the energy of the previous token (or tokens) within a block. In this case the number of coefficient bands is reduced but more previous token energy bands are supported. Some initial rebalancing of the default tables has been by running multiple derf clips at multiple data rates using the ENTOPY_STATS macro. Further balancing needs to be done using larger image formatsd especially in regard to the bigger transform sizes which are not as well represented in encodings of smaller image formats. Change-Id: If9736e95c391e711b04aef6393d26f60f36e1f8a
-
James Zern authored
variance_vtable clashed with vp8/common/variance.h Change-Id: I09c1de44d5519f1bd13f58c01144c0de4706de6f
-
- 22 Feb, 2013 2 commits
-
-
Dmitry Kovalev authored
Removing redundant 'extern' keywords and parentheses, fixing indentation, making variable names lower case, using short expressions x *= c instead of x = x * c, minor code simplifications. Change-Id: If6a25fcf306d1db26e90d27e3c24a32735c607de
-
Jingning Han authored
This patch includes 4x4, 8x8, and 16x16 forward butterfly ADST/DCT hybrid transform. The kernel of 4x4 ADST is sin((2k+1)*(n+1)/(2N+1)). The kernel of 8x8/16x16 ADST is of the form sin((2k+1)*(2n+1)/4N). Change-Id: I8f1ab3843ce32eb287ab766f92e0611e1c5cb4c1
-
- 21 Feb, 2013 2 commits
-
-
Ronald S. Bultje authored
The information is a duplicate of "eob" in BLOCKD. Change-Id: Ia6416273bd004611da801e4bfa6e2d328d6f02a3
-
Deb Mukherjee authored
Refactors the switchable filter search in the rd loop to improve encode speed. Uses a piecewise approximation to a closed form expression to estimate rd cost for a Laplacian source with a given variance and quantization step-size. About 40% encode time reduction is achieved. Results (on a feb 12 baseline) show a slight drop: derf: -0.019% yt: +0.010% std-hd: -0.162% hd: -0.050% Change-Id: Ie861badf5bba1e3b1052e29a0ef1b7e256edbcd0
-
- 20 Feb, 2013 3 commits
-
-
Dmitry Kovalev authored
Change-Id: I7c6e3bebd94856b24dbe2aded7f9e04ef8bb8c08
-
Yaowu Xu authored
Change-Id: I7b7b8d4fda3a23699e0c920d727f8c15d37d43aa
-
Tero Rintaluoma authored
- Using multiplication and shifting instead of division in intra prediction. - Maximum absolute difference is 1 for division statements in d45, d27, d63 prediction modes. However, errors can cumulate for large block sizes when using already predicted values. - Maximum number of non-matching result values in loops using division are: 4x4 0/16 8x8 0/64 16x16 10/256 32x32 13/1024 64x64 122/4096 Overall PSNR derf: 0.005 yt: -0.022 std-hd: 0.021 hd: -0.006 Change-Id: I3979a02eb6351636442c1af1e23d6c4e6ec1d01d
-
- 19 Feb, 2013 1 commit
-
-
Jingning Han authored
rebased. This patch includes 16x16 butterfly inverse ADST/DCT hybrid transform. It uses the variant ADST of kernel sin((2k+1)*(2n+1)/4N), which allows a butterfly implementation. The coding gains as compared to DCT 16x16 are about 0.1% for both derf and std-hd. It is noteworthy that for std-hd sets many sequences gains about 0.5%, some 0.2%. There are also few points that provides -1% to -3% performance. Hence the average goes to about 0.1%. Change-Id: Ie80ac84cf403390f6e5d282caa58723739e5ec17
-