- 06 Oct, 2013 1 commit
-
-
Jim Bankoski authored
Change-Id: Id5f25b74e2207bf44b6f6c8ffe548fa30fd78b4d
-
- 02 Oct, 2013 1 commit
-
-
Dmitry Kovalev authored
Moving functions from vp9_idct_blk to vp9_idct because these functions are used from both encoder and decoder. Removing duplicated code from vp9_encodemb.c and reusing existing functions. Change-Id: Ia0a6782f8c4c409efb891651b871dd4bf22d5fe8
-
- 27 Sep, 2013 2 commits
-
-
Dmitry Kovalev authored
Making name consistent with vp9_short_idct8x8 and vp9_short_idct8x8_1. Change-Id: I99e0be040ec893f9571dcf090e18f98dc58339f5
-
Dmitry Kovalev authored
Change-Id: I6be72c8b048d1ccc7ef43764cf84c32360098970
-
- 26 Sep, 2013 1 commit
-
-
Dmitry Kovalev authored
Making function name consistent with vp9_short_idct16x16 and vp9_short_idct16x16_1. Change-Id: I70e54be9e6b9a1dddab0de470686591e96d05517
-
- 25 Sep, 2013 1 commit
-
-
Dmitry Kovalev authored
Change-Id: Ife0dd29fb4ad65c7e12ac5f1db8cea4ed81de488
-
- 23 Sep, 2013 1 commit
-
-
Jingning Han authored
This commit enables forcing all coefficients zero per transformed block, when its rate-distortion cost is lower than regular coeff quantization. The overall performance improvement (including its parent patch on calculating rd cost per transformed block) at speed 1: derf: 0.298% yt: 0.452% hd: 0.741% stdhd: 0.006% Change-Id: I66005fe0fd7af192c3eba32e02fd6d77952accb5
-
- 19 Sep, 2013 1 commit
-
-
Dmitry Kovalev authored
Extracting get_scan_and_band function from get_entropy_context to remove duplicated code. Change-Id: I5da1f5a60263017e887da68bc834317b5f084cb2
-
- 11 Sep, 2013 1 commit
-
-
Scott LaVarnway authored
mode_info_context was stored as a grid of MODE_INFO structs. The grid now constists of pointers to MODE_INFO structs. The MODE_INFO structs are now stored as a stream (decoder only), eliminating unnecessary copies and is a little more cache friendly. Change-Id: I031d376284c6eb98a38ad5595b797f048a6cfc0d
-
- 09 Sep, 2013 1 commit
-
-
James Zern authored
This reverts commit dae17734 Encode crashes, leaks and increases integer overflow errors. Change-Id: I595aa2649bb8d0b6552ff91652837a74c103fda2
-
- 07 Sep, 2013 1 commit
-
-
Jingning Han authored
The 16x16 transform unit test suggested that the peak coefficient value can reach 32639. This could cause potential overflow issue in the SSSE3 implmentation of 16x16 block quantization. This commit fixes this issue by replacing addition with saturated addition. Change-Id: I6d5bb7c5faad4a927be53292324bd2728690717e
-
- 06 Sep, 2013 1 commit
-
-
Scott LaVarnway authored
mode_info_context was stored as a grid of MODE_INFO structs. The grid now constists of a pointer to a MODE_INFO struct and a "in the image" flag. The MODE_INFO structs are now stored as a stream, eliminating unnecessary copies and is a little more cache friendly. For the test clips used, the decoder performance improved by ~4.3% (1080p) and ~9.7% (720p). Patch Set 2: Re-encoded clips with latest. Now ~1.7% (1080p) and 5.9% (720p). Change-Id: I846f29e88610fce2523ca697a9a9ef2a182e9256
-
- 28 Aug, 2013 1 commit
-
-
Dmitry Kovalev authored
Change-Id: I752e374867d459960995b24d197301d65ad535e3
-
- 27 Aug, 2013 1 commit
-
-
Dmitry Kovalev authored
Change-Id: I62bb07c377f947cb72fac68add7a6b199e42c6b9
-
- 21 Aug, 2013 1 commit
-
-
Dmitry Kovalev authored
Change-Id: Ib2c975e1d96deefb7ac4d6b600c8c5388035d111
-
- 19 Aug, 2013 2 commits
-
-
Dmitry Kovalev authored
Updating all foreach_transformed_block_visitor functions to work with plane block size instead of general block. Removing a lot of duplicated code. Change-Id: I6a9069e27528c611f5a648e1da0c5a5fd17f1bb4
-
Dmitry Kovalev authored
This change set is intermediate. The next one will remove all repetitive plane_bsize calculations, because it will be passed as argument to foreach_transformed_block_visitor. Change-Id: Ifc12e0b330e017c6851a28746b3a5460b9bf7f0b
-
- 16 Aug, 2013 2 commits
-
-
Dmitry Kovalev authored
Redundant dst, pre[2] from build_inter_predictors_args, unused cm from encode_b_args. Change-Id: I2c476cd328c5c0cca4c78ba451ca6ba2a2c37e2d
-
Dmitry Kovalev authored
Updating foreach_transformed_block_visitor and corresponding functions to accept tx_size instead of ss_txfrm_size. List of functions per file: vp9_decodframe.c decode_block decode_block_intra vp9_detokenize.c decode_block vp9_encodemb.c optimize_block vp9_xform_quant vp9_encode_block_intra vp9_rdopt.c dist_block rate_block block_yrd_txfm vp9_tokenize.c set_entropy_context_b tokenize_b is_skippable Change-Id: I351bf563eb36cf34db71c3f06b9bbc9a61b55b73
-
- 15 Aug, 2013 1 commit
-
-
Dmitry Kovalev authored
Updated function signatures: txfrm_block_to_raster_block txfrm_block_to_raster_xy extend_for_intra vp9_optimize_b Change-Id: I7213f4c4b1b9ec802f90621d5ba61d5e4dac5e0a
-
- 14 Aug, 2013 1 commit
-
-
Dmitry Kovalev authored
Making foreach_transformed_block_in_plane more clear (it's not finished yet). Using explicit tx_size variable consistently instead of (ss_txfrm_size / 2) or (ss_txfrm_size >> 1) expression. Change-Id: I1b9bba2c0a9f817fca72c88324bbe6004766fb7d
-
- 09 Aug, 2013 1 commit
-
-
Dmitry Kovalev authored
Change-Id: I7f23d174eb089e5500f268a10db09648634c1b82
-
- 08 Aug, 2013 1 commit
-
-
Dmitry Kovalev authored
Change-Id: I040b82b8e32aee272d10cbb021c7ba1c76343d7a
-
- 07 Aug, 2013 1 commit
-
-
Jingning Han authored
The low precision 32x32 fdct has all the intermediate steps within 16-bit depth, hence allowing faster SSE2 implementation, at the expense of larger round-trip error. It was used in the rate-distortion optimization search loop only. Using the low precision version, in replace of the high precision one, affects the compression performance by about 0.7% (derf, stdhd) at speed 0. For speed 1, it makes derf set down by only 0.017%. Change-Id: I4e7d18fac5bea5317b91c8e7dabae143bc6b5c8b
-
- 05 Aug, 2013 1 commit
-
-
Dmitry Kovalev authored
Change-Id: I428c4d42212b757112e3acfe5b81314cfbb5fd6b
-
- 02 Aug, 2013 1 commit
-
-
Dmitry Kovalev authored
Using it instead of long unclear verbose check "mbmi->ref_frame[0] != INTRA_FRAME". Change-Id: I9c7b4b3797942fa962bf3ba7460fff3084beabe9
-
- 01 Aug, 2013 1 commit
-
-
Dmitry Kovalev authored
Change-Id: I27471768980fc631916069f24bc7c482a5c9ca17
-
- 29 Jul, 2013 1 commit
-
-
Jingning Han authored
This commit provides special handle on 16x16 inverse 2D-DCT, where only DC coefficient is quantized to be non-zero value. Change-Id: I7bf71be7fa13384fab453dc8742b5b50e77a277c
-
- 27 Jul, 2013 2 commits
-
-
Ronald S. Bultje authored
This allows us to increment the position at the band-level only as we go from one band to the next; more importantly, that allows us to use an add instead of multiply instruction, and omit the instruction altogether if the band doesn't change from one coef to the next, thus being slightly faster (probably more noticeable on systems where a multiply is expensive, like arm). Change-Id: I4343fe35b9f9a47fa00b217bdcbf5f91ff96c381
-
Jingning Han authored
This commit brought back the shortcut implementation of 8x8/16x16 inverse 2D-DCT. When the eob <= 10, it skips the inverse transform operations on row 4:7/4:15 in the first round. For bus_cif at 1000 kbps, this provides about 2% speed-up at speed 0. Change-Id: I453e2d72956467d75be4ad8c04b4482ab889d572
-
- 26 Jul, 2013 1 commit
-
-
Jingning Han authored
This commit enables a special handle for the 8x8 inverse 2D-DCT, where only DC coefficient is quantized to be non-zero. For bus_cif at 2000 kbps, it provides about 1% speed-up at speed 0. Change-Id: I2523222359eec26b144cf8fd4c63a4ad63b1b011
-
- 25 Jul, 2013 1 commit
-
-
Jingning Han authored
This commit makes the initialization of trellis coeff optimization a per-plane operation, thereby eliminating the redundant steps in encode_sby and encode_sbuv. It makes the encoder at speed 0 slightly faster. Change-Id: Iffe9faca6a109dafc0dd69dc7273cbdec19b17cd
-
- 24 Jul, 2013 1 commit
-
-
Dmitry Kovalev authored
Adding plane type check condition because it was always used outside of get_tx_type_{4x4, 8x8, 16x16}. Change-Id: I02f0bbfee8063474865bd903eb25b54d26e07230
-
- 23 Jul, 2013 4 commits
-
-
Jingning Han authored
The struct optimize_block_args is defined same as encode_b_args. Remove this redundant definition, and use encode_b_args consistently. Change-Id: I1703aeeb3bacf92e98a34f4355202712110173d9
-
Jingning Han authored
The xform_quant() module is only used by inter modes, hence removing the redundant switches therein conditioned on tx_type. Change-Id: Ib87ce5b2f2e4cbf3ceb133a1108afa173c933a3f
-
Jingning Han authored
When all the transform coefficients were quantized to zero, skip the inverse transform operation. For bus_cif at 1000 kbps, the runtime goes from 154967ms -> 149842ms, i.e., about 3% speed-up, at speed 0. Change-Id: Ic0a813fff5e28972d4888ee42d8747846a6c3cc6
-
Jim Bankoski authored
many structures use bw and bh and they have different meanings. This cl attempts to start this clean up and remove unneccessary 2 step look up log and then shift operations... also removed partition type multiple operation code in bitstream.c. Change-Id: I7e03e552bdfc0939738e430862e3073d30fdd5db
-
- 16 Jul, 2013 1 commit
-
-
Ronald S. Bultje authored
Cycle times: 4x4: 151 to 131 cycles (15% faster) 8x8: 334 to 306 cycles (9% faster) 16x16: 1401 to 1368 cycles (2.5% faster) 32x32: 7403 to 7367 cycles (0.5% faster) Total encode time of first 50 frames of bus @ 1500kbps (speed 0) goes from 1min39.2 to 1min38.6, i.e. a 0.67% overall speedup. Change-Id: I799a49460e5e3fcab01725564dd49c629bfe935f
-
- 15 Jul, 2013 2 commits
-
-
Ronald S. Bultje authored
Also inline some of the block calculations to assist the compiler to not do silly things like calculating the same offset (or converting between raster/transform block offset or block, mi and pixel unit) many, many, many times. Cycle times: 4x4: 584 -> 505 cycles (16% faster) 8x8: 1651 -> 1560 cycles (6% faster) 16x16: 7897 -> 7704 cycles (2.5% faster) 32x32: 16096 -> 15852 cycles (1.5% faster) Overall, this saves about 0.5 seconds (1min49.8 -> 1min49.3) on the first 50 frames of bus (speed 0) @ 1500kbps, i.e. 0.5% overall. Change-Id: If3dd62453f8e2ab9d4ee616bc4ea956fb8874b80
-
Jingning Han authored
Skip the inverse transform and reconstruction of inter-mode coded blocks in the rate-distortion optimization loop, when skip_encode_sb feature is turned on. This provides about 1% speed-up at speed 0, and 1.5% speed-up at speed 1. No performance change in both settings. Change-Id: I2932718bf4d007163702b61b16b6ff100cf9d007
-