- 10 Oct, 2017 2 commits
-
-
Rupert Swarbrick authored
For large blocks this is about 8x the speed of the C version. The code needs SSE 4.1 for the PMULLD instruction that we use to do SIMD 32-bit multiplies. The patch uses av1_convolve_scale_test (written already to test the low bit depth path) to make sure the optimised code matches the C version. Change-Id: I9304d6bb3d2cb31390de93ed08ff1a852e3ace86
-
Rupert Swarbrick authored
For large blocks this is almost 8x the speed of the C version. The code needs SSE 4.1 for the PMULLD instruction that we use to do SIMD 32-bit multiplies. This patch also makes av1_convolve_scale_test actually test something, making sure the optimised code matches the C version. The slightly excessive generality in the test (all the templating) is because of a following patch, which is for the high bit depth path and can then use most of the same test code. Change-Id: I6732bc6b2378ffaadae5aa6441100cf660f7ee11
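The idea of checking an SSE 4.1 path against a C reference can be sketched as follows. This is a minimal illustration, not the code from the patch: the function names `mul32_c`, `mul32_simd` and `mul32_outputs_match` are hypothetical, and the sketch only shows a packed 32-bit multiply via `_mm_mullo_epi32` (the intrinsic that compiles to PMULLD), not the convolution itself. When the compiler is not targeting SSE 4.1, the SIMD function falls back to the scalar loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE4_1__)
#include <smmintrin.h> /* SSE 4.1: _mm_mullo_epi32 compiles to PMULLD */
#endif

/* Plain C reference: element-wise multiply, keeping the low 32 bits. */
static void mul32_c(const int32_t *a, const int32_t *b, int32_t *out,
                    size_t n) {
  for (size_t i = 0; i < n; ++i) {
    out[i] = (int32_t)((uint32_t)a[i] * (uint32_t)b[i]);
  }
}

/* Hypothetical optimised counterpart: four 32-bit multiplies per step. */
static void mul32_simd(const int32_t *a, const int32_t *b, int32_t *out,
                       size_t n) {
  size_t i = 0;
#if defined(__SSE4_1__)
  for (; i + 4 <= n; i += 4) {
    const __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
    const __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
    _mm_storeu_si128((__m128i *)(out + i), _mm_mullo_epi32(va, vb));
  }
#endif
  /* Scalar tail (or the whole array when SSE 4.1 is unavailable). */
  mul32_c(a + i, b + i, out + i, n - i);
}

/* Returns 1 if the optimised output matches the C output exactly. */
static int mul32_outputs_match(const int32_t *a, const int32_t *b, size_t n) {
  int32_t ref[64];
  int32_t opt[64];
  assert(n <= 64);
  mul32_c(a, b, ref, n);
  mul32_simd(a, b, opt, n);
  for (size_t i = 0; i < n; ++i) {
    if (ref[i] != opt[i]) return 0;
  }
  return 1;
}
```

A test in this style would call `mul32_outputs_match` over many input arrays and fail on the first mismatch.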
-
- 09 Oct, 2017 11 commits
-
-
Angie Chiang authored
Since the 32x32 transform uses DCT only, we can avoid updating the other transform types. Change-Id: I51dd8ec71975187d249d7e25130e994a48cac5c1
-
Sarah Parker authored
0.15% improvement on lowres set Change-Id: If16a8e07797c64508f9e2d9b26ae874ac53c57a4
-
Rupert Swarbrick authored
There's a bitstream conformance requirement that says that any block must subsample to a valid block size with the current subsampling mode. For example, this means that BLOCK_4X8 is illegal if there is subsampling in only the horizontal direction (since there is no BLOCK_2X8). This patch checks that the bitstream is conformant as it reads partition information in decodeframe.c. BUG=aomedia:875 Change-Id: I18139aa76d6f965282402edbb0b68959478a46c3
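The conformance rule above can be sketched as a lookup against a set of legal block dimensions. This is a simplified model under stated assumptions: the list `kLegalSizes` is illustrative and is not libaom's block size table, and the names `is_legal_size` and `subsamples_to_valid_size` are hypothetical. The real decoder's size table and its handling of the smallest blocks may differ.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
  int w;
  int h;
} BlockDim;

/* Illustrative subset of legal block sizes (assumption, not libaom's table). */
static const BlockDim kLegalSizes[] = {
  { 4, 4 }, { 4, 8 }, { 8, 4 }, { 8, 8 }, { 8, 16 }, { 16, 8 }, { 16, 16 }
};

static int is_legal_size(int w, int h) {
  const size_t count = sizeof(kLegalSizes) / sizeof(kLegalSizes[0]);
  for (size_t i = 0; i < count; ++i) {
    if (kLegalSizes[i].w == w && kLegalSizes[i].h == h) return 1;
  }
  return 0;
}

/* ss_x / ss_y are 1 when chroma is subsampled in that direction, else 0.
 * Returns 1 if the subsampled dimensions are still a legal block size. */
static int subsamples_to_valid_size(int w, int h, int ss_x, int ss_y) {
  assert((ss_x == 0 || ss_x == 1) && (ss_y == 0 || ss_y == 1));
  return is_legal_size(w >> ss_x, h >> ss_y);
}
```

With horizontal-only subsampling, `subsamples_to_valid_size(4, 8, 1, 0)` looks up 2x8, which is not in the list, so the 4x8 block is rejected, matching the BLOCK_4X8 example in the commit message.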
-
Urvang Joshi authored
Introduced by: https://aomedia-review.googlesource.com/c/aom/+/25181 Change-Id: I1f25178d6b273fbeade4c33f153b5f2bac4a8b99
-
Rupert Swarbrick authored
This unit test doesn't actually provide any test coverage and merely exists to benchmark the C function, av1_convolve_2d_scale_c. The following patch will add an SSE version of that function and extend this test to check that the SSE code matches the C code. Change-Id: Ic942ad8f9fd57d2659fc60f92c5a0b6c9a9f8cac
-
Debargha Mukherjee authored
Change-Id: I73e9d2d327b062828a75bc99fb348441dd32174a
-
Debargha Mukherjee authored
Change-Id: Iaff923f34100ecdce76d2319fab67cde59d485ae
-
Cheng Chen authored
Change-Id: I23344af711d9a31b819fca35ae3ad3b7edf4852e
-
Rupert Swarbrick authored
This helper returns true if a block signals tx_size in the stream, and the patch uses it in the bitstream writing code and the decoder. Note that we can't quite use it in pack_inter_mode_mvs when CONFIG_VAR_TX && !CONFIG_RECT_TX, but I've switched the code to using it the rest of the time, since rect-tx is adopted and eventually the other code path should be deleted. Also use the helper function in tx_size_cost in rdopt.c, where the test was wrong and caused underestimates of block costs. (Specifically, the code that subtracts tx_size_cost from this_rate_tokenonly in rd_pick_intra_sby_mode ended up subtracting zero for a 4x8 block.) The behaviour of the decoder should be unchanged. The only change in the encoder's behaviour should be in tx_size_cost, where it should now match the rest of the code. Change-Id: I97236c9ce444993afe01ac5c6f4a0bb9e5049217
-
Zoe Liu authored
This coding tool introduces a new prediction mode for bi-predictive frames that have a forward reference within 2 frames (distance denoted as 'fwd_delta') and a backward reference within (3 - fwd_delta) frames. If this prediction mode, named 'ext_skip', is set, the block is coded using compound prediction with the most recent forward and backward reference frames as its reference pair, NEARESTMV as its motion mode, and the skip flag set for the residue. Change-Id: I826034ccf1a956f4b350f0bc2e2dca8ea71b5197
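The distance condition described above can be sketched as a small predicate. This is one reading of the commit message rather than the encoder's actual code: the function name `ext_skip_refs_in_range` is hypothetical, frame positions are modelled as plain non-negative display offsets, and the lower bound of 1 on each distance is an assumption.

```c
#include <assert.h>

/* Returns 1 if the forward reference is at most 2 frames behind the current
 * frame (fwd_delta) and the backward reference is at most (3 - fwd_delta)
 * frames ahead. Offsets are display-order positions (assumption). */
static int ext_skip_refs_in_range(int cur_offset, int fwd_ref_offset,
                                  int bwd_ref_offset) {
  assert(cur_offset >= 0 && fwd_ref_offset >= 0 && bwd_ref_offset >= 0);
  const int fwd_delta = cur_offset - fwd_ref_offset;
  const int bwd_delta = bwd_ref_offset - cur_offset;
  /* Assumed lower bound: each reference is at least one frame away. */
  if (fwd_delta < 1 || fwd_delta > 2) return 0;
  return bwd_delta >= 1 && bwd_delta <= 3 - fwd_delta;
}
```

For a current frame at offset 5, the pair (4, 7) qualifies (fwd_delta = 1, backward distance 2), while (3, 7) does not (fwd_delta = 2 leaves room for a backward distance of only 1).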
-
Zoe Liu authored
The frame sign bias value is not signaled in the frame header. Instead, the sign bias of each reference frame is derived from its frame offset at both the encoder and the decoder. The 'frame_sign_bias' tool depends on 'frame_marker'. Compared against the baseline, enabling both tools obtains a small coding gain of -0.08% to -0.11% in BDRate over the Google lowres/midres tests. Change-Id: I8d85dc427ced0b2152712ccf61be4be6068075b9
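Deriving the sign bias from frame offsets, instead of reading it from the header, can be sketched as a comparison that both encoder and decoder run identically. The names `derive_sign_bias` and `derive_all_sign_biases` are hypothetical, the convention that 1 marks a backward reference is an assumption, and any wrap-around of offsets is ignored here.

```c
#include <assert.h>

/* Assumed convention: 1 = reference lies after the current frame in display
 * order (backward reference), 0 = reference lies at or before it. */
static int derive_sign_bias(int cur_frame_offset, int ref_frame_offset) {
  return ref_frame_offset > cur_frame_offset ? 1 : 0;
}

/* Fills sign_bias[i] for each of the n reference frame offsets. */
static void derive_all_sign_biases(int cur_frame_offset,
                                   const int *ref_frame_offsets,
                                   int *sign_bias, int n) {
  assert(n >= 0);
  for (int i = 0; i < n; ++i) {
    sign_bias[i] = derive_sign_bias(cur_frame_offset, ref_frame_offsets[i]);
  }
}
```

Because the result depends only on offsets that both sides already know, nothing needs to be written to the bitstream for it.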
-
- 08 Oct, 2017 10 commits
-
-
Cheng Chen authored
Change-Id: I5446327378938128f27186015619a079c2845d53
-
Debargha Mukherjee authored
Change-Id: I71c07652565c0e1ca44d73f3731459949271fe45
-
Debargha Mukherjee authored
Solves some Windows build issues Change-Id: Ia903ed05285362449829a2777999cf73058f7733
-
Zoe Liu authored
This coding tool depends on the frame_marker tool. It derives the frame sign bias directly from the frame offset, so no sign bias signaling is needed. Change-Id: I3a8c77904d73caeeb1b6777fb026279fd2bbc6fb
-
Yunqing Wang authored
Add an experiment "tmp", which includes:
1. Always use a larger block size when storing frame MVs, consistently for the CB4X4 and non-CB4X4 cases: 8x8 for a 4x4 mi size and 16x16 for an 8x8 mi size.
2. Allocate a smaller buffer for frame MVs to save memory.
3. Use the previous frame MVs of the nearby 8x8 or 16x16 location, which simplifies the logic.
4. Reduce the amount of frame MV copying, which is very costly in the decoder.
The baseline decoder got a 5+% speedup. A Borg test on the lowres set showed a +0.009% PSNR difference before/after the patch. Change-Id: I61e14e95fd35bea88f338931b4f43c44f4e4cf1f
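Storing frame MVs at twice the mi size in each direction can be sketched as a coarse grid in which every entry covers a 2x2 group of mi units. This is an illustration only: `MotionVector`, `CoarseMvBuffer` and the helper names are hypothetical, and rounding the grid dimensions up is an assumption.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
  int row;
  int col;
} MotionVector; /* hypothetical MV type */

typedef struct {
  int rows;          /* coarse grid height */
  int cols;          /* coarse grid width */
  MotionVector *mvs; /* rows * cols entries */
} CoarseMvBuffer;

/* Each coarse cell covers 2 mi units; round up for odd dimensions. */
static int coarse_dim(int mi_units) { return (mi_units + 1) >> 1; }

/* Allocates roughly a quarter of the entries a per-mi buffer would need.
 * Returns 1 on success, 0 on allocation failure. */
static int coarse_mv_alloc(CoarseMvBuffer *buf, int mi_rows, int mi_cols) {
  assert(mi_rows > 0 && mi_cols > 0);
  buf->rows = coarse_dim(mi_rows);
  buf->cols = coarse_dim(mi_cols);
  buf->mvs = (MotionVector *)calloc((size_t)buf->rows * (size_t)buf->cols,
                                    sizeof(*buf->mvs));
  return buf->mvs != NULL;
}

/* Maps an mi position to the coarse entry that covers it. */
static MotionVector *coarse_mv_at(CoarseMvBuffer *buf, int mi_row,
                                  int mi_col) {
  assert(mi_row >= 0 && mi_col >= 0);
  return &buf->mvs[(mi_row >> 1) * buf->cols + (mi_col >> 1)];
}

static void coarse_mv_free(CoarseMvBuffer *buf) {
  free(buf->mvs);
  buf->mvs = NULL;
}
```

All four mi positions inside one 2x2 group resolve to the same stored MV, which is what makes the buffer smaller and reduces the amount of copying.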
-
Debargha Mukherjee authored
Change-Id: I16cee2064ddc80f80a21560e9d192a39033949ca
-
Debargha Mukherjee authored
Change-Id: I599f8fbdd3c19ec67d9a2118a41d735e11dd3f07
-
Zoe Liu authored
Change-Id: Ibdcb1530b9f81a2a5222e95cf5c0b7b2938509a8
-
Debargha Mukherjee authored
Change-Id: Ia6731231f860c3ca689240c777463d8b232b3901
-
Debargha Mukherjee authored
Various fixes for pvq build. Change-Id: Ideebdb072ed5786f3224e93ded5ec75a23e68dab
-
- 07 Oct, 2017 11 commits
-
-
Luc Trudeau authored
Change-Id: I13ba0dbe57297b540b78512d21a119f05a86a849
-
Luc Trudeau authored
High bit depth (_hbd) and low bit depth (_lbd) versions of the cfl functions: sum_above_row, sum_left_col, cfl_build_prediction, cfl_luma_subsampling_420 (4:4:4 will be added in a subsequent commit) and cfl_alpha_dist. For cfl_alpha_dist, special care is given to scale the SSE according to the bit depth. BUG=aomedia:835 Change-Id: I5b72845100d88fb8a438efe665bcae7fe1ba50b8
-
Urvang Joshi authored
When enabled, scaling through resize and superres will occur only in the frame's width; the height will not be scaled. Macro is off by default. Change-Id: I501b2b0b2766aa4a86da5937b57c4d5aee4e34c4
-
Urvang Joshi authored
Change-Id: I27292b7cdb27cec23754a6f017c5c7c55eb38bb5
-
Debargha Mukherjee authored
ext-partition-types and supertx are incompatible Change-Id: I6c4cce16453cff13b0acbaad93dde7d089891038
-
Urvang Joshi authored
Earlier, the superres scale was in the form of: N/16, where N ranged from 8 to 16. We change this to the form: 8/D, where D ranges from 8 to 16. This helps on the decoder side, by making it possible to work on 8x8 blocks at a time. Change-Id: I6c72d4b3e8d1c830e61d4bb8d7f6337a100c3064
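The two scale forms can be compared with a short arithmetic sketch. The function names are hypothetical and truncating integer division is an assumption; the actual rounding used by the codec is not stated in the commit message.

```c
#include <assert.h>

/* Old form: scale = N / 16, with N in [8, 16]. */
static int scaled_width_old(int width, int numerator) {
  assert(numerator >= 8 && numerator <= 16);
  return width * numerator / 16; /* truncating division (assumption) */
}

/* New form: scale = 8 / D, with D in [8, 16]. */
static int scaled_width_new(int width, int denominator) {
  assert(denominator >= 8 && denominator <= 16);
  return width * 8 / denominator; /* truncating division (assumption) */
}
```

For a 1920-wide frame, `scaled_width_old(1920, 12)` gives 1440 and `scaled_width_new(1920, 12)` gives 1280. Both forms span the same range, from half width to full width. With a fixed numerator of 8, every 8 coded columns correspond to exactly D upscaled columns, which is one plausible reading of why the decoder can work on 8x8 blocks at a time.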
-
Urvang Joshi authored
cm->superres_scale_numerator is used for both keyframes and non-keyframes, and is initialized from either oxcf->superres_scale_numerator or oxcf->superres_kf_scale_numerator as appropriate. Change-Id: Ie46df576ef3830e181643ae591d836449a4bd38f
-
Rupert Swarbrick authored
The restoration tiles (rtiles) divide the upscaled frame, not the encoded one. Change-Id: I2d08fe926d694fee7064461685289d3fd1c1de0c
-
Debargha Mukherjee authored
This optimization for speed was useful only when the max tx-size was 32x32. However, with tx64x64 it was breaking certain assumptions, causing huge drops in coding efficiency. So I am removing this optimization for now. It can be brought back later as a speed feature. Removing this optimization recovers the loss seen when 32x64 and 64x32 transforms are used. Change-Id: I15987ea9ff53fa36a2962fe5f156c30a11e809ed
-
Joe Young authored
The SSE4 function filter_intra_edge_sse4_1() reads data slightly past the initialized part of the array. Those data are discarded later, but the read causes a valgrind warning. This change avoids the warning by initializing an extra 16 positions of the array. BUG=aomedia:868 Change-Id: Ib610492cff91492ae379c5d62895773f8747c4bc
-
Luc Trudeau authored
To simplify the high bit depth commit, the summing of the top row and the left column is extracted out of cfl_dc_pred. This does not change the bitstream. Change-Id: I5c9fe91df4942f736c5af29c1d93abb3a6c8501f
-
- 06 Oct, 2017 6 commits
-
-
Jingning Han authored
Reduce the context model size for key frame modes from 30240 bits to 4500 bits, i.e., less than 1/6 of the original context model. The coding performance loss on key frames is 0.14% for lowres, with a noise-level difference over the whole video sequence. The loss on key frames for midres is 0.05%, with a noise-level difference over the whole video. The change in hdres key frame coding is 0.015%. Change-Id: I9e36825e5c5ee6ba35038c3ca349ad1ad3429910
-
Debargha Mukherjee authored
When ext-partition and ncobmc-adapt-weight are on, avoid overly large stack allocations. Change-Id: I8db74e45cac80c4e5dfd9e20cfc73d9978d1578e
-
Angie Chiang authored
Change-Id: I923931a9dbf828eb13670511852d55c953b479c1
-
Sebastien Alaiwan authored
This is undefined behaviour in C99 and could mislead the optimizer. This fixes the ubsan warning, and still generates optimal code (i.e. an inlined 'sar' instruction). Change-Id: I36b20a6780532b8c9379b9fbfd970933d56b1bc5
-
Alexander Bokov authored
Average speed-up (lowres):
low bitrates: 6.6%
mid bitrates: 2.5%
high bitrates: 0.0%
Average PSNR loss:
lowres: 0.010%
midres: 0.005%
Change-Id: Id34fb247e5e31f04ca324c58142e4b5ac4edacda
-
Yi Luo authored
On i7-6700:
Predictor  ssse3 v. C
4x4        ~1.3x
4x8        ~1.9x
8x4        ~2.3x
8x8        ~3.4x
8x16       ~4.1x
16x8       ~4.6x
16x16      ~5.2x
16x32      ~5.6x
32x16      ~4.2x
32x32      ~4.7x
Change-Id: Ic12383cf9d4446361d6355eb8a480a3c7602060e
-