- 02 Feb, 2018 13 commits
Debargha Mukherjee authored
Change-Id: I01cecc829e2d57517427a1de6387e91ba3c64312
-
Imdad Sardharwalla authored
The SSE4.1 and AVX2 implementations of the self-guided filter have been updated to match the updated FAST_SGR C implementation in restoration.c. The self-guided filter speed tests have been altered to compare the speeds of the SIMD and C implementations of the relevant functions.
Speed Tests (code compiled with CLANG)
===========
For LowBD:
- The SSE4.1 implementation is ~220% faster (~69% less time) than the C code
- The AVX2 implementation is ~314% faster (~76% less time) than the C code
For HighBD:
- The SSE4.1 implementation is ~240% faster (~71% less time) than the C code
- The AVX2 implementation is ~343% faster (~77% less time) than the C code
Change-Id: Ic2734bb89ccd3f66667c68647e5f677a5a496233
-
Angie Chiang authored
Change-Id: I8dcaa6882d47a097498c8f8af515b1185df4fdf3
-
Hui Su authored
In preparation for supporting q_adapt_probs. Change-Id: I4a39b81b0d2c4ceb1586ae411a1216c6c20d896d
-
Hui Su authored
Reduce the length of inter_tx_size[] from 1024 to 16. On a CIF test sequence, encoder memory consumption decreases by 18% (380MB -> 312MB); decoder memory consumption decreases by 56% (21.4MB -> 9.4MB). Change-Id: I42928eb9312748f96f4393c8d8040791f38f98b6
-
Frederic Barbier authored
Change-Id: I91f18c498c694829b933bb73812ad94d66962994
-
Imdad Sardharwalla authored
Added an AVX2 version of the Wiener filter, along with associated tests. Speed tests have been added for all implementations of the Wiener filter.
Speed Test results
==================
GCC
---
Low bit-depth filter:
- SSE2 vs C: SSE2 takes ~92% less time
- AVX2 vs C: AVX2 takes ~96% less time
- SSE2 vs AVX2: AVX2 takes ~43% less time (~74% faster)
High bit-depth filter:
- SSSE3 vs C: SSSE3 takes ~92% less time
- AVX2 vs C: AVX2 takes ~96% less time
- SSSE3 vs AVX2: AVX2 takes ~46% less time (~84% faster)
CLANG
-----
Low bit-depth filter:
- SSE2 vs C: SSE2 takes ~84% less time
- AVX2 vs C: AVX2 takes ~88% less time
- SSE2 vs AVX2: AVX2 takes ~27% less time (~36% faster)
High bit-depth filter:
- SSSE3 vs C: SSSE3 takes ~85% less time
- AVX2 vs C: AVX2 takes ~89% less time
- SSSE3 vs AVX2: AVX2 takes ~24% less time (~31% faster)
Change-Id: Ide22d7c09c0be61483e9682caf17a39438e4a208
-
Debargha Mukherjee authored
Changes the CONFIG_FAST_SGR=1 strategy so that the r=1 filter uses no subsampling at all, while the r=2 filter subsamples vertically and, in the last stage, combines odd rows by filtering only horizontally. The coding efficiency loss seems quite minimal. Change-Id: I5644ac400b387c37a2d278db7f6ad3ac0a6b5e93
-
Debargha Mukherjee authored
Change-Id: I6138519456b2ad3ffc8bced803ddc4418b246e74
-
Debargha Mukherjee authored
Includes some parameter tuning.
lowres (q, 30 frames, speed 1): -1.243% av PSNR, -2.337% ov PSNR, +0.577% SSIM
lowres (vbr, 30 frames, speed 1): -0.327% av PSNR, -1.007% ov PSNR, +0.182% SSIM
A few videos become significantly worse in SSIM, which needs to be investigated. PSNR-wise, however, the patch seems quite good.
Change-Id: I17c8d812c96ee49ddae7d3959a459aa3ffcea208
-
Peng Bin authored
Since aom_comp_mask_upsampled_pred just calls aom_upsampled_pred and aom_comp_mask_pred, there is no longer any need for separate C and SIMD versions of it. Change-Id: I1ff8bcae87d501c68a80708fd2dc6b74c6952f88
-
Yaowu Xu authored
BUG=aomedia:1306 Change-Id: I5a8bdbd472213ded2de706c5b044a1bf24823670
-
Jingning Han authored
The current AQ mode encoder setting can alter the segment_id between the rate-distortion optimization and block encoding stages. Disable the corresponding consistency check in this case. BUG=aomedia:1251 Change-Id: Ic910a23fd64a9b4554567d3c8c9a9ae5f6062c7b
-
- 01 Feb, 2018 14 commits
James Zern authored
Until the av1_idct*_new functions are optimized, this function sits high in the decoder performance profile. Change-Id: Ic55c9a92b9926fc09eaee211a45fde00333b7c15
-
Debargha Mukherjee authored
Change-Id: I1d7f33546053615a334b67b75147bd5e027a545b
-
Debargha Mukherjee authored
Change-Id: Ia34909cc6edc20f17a777e0b7bff97a62e0ac0c2
-
Jonathan Matthews authored
This reverts Change-Id: Ie11dd055255d200954b704b8c2ad8ca3dff7bf5c BUG=aomedia:1305 Change-Id: I6894928dcadc99a79417034a7096a215693a46f2
-
Debargha Mukherjee authored
Change-Id: I1e6a8a74d0ca1e6aa01d2da12bd9b19c8307154e
-
Cheng Chen authored
Change-Id: Ia448b44ca734fe111422de9afdad97ac48e78b66
-
Hui Su authored
When cb_partition_scan is true, only DCT_DCT is considered. Therefore, there is no need to prune transform types; moreover, if DCT_DCT were pruned, no transform type would be left to use. Change-Id: I1d65fe94e72de66fde18e271a598f9e67ade9cfb
-
Yaowu Xu authored
Change-Id: Ibe4f7bb61837b6bae6717f0c683fa23f78de5b80
-
Jingning Han authored
Obtain the most likely partition range from a first-pass, square-block-based partition search, then restrict the full rate-distortion optimization search in the second pass to that partition range. Tested on pedestrian 1080p at 2000 kbps, this makes encoding 40% faster at speed 0 and 30% faster at speed 1. The average coding performance loss is around 0.15%. Change-Id: Ifc83d48e6413d1b887e68cd1962084e018a2258f
-
Jingning Han authored
Use a simple rate-distortion search route for the first-pass coding block partition. Change-Id: Iaaec3e1af83f46f625d3de8361eddd79a2bc6cef
-
Jingning Han authored
Add a square block partition search to serve as the first-pass partition search. Change-Id: Ib637bba205d2cd0f6b0a5e2e91b270e22dce5580
-
Yaowu Xu authored
BUG=aomedia:1274 Change-Id: Ib1d814db4ef1bcb075444e4da855fd840e945a7d
-
Peng Bin authored
Same as https://aomedia-review.googlesource.com/c/aom/+/42901: adopt the same code refactoring in aom_comp_mask_pred_c (should be bitwise identical). Change-Id: Ieea71d370f5df48d216f40515842ad62499432c8
-
Maxym Dmytrychenko authored
The SSE2 version has already been extended to support 13 taps. Change-Id: I58e04527b297256b6ca63b12097d9345196a12bd
-
- 31 Jan, 2018 13 commits
Hui Su authored
Change-Id: If0b1d2fe31569104f2d8eef3cfd42cab30162c7e
-
Hui Su authored
Reduce the length of inter_tx_size[] from 1024 to 16. On a CIF test sequence, encoder memory consumption decreases by 18% (380MB -> 312MB); decoder memory consumption decreases by 56% (21.4MB -> 9.4MB). Change-Id: Ie11dd055255d200954b704b8c2ad8ca3dff7bf5c
-
Tom Finegan authored
- Add an explicit cast of bool to int to silence a test warning.
- Add an explicit cast of size_t to int in dump_obu, for the same reason.
Change-Id: I90846eb5c88880d921f20cb66b116ab7d2799af5
-
Angie Chiang authored
BUG=aomedia:717 Change-Id: Ib06a12039cb72665c1ee534cc2246ac3d23f878d
-
Soo-Chul Han authored
cmake: -DCONFIG_SCALABILITY=1 Change-Id: Ifa908f809bcf904bdf0ed87b351e1ef3accc2b3f
-
Johann authored
Fix a linker error when building with gcc 6: "relocation R_X86_64_32 against `.rodata' can not be used when making a shared object; recompile with -fPIC". BUG=aomedia:102 Change-Id: I6c06de1e9dac1c044a4b07125abcaba0943a29b6
-
Hui Su authored
3-5% encoding speedup at speed 0, with no quality loss. Change-Id: I0e31755f45253e5e99d8d9eed0d7a6fe6050f49f
-
Urvang Joshi authored
(1) Explicitly reset RD stats for each partition.
Previously, PARTITION_SPLIT was the only partition type that reset the RD_STATS in 'sum_rdc'. This still worked because:
- PARTITION_SPLIT was tried before VERT, HORZ, VERT_4 and HORZ_4; and
- the RD cost calculations in the VERT, HORZ, VERT_4 and HORZ_4 partitions implicitly discarded the existing value in sum_rdc.
However, that was very fragile; explicitly resetting the stats every time is much safer.
(2) Using a separate variable 'temp_best_rd_cost' was fragile, as someone might forget to update it. So, we use best_rdc.rdcost directly.
BUG=aomedia:1246 Change-Id: Icd75f25c34bb0f1806e691784648bcffce2417e6
-
Deepa K G authored
AVX2 implementations of av1_convolve_x_sr, av1_convolve_y_sr and av1_convolve_2d_sr have been added, and improvements have been made to av1_convolve_x_avx2, av1_convolve_y_avx2 and av1_convolve_2d_avx2. Change-Id: I62a699dd9dcf42de94dd72cc2d43affc0dc31404
-
Tom Finegan authored
BUG=aomedia:1296 Change-Id: If9f944b58f23cdb71f919bd391f6b37e27b271f1
-
Angie Chiang authored
- Serialize the adst4 operations
- Update stage range accordingly
- Change the cos_bit precision accordingly
- Correct 4x8/8x4 inv_start_range
BUG=aomedia:1271 Change-Id: I10bc91585a61d790decdc24cb91659102e043620
-
David Barker authored
As per the linked bug report, the distance-weighted compound prediction has two separate round operations: first by 3 bits (inside the various convolve functions), then by 10 bits (after the convolve functions). We can improve on this by using a plain (truncating) right shift by 3 bits inside the convolve functions; this is equivalent to doing a single round by 13 bits at the end.
Note: In the encoder, joint_motion_search() does things a bit differently. So that the two "sides" of the prediction can be modified independently, we predict each side as if it were a single prediction (including rounding), then blend these single predictions together. This is already an approximation to the "real" prediction, even in the non-jnt-comp case, so we leave that code path as-is.
BUG=aomedia:1289 Change-Id: I9ad1fbcb3e12db2b5fc3c82b407f0fd9e6b39750