- 16 Jan, 2018 1 commit
-
-
David Michael Barr authored
Includes unit tests for conformance and speed.
SSSE3/CFLPredictHBDTest:
4x4: C time = 1436 us, SIMD time = 358 us (~4x)
8x8: C time = 4821 us, SIMD time = 598 us (~8.1x)
16x16: C time = 18528 us, SIMD time = 1793 us (~10x)
32x32: C time = 72998 us, SIMD time = 6400 us (~11x)
AVX2/CFLPredictHBDTest:
4x4: C time = 1436 us, SIMD time = 398 us (~3.6x)
8x8: C time = 4924 us, SIMD time = 644 us (~7.6x)
16x16: C time = 18624 us, SIMD time = 1617 us (~12x)
32x32: C time = 73509 us, SIMD time = 3635 us (~20x)
Change-Id: Icbcfefbf165facdbd77c9b3861af2bbf464254a0
-
- 11 Jan, 2018 1 commit
-
-
David Michael Barr authored
Includes unit tests for conformance and speed.
SSSE3/CFLPredictTest:
4x4: C time = 2063 us, SIMD time = 313 us (~6.6x)
8x8: C time = 6656 us, SIMD time = 493 us (~14x)
16x16: C time = 24970 us, SIMD time = 1327 us (~19x)
32x32: C time = 59020 us, SIMD time = 5178 us (~11x)
AVX2/CFLPredictTest:
4x4: C time = 2052 us, SIMD time = 333 us (~6.2x)
8x8: C time = 6712 us, SIMD time = 513 us (~13x)
16x16: C time = 25292 us, SIMD time = 1023 us (~25x)
32x32: C time = 58994 us, SIMD time = 2828 us (~21x)
Change-Id: I08690a548be981ff10e184de468b9e0e691ee812
-
- 08 Jan, 2018 1 commit
-
-
Luc Trudeau authored
Includes unit tests for conformance and speed.
SSSE2/SubsampleSpeedTest:
4x4: C time = 868 us, SIMD time = 200 us (~4.3x)
8x8: C time = 3054 us, SIMD time = 293 us (~10x)
16x16: C time = 11887 us, SIMD time = 760 us (~16x)
AVX2/SubsampleSpeedTest:
4x4: C time = 784 us, SIMD time = 205 us (~3.8x)
8x8: C time = 2774 us, SIMD time = 307 us (~9x)
16x16: C time = 10978 us, SIMD time = 489 us (~22x)
Change-Id: I7d5958097542599d57d1a9f9a0a1b809c6a345b0
-
- 19 Dec, 2017 1 commit
-
-
Luc Trudeau authored
By default, the DC_PRED is not cached (this includes decoding). During cfl_rd_pick_alpha(), DC_PRED caching is enabled, the DC_PRED is cached after the first time it is computed (for each plane) and then it is reused when testing all the other scaling parameters. Change-Id: Ie8ba0bb0427c4d9be8de5b44e6330e8a78b9c7d9
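The caching behaviour described in this commit can be sketched as follows. This is a minimal illustration, not the libaom code: `DcPredCache`, `get_dc_pred` and `compute_dc_pred` are hypothetical names, and the DC prediction is stood in for by a rounded mean of neighbouring pixels.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_CHROMA_PLANES 2 /* U and V */

/* Hypothetical cache holding one DC_PRED value per chroma plane. */
typedef struct {
  int use_cache;                    /* 0 by default (e.g. while decoding) */
  int is_cached[NUM_CHROMA_PLANES]; /* set once the value has been computed */
  int16_t dc_pred[NUM_CHROMA_PLANES];
  int compute_calls;                /* instrumentation for this sketch only */
} DcPredCache;

/* Stand-in for the real DC prediction: rounded mean of the neighbours. */
static int16_t compute_dc_pred(const uint8_t *neighbors, int count) {
  int sum = 0;
  for (int i = 0; i < count; i++) sum += neighbors[i];
  return (int16_t)((sum + count / 2) / count);
}

/* Returns DC_PRED for a plane, reusing the cached value when caching is on. */
static int16_t get_dc_pred(DcPredCache *c, int plane,
                           const uint8_t *neighbors, int count) {
  assert(plane >= 0 && plane < NUM_CHROMA_PLANES);
  if (c->use_cache && c->is_cached[plane]) return c->dc_pred[plane];
  c->compute_calls++;
  const int16_t dc = compute_dc_pred(neighbors, count);
  if (c->use_cache) {
    c->dc_pred[plane] = dc;
    c->is_cached[plane] = 1;
  }
  return dc;
}
```

In this sketch an alpha search would set `use_cache = 1` before its loop and clear it (together with `is_cached`) afterwards, so that every other caller keeps the uncached default.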
-
- 15 Dec, 2017 1 commit
-
-
Debargha Mukherjee authored
Removes unused BLOCK_2X2, BLOCK_2X4 and BLOCK_4X2 from the BLOCK_SIZE enum. Change-Id: I964d99718026c51a1eaf30d4a1fc83cc52f94083
-
- 13 Dec, 2017 2 commits
-
-
Luc Trudeau authored
Based on the HW Subgroup call of December 4th 2017, we limit luma partition to 32X32.
Regression on Subset 1
PSNR | PSNR Cb | PSNR Cr | PSNR HVS | SSIM | MS SSIM | CIEDE 2000
0.0881 | 1.3504 | 1.2936 | 0.0572 | 0.0182 | 0.0227 | 0.5204
https://two.arewecompressedyet.com/?job=CfL-PartU%402017-12-12T15%3A39%3A36.794Z&job=CfL-Max32x32%402017-12-12T16%3A10%3A09.989Z
Change-Id: I7e3cfd68097c0bc24b1426348b5fd574c4f638a0
-
Luc Trudeau authored
To avoid a cascade of encodes when performing CfL RDO, we compute DC_PRED on the partition unit. To do so, we change the tx_size of CfL to match the size of the partition unit (i.e. CfL partitions only contain 1 transform block). This change requires disabling CfL when a chroma partition-unit-sized DC_PRED is unavailable (i.e. 4:1, 1:4 partitions and chroma partitions > 32X32).
Results on Subset1 (compared to disabling 4:1 and 1:4 partU):
PSNR | PSNR Cb | PSNR Cr | PSNR HVS | SSIM | MS SSIM | CIEDE 2000
-0.1243 | -1.9286 | -2.0140 | -0.1514 | -0.1512 | -0.1947 | -0.8066
https://two.arewecompressedyet.com/?job=master%402017-12-12T14%3A53%3A01.451Z&job=CfL-PartU%402017-12-12T15%3A39%3A36.794Z
Change-Id: I2a4adde79c10089130775b8e0df5f9c198855cad
-
- 05 Dec, 2017 1 commit
-
-
Luc Trudeau authored
Moving CfL to partition-unit DC_PRED requires 4:1 and 1:4 DC_PRED, which are not currently implemented. A simple solution is to disable CfL for 4:1 and 1:4 partitions. CfL is also disabled for luma intra partitions < 4x4. This is inherent to luma intra prediction partition sizes. We add an assert to enforce this.
This results in the following regression on Subset1:
PSNR | PSNR Cb | PSNR Cr | PSNR HVS | SSIM | MS SSIM | CIEDE 2000
-0.0093 | 0.1803 | 0.1519 | -0.0180 | 0.0256 | 0.0226 | 0.0352
https://two.arewecompressedyet.com/?job=CfL%402017-11-30T19%3A05%3A05.639Z&job=CfL-Disable-4to1%402017-11-30T19%3A04%3A00.761Z
Change-Id: Ie2c8b4d9cb6b6f33a103b540209e1a2fb6df74a7
-
- 09 Nov, 2017 1 commit
-
-
Luc Trudeau authored
Results on Subset 1
PSNR | PSNR Cb | PSNR Cr | PSNR HVS | SSIM | MS SSIM | CIEDE 2000
-0.0354 | -0.2567 | -0.3941 | 0.0104 | -0.0084 | 0.0120 | -0.0996
https://arewecompressedyet.com/?job=master%402017-11-03T15%3A57%3A30.643Z&job=cfl-av1-DC_PRED%402017-11-03T16%3A00%3A10.866Z
BUG=aomedia:928
Change-Id: I4e26e8c56d2246ca32b8d86145ef67f6df90d8d1
-
- 28 Sep, 2017 2 commits
-
-
Luc Trudeau authored
Pixels are subsampled when they are stored in the CfL prediction buffer. This avoids having two buffers. No impact on the bitstream.
Results on subset1 (Compared to parent)
PSNR | PSNR Cb | PSNR Cr | PSNR HVS | SSIM | MS SSIM | CIEDE 2000
0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000
https://arewecompressedyet.com/?job=cfl-avg-on-fly%402017-09-23T21%3A38%3A04.440Z&job=cfl-Sub-On-Store%402017-09-24T15%3A01%3A41.161Z
Change-Id: If051e34d6d7078d77609542305a2afebef5cb177
-
Luc Trudeau authored
Instead of storing the transform block average, it is immediately subtracted from the subsampled pixels. This change does not alter the bitstream and it reduces CfL complexity. Change-Id: Ia5038b336abf1ec01e295b235734318906d3bae6
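A minimal sketch of subtracting the average in place, assuming the subsampled luma values sit in a contiguous `int16_t` buffer. The function name and buffer layout are illustrative and not taken from cfl.c.

```c
#include <assert.h>
#include <stdint.h>

/* Subtracts the rounded block average from every subsampled luma value,
 * in place. `buf` holds width * height non-negative values. */
static void subtract_average(int16_t *buf, int width, int height) {
  const int count = width * height;
  assert(count > 0);
  int sum = 0;
  for (int i = 0; i < count; i++) sum += buf[i];
  const int avg = (sum + count / 2) / count; /* rounded average */
  for (int i = 0; i < count; i++) buf[i] = (int16_t)(buf[i] - avg);
}
```

After this step the buffer holds only the zero-mean ("AC") part of the luma, so later stages no longer need the average itself.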
-
- 27 Sep, 2017 1 commit
-
-
Luc Trudeau authored
The result of luma subsampling is left-shifted by 3. This avoids having to do it during averaging, in alpha search and when building the prediction. This change does not alter the bitstream.
Results on Subset1
PSNR | PSNR Cb | PSNR Cr | PSNR HVS | SSIM | MS SSIM | CIEDE 2000
0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000
https://arewecompressedyet.com/?job=cfl-baseline%402017-09-06T17%3A41%3A38.041Z&job=cfl-SubsampleQ3%402017-09-06T17%3A42%3A01.252Z
Change-Id: I6e89eac6496f7c36e46364c9223fbcbca6759032
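One way to obtain a subsampled value that is already scaled by 8 (Q3) is sketched below for 4:2:0 content, which is an assumption since the commit does not name the chroma format. The sum of a 2x2 luma block is four times its average, so doubling the sum gives the average left-shifted by 3 without any division. The function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 4:2:0 subsampling with a Q3 result: each output is 8 * mean(2x2 block). */
static void subsample_420_q3(const uint8_t *luma, int luma_stride,
                             int16_t *out, int out_stride,
                             int out_width, int out_height) {
  assert(out_width > 0 && out_height > 0);
  for (int j = 0; j < out_height; j++) {
    for (int i = 0; i < out_width; i++) {
      const uint8_t *top = luma + (2 * j) * luma_stride + 2 * i;
      const uint8_t *bot = top + luma_stride;
      /* (a + b + c + d) * 2 == ((a + b + c + d) / 4) << 3, with no rounding */
      out[j * out_stride + i] =
          (int16_t)((top[0] + top[1] + bot[0] + bot[1]) << 1);
    }
  }
}
```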
-
- 26 Sep, 2017 1 commit
-
-
Luc Trudeau authored
As for intra blocks in intra frames, an extra call to txfm_rd_in_plane is added to the RDO of intra blocks in inter frames. This extra call is performed using the best parameters found during RDO, and the reconstructed luma pixels are stored.
Results on objective-1-fast (compared to CfL on Intra frames only)
PSNR | PSNR Cb | PSNR Cr | PSNR HVS | SSIM | MS SSIM | CIEDE 2000
-0.2497 | -3.5526 | -3.5048 | -0.2456 | -0.2392 | -0.2508 | -1.4811
https://arewecompressedyet.com/?job=cfl-no-inter%402017-09-13&job=cfl-inter%402017-09-13T14%3A13%3A13.918Z
Change-Id: I70ea2c01859b6c55d7c3eb9680d492c0bfc2aad4
-
- 19 Sep, 2017 1 commit
-
-
Luc Trudeau authored
The cfl_init function is moved out of cfl.h simplifying the includes and removing the need for forward declarations. Change-Id: I47312b25410b718a830b001391e386647005d57e
-
- 13 Sep, 2017 1 commit
-
-
David Michael Barr authored
Instead of forward-declaring AV1_COMMON and MACROBLOCKD, move the dependent struct and function prototype closer to where they are used and after these types are defined. Change-Id: I75f005b46ef322a6fcbc01377b8dded1637c5f73
-
- 31 Aug, 2017 1 commit
-
-
Luc Trudeau authored
When Chroma from Luma is combined with chroma_sub8x8, the prediction used for sub8x8 blocks originates from multiple luma blocks. Extra asserts are added to validate that the prediction buffer contains all the required information. Change-Id: I305c46ce9b8292697e1d5b181d123461026da11c
-
- 30 Aug, 2017 1 commit
-
-
Luc Trudeau authored
Since the scaled luma can be negative, ROUND_POWER_OF_TWO_SIGNED must be used. This changes the behavior from rounding toward -infinity to rounding towards 0.
Results for Subset1 (compared with 35545dd5 with CfL enabled)
PSNR | PSNR Cb | PSNR Cr | PSNR HVS | SSIM | MS SSIM | CIEDE 2000
0.0082 | -0.1061 | -0.0119 | -0.0126 | -0.0011 | -0.0121 | 0.0094
Change-Id: Ie7258a17a199368339d4794fba6b5916e607c95b
-
- 28 Aug, 2017 1 commit
-
-
Luc Trudeau authored
With recent changes, it is now possible to store the storage flag inside the CFL_CTX. This simplifies the implementation and will allow reuse in the decoder. This change does not alter the bitstream. Change-Id: Ibb8aebdd3d06f8765d40248ece8a038892e87032
-
- 17 Aug, 2017 1 commit
-
-
David Michael Barr authored
Also, move body of update_cfl_costs() to av1_fill_mode_rates().
Results on Subset1 (Compared to 1cfe474b with CFL enabled)
PSNR | PSNR Cb | PSNR Cr | PSNR HVS | SSIM | MS SSIM | CIEDE 2000
0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000
No change in bitstream, for an average encode speed-up of 2.3%.
Change-Id: I3948abcd70cfecad8086edfe4c45552b576ae06f
-
- 29 Jul, 2017 1 commit
-
-
David Michael Barr authored
Expand the range of alpha to [-2, 2] in Q3. Jointly signal the signs, including zeros. Use the signs to give context for each quadrant and half-axis. The (0, 0) point is excluded. Symmetry in alpha_u == alpha_v yields 6 contexts.
Results on Subset1 (Compared to 9136ab7d with CFL enabled)
PSNR | PSNR Cb | PSNR Cr | PSNR HVS | SSIM | MS SSIM | CIEDE 2000
-0.0792 | -0.7535 | -0.7574 | -0.0639 | -0.0843 | -0.0665 | -0.3324
Change-Id: I250369692e92a91d9c8d174a203d441217d15063
Signed-off-by: David Michael Barr <b@rr-dav.id.au>
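Joint signalling of the two signs with the (0, 0) point excluded can be sketched as a single symbol with 3 * 3 - 1 = 8 values. The enum values, the symbol ordering and the function names below are assumptions made for illustration. The context derivation that leads to the 6 contexts is not shown.

```c
#include <assert.h>

/* Three possible signs per chroma plane (ordering is an assumption). */
enum { SIGN_ZERO = 0, SIGN_NEG = 1, SIGN_POS = 2, NUM_SIGNS = 3 };

/* (zero, zero) is never coded, leaving 8 joint symbols. */
#define NUM_JOINT_SIGNS (NUM_SIGNS * NUM_SIGNS - 1)

/* Packs the U and V signs into one symbol in [0, NUM_JOINT_SIGNS). */
static int joint_sign(int sign_u, int sign_v) {
  assert(!(sign_u == SIGN_ZERO && sign_v == SIGN_ZERO));
  return sign_u * NUM_SIGNS + sign_v - 1;
}

/* Inverse mappings used when parsing the symbol. */
static int sign_u_of(int js) { return (js + 1) / NUM_SIGNS; }
static int sign_v_of(int js) { return (js + 1) % NUM_SIGNS; }
```

A magnitude would then be coded only for planes whose sign is non-zero.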
-
- 28 Jul, 2017 1 commit
-
-
Luc Trudeau authored
CfL is now an independent mode.
Results on Subset1 (Compared to 4266a7ed with CFL enabled)
PSNR | PSNR Cb | PSNR Cr | PSNR HVS | SSIM | MS SSIM | CIEDE 2000
-0.1645 | -0.4017 | 0.2475 | -0.1851 | -0.2179 | -0.2338 | -0.2897
Change-Id: I2e86e7ea7bfc12bb1d763e70a136ca992d57a3c5
-
- 11 Jul, 2017 1 commit
-
-
Luc Trudeau authored
Alpha's biggest fraction is 1/8, so Q3 does not change the bitstream.
Results on Subset1 (compared to 503aca74 with CfL enabled)
PSNR | PSNR Cb | PSNR Cr | PSNR HVS | SSIM | MS SSIM | CIEDE 2000
0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000
Change-Id: I1fe5b2ace97179d5f950d7406a4f3d391924f89d
-
- 10 Jul, 2017 1 commit
-
-
Luc Trudeau authored
The block-level DC_PRED computed by CfL goes down from Q6 to Q0. This allows existing assembly for DC_PRED to be reused and also reduces the requirements on the multiply needed to scale the reconstructed luma values.
Results on Subset1 (compared to f9684d222 with CfL enabled)
PSNR | PSNR Cb | PSNR Cr | PSNR HVS | SSIM | MS SSIM | CIEDE 2000
-0.0347 | 0.0229 | -0.1326 | -0.0420 | -0.0057 | -0.0072 | -0.0644
Change-Id: I6ba82cc9e04fa4ab7c8ec40a7856deb273881748
-
- 06 Jul, 2017 4 commits
-
-
Luc Trudeau authored
Since alpha is Q3, we reduce y_average from Q10 to Q3. As such, the prediction is reduced from Q13 to Q6. Chroma dc_pred is reduced from Q7 to Q6 in order to match with the prediction.
Results on Subset1 (compared to 209de2e5b with CfL enabled)
PSNR | PSNR Cb | PSNR Cr | PSNR HVS | SSIM | MS SSIM | CIEDE 2000
0.0010 | 0.0176 | -0.0538 | -0.0043 | 0.0027 | -0.0097 | -0.0018
Change-Id: Ib7dd3968a764e0380ddc0ad2333ebacf1e9699cd
-
Luc Trudeau authored
The dc_pred values stored in the CfL context are in Q8.7 (Worst case division will be of 1/128).
Results on Subset1 (compared to f9684d222 with CfL enabled)
PSNR | PSNR Cb | PSNR Cr | PSNR HVS | SSIM | MS SSIM | CIEDE 2000
0.0118 | -0.0181 | -0.0109 | 0.0086 | 0.0086 | 0.0196 | 0.0018
Change-Id: I0701e04fb76f03eff12ed01fd5fda675fbb15e32
-
Luc Trudeau authored
This change does not impact the bitstream, as no loss is incurred by using a fixed-point value for the transform size average. For low bit depth, the transform size average is stored using Q8.10 fixed-point format. Worst case, smallest fraction is 1/1024.
Results on Subset1 (Compared to 366b74 with CfL)
PSNR | PSNR Cb | PSNR Cr | PSNR HVS | SSIM | MS SSIM | CIEDE 2000
0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000
Change-Id: Ia5b046b92a0e4c40e413b16af3394bdc0a8c8cd9
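The losslessness claim can be illustrated with a small sketch. It assumes 8-bit pixels and a pixel count that is a power of two no larger than 1024 (a 32x32 block), in which case dividing the sum by the count never needs more than 10 fractional bits. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Average of `count` 8-bit pixels in Q10 (10 fractional bits).
 * Exact when count is a power of two <= 1024, since count divides 1024. */
static int32_t average_q10(const uint8_t *pix, int count) {
  assert(count > 0 && count <= 1024 && (count & (count - 1)) == 0);
  int32_t sum = 0;
  for (int i = 0; i < count; i++) sum += pix[i];
  /* Largest sum is 1024 * 255, so the shifted value fits in 32 bits. */
  return (sum << 10) / count;
}
```

For example, four pixels with values 1, 2, 3 and 4 average to 2.5, which is stored as 2560 (2.5 * 1024).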
-
Luc Trudeau authored
When computing alpha, multiple averages are computed, one for each transform block. The CfL prediction now uses the transform block average instead of partition block average. This allows the decoder to build the CfL prediction by using only the collocated reconstructed luma values for the current transform size and not the entire partition.
Results on Subset 1 (Compared to 0e81b97c with CfL)
PSNR | PSNR Cb | PSNR Cr | PSNR HVS | SSIM | MS SSIM | CIEDE 2000
0.0180 | 0.2627 | 0.2274 | 0.0233 | 0.0301 | 0.0312 | 0.1506
A small regression is expected, this change was made to simplify hardware implementations.
Change-Id: Ib2ce2a3053b85300c5c62ef0e3270af489568a38
-
- 03 Jul, 2017 1 commit
-
-
Luc Trudeau authored
Adjust row and col offset for sub8x8 blocks to allow the CfL prediction to use all available reconstructed luma pixels.
Results on Subset 1 (Compared to b03c2f44 with CfL)
PSNR | PSNR Cb | PSNR Cr | PSNR HVS | SSIM | MS SSIM | CIEDE 2000
-0.1355 | -0.8517 | -0.4481 | -0.0579 | -0.0237 | -0.0203 | -0.2765
Change-Id: Ia91f0a078f0ff4f28bb2d272b096f579e0d04dac
-
- 29 Jun, 2017 1 commit
-
-
Luc Trudeau authored
The function cfl_compute_parameters is added and contains the logic related to building the CfL context parameters. As such, many cfl functions can now be encapsulated inside of cfl.c and not exposed to the rest of AV1. This also allows for supplemental asserts that validate that the CfL context is properly built.
Results on Subset1 (compared to 9c6f8547 with CfL)
PSNR | PSNR Cb | PSNR Cr | PSNR HVS | SSIM | MS SSIM | CIEDE 2000
0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000
Change-Id: I6d14a426416b3af5491bdc145db7281b5e988cae
-
- 20 Jun, 2017 2 commits
-
-
David Michael Barr authored
Results on Subset 1 (Compared to a0f8c145 with CfL)
PSNR | PSNR Cb | PSNR Cr | PSNR HVS | SSIM | MS SSIM | CIEDE 2000
0.0677 | -0.3359 | -0.2115 | 0.0529 | 0.0735 | 0.0495 | -0.0907
Change-Id: Ib61ff862e8cfbdf0c693a4eba5f2712a6e9ab819
Signed-off-by: David Michael Barr <b@rr-dav.id.au>
-
David Michael Barr authored
In 84a44dbe, there was a transcription error in the table. Point { 5, 3 } was duplicated and { 5, 0 } was missing. While we are here, updated the order and CDF from subset3.
Results on subset1 (compared to cab68ae6 with CfL)
PSNR | PSNR Cb | PSNR Cr | PSNR HVS | SSIM | MS SSIM | CIEDE 2000
0.0416 | -0.2732 | -0.3848 | 0.0385 | 0.0309 | 0.0230 | -0.0178
Change-Id: I3244245792c5ab99b4149ae5f8a2439d4214ed69
Signed-off-by: David Michael Barr <b@rr-dav.id.au>
-
- 19 Jun, 2017 1 commit
-
-
Luc Trudeau authored
Extract the computation of the reconstructed luma average out of cfl_load and into cfl_compute_average. The reconstructed luma average is stored in the CFL_CONTEXT to avoid computing it for each transform block and for each plane.
Results on subset1 (compared to 803bea26 with CfL)
PSNR | PSNR Cb | PSNR Cr | PSNR HVS | SSIM | MS SSIM | CIEDE 2000
-0.0474 | -0.1486 | -0.2931 | -0.0358 | -0.0397 | -0.0127 | -0.1162
Change-Id: I9e34af0fe5961ce8dbe70cb80aea2a16221d0d92
-
- 14 Jun, 2017 1 commit
-
-
Luc Trudeau authored
plane_bsize is now computed properly. This also includes support for the special case of blocks < 4X4.
Results on subset1 (compared to 8e689e4b with CfL)
PSNR | PSNR Cb | PSNR Cr | PSNR HVS | SSIM | MS SSIM | CIEDE 2000
-0.0218 | -0.2328 | -0.2555 | -0.0230 | -0.0379 | -0.0723 | -0.1205
Change-Id: I6ec87d818d8df6a40ecf3bb1b86954e59c952930
-
- 06 Jun, 2017 1 commit
-
-
Luc Trudeau authored
This change does not impact the bitstream.
PSNR | PSNR Cb | PSNR Cr | PSNR HVS | SSIM | MS SSIM | CIEDE 2000
0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000
Change-Id: I6e131e91bad5efa345ed2542ae970eb6122eff51
-
- 24 May, 2017 1 commit
-
-
David Michael Barr authored
Separate the codes into a table of distinct values and an index into that table. Pull the SSE calculation out of the RDO loop and avoid repeating it for the same alpha values.
Change-Id: I8c4bd7eab6f8000e6aca9687d9190abc3e270c37
Signed-off-by: David Michael Barr <b@rr-dav.id.au>
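The idea of computing the SSE once per distinct alpha value, and then looking it up for each code, can be sketched as below. Everything here is hypothetical: the alpha table, the code table, the Q3 scaling and the function names are illustrative. The search also compares distortion only, whereas a real RDO loop would add the rate of each code, and the prediction is not clamped to the pixel range.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_ALPHAS 3 /* distinct alpha values (hypothetical table) */
#define NUM_CODES 4  /* codes, each a pair of indices into the alpha table */

static const int alpha_q3[NUM_ALPHAS] = { 0, 1, 4 };
static const int codes[NUM_CODES][2] = { { 0, 0 }, { 1, 0 }, { 1, 1 }, { 2, 1 } };

/* Rounds a Q6 value to an integer, symmetrically around zero. */
static int round_q6(int v) {
  return v < 0 ? -((-v + 32) >> 6) : (v + 32) >> 6;
}

/* SSE of one chroma plane for one alpha; ac_q3 is zero-mean luma in Q3. */
static int64_t sse_for_alpha(const int16_t *ac_q3, const uint8_t *src,
                             int dc, int alpha, int n) {
  int64_t sse = 0;
  for (int i = 0; i < n; i++) {
    const int pred = dc + round_q6(alpha * ac_q3[i]);
    const int diff = src[i] - pred;
    sse += (int64_t)diff * diff;
  }
  return sse;
}

/* Picks the code with the lowest total SSE (first one wins on ties). */
static int pick_best_code(const int16_t *ac_q3, const uint8_t *src_u,
                          const uint8_t *src_v, int dc_u, int dc_v, int n) {
  assert(n > 0);
  int64_t sse[2][NUM_ALPHAS];
  /* One SSE pass per distinct alpha and plane, outside the code loop. */
  for (int a = 0; a < NUM_ALPHAS; a++) {
    sse[0][a] = sse_for_alpha(ac_q3, src_u, dc_u, alpha_q3[a], n);
    sse[1][a] = sse_for_alpha(ac_q3, src_v, dc_v, alpha_q3[a], n);
  }
  int best = 0;
  int64_t best_sse = -1;
  for (int c = 0; c < NUM_CODES; c++) {
    const int64_t total = sse[0][codes[c][0]] + sse[1][codes[c][1]];
    if (best_sse < 0 || total < best_sse) {
      best_sse = total;
      best = c;
    }
  }
  return best;
}
```

With this layout the cost of the search grows with the number of distinct alpha values rather than with the number of codes.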
-
- 18 May, 2017 1 commit
-
-
Luc Trudeau authored
Encapsulates the logic to update the rate of each CfL codeword. The if statements are removed from the loop and the arrays are stored in CFL_CTX instead of being declared every time. Change-Id: I0cb208b14e6c6a888210dd33c5e8fe8d74dd87f4
-
- 12 May, 2017 1 commit
-
-
Luc Trudeau authored
Move cfl_idx_to_alpha into the header to facilitate inlining. Remove the MB_MODE_INFO forward declaration.
Change-Id: Id33fb0228d88b6285252843e2345a0d3ae875cd2
-
- 09 May, 2017 1 commit
-
-
Luc Trudeau authored
The prediction block level DC_PRED is stored and computed as double instead of int. Change-Id: I22766c102a7b62d4b5e7621438185808cc0ea8f4
-
- 08 May, 2017 1 commit
-
-
Luc Trudeau authored
Since the size used with cfl_load can be based on either the transform block size or the prediction block size, width and height are used as parameters instead of TX_SIZE. This resolves a problem where cfl_compute_alpha_ind was reading uninitialized memory.
Change-Id: I187dbdd5b2e8bd85e82bb77eb74859bee2cd3f1e
-
- 05 May, 2017 1 commit
-
-
Luc Trudeau authored
Writes and reads alpha to and from the bitstream. A special case is needed on the encoder side to handle prediction block skips. Because it is not known during CfL whether a prediction block will be skipped, a rollback is required if the block was skipped and the alpha index was not zero. The advantage is that no signaling is required when the prediction block is skipped, as the alpha index is assumed to be zero. An encode facade is added to the intra prediction facade, as CfL requires special encoder-side operations.
Change-Id: Ic3b11d0fdbd51389d862112eb09d8785127a6b06
-