- 23 Nov, 2017 2 commits
-
-
Rupert Swarbrick authored
This doesn't have a big performance impact, and it's rather simpler just having one version of everything. Change-Id: I5fa5e7640a63d0ccb0c371f266c6eee99d9520f9
-
Rupert Swarbrick authored
Change-Id: Ifac3a3bf620061865b82b986d6b16bcabd96a187
-
- 22 Nov, 2017 1 commit
-
-
Rupert Swarbrick authored
The code was assuming that an mi was always 4 samples wide and high. For chroma planes with subsampling, this is wrong and the size and position of the sb in the plane was over-estimated by a factor of two. This meant that we sent all the coefficients in the top-left hand quarter of the tile. Since the encoder and decoder made the same mistake, this worked fine, but it's clearly not what we're supposed to do! Change-Id: I0da8ada1d76639ad476ad84491658bc25ef3a43f
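The corrected size calculation can be sketched as follows. This is a minimal illustration rather than the libaom code: `mi_to_plane_samples` and `MI_LOG2` are hypothetical names, and the sketch assumes subsampling shifts of 0 or 1.

```c
#include <assert.h>

/* Assumption for illustration: one mi unit covers 4 luma samples (1 << 2). */
enum { MI_LOG2 = 2 };

/* Convert a length in mi units to samples in a plane whose subsampling
 * shift along that axis is `ss` (0 = not subsampled, 1 = halved). */
static int mi_to_plane_samples(int mi_units, int ss) {
  assert(mi_units >= 0);
  assert(ss == 0 || ss == 1);
  return (mi_units << MI_LOG2) >> ss;
}
```

With a shift of 1, a span of 16 mi units maps to 32 chroma samples instead of the 64 that the old code assumed.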
-
- 20 Nov, 2017 1 commit
-
-
David Barker authored
We currently have two implementations of the same function (aom_memset16() and memset16()), one of which is only defined inside restoration.c. Remove this duplicate, and use the globally defined version instead. Change-Id: I52740541f2e974f505728240127842397f6ef38d
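For context, a 16-bit fill helper of this kind might look like the sketch below. `fill_u16` is a hypothetical name and signature, not the actual `aom_memset16()` declaration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 16-bit memset: writes `val` into `count`
 * consecutive uint16_t slots and returns the destination pointer. */
static uint16_t *fill_u16(uint16_t *dest, uint16_t val, size_t count) {
  assert(dest != NULL || count == 0);
  for (size_t i = 0; i < count; ++i) dest[i] = val;
  return dest;
}
```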
-
- 17 Nov, 2017 1 commit
-
-
David Barker authored
The stripes are intended to extend down to the full decoded height of the frame, which is always a multiple of 8 luma pixels, in order to avoid some nasty edge cases. This change was partially implemented in previous patches, but not everywhere was modified, leading to slightly inconsistent code. This patch finishes making the relevant changes, along with a slight bit of refactoring. Change-Id: Ibc8e2f5ace5415815625edbc224557a7c548c38a
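The rounding described here can be sketched as below. The function names are hypothetical, and the chroma variant assumes the rounded luma height is simply shifted by the vertical subsampling.

```c
#include <assert.h>

/* Round a luma row count up to the next multiple of 8. */
static int round_up_to_8(int luma_rows) {
  assert(luma_rows >= 0);
  return (luma_rows + 7) & ~7;
}

/* Assumed mapping to a plane with vertical subsampling shift ss_y. */
static int decoded_plane_height(int luma_rows, int ss_y) {
  assert(ss_y == 0 || ss_y == 1);
  return round_up_to_8(luma_rows) >> ss_y;
}
```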
-
- 16 Nov, 2017 2 commits
-
-
David Barker authored
A while ago, I calculated some bounds on the intermediate values inside the self-guided filter. These bounds turned out to be not quite correct in one particular instance (when we have a large region of max-value pixels). This caused a variable to overflow a uint32_t when decoding 12-bit streams in the reference decoder, and would force 8/10-bit-only hardware to use wider buffers than intended in order to match the reference code. Fortunately, this can be fixed quite easily, with minimal changes to the filter output. See comments within the patch for the exact details. Also re-instate a Wikipedia link which seems to have gone missing but which provided useful context for the derivation of the bounds. Change-Id: I83d4a277a37eff048af9989cccf19202fafb17b5
-
David Barker authored
* Set up and restore the correct number of left/right boundary pixels at vertical tile edges, and save them in the correct buffers. Also fix the restore process in high-bitdepth mode.
* When loop filtering across tiles is enabled, we were previously acting inconsistently at horizontal tile borders: the stripe just above the boundary would use CDEF pixels from the tile below for context, while the stripe just below would use deblocked pixels from the stripe above. The intended design appears to have been to use CDEF pixels on both sides (so we logically have a 64-pixel high stripe; it's just split into an 8-pixel and a 56-pixel high stripe in order to keep the coefficient sets aligned to tiles). Implement that behaviour by disabling the context setup process when at a horizontal tile border.
* Pull some common calculations out of {setup,restore}_processing_stripe_boundary and into their common caller. This allows us to reduce the number of arguments going into each function and their internal complexity.
* Add more design comments around stripe boundary setup, as there are quite a lot of constraints to be aware of.
Change-Id: Ic1586c149b7f764b9c1a711df3f11fb0f130b38a
-
- 14 Nov, 2017 1 commit
-
-
Rupert Swarbrick authored
The "src_height" computed in save_deblock_boundary_lines didn't match the one in save_tile_row_boundary_lines, which meant that the wrapper function assumed the deblock code was saving some lines and that code thought that save_cdef_boundary_lines would do it. This patch fixes up the logic to match, and also completely gets rid of the lines_to_save variable (after all, bad things would happen if lines_to_save was 1, because we'll still read both boundary lines later). The tile height gets rounded up to a multiple of 8 luma pixels in save_tile_row_boundary_lines to avoid nasty corner cases. This will only have any effect for rows at the bottom of the frame (where av1_get_tile_rect clips to the frame boundary). BUG=aomedia:1020 Change-Id: I55adb53fa8ba9c7f97fb2fd5b328a3f2f5065464
-
- 10 Nov, 2017 2 commits
-
-
Hui Su authored
Change-Id: Ia47101de22091a60d7931890f00a4a3a527f5bd4
-
Rupert Swarbrick authored
The logic (copied from save_deblock_boundary_lines) to work out how many lines to save isn't correct here: we always save exactly one line but duplicate it across multiple rows in the "boundary" buffer. BUG=aomedia:1020 Change-Id: Ib7ce7a777191062f3511d82c8c4eec589c900a2f
-
- 08 Nov, 2017 1 commit
-
-
Yaowu Xu authored
This reverts commit ab8bb8b8. The reverted commit breaks many nightly tests; revert it temporarily so that the nightly runs can detect other failures. Once the issues are fixed, we can re-enable the change from the reverted commit. BUG=aomedia:1012 BUG=aomedia:1013 BUG=aomedia:1014 Change-Id: I2503fe78e47c7a08bb6cfdfff2c295cec0b6497d
-
- 07 Nov, 2017 1 commit
-
-
Rupert Swarbrick authored
As of patch https://aomedia-review.googlesource.com/c/aom/+/28821 , loop-restoration units cannot cross tile borders. But the context around each processing unit was still allowed to cross tile borders. This is fine in the usual case, but when loop filtering across tiles is switched off, we're supposed to be able to decode each tile completely independently (each tile column, if dependent-horztiles is on).
Roughly, the change we need to make is: when loop filtering across tiles is switched off, we treat each tile as if it were a full frame, and extend the CDEF output for that tile to form a 3-pixel border around the tile. We only use deblocked above/below pixels for processing unit boundaries which lie inside a tile.
In terms of the code, this is implemented in two parts. This only applies when the loop_filter_across_tiles_flag is false; otherwise, we keep the old behaviour.
* For processing units at the top edge of a tile, fill the above context with copies of the topmost line of CDEF output *from the same tile*, rather than using deblocked pixels from the tile above. The below context of processing units at the bottom edge of a tile is treated analogously.
* When setting up the boundary for a processing stripe at the left edge of a tile, fill the stripe's left boundary with copies of the leftmost column of CDEF output from the same tile. Again, processing stripes at the right edge of a tile are treated analogously. Similarly to the above/below boundaries, we store the overwritten pixels into a pair of left/right context buffers, and restore them to their original values once we've dealt with that processing stripe.
Change-Id: I53a0932793c1c56dc037683c6a4353a3f5dc4539
-
- 06 Nov, 2017 1 commit
-
-
Dominic Symes authored
max_tile was provisionally adopted at the working group meeting on 2017-Oct-10. This patch also enables support for 64x64 and 128x128 superblock sizes for max tile (rather than assuming 128). There is also one fix for max_tile in combination with loop restoration, where the width/height was in the wrong units for max-tile specific code. Change-Id: Icb862a2738fea5fc6215819396e1afa4eb86e461
-
- 03 Nov, 2017 2 commits
-
-
Rupert Swarbrick authored
The "data8_bl" variable is a uint8_t* and will be scaled up later (with REAL_PTR) if it's pointing to highbd data. Don't scale up the x offset. Change-Id: I03e2ce8861e25e3a603e8f0ba2c8af585e08b9c5
-
Rupert Swarbrick authored
We do this by upscaling the deblocked output as we save it into the RestorationStripeBoundaries line buffers. (See save_boundary_lines in restoration.c for the details) The upscaling is done by calling av1_convolve_horiz_rs, which reads off the edge of the frame and, of course, across tile boundaries. This means we need to extend the frame borders before saving boundary lines (hence the changes to decodeframe.c and encoder.c) Change-Id: Ia096846898b20afe4737433d772f7277d4f71724
-
- 02 Nov, 2017 5 commits
-
-
Rupert Swarbrick authored
Before this patch, striped loop restoration didn't restart correctly on each tile row. Now, the loop restoration stripes start at the top of a tile row in the same way as if it were the top of the entire frame. Change-Id: I0a88a28d7804b2f09d792ecbbf4f22f666f67012
-
David Barker authored
Because we have an (effective) 3-pixel border around each processing unit, and the local sums in the self-guided filter are only taken over at most 5x5 regions, we have 1 pixel's worth of spare border. We can use this border to greatly simplify the filter: instead of calculating a 64x64 region of the A[] and B[] arrays, we can calculate a 66x66 region. Then we don't have to deal with complicated boundary conditions when generating the final 64x64 output block.
This also makes a few other related changes:
* The 'boxnum' function has been effectively redundant for a while. Due to the way we do the 5x5 (or 3x3) windowing, the values we actually use are always (2r+1)^2, so we can skip calling this function if MAX_RADIUS <= 2.
* We can remove the annoying special case for tiny processing units in the self-guided filter, as we no longer have to worry about border behaviour.
* We change the SSE4.1 code to match the new C code, removing a ton of complexity. Further refactoring/speedups are probably now possible, but this includes the minimal changes to pass all the tests.
Change-Id: I99beee164a31349a5228a9bef048e5f35c9639f2
-
Rupert Swarbrick authored
These are just RESTORATION_PROC_UNIT_SIZE shifted right by the vertical or horizontal subsampling for this plane and it's easier not to have to pass them around. Change-Id: I86441d6cd86bb146f3e5dcdf2c89e34dd9fed0e1
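In other words, the per-plane sizes can be derived on demand rather than passed around, roughly as in this sketch. The value 64 for the processing unit size is an assumption based on the 64x64 blocks mentioned elsewhere in this log, and the helper names are hypothetical.

```c
#include <assert.h>

/* Assumed value; stands in for RESTORATION_PROC_UNIT_SIZE. */
enum { PROC_UNIT_SIZE = 64 };

/* Processing unit width in a plane with horizontal subsampling shift ss_x. */
static int procunit_width(int ss_x) {
  assert(ss_x == 0 || ss_x == 1);
  return PROC_UNIT_SIZE >> ss_x;
}

/* Processing unit height in a plane with vertical subsampling shift ss_y. */
static int procunit_height(int ss_y) {
  assert(ss_y == 0 || ss_y == 1);
  return PROC_UNIT_SIZE >> ss_y;
}
```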
-
Rupert Swarbrick authored
Previously we were calling aom_extend_frame_borders to generate extended pixels for use in loop-restoration. This generates quite a large border, when we only need 3 pixels. In addition, we were also calling extend_frame, which does the same thing but with a smaller border, once (in the decoder) or multiple times (in the encoder) per plane. This patch tidies all of this up so that we only call extend_frame once per plane, with the largest border size we need (3px). It also adds two new #defines. RESTORATION_BORDER is the 3 pixel border needed to do filtering for a processing unit. RESTORATION_CTX_VERT is the number of rows saved for each stripe when doing striped loop restoration. Change-Id: I2c3ffcc19808f79db195f76d857e2f23da5d8a84
-
Rupert Swarbrick authored
After this patch, we don't scale sb coordinates vertically when using HORZ_FRAME_SUPERRES. Change-Id: I24c652b4b357b132e8b29979a119e7aeb8420e19
-
- 30 Oct, 2017 2 commits
-
-
Rupert Swarbrick authored
I'd got the scaling backwards. This gets it right and adds a comment explaining the calculation. Change-Id: Ife2913700cc73996c09b702b394832799c449a8c
-
David Barker authored
Remove the special case handling for the topmost/bottommost rows in each processing unit. This causes slightly different effects depending on whether striped-loop-restoration is enabled.
With striped-loop-restoration: Now that we explicitly fill out 3 rows of above/below pixels for each stripe, we don't need to use stepdown_wiener_kernel. Instead, the duplication of the topmost/bottommost pixels accomplishes the same task, while making the code much cleaner. This patch should not cause a change in output, except in a couple of cases which were already questionable. In particular, it fixes bug #953, where the Wiener filter could not handle small processing units (<4 rows high).
Without striped-loop-restoration: The Wiener filter returns to using a full 3 pixels above/below the processing unit. In order to make sure there are enough pixels, we need to expand WIENER_BORDER_VERT to 3 pixels. This will result in a slight change in output, but should be fairly minor.
BUG=aomedia:953 Change-Id: I9530ef55909246f7ba488b7ecfd92d59e776b2f9
-
- 27 Oct, 2017 1 commit
-
-
David Barker authored
Save and restore 3 rows above and below each stripe, instead of 2. The extra rows are filled with duplicates of the outermost context rows. This should not affect the encoder or decoder output in any way, as currently these outer rows are not used. But this will enable later patches to simplify the code and make it a closer match to the way things are described in the striped-loop-restoration design document. Change-Id: I8ae5433e321d6025c6dc1b473330f485f1599340
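Filling the third context row by duplication might look like the sketch below. The buffer layout (three rows per context, ordered top to bottom) and the function names are assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* `above` holds 3 rows ordered top to bottom; rows 1 and 2 (nearest the
 * stripe) already contain saved pixels. Row 0 becomes a copy of row 1. */
static void extend_above_context(uint8_t *above, int stride, int width) {
  assert(above != NULL && width > 0 && width <= stride);
  memcpy(above, above + stride, (size_t)width);
}

/* `below` holds 3 rows ordered top to bottom; rows 0 and 1 (nearest the
 * stripe) already contain saved pixels. Row 2 becomes a copy of row 1. */
static void extend_below_context(uint8_t *below, int stride, int width) {
  assert(below != NULL && width > 0 && width <= stride);
  memcpy(below + 2 * stride, below + stride, (size_t)width);
}
```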
-
- 26 Oct, 2017 1 commit
-
-
Rupert Swarbrick authored
With this patch, restoration units are allocated within each tile as if it were its own image. Arrays of information that need one entry per restoration unit are laid out in tiles, with rsi->units_per_tile units for each tile. Change-Id: I485c17166f33e24d281079b3138b76f98f0fe081
-
- 25 Oct, 2017 2 commits
-
-
Ola Hugosson authored
* The above/below buffers did not fit the extra replication pixels to the right and left.
* The Wiener filter stripe has to be at least 4 pixels high (because of the split into above/mid/below parts).
Change-Id: I360bef114c7ceb439e11b76bd4724af15e051348
-
Rupert Swarbrick authored
This is the last stage in a quest to move all knowledge of the layout of restoration units across the frame into restoration.c. Now this is done, we can change how they are laid out (to split them properly at tile boundaries) without having to change code in any other file. Change-Id: Id5108d787d342f5070580d0e34d84b5ddcc53a86
-
- 24 Oct, 2017 1 commit
-
-
Rupert Swarbrick authored
This patch also does a certain amount of rejigging for loop restoration coefficients, grouping the information for a given restoration unit into a structure called RestorationUnitInfo. The end result is to completely dispense with the RestorationInternal structure. The copy_tile functions in restoration.c, together with those functions that operate on a single stripe, have been changed so that they take pointers to the top-left corner of the area on which they should work, together with a width and height. The same isn't true of av1_loop_restoration_filter_unit, which still takes pointers to the top-left of the tile. This is because you actually need the absolute position in the tile in order to do striped loop restoration properly. Change-Id: I768c182cd15c9b2d6cfabb5ffca697cd2a3ff9e1
-
- 19 Oct, 2017 5 commits
-
-
Rupert Swarbrick authored
This shouldn't change the behaviour at all, but I think the resulting code is slightly easier to read and follow. I've also added copious comments to setup_processing_stripe_boundary to explain exactly what the code is doing. Change-Id: I68adf2d0455b7d87aa04d7e6daa43f4d730c6f80
-
Rupert Swarbrick authored
This refactors the iteration in restoration.c so that all the scary stuff lies in a pair of general functions, filter_frame and filter_rest_unit. filter_frame is currently very simple, iterating over the restoration units in the frame. Once we've made it so that restoration units don't span tile boundaries, this function is the one we'll need to update to iterate over tiles and then restoration units within the tile. filter_rest_unit replaces the outer loop of the loop_*_filter_tile* functions. It deals with chopping the restoration unit into stripes of height procunit_height. When CONFIG_STRIPED_LOOP_RESTORATION is true, it also deals with calling setup_processing_stripe_boundary and restore_processing_stripe_boundary to use boundary data from the deblocked output. Some of the ugly #if/#endif blocks have been elided in the wiener filter code (both low and high bit depth), by defining a convolve alias based on USE_WIENER_HIGH_INTERMEDIATE_PRECISION. There are also changes to extend const-ness for the source frame. I've adopted the convention that the frame input is called "data" (as it was before) while it's non-const. This is true as far as filter_rest_unit. Then each "process one stripe" function takes a const pointer to the source frame, at which point it's called "src". The intention is that, once filter_rest_unit no longer needs a RestorationInternal pointer, this function can be exposed in restoration.h and can be used by pickrst.c Change-Id: I18043a172ef0ca1154d87cf7f63e3a80944627cd
-
Rupert Swarbrick authored
This flag comes from the loop filter's speed features and (I think) tells the encoder to make decisions about the filter by looking at a narrow strip in the middle of the frame. That's reasonable enough, but doesn't make any sense for loop restoration, where we were calling av1_loop_restoration_frame from pickrst.c in order to calculate what restoration parameters to use for a given restoration unit (which might not be in the narrow strip in the middle!) As it turns out, the LPF_PICK_FROM_SUBIMAGE method is never actually signalled in the reference encoder, which is presumably why we haven't spotted this before. Change-Id: I745e2eab873c0b33920caca40e338af9d078d25e
-
Rupert Swarbrick authored
The bits needed by striped loop restoration are now in RestorationInfo (which also gets rid of a rather ugly extra index). The scratch buffer that's used for self-guided restoration has been moved up to its own variable (rst_tmpbuf). All the rest of the fields are now safely hidden inside restoration.c This patch also does a big cleanup of the initialisation code in loop_restoration_rows: it doesn't need to be as repetitive now that the fields of YV12_BUFFER_CONFIG can be accessed by plane index. Change-Id: Iba7edc0f94041fa053cdeb3d6cf35d84a05dbfaf
-
Rupert Swarbrick authored
Restoration units are a fixed square size (in cm->rst_info[plane]) for almost the entire image. The only special case is for tiles at the right hand edge or the bottom row, which might expand or be cropped. The av1_get_rest_ntiles function was implementing the cropping behaviour when the image happened to be less than one restoration unit wide or high (but not the expansion behaviour), but the result was never useful: if you want to get the size of a restoration tile in order to divide by it to work out what tile you're on, the fixed square size is what you want. If you need to know how big this particular tile is, call av1_get_rest_tile_limits. As well as removing the output arguments from av1_get_rest_tile_limits, this patch also removes the tile_width and tile_height fields from the RestorationInternal structure. Note that the tile size which is what you actually need is accessible as rst->rsi->restoration_tilesize. (In practice, these were almost always the same anyway). This patch also has a couple of other small cleanups. Firstly, it moves the subsampling_y field out of CONFIG_STRIPED_LOOP_RESTORATION. It's not actually needed when you're not doing striped loop restoration, but this gets rid of lots of horrible #if/#endif lines at callsites for av1_get_rest_tile_limits. Secondly, it simplifies the code in init_rest_search_ctxt (and fixes some tautologous assertions). Now that YV12_BUFFER_CONFIG has a more uniform layout, there's a simpler way to set things up, so we use that. Change-Id: I3c32d8ea0abe119dc86b9efa7564b27dde2151dc
-
- 11 Oct, 2017 1 commit
-
-
Debargha Mukherjee authored
This makes sure that all eps values are at least 4 since otherwise the computation will not fit within 32 bits. BUG=aomedia:893 Change-Id: I4815a865be8db792d0481172a2dfa0bc0a817f73
-
- 10 Oct, 2017 1 commit
-
-
Rupert Swarbrick authored
This was getting the wrong count of restoration tiles for chroma planes with subsampling and a smaller restoration tilesize. BUG=aomedia:886 Change-Id: I5c9c17ed4ad91111bcc6fa6205a9550b53f84a64
-
- 07 Oct, 2017 1 commit
-
-
Urvang Joshi authored
Earlier, the superres scale was in the form of: N/16, where N ranged from 8 to 16. We change this to the form: 8/D, where D ranges from 8 to 16. This helps on the decoder side, by making it possible to work on 8x8 blocks at a time. Change-Id: I6c72d4b3e8d1c830e61d4bb8d7f6337a100c3064
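As a worked illustration of the 8/D form, the downscaled width could be computed as in this sketch. The function name is hypothetical and the round-to-nearest step is an assumption; only the numerator 8 and the range of D come from the message above.

```c
#include <assert.h>

enum { SCALE_NUM = 8 }; /* fixed numerator of the 8/D scale */

/* Scale `width` by 8/denom, rounding to nearest (assumed rounding rule). */
static int superres_scaled_width(int width, int denom) {
  assert(width > 0);
  assert(denom >= 8 && denom <= 16);
  return (width * SCALE_NUM + denom / 2) / denom;
}
```

With D = 8 the width is unchanged, and with D = 16 a 1920-wide frame is coded at 960 samples wide.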
-
- 28 Sep, 2017 1 commit
-
-
Ola Hugosson authored
This experiment offsets the filter tile grid 8 pixels upwards. Deblocked pixels (rather than CDEFed pixels) are used for the 2 lines above and below the filter processing unit. The 8-pixel offset is the offset produced by deblock/cdef. This way, loop restoration does not need additional line buffers in a single-pass hardware implementation. Change-Id: I89e0831dc28413a5d3e02d7a426ce2885ab629d7
-
- 26 Sep, 2017 1 commit
-
-
Rupert Swarbrick authored
The subtile and clamping features are no longer used. This patch removes the dead code that implemented them and the parameters to support them. It also changes the return format. Instead of having return type void and passing data out through 4 output pointers, the function now just returns a RestorationTileLimits structure. Since the function is defined inline in a header, I suspect that most callsites will actually compile to identical code. There should be no functional change from this patch. Change-Id: I6ebc4da66a00676bd988f939a4b4957f743e8004
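The change of return format can be illustrated with the sketch below. It is a simplified stand-in, not the real function: `TileLimits` and `get_tile_limits` are hypothetical, and the sketch only crops tiles to the image, ignoring any expansion of edge tiles.

```c
#include <assert.h>

/* Hypothetical result struct replacing four separate output pointers. */
typedef struct {
  int h_start, h_end, v_start, v_end;
} TileLimits;

static int min_int(int a, int b) { return a < b ? a : b; }

/* Return the pixel limits of square tile `tile_idx` in a raster-ordered
 * grid, cropped to the image dimensions. */
static TileLimits get_tile_limits(int tile_idx, int tiles_per_row,
                                  int tile_size, int img_width,
                                  int img_height) {
  assert(tile_idx >= 0 && tiles_per_row > 0 && tile_size > 0);
  const int row = tile_idx / tiles_per_row;
  const int col = tile_idx % tiles_per_row;
  TileLimits lim;
  lim.h_start = col * tile_size;
  lim.h_end = min_int(lim.h_start + tile_size, img_width);
  lim.v_start = row * tile_size;
  lim.v_end = min_int(lim.v_start + tile_size, img_height);
  return lim;
}
```

Returning the struct by value keeps callsites short, since a caller reads `lim.h_end` directly instead of declaring four locals and passing their addresses.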
-
- 10 Sep, 2017 2 commits
-
-
Debargha Mukherjee authored
Includes miscellaneous cleanups, test fixes, and code reorganization for loop-restoration components. Change-Id: I5b2e6419234d945e6f4344b22636119b50df4054
-
Debargha Mukherjee authored
This patch forces the vertical filtering for the top and bottom rows of a processing unit for the Wiener filter to use no more border than is set in the WIENER_BORDER_VERT macro. This macro is currently set to 0 to eliminate the line buffer completely, but it could be increased to 1 or 2 to use limited line buffers if coding efficiency suffers too much with no line buffer. Also, for the sgr filter we added the option of using overlapping windows horizontally and vertically to improve coding efficiency. The vertical border used is set by the SGRPROJ_BORDER_VERT macro, while the horizontal border is set by the SGRPROJ_BORDER_HORZ macro, currently 2, the maximum needed. Currently we do not recommend changing SGRPROJ_BORDER_HORZ below 2. The overall line buffer requirement for LR is twice the max of WIENER_BORDER_VERT and SGRPROJ_BORDER_VERT. Currently both are set to 0, eliminating line buffers completely. Also, this patch extends borders consistently before CDEF / LR. Change-Id: Ie58a98c784a0db547627b9cfcf55f018c30e8e79
-
- 07 Sep, 2017 1 commit
-
-
Debargha Mukherjee authored
This patch forces the vertical filtering for the top and bottom rows of a processing unit for the Wiener filter to be 5-tap. The 5 taps are derived from the primary 7-tap filter by forcing the taps at the ends to be zero, and absorbing their weights into the other taps to maintain normalization. This will effectively reduce the line buffer size for the luma Wiener filter to 4 (from 6). Change-Id: I5e21b58369777eabf553a8987387d112f98a5598
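One way to derive such a 5-tap kernel is sketched below. The commit message does not say which taps absorb the end weights, so folding each end tap into its nearest neighbour is an assumption, as are the function name and the example coefficients used to exercise it.

```c
#include <assert.h>
#include <stdint.h>

enum { TAPS = 7 };

/* Copy a 7-tap kernel, zero the two end taps, and fold their weights into
 * the adjacent taps so that the sum of all taps is unchanged. The folding
 * rule is an assumption made for this sketch. */
static void fold_to_five_taps(const int16_t *in, int16_t *out) {
  assert(in && out);
  for (int i = 0; i < TAPS; ++i) out[i] = in[i];
  out[1] = (int16_t)(out[1] + out[0]);
  out[TAPS - 2] = (int16_t)(out[TAPS - 2] + out[TAPS - 1]);
  out[0] = 0;
  out[TAPS - 1] = 0;
}
```

Because the tap sum is preserved, the filter's DC gain stays the same while the vertical footprint shrinks by one row on each side.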
-