David Barker authored
Because we have an (effective) 3-pixel border around each processing
unit, and the local sums in the self-guided filter are only taken over
at most 5x5 regions, we have 1 pixel's worth of spare border.

We can use this border to greatly simplify the filter: Instead of
calculating a 64x64 region of the A[] and B[] arrays, we can calculate
a 66x66 region. Then we don't have to deal with complicated boundary
conditions when generating the final 64x64 output block.

This also makes a few other related changes:

* The 'boxnum' function has been effectively redundant for a while -
  due to the way we do the 5x5 (or 3x3) windowing, the values we
  actually use are always (2r+1)^2. So we can skip calling this
  function if MAX_RADIUS <= 2

* We can remove the annoying special case for tiny processing units
  in the self-guided filter, as we no longer have to worry about
  border behaviour

* We change the SSE4.1 code to match the new C code, removing a ton
  of complexity. Further refactoring/speedups are probably now
  possible, but this includes the minimal changes to pass all the
  tests.

Change-Id: I99beee164a31349a5228a9bef048e5f35c9639f2
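
The border arithmetic above can be illustrated with a minimal sketch. It is not the code from this change: the names UNIT, BORDER, EXT, SRC_STRIDE, window_count, box_sum and box_sums_ext are hypothetical, and a plain box sum stands in for the real A[]/B[] computation. It assumes a 64x64 unit stored with a 3-pixel border on every side, and shows that with radius <= 2 every window over the 66x66 region stays inside the padded buffer, so the window population is always (2r+1)^2.

```c
#include <assert.h>
#include <stdint.h>

/* All names below are hypothetical, chosen for illustration only. */
#define UNIT 64                        /* processing-unit size */
#define BORDER 3                       /* effective border on each side */
#define MAX_RADIUS 2                   /* 5x5 window => radius 2 */
#define EXT (UNIT + 2)                 /* 66: one extra pixel per side */
#define SRC_STRIDE (UNIT + 2 * BORDER) /* 70: padded buffer width */

/* Pixel count of a full window of radius r: (2r+1)^2. */
static int window_count(int r) { return (2 * r + 1) * (2 * r + 1); }

/* Sum the (2r+1)x(2r+1) window centred on (x, y). Coordinates are
 * relative to the unit, so (0, 0) is its top-left pixel; src points at
 * the top-left of the padded buffer. */
static int32_t box_sum(const uint8_t *src, int x, int y, int r) {
  int32_t sum = 0;
  for (int dy = -r; dy <= r; ++dy)
    for (int dx = -r; dx <= r; ++dx)
      sum += src[(y + BORDER + dy) * SRC_STRIDE + (x + BORDER + dx)];
  return sum;
}

/* Fill an EXT x EXT array of box sums covering unit coordinates
 * [-1, UNIT] in both directions. The outermost access is at offset
 * -1 - r >= -BORDER, so no boundary special-casing is needed. */
static void box_sums_ext(const uint8_t *src, int r, int32_t *out) {
  assert(r >= 1 && r <= MAX_RADIUS);
  for (int y = -1; y <= UNIT; ++y)
    for (int x = -1; x <= UNIT; ++x)
      out[(y + 1) * EXT + (x + 1)] = box_sum(src, x, y, r);
}
```

With these assumed sizes, the corner entry of the 66x66 array at radius 2 reads padded-buffer rows and columns 0..4, and the opposite corner reads 65..69, which is exactly the extent of the 70x70 buffer.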
369d8f22