Commit ab29978e, committed by Debargha Mukherjee
Pre-compute and use contiguous wedge masks.
This is purely a refactoring patch and has no functional effect.

Uses of these masks can be arranged so that all input blocks are contiguous in memory (stride == block width). In this case, 1D versions of the operations can be used. 1D vector operations outperform their 2D block equivalents because they are more cache friendly and avoid the overhead of a second loop.

Change-Id: I2b76c9888aea2c857cc497e8a4b2841fd3dad54e
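To illustrate the stride == block width point, here is a minimal sketch of a masked blend in both 2D and 1D form. It is not the code from this patch: the names blend_px, blend_2d, blend_1d and the 6-bit mask range (0..64) are assumptions chosen for the example. When every buffer is contiguous, the two forms should produce identical output, and the 1D form needs only a single loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed for illustration: mask weights range over 0..64 (6 bits). */
#define MASK_BITS 6
#define MASK_MAX (1 << MASK_BITS)

/* Weighted average of a and b with rounding; m is the weight of a. */
static uint8_t blend_px(uint8_t a, uint8_t b, uint8_t m) {
  return (uint8_t)((m * a + (MASK_MAX - m) * b + (MASK_MAX >> 1)) >> MASK_BITS);
}

/* General 2D form: each buffer has its own stride, so two nested loops
 * and a row-offset computation per buffer are required. */
void blend_2d(uint8_t *dst, size_t dst_stride,
              const uint8_t *a, size_t a_stride,
              const uint8_t *b, size_t b_stride,
              const uint8_t *mask, size_t mask_stride,
              size_t w, size_t h) {
  for (size_t r = 0; r < h; ++r) {
    for (size_t c = 0; c < w; ++c) {
      dst[r * dst_stride + c] = blend_px(a[r * a_stride + c],
                                         b[r * b_stride + c],
                                         mask[r * mask_stride + c]);
    }
  }
}

/* 1D form: valid only when every buffer is contiguous (stride == w),
 * so the whole block is a flat run of n = w * h samples. */
void blend_1d(uint8_t *dst, const uint8_t *a, const uint8_t *b,
              const uint8_t *mask, size_t n) {
  assert(dst && a && b && mask);
  for (size_t i = 0; i < n; ++i) {
    dst[i] = blend_px(a[i], b[i], mask[i]);
  }
}
```

For a 4x2 block stored contiguously, blend_2d(dst, 4, a, 4, b, 4, mask, 4, 4, 2) and blend_1d(dst, a, b, mask, 8) compute the same result. The flat loop is also typically easier for a compiler or hand-written SIMD to vectorize, since there is no per-row edge to handle.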
Showing with 137 additions and 74 deletions