-
Johann Koenig authored
The data processed by the loopfilter overlaps. At the block level, this results in some redundant transforms. Grouping the filtering allows for a single 16x16 transpose (and inversion) instead of three 16x8 transposes (and three more inversions). This implementation is x86_64 only. We retain the previous implementation for x86. Improvements are obviously material dependant, but it seems to be ~%1 in tests here. Change-Id: I467b7ec3655be98fb5f1a94b5d145e5e5a660007
3556deac