Timothy B. Terriberry authored
These don't share the same kernel functions as the others, both so that we can avoid doing two transposes for the rows and because we don't need to split short rows into multiple registers for the columns.

The resulting IDTX implementations can be re-used for all sizes, though we might benefit from the larger AVX registers for the larger sizes. It might also be worth having a fast path for IDTX_IDTX to avoid an extra round-trip through memory, but that can be added in a separate patch if it proves worthwhile.

Change-Id: I36fa4ea44c7dd2c165bff750d9bc8a213783041f
f03f543d