The end-of-buffer check is hoisted out of the inner loop. This gives
about a 0.5% improvement on x86_64.

Change-Id: I8e3ed08af7d33468c5c749af36c2dfa19677f971
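
For readers unfamiliar with the pattern, here is a minimal sketch of what hoisting an end-of-buffer check can look like. It is not the code touched by this change: the Reader struct and both function names are hypothetical and exist only for illustration. The first function tests for the end of the buffer on every iteration; the second clamps the iteration count once before the loop, so the loop body carries no bounds test.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte reader, used only for this illustration. */
typedef struct {
  const uint8_t *cur; /* next byte to read */
  const uint8_t *end; /* one past the last valid byte */
} Reader;

/* Baseline: the end-of-buffer check runs on every iteration. */
static size_t sum_bytes_checked(Reader *r, size_t count, uint32_t *sum) {
  size_t i;
  for (i = 0; i < count; ++i) {
    if (r->cur == r->end) break; /* per-iteration check */
    *sum += *r->cur++;
  }
  return i; /* number of bytes actually consumed */
}

/* Hoisted: clamp the count once, then run the loop with no check inside. */
static size_t sum_bytes_hoisted(Reader *r, size_t count, uint32_t *sum) {
  const size_t avail = (size_t)(r->end - r->cur);
  const size_t n = (count < avail) ? count : avail;
  size_t i;
  for (i = 0; i < n; ++i) {
    *sum += r->cur[i];
  }
  r->cur += n;
  return n; /* number of bytes actually consumed */
}
```

Both functions consume the same bytes and return the same count. Whether the hoisted form is measurably faster depends on the compiler and target, so any gain should be confirmed by benchmarking.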