Increase chrow row alignment to 16 bytes.
This is done by expanding luma row to 32-byte alignment, since there is currently a bunch of code that assumes that uv_stride == y_stride/2 (see, for example, vp8/common/postproc.c, common/reconinter.c, common/arm/neon/recon16x16mb_neon.asm, encoder/temporal_filter.c, and possibly others; I haven't done a full audit). It also uses replaces the hardcoded border of 16 in a number of encoder buffers with VP8BORDERINPIXELS (currently 32), as the chroma rows start at an offset of border/2. Together, these two changes have the nice advantage that simply dumping the frame memory as a contiguous blob produces a valid, if padded, image. Change-Id: Iaf5ea722ae5c82d5daa50f6e2dade9de753f1003