Make get_coef_context() branchless.
This should significantly speedup cost_coeffs(). Basically what the patch does is to make the neighbour arrays padded by one item to prevent an eob check in get_coef_context(), then it populates each col/row scan and left/top edge coefficient with two times the same neighbour - this prevents a single/double context branch in get_coef_context(). Lastly, it populates neighbour arrays in pixel order (rather than scan order), so we don't have to dereference the scantable to get the correct neighbours. Total encoding time of first 50 frames of bus (speed 0) at 1500kbps goes from 2min10.1 to 2min5.3, i.e. a 2.6% overall speed increase. Change-Id: I42bcd2210fd7bec03767ef0e2945a665b851df56