    Restructure multi-threaded decoder · f857a850
    Yunqing Wang authored
    Loop filtering is now done on each MB right after that MB is
    decoded. This combines the two loops in the multi-threaded code
    into one, which halves the number of synchronizations.
    The above-row and left-col data are saved in temp buffers for
    decoding the next row and the next MB.
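    A minimal sketch of the combined loop, not the actual decoder code.
    It assumes the next row/MB needs the pre-filter pixels, so the
    bottom row and right column of each MB are copied out before the
    filter overwrites them. decode_mb, loop_filter_mb, decode_row and
    the toy sizes are hypothetical stand-ins, and thread
    synchronization is omitted.

```c
#include <string.h>

#define MB   4            /* toy macroblock size (hypothetical) */
#define COLS 2            /* macroblocks per row in this toy frame */
#define W    (MB * COLS)  /* frame width, also used as the stride */

/* Hypothetical stand-in for MB decoding: fills the MB with one value. */
static void decode_mb(unsigned char *frame, int stride, int mb_col,
                      unsigned char v) {
  for (int y = 0; y < MB; y++)
    memset(frame + y * stride + mb_col * MB, v, MB);
}

/* Hypothetical stand-in for the loop filter: halves every pixel in place. */
static void loop_filter_mb(unsigned char *frame, int stride, int mb_col) {
  for (int y = 0; y < MB; y++)
    for (int x = 0; x < MB; x++)
      frame[y * stride + mb_col * MB + x] /= 2;
}

/* One pass over an MB row: decode, save borders, then filter. */
static void decode_row(unsigned char *frame, int stride,
                       unsigned char *above_row, /* W bytes */
                       unsigned char *left_col)  /* MB bytes */ {
  for (int mb_col = 0; mb_col < COLS; mb_col++) {
    unsigned char *mb = frame + mb_col * MB;

    decode_mb(frame, stride, mb_col, (unsigned char)(100 + 20 * mb_col));

    /* Save the unfiltered bottom row for the MB row below. */
    memcpy(above_row + mb_col * MB, mb + (MB - 1) * stride, MB);

    /* Save the unfiltered right column for the next MB in this row. */
    for (int y = 0; y < MB; y++)
      left_col[y] = mb[y * stride + (MB - 1)];

    /* Filter in place right away instead of in a second loop. */
    loop_filter_mb(frame, stride, mb_col);
  }
}
```

    After decode_row returns, the frame holds filtered pixels while
    above_row and left_col still hold the values from before filtering.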
    Tests on a 4-core gLucid machine showed a 10% decoder performance
    gain with threads=4 (tulip clip). Other platforms have not been
    tested yet.
    Change-Id: Id18ea7c1e84965dabea65d4c01ca5bc056ddeac9