    vp9 decoder: row-based multi-threaded loopfilter · 903801f1
    Yunqing Wang authored
    Implemented parallel loop filtering that reuses the existing
    tile-decoding threads. Each thread filters one row at a time and,
    when that row is done, moves on to the next unclaimed row. To
    preserve the correct filtering order, the threads are synchronized:
    a superblock is filtered only after the superblocks it depends on
    have been filtered.
    To reduce synchronization overhead and speed up the decoder,
    threads synchronize only once every nsync superblocks, and we use
    nsync > 1 for high-resolution video.
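
    The scheme described above can be sketched roughly as follows. This
    is a minimal standalone illustration, not the code in this commit:
    the names (RowSync, wait_for_above, publish, lf_worker,
    run_loopfilter), the grid size, and the dependency rule (a
    superblock needs the above and above-right superblocks to be
    finished) are all assumptions made for the example. The "filter" is
    a stand-in that only marks a superblock as done.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical sizes, chosen only for this example. */
enum { SB_ROWS = 16, SB_COLS = 32, NUM_THREADS = 4, NSYNC = 4 };

typedef struct {
  pthread_mutex_t mutex;
  pthread_cond_t cond;
  int done_cols; /* published number of filtered superblocks in this row */
} RowSync;

static RowSync row_sync[SB_ROWS];
static int filtered[SB_ROWS][SB_COLS];
static int next_row; /* next unclaimed row, guarded by next_row_mutex */
static pthread_mutex_t next_row_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Block until the row above r has published at least `needed` columns. */
static void wait_for_above(int r, int needed) {
  RowSync *above = &row_sync[r - 1];
  pthread_mutex_lock(&above->mutex);
  while (above->done_cols < needed)
    pthread_cond_wait(&above->cond, &above->mutex);
  pthread_mutex_unlock(&above->mutex);
}

/* Publish progress of row r and wake any thread waiting on it. */
static void publish(int r, int done) {
  RowSync *self = &row_sync[r];
  pthread_mutex_lock(&self->mutex);
  self->done_cols = done;
  pthread_cond_broadcast(&self->cond);
  pthread_mutex_unlock(&self->mutex);
}

static void *lf_worker(void *arg) {
  (void)arg;
  for (;;) {
    /* Claim the next unclaimed row. */
    pthread_mutex_lock(&next_row_mutex);
    int r = next_row++;
    pthread_mutex_unlock(&next_row_mutex);
    if (r >= SB_ROWS) return NULL;

    for (int c = 0; c < SB_COLS; ++c) {
      /* Check the row above only once per NSYNC superblocks, and wait
       * until it is far enough ahead to cover the whole batch. */
      if (r > 0 && c % NSYNC == 0) {
        int needed = c + NSYNC + 1;
        wait_for_above(r, needed < SB_COLS ? needed : SB_COLS);
      }
      /* Assumed dependency: the above-right superblock must be done. */
      if (r > 0)
        assert(filtered[r - 1][c + 1 < SB_COLS ? c + 1 : SB_COLS - 1]);
      filtered[r][c] = 1; /* stand-in for the real loop filter */
      /* Signal only at batch boundaries and at the end of the row. */
      if ((c + 1) % NSYNC == 0 || c == SB_COLS - 1) publish(r, c + 1);
    }
  }
}

/* Run the workers over the whole grid; return the filtered count. */
static int run_loopfilter(void) {
  pthread_t threads[NUM_THREADS];
  int count = 0;
  next_row = 0;
  memset(filtered, 0, sizeof(filtered));
  for (int r = 0; r < SB_ROWS; ++r) {
    pthread_mutex_init(&row_sync[r].mutex, NULL);
    pthread_cond_init(&row_sync[r].cond, NULL);
    row_sync[r].done_cols = 0;
  }
  for (int t = 0; t < NUM_THREADS; ++t)
    pthread_create(&threads[t], NULL, lf_worker, NULL);
  for (int t = 0; t < NUM_THREADS; ++t) pthread_join(threads[t], NULL);
  for (int r = 0; r < SB_ROWS; ++r) {
    for (int c = 0; c < SB_COLS; ++c) count += filtered[r][c];
    pthread_cond_destroy(&row_sync[r].cond);
    pthread_mutex_destroy(&row_sync[r].mutex);
  }
  return count;
}
```

    In this sketch a larger NSYNC means fewer lock/wait/signal
    operations per row, at the cost of each row having to trail the row
    above it by a wider margin. Build with -pthread.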
    Performance tests:
    1. on desktop:
    8-tile 4k video using 8 threads, speedup: 70% - 80%
    4-tile HD video using 4 threads, speedup: ~35%
    2. on mobile device (Nexus 7):
    4-tile 1080p video using 4 threads, speedup: 18% - 25%
    4-tile 1080p video using 2 threads, speedup: 10% - 15%
