    Preload reference area to an intermediate buffer in sub-pixel motion search · 20bd1446
    Yunqing Wang authored
    In sub-pixel motion search, the search range is small (+/- 3 pixels).
    Preload whole search area from reference buffer into a 32-byte
    aligned buffer. Then in search, load reference data from this buffer
    instead. This keeps the data in cache and reduces the penalty for
    accesses that cross cache lines. For the tulip clip, tests on an Intel
    Core2 Quad machine (Linux) showed these encoder speed improvements:
      3.4%   at --rt --cpu-used=-4
      2.8%   at --rt --cpu-used=-3
      2.3%   at --rt --cpu-used=-2
      2.2%   at --rt --cpu-used=-1
    A test on an Atom notebook showed only a 1.1% speed improvement
    (speed=-4). A test on a Xeon machine also showed less improvement,
    since unaligned data access latency is greatly reduced in newer cores.
    Next, I will apply a similar idea to the other 2 sub-pixel search
    functions, for encoding speed > 4.
    Make this change exclusively for x86 platforms.
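
The preload step described above can be illustrated with a minimal C sketch. This is not the code from this change: `PreloadBuf`, `preload_ref_area`, the 16x16 block size, and the 32-byte row pitch are all hypothetical names and values chosen for illustration, and the sketch assumes the caller guarantees that 3 pixels of valid reference data exist on every side of the block (no clamping at frame edges is shown).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 16 /* assumed block width/height (hypothetical) */
#define SEARCH_PAD 3  /* +/- 3 pixel sub-pixel search range */
#define BUF_STRIDE 32 /* assumed row pitch of the intermediate buffer */
#define BUF_COLS (BLOCK_SIZE + 2 * SEARCH_PAD)
#define BUF_ROWS (BLOCK_SIZE + 2 * SEARCH_PAD)

/* Hypothetical intermediate buffer; storage is aligned to 32 bytes (C11). */
typedef struct {
  _Alignas(32) uint8_t data[BUF_ROWS * BUF_STRIDE];
} PreloadBuf;

/*
 * Copy the block at `ref` plus a SEARCH_PAD border on every side into
 * `buf`, one row at a time. Returns a pointer into `buf` that corresponds
 * to `ref`, so the search can index it with BUF_STRIDE instead of
 * `ref_stride`.
 * Assumption: the padded area lies entirely inside the reference frame.
 */
static const uint8_t *preload_ref_area(const uint8_t *ref, int ref_stride,
                                       PreloadBuf *buf) {
  assert(ref != NULL && buf != NULL);
  assert(ref_stride >= BUF_COLS);

  /* Top-left corner of the padded search area in the reference frame. */
  const uint8_t *src = ref - SEARCH_PAD * ref_stride - SEARCH_PAD;

  for (int r = 0; r < BUF_ROWS; ++r) {
    memcpy(buf->data + r * BUF_STRIDE, src + r * ref_stride, BUF_COLS);
  }

  /* Position in the buffer that maps to the original `ref` pointer. */
  return buf->data + SEARCH_PAD * BUF_STRIDE + SEARCH_PAD;
}
```

With this layout, each row of the search area starts on a 32-byte boundary, and the search reads from a small contiguous buffer rather than from rows spread across the full frame stride.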
    Change-Id: Ia7bb9f56169eac0f01009fe2b2f2ab5b61d2eb2f