  1. 22 Nov, 2013 1 commit
    • Do vertical loopfiltering in parallel · ed36720b
      Yunqing Wang authored
      This patch follows the "Add filter_selectively_vert_row2 to enable
      parallel loopfiltering" commit and adds an x86 SSE2 optimization
      that filters 16 pixels in parallel. For the other optimizations
      (neon and dspr2), the current 16-pixel functions are implemented by
      calling the 8-pixel functions twice; real 16-pixel functions can be
      added later.
      
      Decoder speedup:
      tulip clip:     2% speed gain;
      old_town_cross: 1.2% speed gain;
      bus:            2% speed gain.
      
      Change-Id: I4818a0c72f84b34f5fe678e496cf4a10238574b7
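The fallback described in the commit above (a 16-pixel function built from two 8-pixel calls) can be sketched as follows. This is only an illustration under stated assumptions: `toy_vertical_edge_8` and `toy_vertical_edge_16` are hypothetical names, and the smoothing rule is a toy stand-in, not the VP9 loop filter or the libvpx API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical 8-row filter for a vertical edge (toy rule, not the VP9 one).
 * `s` points at the first pixel to the right of the edge; `pitch` is the
 * row stride in bytes. Only the pixel pair next to the edge is touched. */
static void toy_vertical_edge_8(uint8_t *s, int pitch, int thresh) {
  for (int r = 0; r < 8; ++r) {
    uint8_t *row = s + r * pitch;
    const int p0 = row[-1];
    const int q0 = row[0];
    /* Smooth only when the step across the edge is small enough. */
    if (abs(p0 - q0) <= thresh) {
      const int avg = (p0 + q0 + 1) >> 1;
      row[-1] = (uint8_t)((p0 + avg + 1) >> 1);
      row[0] = (uint8_t)((q0 + avg + 1) >> 1);
    }
  }
}

/* 16-row variant written as two 8-row calls, which is the fallback shape
 * the commit message describes for the non-SSE2 paths. */
static void toy_vertical_edge_16(uint8_t *s, int pitch, int thresh) {
  assert(pitch > 0);
  toy_vertical_edge_8(s, pitch, thresh);             /* rows 0..7  */
  toy_vertical_edge_8(s + 8 * pitch, pitch, thresh); /* rows 8..15 */
}
```

A real 16-pixel function would replace the two calls with a single pass over all 16 rows, which is what the commit message says the SSE2 path does.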
  2. 21 Nov, 2013 3 commits
  3. 20 Nov, 2013 7 commits
  4. 19 Nov, 2013 10 commits
  5. 18 Nov, 2013 1 commit
  6. 17 Nov, 2013 1 commit
  7. 16 Nov, 2013 1 commit
    • Do horizontal loopfiltering in parallel · 64f728ca
      Yunqing Wang authored
      This patch follows the "Rewrite filter_selectively_horiz for parallel
      loopfiltering" commit and adds an x86 SSE2 optimization that filters
      16 pixels in parallel. It also corrects the declaration of aligned
      arrays, improves the calculation of the masks and filters for the
      8-pixel-in-parallel case, updates the threshold loading since the
      thresholds are already duplicated, and updates the neon C functions
      to call the neon loopfilters twice.
      
      Tests on the tulip clip showed a ~1.5% decoder speed gain.
      
      Change-Id: Id02638626ac27a4b0e0b09d71792a24c0499bd35
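To illustrate what "16-pixel filtering in parallel" can look like with SSE2, here is a minimal sketch of a threshold mask computed for 16 pixel pairs in one 128-bit register. It assumes an x86 target with SSE2; `edge_mask_16` is a hypothetical helper written for this example and is not taken from the commit. The byte-wise absolute difference via two saturating subtractions is a common SSE2 idiom, and the threshold is broadcast to all 16 lanes so a single comparison covers every pixel.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: for 16 pixel pairs at once, write 0xFF to out[i]
 * when |p[i] - q[i]| > thresh and 0x00 otherwise. */
static void edge_mask_16(const uint8_t *p, const uint8_t *q, uint8_t thresh,
                         uint8_t *out) {
  const __m128i vp = _mm_loadu_si128((const __m128i *)p);
  const __m128i vq = _mm_loadu_si128((const __m128i *)q);
  /* Threshold duplicated into all 16 byte lanes. */
  const __m128i vt = _mm_set1_epi8((char)thresh);
  const __m128i zero = _mm_setzero_si128();

  /* Unsigned absolute difference: one of the two saturating subtractions
   * is zero, the other holds |p - q|. */
  const __m128i diff =
      _mm_or_si128(_mm_subs_epu8(vp, vq), _mm_subs_epu8(vq, vp));

  /* Saturating (diff - thresh) is nonzero only where diff > thresh. */
  const __m128i over = _mm_subs_epu8(diff, vt);

  /* cmpeq yields 0xFF where diff <= thresh; invert to flag diff > thresh. */
  const __m128i within = _mm_cmpeq_epi8(over, zero);
  const __m128i mask = _mm_xor_si128(within, _mm_set1_epi8((char)-1));

  _mm_storeu_si128((__m128i *)out, mask);
}
```

An 8-pixel version of the same idea would leave half of the register unused, which is one reason a 16-wide path can be faster than two 8-wide calls.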
  8. 15 Nov, 2013 5 commits
  9. 14 Nov, 2013 6 commits
  10. 13 Nov, 2013 3 commits
  11. 12 Nov, 2013 2 commits