  1. Sep 28, 2010
    • Merge "update gitignore" · e4d43c21
      Johann Koenig authored
      e4d43c21
    • update gitignore · 6fa5c24a
      Johann Koenig authored
      the old rules excluded all .asm files when they should only have
      excluded .asm files in the top level directory and .asm.s files
      lower down. also be more restrictive on some other items, and run
      the whole thing through sort to keep it organized
      
      Change-Id: Ia48525033226b13098a491ce89465d0377b990c2
      6fa5c24a
    • Add 4-tap version of 2nd-pass ARMv6 MC filter. · 18dc92fd
      Timothy B. Terriberry authored
      The existing code applied a 6-tap filter with 0's on either end.
      We're already paying the branch penalty to avoid computing the two
       extra columns needed as input to this filter.
      We might as well save time computing the filter as well.
      This reduces the inner loop from 21 instructions to 16, the number
       of loads per iteration from 4 to 1, and the number of multiplies
       from 7 to 4.
      The gain in overall decoding performance, however, is small (less
       than 1%).
      
      This change also means we are now valgrind clean on ARMv6, which
       is its real purpose.
      The errors reported here were valgrind's fault (it does not detect
       that 0 times an uninitialized value is initialized), but Julian
       Seward says it would slow down valgrind considerably to make such
       checks.
      Speeding up libvpx instead, even by a small amount, seems a much
       better idea, if only to enable proper valgrind checking of the
       rest of the codec.
      
      Change-Id: Ifb376ea195e086b60f61daf1097d8910c4d8ff16
      18dc92fd
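The equivalence the commit above relies on can be sketched in portable C. This is not the ARMv6 assembly from the change. The tap values, the rounding constant, the 7-bit shift, and all function names are assumptions chosen for illustration. The point is only that a 6-tap kernel whose outer taps are zero produces the same output as a 4-tap kernel over the inner samples, and that the 4-tap form never reads the two outer samples at all.

```c
#include <stdint.h>

/* Illustrative 6-tap kernel with zero outer taps (assumed values for this sketch). */
static const int kTaps6[6] = { 0, -6, 123, 12, -1, 0 };

/* Round-shifted sum to an 8-bit pixel; negative sums clamp to 0. */
static uint8_t finish(int sum) {
    if (sum < 0) return 0;
    sum >>= 7;
    return (uint8_t)(sum > 255 ? 255 : sum);
}

/* Reads src[-2*stride] .. src[3*stride], including the two zero-weighted ends. */
uint8_t filter_6tap(const int *src, int stride) {
    int sum = 64; /* rounding term for the 7-bit shift */
    for (int k = 0; k < 6; ++k) sum += kTaps6[k] * src[(k - 2) * stride];
    return finish(sum);
}

/* Reads only src[-1*stride] .. src[2*stride]; the outer samples are never loaded. */
uint8_t filter_4tap(const int *src, int stride) {
    int sum = 64;
    for (int k = 1; k < 5; ++k) sum += kTaps6[k] * src[(k - 2) * stride];
    return finish(sum);
}

/* Returns 1 if both filters agree and the 4-tap result ignores the outer samples. */
int taps_agree(void) {
    int col[6] = { 10, 20, 200, 180, 30, 5 };
    int before = filter_4tap(&col[2], 1);
    int same = (filter_6tap(&col[2], 1) == before);
    col[0] = 9999;  /* stand-ins for uncomputed (garbage) outer inputs */
    col[5] = -9999;
    return same && filter_4tap(&col[2], 1) == before;
}
```

Because the 4-tap path never loads the outer samples, a tool that tracks uninitialized memory has nothing to flag there, which matches the valgrind motivation given in the commit message.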
  2. Sep 27, 2010
  3. Sep 24, 2010
  4. Sep 23, 2010
  5. Sep 22, 2010
    • Remove dead code · 7fed3832
      Johann Koenig authored
      The new loopfilter was originally introduced as an experimental change.
      It's permanent now.
      
      Change-Id: I25dbedb6ceff3e9f9c04e18bb29f84c3ecb7e546
      7fed3832
  6. Sep 21, 2010
  7. Sep 20, 2010
  8. Sep 17, 2010
    • reorder data to use wider instructions · 022323bf
      Johann Koenig authored
      the previous commit laid the groundwork by doing two sets of idcts
      together. this moves that further by grouping the interesting data
      (q[0], q+16[0]) together to allow using wider instructions. also
      managed to drop a few instructions by recognizing that the constant
      for sinpi8sqrt2 could be downshifted all the time, which avoided a
      downshift as well as workarounds for a function which only accepted
      signed data
      
      looks like a modest gain for performance: at qcif, went from ~180
      fps to ~183
      
      Change-Id: I842673f3080b8239e026cc9b50346dbccbab4adf
      022323bf
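One plausible reading of the sinpi8sqrt2 remark above, sketched in scalar C. Everything here is an assumption made for illustration: the Q16 value 35468 (roughly sqrt(2)·sin(π/8)·65536), the function names, and the guess that the "function which only accepted signed data" behaves like a doubling multiply that keeps the high 16 bits. The full constant does not fit in a signed 16-bit lane, but its half does, and doubling the product restores exactly the same value.

```c
#include <stdint.h>

#define SINPI8SQRT2_Q16  35468                   /* exceeds INT16_MAX (32767) */
#define SINPI8SQRT2_HALF (SINPI8SQRT2_Q16 >> 1)  /* 17734, fits in int16_t */

/* Portable floor(v / 65536); avoids right-shifting a negative value. */
static int32_t asr16(int32_t v) {
    return (v >= 0) ? (v >> 16) : -((-v + 65535) >> 16);
}

/* Reference: multiply by the full Q16 constant in 32-bit arithmetic. */
int32_t mul_q16_full(int16_t x) {
    return asr16((int32_t)x * SINPI8SQRT2_Q16);
}

/* Model of a signed doubling multiply that returns the high half.
 * Saturation is ignored; it cannot trigger with c = 17734. */
int32_t mul_doubling_high(int16_t x, int16_t c) {
    return asr16(2 * (int32_t)x * c);
}

/* Returns 1 if the two forms agree for every 16-bit input. */
int constants_agree(void) {
    for (int32_t x = INT16_MIN; x <= INT16_MAX; ++x) {
        if (mul_q16_full((int16_t)x) !=
            mul_doubling_high((int16_t)x, SINPI8SQRT2_HALF)) {
            return 0;
        }
    }
    return 1;
}
```

The agreement is exact because the constant is even: 2 × 17734 = 35468, so no precision is lost by storing the pre-shifted value.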
    • Restructure multi-threaded decoder · f857a850
      Yunqing Wang authored
      On each MB, loopfiltering is done right after MB decoding. This
      combines two loops in the multi-threaded code into one, which
      halves the number of synchronizations.
      
      The above-row/left-col data are saved in temp buffers for
      next-row/next MB decoding.
      
      Tests on a 4-core gLucid machine showed a 10% decoder performance
      gain with threads=4 (tulip clip). Other platforms have not been
      tested yet.
      
      Change-Id: Id18ea7c1e84965dabea65d4c01ca5bc056ddeac9
      f857a850
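The loop fusion described in the commit above can be illustrated with a single-threaded, one-dimensional toy. It is not libvpx code: each "macroblock" is one integer, the predictor and edge filter are hypothetical, and the reason given for the temp buffer (prediction needs the neighbour's unfiltered value, which in-place filtering would otherwise overwrite) is an assumption rather than something the commit message states.

```c
#include <string.h>

#define NUM_MB 8

/* Hypothetical edge filter: pulls the two values across an MB boundary together. */
static void filter_edge(int *left, int *right) {
    int d = (*right - *left) / 4;
    *left += d;
    *right -= d;
}

/* Original shape: decode every MB, then filter every edge in a second loop. */
void decode_two_pass(const int *residual, int *frame) {
    for (int i = 0; i < NUM_MB; ++i) {
        int pred = (i == 0) ? 128 : frame[i - 1]; /* unfiltered left neighbour */
        frame[i] = pred + residual[i];
    }
    for (int i = 1; i < NUM_MB; ++i) filter_edge(&frame[i - 1], &frame[i]);
}

/* Fused shape: filter right after decoding each MB. The unfiltered value is
 * kept in saved_left because filtering overwrites frame[i] in place. */
void decode_fused(const int *residual, int *frame) {
    int saved_left = 128;
    for (int i = 0; i < NUM_MB; ++i) {
        frame[i] = saved_left + residual[i];
        saved_left = frame[i]; /* temp copy for the next MB's prediction */
        if (i > 0) filter_edge(&frame[i - 1], &frame[i]);
    }
}

/* Returns 1 if both orderings produce the same frame. */
int fused_matches_two_pass(void) {
    const int residual[NUM_MB] = { 5, -20, 33, 0, -7, 41, -60, 12 };
    int a[NUM_MB], b[NUM_MB];
    decode_two_pass(residual, a);
    decode_fused(residual, b);
    return memcmp(a, b, sizeof a) == 0;
}
```

In a threaded decoder the same idea would apply per row, with one synchronization point per macroblock instead of one for decoding and another for filtering.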
  9. Sep 16, 2010