1. 05 Apr, 2017 1 commit
  2. 04 Apr, 2017 16 commits
  3. 03 Apr, 2017 18 commits
  4. 02 Apr, 2017 5 commits
    • Add v64_abs_s8, v128_abs_s8 and v256_abs_s8 · 6033fb85
      Steinar Midtskogen authored
      Change-Id: I529509e4e997ba123799a3a581d20624d75cf582
    • CLPF: Add architecture postfix to the name of static functions · 569c7b91
      Steinar Midtskogen authored
      This makes it clear when profiling that the correct SIMD optimised
      function is run.
      
      Change-Id: I35d69b3611f40650a85f1973c4010453b2bf5a53
    • Move the CLPF damping adjustment for strength up in the call chain · febe223d
      Steinar Midtskogen authored
      Do the damping adjustment in the top-level function rather than in
      the leaf functions.  Optimising compilers can hoist the adjustment
      themselves as far as the functions are inlined, but probably no
      further, so this patch gives a slightly reduced object code size.
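
The change described above can be illustrated with a minimal sketch. Everything here is hypothetical: `adjust_damping`, `leaf_filter` and `filter_block` are made-up names, and the adjustment formula (damping minus the floor of log2 of the strength, clamped at zero) is only a stand-in, since the commit message does not state the real CLPF formula. The point is the shape of the call chain: the top-level function computes the adjusted value once and the leaf receives it ready to use.

```c
#include <stddef.h>

/* Hypothetical stand-in for the strength-dependent damping adjustment.
   Assumption: subtract floor(log2(strength)) and clamp at zero. */
static int adjust_damping(int damping, int strength) {
  int log2s = 0;
  while ((strength >> (log2s + 1)) > 0) ++log2s;
  const int adjusted = damping - log2s;
  return adjusted < 0 ? 0 : adjusted;
}

/* Hypothetical leaf function: it takes the damping already adjusted,
   so it contains no adjustment code of its own. */
static int leaf_filter(int sample, int strength, int adjusted_damping) {
  const int delta = sample >> adjusted_damping;
  return delta > strength ? strength : delta;
}

/* Hypothetical top-level function: the adjustment happens here, once,
   instead of being repeated inside every leaf call. */
static void filter_block(const int *src, int *dst, size_t count,
                         int strength, int damping) {
  const int adjusted = adjust_damping(damping, strength);
  for (size_t i = 0; i < count; ++i) {
    dst[i] = leaf_filter(src[i], strength, adjusted);
  }
}
```

If the leaf were not inlined into the top-level function, a compiler could not move the adjustment out of it on its own, which is the case the commit message is concerned with.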
      
      Change-Id: I104750962f613fa665391c9b2a9e99bcc6f47f93
    • Temporarily revert some 4:2:2 code · ec70797d
      Jean-Marc Valin authored
      As part of 9cf0c9cd the buffering was made to better handle 4:2:2,
      but that causes regressions in the tests, so we're backing out part
      of it for now.
      
      Change-Id: I9ca4cfeb159aa65514613989e3dcbc30f86ec5b2
    • Use 1 sample per neighbor for local warping model estimation · 5558e5da
      Yue Chen authored
      Only 1 sample needs to be collected per neighbor. A maximum of 8
      neighbors are used.
      In LS estimation, the projection samples (sx, sy)->(dx, dy) are
      intentionally smoothed by assuming that 3 shifted versions,
      (sx, sy+n)->(dx, dy+n), (sx+n, sy)->(dx+n, dy) and
      (sx+n, sy+n)->(dx+n, dy+n), also contribute to the estimation.
      For example, instead of using A[0] = sx^2, we use the sum of
      squares of source x of the four points,
      A[0] += 4*sx^2 + 4*n*sx + 2*n^2.
      In terms of computational cost, this adds little overhead. Coding
      gain is mostly the same as the old version. Without the smoothing,
      we would lose 0.3% on lowres.
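
The smoothing described above can be checked with a small sketch. The function names are hypothetical and not taken from the codebase. The four points have source x values sx, sx, sx+n and sx+n, so the sum of squares expands to 4*sx^2 + 4*n*sx + 2*n^2. By the same expansion, the sx*sy cross term sums to 4*sx*sy + 2*n*(sx + sy) + n^2. Each closed form is compared against a brute-force sum over the four points.

```c
#include <stdint.h>

/* Brute force: sum of squared source x over the four points
   (sx, sy), (sx, sy+n), (sx+n, sy), (sx+n, sy+n). */
static int64_t sum_sq_x_brute(int64_t sx, int64_t n) {
  const int64_t xs[4] = { sx, sx, sx + n, sx + n };
  int64_t acc = 0;
  for (int i = 0; i < 4; ++i) acc += xs[i] * xs[i];
  return acc;
}

/* Closed form of the same sum: 4*sx^2 + 4*n*sx + 2*n^2. */
static int64_t sum_sq_x_closed(int64_t sx, int64_t n) {
  return 4 * sx * sx + 4 * n * sx + 2 * n * n;
}

/* Brute force: sum of x*y over the same four points. */
static int64_t sum_xy_brute(int64_t sx, int64_t sy, int64_t n) {
  const int64_t xs[4] = { sx, sx, sx + n, sx + n };
  const int64_t ys[4] = { sy, sy + n, sy, sy + n };
  int64_t acc = 0;
  for (int i = 0; i < 4; ++i) acc += xs[i] * ys[i];
  return acc;
}

/* Closed form of the cross term: 4*sx*sy + 2*n*(sx + sy) + n^2. */
static int64_t sum_xy_closed(int64_t sx, int64_t sy, int64_t n) {
  return 4 * sx * sy + 2 * n * (sx + sy) + n * n;
}
```

For sx = 3 and n = 2, both versions of the squared term give 68 (9 + 9 + 25 + 25). The closed forms cost a few multiplies and adds per sample, in line with the note that the smoothing adds little overhead.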
      
      Change-Id: I04be32cffa525f7dc8ee583c0bf211d7bdc6e609