1. 09 Mar, 2011 1 commit
  2. 08 Mar, 2011 1 commit
    • Write SSSE3 sub-pixel filter function · 244e2e14
      Yunqing Wang authored
      1. Process 16 pixels at one time instead of 8.
      2. Add a check for both xoffset=0 and yoffset=0, which happens
         during motion search.
      This change gave the encoder a 1% to 3% performance gain.
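The zero-offset check in item 2 can be pictured with a scalar model. This is only a sketch under stated assumptions: the function name `predict16x16_model`, the 2-tap weight table, and the rounding are illustrative and are not taken from the SSSE3 code in this commit. When both offsets are zero the block sits on an integer-pel position, so the filter reduces to a plain copy.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative 2-tap weights for eighth-pel offsets 0..7 (each pair sums to 128). */
static const int kWeights[8][2] = {
    {128, 0}, {112, 16}, {96, 32}, {80, 48},
    {64, 64}, {48, 80},  {32, 96}, {16, 112}
};

/* Hypothetical scalar model of a 16x16 sub-pixel predictor.
 * The caller must provide 17 readable rows and columns of source. */
void predict16x16_model(const uint8_t *src, int src_stride,
                        int xoffset, int yoffset,
                        uint8_t *dst, int dst_stride)
{
    int r, c;

    if (xoffset == 0 && yoffset == 0) {
        /* Integer-pel position: skip filtering and copy 16 rows of 16 pixels. */
        for (r = 0; r < 16; r++)
            memcpy(dst + r * dst_stride, src + r * src_stride, 16);
        return;
    }

    for (r = 0; r < 16; r++) {
        for (c = 0; c < 16; c++) {
            const uint8_t *p = src + r * src_stride + c;
            /* Horizontal filter on this row and on the row below. */
            int top = (p[0] * kWeights[xoffset][0] +
                       p[1] * kWeights[xoffset][1] + 64) >> 7;
            int bot = (p[src_stride] * kWeights[xoffset][0] +
                       p[src_stride + 1] * kWeights[xoffset][1] + 64) >> 7;
            /* Vertical filter between the two horizontal results. */
            dst[r * dst_stride + c] = (uint8_t)((top * kWeights[yoffset][0] +
                                                 bot * kWeights[yoffset][1] + 64) >> 7);
        }
    }
}
```

The early return matters most when the search repeatedly evaluates whole-pixel candidates, which is the case the commit message mentions.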
      
      Change-Id: Idaa39506b48f4f8b2fbbeb45aae8226fa32afb3e
  3. 22 Feb, 2011 1 commit
  4. 10 Feb, 2011 1 commit
    • Fix relative include paths · 02321de0
      John Koleszar authored
      Allow compiling without adding vp8/{common,encoder,decoder} to the
      include paths.
      
      Change-Id: Ifeb5dac351cdfadcd659736f5158b315a0030b6c
  5. 18 Jan, 2011 1 commit
    • Fix encoder real-time only configuration. · cb791aaa
      Attila Nagy authored
      Remove allocation/deallocation of stats storage.
      Remove full search functions in machine specific encoder inits.
      Remove last pass validation in validate_config.
      
      Change-Id: I7f29be69273981a4fef6e80ecdb6217c68cbad4e
  6. 14 Jan, 2011 1 commit
    • update sse2 regular quantizer · 15f9bea7
      Johann authored
      about 5% gain on 32bit. disabled for 64bit
      
      unset executable bit on ssse3 version (cosmetic)
      
      Change-Id: I1a5860839eb294ce4261f819caea2dcfa78e57ca
  7. 06 Jan, 2011 1 commit
    • x86 sse2 temporal_filter_apply · 8b0cf5f7
      Johann authored
      count can be reduced to short because the max number of filtered frames
      is set to 15. the max value for any frame is 32 (modifier = 16,
      filter_weight = 2). 15*32 = 480 which requires 9 bits
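The bit-width argument above can be checked mechanically. The sketch below uses only the numbers quoted in the message (15 frames, modifier 16, filter_weight 2); the names `bits_needed` and `max_count` are hypothetical helpers introduced for illustration.

```c
#include <stdint.h>

/* Worst-case inputs, as quoted in the commit message. */
enum { MAX_FRAMES = 15, MAX_MODIFIER = 16, FILTER_WEIGHT = 2 };

/* Number of bits needed to represent the unsigned value v (0 needs 0 bits). */
int bits_needed(unsigned v)
{
    int n = 0;
    while (v != 0) {
        n++;
        v >>= 1;
    }
    return n;
}

/* Largest value the per-pixel count can reach: 15 * (16 * 2) = 480. */
unsigned max_count(void)
{
    return (unsigned)MAX_FRAMES * MAX_MODIFIER * FILTER_WEIGHT;
}
```

Since 480 needs 9 bits and a signed 16-bit short holds values up to 32767, the narrower accumulator cannot overflow under these limits.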
      
      this function goes from about 7000 us / 1000 iterations for the C code
      to < 275 us / 1000 iterations for sse2 for block_size = 16 and from
      about 1800 us / 1000 iters to < 100 us / 1000 iters for block_size = 8
      
      Change-Id: I64a32607f58a2d33c39286f468b04ccd457d9e6e
  8. 28 Dec, 2010 1 commit
    • Use the fast quantizer for inter mode selection · 516ea846
      Scott LaVarnway authored
      Use the fast quantizer for inter mode selection and the
      regular quantizer for the rest of the encode for good quality,
      speed 1.  Both performance and quality were improved.  The
      quality gains will make up for the quality loss mentioned in
      I9dc089007ca08129fb6c11fe7692777ebb8647b0.
      
      Change-Id: Ia90bc9cf326a7c65d60d31fa32f6465ab6984d21
  9. 09 Dec, 2010 1 commit
  10. 10 Nov, 2010 1 commit
    • FDCT optimizations. · 5f0e0617
      Fritz Koenig authored
      Fixed up the fdct for mmx and 8x4 sse2 to match the
      most recent changes.
      
      Change-Id: Ibee2d6c536fe14dcf75cd6eb1c73f4848a56d719
  11. 01 Nov, 2010 1 commit
    • SSSE3 version of fast quantizer · ff4a71f4
      Scott LaVarnway authored
      (test clip: tulip)
      For good quality mode with speed=1, this gave the encoder
      a small (2 - 3%) performance boost.
      
      Change-Id: I8a1d4269465944ac0819986c2f0be4b0a2ee0b35
  12. 27 Oct, 2010 2 commits
    • Full search SAD function optimization in SSE4.1 · 71ecb5d7
      Yunqing Wang authored
      Use mpsadbw, and calculate 8 SADs at once. Function list:
      vp8_sad16x16x8_sse4
      vp8_sad16x8x8_sse4
      vp8_sad8x16x8_sse4
      vp8_sad8x8x8_sse4
      vp8_sad4x4x8_sse4
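What "8 SADs at once" means can be shown with a scalar model. The mpsadbw instruction compares one 4-byte group against eight overlapping 4-byte windows at consecutive byte offsets; the functions listed above build whole-block SADs for eight neighbouring search positions from that. The sketch below only models the result, not the assembly, and the name `sad_x8_model` and its signature are assumptions made for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar model: SAD of one w x h source block against the
 * eight reference blocks that start at ref, ref + 1, ..., ref + 7.
 * The reference rows must therefore be at least w + 7 bytes wide. */
void sad_x8_model(const uint8_t *src, int src_stride,
                  const uint8_t *ref, int ref_stride,
                  int w, int h, unsigned sad[8])
{
    int k, r, c;

    for (k = 0; k < 8; k++) {
        unsigned sum = 0;
        for (r = 0; r < h; r++) {
            for (c = 0; c < w; c++) {
                /* Same source pixel, reference shifted right by k. */
                sum += (unsigned)abs(src[r * src_stride + c] -
                                     ref[r * ref_stride + c + k]);
            }
        }
        sad[k] = sum;
    }
}
```

A full search evaluates many horizontally adjacent candidates, so producing eight results per pass over the source block is where the saving comes from.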
      
      (test clip: tulip)
      For best quality mode, this gave the encoder a 5% performance boost.
      For good quality mode with speed=1, this gave the encoder a 3%
      performance boost.
      
      Change-Id: I083b5a39d39144f88dcbccbef95da6498e490134
    • Fix half-pixel variance RTCD functions · a0ae3682
      John Koleszar authored
      This patch fixes the system dependent entries for the half-pixel
      variance functions in both the RTCD and non-RTCD cases:
      
        - The generic C versions of these functions are now correct.
          Previously, all three cases called the hv code.
      
        - Wire up the ARM functions in RTCD mode
      
        - Created stubs for x86 to call the optimized subpixel functions
          with the correct parameters, rather than falling back to C
          code.
      
      Change-Id: I1d937d074d929e0eb93aacb1232cc5e0ad1c6184
  13. 22 Oct, 2010 1 commit
    • Convert [4][4] matrices to [16] arrays. · 8f75ea6b
      Timothy B. Terriberry authored
      Most of the code that actually uses these matrices indexes them as
       if they were a single contiguous array, and coverity produces
       reports about the resulting accesses that overflow the static
       bounds of the first row.
      This is perfectly legal in C, but converting them to actual [16]
       arrays should eliminate the report, and removes a good deal of
       extraneous indexing and address operators from the code.
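As a small illustration of the conversion described above, the sketch below declares the data as a flat [16] array and derives the row/column position explicitly when it is needed. The macro `IDX4` and the functions `get4x4` and `sum16` are hypothetical names, not identifiers from the patch.

```c
/* Row-major position of (row, col) in a flat 16-entry array. */
#define IDX4(row, col) ((row) * 4 + (col))

/* Element access by row and column on a flat array. */
short get4x4(const short m[16], int row, int col)
{
    return m[IDX4(row, col)];
}

/* Linear walk over all 16 entries; the index never leaves the
 * declared bounds, so a static analyzer has nothing to report. */
int sum16(const short m[16])
{
    int i, sum = 0;
    for (i = 0; i < 16; i++)
        sum += m[i];
    return sum;
}
```

Code that previously wrote something like `&m[0][0]` to obtain a linear pointer can pass `m` directly once the declaration is flat.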
      
      Change-Id: Ibda479e2232b3e51f9edf3b355b8640520fdbf23
  14. 21 Oct, 2010 1 commit
    • Rewrite vp8_short_walsh4x4_sse2() · fc94ffce
      Yunqing Wang authored
      This rewrite reflects changes made in commit "Improve the
      accuracy of forward walsh-hadamard transform". Since this function
      is not called much, only a small encoder performance gain (~0.5%)
      is seen.
      
      Change-Id: Ie9df58a43028a11fd5b115c4bbe3141f7596578b
  15. 18 Oct, 2010 1 commit
    • Add SSE2 subtract functions · 4db20765
      Yunqing Wang authored
      Instead of doing 8-bit data unpack and 16-bit subtraction, use
      psubb to do 16 8-bit subtractions and pcmpgtb to preserve the
      sign information. This does not bring noticeable gain since
      these functions are not called frequently.
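The psubb/pcmpgtb idea can be modelled one byte lane at a time in plain C. This is a sketch, not the assembly from the commit: the function name `subtract16_model` is hypothetical, and the 0x80 bias before the compare is an assumption about how a signed byte compare would be applied to unsigned pixels.

```c
#include <stdint.h>

/* Hypothetical scalar model of 16 byte lanes: diff[i] = a[i] - b[i]. */
void subtract16_model(const uint8_t *a, const uint8_t *b, int16_t *diff)
{
    int i;

    for (i = 0; i < 16; i++) {
        /* psubb: wrapping 8-bit subtract keeps the low byte of a - b. */
        unsigned lo = (unsigned)(a[i] - b[i]) & 0xFFu;

        /* pcmpgtb compares signed bytes. XORing an unsigned byte with
         * 0x80 and reading it as signed equals subtracting 128, so the
         * biased compare below orders the original unsigned values. */
        int sa = (int)a[i] - 128;
        int sb = (int)b[i] - 128;
        unsigned hi = (sb > sa) ? 0xFFu : 0x00u; /* 0xFF when a - b < 0 */

        /* Interleaving lo with hi (as punpcklbw would) forms the
         * sign-extended 16-bit difference. */
        {
            int word = (int)((hi << 8) | lo);
            diff[i] = (int16_t)(hi ? word - 65536 : word);
        }
    }
}
```

For example, a = 3 and b = 5 give lo = 0xFE and hi = 0xFF, which together form 0xFFFE, that is, -2.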
      
      Change-Id: I90a0dfaa3db9d422e4ada324076596ffb178548e
  16. 14 Oct, 2010 1 commit
  17. 07 Oct, 2010 1 commit
    • Added vp8_fast_quantize_b_sse2 · d860f685
      Scott LaVarnway authored
      Moved vp8_fast_quantize_b_sse from quantize_mmx.asm into
      quantize_sse2.asm and renamed.  Updated the assembly code to
      match the C version.
      
      Change-Id: I1766d9e1ca60e173f65badc0ca0c160c2b51b200
  18. 09 Sep, 2010 1 commit
  19. 23 Jul, 2010 1 commit
    • Make the quantizer exact. · e04e2935
      Timothy B. Terriberry authored
      This replaces the approximate division-by-multiplication in the
       quantizer with an exact one that costs just one add and one
       shift extra.
      The asm versions have not been updated in this patch, and thus
       have been disabled, since the new method requires different
       multipliers which are not compatible with the old method.
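One well-known way to make division-by-multiplication exact at the cost of one add and one shift is sketched below. It is offered as an illustration of the idea only: the struct `exact_div_t`, the function names, and the specific constants are assumptions and are not claimed to be the ones used in this patch. The multiplier is stored minus 2^16 so that it fits in 16 signed bits, which is why an extra add of x is needed, and the final shift by floor(log2(step)) completes the division.

```c
#include <stdint.h>

/* Hypothetical per-step constants. */
typedef struct {
    int32_t mult;  /* multiplier minus 65536; fits in a signed 16-bit value */
    int shift;     /* floor(log2(step)) */
} exact_div_t;

/* Precompute constants for dividing by step, with 1 <= step < 32768. */
exact_div_t exact_div_init(int step)
{
    exact_div_t d;
    int l = 0;
    uint32_t t;

    while ((step >> (l + 1)) != 0)
        l++;                                   /* l = floor(log2(step)) */
    t = 1 + ((uint32_t)1 << (16 + l)) / (uint32_t)step;
    d.mult = (int32_t)t - 65536;               /* zero or negative for step > 1 */
    d.shift = l;
    return d;
}

/* Returns x / step, exact for 0 <= x < 32768. */
int exact_div(int x, exact_div_t d)
{
    int32_t p = (int32_t)x * d.mult;           /* 16x16 -> 32-bit product */
    /* floor(p / 65536), written so it does not rely on right-shifting
     * a negative value. */
    int32_t hi = (p >= 0) ? (p >> 16) : -((-p + 65535) >> 16);
    return (int)((hi + x) >> d.shift);         /* the extra add and shift */
}
```

Because the multiplier depends on both the step size and the shift, constants computed for an approximate reciprocal cannot be reused, which is consistent with the note above that the existing asm versions had to be disabled.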
      
      Change-Id: I53ac887af0f969d906e464c88b1f4be69c6b1206
  20. 29 Jun, 2010 1 commit
    • Improve the accuracy of forward walsh-hadamard transform · b62d093e
      Yaowu Xu authored
      Besides the slight improvement in round trip error, this
      also fixes a sign bias in the forward transform, so the
      round trip errors are evenly distributed between +1s and
      -1s. The old bias seemed to work well with the dc sign bias
      in the old fdct, which no longer exists in the improved fdct.
      
      Change-Id: I8635e7be16c69e69a8669eca5438550d23089cef
  21. 24 Jun, 2010 2 commits
    • Added first-pass sse2 version of Yaowu's new fdct. · f1a3b1e0
      Scott LaVarnway authored
      Change-Id: Ib479210067510162879c368428b92690591120b2
    • Redo the forward 4x4 dct · d0dd01b8
      Yaowu Xu authored
      The new fdct lowers the round trip sum squared error for a
      4x4 block ~0.12, or ~0.008/pixel. For reference, the old
      matrix multiply version has an average round trip error of 1.46
      for a 4x4 block.
      
      Thanks to "derf" for his suggestions and references.
      
      Change-Id: I5559d1e81d333b319404ab16b336b739f87afc79
  22. 18 Jun, 2010 1 commit
    • cosmetics: trim trailing whitespace · 94c52e4d
      John Koleszar authored
      When the license headers were updated, they accidentally contained
      trailing whitespace, so unfortunately we have to touch all the files
      again.
      
      Change-Id: I236c05fade06589e417179c0444cb39b09e4200d
  23. 14 Jun, 2010 1 commit
    • sse2 version of vp8_regular_quantize_b · 48c84d13
      Scott LaVarnway authored
      Added sse2 version of vp8_regular_quantize_b which improved encode
      performance (for the clip used) by ~10% for 32 bit builds and ~3% for
      64 bit builds.
      
      Also updated SHADOW_ARGS_TO_STACK to allow for more than 9 arguments.
      
      Change-Id: I62f78eabc8040b39f3ffdf21be175811e96b39af
  24. 04 Jun, 2010 1 commit
  25. 18 May, 2010 1 commit