1. 28 Aug, 2014 1 commit
  2. 27 May, 2014 1 commit
  3. 22 May, 2014 1 commit
  4. 30 Apr, 2014 1 commit
  5. 25 Apr, 2014 1 commit
  6. 17 Mar, 2014 1 commit
  7. 04 Oct, 2013 1 commit
  8. 06 Aug, 2013 1 commit
  9. 20 Jun, 2013 2 commits
    • Ronald S. Bultje's avatar
      SSE2/SSSE3 optimizations and unit test for sub_pixel_avg_variance(). · 1e6a32f1
      Ronald S. Bultje authored
      Encoding of bus @ 1500kbps (first 50 frames) goes from 3min57 to
      3min35, i.e. approximately a 10.5% speedup. Note that the SIMD versions
      which use a bilinear filter (x_offset & 7 || y_offset & 7) aren't
      perfectly interleaved, and can probably be improved further in the
      future. I've marked this with a few TODOs/FIXMEs in the code.
      
      Change-Id: I5c9e900c0f0d32e431a50fecae213b510b2549f9
      1e6a32f1
    • Ronald S. Bultje's avatar
      Implement sse2 and ssse3 versions for all sub_pixel_variance sizes. · 8fb6c581
      Ronald S. Bultje authored
      Overall speedup around 5% (bus @ 1500kbps first 50 frames 4min10 ->
      3min58). Specific changes to timings for each function compared to
      original assembly-optimized versions (or just new version timings if
      no previous assembly-optimized version was available):
      
      sse2   4x4:    99 ->   82 cycles
      sse2   4x8:           128 cycles
      sse2   8x4:           121 cycles
      sse2   8x8:   149 ->  129 cycles
      sse2   8x16:  235 ->  245 cycles (?)
      sse2  16x8:   269 ->  203 cycles
      sse2  16x16:  441 ->  349 cycles
      sse2  16x32:          641 cycles
      sse2  32x16:          643 cycles
      sse2  32x32: 1733 -> 1154 cycles
      sse2  32x64:         2247 cycles
      sse2  64x32:         2323 cycles
      sse2  64x64: 6984 -> 4442 cycles
      
      ssse3  4x4:           100 cycles (?)
      ssse3  4x8:           103 cycles
      ssse3  8x4:            71 cycles
      ssse3  8x8:           147 cycles
      ssse3  8x16:          158 cycles
      ssse3 16x8:   188 ->  162 cycles
      ssse3 16x16:  316 ->  273 cycles
      ssse3 16x32:          535 cycles
      ssse3 32x16:          564 cycles
      ssse3 32x32:          973 cycles
      ssse3 32x64:         1930 cycles
      ssse3 64x32:         1922 cycles
      ssse3 64x64:         3760 cycles
      
      Change-Id: I81ff6fe51daf35a40d19785167004664d7e0c59d
      8fb6c581
  10. 22 May, 2013 1 commit
    • Yunqing Wang's avatar
      Optimize variance functions · f4fcfe30
      Yunqing Wang authored
      Added SSE2 version of variance functions for super blocks.
      
      Change-Id: Ibeaae8771ca21c99d41dd74067574a51e97b412d
      f4fcfe30
  11. 06 Feb, 2013 1 commit
  12. 05 Dec, 2012 1 commit
  13. 27 Nov, 2012 1 commit
    • John Koleszar's avatar
      Add vp9_ prefix to all vp9 files · fcccbcbb
      John Koleszar authored
      Support for gyp which doesn't support multiple objects in the same
      static library having the same basename.
      
      Change-Id: Ib947eefbaf68f8b177a796d23f875ccdfa6bc9dc
      fcccbcbb
  14. 07 Nov, 2012 1 commit
    • James Zern's avatar
      Fix variance (signed integer) overflow · 98473443
      James Zern authored
      In the variance calculations the difference is summed and later squared.
      When the sum exceeds sqrt(2^31) the value is treated as a negative when
      it is shifted which gives incorrect results.
      
      To fix this we force the multiplication to be unsigned.
      
      The alternative fix is to shift sum down by 4 before multiplying.
      However that will reduce precision.
      
      For 16x16 blocks the maximum sum is 65280 and sqrt(2^31) is 46340 (and
      change).
      
      This change is based on:
      16982342 Missed some variance casts
      fea3556e Fix variance overflow
      
      Change-Id: I2c61856cca9db54b9b81de83b4505ea81a050a0f
      98473443
  15. 01 Nov, 2012 1 commit
  16. 31 Oct, 2012 2 commits
  17. 08 Aug, 2012 1 commit
    • Deb Mukherjee's avatar
      Merging in the sixteenth subpel uv experiment · 7d065653
      Deb Mukherjee authored
      Merges this experiment in to make it easier to run tests on
      filter precision, vectorized implementation etc.
      
      Also removes an experimental filter.
      
      Change-Id: I1e8706bb6d4fc469815123939e9c6e0b5ae945cd
      7d065653
  18. 17 Jul, 2012 1 commit
  19. 15 Mar, 2012 1 commit
    • Yaowu Xu's avatar
      WebM Experimental Codec Branch Snapshot · 6035da54
      Yaowu Xu authored
      This is a code snapshot of experimental work currently ongoing for a
      next-generation codec.
      
      The codebase has been cut down considerably from the libvpx baseline.
      For example, we are currently only supporting VBR 2-pass rate control
      and have removed most of the code relating to coding speed, threading,
      error resilience, partitions and various other features.  This is in
      part to make the codebase easier to work on and experiment with, but
      also because we want to have an open discussion about how the bitstream
      will be structured and partitioned and not have that conversation
      constrained by past work.
      
      Our basic working pattern has been to initially encapsulate experiments
      using configure options linked to #IF CONFIG_XXX statements in the
      code. Once experiments have matured and we are reasonably happy that
      they give benefit and can be merged without breaking other experiments,
      we remove the conditional compile statements and merge them in.
      
      Current changes include:
      * Temporal coding experiment for segments (though still only 4 max, it
        will likely be increased).
      * Segment feature experiment - to allow various bits of information to
        be coded at the segment level. Features tested so far include mode
        and reference frame information, limiting end of block offset and
        transform size, alongside Q and loop filter parameters, but this set
        is very fluid.
      * Support for 8x8 transform - 8x8 dct with 2nd order 2x2 haar is used
        in MBs using 16x16 prediction modes within inter frames.
      * Compound prediction (combination of signals from existing predictors
        to create a new predictor).
      * 8 tap interpolation filters and 1/8th pel motion vectors.
      * Loop filter modifications.
      * Various entropy modifications and changes to how entropy contexts and
        updates are handled.
      * Extended quantizer range matched to transform precision improvements.
      
      There are also ongoing further experiments that we hope to merge in the
      near future: For example, coding of motion and other aspects of the
      prediction signal to better support larger image formats, use of larger
      block sizes (e.g. 32x32 and up) and lossless non-transform based coding
      options (especially for key frames). It is our hope that we will be
      able to make regular updates and we will warmly welcome community
      contributions.
      
      Please be warned that, at this stage, the codebase is currently slower
      than VP8 stable branch as most new code has not been optimized, and
      even the 'C' has been deliberately written to be simple and obvious,
      not fast.
      
      The following graphs have the initial test results, numbers in the
      tables measure the compression improvement in terms of percentage. The
      build has  the following optional experiments configured:
      --enable-experimental --enable-enhanced_interp --enable-uvintra
      --enable-high_precision_mv --enable-sixteenth_subpel_uv
      
      CIF Size clips:
      http://getwebm.org/tmp/cif/
      HD size clips:
      http://getwebm.org/tmp/hd/
      (stable_20120309 represents encoding results of WebM master branch
      build as of commit#7a159071)
      
      They were encoded using the following encode parameters:
      --good --cpu-used=0 -t 0 --lag-in-frames=25 --min-q=0 --max-q=63
      --end-usage=0 --auto-alt-ref=1 -p 2 --pass=2 --kf-max-dist=9999
      --kf-min-dist=0 --drop-frame=0 --static-thresh=0 --bias-pct=50
      --minsection-pct=0 --maxsection-pct=800 --sharpness=0
      --arnr-maxframes=7 --arnr-strength=3(for HD,6 for CIF)
      --arnr-type=3
      
      Change-Id: I5c62ed09cfff5815a2bb34e7820d6a810c23183c
      6035da54
  20. 06 Mar, 2012 1 commit
  21. 23 Feb, 2012 1 commit
    • Deb Mukherjee's avatar
      Supporting high precision 1/8-pel motion vectors · 18e90d74
      Deb Mukherjee authored
      This is the initial patch for supporting 1/8th pel
      motion. Currently if we configure with enable-high-precision-mv,
      all motion vectors would default to 1/8 pel. Encode and
      decode syncs fine with the current code. In the next phase
      the code will be refactored so that we can choose the 1/8
      pel mode adaptively at a frame/segment/mb level.
      
      Derf results:
      http://www.corp.google.com/~debargha/vp8_results/enhinterp_hpmv.html
      (about 0.83% better than 8-tap interpoaltion)
      
      Patch 3: Rebased. Also adding 1/16th pel interpolation for U and V
      
      Patch 4: HD results.
      http://www.corp.google.com/~debargha/vp8_results/enhinterp_hd_hpmv.html
      Seems impressive (unless I am doing something wrong).
      
      Patch 5: Added mmx/sse for bilateral filtering, as well as enforced
      use of c-versions of subpel filters with 8-taps and 1/16th pel;
      Also redesigned the 8-tap filters to reduce the cut-off in order to
      introduce a denoising effect. There is a new configure option
      sixteenth-subpel-uv which will use 1/16 th pel interpolation for
      uv, if the motion vectors have 1/8 pel accuracy.
      
      With the fixes the results are promising on the derf set. The enhanced
      interpolation option with 8-taps alone gives 3% improvement over thei
      derf set:
      http://www.corp.google.com/~debargha/vp8_results/enhinterpn.html
      
      Results on high precision mv and on the hd set are to follow.
      
      Patch 6: Adding a missing condition for CONFIG_SIXTEENTH_SUBPEL_UV in
      vp8/common/x86/x86_systemdependent.c
      
      Patch 7: Cleaning up various debug messages.
      
      Patch 8: Merge conflict
      
      Change-Id: I5b1d844457aefd7414a9e4e0e06c6ed38fd8cc04
      18e90d74
  22. 10 Feb, 2012 1 commit
  23. 09 Feb, 2012 1 commit
    • Johann's avatar
      Fix variance overflow · fea3556e
      Johann authored
      In the variance calculations the difference is summed and later squared.
      When the sum exceeds sqrt(2^31) the value is treated as a negative when
      it is shifted which gives incorrect results.
      
      To fix this we cast the result of the multiplication as unsigned.
      
      The alternative fix is to shift sum down by 4 before multiplying.
      However that will reduce precision.
      
      For 16x16 blocks the maximum sum is 65280 and sqrt(2^31) is 46340 (and
      change).
      
      PPC change is untested.
      
      Change-Id: I1bad27ea0720067def6d71a6da5f789508cec265
      fea3556e
  24. 19 Nov, 2011 1 commit
    • Johann's avatar
      Move shared data to shared location · f2cd4ded
      Johann authored
      Storing vp8_bilinear_filters_mmx in an mmx file and using it in an sse2
      file is bad
      
      Moving towards allowing --disable-mmx
      
      Change-Id: I20493b35bdedcdcfc0915e6f05fdbe6c81a4a742
      f2cd4ded
  25. 09 Jun, 2011 1 commit
  26. 25 May, 2011 1 commit
  27. 09 Mar, 2011 1 commit
  28. 08 Mar, 2011 1 commit
  29. 10 Feb, 2011 1 commit
    • John Koleszar's avatar
      Fix relative include paths · 02321de0
      John Koleszar authored
      Allow compiling without adding vp8/{common,encoder,decoder} to the
      include paths.
      
      Change-Id: Ifeb5dac351cdfadcd659736f5158b315a0030b6c
      02321de0
  30. 21 Jan, 2011 1 commit
  31. 27 Oct, 2010 1 commit
    • John Koleszar's avatar
      Fix half-pixel variance RTCD functions · a0ae3682
      John Koleszar authored
      This patch fixes the system dependent entries for the half-pixel
      variance functions in both the RTCD and non-RTCD cases:
      
        - The generic C versions of these functions are now correct.
          Before all three cases called the hv code.
      
        - Wire up the ARM functions in RTCD mode
      
        - Created stubs for x86 to call the optimized subpixel functions
          with the correct parameters, rather than falling back to C
          code.
      
      Change-Id: I1d937d074d929e0eb93aacb1232cc5e0ad1c6184
      a0ae3682
  32. 12 Oct, 2010 1 commit
  33. 09 Sep, 2010 1 commit
  34. 18 Jun, 2010 1 commit
    • John Koleszar's avatar
      cosmetics: trim trailing whitespace · 94c52e4d
      John Koleszar authored
      When the license headers were updated, they accidentally contained
      trailing whitespace, so unfortunately we have to touch all the files
      again.
      
      Change-Id: I236c05fade06589e417179c0444cb39b09e4200d
      94c52e4d
  35. 04 Jun, 2010 1 commit
  36. 18 May, 2010 1 commit