1. 18 Oct, 2012 1 commit
    • sse2 intrinsic version of vp8_mbloop_filter_horizontal_edge() · 992b5e2d
      Scott LaVarnway authored
      First sse2 version of vp8_mbloop_filter_horizontal_edge().  For now,
      intrinsics are being used until the bitstream is finalized.  This function
      will be revisited later for further performance improvements.
      For the test clip used, a 31+% decoder performance improvement
      was seen.  This will vary depending on material.
      
      Change-Id: I03ed3a7182478bdd1f094644ff3e0442625600e7
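
      As a rough illustration of the intrinsic style this commit refers to,
      the sketch below computes per-byte absolute differences for 16 pixels
      at once, a building block commonly used when forming loop-filter
      masks. The helper name abs_diff_16 is hypothetical and this is not the
      code from the commit; a scalar fallback is included for builds without
      SSE2.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: out[i] = |a[i] - b[i]| for 16 unsigned bytes. */
static void abs_diff_16(const uint8_t *a, const uint8_t *b, uint8_t *out) {
  assert(a && b && out);
#if defined(__SSE2__)
  const __m128i va = _mm_loadu_si128((const __m128i *)a);
  const __m128i vb = _mm_loadu_si128((const __m128i *)b);
  /* Saturating subtraction clamps negatives to 0, so OR-ing both
   * directions yields the absolute difference. */
  const __m128i d =
      _mm_or_si128(_mm_subs_epu8(va, vb), _mm_subs_epu8(vb, va));
  _mm_storeu_si128((__m128i *)out, d);
#else
  /* Scalar fallback with the same result. */
  for (int i = 0; i < 16; ++i)
    out[i] = (uint8_t)(a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]);
#endif
}
```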
  2. 17 Oct, 2012 1 commit
  3. 24 Aug, 2012 1 commit
    • New Motion Reference Search · 2d60bee1
      Paul Wilkins authored
      Alternative strategy for finding a list of candidate motion
      vectors to use as reference values in mv coding and as
      nearest and near.
      
      Sort by sad in vp8_find_best_ref_mvs() rather than just
      pick the best. Allow 0,0 as a best ref option but not a
      nearest or near unless there are no alternatives.
      
      Encode/Decode verified on at least some clips.
      
      Some commented out experimental and stats code still in place.
      
      Gain over existing code averages about 1% on derf (all metrics)
      with improvement on all clips. Other test results pending.
      
      The entropy coding of the mode (nearest/near etc) still
      depends upon and requires the old "findnear" code so
      this needs looking at and may provide room for further gains.
      
      Change-Id: I871d7cba1d1c379c4bad9bcccce1fb19c46b8247
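
      One way to read the selection rule described above is sketched below:
      sort the candidates by SAD, take the lowest-SAD entry as "best" (0,0
      allowed), and take the first two non-zero entries as nearest and near,
      falling back to 0,0 when none exist. The struct and function names are
      hypothetical, and the actual vp8_find_best_ref_mvs() may differ in its
      details.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical candidate record: a motion vector plus its SAD score. */
typedef struct {
  int row;
  int col;
  unsigned int sad;
} mv_candidate;

/* Insertion sort by ascending SAD; fine for a handful of candidates. */
static void sort_candidates_by_sad(mv_candidate *c, size_t n) {
  assert(c != NULL || n == 0);
  for (size_t i = 1; i < n; ++i) {
    mv_candidate key = c[i];
    size_t j = i;
    while (j > 0 && c[j - 1].sad > key.sad) {
      c[j] = c[j - 1];
      --j;
    }
    c[j] = key;
  }
}

static int is_zero_mv(const mv_candidate *m) {
  return m->row == 0 && m->col == 0;
}

/* best: lowest SAD, (0,0) allowed.
 * nearest/near: first two non-zero candidates in sorted order,
 * left at (0,0) only when no alternative exists. */
static void pick_reference_mvs(mv_candidate *c, size_t n, mv_candidate *best,
                               mv_candidate *nearest, mv_candidate *near_mv) {
  const mv_candidate zero = {0, 0, 0};
  size_t found = 0;
  sort_candidates_by_sad(c, n);
  *best = n > 0 ? c[0] : zero;
  *nearest = zero;
  *near_mv = zero;
  for (size_t i = 0; i < n && found < 2; ++i) {
    if (is_zero_mv(&c[i])) continue;
    if (found == 0) *nearest = c[i];
    else *near_mv = c[i];
    ++found;
  }
}
```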
  4. 21 Aug, 2012 2 commits
  5. 16 Aug, 2012 1 commit
  6. 08 Aug, 2012 1 commit
  7. 23 May, 2012 1 commit
    • changed the way that default probs for 8x8 is set. · e9818bb6
      Yaowu Xu authored
      The commit changed how baseline 8x8 coefficient probabilities are
      initialized, to be consistent with the initialization of baseline
      4x4 coefficient probabilities.
      
      The commit does not have any effect on compression.
      
      Change-Id: Ifb3902b5dc0b0c2e6dc3aa5d4a6589d528e58355
  8. 15 Mar, 2012 1 commit
    • WebM Experimental Codec Branch Snapshot · 6035da54
      Yaowu Xu authored
      This is a code snapshot of experimental work currently ongoing for a
      next-generation codec.
      
      The codebase has been cut down considerably from the libvpx baseline.
      For example, we are currently only supporting VBR 2-pass rate control
      and have removed most of the code relating to coding speed, threading,
      error resilience, partitions and various other features.  This is in
      part to make the codebase easier to work on and experiment with, but
      also because we want to have an open discussion about how the bitstream
      will be structured and partitioned and not have that conversation
      constrained by past work.
      
      Our basic working pattern has been to initially encapsulate experiments
      using configure options linked to #if CONFIG_XXX statements in the
      code. Once experiments have matured and we are reasonably happy that
      they give benefit and can be merged without breaking other experiments,
      we remove the conditional compile statements and merge them in.
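
      A minimal sketch of that gating pattern, assuming a hypothetical flag
      CONFIG_EXAMPLE_EXPERIMENT that a configure script would define as 0 or
      1 (the function and both code paths are illustrative only):

```c
#include <assert.h>

/* Hypothetical flag; configure would normally define it as 0 or 1. */
#ifndef CONFIG_EXAMPLE_EXPERIMENT
#define CONFIG_EXAMPLE_EXPERIMENT 0
#endif

/* Returns a prediction for one sample from two candidate predictors. */
static int predict_sample(int pred_a, int pred_b) {
  assert(pred_a >= 0 && pred_b >= 0);
#if CONFIG_EXAMPLE_EXPERIMENT
  /* Experimental path: rounded average of the two predictors. */
  return (pred_a + pred_b + 1) >> 1;
#else
  /* Baseline path: first predictor only. */
  (void)pred_b;
  return pred_a;
#endif
}
```

      Merging a matured experiment then amounts to deleting the #if/#else
      scaffolding and keeping the experimental branch.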
      
      Current changes include:
      * Temporal coding experiment for segments (though still only 4 max, it
        will likely be increased).
      * Segment feature experiment - to allow various bits of information to
        be coded at the segment level. Features tested so far include mode
        and reference frame information, limiting end of block offset and
        transform size, alongside Q and loop filter parameters, but this set
        is very fluid.
      * Support for 8x8 transform - 8x8 dct with 2nd order 2x2 haar is used
        in MBs using 16x16 prediction modes within inter frames.
      * Compound prediction (combination of signals from existing predictors
        to create a new predictor).
      * 8 tap interpolation filters and 1/8th pel motion vectors.
      * Loop filter modifications.
      * Various entropy modifications and changes to how entropy contexts and
        updates are handled.
      * Extended quantizer range matched to transform precision improvements.
      
      There are also ongoing further experiments that we hope to merge in the
      near future: For example, coding of motion and other aspects of the
      prediction signal to better support larger image formats, use of larger
      block sizes (e.g. 32x32 and up) and lossless non-transform based coding
      options (especially for key frames). It is our hope that we will be
      able to make regular updates and we will warmly welcome community
      contributions.
      
      Please be warned that, at this stage, the codebase is currently slower
      than VP8 stable branch as most new code has not been optimized, and
      even the 'C' has been deliberately written to be simple and obvious,
      not fast.
      
      The following graphs show the initial test results; the numbers in the
      tables measure the compression improvement as a percentage. The
      build has the following optional experiments configured:
      --enable-experimental --enable-enhanced_interp --enable-uvintra
      --enable-high_precision_mv --enable-sixteenth_subpel_uv
      
      CIF Size clips:
      http://getwebm.org/tmp/cif/
      HD size clips:
      http://getwebm.org/tmp/hd/
      (stable_20120309 represents encoding results of WebM master branch
      build as of commit#7a159071)
      
      They were encoded using the following encode parameters:
      --good --cpu-used=0 -t 0 --lag-in-frames=25 --min-q=0 --max-q=63
      --end-usage=0 --auto-alt-ref=1 -p 2 --pass=2 --kf-max-dist=9999
      --kf-min-dist=0 --drop-frame=0 --static-thresh=0 --bias-pct=50
      --minsection-pct=0 --maxsection-pct=800 --sharpness=0
      --arnr-maxframes=7 --arnr-strength=3 (for HD, 6 for CIF)
      --arnr-type=3
      
      Change-Id: I5c62ed09cfff5815a2bb34e7820d6a810c23183c
  9. 12 Mar, 2012 1 commit
    • fixed .mk files to reflect add/remove of a header file · 3f5feb7d
      Yaowu Xu authored
      In a previous commit, a duplicate of the header file defaultcoefcounts.h
      was identified. This commit updates the .mk file to ensure configure
      and make work properly on all platforms.
      
      Change-Id: I31a39c809a734ba438ee53db700f252e9a03eddd
  10. 10 Feb, 2012 1 commit
    • Removal of threading code. · 2615ca5d
      Paul Wilkins authored
      For the experimental branch we are trying to slim the codebase
      down by removing, for now, features such as threading that complicate
      the process of development and testing.
      
      Change-Id: I657c0246aef4d1fa8c8ffc6a1adfeee45bce8e24
  11. 31 Jan, 2012 1 commit
    • Added common prediction modules. · b2f64dff
      Paul Wilkins authored
      This commit adds the common prediction modules, some data structures
      and a config option but does not use them.
      
      It also corrects a bug in clearing down the MODE_INFO border and introduces
      a new element that indicates whether an entry corresponds to an "in image"
      macroblock or is part of the border.
      
      Change-Id: Ib69eec0876173ebe9d1de9df9537d0b2447702e0
  12. 24 Jan, 2012 1 commit
    • vpn common -> implicit segmentation · 91325b8f
      Jim Bankoski authored
      This adds base functions for implicit segmentation.
      The code that actually stores the results to the segment map isn't
      here yet. This just prints out the segmentation map results
      if you call it.
      
      Uses connected component labeling technique on mbmi info so that only
      if 2 mbs are horizontally or vertically touching do they get the same
      segment.
      
      vp8next - plumbing for rotation
      
      code to produce taps for rotation ( tapify.py ), code
      for predicting using rotation ( predict_rotated.c ), code
      for finding the best rotation ( find_rotation.c ).
      
      didn't check in code that uses this in the codec. still work
      in progress.
      
      Fixed copyright notice
      
      Change-Id: I450c13cfa41ab2fcb699f3897760370b4935fdf8
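
      The connected component labeling idea mentioned above can be sketched
      as follows. This is a flood-fill variant over a plain integer grid
      (standing in for per-macroblock info), using 4-connectivity so that
      only horizontally or vertically touching cells with equal values share
      a label. The function name and data layout are assumptions, not the
      commit's code.

```c
#include <assert.h>
#include <stdlib.h>

/* Labels 4-connected regions of equal value in a rows x cols grid.
 * labels[] receives ids 0..n-1; returns n, or -1 if allocation fails. */
static int label_components(const int *grid, int rows, int cols, int *labels) {
  int count = 0;
  int total = rows * cols;
  int *stack;
  assert(grid && labels && rows > 0 && cols > 0);
  /* Each cell is pushed at most once, so total entries are enough. */
  stack = (int *)malloc((size_t)total * sizeof(*stack));
  if (!stack) return -1;
  for (int i = 0; i < total; ++i) labels[i] = -1;
  for (int start = 0; start < total; ++start) {
    int top = 0;
    if (labels[start] != -1) continue;
    labels[start] = count;
    stack[top++] = start;
    while (top > 0) {
      int cur = stack[--top];
      int r = cur / cols, c = cur % cols;
      /* Neighbours: up, down, left, right (no diagonals). */
      const int nr[4] = {r - 1, r + 1, r, r};
      const int nc[4] = {c, c, c - 1, c + 1};
      for (int k = 0; k < 4; ++k) {
        int idx;
        if (nr[k] < 0 || nr[k] >= rows || nc[k] < 0 || nc[k] >= cols) continue;
        idx = nr[k] * cols + nc[k];
        if (labels[idx] == -1 && grid[idx] == grid[cur]) {
          labels[idx] = count;
          stack[top++] = idx;
        }
      }
    }
    ++count;
  }
  free(stack);
  return count;
}
```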
  13. 24 Oct, 2011 1 commit
    • Further segment feature extensions. · 01ce04bc
      Paul Wilkins authored
      This quite large check in includes the following:
      
      Merge in some code from Ronald (mbgraph.c) that scans a Gf/arf group.
      This is used as a basis for a simple segmentation for the normal frames
      in a gf/arf group. This code also uses satd functions from Yaowu.
      
      Adds functionality for coding the latest possible position of an EOB for
      blocks in the segment. (Currently 0-15 only, hence just for 4x4 dct).
      Where the EOB position is 0 this acts like "skip" and the normal coding
      of skip at the per mb level is disabled.
      
      Added functions (seg_common.c) for setting and reading segment feature
      elements. These may want to be optimized away at some point but while the
      mechanism is in a state of flux they provide a single location for making
      changes and keep things a bit cleaner.
      
      This is still proof of concept code. Currently the tested feature set:-
      
      Quantizer,
      Loop Filter level,
      Reference frame,
      Prediction Mode,
      EOB end stop.
      
      TBD:-
      
      Add functions for setting and reading the feature data with range
      and validity checking.
      
      Handling of signed and unsigned feature data. At the moment all is assumed
      to be signed and a sign bit is coded but many cannot be negative.
      
      Correct handling of EOB feature with intra coded blocks.
      
      Testing/trapping of legal/illegal ref frame and mode combinations.
      
      Transform size switch plus merge and test with 8x8 DCT work
      
      Merge and test with Suman's Segmentation coding optimizations
      
      Change-Id: Iee12e83661c7abbd1e0ce6810915eb4ec35e2d8e
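
      A minimal sketch of what single-location accessors for segment feature
      data might look like. Every name here (segment_features,
      enable_seg_feature, set_seg_data and so on) and the segment count are
      assumptions for illustration; they are not taken from seg_common.c.

```c
#include <assert.h>

#define MAX_SEGMENTS 4 /* assumed segment count for this sketch */

/* Feature ids follow the tested set listed above; names are hypothetical. */
enum seg_feature {
  SEG_QUANTIZER,
  SEG_LOOP_FILTER,
  SEG_REF_FRAME,
  SEG_PRED_MODE,
  SEG_EOB,
  SEG_FEATURE_COUNT
};

typedef struct {
  unsigned int enabled[MAX_SEGMENTS];        /* bit f set: feature f active */
  int data[MAX_SEGMENTS][SEG_FEATURE_COUNT]; /* per-segment feature values */
} segment_features;

static void enable_seg_feature(segment_features *s, int seg,
                               enum seg_feature f) {
  assert(s && seg >= 0 && seg < MAX_SEGMENTS && f < SEG_FEATURE_COUNT);
  s->enabled[seg] |= 1u << f;
}

static int seg_feature_active(const segment_features *s, int seg,
                              enum seg_feature f) {
  assert(s && seg >= 0 && seg < MAX_SEGMENTS && f < SEG_FEATURE_COUNT);
  return (int)((s->enabled[seg] >> f) & 1u);
}

static void set_seg_data(segment_features *s, int seg, enum seg_feature f,
                         int value) {
  assert(s && seg >= 0 && seg < MAX_SEGMENTS && f < SEG_FEATURE_COUNT);
  s->data[seg][f] = value;
}

static int get_seg_data(const segment_features *s, int seg,
                        enum seg_feature f) {
  assert(s && seg >= 0 && seg < MAX_SEGMENTS && f < SEG_FEATURE_COUNT);
  return s->data[seg][f];
}
```

      Keeping every read and write behind such accessors gives one place to
      add the range and validity checks listed under TBD.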
  14. 22 Sep, 2011 1 commit
    • Install missing default_coef_probs.h · 4a6ac727
      John Koleszar authored
      Make sure that this header is listed as one of the sources, so that it
      will be installed if necessary.
      
      Change-Id: I2427e494488126b179151dc21043c1e2c8ba5991
  15. 16 Aug, 2011 1 commit
    • Faster vp8_default_coef_probs · 19987dcb
      Scott LaVarnway authored
      Copies from a generated table instead of building the
      default coeff probabilities during runtime.
      
      Change-Id: I4d9551ea3a2d7d4a4f7ce9eda006495221a8de50
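
      The table-copy approach can be sketched as below. The dimensions and
      probability values are placeholders chosen for illustration (the real
      table is far larger), and init_default_probs is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/* Placeholder dimensions and values, for illustration only. */
#define BANDS 2
#define CONTEXTS 3
typedef unsigned char prob_t;

/* Stands in for a table emitted ahead of time by a generator. */
static const prob_t default_probs[BANDS][CONTEXTS] = {
  {128, 200, 64},
  {255, 32, 180},
};

/* Initialization becomes a single copy instead of a runtime computation. */
static void init_default_probs(prob_t dst[BANDS][CONTEXTS]) {
  assert(dst != NULL);
  memcpy(dst, default_probs, sizeof(default_probs));
}
```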
  16. 02 Aug, 2011 1 commit
  17. 01 Aug, 2011 1 commit
  18. 28 Jun, 2011 1 commit
    • Adding support for independent partitions · 4cb0ebe5
      Stefan Holmer authored
      Adding support in the encoder for generating
      independent residual partitions by forcing
      equal probabilities over the prev coef entropy
      contexts.
      
      Change-Id: I402f5c353255f3ca20eae2620af739f6a498cd21
  19. 21 Jun, 2011 1 commit
  20. 27 Apr, 2011 1 commit
    • SSE2/SSSE3 optimizations for build_predictors_mbuv{,_s}(). · 1083fe49
      Ronald S. Bultje authored
      decoding
      
      before
      10.425
      10.432
      10.423
      =10.426
      
      after:
      10.405
      10.416
      10.398
      =10.406, 0.2% faster
      
      encoding
      
      before
      14.252
      14.331
      14.250
      14.223
      14.241
      14.220
      14.221
      =14.248
      
      after
      14.095
      14.090
      14.085
      14.095
      14.064
      14.081
      14.089
      =14.086, 1.1% faster
      
      Change-Id: I483d3d8f0deda8ad434cea76e16028380722aee2
  21. 18 Mar, 2011 1 commit
    • Increase static linkage, remove unused functions · 429dc676
      John Koleszar authored
      A large number of functions were defined with external linkage, even
      though they were only used from within one file. This patch changes
      their linkage to static and removes the vp8_ prefix from their names,
      which should make it more obvious to the reader that the function is
      contained within the current translation unit. Functions that were
      not referenced were removed.
      
      These symbols were identified by:
      
        $ nm -A libvpx.a | sort -k3 | uniq -c -f2 | grep ' [A-Z] ' \
          | sort | grep '^ *1 '
      
      Change-Id: I59609f58ab65312012c047036ae1e0634f795779
  22. 09 Mar, 2011 1 commit
  23. 18 Feb, 2011 2 commits
  24. 10 Feb, 2011 1 commit
    • Fix relative include paths · 02321de0
      John Koleszar authored
      Allow compiling without adding vp8/{common,encoder,decoder} to the
      include paths.
      
      Change-Id: Ifeb5dac351cdfadcd659736f5158b315a0030b6c
  25. 09 Feb, 2011 1 commit
    • Adds armv6 optimized variance calculation · cb14764f
      Tero Rintaluoma authored
      Adds vp8_sub_pixel_variance16x16_armv6 function to encoder. Integrates
      ARMv6 optimized bilinear interpolations from vp8/common/arm/armv6
      and adds new assembly file for variance16x16 calculation.
       - vp8_filter_block2d_bil_first_pass_armv6   (integrated)
       - vp8_filter_block2d_bil_second_pass_armv6  (integrated)
       - vp8_variance16x16_armv6 (new)
       - bilinearfilter_arm.h (new)
      Change-Id: I18a8331ce7d031ceedd6cd415ecacb0c8f3392db
  26. 08 Feb, 2011 2 commits
    • clean up bilinear filter · e5aaac24
      Johann authored
      make reference version of bilinear_filters short.
      use reference versions of bilinear_filters and sub_pel_filters when
      possible.
      
      recognize that Width was being passed into
      filter_block2d_bil_first_pass multiple times. ARM version had already
      fixed this. propagate to C.
      
      change references to src_pixels_per_line to src_pitch and standardize on
      src/dst (instead of input/output).
      
      recognize that first_pass is only run in the vertical and second_pass
      only horizontal. ARM version had already fixed this. propagate to C
      
      Change-Id: I292d376d239a9a7ca37ec2bf03cc0720606983e2
    • clarify *_offsets.asm differences · 40dcae9c
      Johann authored
      it's difficult to mux the *_offsets.c files because of header conflicts.
      make three instead, name them consistently and partition the contents
      to allow building them as required.
      
      Change-Id: I8f9768c09279f934f44b6c5b0ec363f7943bb796
  27. 07 Feb, 2011 1 commit
    • move one of the offset files · 3273c7b6
      Johann authored
      common/arm/vpx_asm_offsets moves up a level. prepare for muxing with
      encoder/arm/vpx_vp8_enc_asm_offsets
      
      Change-Id: I89a04a5235447e66571995c9d9b4b6edcb038e24
  28. 13 Dec, 2010 1 commit
    • remove unused temporal preproc code · b1aa54ab
      John Koleszar authored
      This code is unused, as the current preproc implementation uses the
      same spatial filter that postproc uses.
      
      Change-Id: Ia06d5664917d67283f279e2480016bebed602ea7
  29. 16 Nov, 2010 1 commit
  30. 26 Oct, 2010 2 commits
    • make vp8_recon16x16mb{,y} RTCD functions · d6c67f02
      John Koleszar authored
      ARM NEON has a platform specific version of vp8_recon16x16mb, though
      it's just a stub to extract the various parameters from the
      MACROBLOCKD struct and pass them to vp8_recon16x16mb_neon(). Using
      that function's prototype directly will be a better long term solution,
      but it's quite an invasive change.
      
      Change-Id: I04273149e2ade34749e2d09e7edb0c396e1dd620
    • arm: move unrolled loops back to generic code · 19638c23
      John Koleszar authored
      Some of the ARM functions differed from their generic counterparts
      only by unrolling their loops. Since this change may be useful
      on other platforms, or might even supersede the looped version
      in the generic case, move it back to the generic file.
      
      This code is left under #if ARCH_ARM for now, but it may be worth
      considering a different (possibly new) conditional for these. If
      it turns out that this should be runtime selectable, these
      functions will have to move to the RTCD infrastructure. Don't want
      to take that step at this time without more profile data.
      
      Change-Id: I4612fdbc606fbebba4971a690fb743ad184ff15f
  31. 25 Oct, 2010 2 commits
    • reuse common loopfilter code · 1376f061
      Johann authored
      there were four versions for the regular and
      macroblock loopfilters:
      horizontal [y|uv]
      vertical [y|uv]
      
      this moves all the common code into 2 functions:
      vp8_loop_filter_neon
      vp8_mbloop_filter_neon
      
      this provides no gain in performance. there's a bit
      of jitter, but it trends down ~0.25-0.5%. however,
      this is a huge gain in maintenance. also, there is the
      potential to drop some stack usage in the macroblock
      loopfilter.
      
      Change-Id: I91506f07d2f449631ff67ad6f1b3f3be63b81a92
    • Add runtime CPU detection support for ARM. · b71962fd
      Timothy B. Terriberry authored
      The primary goal is to allow a binary to be built which supports
       NEON, but can fall back to non-NEON routines, since some Android
       devices do not have NEON, even if they are otherwise ARMv7 (e.g.,
       Tegra).
      The configure-generated flags HAVE_ARMV7, etc., are used to decide
       which versions of each function to build, and when
       CONFIG_RUNTIME_CPU_DETECT is enabled, the correct version is chosen
       at run time.
      In order for this to work, the CFLAGS must be set to something
       appropriate (e.g., without -mfpu=neon for ARMv7, and with
       appropriate -march and -mcpu for even earlier configurations), or
       the native C code will not be able to run.
      The ASFLAGS must remain set for the most advanced instruction set
       required at build time, since the ARM assembler will refuse to emit
       them otherwise.
      I have not attempted to make any changes to configure to do this
       automatically.
      Doing so will probably require the addition of new configure options.
      
      Many of the hooks for RTCD on ARM were already there, but a lot of
       the code had bit-rotted, and a good deal of the ARM-specific code
       is not integrated into the RTCD structs at all.
      I did not try to resolve the latter, merely to add the minimal amount
       of protection around them to allow RTCD to work.
      Those functions that were called based on an ifdef at the calling
       site were expanded to check the RTCD flags at that site, but they
       should be added to an RTCD struct somewhere in the future.
      The functions invoked with global function pointers still are, but
       these should be moved into an RTCD struct for thread safety (I
       believe every platform currently supported has atomic pointer
       stores, but this is not guaranteed).
      
      The encoder's boolhuff functions did not even have _c and armv7
       suffixes, and the correct version was resolved at link time.
      The token packing functions did have appropriate suffixes, but the
       version was selected with a define, with no associated RTCD struct.
      However, for both of these, the only armv7 instruction they actually
       used was rbit, and this was completely superfluous, so I reworked
       them to avoid it.
      The only non-ARMv4 instruction remaining in them is clz, which is
       ARMv5 (not even ARMv5TE is required).
      Considering that there are no ARM-specific configs which are not at
       least ARMv5TE, I did not try to detect these at runtime, and simply
       enable them for ARMv5 and above.
      
      Finally, the NEON register saving code was completely non-reentrant,
       since it saved the registers to a global, static variable.
      I moved the storage for this onto the stack.
      A single binary built with this code was tested on an ARM11 (ARMv6)
       and a Cortex A8 (ARMv7 w/NEON), for both the encoder and decoder,
       and produced identical output, while using the correct accelerated
       functions on each.
      I did not test on any earlier processors.
      
      Change-Id: I45cbd63a614f4554c3b325c45d46c0806f009eaa
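
      The run-time dispatch described above can be sketched with a small
      table of function pointers filled in from CPU capability flags. All
      names here (rtcd_table, rtcd_init, CPU_HAS_NEON, copy_row_*) are
      hypothetical, and the "fast" routine is plain C standing in for a NEON
      version so that the example builds anywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CPU_HAS_NEON 0x1u /* hypothetical capability bit */

typedef void (*copy_row_fn)(const uint8_t *src, uint8_t *dst, size_t n);

/* Generic C version, always safe to run. */
static void copy_row_c(const uint8_t *src, uint8_t *dst, size_t n) {
  for (size_t i = 0; i < n; ++i) dst[i] = src[i];
}

/* Stand-in for an accelerated version (unrolled by four). */
static void copy_row_fast(const uint8_t *src, uint8_t *dst, size_t n) {
  size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    dst[i] = src[i];
    dst[i + 1] = src[i + 1];
    dst[i + 2] = src[i + 2];
    dst[i + 3] = src[i + 3];
  }
  for (; i < n; ++i) dst[i] = src[i];
}

/* Per-instance table, so no global function pointers are shared. */
typedef struct {
  copy_row_fn copy_row;
} rtcd_table;

/* Start from the C version and upgrade only if the CPU supports it. */
static void rtcd_init(rtcd_table *t, unsigned int cpu_flags) {
  assert(t != NULL);
  t->copy_row = copy_row_c;
  if (cpu_flags & CPU_HAS_NEON) t->copy_row = copy_row_fast;
}
```

      Holding the pointers in a struct owned by each codec instance, rather
      than in globals, is the arrangement the message recommends for thread
      safety.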
  32. 24 Sep, 2010 1 commit
    • move reconintra_mt to decoder (for now) · 48e76ff4
      John Koleszar authored
      reconintra_mt.c is only required for building the decoder right now.
      It could definitely be used for the encoder in the future, but it
      currently depends on decoder only data structures. (onyxd_int.h,
      VP8D_COMP, etc). Move it from common/ to decoder/ until the
      necessary changes to the common multithread code are complete.
      
      This patch is needed to build with --disable-vp8-decoder.
      
      Change-Id: I568c52221a2b309234d269675cba97131ce35c86
  33. 17 Sep, 2010 1 commit
    • Restructure multi-threaded decoder · f857a850
      Yunqing Wang authored
      On each MB, loopfiltering is done right after MB decoding. This
      combines two loops in multi-threaded code into one, which halves
      the number of synchronizations.
      
      The above-row/left-col data are saved in temp buffers for
      next-row/next MB decoding.
      
      Tests on 4-core gLucid machine showed 10% decoder performance
      gain with threads=4 (tulip clip). Testing on other platforms
      isn't done yet.
      
      Change-Id: Id18ea7c1e84965dabea65d4c01ca5bc056ddeac9
  34. 09 Sep, 2010 1 commit
  35. 02 Sep, 2010 1 commit
    • encoder: remove postproc dependency · 76640f85
      James Zern authored
      Remove the dependency on postproc.c for the encoder in general, the only
      unchecked need for it is when CONFIG_PSNR is enabled. All other cases
      are already wrapped in CONFIG_POSTPROC. In the CONFIG_PSNR case the file
      will still be included.
      
      Additionally, if VP8_SET_POSTPROC is used with the encoder when post
      processing has been disabled, an error will be returned.
      
      This addresses issue #153.
      
      Change-Id: Ia6dfe20167f7077734a6058cbd1d794550346089