1. 25 Oct, 2010 2 commits
    • Add runtime CPU detection support for ARM. · b71962fd
      Timothy B. Terriberry authored
      The primary goal is to allow a binary to be built which supports
       NEON, but can fall back to non-NEON routines, since some Android
       devices do not have NEON, even if they are otherwise ARMv7 (e.g.,
       Tegra).
      The configure-generated flags HAVE_ARMV7, etc., are used to decide
       which versions of each function to build, and when
       CONFIG_RUNTIME_CPU_DETECT is enabled, the correct version is chosen
       at run time.
      In order for this to work, the CFLAGS must be set to something
       appropriate (e.g., without -mfpu=neon for ARMv7, and with
       appropriate -march and -mcpu for even earlier configurations), or
       the native C code will not be able to run.
      The ASFLAGS must remain set for the most advanced instruction set
       required at build time, since the ARM assembler will refuse to emit
       them otherwise.
      I have not attempted to make any changes to configure to do this
       automatically.
      Doing so will probably require the addition of new configure options.
      
      Many of the hooks for RTCD on ARM were already there, but a lot of
       the code had bit-rotted, and a good deal of the ARM-specific code
       is not integrated into the RTCD structs at all.
      I did not try to resolve the latter, merely to add the minimal amount
       of protection around them to allow RTCD to work.
      Those functions that were called based on an ifdef at the calling
       site were expanded to check the RTCD flags at that site, but they
       should be added to an RTCD struct somewhere in the future.
      The functions invoked with global function pointers still are, but
       these should be moved into an RTCD struct for thread safety (I
       believe every platform currently supported has atomic pointer
       stores, but this is not guaranteed).
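Here is a minimal C sketch of the dispatch pattern described above. Everything in it is hypothetical: the `HAS_NEON` flag bit, the `sum4_*` routines, and `struct rtcd_sketch` stand in for libvpx's real flags, functions, and RTCD structs. It also leaves out how the CPU flags are obtained. Both variants are plain C so the sketch builds anywhere. In an actual build, the accelerated variant would be NEON assembly assembled with the advanced ASFLAGS, and the C fallback would be compiled with baseline CFLAGS.

```c
/* Hypothetical CPU feature bit; real flag names are platform-specific. */
#define HAS_NEON 0x01

typedef int (*sum4_fn)(const short *in);

/* Baseline version: may only use instructions allowed by the CFLAGS. */
static int sum4_c(const short *in)
{
    return in[0] + in[1] + in[2] + in[3];
}

/* Stand-in for an accelerated version. In a real build this would be
 * NEON assembly, only ever called when NEON was detected at run time. */
static int sum4_neon(const short *in)
{
    return (in[0] + in[1]) + (in[2] + in[3]);
}

/* Per-instance dispatch table (an "RTCD struct"). Keeping the pointers
 * here instead of in globals avoids relying on atomic pointer stores. */
struct rtcd_sketch {
    sum4_fn sum4;
};

static void rtcd_init(struct rtcd_sketch *rtcd, int cpu_flags)
{
    rtcd->sum4 = sum4_c;            /* always-safe default */
    if (cpu_flags & HAS_NEON)
        rtcd->sum4 = sum4_neon;     /* upgrade when the feature is present */
}
```

Storing the selected pointers in a per-instance struct rather than in globals avoids the thread-safety concern raised above.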
      
      The encoder's boolhuff functions did not even have _c and armv7
       suffixes, and the correct version was resolved at link time.
      The token packing functions did have appropriate suffixes, but the
       version was selected with a define, with no associated RTCD struct.
      However, for both of these, the only armv7 instruction they actually
       used was rbit, and this was completely superfluous, so I reworked
       them to avoid it.
      The only non-ARMv4 instruction remaining in them is clz, which is
       ARMv5 (not even ARMv5TE is required).
      Considering that there are no ARM-specific configs which are not at
       least ARMv5TE, I did not try to detect these at runtime, and simply
       enable them for ARMv5 and above.
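The following sketch relates to the clz point above. It shows how a count-leading-zeros instruction can compute the renormalization shift in a VP8-style boolean coder, where the range must be shifted left until it is back in [128, 255]. It is illustrative and is not the reworked boolhuff code. It assumes a 32-bit `unsigned int` and uses the GCC/Clang builtin `__builtin_clz`, which compilers typically lower to a single CLZ instruction on ARMv5 and later. The function names are hypothetical.

```c
/* Left shifts needed to bring a range value in [1, 255] back into
 * [128, 255]. Assumes a 32-bit unsigned int. range must be non-zero,
 * because __builtin_clz(0) is undefined. */
static int renorm_shift(unsigned int range)
{
    return __builtin_clz(range) - 24;
}

/* Portable reference: shift until the top bit of the byte is set. */
static int renorm_shift_loop(unsigned int range)
{
    int shift = 0;
    while (range < 128) {
        range <<= 1;
        shift++;
    }
    return shift;
}
```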
      
      Finally, the NEON register saving code was completely non-reentrant,
       since it saved the registers to a global, static variable.
      I moved the storage for this onto the stack.
      A single binary built with this code was tested on an ARM11 (ARMv6)
       and a Cortex A8 (ARMv7 w/NEON), for both the encoder and decoder,
       and produced identical output, while using the correct accelerated
       functions on each.
      I did not test on any earlier processors.
      
      Change-Id: I45cbd63a614f4554c3b325c45d46c0806f009eaa
    • isolate new temporal filtering code · e81e30c2
      Johann authored
      onyx_if is getting pretty big. split out the temporal code to make it
      easier to look at.
      
      Change-Id: I207c3a94c90e91b32e3ea5e1836a53b7a990fabd
  2. 22 Oct, 2010 2 commits
    • Improve handling of invalid frames. · 09bcc1f7
      Timothy B. Terriberry authored
      The code was not checking for frame sizes smaller than 3 bytes, and the
       partition size checks might have failed if the input buffer was within
       16MB of the top of the heap.
      In addition, the reference count on the current frame buffer was not
       being decremented on error, so after a small number of errors, no new
       frame buffer could be found and it would run off the list of them.
      
      Change-Id: I0c60dba6adb1e2a29df39754f72a56ab6c776b46
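Here is a minimal sketch of the kind of checks described above. The helper names are hypothetical. The key point is to compare a partition size against the number of bytes remaining instead of forming `ptr + size` first. That sum can wrap when the buffer sits near the top of the address space, and it is undefined behaviour in C, so a too-large size can slip past the check. The 3-byte minimum comes from the commit message; everything else is an assumption.

```c
#include <stddef.h>

/* Minimum frame size taken from the commit message. */
#define MIN_FRAME_BYTES 3

static int frame_too_small(size_t data_sz)
{
    return data_sz < MIN_FRAME_BYTES;
}

/* Returns 1 if a partition of part_sz bytes starting at ptr ends at or
 * before end. Comparing against the remaining length cannot overflow,
 * unlike computing ptr + part_sz and comparing that with end. */
static int partition_fits(const unsigned char *ptr,
                          const unsigned char *end, size_t part_sz)
{
    if (ptr > end)
        return 0;
    return part_sz <= (size_t)(end - ptr);
}
```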
    • Convert [4][4] matrices to [16] arrays. · 8f75ea6b
      Timothy B. Terriberry authored
      Most of the code that actually uses these matrices indexes them as
       if they were a single contiguous array, and Coverity produces
       reports about the resulting accesses that overflow the static
       bounds of the first row.
      This is perfectly legal in C, but converting them to actual [16]
       arrays should eliminate the report, and removes a good deal of
       extraneous indexing and address operators from the code.
      
      Change-Id: Ibda479e2232b3e51f9edf3b355b8640520fdbf23
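Here is a small sketch of the flattened layout, using hypothetical helper names. Element (row, col) of a 4x4 block lives at index `row * 4 + col`. This is the same row-major order a `[4][4]` array uses in memory, so a loop over all 16 entries stays within the declared bounds of a `[16]` array.

```c
/* Flat 4x4 block: element (row, col) is stored at index row * 4 + col. */
static int block_sum(const short b[16])
{
    int i, sum = 0;
    for (i = 0; i < 16; i++)    /* one contiguous walk, within bounds */
        sum += b[i];
    return sum;
}

static short block_at(const short b[16], int row, int col)
{
    return b[row * 4 + col];
}
```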
  3. 21 Oct, 2010 4 commits
    • Change altref times to preceding pts+1. · 45e64941
      Frank Galligan authored
      Change the pts of the altref frame to be as close as possible to the
      pts of the preceding frame and still be strictly increasing.
      
      Change-Id: Iae3033a4c89ae5a9d0e5c4198e9196e5f3ee57c7
    • Move firstpass motion map to stats packet · bb7dd5b1
      John Koleszar authored
      The first implementation of the firstpass motion map for motion
      compensated temporal filtering created a file, fpmotionmap.stt,
      in the current working directory. This was not safe for multiple
      encoder instances. This patch merges this data into the first pass
      stats packet interface, so that it is handled like the other
      (numerical) firstpass stats.
      
      The new stats packet is defined as follows:
          Numerical Stats (16 doubles) -- 128 bytes
          Motion Map                   -- 1 byte / Macroblock
          Padding                      -- to align packet to 8 bytes
      
      The fpmotionmap.stt file can still be generated for debugging
      purposes in the same way that the textual version of the stats
      are available (by defining OUTPUT_FPF in firstpass.c).
      
      Change-Id: I083ffbfd95e7d6a42bb4039ba0e81f678c8183ca
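The packet size for a given frame follows directly from the layout above, as this C sketch shows. The function name is hypothetical. The sketch assumes 8-byte `double`s, which makes the numerical stats 128 bytes as stated.

```c
#include <stddef.h>

/* Stats packet size per the layout above: 16 doubles of numerical
 * stats, 1 byte per macroblock, then padding to a multiple of 8. */
static size_t fp_stats_packet_size(size_t mb_rows, size_t mb_cols)
{
    size_t sz = 16 * sizeof(double);   /* 128 bytes with 8-byte doubles */
    sz += mb_rows * mb_cols;           /* motion map, 1 byte per MB */
    return (sz + 7) & ~(size_t)7;      /* round up to 8-byte alignment */
}
```

For example, a 352x288 (CIF) frame has 22 x 18 = 396 macroblocks. The packet is therefore 128 + 396 = 524 bytes before padding and 528 bytes after.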
    • Add MMWORD PTR/XMMWORD PTR in subtract_sse2.asm · 4cefb443
      Yunqing Wang authored
      Change-Id: Ia649b500ef020225d8bbf611799d0f47658dc2ac
    • Rewrite vp8_short_walsh4x4_sse2() · fc94ffce
      Yunqing Wang authored
      This rewriting reflects changes made in commit "Improve the
      accuracy of forward walsh-hadamard transform". Since this function
      is not called much, only a small encoder performance gain (~0.5%)
      is seen.
      
      Change-Id: Ie9df58a43028a11fd5b115c4bbe3141f7596578b
  4. 20 Oct, 2010 1 commit
  5. 18 Oct, 2010 2 commits
    • Add SSE2 subtract functions · 4db20765
      Yunqing Wang authored
      Instead of doing 8-bit data unpack and 16-bit subtraction, use
      psubb to do 16 8-bit subtractions and pcmpgtb to preserve the
      sign information. This does not bring noticeable gain since
      these functions are not called frequently.
      
      Change-Id: I90a0dfaa3db9d422e4ada324076596ffb178548e
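The sketch below implements the technique described above with SSE2 intrinsics rather than the hand-written assembly from the commit. The function name and exact instruction sequence are assumptions. `_mm_sub_epi8` (psubb) produces the low byte of each difference. `_mm_cmpgt_epi8` (pcmpgtb) produces an all-ones byte wherever the true difference is negative, and that is exactly the high byte of the 16-bit result. Because pcmpgtb compares signed bytes, the sketch first flips the top bit of both inputs to turn it into an unsigned comparison. This requires an x86 target with SSE2.

```c
#include <emmintrin.h>
#include <stdint.h>

/* diff[i] = src[i] - pred[i] for 16 unsigned bytes, widened to 16 bits. */
static void subtract16_sse2(const uint8_t *src, const uint8_t *pred,
                            int16_t *diff)
{
    const __m128i bias = _mm_set1_epi8(-128);
    __m128i s = _mm_loadu_si128((const __m128i *)src);
    __m128i p = _mm_loadu_si128((const __m128i *)pred);

    /* psubb: low 8 bits of each of the 16 differences. */
    __m128i d = _mm_sub_epi8(s, p);

    /* pcmpgtb on bias-flipped inputs acts as an unsigned pred > src test:
     * 0xFF exactly where the difference is negative, which is the high
     * byte of its 16-bit two's-complement value. */
    __m128i neg = _mm_cmpgt_epi8(_mm_xor_si128(p, bias),
                                 _mm_xor_si128(s, bias));

    /* Interleave low bytes with sign bytes to form 16-bit results. */
    _mm_storeu_si128((__m128i *)diff, _mm_unpacklo_epi8(d, neg));
    _mm_storeu_si128((__m128i *)(diff + 8), _mm_unpackhi_epi8(d, neg));
}
```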
    • copy compiler warning fixes · ce1ce992
      Johann authored
      generic version got fixed, but not the arm version. fixes:
      vp8/encoder/arm/mcomp_arm.c: In function 'vp8_full_search_sadx3':
      vp8/encoder/arm/mcomp_arm.c:1208: warning: pointer targets in passing
      argument 5 of 'fn_ptr->sdx3f' differ in signedness
      vp8/encoder/arm/mcomp_arm.c:1208: note: expected 'unsigned int *' but
      argument is of type 'int *'
      
      and another unsigned change to keep the files similar
      
      Change-Id: I1b6255dc3a03b90394a791ee0d15d8167d9454db
  6. 15 Oct, 2010 2 commits
    • remove dead code · 963bcd6c
      Johann authored
      vp8_diamond_search_sadx4 isn't used in arm because there is no
      corresponding sdx4df as in x86. rather than keep it in sync with
      ../mcomp.c, delete it
      
      vp8_hex_search had the original, more readable/understandable code #if'd
      out. it's also available in ../mcomp.c, so remove the dead copy
      
      Change-Id: Ia42aa6e23b3a2e88040f467280befec091ec080e
    • change to make use of more trellis quantization · 2e53e9e5
      Yaowu Xu authored
      when a subsequent frame is encoded as an alt reference frame, it is
      unlikely that any mb in current frame will be used as reference for
      future frames, so we can enable quantization optimization even when
      the RD constant is slightly rate-biased. The change has an overall
      benefit of 0.1% to 0.2% in bit savings on the test sets, based on
      vpxssim scores.
      
      Change-Id: I9aa7bc5cd573ea84e3ee655d2834c18c4460ceea
  7. 14 Oct, 2010 3 commits
    • Fix one gcc compiler warning · 7804befb
      Yunqing Wang authored
      ../libvpx/vp8/encoder/bitstream.c: In function ‘pack_inter_mode_mvs’:
      ../libvpx/vp8/encoder/bitstream.c:1026: warning: array subscript has type ‘char’
      
      Change-Id: Ic77491e0a172fa1821e5b3e914d0dc41fe87c00f
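The warning above comes from GCC's `-Wchar-subscripts`, which `-Wall` enables. Plain `char` may be signed, so a byte value of 128 or more used as an index becomes negative. The commit does not show the actual line in bitstream.c, so the sketch below uses hypothetical names to show the usual fix:

```c
/* Converting through unsigned char keeps the index in [0, 255],
 * whether or not plain char is signed on the target. */
static int lookup_byte(const int table[256], char c)
{
    return table[(unsigned char)c];
}
```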
    • Improve bounds checking in vp8_diamond_search_sadx4() · d6da7b8e
      Yunqing Wang authored
      In order to know if all 4/8 neighbor points are within the bounds,
      4 bounds checks are enough, instead of checking 4 bounds for
      each point (16/32 checks). This improvement reduces the cost of
      vp8_diamond_search_sadx4() by 30%, and gives encoder a 1.5%
      performance gain (test options: 1 pass, good, speed=4).
      
      Change-Id: Ie8da29d18a6ecfc9829e74ac02f6fa70e042331a
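The sketch below illustrates the idea, using hypothetical types and names and assuming a non-negative step. The four diamond neighbours at distance `step` from the centre are all inside the bounds exactly when the extreme rows and columns are. Four comparisons therefore replace four per candidate. The same four checks also cover an 8-point pattern that adds the diagonal points, because those share the same extreme rows and columns.

```c
/* Hypothetical MV search limits, in the same units as row/col. */
struct mv_bounds {
    int row_min, row_max, col_min, col_max;
};

/* Per-point check: 4 comparisons for every candidate. */
static int point_in_bounds(const struct mv_bounds *b, int row, int col)
{
    return row >= b->row_min && row <= b->row_max &&
           col >= b->col_min && col <= b->col_max;
}

/* Combined check (step >= 0): the neighbours (row +/- step, col) and
 * (row, col +/- step) are all in bounds exactly when these 4
 * comparisons hold. */
static int all_neighbours_in_bounds(const struct mv_bounds *b,
                                    int row, int col, int step)
{
    return row - step >= b->row_min && row + step <= b->row_max &&
           col - step >= b->col_min && col + step <= b->col_max;
}
```

When the combined check fails near a frame edge, a search would fall back to checking each point individually.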
    • Fix compiler warning about vp8_fast_quantize_b_impl_ssse2. · 1dc0ca13
      Fritz Koenig authored
      Typo had function defined as _ssse2 and prototyped as _sse2.
      
      Change-Id: If9f19da1a83cff40774a90cf936d601c0bf1b7fe
  8. 13 Oct, 2010 1 commit
  9. 12 Oct, 2010 2 commits
    • Centralize mb skip state calculation · 13685747
      John Koleszar authored
      This patch moves the scattered updates to the mb skip state
      (mode_info_context->mbmi.mb_skip_coeff) to vp8_tokenize_mb. Recent
      changes to the quantizer exposed a bug where if a macroblock
      could be coded as a skip but isn't, the encoder would run the
      loopfilter but the decoder wouldn't, causing a reference buffer
      mismatch.
      
      The loopfilter is controlled by a flag called dc_diff. The decoder
      looks at the number of decoded coefficients when setting this flag.
      The encoder sets this flag based on the skip state, since any
      skippable macroblock should be transmitted as a skip. The coefficient
      optimization pass (vp8_optimize_b()) could change the coefficients
      such that a block that was not a skip becomes one. The encoder was
      not updating the skip state in this situation for intra coded blocks.
      
      The underlying issue predates it, but this bug was recently triggered
      by enabling trellis quantization on the Y2 block in commit dcd29e36,
      and by changing the quantizer range control in commit 305be4e4.
      
      Change-Id: I5cce5da0dbc2d22f7d79ee48149f01e868a64802
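Here is a simplified sketch of computing the skip flag in one place, after all coefficient changes (including optimization) are final. The names are hypothetical. The sketch treats a macroblock as skippable only when each of its 25 blocks (16 Y, 4 U, 4 V, and Y2) has an end-of-block position of zero. The real VP8 rule has extra details for Y blocks when a Y2 block is present, and those are not modelled here.

```c
/* VP8 macroblocks carry 16 Y, 4 U, 4 V and 1 Y2 block. */
#define MB_BLOCKS 25

/* Hypothetical: eobs[i] is block i's end-of-block position taken from
 * the final coefficients, i.e. after vp8_optimize_b()-style changes.
 * Simplified rule: skippable only if no block codes any coefficient. */
static int mb_is_skippable_sketch(const int eobs[MB_BLOCKS])
{
    int i;
    for (i = 0; i < MB_BLOCKS; i++)
        if (eobs[i] != 0)
            return 0;
    return 1;
}
```

Deriving the flag once, from the final coefficients, means a block that optimization turns into a skip is labelled as one, so the encoder and decoder agree on whether to run the loopfilter.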
    • Add const qualifiers to variance/SAD functions. · f4a85944
      Timothy B. Terriberry authored
      These functions should never change their input, and there's no
       reason not to declare that.
      This allows them to be passed static const data.
      
      Change-Id: Ia49fe4b01e80e9afcb24b4844817694d4da5995c
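Here is a sketch of a const-correct SAD routine. The name and signature are illustrative rather than libvpx's. Because both block pointers are `const`, the function can be called directly on `static const` tables without casting away the qualifier.

```c
#include <stdlib.h>

/* Sum of absolute differences over a w x h block. Neither input is
 * modified, and the const qualifiers state that in the signature. */
static unsigned int sad_wxh(const unsigned char *src, int src_stride,
                            const unsigned char *ref, int ref_stride,
                            int w, int h)
{
    unsigned int sad = 0;
    int r, c;
    for (r = 0; r < h; r++) {
        for (c = 0; c < w; c++)
            sad += (unsigned int)abs(src[c] - ref[c]);
        src += src_stride;
        ref += ref_stride;
    }
    return sad;
}
```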
  10. 11 Oct, 2010 2 commits
    • Move vp8_strict_quantize_b inside EXACT_QUANT #define. · 82c43398
      Timothy B. Terriberry authored
      There is currently no inexact version of this function, so do not
       even compile it without EXACT_QUANT.
      This will prevent someone from inadvertently trying to use it without
       the proper EXACT_QUANT setup.
      
      Change-Id: Ia13491e0128afb281c05c9222ee5987101e4010d
    • Remove INTRARDOPT #define and intra_rd_opt option. · dd08db93
      Timothy B. Terriberry authored
      This is just eliminating some cruft.
      Although a number of variables are declared only when INTRARDOPT
       is defined, they are used elsewhere without that protection, and
       no longer just for intra RDO.
      The intra_rd_opt flag was hard-coded to 1 and never checked.
      
      Change-Id: I83a81554ecee8053e7b4ccd8aa04e18fa60f8e4f
  11. 07 Oct, 2010 2 commits
    • Remove unused file in encoder · 7e6f7b57
      Yunqing Wang authored
      Remove vp8/encoder/x86/csystemdependent.c
      
      Change-Id: I7c590dcd07b68704d463a1452f62f29ffb1402f4
    • Added vp8_fast_quantize_b_sse2 · d860f685
      Scott LaVarnway authored
      Moved vp8_fast_quantize_b_sse from quantize_mmx.asm into
      quantize_sse2.asm and renamed it. Updated the assembly code to
      match the C version.
      
      Change-Id: I1766d9e1ca60e173f65badc0ca0c160c2b51b200
  12. 06 Oct, 2010 1 commit
    • optimize fast_quantizer c version · d338d14c
      Yaowu Xu authored
      As the zbin and rounding constants are normalized, rounding effectively
      does the zbinning, therefore the zbin operation can be removed. In
      addition, the memsets on the two arrays are no longer necessary.
      
      Change-Id: If39c353c42d7e052296cb65322e5218810b5cc4c
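Here is a sketch of a fast quantizer without a separate zero-bin test. The names are hypothetical and the layout is simplified, with no zig-zag scan or end-of-block tracking. The sketch assumes that `quant` holds a 16-bit fixed-point reciprocal of the step size and that `round` holds a rounding offset. Under those assumptions, `((|z| + round) * quant) >> 16` already maps small coefficients to zero. The sign trick also assumes that `>>` on a negative `int` is an arithmetic shift, as it is on mainstream compilers. Every output entry is written, so no memset is needed beforehand.

```c
/* Quantize one 4x4 block; rounding alone produces the dead zone. */
static void fast_quantize_sketch(const short coeff[16], const short round[16],
                                 const short quant[16], const short dequant[16],
                                 short qcoeff[16], short dqcoeff[16])
{
    int i;
    for (i = 0; i < 16; i++) {
        int z = coeff[i];
        int sz = z >> 31;              /* 0 if z >= 0, -1 if negative */
        int x = (z ^ sz) - sz;         /* |z| */
        int y = ((x + round[i]) * quant[i]) >> 16;
        x = (y ^ sz) - sz;             /* restore the sign */
        qcoeff[i] = (short)x;
        dqcoeff[i] = (short)(x * dequant[i]);
    }
}
```

With a step size of 16 (`quant` = 4096, `round` = 8), a coefficient of 100 quantizes to 6 and dequantizes to 96, while anything with a magnitude below 8 quantizes to 0.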
  13. 05 Oct, 2010 1 commit
  14. 04 Oct, 2010 3 commits
    • nasm: address labels 'rel label' vice 'wrt rip' · 5cdc3a4c
      Jan Kratochvil authored
      nasm does not support `label wrt rip'; it requires `rel label', which is
      still fully compatible with yasm.
      
      Provide nasm compatibility. No binary change by this patch with yasm on
      {x86_64,i686}-fedora13-linux-gnu. A few longer opcodes produced by nasm on
      {x86_64,i686}-fedora13-linux-gnu have been checked as safe.
      
      Change-Id: I488773a4e930a56e43b0cc72d867ee5291215f50
    • nasm: match instruction length (movd/movq) to parameters · e114f699
      Jan Kratochvil authored
      nasm requires the instruction length (movd/movq) to match its
      operands. I find it clearer to actually use 64-bit instructions when
      we use 64-bit registers in the assembly.
      
      Provide nasm compatibility. No binary change by this patch with yasm on
      {x86_64,i686}-fedora13-linux-gnu. A few longer opcodes produced by nasm on
      {x86_64,i686}-fedora13-linux-gnu have been checked as safe.
      
      Change-Id: Id9b1a5cdfb1bc05697e523c317a296df43d42a91
    • fixed a typo that mis-used Y plane stride for UV blocks. · 49fdb7c4
      Yaowu Xu authored
      As reported by Lei Yang, the Y plane stride was used for UV blocks.
      This is clearly a typo, but as the comments in the code suggest that
      this portion of the code has not been used yet, the typo should not
      have caused any damage.
      
      Change-Id: Iea895edc17469a51c803a8cc6d0fce65a1a7fc2f
  15. 02 Oct, 2010 2 commits
    • Tune effect of motion on KF/GF boost in two pass; · 788c0eb5
      Paul Wilkins authored
      This code adjusts the impact of the amount and speed of motion
      on GF and KF boost.
      
      Sections with lots of slow motion will tend to have a
      somewhat bigger boost and sections with fast motion may
      have less.
      
      There is a knock-on effect on the selection of the active
      quantizer range.
      
      This will likely require further tuning but helps with a couple
      of particularly bad edge cases.
      
      Change-Id: Ic2449cda7305672b69acf42fc0a845b77ac98d40
    • enable trellis quantization for 2nd order blocks · dcd29e36
      Yaowu Xu authored
      Experimented with different values for Y2_RD_MULT in the range [1, 32],
      without adapting the value to MB coding mode/frame type/Q value;
      4 works out best among all values, providing an overall 0.1% coding
      gain on the test set.
      
      Change-Id: I6b2583a8aa5db5e7e5c65c646301909c0c58f876
  16. 01 Oct, 2010 2 commits
  17. 30 Sep, 2010 1 commit
    • Changed defaults & range checking for AltRef params · 8ee7284d
      Adrian Grange authored
      Modified the range checking of parameters used in the
      AltRef temporal filter (arnr-max-frames, arnr-strength,
      arnr-type) and default values for each of them.
      
      Change-Id: Ib261028d501b9523f6e44cb4790cc52167b6e92b
  18. 29 Sep, 2010 5 commits
    • Rename mode_ref_lf_test_function · 7e5e3151
      John Koleszar authored
      This function graduated from being a test func to something that's on
      by default. Rename it and remove some spurious comments that confuse
      its status.
      
      Change-Id: I689695a3ad29c35e9a72a43ec93766733ac6c20b
    • Fix loopfilter delta zero transitions · b9be7a46
      John Koleszar authored
      Loopfilter deltas are initialized to zero on keyframes in the decoder.
      The values then persist from the previous frame unless an update bit
      is set in the bitstream. This data is not included in the entropy
      data saved by the 'refresh entropy' bit in the bitstream, so it is
      effectively an additional contextual element beyond the 3 ref-frames
      and the entropy data.
      
      The encoder was treating this delta update bit as update-if-nonzero,
      meaning that the value would be refreshed even if it hadn't changed,
      and more significantly, if the correct value for the delta changed
      to zero, the update wouldn't be sent, and the decoder would preserve
      the last (presumably non-zero) value.
      
      This patch updates the encoder to send an update only if the value
      has changed from the previously transmitted value. It also forces the
      value to be transmitted in error resilient mode, to account for lost
      context in the event of lost frames.
      
      Change-Id: I56671d5b42965d0166ac226765dbfce3e5301868
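Here is a sketch contrasting the two update rules described above, with hypothetical function names. `last_sent` stands for the value the decoder currently holds, which is reset to zero on keyframes.

```c
/* Old rule: transmit only non-zero deltas. An unchanged non-zero value
 * is re-sent every time, and a change back to 0 is never sent. */
static int send_delta_old(int value)
{
    return value != 0;
}

/* New rule: transmit when the value differs from what the decoder last
 * received, and always in error resilient mode, where that earlier
 * update may have been lost. */
static int send_delta_new(int value, int last_sent, int error_resilient)
{
    return error_resilient || value != last_sent;
}
```

Under the old rule, a delta that goes from 2 to 0 is never sent, so the decoder keeps using 2. The new rule sends it because the value changed.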
    • Change to coefficient optimization rules. · 7288cdf7
      Paul Wilkins authored
      Allow coefficient optimization for good quality speed 0.
      
      Change-Id: Id0cb363df6823c6798671584fbba097916a7df2c
    • Moved row-specific computation of MV bounds out of col loop · 0e7c45b3
      Adrian Grange authored
      Moved the bounds computation on vertical MV component out
      of the loop that processes MBs within a MB row.
    • Control of active min quantizer for two pass. · ff3068d6
      Paul Wilkins authored
      Create look-up tables for controlling the active quantizer range.
      Some initial tuning to improve quality by circa 0.5% on the test set.
      Clean up some stats output code.
      
      Change-Id: Ia698a8525f8b8129a503cadace3ee73fe888f543
  19. 28 Sep, 2010 2 commits
    • Optimizations on the loopfilters. · 0964ef0e
      Fritz Koenig authored
      - Scheduling for Atom processors
      - Combining of macros to allow for better interleaving
      - Change from multiplies to adds for main filter
      - Use of movhps/movlps to fill xmm registers without
        shifting and orring
      
      Change-Id: I0b3500a5f58abf7085253ec92d64c8a96723040b
    • Enabled AltRef motion map creation · 47fc8f26
      Adrian Grange authored
      Enabled the first-pass encode to output the
      map of macroblock coding modes required by
      the AltRef filter.