  1. 10 Nov, 2010 2 commits
    • Relax rate control for last few frames · 513f8e68
      Paul Wilkins authored
      VBR rate control can become very noisy for the last few frames.
      If there are a few bits to spare or a small overshoot then the
      target rate and hence quantizer may start to fluctuate wildly.
      
      This patch prevents further adjustment of the active Q limits for
      the last few frames.
      
      Patch also removes some redundant variables and makes one small bug fix.
      
      Change-Id: Ic167831bec79acc9f0d7e4698bcc4bb188840c45
      513f8e68
    • Tuning for the more exact quantizer. · 6adbe090
      Paul Wilkins authored
      Small changes to the default zero bin and rounding tables.
      Though the tables are currently the same for the Y1 and Y2 cases
      I have left them as separate tables in case we want to tune this later.
      
      There is now some adjustment of the zbin based on the prediction mode.
      Previously this was restricted to an adjustment for gf/arf 0,0 MV.
      
      The exact quantizer now performs marginally better and is the default.
      
      The overall average gain is about 0.5%.
      
      Change-Id: I5e4353f3d5326dde4e86823684b236a1e9ea7f47
      6adbe090
  2. 05 Nov, 2010 1 commit
    • improve average framerate calculation · f7e187d3
      John Koleszar authored
      Change Ice204e86 identified a problem with bitrate undershoot due to
      low precision in the timestamps passed to the library. This patch
      takes a different approach by calculating the duration of this frame
      and passing it to the library, rather than using a fixed duration
      and letting the library average it out with higher precision
      timestamps. This part of the fix only applies to vpxenc.
      
      This patch also attempts to fix the problem for generic applications
      that may have made the same mistake vpxenc did. Instead of
      calculating this frame's duration by the difference of this frame's
      and the last frame's start time, we use the end times instead. This
      allows the framerate calculation to scavenge "unclaimed" time from
      the last frame. For instance:
      
        start |  end  | calculated duration
        ======+=======+====================
          0ms    33ms   33ms
         33ms    66ms   33ms
         66ms    99ms   33ms
        100ms   133ms   34ms
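
The table above can be reproduced with a minimal C sketch. The helper names
are hypothetical and not taken from the patch; the sketch only assumes that
each frame's duration is the difference between its end time and the previous
frame's end time, with the first frame starting at 0.

```c
#include <stdint.h>

/* Hypothetical helper: duration of a frame measured from the previous
 * frame's END time, so any gap before this frame's start is absorbed. */
static int64_t frame_duration(int64_t prev_end, int64_t this_end)
{
    return this_end - prev_end;
}

/* Fill durations[] for n frames given their end times in milliseconds.
 * Assumption: the first frame starts at time 0. */
static void durations_from_end_times(const int64_t *end, int64_t *durations,
                                     int n)
{
    int64_t prev_end = 0;
    int i;

    for (i = 0; i < n; i++) {
        durations[i] = frame_duration(prev_end, end[i]);
        prev_end = end[i];
    }
}
```

With end times of 33, 66, 99 and 133 ms, the last frame is credited with
34 ms because it picks up the unclaimed millisecond between 99 ms and 100 ms.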
      
      Change-Id: I92be4b3518e0bd530e97f90e69e75330a4c413fc
      f7e187d3
  3. 01 Nov, 2010 1 commit
    • SSSE3 version of fast quantizer · ff4a71f4
      Scott LaVarnway authored
      (test clip: tulip)
      For good quality mode with speed=1, this gave the encoder
      a small (2 - 3%) performance boost.
      
      Change-Id: I8a1d4269465944ac0819986c2f0be4b0a2ee0b35
      ff4a71f4
  4. 29 Oct, 2010 1 commit
    • Finding first label · dcee88ea
      Scott LaVarnway authored
      Using tables for the label count and label offset.
      
      Change-Id: Iac3d5b292c37341a881be0af282f5cac3b3e01eb
      dcee88ea
  5. 28 Oct, 2010 4 commits
    • Save XMM registers in asm functions · 6614563b
      Yunqing Wang authored
      XMM6/7 are used in these functions, and need to be saved.
      
      Change-Id: I3dfaddaf2a69cd4bf8e8735c7064b17bac5a14e5
      6614563b
    • Fix full-search SAD function crash in Visual Studio · 7e3a1e73
      Yunqing Wang authored
      Unlike GCC, the Visual Studio compiler doesn't align the SAD output
      array to 16 bytes, which causes a crash in Visual Studio builds.
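
As an illustration of the alignment requirement, here is a small sketch that
requests 16-byte alignment explicitly instead of relying on the compiler's
default stack layout. It uses C11 alignas, which is an assumption made for
portability of the example; the patch itself may use a compiler-specific
mechanism. All names are hypothetical.

```c
#include <stdalign.h>
#include <stdint.h>

/* Returns 1 if the pointer is aligned to a 16-byte boundary. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15) == 0;
}

/* Hypothetical caller: the output array is explicitly aligned, so code
 * that performs aligned 128-bit stores into it cannot fault. */
static int sad_array_is_aligned(void)
{
    alignas(16) unsigned int sad_array[8];

    sad_array[0] = 0; /* touch the array so it is not optimized away */
    return is_aligned16(sad_array);
}
```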
      
      Change-Id: Ia755cf5a807f12929bda8db94032bb3c9d0c2362
      7e3a1e73
    • Eliminate more warnings. · 97b766a4
      Timothy B. Terriberry authored
      This eliminates a large set of warnings exposed by the Mozilla build
       system (Use of C++ comments in ISO C90 source, commas at the end of
       enum lists, a couple incomplete initializers, and signed/unsigned
       comparisons).
      It also eliminates many (but not all) of the warnings exposed by newer
       GCC versions and _FORTIFY_SOURCE (e.g., calling fread and fwrite
       without checking the return values).
      There are a few spurious warnings left on my system:
      
      ../vp8/encoder/encodemb.c:274:9: warning: 'sz' may be used
       uninitialized in this function
      gcc seems to be unable to figure out that the value shortcut doesn't
       change between the two if blocks that test it here.
      
      ../vp8/encoder/onyx_if.c:5314:5: warning: comparison of unsigned
       expression >= 0 is always true
      ../vp8/encoder/onyx_if.c:5319:5: warning: comparison of unsigned
       expression >= 0 is always true
      This is true, so far as it goes, but it's comparing against an enum,
       and the C standard does not mandate that enums be unsigned, so the
       checks can't be removed.
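
The enum point can be illustrated with a short sketch. The enum and function
names are hypothetical. C leaves the underlying integer type of an enum up to
the implementation, so a lower-bound check may be redundant on one compiler
and necessary on another. One possible way to keep both bounds meaningful is
to compare through int:

```c
/* Hypothetical enum standing in for the one tested in onyx_if.c. */
typedef enum { MODE_A, MODE_B, MODE_C, MODE_COUNT } example_mode;

/* Range check that behaves the same whether the compiler chose a signed
 * or an unsigned underlying type for the enum. */
static int mode_is_valid(example_mode m)
{
    int v = (int)m;

    return v >= 0 && v < (int)MODE_COUNT;
}
```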
      
      Change-Id: Iead6cd561a2afaa3d801fd63f1d8d58953da7426
      97b766a4
    • Eliminate more warnings. · c4d7e5e6
      Timothy B. Terriberry authored
      This eliminates a large set of warnings exposed by the Mozilla build
       system (Use of C++ comments in ISO C90 source, commas at the end of
       enum lists, a couple incomplete initializers, and signed/unsigned
       comparisons).
      It also eliminates many (but not all) of the warnings exposed by newer
       GCC versions and _FORTIFY_SOURCE (e.g., calling fread and fwrite
       without checking the return values).
      There are a few spurious warnings left on my system:
      
      ../vp8/encoder/encodemb.c:274:9: warning: 'sz' may be used
       uninitialized in this function
      gcc seems to be unable to figure out that the value shortcut doesn't
       change between the two if blocks that test it here.
      
      ../vp8/encoder/onyx_if.c:5314:5: warning: comparison of unsigned
       expression >= 0 is always true
      ../vp8/encoder/onyx_if.c:5319:5: warning: comparison of unsigned
       expression >= 0 is always true
      This is true, so far as it goes, but it's comparing against an enum, and the C
       standard does not mandate that enums be unsigned, so the checks can't be
       removed.
      
      Change-Id: Iaf689ae3e3d0ddc5ade00faa474debe73b8d3395
      c4d7e5e6
  6. 27 Oct, 2010 3 commits
    • Full search SAD function optimization in SSE4.1 · 71ecb5d7
      Yunqing Wang authored
      Use mpsadbw to calculate 8 SADs at once. Function list:
      vp8_sad16x16x8_sse4
      vp8_sad16x8x8_sse4
      vp8_sad8x16x8_sse4
      vp8_sad8x8x8_sse4
      vp8_sad4x4x8_sse4
      
      (test clip: tulip)
      For best quality mode, this gave the encoder a 5% performance boost.
      For good quality mode with speed=1, this gave the encoder a 3%
      performance boost.
      
      Change-Id: I083b5a39d39144f88dcbccbef95da6498e490134
      71ecb5d7
    • Fix half-pixel variance RTCD functions · a0ae3682
      John Koleszar authored
      This patch fixes the system dependent entries for the half-pixel
      variance functions in both the RTCD and non-RTCD cases:
      
        - The generic C versions of these functions are now correct.
          Before, all three cases called the hv code.
      
        - Wire up the ARM functions in RTCD mode
      
        - Created stubs for x86 to call the optimized subpixel functions
          with the correct parameters, rather than falling back to C
          code.
      
      Change-Id: I1d937d074d929e0eb93aacb1232cc5e0ad1c6184
      a0ae3682
    • Add half-pixel variance RTCD functions · 209d82ad
      John Koleszar authored
      NEON has optimized 16x16 half-pixel variance functions, but they
      were not part of the RTCD framework. Add these functions to RTCD,
      so that other platforms can make use of this optimization in the
      future and special-case ARM code can be removed.
      
      A number of functions were taking two variance functions as
      parameters. These functions were changed to take a single
      parameter, a pointer to a struct containing all the variance
      functions for that block size. This provides additional flexibility
      for calling additional variance functions (the half-pixel special
      case, for example) and by initializing the table for all block sizes,
      we don't have to construct this function pointer table for each
      macroblock.
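
The refactoring described above might look roughly like the following sketch.
The struct layout, the function names and the two metrics are assumptions
chosen for illustration; they are not the actual libvpx definitions.

```c
/* Hypothetical block sizes and metric signature. */
enum { BLOCK_4X4, BLOCK_8X8, BLOCK_COUNT };

typedef unsigned int (*block_metric_fn)(const unsigned char *src, int src_stride,
                                        const unsigned char *ref, int ref_stride);

/* One struct per block size bundles every function a caller may need. */
typedef struct {
    block_metric_fn sad; /* sum of absolute differences */
    block_metric_fn sse; /* sum of squared errors */
} block_fn_table;

static unsigned int metric(const unsigned char *s, int ss,
                           const unsigned char *r, int rs,
                           int w, int h, int squared)
{
    unsigned int acc = 0;
    int x, y;

    for (y = 0; y < h; y++) {
        for (x = 0; x < w; x++) {
            int d = s[y * ss + x] - r[y * rs + x];
            acc += squared ? (unsigned int)(d * d)
                           : (unsigned int)(d < 0 ? -d : d);
        }
    }
    return acc;
}

static unsigned int sad4x4(const unsigned char *s, int ss,
                           const unsigned char *r, int rs)
{ return metric(s, ss, r, rs, 4, 4, 0); }
static unsigned int sse4x4(const unsigned char *s, int ss,
                           const unsigned char *r, int rs)
{ return metric(s, ss, r, rs, 4, 4, 1); }
static unsigned int sad8x8(const unsigned char *s, int ss,
                           const unsigned char *r, int rs)
{ return metric(s, ss, r, rs, 8, 8, 0); }
static unsigned int sse8x8(const unsigned char *s, int ss,
                           const unsigned char *r, int rs)
{ return metric(s, ss, r, rs, 8, 8, 1); }

/* Built once for all block sizes, not once per macroblock. */
static const block_fn_table fn_table[BLOCK_COUNT] = {
    { sad4x4, sse4x4 },
    { sad8x8, sse8x8 },
};

/* A caller takes a single table pointer instead of one parameter per
 * function, so new entries can be added without changing its signature. */
static unsigned int block_cost(const block_fn_table *fns,
                               const unsigned char *src, int src_stride,
                               const unsigned char *ref, int ref_stride)
{
    return fns->sad(src, src_stride, ref, ref_stride) +
           fns->sse(src, src_stride, ref, ref_stride);
}
```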
      
      Change-Id: I78289ff36b2715f9a7aa04d5f6fbe3d23acdc29c
      209d82ad
  7. 26 Oct, 2010 4 commits
    • make vp8_recon16x16mb{,y} RTCD functions · d6c67f02
      John Koleszar authored
      ARM NEON has a platform specific version of vp8_recon16x16mb, though
      it's just a stub to extract the various parameters from the
      MACROBLOCKD struct and pass them to vp8_recon16x16mb_neon(). Using
      that function's prototype directly will be a better long term solution,
      but it's quite an invasive change.
      
      Change-Id: I04273149e2ade34749e2d09e7edb0c396e1dd620
      d6c67f02
    • make arm hex search the generic implementation · 96cf6588
      John Koleszar authored
      The ARM version of vp8_hex_search() is a faster implementation
      of the same algorithm. Since it doesn't use any ARM specific
      code, it can be made the default implementation. This removes
      a linking error.
      
      Change-Id: I77d10f2c16b2515bff4522c350004e03b7659934
      96cf6588
    • arm: remove duplicate functions · d330a587
      John Koleszar authored
      These functions were true duplicates of functions present in the
      generic code. This fixes some of the link errors when building
      with --enable-shared --enable-pic.
      
      Change-Id: Idff26599d510d954e439207883607ad6b74df20c
      d330a587
    • add missing GET_GOT/RESTORE_GOT pairs · b523dd51
      John Koleszar authored
      These functions made global references but did not set up the GOT,
      causing compilation failures in PIC mode.
      
      Change-Id: Iac473bf46733f87eb2e001cd736af4acf73fa51d
      b523dd51
  8. 25 Oct, 2010 4 commits
    • Fix leaked file descriptor with ENTROPY_STATS · c3fd2c4e
      Martin Ettl authored
      cppcheck found a leaked file descriptor in the debugging code
      enabled by defining ENTROPY_STATS. Fixes issue #60.
      
      Change-Id: I0c1d0669cb94d44fed77860f97b82763be06b7cb
      c3fd2c4e
    • quiet compiler · 385865f8
      Johann authored
      clean up compiler warnings, man in the yellow hat warnings, and start to
      remove unused #includes
      
      Change-Id: I6267e98d9b3024b6fb1ef2732b29067a33cb96f6
      385865f8
    • Add runtime CPU detection support for ARM. · b71962fd
      Timothy B. Terriberry authored
      The primary goal is to allow a binary to be built which supports
       NEON, but can fall back to non-NEON routines, since some Android
       devices do not have NEON, even if they are otherwise ARMv7 (e.g.,
       Tegra).
      The configure-generated flags HAVE_ARMV7, etc., are used to decide
       which versions of each function to build, and when
       CONFIG_RUNTIME_CPU_DETECT is enabled, the correct version is chosen
       at run time.
      In order for this to work, the CFLAGS must be set to something
       appropriate (e.g., without -mfpu=neon for ARMv7, and with
       appropriate -march and -mcpu for even earlier configurations), or
       the native C code will not be able to run.
      The ASFLAGS must remain set for the most advanced instruction set
       required at build time, since the ARM assembler will refuse to emit
       them otherwise.
      I have not attempted to make any changes to configure to do this
       automatically.
      Doing so will probably require the addition of new configure options.
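
The run-time selection described above can be sketched as a function pointer
that is filled in once from detected CPU flags. Everything here is
hypothetical: the flag name, the table struct and the "fast" routine, which
is plain C standing in for a NEON implementation so the example runs
anywhere.

```c
/* Hypothetical capability flag reported by a CPU detection routine. */
#define CPU_HAS_NEON 0x1u

typedef int (*sum_fn)(const unsigned char *p, int n);

/* Baseline C version, always available. */
static int sum_c(const unsigned char *p, int n)
{
    int i, s = 0;

    for (i = 0; i < n; i++) s += p[i];
    return s;
}

/* Stand-in for an accelerated version; same result, different code path. */
static int sum_neon_standin(const unsigned char *p, int n)
{
    int s = 0;

    while (n-- > 0) s += *p++;
    return s;
}

typedef struct {
    sum_fn sum;
} rtcd_table;

/* Start from the C fallback and upgrade only when the CPU supports it. */
static void rtcd_init(rtcd_table *t, unsigned int cpu_flags)
{
    t->sum = sum_c;
    if (cpu_flags & CPU_HAS_NEON) t->sum = sum_neon_standin;
}
```

Keeping the pointer inside a struct that each codec instance owns, rather
than in a global, is the direction the message recommends for thread safety.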
      
      Many of the hooks for RTCD on ARM were already there, but a lot of
       the code had bit-rotted, and a good deal of the ARM-specific code
       is not integrated into the RTCD structs at all.
      I did not try to resolve the latter, merely to add the minimal amount
       of protection around them to allow RTCD to work.
      Those functions that were called based on an ifdef at the calling
       site were expanded to check the RTCD flags at that site, but they
       should be added to an RTCD struct somewhere in the future.
      The functions invoked with global function pointers still are, but
       these should be moved into an RTCD struct for thread safety (I
       believe every platform currently supported has atomic pointer
       stores, but this is not guaranteed).
      
      The encoder's boolhuff functions did not even have _c and armv7
       suffixes, and the correct version was resolved at link time.
      The token packing functions did have appropriate suffixes, but the
       version was selected with a define, with no associated RTCD struct.
      However, for both of these, the only armv7 instruction they actually
       used was rbit, and this was completely superfluous, so I reworked
       them to avoid it.
      The only non-ARMv4 instruction remaining in them is clz, which is
       ARMv5 (not even ARMv5TE is required).
      Considering that there are no ARM-specific configs which are not at
       least ARMv5TE, I did not try to detect these at runtime, and simply
       enable them for ARMv5 and above.
      
      Finally, the NEON register saving code was completely non-reentrant,
       since it saved the registers to a global, static variable.
      I moved the storage for this onto the stack.
      A single binary built with this code was tested on an ARM11 (ARMv6)
       and a Cortex A8 (ARMv7 w/NEON), for both the encoder and decoder,
       and produced identical output, while using the correct accelerated
       functions on each.
      I did not test on any earlier processors.
      
      Change-Id: I45cbd63a614f4554c3b325c45d46c0806f009eaa
      b71962fd
    • isolate new temporal filtering code · e81e30c2
      Johann authored
      onyx_if is getting pretty big. split out the temporal code to make it
      easier to look at.
      
      Change-Id: I207c3a94c90e91b32e3ea5e1836a53b7a990fabd
      e81e30c2
  9. 22 Oct, 2010 1 commit
    • Convert [4][4] matrices to [16] arrays. · 8f75ea6b
      Timothy B. Terriberry authored
      Most of the code that actually uses these matrices indexes them as
       if they were a single contiguous array, and coverity produces
       reports about the resulting accesses that overflow the static
       bounds of the first row.
      This is perfectly legal in C, but converting them to actual [16]
       arrays should eliminate the report, and removes a good deal of
       extraneous indexing and address operators from the code.
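
A small sketch of the flat representation, with hypothetical function names:
the data is declared as a [16] array, iterated directly, and row/column
access is computed only where it is actually needed.

```c
/* Iterate the 16 coefficients as one contiguous array, with no
 * &m[0][0] style address-of expressions. */
static int sum_flat(const short coeffs[16])
{
    int i, sum = 0;

    for (i = 0; i < 16; i++) sum += coeffs[i];
    return sum;
}

/* Row/column lookup on the flat array for a 4x4 layout. */
static short coeff_at(const short coeffs[16], int row, int col)
{
    return coeffs[row * 4 + col];
}
```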
      
      Change-Id: Ibda479e2232b3e51f9edf3b355b8640520fdbf23
      8f75ea6b
  10. 21 Oct, 2010 3 commits
    • Move firstpass motion map to stats packet · bb7dd5b1
      John Koleszar authored
      The first implementation of the firstpass motion map for motion
      compensated temporal filtering created a file, fpmotionmap.stt,
      in the current working directory. This was not safe for multiple
      encoder instances. This patch merges this data into the first pass
      stats packet interface, so that it is handled like the other
      (numerical) firstpass stats.
      
      The new stats packet is defined as follows:
          Numerical Stats (16 doubles) -- 128 bytes
          Motion Map                   -- 1 byte / Macroblock
          Padding                      -- to align packet to 8 bytes
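
Given the layout above, the packet size for a frame can be computed as in
this sketch. The function and macro names are hypothetical; the sizes come
from the listing (128 bytes of numerical stats, one byte per macroblock,
padding to a multiple of 8 bytes).

```c
#include <stddef.h>

#define NUMERICAL_STATS_BYTES 128u /* 16 doubles, per the layout above */

/* Total packet size in bytes for a frame with mb_count macroblocks. */
static size_t stats_packet_size(size_t mb_count)
{
    size_t size = NUMERICAL_STATS_BYTES + mb_count;

    /* Round up to the next multiple of 8. */
    return (size + 7) & ~(size_t)7;
}
```

For example, a frame with 396 macroblocks needs 128 + 396 = 524 bytes, which
pads to 528.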
      
      The fpmotionmap.stt file can still be generated for debugging
      purposes in the same way that the textual version of the stats
      are available (defining OUTPUT_FPF in firstpass.c)
      
      Change-Id: I083ffbfd95e7d6a42bb4039ba0e81f678c8183ca
      bb7dd5b1
    • Add MMWORD PTR/XMMWORD PTR in subtract_sse2.asm · 4cefb443
      Yunqing Wang authored
      Change-Id: Ia649b500ef020225d8bbf611799d0f47658dc2ac
      4cefb443
    • Rewrite vp8_short_walsh4x4_sse2() · fc94ffce
      Yunqing Wang authored
      This rewrite reflects the changes made in commit "Improve the
      accuracy of forward walsh-hadamard transform". Since this function
      is not called much, only a small encoder performance gain (~0.5%)
      is seen.
      
      Change-Id: Ie9df58a43028a11fd5b115c4bbe3141f7596578b
      fc94ffce
  11. 18 Oct, 2010 2 commits
    • Add SSE2 subtract functions · 4db20765
      Yunqing Wang authored
      Instead of doing 8-bit data unpack and 16-bit subtraction, use
      psubb to do 16 8-bit subtractions and pcmpgtb to preserve the
      sign information. This does not bring a noticeable gain since
      these functions are not called frequently.
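
The idea can be modeled one lane at a time in scalar C. This is a sketch of
the arithmetic, not the assembly from the patch: the wrapped 8-bit difference
plays the role of the psubb result, and a comparison supplies the byte that
becomes the high half of the 16-bit output. pcmpgtb is a signed byte
comparison, and how the patch derives the mask for unsigned pixel values is
not stated here, so the sketch uses a plain unsigned comparison. The function
name is hypothetical.

```c
#include <stdint.h>

/* Scalar model of one lane: 16-bit signed difference of two 8-bit pixels,
 * built from an 8-bit wrapped subtraction plus a sign byte. */
static int16_t sub_lane(uint8_t src, uint8_t pred)
{
    uint8_t low = (uint8_t)(src - pred);          /* wrapped 8-bit result */
    uint8_t high = (src < pred) ? 0xFF : 0x00;    /* sign mask byte */
    uint16_t bits = (uint16_t)((high << 8) | low);/* interleave to 16 bits */

    /* Reinterpret the 16-bit pattern as a two's complement value. */
    return (int16_t)(bits >= 0x8000u ? (int)bits - 0x10000 : (int)bits);
}
```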
      
      Change-Id: I90a0dfaa3db9d422e4ada324076596ffb178548e
      4db20765
    • copy compiler warning fixes · ce1ce992
      Johann authored
      generic version got fixed, but not the arm version. fixes:
      vp8/encoder/arm/mcomp_arm.c: In function 'vp8_full_search_sadx3':
      vp8/encoder/arm/mcomp_arm.c:1208: warning: pointer targets in passing
      argument 5 of 'fn_ptr->sdx3f' differ in signedness
      vp8/encoder/arm/mcomp_arm.c:1208: note: expected 'unsigned int *' but
      argument is of type 'int *'
      
      and another unsigned change to keep the files similar
      
      Change-Id: I1b6255dc3a03b90394a791ee0d15d8167d9454db
      ce1ce992
  12. 15 Oct, 2010 2 commits
    • remove dead code · 963bcd6c
      Johann authored
      vp8_diamond_search_sadx4 isn't used in arm because there is no
      corresponding sdx4df as in x86. rather than keep it in sync with
      ../mcomp.c, delete it
      
      vp8_hex_search had the original, more readable/understandable code if'd
      out. it's also available in ../mcomp.c, so remove the dead copy
      
      Change-Id: Ia42aa6e23b3a2e88040f467280befec091ec080e
      963bcd6c
    • change to make use of more trellis quantization · 2e53e9e5
      Yaowu Xu authored
      when a subsequent frame is encoded as an alt reference frame, it is
      unlikely that any mb in current frame will be used as reference for
      future frames, so we can enable quantization optimization even when
      the RD constant is slightly rate-biased. The change has an overall
      benefit of 0.1% to 0.2% in bit savings on the test sets, based on
      vpxssim scores.
      
      Change-Id: I9aa7bc5cd573ea84e3ee655d2834c18c4460ceea
      2e53e9e5
  13. 14 Oct, 2010 3 commits
  14. 13 Oct, 2010 1 commit
  15. 12 Oct, 2010 2 commits
    • Centralize mb skip state calculation · 13685747
      John Koleszar authored
      This patch moves the scattered updates to the mb skip state
      (mode_info_context->mbmi.mb_skip_coeff) to vp8_tokenize_mb. Recent
      changes to the quantizer exposed a bug where if a macroblock
      could be coded as a skip but wasn't, the encoder would run the
      loopfilter but the decoder wouldn't, causing a reference buffer
      mismatch.
      
      The loopfilter is controlled by a flag called dc_diff. The decoder
      looks at the number of decoded coefficients when setting this flag.
      The encoder sets this flag based on the skip state, since any
      skippable macroblock should be transmitted as a skip. The coefficient
      optimization pass (vp8_optimize_b()) could change the coefficients
      such that a block that was not a skip becomes one. The encoder was
      not updating the skip state in this situation for intra coded blocks.
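
A simplified sketch of computing the skip state in one place, after any
coefficient optimization has run. The struct, the use of per-block
end-of-block counts, and the direct link between the two flags are
assumptions for illustration; the real encoder also takes the prediction
mode into account.

```c
/* Hypothetical, reduced macroblock state. */
typedef struct {
    int mb_skip_coeff; /* 1 if no block has any coefficients left */
    int dc_diff;       /* drives the loop filter decision */
} mb_state;

/* A macroblock is skippable only if every block's end-of-block
 * position is 0, i.e. no coefficients survive quantization. */
static int mb_is_skippable(const int *eob, int num_blocks)
{
    int i;

    for (i = 0; i < num_blocks; i++)
        if (eob[i] != 0) return 0;
    return 1;
}

/* Single update point: derive both flags from the final coefficients so
 * the encoder's view matches what the decoder will see. */
static void update_skip_state(mb_state *mb, const int *eob, int num_blocks)
{
    mb->mb_skip_coeff = mb_is_skippable(eob, num_blocks);
    mb->dc_diff = !mb->mb_skip_coeff; /* simplification, see lead-in */
}
```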
      
      The underlying issue predates it, but this bug was recently triggered
      by enabling trellis quantization on the Y2 block in commit dcd29e36,
      and by changing the quantizer range control in commit 305be4e4.
      
      Change-Id: I5cce5da0dbc2d22f7d79ee48149f01e868a64802
      13685747
    • Add const qualifiers to variance/SAD functions. · f4a85944
      Timothy B. Terriberry authored
      These functions should never change their input, and there's no
       reason not to declare that.
      This allows them to be passed static const data.
      
      Change-Id: Ia49fe4b01e80e9afcb24b4844817694d4da5995c
      f4a85944
  16. 11 Oct, 2010 2 commits
    • Move vp8_strict_quantize_b inside EXACT_QUANT #define. · 82c43398
      Timothy B. Terriberry authored
      There is currently no inexact version of this function, so do not
       even compile it without EXACT_QUANT.
      This will prevent someone from inadvertently trying to use it without
       the proper EXACT_QUANT setup.
      
      Change-Id: Ia13491e0128afb281c05c9222ee5987101e4010d
      82c43398
    • Remove INTRARDOPT #define and intra_rd_opt option. · dd08db93
      Timothy B. Terriberry authored
      This is just eliminating some cruft.
      Although a number of variables are declared only when INTRARDOPT
       is defined, they are used elsewhere without that protection, and
       no longer just for intra RDO.
      The intra_rd_opt flag was hard-coded to 1 and never checked.
      
      Change-Id: I83a81554ecee8053e7b4ccd8aa04e18fa60f8e4f
      dd08db93
  17. 07 Oct, 2010 2 commits
    • Remove unused file in encoder · 7e6f7b57
      Yunqing Wang authored
      Remove vp8/encoder/x86/csystemdependent.c
      
      Change-Id: I7c590dcd07b68704d463a1452f62f29ffb1402f4
      7e6f7b57
    • Added vp8_fast_quantize_b_sse2 · d860f685
      Scott LaVarnway authored
      Moved vp8_fast_quantize_b_sse from quantize_mmx.asm into
      quantize_sse2.asm and renamed.  Updated the assembly code to
      match the C version.
      
      Change-Id: I1766d9e1ca60e173f65badc0ca0c160c2b51b200
      d860f685
  18. 06 Oct, 2010 1 commit
    • optimize fast_quantizer c version · d338d14c
      Yaowu Xu authored
      As the zbin and rounding constants are normalized, rounding effectively
      does the zbinning, so the zbin operation can be removed. In
      addition, the memsets on the two arrays are no longer necessary.
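
The reasoning above can be sketched as follows. This is an illustrative
quantizer loop, not the libvpx function: the signature, the Q16 fixed-point
scaling and the table values used in testing are assumptions. Small
coefficients fall to zero through the round-and-multiply step alone, and
every output entry is written unconditionally, so the arrays need no prior
clearing.

```c
#include <stdlib.h>

/* Hypothetical fast quantizer: returns the end-of-block position
 * (index of the last nonzero quantized coefficient, plus one). */
static int fast_quantize(const short *coeff, const short *round,
                         const short *quant, const short *dequant,
                         short *qcoeff, short *dqcoeff, int n)
{
    int i, eob = 0;

    for (i = 0; i < n; i++) {
        int z = coeff[i];
        int sign = z < 0 ? -1 : 1;
        int x = abs(z);
        /* No explicit zero-bin test: values that are too small simply
         * quantize to 0 here. quant[] is assumed to be in Q16. */
        int y = ((x + round[i]) * quant[i]) >> 16;

        qcoeff[i] = (short)(sign * y);
        dqcoeff[i] = (short)(qcoeff[i] * dequant[i]);
        if (y) eob = i + 1;
    }
    return eob;
}
```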
      
      Change-Id: If39c353c42d7e052296cb65322e5218810b5cc4c
      d338d14c
  19. 04 Oct, 2010 1 commit
    • nasm: address labels 'rel label' vice 'wrt rip' · 5cdc3a4c
      Jan Kratochvil authored
      nasm does not support `label wrt rip', it requires `rel label'. It is
      still fully compatible with yasm.
      
      Provide nasm compatibility. No binary change by this patch with yasm on
      {x86_64,i686}-fedora13-linux-gnu. A few longer opcodes with nasm on
      {x86_64,i686}-fedora13-linux-gnu have been checked as safe.
      
      Change-Id: I488773a4e930a56e43b0cc72d867ee5291215f50
      5cdc3a4c