  1. Oct 25, 2010
    • quiet compiler · 385865f8
      Johann Koenig authored
      clean up compiler warnings, man in the yellow hat warnings, and start to
      remove unused #includes
      
      Change-Id: I6267e98d9b3024b6fb1ef2732b29067a33cb96f6
    • reuse common loopfilter code · 1376f061
      Johann Koenig authored
      there were four versions for the regular and
      macroblock loopfilters:
      horizontal [y|uv]
      vertical [y|uv]
      
      this moves all the common code into 2 functions:
      vp8_loop_filter_neon
      vp8_mbloop_filter_neon
      
      this provides no gain in performance. there's a bit
      of jitter, but it trends down ~0.25-0.5%. however,
      this is a huge gain for maintenance. also, there is the
      potential to drop some stack usage in the macroblock
      loopfilter.
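The refactor follows a common pattern: write the filter once against a pair of strides and let thin wrappers supply the edge orientation. A minimal C sketch, assuming a toy filter (p0 and q0 replaced by their rounded average) rather than VP8's real loop filter math; `filter_edge_core`, `filter_horizontal_edge` and `filter_vertical_edge` are hypothetical names, not the NEON routines named above.

```c
#include <assert.h>
#include <stdint.h>

/* Shared core (hypothetical): filters `count` positions along an edge.
 * `step` moves across the edge (p0 -> q0), `pitch` moves along it.
 * Toy filter: replace p0 and q0 with their rounded average. */
static void filter_edge_core(uint8_t *s, int step, int pitch, int count) {
  assert(s);
  for (int i = 0; i < count; i++) {
    uint8_t *q0 = s + i * pitch;   /* first pixel after the edge */
    uint8_t *p0 = q0 - step;       /* last pixel before the edge */
    uint8_t avg = (uint8_t)((*p0 + *q0 + 1) >> 1);
    *p0 = avg;
    *q0 = avg;
  }
}

/* Horizontal edge: cross it by a full row, walk along it pixel by pixel. */
static void filter_horizontal_edge(uint8_t *s, int stride, int count) {
  filter_edge_core(s, stride, 1, count);
}

/* Vertical edge: the same core with the two strides swapped. */
static void filter_vertical_edge(uint8_t *s, int stride, int count) {
  filter_edge_core(s, 1, stride, count);
}
```

Because only the strides differ, filtering a horizontal edge of a block gives exactly the transpose of filtering the corresponding vertical edge of the transposed block, which is why one body can serve both directions.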
      
      Change-Id: I91506f07d2f449631ff67ad6f1b3f3be63b81a92
    • Add runtime CPU detection support for ARM. · b71962fd
      Timothy B. Terriberry authored
      The primary goal is to allow a binary to be built which supports
       NEON, but can fall back to non-NEON routines, since some Android
       devices do not have NEON, even if they are otherwise ARMv7 (e.g.,
       Tegra).
      The configure-generated flags HAVE_ARMV7, etc., are used to decide
       which versions of each function to build, and when
       CONFIG_RUNTIME_CPU_DETECT is enabled, the correct version is chosen
       at run time.
      In order for this to work, the CFLAGS must be set to something
       appropriate (e.g., without -mfpu=neon for ARMv7, and with
       appropriate -march and -mcpu for even earlier configurations), or
       the native C code will not be able to run.
      The ASFLAGS must remain set for the most advanced instruction set
       required at build time, since the ARM assembler will refuse to emit
       them otherwise.
      I have not attempted to make any changes to configure to do this
       automatically.
      Doing so will probably require the addition of new configure options.
      
      Many of the hooks for RTCD on ARM were already there, but a lot of
       the code had bit-rotted, and a good deal of the ARM-specific code
       is not integrated into the RTCD structs at all.
      I did not try to resolve the latter, merely to add the minimal amount
       of protection around them to allow RTCD to work.
      Those functions that were called based on an ifdef at the calling
       site were expanded to check the RTCD flags at that site, but they
       should be added to an RTCD struct somewhere in the future.
      The functions invoked with global function pointers still are, but
       these should be moved into an RTCD struct for thread safety (I
       believe every platform currently supported has atomic pointer
       stores, but this is not guaranteed).
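The RTCD approach described above amounts to filling a table of function pointers once, from CPU feature flags detected at startup, and calling through that table afterwards. A minimal sketch; the `dsp_rtcd` struct, the `HAS_NEON` flag value and the `sum_c`/`sum_neon` functions are hypothetical stand-ins rather than libvpx's actual RTCD structs, and the "NEON" variant is plain C so the example runs anywhere.

```c
#include <assert.h>

/* Hypothetical CPU feature flag, as a detection routine might report it. */
enum { HAS_NEON = 1 << 0 };

typedef int (*sum_fn)(const unsigned char *p, int n);

/* Portable C reference version. */
static int sum_c(const unsigned char *p, int n) {
  int s = 0;
  for (int i = 0; i < n; i++) s += p[i];
  return s;
}

/* Stand-in for an accelerated version; in a real build this would be
 * NEON assembly, assembled with NEON-enabled ASFLAGS while the C code
 * is compiled without -mfpu=neon. */
static int sum_neon(const unsigned char *p, int n) {
  return sum_c(p, n);
}

/* Per-instance dispatch table, analogous to an RTCD struct. Keeping the
 * pointers in a struct owned by the codec instance avoids relying on
 * atomic stores to shared global function pointers. */
typedef struct {
  sum_fn sum;
} dsp_rtcd;

static void dsp_rtcd_init(dsp_rtcd *rtcd, int cpu_flags) {
  assert(rtcd);
  rtcd->sum = sum_c;                        /* always-safe default */
  if (cpu_flags & HAS_NEON) rtcd->sum = sum_neon;
}
```

A caller would detect the flags once, call `dsp_rtcd_init`, and thereafter use `rtcd.sum(...)`, so a device without NEON (such as the Tegra case above) never reaches the NEON code.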
      
      The encoder's boolhuff functions did not even have _c and armv7
       suffixes, and the correct version was resolved at link time.
      The token packing functions did have appropriate suffixes, but the
       version was selected with a define, with no associated RTCD struct.
      However, for both of these, the only armv7 instruction they actually
       used was rbit, and this was completely superfluous, so I reworked
       them to avoid it.
      The only non-ARMv4 instruction remaining in them is clz, which is
       ARMv5 (not even ARMv5TE is required).
      Considering that there are no ARM-specific configs which are not at
       least ARMv5TE, I did not try to detect these at runtime, and simply
       enable them for ARMv5 and above.
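As an illustration of why `clz` is useful in a boolean coder: the coder keeps its range in [128, 255] and, after coding a symbol, must shift left until the top bit is set again. Counting leading zeros gives that shift in one step instead of a loop. This is a hedged sketch, assuming an 8-bit range held in a 32-bit `unsigned int` and using GCC/Clang's `__builtin_clz` in place of the ARM instruction; `norm_shift` and `norm_shift_loop` are hypothetical helpers, not libvpx functions, and the exact way the reworked boolhuff code uses `clz` is not shown in the text above.

```c
#include <assert.h>

/* Shift needed to bring an 8-bit range back into [128, 255].
 * In a 32-bit word, bit 7 sits 24 positions below the top bit, so the
 * shift is clz(range) - 24. __builtin_clz(0) is undefined, hence the
 * precondition. Assumes unsigned int is 32 bits wide. */
static int norm_shift(unsigned int range) {
  assert(range >= 1 && range <= 255);
  return __builtin_clz(range) - 24;
}

/* The equivalent loop that clz replaces. */
static int norm_shift_loop(unsigned int range) {
  int shift = 0;
  while (range < 128) {
    range <<= 1;
    shift++;
  }
  return shift;
}
```

For example, a range of 1 needs a shift of 7 and a range of 128 needs none.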
      
      Finally, the NEON register saving code was completely non-reentrant,
       since it saved the registers to a global, static variable.
      I moved the storage for this onto the stack.
      A single binary built with this code was tested on an ARM11 (ARMv6)
       and a Cortex A8 (ARMv7 w/NEON), for both the encoder and decoder,
       and produced identical output, while using the correct accelerated
       functions on each.
      I did not test on any earlier processors.
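The reentrancy problem is easy to reproduce in plain C: if a save/restore pair parks state in a file-scope static, a second, overlapping call overwrites the first call's saved copy. Moving the buffer into the caller's stack frame gives every call its own storage. A minimal sketch with hypothetical names; a small integer array stands in for the NEON callee-saved registers (d8-d15), and recursion stands in for an overlapping call from another context.

```c
#include <assert.h>
#include <string.h>

#define NREGS 8
static int regs[NREGS];               /* stand-in for the live registers */

/* Non-reentrant: one global save area shared by every call. */
static int g_saved[NREGS];
static void save_global(void)    { memcpy(g_saved, regs, sizeof regs); }
static void restore_global(void) { memcpy(regs, g_saved, sizeof regs); }

/* Reentrant: the caller provides the save area on its own stack. */
static void save_to(int *buf)            { memcpy(buf, regs, sizeof regs); }
static void restore_from(const int *buf) { memcpy(regs, buf, sizeof regs); }

/* Saves, clobbers the registers, optionally makes a nested call, then
 * restores. Returns 1 if the registers end up as they were on entry. */
static int work(int depth, int use_stack) {
  int entry[NREGS], local[NREGS];
  assert(depth >= 0);
  memcpy(entry, regs, sizeof regs);
  if (use_stack) save_to(local); else save_global();
  for (int i = 0; i < NREGS; i++) regs[i] = depth * 100 + i;  /* clobber */
  int inner_ok = depth > 0 ? work(depth - 1, use_stack) : 1;
  if (use_stack) restore_from(local); else restore_global();
  return inner_ok && memcmp(entry, regs, sizeof regs) == 0;
}
```

With the global save area, the nested call overwrites `g_saved`, so the outer call restores the wrong values; with the stack buffer both levels restore correctly.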
      
      Change-Id: I45cbd63a614f4554c3b325c45d46c0806f009eaa
    • isolate new temporal filtering code · e81e30c2
      Johann Koenig authored
      onyx_if is getting pretty big. split out the temporal code to make it
      easier to look at.
      
      Change-Id: I207c3a94c90e91b32e3ea5e1836a53b7a990fabd
  2. Oct 22, 2010
    • Merge "Improve handling of invalid frames." · 3b9e72b2
      John Koleszar authored
      Change-Id: Icef5226a70260607c190126c1c0cc28b796e759c
    • Improve handling of invalid frames. · 09bcc1f7
      Timothy B. Terriberry authored
      The code was not checking for frame sizes smaller than 3 bytes, and the
       partition size checks might have failed if the input buffer was within
       16MB of the top of the heap.
      In addition, the reference count on the current frame buffer was not
       being decremented on error, so after a small number of errors, no new
       frame buffer could be found and it would run off the list of them.
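Both fixes follow common C idioms. A check written as `data + partition_size > data_end` can wrap around when the buffer sits near the top of the address space (and forming such a pointer is undefined behavior anyway); comparing the size against the number of bytes remaining avoids that. Every error path must also drop the reference taken on the frame buffer. A minimal sketch; `frame_buf`, `check_partition` and `decode_frame` are hypothetical, and only the 3-byte minimum comes from the text above.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int ref_count; } frame_buf;   /* hypothetical */

/* Overflow-safe bounds check: compare lengths, never form data + size. */
static int check_partition(const unsigned char *data,
                           const unsigned char *data_end, size_t size) {
  return data <= data_end && size <= (size_t)(data_end - data);
}

/* Returns 0 on success, -1 on a corrupt frame. The reference taken on
 * the buffer is released on every error path, so repeated errors cannot
 * exhaust the pool of frame buffers. */
static int decode_frame(frame_buf *fb, const unsigned char *data,
                        size_t data_sz, size_t first_partition_sz) {
  assert(fb && data);
  fb->ref_count++;                        /* take the buffer */
  if (data_sz < 3) goto fail;             /* too short for a frame header */
  if (!check_partition(data + 3, data + data_sz, first_partition_sz))
    goto fail;
  /* ... decode the frame ... */
  return 0;
fail:
  fb->ref_count--;                        /* drop it again on error */
  return -1;
}
```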
      
      Change-Id: I0c60dba6adb1e2a29df39754f72a56ab6c776b46
    • Convert [4][4] matrices to [16] arrays. · 8f75ea6b
      Timothy B. Terriberry authored
      Most of the code that actually uses these matrices indexes them as
       if they were a single contiguous array, and Coverity produces
       reports about the resulting accesses that overflow the static
       bounds of the first row.
      This is perfectly legal in C, but converting them to actual [16]
       arrays should eliminate the report, and removes a good deal of
       extraneous indexing and address operators from the code.
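A sketch of the before and after, with hypothetical function names: walking a `[4][4]` matrix through a flat pointer past its first row is the pattern the analyzer reports, while a `[16]` array makes flat indexing the declared shape, and any 2-D access becomes explicit row-major `row * 4 + col`.

```c
#include <assert.h>

/* Before (hypothetical): declared as a matrix but walked as a flat
 * array. Accesses past the first row through p are what get reported. */
static int sum_matrix_flat(short m[4][4]) {
  const short *p = &m[0][0];
  int s = 0;
  for (int k = 0; k < 16; k++) s += p[k];
  return s;
}

/* After: the declared shape matches how the data is used. */
static int sum_array(const short a[16]) {
  int s = 0;
  for (int k = 0; k < 16; k++) s += a[k];
  return s;
}

/* 2-D access, where still needed, becomes explicit row-major indexing. */
static short at(const short a[16], int row, int col) {
  assert(row >= 0 && row < 4 && col >= 0 && col < 4);
  return a[row * 4 + col];
}
```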
      
      Change-Id: Ibda479e2232b3e51f9edf3b355b8640520fdbf23
  3. Oct 21, 2010
    • Change altref times to preceding pts+1. · 45e64941
      Frank Galligan authored
      Change the pts of the altref frame to be as close as possible to the
      pts of the preceding frame and still be strictly increasing.
      
      Change-Id: Iae3033a4c89ae5a9d0e5c4198e9196e5f3ee57c7
    • 1ee3ebcd (John Koleszar)
    • Move firstpass motion map to stats packet · bb7dd5b1
      John Koleszar authored
      The first implementation of the firstpass motion map for motion
      compensated temporal filtering created a file, fpmotionmap.stt,
      in the current working directory. This was not safe for multiple
      encoder instances. This patch merges this data into the first pass
      stats packet interface, so that it is handled like the other
      (numerical) firstpass stats.
      
      The new stats packet is defined as follows:
          Numerical Stats (16 doubles) -- 128 bytes
          Motion Map                   -- 1 byte / Macroblock
          Padding                      -- to align packet to 8 bytes
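The packet size follows directly from this layout: 16 doubles (128 bytes), one byte per macroblock, then padding up to the next multiple of 8. A small sketch of that arithmetic; `fpstats_packet_size` is a hypothetical helper, and it assumes 16x16 macroblocks with partial macroblocks at the right and bottom edges rounded up, and an 8-byte `double`.

```c
#include <assert.h>
#include <stddef.h>

/* Size in bytes of one first-pass stats packet for a frame of the given
 * dimensions, following the layout above. */
static size_t fpstats_packet_size(int width, int height) {
  assert(width > 0 && height > 0);
  size_t mb_cols = (size_t)(width + 15) / 16;   /* 16x16 macroblocks */
  size_t mb_rows = (size_t)(height + 15) / 16;
  size_t size = 16 * sizeof(double)             /* numerical stats */
              + mb_cols * mb_rows;              /* 1 byte per macroblock */
  return (size + 7) & ~(size_t)7;               /* pad to 8 bytes */
}
```

For a 176x144 frame that is 11 x 9 = 99 macroblocks, so 128 + 99 = 227 bytes, padded to 232.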
      
      The fpmotionmap.stt file can still be generated for debugging
      purposes in the same way that the textual version of the stats
      are available (defining OUTPUT_FPF in firstpass.c)
      
      Change-Id: I083ffbfd95e7d6a42bb4039ba0e81f678c8183ca
    • Add MMWORD PTR/XMMWORD PTR in subtract_sse2.asm · 4cefb443
      Yunqing Wang authored
      Change-Id: Ia649b500ef020225d8bbf611799d0f47658dc2ac
    • Merge "Rewrite vp8_short_walsh4x4_sse2()" · 31752f2f
      Yunqing Wang authored
    • Merge "Add SSE2 subtract functions" · 09187475
      Yunqing Wang authored
    • Rewrite vp8_short_walsh4x4_sse2() · fc94ffce
      Yunqing Wang authored
      This rewriting reflects changes made in commit "Improve the
      accuracy of forward walsh-hadamard transform". Since this function
      is not called much, only a small encoder performance gain (~0.5%)
      is seen.
      
      Change-Id: Ie9df58a43028a11fd5b115c4bbe3141f7596578b
  4. Oct 20, 2010
  5. Oct 19, 2010
  6. Oct 18, 2010
    • Add SSE2 subtract functions · 4db20765
      Yunqing Wang authored
      Instead of doing 8-bit data unpack and 16-bit subtraction, use
      psubb to do 16 8-bit subtractions and pcmpgtb to preserve the
      sign information. This does not bring noticeable gain since
      these functions are not called frequently.
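The trick works because the difference of two 8-bit pixels always fits in 9 bits: the wrapped byte difference supplies the low 8 bits of the 16-bit result, and a comparison mask (all ones when the result is negative) supplies the high byte. A scalar C model of one lane, with a hypothetical function name. SSE2's `pcmpgtb` compares signed bytes, and how the actual assembly handles unsigned pixel values is not described above, so this sketch simply uses an unsigned comparison.

```c
#include <assert.h>
#include <stdint.h>

/* One lane: rebuild the 16-bit signed difference a - b from an 8-bit
 * wrapping subtraction (what psubb computes per byte) and a sign mask.
 * The final cast assumes two's-complement conversion, which mainstream
 * compilers provide. */
static int16_t subtract_lane(uint8_t a, uint8_t b) {
  uint8_t low  = (uint8_t)(a - b);        /* wraps modulo 256 */
  uint8_t high = (b > a) ? 0xFF : 0x00;   /* sign-extension mask */
  return (int16_t)(uint16_t)((high << 8) | low);
}
```

For example, `a = 0, b = 255` gives a low byte of 0x01 and a mask of 0xFF, i.e. 0xFF01, which is -255.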
      
      Change-Id: I90a0dfaa3db9d422e4ada324076596ffb178548e
    • copy compiler warning fixes · ce1ce992
      Johann Koenig authored
      generic version got fixed, but not the arm version. fixes:
      vp8/encoder/arm/mcomp_arm.c: In function 'vp8_full_search_sadx3':
      vp8/encoder/arm/mcomp_arm.c:1208: warning: pointer targets in passing
      argument 5 of 'fn_ptr->sdx3f' differ in signedness
      vp8/encoder/arm/mcomp_arm.c:1208: note: expected 'unsigned int *' but
      argument is of type 'int *'
      
      and another unsigned change to keep the files similar
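The warning comes from passing an `int *` where the function-pointer type expects `unsigned int *`; SAD values are never negative, so the fix is to declare the result array unsigned to match. A reduced sketch with hypothetical types and names that mirror the five-argument shape of the call in the warning (result array as argument 5); the real `sdx3f` implementations are not reproduced here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical x3 SAD signature: SAD of a 4x4 block at three
 * consecutive horizontal offsets into the reference. */
typedef void (*sad_x3_fn)(const unsigned char *src, int src_stride,
                          const unsigned char *ref, int ref_stride,
                          unsigned int *sad_array);

static void sad4x4_x3(const unsigned char *src, int src_stride,
                      const unsigned char *ref, int ref_stride,
                      unsigned int *sad_array) {
  for (int k = 0; k < 3; k++) {
    unsigned int sad = 0;
    for (int r = 0; r < 4; r++)
      for (int c = 0; c < 4; c++)
        sad += (unsigned int)abs(src[r * src_stride + c] -
                                 ref[r * ref_stride + c + k]);
    sad_array[k] = sad;
  }
}

static unsigned int best_of_three(sad_x3_fn fn, const unsigned char *src,
                                  const unsigned char *ref, int stride) {
  unsigned int sads[3];   /* declaring this `int` triggers the warning */
  assert(fn);
  fn(src, stride, ref, stride, sads);
  unsigned int best = sads[0];
  for (int k = 1; k < 3; k++)
    if (sads[k] < best) best = sads[k];
  return best;
}
```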
      
      Change-Id: I1b6255dc3a03b90394a791ee0d15d8167d9454db
  7. Oct 15, 2010
    • remove dead code · 963bcd6c
      Johann Koenig authored
      vp8_diamond_search_sadx4 isn't used in arm because there is no
      corresponding sdx4df as in x86. rather than keep it in sync with
      ../mcomp.c, delete it
      
      vp8_hex_search had the original, more readable/understandable code #if'd
      out. it's also available in ../mcomp.c, so remove the dead copy
      
      Change-Id: Ia42aa6e23b3a2e88040f467280befec091ec080e
    • change to make use of more trellis quantization · 2e53e9e5
      Yaowu Xu authored
      when a subsequent frame is encoded as an alt reference frame, it is
      unlikely that any mb in current frame will be used as reference for
      future frames, so we can enable quantization optimization even when
      the RD constant is slightly rate-biased. The change has an overall
      benefit of 0.1% to 0.2% bit savings on the test sets based on
      vpxssim scores.
      
      Change-Id: I9aa7bc5cd573ea84e3ee655d2834c18c4460ceea
  8. Oct 14, 2010
  9. Oct 13, 2010
  10. Oct 12, 2010
  11. Oct 11, 2010