  1. 12 Oct, 2010 2 commits
    • Add simple version of activity masking. · 8d0f7a01
      Timothy B. Terriberry authored
      This uses MB variance to change the RDO weight for mode decision
       and quantization.
      Activity is normalized against the average for the frame, which is
       currently tracked using feed-forward statistics.
      This could also be used to adjust the quantizer for the entire
       frame, but that requires more extensive rate control changes.
      This does not yet attempt to adapt the quantizer within the frame,
       but the signaling cost means that will likely only be useful at
       very high rates.
      
      Change-Id: I26cd7c755cac3ff33cfe0688b1da50b2b87b9c93
      8d0f7a01
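The idea of weighting RDO by macroblock variance, normalized against a frame average, can be sketched roughly as follows. This is a minimal illustration and not the code from the commit: the function names and the specific weighting formula `(2*act + avg) / (act + 2*avg)` are assumptions chosen so that the multiplier stays within a factor of two of its base value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum of squared deviations from the mean over a 16x16 block.
 * There are 256 pixels, so sum*sum/256 is computed with a shift. */
static uint32_t mb_variance_16x16(const uint8_t *src, ptrdiff_t stride) {
    uint32_t sum = 0, sse = 0;
    for (int y = 0; y < 16; y++) {
        for (int x = 0; x < 16; x++) {
            const uint32_t v = src[y * stride + x];
            sum += v;
            sse += v * v;
        }
    }
    return sse - ((sum * sum) >> 8);
}

/* Assumed weighting (hypothetical, not taken from the commit):
 * scale rdmult by (2*act + avg) / (act + 2*avg). The ratio stays
 * within [0.5, 2] and equals 1 when the block's activity matches
 * the frame average. */
static int scale_rdmult(int rdmult, uint32_t act, uint32_t avg_act) {
    const int64_t a = (int64_t)act + 2 * (int64_t)avg_act;
    const int64_t b = 2 * (int64_t)act + (int64_t)avg_act;
    assert(rdmult >= 0);
    if (a == 0) return rdmult; /* no statistics yet: leave unchanged */
    return (int)(((int64_t)rdmult * b + (a >> 1)) / a);
}
```

With this weighting, a flat block (low activity) gets a smaller rate weight and a busy block a larger one, which matches the general intent of masking distortion in high-activity regions.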
    • Add const qualifiers to variance/SAD functions. · f4a85944
      Timothy B. Terriberry authored
      These functions should never change their input, and there's no
       reason not to declare that.
      This allows them to be passed static const data.
      
      Change-Id: Ia49fe4b01e80e9afcb24b4844817694d4da5995c
      f4a85944
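A small sketch of what the const qualifiers buy: once the input pointers are declared `const`, read-only tables can be passed without a cast. The function `sad4x4` and the two tables below are hypothetical stand-ins, not the library's actual functions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical 4x4 sum of absolute differences. Neither input is
 * modified, so both pointers are const. */
static unsigned int sad4x4(const unsigned char *src, int src_stride,
                           const unsigned char *ref, int ref_stride) {
    unsigned int sad = 0;
    assert(src_stride >= 4 && ref_stride >= 4);
    for (int r = 0; r < 4; r++) {
        for (int c = 0; c < 4; c++) {
            sad += (unsigned int)abs(src[r * src_stride + c] -
                                     ref[r * ref_stride + c]);
        }
    }
    return sad;
}

/* Read-only data: passing these would need a cast (or draw a
 * warning) if the parameters above were plain "unsigned char *". */
static const unsigned char kZeroBlock[16] = { 0 };
static const unsigned char kRamp[16] = {
    0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
};
```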
  2. 11 Oct, 2010 2 commits
    • Move vp8_strict_quantize_b inside EXACT_QUANT #define. · 82c43398
      Timothy B. Terriberry authored
      There is currently no inexact version of this function, so do not
       even compile it without EXACT_QUANT.
      This will prevent someone from inadvertently trying to use it without
       the proper EXACT_QUANT setup.
      
      Change-Id: Ia13491e0128afb281c05c9222ee5987101e4010d
      82c43398
    • Remove INTRARDOPT #define and intra_rd_opt option. · dd08db93
      Timothy B. Terriberry authored
      This is just eliminating some cruft.
      Although a number of variables are declared only when INTRARDOPT
       is defined, they are used elsewhere without that protection, and
       no longer just for intra RDO.
      The intra_rd_opt flag was hard-coded to 1 and never checked.
      
      Change-Id: I83a81554ecee8053e7b4ccd8aa04e18fa60f8e4f
      dd08db93
  3. 07 Oct, 2010 1 commit
  4. 06 Oct, 2010 1 commit
    • optimize fast_quantizer c version · d338d14c
      Yaowu Xu authored
      As the zbin and rounding constants are normalized, rounding effectively
      does the zbinning, therefore the zbin operation can be removed. In
      addition, the memsets on the two arrays are no longer necessary.
      
      Change-Id: If39c353c42d7e052296cb65322e5218810b5cc4c
      d338d14c
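A rough sketch of why rounding can stand in for the zero-bin test: if the rounding constant is chosen so that `(|z| + round) * quant >> 16` is already zero for every coefficient inside the zero bin, no separate zbin comparison is needed, and because every output element is written, the arrays need no prior memset. The function name and parameter layout here are assumptions for illustration, not the library's quantizer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical fast quantizer: no zbin test, no memset.
 * quant_ptr[i] is assumed to be a 16-bit fixed-point reciprocal of
 * dequant_ptr[i] (that is, about 65536 / dequant). */
static void fast_quantize_sketch(const short *coeff, const short *round_ptr,
                                 const short *quant_ptr,
                                 const short *dequant_ptr,
                                 short *qcoeff, short *dqcoeff, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        const int z = coeff[i];
        const int x = abs(z);
        /* Small |z| rounds down to 0 here, which is the zbin effect. */
        const int y = ((x + round_ptr[i]) * quant_ptr[i]) >> 16;
        qcoeff[i] = (short)(z < 0 ? -y : y);
        dqcoeff[i] = (short)(qcoeff[i] * dequant_ptr[i]);
    }
}
```

For example, with a step of 8 (`quant` 8192) and `round` 4, an input of 3 quantizes to 0 while an input of 4 quantizes to 1.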
  5. 05 Oct, 2010 1 commit
  6. 04 Oct, 2010 2 commits
    • nasm: address labels 'rel label' vice 'wrt rip' · 5cdc3a4c
      Jan Kratochvil authored
      nasm does not support `label wrt rip', it requires `rel label'. It is
      still fully compatible with yasm.
      
      Provide nasm compatibility. This patch causes no binary change with yasm
      on {x86_64,i686}-fedora13-linux-gnu. A few longer opcodes produced by nasm
      on {x86_64,i686}-fedora13-linux-gnu have been checked as safe.
      
      Change-Id: I488773a4e930a56e43b0cc72d867ee5291215f50
      5cdc3a4c
    • nasm: match instruction length (movd/movq) to parameters · e114f699
      Jan Kratochvil authored
      nasm requires the instruction length (movd/movq) to match its
      parameters. I find it clearer to really use 64bit instructions when
      we use 64bit registers in the assembly.
      
      Provide nasm compatibility. This patch causes no binary change with yasm
      on {x86_64,i686}-fedora13-linux-gnu. A few longer opcodes produced by nasm
      on {x86_64,i686}-fedora13-linux-gnu have been checked as safe.
      
      Change-Id: Id9b1a5cdfb1bc05697e523c317a296df43d42a91
      e114f699
  7. 02 Oct, 2010 2 commits
    • Tune effect of motion on KF/GF boost in two pass; · 788c0eb5
      Paul Wilkins authored
      This code adjusts the impact of the amount and speed of motion
      on GF and KF boost.
      
      Sections with lots of slow motion will tend to have a
      somewhat bigger boost and sections with fast motion may
      have less.
      
      There is a knock-on effect to the selection of the active
      quantizer range.
      
      This will likely require further tuning but helps with a couple
      of particularly bad edge cases.
      
      Change-Id: Ic2449cda7305672b69acf42fc0a845b77ac98d40
      788c0eb5
    • enable trellis quantization for 2nd order blocks · dcd29e36
      Yaowu Xu authored
      Experimented with different values for Y2_RD_MULT in the range [1, 32].
      Without adapting the value to MB coding mode/frame type/Q value,
      4 works out best among all values, providing an overall 0.1% coding
      gain on the test set.
      
      Change-Id: I6b2583a8aa5db5e7e5c65c646301909c0c58f876
      dcd29e36
  8. 01 Oct, 2010 2 commits
  9. 30 Sep, 2010 1 commit
    • Changed defaults & range checking for AltRef params · 8ee7284d
      Adrian Grange authored
      Modified the range checking of parameters used in the
      AltRef temporal filter (arnr-max-frames, arnr-strength,
      arnr-type) and default values for each of them.
      
      Change-Id: Ib261028d501b9523f6e44cb4790cc52167b6e92b
      8ee7284d
  10. 29 Sep, 2010 5 commits
    • Rename mode_ref_lf_test_function · 7e5e3151
      John Koleszar authored
      This function graduated from being a test func to something that's on
      by default. Rename it and remove some spurious comments that confuse
      its status.
      
      Change-Id: I689695a3ad29c35e9a72a43ec93766733ac6c20b
      7e5e3151
    • Fix loopfilter delta zero transitions · b9be7a46
      John Koleszar authored
      Loopfilter deltas are initialized to zero on keyframes in the decoder.
      The values then persist from the previous frame unless an update bit
      is set in the bitstream. This data is not included in the entropy
      data saved by the 'refresh entropy' bit in the bitstream, so it is
      effectively an additional contextual element beyond the 3 ref-frames
      and the entropy data.
      
      The encoder was treating this delta update bit as update-if-nonzero,
      meaning that the value would be refreshed even if it hadn't changed,
      and more significantly, if the correct value for the delta changed
      to zero, the update wouldn't be sent, and the decoder would preserve
      the last (presumably non-zero) value.
      
      This patch updates the encoder to send an update only if the value
      has changed from the previously transmitted value. It also forces the
      value to be transmitted in error resilient mode, to account for lost
      context in the event of lost frames.
      
      Change-Id: I56671d5b42965d0166ac226765dbfce3e5301868
      b9be7a46
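The update rule described in this commit message (send a delta only when it differs from the last transmitted value, and always in error resilient mode) might look roughly like the sketch below. The type `DeltaState`, the function `plan_delta_updates`, and the count of four deltas are assumptions made for illustration; they are not the encoder's actual structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NUM_DELTAS 4 /* assumed number of deltas, for illustration */

/* Encoder-side mirror of what the decoder currently holds. */
typedef struct {
    signed char last_sent[NUM_DELTAS];
} DeltaState;

/* Decides which deltas need an update bit; returns how many are sent. */
static int plan_delta_updates(DeltaState *st, const signed char *cur,
                              bool key_frame, bool error_resilient,
                              bool update[NUM_DELTAS]) {
    int sent = 0;
    assert(st != NULL && cur != NULL);
    /* The decoder resets its deltas to zero on keyframes. */
    if (key_frame) memset(st->last_sent, 0, sizeof(st->last_sent));
    for (int i = 0; i < NUM_DELTAS; i++) {
        /* Update-if-changed (not update-if-nonzero), so a change to
         * zero is transmitted too. Error resilient mode always sends. */
        update[i] = error_resilient || cur[i] != st->last_sent[i];
        if (update[i]) {
            st->last_sent[i] = cur[i];
            sent++;
        }
    }
    return sent;
}
```

The case the old behavior missed is a delta going from nonzero back to zero: here it compares unequal to `last_sent` and is therefore transmitted.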
    • Change to coefficient optimization rules. · 7288cdf7
      Paul Wilkins authored
      Allow coefficient optimization for good quality speed 0.
      
      Change-Id: Id0cb363df6823c6798671584fbba097916a7df2c
      7288cdf7
    • Moved row-specific computation of MV bounds out of col loop · 0e7c45b3
      Adrian Grange authored
      Moved the bounds computation on vertical MV component out
      of the loop that processes MBs within a MB row.
      0e7c45b3
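The loop restructuring can be pictured with the following sketch, in which the vertical limits depend only on the macroblock row and so are computed once per row instead of once per macroblock. The struct, the function name, and the exact bound formulas (16-pixel macroblocks plus a `border` allowance) are assumptions for illustration only.

```c
#include <assert.h>

typedef struct {
    int row_min, row_max; /* vertical MV limits, in pixels */
    int col_min, col_max; /* horizontal MV limits, in pixels */
} MvBounds;

/* Hypothetical bounds computation for a frame of mb_rows x mb_cols
 * macroblocks; out must hold mb_rows * mb_cols entries. */
static void compute_mv_bounds(int mb_rows, int mb_cols, int border,
                              MvBounds *out) {
    assert(mb_rows > 0 && mb_cols > 0 && out != 0);
    for (int mb_row = 0; mb_row < mb_rows; mb_row++) {
        /* Row-specific values: hoisted out of the column loop. */
        const int row_min = -(mb_row * 16 + border);
        const int row_max = (mb_rows - 1 - mb_row) * 16 + border;
        for (int mb_col = 0; mb_col < mb_cols; mb_col++) {
            MvBounds *b = &out[mb_row * mb_cols + mb_col];
            b->row_min = row_min;
            b->row_max = row_max;
            /* Column-specific values still change per macroblock. */
            b->col_min = -(mb_col * 16 + border);
            b->col_max = (mb_cols - 1 - mb_col) * 16 + border;
        }
    }
}
```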
    • Control of active min quantizer for two pass. · ff3068d6
      Paul Wilkins authored
      Create lookup tables for controlling the active quantizer range.
      Some initial tuning to improve quality by about 0.5% on the test set.
      Clean up some stats output code.
      
      Change-Id: Ia698a8525f8b8129a503cadace3ee73fe888f543
      ff3068d6
  11. 28 Sep, 2010 4 commits
    • Optimizations on the loopfilters. · 0964ef0e
      Fritz Koenig authored
      - Scheduling for Atom processors
      - Combining of macros to allow for better interleaving
      - Change from multiplies to adds for main filter
      - Use of movhps/movlps to fill xmm registers without
        shifting and orring
      
      Change-Id: I0b3500a5f58abf7085253ec92d64c8a96723040b
      0964ef0e
    • Enabled AltRef motion map creation · 47fc8f26
      Adrian Grange authored
      Enabled the first-pass encode to output the
      map of macroblock coding modes required by
      the AltRef filter.
      47fc8f26
    • Made AltRef filter adaptive & added motion compensation · 1b2f8308
      Adrian Grange authored
      Modified AltRef temporal filter to adapt filter length based
      on macroblock coding modes selected during first-pass
      encode.
      
      Also added sub-pixel motion compensation to the AltRef
      filter.
      1b2f8308
    • Add 4-tap version of 2nd-pass ARMv6 MC filter. · 18dc92fd
      Timothy B. Terriberry authored
      The existing code applied a 6-tap filter with 0's on either end.
      We're already paying the branch penalty to avoid computing the two
       extra columns needed as input to this filter.
      We might as well save time computing the filter as well.
      This reduces the inner loop from 21 instructions to 16, the number
       of loads per iteration from 4 to 1, and the number of multiplies
       from 7 to 4.
      The gain in overall decoding performance, however, is small (less
       than 1%).
      
      This change also means we are now valgrind clean on ARMv6, which is
       its real purpose.
      The errors reported here were valgrind's fault (it does not detect
       that 0 times an uninitialized value is initialized), but Julian
       Seward says it would slow down valgrind considerably to make such
       checks.
      Speeding up libvpx instead, even by a small amount, seems a much
       better idea if only to enable proper valgrind checking of the
       rest of the codec.
      
      Change-Id: Ifb376ea195e086b60f61daf1097d8910c4d8ff16
      18dc92fd
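The equivalence between a 6-tap filter with zero outer taps and a dedicated 4-tap filter can be shown in plain C (the commit itself changes ARMv6 assembly). This is only an illustrative sketch: the function names are hypothetical, and the taps and the 7-bit rounding are assumptions chosen for the example.

```c
#include <assert.h>

#define FILTER_SHIFT 7
#define FILTER_ROUND (1 << (FILTER_SHIFT - 1))

/* Round, shift, and clamp a filter sum to the 0..255 pixel range. */
static unsigned char round_and_clamp(int sum) {
    int v = sum + FILTER_ROUND;
    if (v < 0) return 0;
    v >>= FILTER_SHIFT;
    return (unsigned char)(v > 255 ? 255 : v);
}

/* 6-tap: reads p[-2] .. p[3], even when the outer taps are zero. */
static unsigned char filter_6tap(const unsigned char *p, const short f[6]) {
    int sum = 0;
    for (int k = 0; k < 6; k++) sum += p[k - 2] * f[k];
    return round_and_clamp(sum);
}

/* 4-tap: reads only p[-1] .. p[2], so the two outer inputs are never
 * loaded and never multiplied. */
static unsigned char filter_4tap(const unsigned char *p, const short f[4]) {
    int sum = 0;
    assert(p != 0 && f != 0);
    for (int k = 0; k < 4; k++) sum += p[k - 1] * f[k];
    return round_and_clamp(sum);
}
```

Both functions give the same pixel whenever the 6-tap kernel's first and last coefficients are zero; the 4-tap form simply never touches the two outer inputs, which is why uninitialized values there no longer reach a memory checker.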
  12. 27 Sep, 2010 2 commits
  13. 24 Sep, 2010 4 commits
    • Fix valgrind errors in vp8_sixtap_predict8x4_armv6(). · e2795e99
      Timothy B. Terriberry authored
      This function was accessing values below the stack pointer, which
       can be corrupted by signal delivery at any time.
      
      Change-Id: I92945b30817562eb0340f289e74c108da72aeaca
      e2795e99
    • combine max values and compare once · f30e8dd7
      Johann authored
      previous implementation compared each set of values to limit and then
      &'d them together, requiring a compare and & for each value.
      
      this does the accumulation first, requiring only one compare
      
      Change-Id: Ia5e3a1a50e47699c88470b8c41964f92a0dc1323
      f30e8dd7
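In scalar C, the two approaches compare as in the sketch below (the actual change is in SIMD code, and these function names are hypothetical). The first does one compare and one AND per value; the second accumulates the maximum and compares once.

```c
#include <assert.h>
#include <stddef.h>

/* Previous approach: compare each value to the limit, AND the results. */
static int within_limit_per_value(const unsigned char *diff, size_t n,
                                  unsigned char limit) {
    int mask = 1;
    for (size_t i = 0; i < n; i++) mask &= (diff[i] <= limit);
    return mask;
}

/* New approach: accumulate the maximum first, then compare once.
 * All values are <= limit exactly when their maximum is <= limit. */
static int within_limit_max_first(const unsigned char *diff, size_t n,
                                  unsigned char limit) {
    unsigned char max = 0;
    assert(diff != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (diff[i] > max) max = diff[i];
    }
    return max <= limit;
}
```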
    • disable compilation of debugging code · 8ca779ab
      John Koleszar authored
      This patch avoids compiling some debugging code in onyx_if.c. The most
      significant fix is to avoid generating code for vp8_write_yuv_frame,
      which is never called. Some other code was removed by the dead code
      elimination performed by the compiler, and this patch does it with the
      preprocessor instead. There are advantages both ways.
      
      Change-Id: I044fd43179d2e947553f0d6f2cad5b40907ac458
      8ca779ab
    • move reconintra_mt to decoder (for now) · 48e76ff4
      John Koleszar authored
      reconintra_mt.c is only required for building the decoder right now.
      It could definitely be used for the encoder in the future, but it
      currently depends on decoder only data structures. (onyxd_int.h,
      VP8D_COMP, etc). Move it from common/ to decoder/ until the
      necessary changes to the common multithread code are complete.
      
      This patch is needed to build with --disable-vp8-decoder.
      
      Change-Id: I568c52221a2b309234d269675cba97131ce35c86
      48e76ff4
  14. 23 Sep, 2010 2 commits
    • Add getter functions for the interface data symbols · fa7a55bb
      John Koleszar authored
      Having these symbols be available as functions rather than data is
      occasionally more convenient. Implemented this way rather than a
      get-codec-by-id style to avoid creating a link-time dependency
      between the encoder and the decoder.
      
      Fixes issue #169
      
      Change-Id: I319f281277033a5e7e3ee3b092b9a87cce2f463d
      fa7a55bb
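The pattern of exposing a data symbol through a getter function could be sketched as follows. All names here (`example_iface_t`, `example_encoder_algo`, `example_encoder`) are hypothetical and stand in for the library's real symbols. Each getter sits next to its own data, so a program that only calls the encoder getter does not need the decoder at link time, which a single get-codec-by-id function referencing both would require.

```c
#include <assert.h>

/* Hypothetical interface descriptor. */
typedef struct {
    const char *name;
    int abi_version;
} example_iface_t;

/* Data symbol, exported as before. */
const example_iface_t example_encoder_algo = { "example encoder", 1 };

/* Getter for the same object, defined in the encoder's own source
 * file so that it references no decoder symbols. */
const example_iface_t *example_encoder(void) {
    return &example_encoder_algo;
}

/* Callers can use either form; both yield the same object. */
static const char *iface_name(const example_iface_t *iface) {
    assert(iface != 0);
    return iface->name;
}
```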
    • Adjust multi-thread sync ranges according to image sizes · 8db5da29
      Yunqing Wang authored
      In multi-threaded decoder, set different sync ranges for
      different video resolutions.
      
      Change-Id: Iea48fd36f51919e0152c8ed3b1f10e1b723c0ca7
      8db5da29
  15. 22 Sep, 2010 1 commit
    • Remove dead code · 7fed3832
      Johann authored
      The new loopfilter was originally introduced as an experimental change.
      It's permanent now.
      
      Change-Id: I25dbedb6ceff3e9f9c04e18bb29f84c3ecb7e546
      7fed3832
  16. 21 Sep, 2010 2 commits
    • unset execute bit on c source · cdd20666
      John Koleszar authored
      Change-Id: I6625ee41f8872908cb015ce0729e1c7a105b5217
      cdd20666
    • Don't reset mb clamping state during splitmv decoding · 4d391e8e
      John Koleszar authored
      The MV decoding changes in c5fb0eb8 introduced a bug where the
      macroblock clamping state was reset for each partition, so if an
      earlier partition needed clamping but a subsequent one didn't,
      the MB wouldn't receive clamping. With this patch, the state is only
      set during splitmv decoding and never cleared.
      
      Change-Id: I224fe258493405ee0f6a04596acdb622c475e845
      4d391e8e
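The fix amounts to treating the clamping state as a sticky flag across partitions. A minimal sketch, with hypothetical types and a simplified symmetric bounds test in place of the decoder's real check:

```c
#include <assert.h>

typedef struct { int row, col; } ExampleMv;

/* Simplified bounds test: true if either component is outside [lo, hi]. */
static int mv_out_of_bounds(ExampleMv mv, int lo, int hi) {
    return mv.row < lo || mv.row > hi || mv.col < lo || mv.col > hi;
}

/* The flag is only ever set (OR-ed in), never cleared, so an early
 * partition that needs clamping is not forgotten when a later
 * partition does not. Assigning with "=" instead of "|=" would
 * reproduce the bug described above. */
static int mb_needs_clamping(const ExampleMv *mvs, int count,
                             int lo, int hi) {
    int need = 0;
    assert(count >= 0);
    for (int i = 0; i < count; i++) {
        need |= mv_out_of_bounds(mvs[i], lo, hi);
    }
    return need;
}
```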
  17. 20 Sep, 2010 3 commits
    • Use movq instead of movdqu. · b7dc9398
      Fritz Koenig authored
      Movdqu is more expensive (throughput, uops) than movq.  Minimal
      impact for newer big cores, but ~2.25% gain on Atom.
      
      Change-Id: I62c80bb1cc01d8a91c350c4c7719462809a4ef7f
      b7dc9398
    • Better choice of instruction for filter mask comparison. · 8eae7fe7
      Fritz Koenig authored
      Use pmaxub instead of a combination of psubusb/por to
      determine if any comparisons go over the limit.
      
      Change-Id: I3f0bd7d2aabe5fee9ba6620508e2b60605abcb82
      8eae7fe7
    • Add high limit check for unsigned parameters · 23690686
      Guillermo Ballester Valor authored
      The patch related to issue #55 (5a72620d) fixed some warnings, but the
      fix was not optimal. It was actually a trick to confuse the compiler
      rather than a fix.
      
      This patch fixes it by creating a new macro for cases that need only a
      high limit check on an unsigned value.
      
      Change-Id: I94b322e0f7fb07604b3b1df1f9321185f48cfcb5
      23690686
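A sketch of the idea, assuming simplified macro signatures that are not the library's own: a two-sided range check on an unsigned value contains a `>= 0` comparison that is always true, which is what compilers warn about, so unsigned fields get a one-sided macro instead. The struct, field names, and limits below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Two-sided check: appropriate for signed values. */
#define RANGE_CHECK(val, lo, hi) ((val) >= (lo) && (val) <= (hi))

/* One-sided check for unsigned values, where a lower bound of zero
 * would be a comparison that is always true. */
#define RANGE_CHECK_HI(val, hi) ((val) <= (hi))

typedef struct {
    int cpu_used;         /* signed: needs both bounds */
    unsigned int threads; /* unsigned: only the high limit matters */
} ExampleConfig;

/* Returns 0 if the configuration is in range, -1 otherwise. */
static int validate_config(const ExampleConfig *cfg) {
    assert(cfg != NULL);
    if (!RANGE_CHECK(cfg->cpu_used, -16, 16)) return -1;
    if (!RANGE_CHECK_HI(cfg->threads, 64u)) return -1;
    return 0;
}
```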
  18. 17 Sep, 2010 2 commits
    • reorder data to use wider instructions · 022323bf
      Johann authored
      the previous commit laid the groundwork by doing two sets of idcts
      together. this moves that further by grouping the interesting data
      (q[0], q+16[0]) together to allow using wider instructions. also
      managed to drop a few instructions by recognizing that the constant
      for sinpi8sqrt2 could be downshifted all the time, which avoided a
      downshift as well as workarounds for a function which only accepted
      signed data
      
      looks like a modest gain for performance: at qcif, went from ~180
      fps to ~183
      
      Change-Id: I842673f3080b8239e026cc9b50346dbccbab4adf
      022323bf
    • Restructure multi-threaded decoder · f857a850
      Yunqing Wang authored
      On each MB, loopfiltering is done right after MB decoding. This
      combines two loops in the multi-threaded code into one, which halves
      the number of synchronizations.
      
      The above-row/left-col data are saved in temp buffers for
      next-row/next MB decoding.
      
      Tests on 4-core gLucid machine showed 10% decoder performance
      gain with threads=4 (tulip clip). Testing on other platforms
      isn't done yet.
      
      Change-Id: Id18ea7c1e84965dabea65d4c01ca5bc056ddeac9
      f857a850
  19. 16 Sep, 2010 1 commit
    • cleanup: remove unused xprintf · 9100073e
      John Koleszar authored
      These files aren't currently used, and we can get them back if we
      need them.
      
      Change-Id: I62aa3bff828e491a80c80eeb84a7c44903df29b5
      9100073e