1. 24 Jul, 2013 1 commit
  2. 20 Jul, 2013 1 commit
  3. 16 Jul, 2013 1 commit
  4. 01 Jul, 2013 2 commits
    • Make get_coef_context() branchless. · 26b6318d
      Ronald S. Bultje authored
      This should significantly speed up cost_coeffs(). The patch pads the
      neighbour arrays by one item to avoid an eob check in
      get_coef_context(), then populates each col/row scan and left/top edge
      coefficient with the same neighbour twice, which avoids a single/double
      context branch in get_coef_context(). Lastly, it populates the neighbour
      arrays in pixel order (rather than scan order), so we don't have to
      dereference the scantable to get the correct neighbours.
      
      Total encoding time of first 50 frames of bus (speed 0) at 1500kbps
      goes from 2min10.1 to 2min5.3, i.e. a 2.6% overall speed increase.
      
      Change-Id: I42bcd2210fd7bec03767ef0e2945a665b851df56
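      The padding and duplicated-neighbour idea in this message can be
      sketched as follows. This is an illustration, not the code from the
      patch: the function names, the MAX_NEIGHBORS layout, and the rounded
      average of the two cached values are assumptions.

```c
#include <stdint.h>

#define MAX_NEIGHBORS 2 /* assumption: two neighbour slots per coefficient */

/* Build a neighbour table for an n-coefficient block of width w.
 * Entries are pixel positions, so no scan-table lookup is needed later.
 * Edge coefficients store the same neighbour twice, and one extra padded
 * entry at index n keeps a lookup at c == eob inside the array. */
static void init_neighbors_sketch(const int16_t *scan, int w, int n,
                                  int16_t *neighbors) {
  for (int c = 0; c < n; ++c) {
    const int pos = scan[c];
    const int row = pos / w, col = pos % w;
    int a, b;
    if (row > 0 && col > 0) {
      a = pos - w; /* above */
      b = pos - 1; /* left */
    } else if (row > 0) {
      a = b = pos - w; /* left edge: above, twice */
    } else if (col > 0) {
      a = b = pos - 1; /* top edge: left, twice */
    } else {
      a = b = 0; /* DC has no neighbours; value is a placeholder */
    }
    neighbors[MAX_NEIGHBORS * c + 0] = (int16_t)a;
    neighbors[MAX_NEIGHBORS * c + 1] = (int16_t)b;
  }
  /* Padding entry so the caller needs no eob check. */
  neighbors[MAX_NEIGHBORS * n + 0] = 0;
  neighbors[MAX_NEIGHBORS * n + 1] = 0;
}

/* Branch-free context: always read two neighbours and average them. */
static inline int get_coef_context_sketch(const int16_t *neighbors,
                                          const uint8_t *token_cache, int c) {
  return (1 + token_cache[neighbors[MAX_NEIGHBORS * c + 0]] +
          token_cache[neighbors[MAX_NEIGHBORS * c + 1]]) >> 1;
}
```

      The neighbour table needs MAX_NEIGHBORS * (n + 1) entries because of
      the padding slot.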
    • Quantize (64-bit only, for now) SSSE3 SIMD. · 7353ceab
      Ronald S. Bultje authored
      Total encoding time for first 50 frames of bus (speed 0) @ 1500kbps
      goes from 2min34.8 to 2min14.4, i.e. a 10.4% overall speedup. The code is
      x86-64 only; it needs some minor modifications to be 32bit compatible,
      because it uses 15 xmm registers, whereas 32bit only has 8.
      
      Change-Id: I2df53770c2e850813ffa713e1a91b45b0082b904
  5. 28 Jun, 2013 1 commit
    • Inline vp9_get_coef_context() (and remove vp9_ prefix). · d00b8e5f
      Ronald S. Bultje authored
      Makes cost_coeffs() a lot faster:
      4x4: 236 -> 181 cycles
      8x8: 888 -> 588 cycles
      16x16: 3550 -> 2483 cycles
      32x32: 17392 -> 12010 cycles
      
      Total encode time of first 50 frames of bus (speed 0) @ 1500kbps goes
      from 2min51.6 to 2min43.9, i.e. 4.7% overall speedup.
      
      Change-Id: I16b8d595946393c8dc661599550b3f37f5718896
  6. 24 Jun, 2013 1 commit
  7. 21 Jun, 2013 1 commit
  8. 14 Jun, 2013 1 commit
  9. 10 Jun, 2013 1 commit
    • Implement intra-coded frames · eac344ef
      Adrian Grange authored
      Implements ability to signal and decode frames that are
      encoded using only intra coding modes. Only the decode
      side has been implemented here.
      
      Change-Id: I53ac6a8d90422cd08ba389e5236e15b45f9e93de
  10. 31 May, 2013 1 commit
  11. 30 May, 2013 1 commit
    • Replace scatter scan 32x32 with HW friendly scan. · 5700b4ea
      Sami Pietila authored
      The first 240 coeff positions (15 top-left blocks) are scanned in the
      same order as in scatter scan. After that the coeffs are scanned in
      "block bands", one band at a time, with all coeffs in one band scanned
      before moving on to the next band. This brings down the number of 4x4
      coeff blocks that need to be buffered while scanning from 15 to 8.
      
      Change-Id: I478a991d63c48bd5e64d36e59fed7a00c9a651ba
  12. 29 May, 2013 2 commits
    • Balancing coef-tree to reduce bool decodes · b8b3f1a4
      Deb Mukherjee authored
      This patch changes the coefficient tree to move the EOB to below
      the ZERO node in order to reduce the number of bool decodes.
      
      The advantages of moving EOB one step down, as opposed to two steps down
      in the other parallel patch, are: 1. The coef modeling based on
      the One-node becomes independent of the tree structure above it, and
      2. Fewer context/counter increases are needed.
      
      The drawback is that the potential savings in bool decodes will be
      smaller, but assuming that 0s are much more predominant than 1s, the
      potential savings are still likely to be substantial.
      
      Results on derf300: -0.237%
      
      Change-Id: Ie784be13dc98291306b338e8228703a4c2ea2242
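      The effect of the reordering on bool decodes can be sketched with a
      toy tree. This is an illustration only: it uses a simplified
      four-leaf tree (ZERO, ONE, MORE, EOB) rather than the full token set,
      and it assumes that "before" means EOB is decided at the root. The
      array layout (positive entries index the next node pair, entries <= 0
      are negated leaf tokens) and all names are assumptions.

```c
#include <stdint.h>

typedef int8_t tree_index;

enum { TOK_ZERO = 0, TOK_ONE = 1, TOK_MORE = 2, TOK_EOB = 3 };

/* Assumed "before" layout: EOB is decided first, then ZERO. */
static const tree_index tree_before[6] = {
  -TOK_EOB, 2, -TOK_ZERO, 4, -TOK_ONE, -TOK_MORE
};

/* "After" layout: ZERO is decided first, EOB sits one step below it. */
static const tree_index tree_after[6] = {
  -TOK_ZERO, 2, -TOK_EOB, 4, -TOK_ONE, -TOK_MORE
};

/* Number of bool decodes needed to reach `token`, or 0 if it is absent.
 * Call with i = 0 and depth = 0 to start at the root. */
static int token_depth(const tree_index *tree, int i, int token, int depth) {
  for (int b = 0; b < 2; ++b) {
    const tree_index child = tree[i + b];
    if (child <= 0) {
      if (-child == token) return depth + 1;
    } else {
      const int d = token_depth(tree, child, token, depth + 1);
      if (d) return d;
    }
  }
  return 0;
}
```

      In this toy tree a ZERO costs one decode instead of two and an EOB
      costs two instead of one, while the nodes below the ONE decision are
      unchanged.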
    • Residual coding to cache energy class of tokens. · 88a4d4c5
      Sami Pietila authored
      Proposal for tuning the residual coding by changing how the context
      from previous tokens is calculated. Storing the energy class of previous
      tokens instead of the token itself eases the critical path of
      HW implementations.
      
      Change-Id: I6d71d856b84518f6c88de771ddd818436f794bab
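      A minimal sketch of caching the energy class rather than the token.
      The token list, the class values, and the way two cached classes are
      combined are all hypothetical and chosen only to show the shape of
      the idea.

```c
#include <stdint.h>

/* Hypothetical, shortened token set for illustration. */
enum {
  ZERO_TOKEN_S, ONE_TOKEN_S, TWO_TOKEN_S, THREE_TOKEN_S,
  FOUR_TOKEN_S, CAT_TOKEN_S, NUM_TOKENS_S
};

/* Hypothetical mapping from token to energy class. */
static const uint8_t energy_class_sketch[NUM_TOKENS_S] = { 0, 1, 2, 3, 3, 4 };

/* The table lookup happens once, when the token is coded. */
static inline void cache_token_energy(uint8_t *token_cache, int pos,
                                      int token) {
  token_cache[pos] = energy_class_sketch[token];
}

/* Context derivation then only reads and combines cached classes, so the
 * token-to-class lookup is off the path that computes the next context. */
static inline int context_from_cache(const uint8_t *token_cache, int pos_a,
                                     int pos_b) {
  return (1 + token_cache[pos_a] + token_cache[pos_b]) >> 1;
}
```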
  13. 28 May, 2013 1 commit
  14. 24 May, 2013 1 commit
  15. 23 May, 2013 1 commit
  16. 22 May, 2013 1 commit
    • Using 128 entry look up table for coef models · de4d682c
      Deb Mukherjee authored
      Reverts to using a 128-entry LUT for the coef models rather than 48
      entries, to ease hardware implementation.
      
      Also incorporates some cleanups including removing various
      hooks to support different lookup tables based on block_type and
      ref_type.
      
      Change-Id: I54100c120cca07a2ebd3a7776bc4630fa6a153f6
  17. 21 May, 2013 2 commits
    • Merging the model coef prob experiment · 7a645e4e
      Deb Mukherjee authored
      Merges the experiment.
      
      Change-Id: I4eb19af6de6df6aa3a96a2e82f231d47ed9b3ae9
    • Refinements on modelcoef expt to reduce storage · 07443f15
      Deb Mukherjee authored
      Uses more aggressive interpolation to reduce storage for the
      model tables by more than half. Only 48 lists of probs are
      stored (as opposed to 128 before), corresponding to ONE_NODE
      probabilities of:
      1,
      3, 7, 11, ..., 115, 119,
      127, 135, ..., 247, 255.
      
      Besides, only 1 table is used as opposed to 2 before. So the overall
      memory needed for the tables is just 48 * 8 = 384 bytes.
      
      The table currently used is based on a new Pareto distribution with
      a heavier tail than a generalized Gaussian, which improves results on
      derf by about 0.1% over a single-table Generalized Gaussian.
      
      The overall result on derfraw300 is -0.14%.
      
      Change-Id: I19bd03559cbf5894a9f8594b8023dcc3e546f6bd
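      The 48 ONE_NODE grid points listed in this message can be generated
      as below. This is a sketch: the function name is hypothetical, and
      the 8 one-byte entries per row are an assumption inferred from the
      stated 48 * 8 = 384 bytes.

```c
#include <stdint.h>

#define MODEL_ROWS 48 /* number of stored probability lists */
#define MODEL_NODES 8 /* assumption: 8 one-byte probs per list */

/* Fill probs[] with the ONE_NODE grid: 1, then 3..119 in steps of 4,
 * then 127..255 in steps of 8. Returns the number of entries written. */
static int fill_one_node_probs_sketch(uint8_t probs[MODEL_ROWS]) {
  int n = 0;
  probs[n++] = 1;
  for (int p = 3; p <= 119; p += 4) probs[n++] = (uint8_t)p;   /* 30 values */
  for (int p = 127; p <= 255; p += 8) probs[n++] = (uint8_t)p; /* 17 values */
  return n; /* 1 + 30 + 17 = 48 */
}
```

      The grid is denser at low probabilities (step 4) than at high ones
      (step 8). Rows for probabilities between grid points are obtained by
      interpolation, whose exact form is not stated here.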
  18. 20 May, 2013 1 commit
    • Updating the model coef experiment · 39a90bc8
      Deb Mukherjee authored
      Cleans up the experiment. Actually uses reduced counts for backward
      updates, and a reduced number of probabilities in the context.
      
      No change in bitstream when the experiment is on.
      
      Between expt on and off:
      derfraw300 is down only -0.062% (which is better than when expts
      were run previously).
      
      Change-Id: I55285a049a0c22810bdb42914212ab5a4f8521b5
  19. 13 May, 2013 1 commit
    • Change to band calculation. · e5f71520
      Paul Wilkins authored
      Change band calculation back to a simpler model based
      on the order in which coefficients are coded in scan order,
      not the absolute coefficient positions.
      
      With the scatter scan experiment enabled the results
      appear broadly neutral on derf (-0.028) but up a little on std-hd (+0.134).
      
      Without the scatterscan experiment on, the results were up on derf as well.
      
      Change-Id: Ie9ef03ce42a6b24b849a4bebe950d4a5dffa6791
  20. 07 May, 2013 2 commits
  21. 29 Apr, 2013 1 commit
    • Turning model based reverse update on for coefs · 040eeed9
      Deb Mukherjee authored
      Turns model based reverse updates on for coefficients in an
      effort to reduce the memory requirement for counters.
      
      With this patch the counters needed will be reduced by about
      75% since only 3 counts are needed instead of 12.
      
      The impact on performance is:
      derf300: -0.252%
      stdhd250: -0.046%
      
      However, retraining should alleviate some of the drop in
      performance.
      
      Change-Id: I6f2b3e13f6d5520aa3400b0b228fb5e8b4a43caa
  22. 22 Apr, 2013 3 commits
  23. 19 Apr, 2013 1 commit
  24. 11 Apr, 2013 2 commits
  25. 28 Mar, 2013 2 commits
    • Framework changes in nzc to allow more flexibility · fe9b5143
      Deb Mukherjee authored
      The patch adds the flexibility to use standard EOB based coding
      on smaller block sizes and nzc based coding on larger block sizes.
      The tx-sizes that use nzc based coding and those that use EOB based
      coding are controlled by a function get_nzc_used().
      By default, this function uses nzc based coding for 16x16 and 32x32
      transform blocks, which seems to bridge the performance gap
      substantially.
      
      All sets are now lower by 0.5% to 0.7%, as opposed to ~1.8% before.
      
      Change-Id: I06abed3df57b52d241ea1f51b0d571c71e38fd0b
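      The default selection described above could look roughly like this.
      It is a sketch: the real get_nzc_used() signature is not shown in
      this message, so the parameter type, the enum, and the _sketch name
      are assumptions.

```c
/* Transform sizes, ordered from smallest to largest. */
typedef enum { TX_4X4, TX_8X8, TX_16X16, TX_32X32 } TX_SIZE;

/* Returns 1 if nzc based coding is used for this transform size and 0 if
 * standard EOB based coding is used. Default per the message above: nzc
 * for 16x16 and 32x32, EOB for the smaller sizes. */
static int get_nzc_used_sketch(TX_SIZE tx_size) {
  return tx_size >= TX_16X16;
}
```

      Keeping the decision in one function means the split between the two
      coding methods can be changed in a single place.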
    • Fix mix-up in pt token indexing. · 9eea9fa2
      Ronald S. Bultje authored
      This fixes uninitialized reads in the trellis, and probably makes the
      trellis do something again.
      
      Change-Id: Ifac8dae9aa77574bde0954a71d4571c5c556df3c
  26. 27 Mar, 2013 1 commit
    • Scatter-based scantables. · 513157e0
      Ronald S. Bultje authored
      This gains about 0.2% on derf, 0.1% on hd and 0.4% on stdhd. I can put
      this under an experimental flag if wanted, just trying to get my patch
      queue in shape.
      
      Change-Id: Ibe1a30fe0e0b07bec4802e0f3ff0ba22e505f576
  27. 26 Mar, 2013 4 commits
    • Add col/row-based coefficient scanning patterns for 1D 8x8/16x16 ADSTs. · d9094d8f
      Ronald S. Bultje authored
      These are mostly just for experimental purposes. I saw small gains (in
      the 0.1% range) when playing with this on derf.
      
      Change-Id: Ib21eed477bbb46bddcd73b21c5c708a5b46abedc
    • Redo banding for all transforms. · 3120dbdd
      Ronald S. Bultje authored
      Now that the first AC coefficients in both directions use the same DC
      as their context, there is no longer a purpose in letting both have
      their own band. Merging these two bands allows us to split bands for
      some of the very high-frequency AC bands.
      
      In addition, I'm redoing the banding for the 1D-ADST col/row scans. I
      don't think the old banding made any sense at all (it merged the last
      coefficient of the first row/col in the same band as the first two of
      the second row/col), which was clearly an oversight from the band being
      applied in scan-order (rather than in their actual position). Now,
      coefficients at the same position will be in the same band, regardless
      of what scan order is used. I think this makes the most sense for the purpose
      of banding, which is basically "predict energy for this coefficient
      depending on the energy of context coefficients" (i.e. pt).
      
      After full re-training, together with previous patch, derf gains about
      1.2-1.3%, and hd/stdhd gain about 0.9-1.0%.
      
      Change-Id: I7a0cc12ba724e88b278034113cb4adaaebf87e0c
    • Use above/left (instead of previous in scan-order) as token context. · 790fb132
      Ronald S. Bultje authored
      Pearson correlation for above or left is significantly higher than for
      previous-in-scan-order (absolute values depend on position in scan, but
      in general, we gain about 0.1-0.2 by using either above or left; using
      both basically just makes this even better). For eob branch skipping,
      we continue to use the previous token in scan order.
      
      This helps about 0.9% on derf after re-training on a limited data set.
      Full re-training and results on larger-resolution clips are pending.
      
      Note that this commit breaks trellis, so we can probably get further
      gains out of it by fixing trellis at some later point.
      
      Change-Id: Iead68e296fc3a105cca746b5e3da9555d6010cfe
    • Modeling default coef probs with distribution · fd18d5df
      Deb Mukherjee authored
      Replaces the default tables for single coefficient magnitudes with
      those obtained from an appropriate distribution. The EOB node
      is left unchanged. The model is represented as a 256-size codebook
      where the index corresponds to the probability of the Zero or the
      One node. Two variations are implemented corresponding to whether
      the Zero node or the One-node is used as the peg. The main advantage
      is that the default prob tables will become considerably smaller and
      more manageable. Besides, there is substantially less risk of
      over-fitting to a training set.
      
      Various distributions are tried and the one that gives the best
      results is the family of Generalized Gaussian distributions with
      shape parameter 0.75. The results are within about 0.2% of fully
      trained tables for the Zero peg variant, and within 0.1% of the
      One peg variant.
      
      The forward updates are optionally (controlled by a macro)
      model-based, i.e. restricted to only convey probabilities from the
      codebook. Backward updates can also optionally (controlled by
      another macro) be model-based, but this is turned off by default. Currently
      model-based forward updates work about the same as unconstrained
      updates, but there is a drop in performance when backward updates
      are model-based.
      
      The model based approach also allows the probabilities for the key
      frames to be adjusted from the defaults based on the base_qindex of
      the frame. Currently the adjustment function is a placeholder that
      adjusts the prob of EOB and Zero node from the nominal one at higher
      quality (lower qindex) or lower quality (higher qindex) ends of the
      range. The rest of the probabilities are then derived based on the
      model from the adjusted prob of zero.
      
      Change-Id: Iae050f3cbcc6d8b3f204e8dc395ae47b3b2192c9
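      For reference, the standard zero-mean generalized Gaussian density
      with scale \alpha and shape \beta is given below, with \beta = 0.75
      as stated in this message. How the codebook entries are derived from
      it (quantization, scale selection per index) is not described here,
      so this shows only the textbook form.

```latex
f(x;\alpha,\beta) = \frac{\beta}{2\alpha\,\Gamma(1/\beta)}
  \exp\!\left(-\left(\frac{|x|}{\alpha}\right)^{\beta}\right),
\qquad \beta = 0.75
```

      Setting \beta = 2 gives a Gaussian and \beta = 1 a Laplacian, so
      \beta = 0.75 is more sharply peaked at zero and heavier-tailed than
      both.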
  28. 11 Mar, 2013 1 commit
  29. 10 Mar, 2013 1 commit
    • Optimize vp9_tree_probs_from_distribution · bd84685f
      John Koleszar authored
      The previous implementation visited each node in the tree multiple times
      because it used each symbol's encoding to revisit the branches taken and
      increment its count. Instead, we can traverse the tree depth first and
      calculate the probabilities and branch counts as we walk back up. The
      complexity goes from somewhere between O(n log n) and O(n^2) (depending on
      how balanced the tree is) to O(n).
      
      Only tested one clip (256kbps, CIF), saw 13% decoding perf improvement.
      
      Note that this optimization should port trivially to VP8 as well. In VP8,
      the decoder doesn't use this function, but it does routinely show up
      on the profile for realtime encoding.
      
      Change-Id: I4f2848e4f41dc9a7694f73f3e75034bce08d1b12
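      The depth-first approach described above can be sketched as follows.
      This is an illustration, not the patch itself: the function names are
      hypothetical, the tree layout (positive entries index the next node
      pair, entries <= 0 are negated leaf symbols) is assumed, and the
      rounding and clamping of probabilities to 1..255 are assumptions.

```c
#include <stdint.h>

typedef int8_t tree_index;

/* Visit each node once. Returns the total event count under node pair i and
 * records the left/right branch counts on the way back up the tree. */
static unsigned int count_branches_sketch(const tree_index *tree, int i,
                                          const unsigned int *num_events,
                                          unsigned int branch_ct[][2]) {
  const unsigned int left =
      tree[i] <= 0 ? num_events[-tree[i]]
                   : count_branches_sketch(tree, tree[i], num_events,
                                           branch_ct);
  const unsigned int right =
      tree[i + 1] <= 0 ? num_events[-tree[i + 1]]
                       : count_branches_sketch(tree, tree[i + 1], num_events,
                                               branch_ct);
  branch_ct[i >> 1][0] = left;
  branch_ct[i >> 1][1] = right;
  return left + right;
}

/* Probability (out of 256) of taking the left branch, clamped to 1..255. */
static uint8_t branch_prob_sketch(unsigned int n0, unsigned int n1) {
  const uint64_t total = (uint64_t)n0 + n1;
  uint64_t p;
  if (total == 0) return 128; /* no data: assume an even split */
  p = ((uint64_t)n0 * 256 + (total >> 1)) / total;
  if (p < 1) p = 1;
  if (p > 255) p = 255;
  return (uint8_t)p;
}
```

      Each internal node is visited once, which is where the O(n) bound
      comes from.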