1. 29 Jul, 2013 2 commits
  2. 25 Jul, 2013 2 commits
    • General cleanups. · 7131cb0e
      Dmitry Kovalev authored
      Removing unused constants, macros, and function declarations. Using
      ROUND_POWER_OF_TWO macro, vp9_zero, vp9_copy where possible. Moving
      #include from *.h to *.c. Merging for loops for motion vectors.
      
      Change-Id: Ic3bf841764a2bb177128bb3a6d7aa8f68229cd13
    • Removing duplicated code for merging two probabilities. · 40358dc4
      Dmitry Kovalev authored
      Adding common merge_probs and merge_probs2 functions. Changing ints to
      unsigned ints in some places.
      
      Change-Id: Icf088ffdea7cf5b95284a128916409bdd53506b0
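
      A minimal sketch of what a shared probability-merging helper could look
      like. Only the names merge_probs and ROUND_POWER_OF_TWO come from the
      messages above; the signature, the prob_t type, the helper functions,
      and the saturation parameters are assumptions made for illustration and
      may not match the actual patch.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t prob_t; /* assumed: 8-bit probability in the range 1..255 */

#define ROUND_POWER_OF_TWO(value, n) (((value) + (1 << ((n) - 1))) >> (n))

/* Clamp an integer to the valid probability range. */
static prob_t clip_prob(unsigned int p) {
  return (prob_t)(p > 255 ? 255 : (p < 1 ? 1 : p));
}

/* Probability of the "0" branch estimated from two branch counts. */
static prob_t get_binary_prob(unsigned int n0, unsigned int n1) {
  const unsigned int den = n0 + n1;
  if (den == 0) return 128; /* no data: stay neutral */
  return clip_prob((n0 * 256 + (den >> 1)) / den);
}

/* Blend two probabilities; factor is the weight of prob2 out of 256. */
static prob_t weighted_prob(unsigned int prob1, unsigned int prob2,
                            unsigned int factor) {
  return (prob_t)ROUND_POWER_OF_TWO(prob1 * (256 - factor) + prob2 * factor, 8);
}

/* Hypothetical merge: move pre_prob towards the observed probability,
 * trusting the observation more as the (saturated) count grows. */
static prob_t merge_probs(prob_t pre_prob, const unsigned int ct[2],
                          unsigned int count_sat,
                          unsigned int max_update_factor) {
  assert(count_sat > 0);
  const prob_t prob = get_binary_prob(ct[0], ct[1]);
  const unsigned int total = ct[0] + ct[1];
  const unsigned int count = total < count_sat ? total : count_sat;
  const unsigned int factor = max_update_factor * count / count_sat;
  return weighted_prob(pre_prob, prob, factor);
}
```

      With no observations the prior is returned unchanged; with saturated
      counts the result is pulled towards the observed probability by at most
      max_update_factor / 256.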
  3. 24 Jul, 2013 2 commits
  4. 20 Jul, 2013 1 commit
  5. 16 Jul, 2013 1 commit
  6. 01 Jul, 2013 2 commits
    • Make get_coef_context() branchless. · 26b6318d
      Ronald S. Bultje authored
      This should significantly speed up cost_coeffs(). Basically what the
      patch does is to make the neighbour arrays padded by one item to
      prevent an eob check in get_coef_context(), then it populates each
      col/row scan and left/top edge coefficient with two times the same
      neighbour - this prevents a single/double context branch in
      get_coef_context(). Lastly, it populates neighbour arrays in pixel
      order (rather than scan order), so we don't have to dereference the
      scantable to get the correct neighbours.
      
      Total encoding time of first 50 frames of bus (speed 0) at 1500kbps
      goes from 2min10.1 to 2min5.3, i.e. a 2.6% overall speed increase.
      
      Change-Id: I42bcd2210fd7bec03767ef0e2945a665b851df56
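
      The three ideas in this message (one padding entry, duplicated
      neighbours on the edges, neighbours stored as pixel positions) can be
      sketched roughly as follows. This is a toy 4x4 illustration, not the
      code from the patch: the zigzag scan table, the array names, and
      init_neighbors() are assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_DIM 4 /* toy 4x4 transform block */
#define NUM_COEFS (BLOCK_DIM * BLOCK_DIM)
#define MAX_NEIGHBORS 2

/* Assumed zigzag scan: scan index -> raster (pixel-order) position. */
static const int16_t scan_4x4[NUM_COEFS] = {
  0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15
};

/* One extra (padding) entry so that asking for the context of scan index
 * NUM_COEFS stays in bounds and needs no end-of-block check. */
static int16_t neighbors_4x4[(NUM_COEFS + 1) * MAX_NEIGHBORS];

static void init_neighbors(void) {
  for (int i = 0; i < NUM_COEFS; ++i) {
    const int pos = scan_4x4[i];
    const int row = pos / BLOCK_DIM, col = pos % BLOCK_DIM;
    int a, b;
    if (row > 0 && col > 0) {
      a = pos - BLOCK_DIM; /* above */
      b = pos - 1;         /* left */
    } else if (row > 0) {
      a = b = pos - BLOCK_DIM; /* left edge: the above neighbour, twice */
    } else if (col > 0) {
      a = b = pos - 1; /* top edge: the left neighbour, twice */
    } else {
      a = b = 0; /* first coefficient: placeholder entry */
    }
    /* Neighbours are stored as pixel positions, so no scan-table lookup
     * is needed when the context is computed. */
    neighbors_4x4[MAX_NEIGHBORS * i + 0] = (int16_t)a;
    neighbors_4x4[MAX_NEIGHBORS * i + 1] = (int16_t)b;
  }
  neighbors_4x4[MAX_NEIGHBORS * NUM_COEFS + 0] = 0; /* padding entry */
  neighbors_4x4[MAX_NEIGHBORS * NUM_COEFS + 1] = 0;
}

/* Branchless: always averages exactly two cached neighbour values.
 * token_cache is indexed by pixel position. */
static int get_coef_context(const int16_t *neighbors,
                            const uint8_t *token_cache, int c) {
  assert(c >= 0 && c <= NUM_COEFS);
  return (1 + token_cache[neighbors[MAX_NEIGHBORS * c + 0]] +
          token_cache[neighbors[MAX_NEIGHBORS * c + 1]]) >> 1;
}
```

      Because an edge coefficient lists its single neighbour twice,
      (1 + x + x) >> 1 evaluates to x, so the same expression serves both the
      one-neighbour and the two-neighbour case.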
    • Quantize (64-bit only, for now) SSSE3 SIMD. · 7353ceab
      Ronald S. Bultje authored
      Total encoding time for first 50 frames of bus (speed 0) @ 1500kbps
      goes from 2min34.8 to 2min14.4, i.e. a 10.4% overall speedup. The code is
      x86-64 only; it needs some minor modifications to be 32bit compatible,
      because it uses 15 xmm registers, whereas 32bit only has 8.
      
      Change-Id: I2df53770c2e850813ffa713e1a91b45b0082b904
  7. 28 Jun, 2013 1 commit
    • Inline vp9_get_coef_context() (and remove vp9_ prefix). · d00b8e5f
      Ronald S. Bultje authored
      Makes cost_coeffs() a lot faster:
      4x4: 236 -> 181 cycles
      8x8: 888 -> 588 cycles
      16x16: 3550 -> 2483 cycles
      32x32: 17392 -> 12010 cycles
      
      Total encode time of first 50 frames of bus (speed 0) @ 1500kbps goes
      from 2min51.6 to 2min43.9, i.e. 4.7% overall speedup.
      
      Change-Id: I16b8d595946393c8dc661599550b3f37f5718896
  8. 24 Jun, 2013 1 commit
  9. 21 Jun, 2013 1 commit
  10. 14 Jun, 2013 1 commit
  11. 10 Jun, 2013 1 commit
    • Implement intra-coded frames · eac344ef
      Adrian Grange authored
      Implements ability to signal and decode frames that are
      encoded using only intra coding modes. Only the decode
      side has been implemented here.
      
      Change-Id: I53ac6a8d90422cd08ba389e5236e15b45f9e93de
  12. 31 May, 2013 1 commit
  13. 30 May, 2013 1 commit
    • Replace scatter scan 32x32 with HW friendly scan. · 5700b4ea
      Sami Pietila authored
      The first 240 coeff positions (15 top-left blocks) are scanned in the
      same order as in scatter scan, after that the coeffs are scanned in
      "block bands", each band at a time, all coeffs in one band before
      moving on to the next band. This brings down the number of 4x4 coeff
      blocks that need to be buffered while scanning, from 15 blocks to 8 blocks.
      
      Change-Id: I478a991d63c48bd5e64d36e59fed7a00c9a651ba
  14. 29 May, 2013 2 commits
    • Balancing coef-tree to reduce bool decodes · b8b3f1a4
      Deb Mukherjee authored
      This patch changes the coefficient tree to move the EOB to below
      the ZERO node in order to reduce the number of bool decodes.
      
      The advantages of moving EOB one step down as opposed to two steps down
      in the other parallel patch are: 1. The coef modeling based on
      the One-node becomes independent of the tree structure above it, and
      2. Fewer context/counter increases are needed.
      
      The drawback is that the potential savings in bool decodes will be
      less, but assuming that 0s are much more predominant than 1s, the
      potential savings are still likely to be substantial.
      
      Results on derf300: -0.237%
      
      Change-Id: Ie784be13dc98291306b338e8228703a4c2ea2242
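
      The trade-off described above can be illustrated with a toy count of
      bool decodes per token. This is a simplified model under stated
      assumptions, not the codec's tree: only three token kinds are modelled,
      each tree node is assumed to cost one bool decode, the "two steps down"
      layout is assumed to place EOB below the ONE node, and any
      context-dependent skipping of the EOB check is ignored. All names are
      hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TOK_ZERO, TOK_ONE, TOK_EOB, TOK_KINDS } Token;
typedef enum { EOB_AT_ROOT, EOB_BELOW_ZERO, EOB_BELOW_ONE, LAYOUTS } Layout;

/* Bool decodes needed to reach each token (its depth in the toy tree). */
static const int bools_per_token[LAYOUTS][TOK_KINDS] = {
  /*                 ZERO ONE EOB */
  /* EOB_AT_ROOT    */ { 2, 3, 1 },
  /* EOB_BELOW_ZERO */ { 1, 3, 2 },
  /* EOB_BELOW_ONE  */ { 1, 2, 3 },
};

/* Total bool decodes for a token sequence under a given tree layout. */
static int count_bool_decodes(Layout layout, const Token *tokens, size_t n) {
  int total = 0;
  assert(layout < LAYOUTS);
  for (size_t i = 0; i < n; ++i) {
    assert(tokens[i] < TOK_KINDS);
    total += bools_per_token[layout][tokens[i]];
  }
  return total;
}
```

      In this model a block dominated by zeros needs fewer decodes once EOB
      leaves the root, and moving EOB only one step down gives up the extra
      saving on ONE tokens, which is the drawback the message mentions.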
    • Residual coding to cache energy class of tokens. · 88a4d4c5
      Sami Pietila authored
      Proposal for tuning the residual coding by changing how the context
      from previous tokens is calculated. Storing the energy class of previous
      tokens instead of the token itself eases the critical path of
      HW implementations.
      
      Change-Id: I6d71d856b84518f6c88de771ddd818436f794bab
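
      A rough sketch of the idea: the token is mapped to a small energy class
      once, when it is stored, so later context computations read the class
      directly. The token names and the class mapping below are assumptions
      for illustration and may differ from the values used in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical token set: small literals, magnitude categories, and EOB. */
enum {
  TOK_ZERO, TOK_ONE, TOK_TWO, TOK_THREE, TOK_FOUR,
  TOK_CAT1, TOK_CAT2, TOK_CAT3, TOK_CAT4, TOK_CAT5, TOK_CAT6,
  TOK_EOB, NUM_TOKENS
};

/* Assumed mapping from token to a coarse "energy class". */
static const uint8_t energy_class[NUM_TOKENS] = {
  0, 1, 2, 3, 3, 4, 4, 5, 5, 5, 5, 5
};

/* Store the class instead of the token, so the table lookup happens once
 * per coded token rather than once per neighbour read. */
static void cache_token(uint8_t *token_cache, int pos, int token) {
  assert(token >= 0 && token < NUM_TOKENS);
  token_cache[pos] = energy_class[token];
}
```

      A context computation can then combine cached values directly, without
      a token-to-class lookup on its critical path.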
  15. 28 May, 2013 1 commit
  16. 24 May, 2013 1 commit
  17. 23 May, 2013 1 commit
  18. 22 May, 2013 1 commit
    • Using 128 entry look up table for coef models · de4d682c
      Deb Mukherjee authored
      Reverts to using a 128-entry LUT for the coef models rather than 48
      to ease hardware implementation.
      
      Also incorporates some cleanups including removing various
      hooks to support different lookup tables based on block_type and
      ref_type.
      
      Change-Id: I54100c120cca07a2ebd3a7776bc4630fa6a153f6
  19. 21 May, 2013 2 commits
    • Merging the model coef prob experiment · 7a645e4e
      Deb Mukherjee authored
      Merges the experiment.
      
      Change-Id: I4eb19af6de6df6aa3a96a2e82f231d47ed9b3ae9
    • Refinements on modelcoef expt to reduce storage · 07443f15
      Deb Mukherjee authored
      Uses more aggressive interpolation to reduce storage for the
      model tables by more than half. Only 48 lists of probs are
      stored (as opposed to 128 before), corresponding to ONE_NODE
      probabilities of:
      1,
      3, 7, 11, ..., 115, 119,
      127, 135, ..., 247, 255.
      
      Besides, only 1 table is used as opposed to 2 before. So the overall
      memory needed for the tables is just 48 * 8 = 384 bytes.
      
      The table currently used is based on a new Pareto distribution with
      heavier tail than a generalized Gaussian - which improves results on
      derf by about 0.1% over a single table Generalized Gaussian.
      
      The overall result on derfraw300 is -0.14%.
      
      Change-Id: I19bd03559cbf5894a9f8594b8023dcc3e546f6bd
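
      The list of stored ONE_NODE probabilities above can be enumerated to
      check the count of 48 and the 384-byte figure. This is only a sketch of
      the arithmetic: the function name is hypothetical, and the 8 bytes per
      list is taken from the "48 * 8" in the message, assuming one byte per
      stored probability.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_MODEL_LISTS 48 /* lists of probs stored, per the message */
#define BYTES_PER_LIST 8   /* assumed: 8 one-byte probabilities per list */

/* Fill out[] with the ONE_NODE probabilities that have a stored list:
 * 1, then 3..119 in steps of 4, then 127..255 in steps of 8.
 * Returns the number of entries written. */
static size_t fill_one_node_probs(unsigned char *out, size_t cap) {
  size_t n = 0;
  assert(cap >= NUM_MODEL_LISTS);
  out[n++] = 1;
  for (int p = 3; p <= 119; p += 4) out[n++] = (unsigned char)p;
  for (int p = 127; p <= 255; p += 8) out[n++] = (unsigned char)p;
  return n;
}
```

      The three runs contribute 1, 30, and 17 entries, which gives the 48
      lists and 48 * 8 = 384 bytes; probabilities in between would be
      obtained by interpolation.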
  20. 20 May, 2013 1 commit
    • Updating the model coef experiment · 39a90bc8
      Deb Mukherjee authored
      Cleans up the experiment. Actually uses reduced counts for backward
      updates, and reduced number of probabilities in the context.
      
      No change in bitstream when the experiment is on.
      
      Between expt on and off:
      derfraw300 is down by only 0.062% (which is better than when expts
      were run previously).
      
      Change-Id: I55285a049a0c22810bdb42914212ab5a4f8521b5
  21. 13 May, 2013 1 commit
    • Change to band calculation. · e5f71520
      Paul Wilkins authored
      Change band calculation back to a simpler model based
      on the order in which coefficients are coded in scan order,
      not the absolute coefficient positions.
      
      With the scatter scan experiment enabled the results
      appear broadly neutral on derf (-0.028) but up a little on std-hd (+0.134).
      
      Without the scatterscan experiment on, the results were up on derf as well.
      
      Change-Id: Ie9ef03ce42a6b24b849a4bebe950d4a5dffa6791
  22. 07 May, 2013 2 commits
  23. 29 Apr, 2013 1 commit
    • Turning model based reverse update on for coefs · 040eeed9
      Deb Mukherjee authored
      Turns model based reverse updates on for coefficients in an
      effort to reduce the memory requirement for counters.
      
      With this patch the counters needed will be reduced by about
      75% since only 3 counts are needed instead of 12.
      
      The impact on performance is:
      derf300: -0.252%
      stdhd250: -0.046%
      
      However, retraining should alleviate some of the drop in
      performance.
      
      Change-Id: I6f2b3e13f6d5520aa3400b0b228fb5e8b4a43caa
  24. 22 Apr, 2013 3 commits
  25. 19 Apr, 2013 1 commit
  26. 11 Apr, 2013 2 commits
  27. 28 Mar, 2013 2 commits
    • Framework changes in nzc to allow more flexibility · fe9b5143
      Deb Mukherjee authored
      The patch adds the flexibility to use standard EOB based coding
      on smaller block sizes and nzc based coding on larger blocksizes.
      The tx-sizes that use nzc based coding and those that use EOB based
      coding are controlled by a function get_nzc_used().
      By default, this function uses nzc based coding for 16x16 and 32x32
      transform blocks, which seems to bridge the performance gap
      substantially.
      
      All sets are now lower by 0.5% to 0.7%, as opposed to ~1.8% before.
      
      Change-Id: I06abed3df57b52d241ea1f51b0d571c71e38fd0b
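
      The default policy described above could be expressed along these
      lines. Only the name get_nzc_used() comes from the message; the
      signature and the TX_SIZE enum are assumptions for illustration.

```c
#include <assert.h>

/* Hypothetical transform-size enum, ordered from smallest to largest. */
typedef enum { TX_4X4, TX_8X8, TX_16X16, TX_32X32, TX_SIZE_COUNT } TX_SIZE;

/* Returns nonzero when nzc based coding is used for this transform size;
 * smaller sizes fall back to standard EOB based coding. */
static int get_nzc_used(TX_SIZE tx_size) {
  assert(tx_size < TX_SIZE_COUNT);
  return tx_size >= TX_16X16;
}
```

      Keeping the decision in one function means the split between the two
      coding methods can be retuned by changing a single comparison.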
    • Fix mix-up in pt token indexing. · 9eea9fa2
      Ronald S. Bultje authored
      This fixes uninitialized reads in the trellis, and probably makes the
      trellis do something again.
      
      Change-Id: Ifac8dae9aa77574bde0954a71d4571c5c556df3c
  28. 27 Mar, 2013 1 commit
    • Scatter-based scantables. · 513157e0
      Ronald S. Bultje authored
      This gains about 0.2% on derf, 0.1% on hd and 0.4% on stdhd. I can put
      this under an experimental flag if wanted, just trying to get my patch
      queue in shape.
      
      Change-Id: Ibe1a30fe0e0b07bec4802e0f3ff0ba22e505f576
  29. 26 Mar, 2013 1 commit