1. 10 Jul, 2013 4 commits
  2. 09 Jul, 2013 8 commits
    • Remove all asm offset files from VP9 · f0d9f10d
      John Koleszar authored
      The files are empty and unused.
      Change-Id: Ieb4242d14273efdf24149bda33f9591540bba06a
    • Add Neon horizontal and vertical vp9_mbloop_filter · 198fa6d0
      Frank Galligan authored
      - The VP9 mbfilter C code branches on flat and mask. This CL
        performs both branches and combines the data. A later CL will
        check whether the whole patch takes a single branch.
      - These functions are about 1.75 times faster than the C code on
        Nexus 7.
      PS #3
      - Changed all functions to dup limit, blimit, and thresh from
        vld {dx[]}, freeing up r4-r6.
      - Changed code to use vbif to reduce one instruction and free
        up a d register.
      Change-Id: I028dae0e434dc9891c3677bdb182e201ffb04777
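A minimal scalar sketch of the "perform both branches and combine" idea described in this commit. It is not the Neon code from the CL: the helper names are hypothetical, and the per-byte select only mirrors what a bitwise-select instruction such as vbif does across a whole vector register.

```c
#include <stdint.h>

/* Hypothetical helper: each mask byte is assumed to be 0xFF or 0x00.
 * Picks if_set where the mask is set and if_clear elsewhere, with no branch. */
static uint8_t select_u8(uint8_t mask, uint8_t if_set, uint8_t if_clear) {
  return (uint8_t)((if_set & mask) | (if_clear & (uint8_t)~mask));
}

/* Both candidate results are computed for every pixel beforehand;
 * this step only merges them according to the mask. */
static void combine_branches(const uint8_t *mask, const uint8_t *flat_out,
                             const uint8_t *normal_out, uint8_t *dst, int n) {
  int i;
  for (i = 0; i < n; ++i)
    dst[i] = select_u8(mask[i], flat_out[i], normal_out[i]);
}
```

The trade-off is extra arithmetic in exchange for no data-dependent branching, which is why the commit mentions a later check for the case where every pixel would take the same branch.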
    • Loop filter code cleanup. · 92a9eaef
      Dmitry Kovalev authored
      Using MAX_LOOP_FILTER constant instead of number 63.
      Change-Id: If91e0c198331b3041e7cd0707a5948479e9209d8
    • Removed unnecessary xd->mode_info_context assignment · 69d1d1d8
      Scott LaVarnway authored
      mi is xd->mode_info_context
      Change-Id: Ib101be922b695205ec57b5ce1828ba19bde5b41c
    • Unbreak lossless. · 059c0ba5
      Ronald S. Bultje authored
      Change-Id: I8130ec9b5371c65e885f245a5ac73840c23cb4a1
    • cleanup read_mode_info if (1) · 7f960223
      Jim Bankoski authored
      Change-Id: I851af23c787a2d3637d84244b9f75063cbf782f1
    • decoder speedup - get-segment-id only if segmentation enabled · c36d502e
      Jim Bankoski authored
      Change-Id: I9355f8446660aeb7dfdbc5ee56635c791ac35e95
    • Make intra prediction pointers RTCD-based. · 8350e7fe
      Ronald S. Bultje authored
      This probably has a mildly negative impact on performance, but will
      (in future commits - or possibly merged with this one) allow SIMD
      implementations of individual intra prediction functions. We may
      perhaps want to consider having separate functions per txfm-size
      also (i.e. 4x4, 8x8, 16x16 and 32x32 intra prediction functions for
      each intra prediction mode), but I haven't played much with that
      Change-Id: Ie739985eee0a3fcbb7aed29ee6910fdb653ea269
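A rough sketch of what run-time-dispatched intra predictors can look like. The typedef, enum, table, and setup function below are all hypothetical names chosen for illustration, not the project's actual RTCD interface; only the general pattern (function pointers defaulting to C versions, replaceable by SIMD versions) follows the commit message.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical predictor signature for a square block of `size` pixels. */
typedef void (*intra_pred_fn)(uint8_t *dst, int stride, int size,
                              const uint8_t *above, const uint8_t *left);

/* Plain C vertical predictor: every row copies the row above the block. */
static void v_pred_c(uint8_t *dst, int stride, int size,
                     const uint8_t *above, const uint8_t *left) {
  int r;
  (void)left;
  for (r = 0; r < size; ++r) memcpy(dst + r * stride, above, (size_t)size);
}

/* Plain C horizontal predictor: every row is filled with its left pixel. */
static void h_pred_c(uint8_t *dst, int stride, int size,
                     const uint8_t *above, const uint8_t *left) {
  int r;
  (void)above;
  for (r = 0; r < size; ++r) memset(dst + r * stride, left[r], (size_t)size);
}

enum { SKETCH_V_PRED, SKETCH_H_PRED, SKETCH_PRED_COUNT };

static intra_pred_fn intra_pred[SKETCH_PRED_COUNT];

/* Install the C versions; a SIMD build would overwrite individual
 * entries here when the CPU supports them (not shown). */
static void setup_intra_pred_rtcd(int has_simd) {
  intra_pred[SKETCH_V_PRED] = v_pred_c;
  intra_pred[SKETCH_H_PRED] = h_pred_c;
  (void)has_simd;
}
```

The indirect call is the likely source of the mild slowdown the message mentions; the benefit is that each mode can be specialized independently.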
  3. 08 Jul, 2013 9 commits
    • Fix loopfilter bug · 527fc5ca
      John Koleszar authored
      In the rare case where 4x4 interior filtering was called for but no
      8x8 or larger filtering takes place, the previous code was skipping
      the filtering. This patch fixes the issue by including the interior
      mask in the overall mask for the filter application loops.
      Change-Id: I4a0b65056c64f97478827c2ff41e0914fc7779d0
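The effect of the missing interior mask can be illustrated with a small counting model. This is only a sketch under assumed names: one bit per column, four hypothetical mask words, and an `include_interior` switch that toggles between the old and the fixed loop condition.

```c
/* Counts how many filter applications a row of columns would receive.
 * m16/m8/m4 mark columns needing edge filters; m4_int marks columns
 * needing the interior 4x4 filter. All names are hypothetical. */
static int apply_filters(unsigned m16, unsigned m8, unsigned m4,
                         unsigned m4_int, int include_interior) {
  int applied = 0;
  /* The loop runs only while some bit of the overall mask remains. */
  unsigned mask = m16 | m8 | m4 | (include_interior ? m4_int : 0u);
  for (; mask; mask >>= 1) {
    if ((m16 | m8 | m4) & 1u) ++applied; /* edge filter on this column */
    if (m4_int & 1u) ++applied;          /* interior 4x4 filter */
    m16 >>= 1;
    m8 >>= 1;
    m4 >>= 1;
    m4_int >>= 1;
  }
  return applied;
}
```

With only an interior bit set, the old condition (`include_interior == 0`) never enters the loop, so the interior filter is skipped; folding the interior mask into the overall mask makes the loop reach that column.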
    • Don't call encode_sb() for the final of 4-split subpartitions. · a5062cc6
      Ronald S. Bultje authored
      The resulting reconstruction is never used, thus it just wastes CPU
      cycles. Reduces encode time of first 50 frames of bus (speed 0) @
      1500kbps from 2min2.0 to 2min1.2, i.e. a 0.65% overall speedup.
      Change-Id: I74755ca3aadc21e2be220f486259060bd4088c45
    • Inline vp9_get_mv_joint(). · bd867f16
      Ronald S. Bultje authored
      Encode time for first 50 frames of bus (speed 0) @ 1500kbps goes from
      2min10.9 to 2min10.5, i.e. 0.3% faster overall, basically because we
      prevent the call overhead.
      Change-Id: I1eab1a95dd3eae282f9b866f1f0b3dcadff073d5
    • Don't recalculate mv_ref costs for each block/partition. · 8fde07a3
      Ronald S. Bultje authored
      Changes cost_mv_ref() to do a table lookup into pre-calculated cost
      arrays instead. Encode time of first 50 frames of bus (speed 0)
      @ 1500kbps goes from 2min11.6 to 2min10.9, i.e. 0.5% faster overall.
      Change-Id: If186e92c34c201b29cbbc058785a15c9c09e433a
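The general shape of such a change, as a sketch: compute every (context, mode) cost once, then answer later queries with an array read. The table dimensions and the stand-in cost function below are assumptions for illustration, not the encoder's real probability-based costs.

```c
#define SKETCH_CONTEXTS 7 /* assumed number of mode contexts */
#define SKETCH_MODES 4    /* assumed number of inter modes */

/* Stand-in for the per-call cost computation being replaced. */
static int slow_mode_cost(int ctx, int mode) {
  return 256 + 32 * ctx + 64 * mode;
}

static int mode_cost_table[SKETCH_CONTEXTS][SKETCH_MODES];

/* Fill the table once (for example once per frame). */
static void build_mode_cost_table(void) {
  int ctx, mode;
  for (ctx = 0; ctx < SKETCH_CONTEXTS; ++ctx)
    for (mode = 0; mode < SKETCH_MODES; ++mode)
      mode_cost_table[ctx][mode] = slow_mode_cost(ctx, mode);
}

/* Per-block query becomes a single array read. */
static int cost_mv_ref_lut(int ctx, int mode) {
  return mode_cost_table[ctx][mode];
}
```

The table must be rebuilt whenever the inputs to the cost function change, otherwise the lookup returns stale values.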
    • Remove unnecessary memset(best_index, 0) from trellis/optimize. · 5a732549
      Ronald S. Bultje authored
      First 50 frames of bus @ 1500kbps (speed 0) goes from 2min12.6 to
      2min11.6, i.e. 0.75% overall speedup.
      Change-Id: I67054f8146e82a02b6457c51a1c8627a937e5e1e
    • Remove memcpy() in handle_inter_mode() filter selection. · fcf7998a
      Ronald S. Bultje authored
      Encode time of first 50 frames of bus (speed 0) @ 1500kbps goes from
      2min4.9 to 2min3.1, i.e. a 1.4% speedup overall.
      Change-Id: Ibe8b08d159797504c5d0c5122de1b6da3b6595e0
    • Make frame-wide filter-type decision fully RD-based. · ed995afb
      Ronald S. Bultje authored
      Overall, on all test sets, this gains about +0.2% on all metrics.
      City is a clip where this really hurts (-1.0% on all metrics); I'm
      not quite sure why yet. It may be interesting to look into in the future.
      Change-Id: I6f0eecb20e72f0194633270d30bf00d76d9eae78
    • Using mi_cols instead of mb_cols. · b7559258
      Dmitry Kovalev authored
      Eliminating usage of mb-units, switching to mi-units. Adding
      ALIGN_POWER_OF_TWO macro.
      Change-Id: I2491c969f713207c062011878b57e4e531818607
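The commit message does not show the macro body, so the definition below is an assumption: a common way to round a value up to a multiple of 2^n. The `SKETCH_LOG2_MI_SIZE` constant and `sketch_mi_cols()` helper are likewise hypothetical and assume an 8-pixel mi unit.

```c
/* Round `value` up to the next multiple of (1 << n). Assumed definition. */
#define ALIGN_POWER_OF_TWO(value, n) \
  (((value) + ((1 << (n)) - 1)) & ~((1 << (n)) - 1))

#define SKETCH_LOG2_MI_SIZE 3 /* assumes an mi unit of 8 pixels */

/* Number of mi columns needed to cover `width` pixels. */
static int sketch_mi_cols(int width) {
  return ALIGN_POWER_OF_TWO(width, SKETCH_LOG2_MI_SIZE) >> SKETCH_LOG2_MI_SIZE;
}
```

Aligning before shifting means a frame whose width is not a multiple of the unit size still gets a final, partially covered column.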
    • Implements several heuristics to prune mode search · d9b62160
      Deb Mukherjee authored
      Skips mode searches for intra and compound inter modes depending
      on the best mode so far and the reference frames. The various
      heuristics to be used are selected by bits from a flag. The
      previous direction based intra mode search pruning is also absorbed
      in this framework.
      Specifically the flags and their impact are:
      1) FLAG_SKIP_INTRA_BESTINTER (skip intra mode search for oblique
      directional modes and TM_PRED if the best so far is
      an inter mode)
      derfraw300: -0.15%, 10% speedup
      2) FLAG_SKIP_INTRA_DIRMISMATCH (skip D27, D63, D117 and D153
      mode search if the best so far is not one of the closest
      hor/vert/diagonal directions)
      derfraw300: -0.05%, about 9% speedup
      3) FLAG_SKIP_COMP_BESTINTRA (skip compound prediction mode
      search if the best so far is an intra mode)
      derfraw300: -0.06%, about 7-8% speedup
      4) FLAG_SKIP_COMP_REFMISMATCH (skip compound prediction search
      if the best single ref inter mode does not have the same ref
      as one of the two references being tested in the compound mode)
      derfraw300: -0.56%, about 10% speedup
      Change-Id: I1a736cd29b36325489e7af9f32698d6394b2c495
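A sketch of how bit flags like these might gate the compound-mode search. The flag names come from the commit message, but the bit values, the function name, and its parameters are assumptions made for illustration.

```c
/* Flag names are from the commit message; the bit positions are assumed. */
enum {
  FLAG_SKIP_INTRA_BESTINTER = 1 << 0,
  FLAG_SKIP_INTRA_DIRMISMATCH = 1 << 1,
  FLAG_SKIP_COMP_BESTINTRA = 1 << 2,
  FLAG_SKIP_COMP_REFMISMATCH = 1 << 3
};

/* Hypothetical check run before testing a compound mode that predicts
 * from references ref0 and ref1. Returns 1 if the mode should be skipped. */
static int skip_compound_mode(unsigned flags, int best_is_intra,
                              int best_single_ref, int ref0, int ref1) {
  /* Heuristic 3: best mode so far is intra. */
  if ((flags & FLAG_SKIP_COMP_BESTINTRA) && best_is_intra) return 1;
  /* Heuristic 4: best single-reference mode uses neither tested reference. */
  if ((flags & FLAG_SKIP_COMP_REFMISMATCH) && best_single_ref != ref0 &&
      best_single_ref != ref1)
    return 1;
  return 0;
}
```

Because each heuristic is one bit, a speed setting can enable any combination, for example `FLAG_SKIP_COMP_BESTINTRA | FLAG_SKIP_COMP_REFMISMATCH`.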
  4. 04 Jul, 2013 2 commits
  5. 03 Jul, 2013 8 commits
    • Adding write_skip_coeff function. · dda1835d
      Dmitry Kovalev authored
      Change-Id: I221126f22ab9067348eb0efb8a73b15a8f49c3fd
    • Enable early termination in rd search · 2bd6fe08
      Jingning Han authored
      This commit allows the encoder to track the cumulative rate-distortion
      cost per transformed block inside a partition. If the cumulative
      rd cost is already above the best rd value, it skips the remaining
      operations and continues to the next prediction mode test.
      It reduces the runtime of bus at target bit-rate 2000 from 308 seconds
      to 266 seconds, i.e., about 13% speed-up at no performance penalty.
      Change-Id: I5f15a3d8955d97031d5653006027866a00654e7a
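The early-termination idea can be sketched as a running sum with a bail-out. The function name, the array of per-block costs, and the use of INT64_MAX as a "mode rejected" marker are assumptions for illustration; in the encoder each block cost would come from an actual transform and quantization pass.

```c
#include <stdint.h>

/* Accumulates per-transform-block rd costs for one prediction mode.
 * Stops as soon as the running total exceeds best_rd, since the mode
 * can no longer win. *blocks_evaluated reports how much work was done. */
static int64_t accumulate_rd(const int64_t *block_rd, int n, int64_t best_rd,
                             int *blocks_evaluated) {
  int64_t total = 0;
  int i;
  for (i = 0; i < n; ++i) {
    total += block_rd[i];
    if (total > best_rd) {
      *blocks_evaluated = i + 1;
      return INT64_MAX; /* assumed marker: mode rejected early */
    }
  }
  *blocks_evaluated = n;
  return total;
}
```

Because costs are non-negative, exceeding the best value partway through guarantees the final total would exceed it too, which is why the shortcut costs nothing in quality.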
    • Calling set_partition_seg_context() instead of code duplication. · 2ad62c93
      Dmitry Kovalev authored
      Change-Id: I65be6acc54c99688fd1f0c946cec3511514b8555
    • Replacing 64 / MI_SIZE with MI_BLOCK_SIZE. · 5a21de84
      Dmitry Kovalev authored
      Change-Id: I32276552b3ea6dc1dce8e298be114cfe1019b31c
    • Inline a few intra predictors · 0f02dc27
      Yaowu Xu authored
      Change-Id: Ib41f0643fdcc088500e7420708f4e72f1f64c710
    • Refactor SSE2 8x8 functional units · 2cb75c96
      Jingning Han authored
      These serve as building blocks for SSE2 8x8 and 16x16 ADST/DCT
      hybrid transform coding.
      Change-Id: I4089a754c66e0c986f67d9b8ec4dfb9627ad430d
    • Fix to comp_inter_joint_search_thresh feature. · f58b44ad
      Paul Wilkins authored
      When this is 0 (BLOCK_SIZE_AB4X4) we want to do
      the inter joint search for all sizes.
      Change-Id: Id40cd6fe7790e7e1165352b9cef5e12fa8c0bc88
    • Added two new skip experiments. · 72c5778e
      Paul Wilkins authored
      sf->unused_mode_skip_lvl. Tests modes as normal for all
      sizes at or below the given level. At larger sizes it skips
      all modes that were not chosen at any smaller size.
      Hence setting BLOCK_SIZE_SB64X64 is in effect off.
      Setting BLOCK_SIZE_AB4X4 will only consider modes that
      were chosen for one or more 4x4 blocks at larger sizes.
      Do a test encode of the NONE partition at one size and create
      a reference frame mask based on the best rd choice. In the
      full search only allow this reference frame.
      Currently it is testing 64x64 and repeats this in the full search.
      This does not work well with Jim's Partition code just now and
      is disabled by default.
      Change-Id: I8f8c52d2ef4a0c08100150b0ea4155d1aaab93dd
  6. 02 Jul, 2013 9 commits
    • Removing redundant struct from union b_mode_info. · be77f6bb
      Dmitry Kovalev authored
      Change-Id: I08fc6e474ff2c12cfa065bae4989c724276e2c83
    • Adding write_selected_txfm_size function. · edb060a7
      Dmitry Kovalev authored
      Change-Id: I143b430b7c24a964ccd0ebb75944cf317a072214
    • Added a speed feature use_square_partition_only · 0d7b7c09
      Yaowu Xu authored
      This commit adds a speed feature where only square partitions are
      evaluated in partition picking. Enabling this feature at cpu-used 2
      reduces encoding time by ~30%.
      loss of compression:
      -0.9% on cif set
      -1.23% on stdhd
      Change-Id: Ia6fad11210f0b78365abb889f9245604513be5b9
    • Use pmovmskb to skip quantize loops over empty coefficients. · e5fb4b61
      Ronald S. Bultje authored
      If none of the 16 coefficients that we quantize per loop iteration
      are larger than the zbin, directly skip to the next round of coeffs,
      rather than doing a full quantize loop that will eventually result
      in 16 zeroes. This incurs a jump cost, but saves a lot of other work.
      32x32 quant goes from 1349 -> 1184 cycles. The same approach yielded
      no significantly positive results for smaller transforms, so is not
      used there (8x8: 103 -> 101 cycles; 16x16: 302 -> 306 cycles).
      Change-Id: I8fca17dc2543fc8eed1dbcd5100145e3c3a9b647
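A sketch of the skip test using SSE2 intrinsics, where `_mm_movemask_epi8` is the intrinsic form of pmovmskb. It is not the assembly from this commit: the function names are hypothetical, the quantizer is a placeholder division, and the code assumes `n` is a multiple of 16, `zbin` is non-negative, and no coefficient equals INT16_MIN. A scalar fallback is included for non-SSE2 builds.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Returns nonzero if any of 16 coefficients has magnitude above zbin. */
static int any_above_zbin16(const int16_t *coeff, int16_t zbin) {
#if defined(__SSE2__)
  const __m128i zero = _mm_setzero_si128();
  const __m128i z = _mm_set1_epi16(zbin);
  const __m128i c0 = _mm_loadu_si128((const __m128i *)coeff);
  const __m128i c1 = _mm_loadu_si128((const __m128i *)(coeff + 8));
  /* |x| computed as max(x, -x), valid for x > INT16_MIN. */
  const __m128i a0 = _mm_max_epi16(c0, _mm_sub_epi16(zero, c0));
  const __m128i a1 = _mm_max_epi16(c1, _mm_sub_epi16(zero, c1));
  const __m128i gt =
      _mm_or_si128(_mm_cmpgt_epi16(a0, z), _mm_cmpgt_epi16(a1, z));
  return _mm_movemask_epi8(gt) != 0; /* pmovmskb: one bit per byte */
#else
  int i;
  for (i = 0; i < 16; ++i)
    if (abs(coeff[i]) > zbin) return 1;
  return 0;
#endif
}

/* Placeholder quantizer: groups with nothing above zbin are zeroed
 * without entering the per-coefficient loop. Returns groups skipped. */
static int quantize_skip_empty(const int16_t *coeff, int16_t *qcoeff, int n,
                               int16_t zbin, int16_t step) {
  int i, j, skipped = 0;
  for (i = 0; i < n; i += 16) {
    if (!any_above_zbin16(coeff + i, zbin)) {
      memset(qcoeff + i, 0, 16 * sizeof(*qcoeff));
      ++skipped;
      continue;
    }
    for (j = 0; j < 16; ++j) {
      const int c = coeff[i + j];
      const int a = abs(c);
      const int q = a > zbin ? a / step : 0;
      qcoeff[i + j] = (int16_t)(c < 0 ? -q : q);
    }
  }
  return skipped;
}
```

The cycle counts in the message suggest why this only pays off for 32x32: larger transforms have long runs of near-zero high-frequency coefficients, so the cheap group test is more likely to avoid the full loop.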
    • Remove unused function vp9_build_inter4x4_predictors_mbuv(). · 5b872402
      Ronald S. Bultje authored
      Change-Id: Ibfd2def2c088f4bc541a1de25990d73480b53d4b
    • Speed feature to binary search dir intramodes · 37501d68
      Deb Mukherjee authored
      This speed feature will skip searching the directional intra prediction
      modes D63, D117, D27, D153 if the best intra mode so far is not one of
      the diagonal, horizontal or vertical directions closest to the respective
      directions being tested. In other words, this implements a sort of
      binary search in the angular domain.
      Speedup: about 9-10%
      Results: -0.05% only on derfraw300.
      Change-Id: I413584c41f2a3e8dabfbdeb40718c8fc4b1d63a2
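One way the "closest directions" test could be written, as a sketch. The enum and function name are hypothetical, and the neighbor pairs are an assumption based on each in-between mode lying between one horizontal/vertical direction and one diagonal.

```c
/* Hypothetical mode enum for illustration only. */
typedef enum {
  SK_DC_PRED, SK_V_PRED, SK_H_PRED, SK_D45_PRED, SK_D135_PRED,
  SK_D117_PRED, SK_D153_PRED, SK_D27_PRED, SK_D63_PRED, SK_TM_PRED
} sketch_intra_mode;

/* Returns 1 if `mode` is one of the four in-between directions and the
 * best intra mode so far is not one of its two assumed neighbors. */
static int skip_dir_intra(sketch_intra_mode mode, sketch_intra_mode best) {
  switch (mode) {
    case SK_D117_PRED: return best != SK_V_PRED && best != SK_D135_PRED;
    case SK_D63_PRED:  return best != SK_V_PRED && best != SK_D45_PRED;
    case SK_D27_PRED:  return best != SK_H_PRED && best != SK_D45_PRED;
    case SK_D153_PRED: return best != SK_H_PRED && best != SK_D135_PRED;
    default:           return 0; /* other modes are always searched */
  }
}
```

This relies on the coarse directions being evaluated first, so that `best` already reflects the horizontal, vertical, and diagonal candidates when the in-between modes are considered.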
    • Tx size selection enhancements · 8d3d2b76
      Deb Mukherjee authored
      (1) Refines the modeling function and uses that to add some speed
      features. Specifically, instead of using a flag use_largest_txfm as
      a speed feature, an enum tx_size_search_method is used, of which
      two of the types are USE_FULL_RD and USE_LARGESTALL. Two other
      new types are added:
      USE_LARGESTINTRA (use largest only for intra)
      USE_LARGESTINTRA_MODELINTER (use largest for intra, and model for
      (2) Another change is that the framework for deciding transform type
      is simplified to use a heuristic count based method rather than
      an rd based method using txfm_cache. In practice the new method
      is found to work just as well - with derf only -0.01 down.
      The new method is more compatible with the new framework where
      certain rd costs are based on full rd and certain others are
      based on modeled rd or are not computed. In this patch the existing
      rd based method is still kept for use in the USE_FULL_RD mode.
      In the other modes, the count based method is used.
      However, the recommendation is to remove it eventually since the
      benefit is limited, and doing so will remove a lot of complication
      from the code.
      (3) Finally a bug is fixed with the existing use_largest_txfm speed feature
      that causes mismatches when the lossless mode and 4x4 WH transform is
      Results on derf:
      USE_FULL_RD: +0.03% (due to change in the tables), 0% encode time reduction
      USE_LARGESTINTRA: -0.21%, 15% encode time reduction (this one is a
      pretty good compromise)
      USE_LARGESTINTRA_MODELINTER: -0.98%, 22% encode time reduction
      (currently the benefit of modeling is limited for txfm size selection,
      but keeping this enum as a placeholder).
      USE_LARGESTALL: -1.05%, 27% encode-time reduction (same as existing
      use_largest_txfm speed feature).
      Change-Id: I4d60a5f9ce78fbc90cddf2f97ed91d8bc0d4f936
    • Clean-up in forward update to use mapping tables · 9c20cedd
      Deb Mukherjee authored
      Uses mapping tables instead of complicated modulo/division
      operations for prob mapping for forward updates.
      No bit-stream or output change.
      Change-Id: Ifd9ce8ac1437835c305c94f64c18273c7a68f546
    • Removing unused implicit segmentation code. · a3d2e6c9
      Dmitry Kovalev authored
      Change-Id: I8a2983fb14274a6ac53681fa4cd5d4209cbd2905