1. 03 Jul, 2013 15 commits
  2. 02 Jul, 2013 24 commits
    • Dmitry Kovalev's avatar
      Removing redundant struct from union b_mode_info. · be77f6bb
      Dmitry Kovalev authored
      Change-Id: I08fc6e474ff2c12cfa065bae4989c724276e2c83
      be77f6bb
    • Dmitry Kovalev's avatar
      Adding write_selected_txfm_size function. · edb060a7
      Dmitry Kovalev authored
      Change-Id: I143b430b7c24a964ccd0ebb75944cf317a072214
      edb060a7
    • Yaowu Xu's avatar
      Added a speed feature use_square_partition_only · 0d7b7c09
      Yaowu Xu authored
      This commit adds a speed feature where only squared partition are
      evaluated in partition picking. Enable this feature in cpu-used 2
      reduces encoding time by ~30%.
      
      loss of compression:
      -0.9% on cif set
      -1.23% on stdhd
      
      Change-Id: Ia6fad11210f0b78365abb889f9245604513be5b9
      0d7b7c09
    • Ronald S. Bultje's avatar
      Use pmovmskb to skip quantize loops over empty coefficients. · e5fb4b61
      Ronald S. Bultje authored
      If none of the 16 coefficients that we quantize per loop iteration
      are larger than the zbin, directly skip to the next round of coeffs,
      rather than doing a full quantize loop that will eventually result
      in 16 zeroes. This incurs a jump cost, but saves a lot of other work.
      32x32 quant goes from 1349 -> 1184 cycles. The same approach yielded
      no significantly positive results for smaller transforms, so is not
      used there (8x8: 103 -> 101 cycles; 16x16: 302 -> 306 cycles).
      
      Change-Id: I8fca17dc2543fc8eed1dbcd5100145e3c3a9b647
      e5fb4b61
    • Ronald S. Bultje's avatar
      Remove unused function vp9_build_inter4x4_predictors_mbuv(). · 5b872402
      Ronald S. Bultje authored
      Change-Id: Ibfd2def2c088f4bc541a1de25990d73480b53d4b
      5b872402
    • Jim Bankoski's avatar
      new unit test for cpu-speed · b0520b61
      Jim Bankoski authored
      Tests q0 ( lossless),  very high bitrate and low bitrates at cpu speed
      0, 1 and 2.
      
      Change-Id: I0c5cdca00acd8d01e7b13f124b3b08d4b1ae9f6d
      b0520b61
    • Deb Mukherjee's avatar
      Speed feature to binary search dir intramodes · 37501d68
      Deb Mukherjee authored
      This speed feature will skip searching the directional intra prediction
      modes D63, D117, D27, D153 if the best intra mode so far is not one of
      the diagonal, horizontal or vertical directions closest to the respective
      directions being tested. In other words, this implements a sort of
      binary search in the angular domain.
      
      Speedup: about 9-10%
      Results: -0.05% only on derfraw300.
      
      Change-Id: I413584c41f2a3e8dabfbdeb40718c8fc4b1d63a2
      37501d68
    • Deb Mukherjee's avatar
      66324d50
    • Deb Mukherjee's avatar
      Tx size selection enhancements · 8d3d2b76
      Deb Mukherjee authored
      (1) Refines the modeling function and uses that to add some speed
      features. Specifically, intead of using a flag use_largest_txfm as
      a speed feature, an enum tx_size_search_method is used, of which
      two of the types are USE_FULL_RD and USE_LARGESTALL. Two other
      new types are added:
      USE_LARGESTINTRA (use largest only for intra)
      USE_LARGESTINTRA_MODELINTER (use largest for intra, and model for
      inter)
      
      (2) Another change is that the framework for deciding transform type
      is simplified to use a heuristic count based method rather than
      an rd based method using txfm_cache. In practice the new method
      is found to work just as well - with derf only -0.01 down.
      The new method is more compatible with the new framework where
      certain rd costs are based on full rd and certain others are
      based on modeled rd or are not computed. In this patch the existing
      rd based method is still kept for use in the USE_FULL_RD mode.
      In the other modes, the count based method is used.
      However the recommendation is to remove it eventually since the
      benefit is limited, and will remove a lot of complications in
      the code
      
      (3) Finally a bug is fixed with the existing use_largest_txfm speed feature
      that causes mismatches when the lossless mode and 4x4 WH transform is
      forced.
      
      Results on derf:
      USE_FULL_RD: +0.03% (due to change in the tables), 0% encode time reduction
      USE_LARGESTINTRA: -0.21%, 15% encode time reduction (this one is a
      pretty good compromise)
      USE_LARGESTINTRA_MODELINTER: -0.98%, 22% encode time reduction
      (currently the benefit of modeling is limited for txfm size selection,
      but keeping this enum as a placeholder) .
      USE_LARGESTALL: -1.05%, 27% encode-time reduction (same as existing
      use_largest_txfm speed feature).
      
      Change-Id: I4d60a5f9ce78fbc90cddf2f97ed91d8bc0d4f936
      8d3d2b76
    • Deb Mukherjee's avatar
      Clean-up in forward update to use mapping tables · 9c20cedd
      Deb Mukherjee authored
      Uses mapping tables instead of complicated modulo/division
      operations for prob mapping for forward updates.
      
      No bit-stream or output change.
      
      Change-Id: Ifd9ce8ac1437835c305c94f64c18273c7a68f546
      9c20cedd
    • Dmitry Kovalev's avatar
      904070ca
    • Ronald S. Bultje's avatar
      3cc6eb7c
    • Dmitry Kovalev's avatar
    • Dmitry Kovalev's avatar
      18fd4360
    • Dmitry Kovalev's avatar
      Removing unused implicit segmentation code. · a3d2e6c9
      Dmitry Kovalev authored
      Change-Id: I8a2983fb14274a6ac53681fa4cd5d4209cbd2905
      a3d2e6c9
    • Yunqing Wang's avatar
      f4bee75c
    • Yunqing Wang's avatar
      Add speed feature to disable splitmv · b12e060b
      Yunqing Wang authored
      Added a speed feature in speed 1 to disable splitmv for HD (>=720)
      clips. Test result on stdhd set: 0.3% psnr loss and 0.07% ssim
      loss. Encoding speedup is 36%.
      
      (For reference: The test result on derf set showed 2% psnr loss
      and 1.6% ssim loss. Encoding speedup is 34%. SPLITMV should be
      enabled for small resolution videos.)
      
      Change-Id: I54f72b94f506c6d404b47c42e71acaa5374d6ee6
      b12e060b
    • Jingning Han's avatar
      Calculate rd cost per transformed block · b91a1586
      Jingning Han authored
      Compute the rate-distortion cost per transformed block, and cumulate
      the cost through all blocks inside a partition. This allows encoder
      to detect if the cumulative rd cost is already above the best rd cost,
      thereby enabling early termination in the rate-distortion optimization
      search.
      
      Change-Id: I0a856367a9a7b6dd0b466e7b767f54d5018d09ac
      b91a1586
    • Ronald S. Bultje's avatar
    • Paul Wilkins's avatar
      Adjust Speed 0 settings. · 1319d9c0
      Paul Wilkins authored
      Remove the use of sf->comp_inter_joint_search_thresh
      from the baseline speed 0. Approx +0.4% on derf.
      
      Change-Id: Icc14db98909830f40e5ac66130d40e78d2e55c71
      1319d9c0
    • Paul Wilkins's avatar
      Revert "New motion threshold factor - speed feature." · b7cd01ed
      Paul Wilkins authored
      This reverts commit 13772781.
      Also fixes a spelling mistake.
      
      Change-Id: I5be8aa4d8d3c0323d4a6f41968a7b2c048949c3f
      b7cd01ed
    • Yaowu Xu's avatar
      fix the mismatch again in cpu_used 2 · 9e408e35
      Yaowu Xu authored
      Change-Id: Icc4f70f0b0f91c9e7d5d00eedd67841afe2f2679
      9e408e35
    • Jim Bankoski's avatar
      use partitioning from last frame · d4158283
      Jim Bankoski authored
      This cl converts use partition from last frame to do the following:
      
      if part is none,horz, vert -> try split
      if part != none and one of the children is not split - try none
      
      
      Change-Id: I5b6c659e35f3ac9f11c051b92ba98af6d7e8aa87
      Signed-off-by: default avatarJim Bankoski <jimbankoski@google.com>
      d4158283
    • Dmitry Kovalev's avatar
      Removing vp9_mbpitch.c, moving vp9_setup_block_dptrs to vp9_block.h. · 1ac05402
      Dmitry Kovalev authored
      Change-Id: Ia547a5dd7650b771fd00edd673ab9f920270731c
      1ac05402
  3. 01 Jul, 2013 1 commit
    • Ronald S. Bultje's avatar
      Make get_coef_context() branchless. · 26b6318d
      Ronald S. Bultje authored
      This should significantly speedup cost_coeffs(). Basically what the
      patch does is to make the neighbour arrays padded by one item to
      prevent an eob check in get_coef_context(), then it populates each
      col/row scan and left/top edge coefficient with two times the same
      neighbour - this prevents a single/double context branch in
      get_coef_context(). Lastly, it populates neighbour arrays in pixel
      order (rather than scan order), so we don't have to dereference the
      scantable to get the correct neighbours.
      
      Total encoding time of first 50 frames of bus (speed 0) at 1500kbps
      goes from 2min10.1 to 2min5.3, i.e. a 2.6% overall speed increase.
      
      Change-Id: I42bcd2210fd7bec03767ef0e2945a665b851df56
      26b6318d