1. 10 Aug, 2013 1 commit
  2. 09 Aug, 2013 1 commit
  3. 08 Aug, 2013 1 commit
    • Deb Mukherjee's avatar
      Adds a new subpel motion function · 1ba91a84
      Deb Mukherjee authored
      Adds a new subpel motion estimation function that uses a 2-level
      tree-structured decision tree to eliminate redundant computations.
      It searches fewer points than iterative search (which can search
      the same point multiple times) but has the same quality roughly.
      
      This is made the default setting at speeds 0 and 1, while at
      speed 2 and above only a 1-level search is used.
      
      Also includes various cleanups for consistency and redundancy removal.
      
      Results:
      derf: +0.012% psnr
      stdhd: +0.09% psnr
      Speedup of about 2-3%
      
      Change-Id: Iedde4866f5475586dea0f0ba4cb7428fba24eee9
      1ba91a84
  4. 07 Aug, 2013 2 commits
    • Jingning Han's avatar
      Use low precision 32x32fdct for encodemb in speed1 · debb9c68
      Jingning Han authored
      The low precision 32x32 fdct has all the intermediate steps within
      16-bit depth, hence allowing faster SSE2 implementation, at the
      expense of larger round-trip error. It was used in the rate-distortion
      optimization search loop only.
      
      Using the low precision version, in replace of the high precision one,
      affects the compression performance by about 0.7% (derf, stdhd) at
      speed 0. For speed 1, it makes derf set down by only 0.017%.
      
      Change-Id: I4e7d18fac5bea5317b91c8e7dabae143bc6b5c8b
      debb9c68
    • Deb Mukherjee's avatar
      Clean ups of the subpel search functions · 71b43b0f
      Deb Mukherjee authored
      Removes some unused code and speed features, and organizes the
      interfaces for fractional mv step functions for use in new speed
      features to come.
      
      In the process a new speed feature - number of iterations per
      step during the subpel search - is exposed.
      
      No change when this parameter is set as the original value of 3.
      
      Results:
      subpel_iters_per_step = 3: baseline
      subpel_iters_per_step = 2: psnr -0.067%, 1% speedup
      subpel_iters_per_step = 1: psnr -0.331%, 3-4% speedup
      
      Change-Id: I2eba8a21f6461be8caf56af04a5337257a5693a8
      71b43b0f
  5. 06 Aug, 2013 1 commit
    • Deb Mukherjee's avatar
      Flexible support for various pattern searches · 15b5a6a2
      Deb Mukherjee authored
      Adds a few pattern searches to achieve various tradeoffs
      between motion estimation complexity and performance.
      The search framework is unified across these searches so that a
      common pattern search function is used for all. Besides it will
      be easier to experiment with various patterns or combinations
      thereof at different scales in the future.
      
      The new pattern search is multi-scale and is capable of using
      different patterns at different scales.
      
      The new hex search uses 8 points at the smallest scale
      and 6 points at other scales.
      Two other pattern searches - big-diamond and square are
      also added. Big diamond uses 4 points at the smallest scale and
      8 points in diamond shape at the larger scales.
      Square is very similar conceptually to the default n-step search
      but is somewhat faster since it keeps only one survivor across
      all scales.
      
      Psnr/speed-up results on derf300:
      
      hex: -1.6% psnr%, 6-8% speed-up
      big-diamond: -0.96% psnr, 4-5% speedup
      square: -0.93% psnr, 4-5% speedup
      
      Change-Id: I02a7ef5193f762601e0994e2c99399a3535a43d2
      15b5a6a2
  6. 05 Aug, 2013 1 commit
    • Deb Mukherjee's avatar
      Add variance based mode/skipping · 8b3faccb
      Deb Mukherjee authored
      Adds a speed feature to skip all intra modes other than
      DC_PRED if the source variance is small. This feature is
      made part of speed 1 and up.
      
      Results on derf300: psnr -0.07%, speedup about 1-2%
      
      Also uses the source variance to fine-tune the early
      termination criteria when FLAG_EARLY_TERMINATE is on.
      This feature is made part of speed 2 and up.
      
      Results on derf300: psnr -0.52%, speedup about 5-7%
      
      Change-Id: I59e38aa836557cfa5405ae706fc64815cbfe4232
      8b3faccb
  7. 01 Aug, 2013 1 commit
    • Yunqing Wang's avatar
      Comment out 2 unused speed features · 7965a6ea
      Yunqing Wang authored
      use_min_partition_size and use_max_partition_size are not used
      currently, and could be added back if needed later.
      
      Change-Id: Ib22a9c06b064567a7c1d6d5445567ed77e0d3acc
      7965a6ea
  8. 30 Jul, 2013 1 commit
  9. 29 Jul, 2013 2 commits
  10. 26 Jul, 2013 1 commit
    • Paul Wilkins's avatar
      Auto min and max partition size experiment. · fe5e2a91
      Paul Wilkins authored
      Speed feature experiment to set an upper and lower
      partition size limit based on what has been seen
      in spatial neighbors.
      
      This seems to gives quite reasonable speed gains in local
      (10-15%) and when used with speed 0 the losses are small
      (0.25% derf, 0.35% stdhd). However, for now I am only
      enabling it on speed 1 as there may be clashes with the existing
      temporal partition selection in speed 2.
      
      Using a tighter min / max around the range derived from the
      neighbors increases speed further but at the cost of a
      bigger quality loss. However,  I think this spatial method could
      be combined with data from either the last frame or a variance
      method (or both) to refine the range of minimum and maximum
      partition size. I.e. consider the min and max from spatial and
      temporal neighbors and the variance recommendation.
      
      Change-Id: I1b96bf8b84368d6aad0c7aa600fe141b4f07435f
      fe5e2a91
  11. 23 Jul, 2013 1 commit
    • Paul Wilkins's avatar
      Renaming of segment constants. · 32042af1
      Paul Wilkins authored
      Renamed:
        MAX_MB_SEGMENTS to MAX_SEGMENTS
        MB_SEG_TREE_PROBS to SEG_TREE_PROBS
      
      The minimum unit for segmentation in the segment map
      is now 8x8 so it is misleading to use MB_ as macro-block
      traditionally refers to a 16x16 region.
      
      Change-Id: I0b55a6f0426bb46dd13435fcfa5bae0a30a7fa22
      32042af1
  12. 22 Jul, 2013 1 commit
    • Paul Wilkins's avatar
      Re-order mode search in rd. · 1d189d64
      Paul Wilkins authored
      Mode search order in rd loop changed to better reflect
      observed hit counts.
      
      Also some adjustment of the baseline mode rd thresholds
      to reflect the order change and observed frequencies.
      
      Change-Id: I47a131cc83e11551df8add6d6d8d413d78d3a63c
      1d189d64
  13. 19 Jul, 2013 2 commits
    • Deb Mukherjee's avatar
      Reworked the auto_mv_step_size speed feature · 302698fb
      Deb Mukherjee authored
      This patch modifies the auto_mv_step_size speed feature to
      use a combination of the maximum magnitude mv from the last
      inter frame, and the maximum magnitude mv for the two reference
      mvs with the same reference. For arf frames, the max mav step
      for the resolution is used.
      The bounds therefore are slightly tighter. The feature is made
      a speed 1 feature.
      
      Rebased.
      
      Results (when this feature is turned on over speed 0):
      derfraw300: -0.046% psnr, about 5+% speedup
      (tested on football: goes from 4m30.760s to 4m17.410s).
      
      Change-Id: If492797a61b0b4b3e58c0b8f86afb880165fc9f6
      302698fb
    • Paul Wilkins's avatar
      Alignment of THR_MODES to vp9_mode_order[] · f3ed9f55
      Paul Wilkins authored
      Change-Id: I4032dd0442043543954dcb3724df974b7cc7e515
      f3ed9f55
  14. 18 Jul, 2013 1 commit
  15. 17 Jul, 2013 2 commits
    • Yunqing Wang's avatar
      Speed up motion estimation using small partitions' result(experiment) · df90d58f
      Yunqing Wang authored
      Current partition checking starts from small sizes, and then goes up
      to large sizes. This experiment uses the small partitions' motion
      estimation result, which is already available, to speed up the
      large partition's motion estimation. We can decide to skip some
      patition checkings if they are unlikely choices. We could use the
      motion vector(MV) result as current partition's prediction MV, limit
      the search range and reference frame.
      
      Current result at speed 1:
      psnr loss: 1.19% for stdhd, 0.287% for derf.
      speed gain: 14% for sunflower(hd), 11% for akiyo.
      
      Further improvement will be done later.
      
      Change-Id: I5abfd070e9cace2e91e2a0247d1325df313887ab
      df90d58f
    • Paul Wilkins's avatar
      Move uv intra mode selection in rd loop. · 2ee338ce
      Paul Wilkins authored
      Use an estimate based on DC_PRED for intra uv cost
      within the rd loop then only do a full uv mode analysis
      if an intra mode is chosen.
      
      Significant speed gains in some cases. Currently only
      enabled for speed 2 pending speed/quality tests.
      
      Change-Id: Ie851a12400d5483bce47ec0e3ccb8516041e91c0
      2ee338ce
  16. 16 Jul, 2013 1 commit
  17. 15 Jul, 2013 1 commit
    • Jingning Han's avatar
      Skip duplicate block encoding in the rd loop · faff6ed0
      Jingning Han authored
      This speed feature allows the encoder to largely remove the spatial
      dependency between blocks inside a 64x64 superblock, thereby removing
      the need to repeatedly encode superblocks per partition type in the
      rate-distortion optimization loop.
      
      A major challenge lies in the intra modes tested in the rate-distortion
      optimization loop. The subsequent blocks do not have access to the
      reconstructed boundary pixels without the intermediate coding steps.
      This was resolved by using the original pixels for intra prediction
      in the rd loop, followed by an appropriately designed distortion
      modeling on the quantization parameters. Experiments also suggested
      that the performance impact is more discernible at lower bit-rate/psnr
      settings. Hence a quantizer dependent threshold is applied to deactivate
      skip of block coding.
      
      For bus_cif at 2000 kbps,
      speed 0: runtime 269854ms -> 237774ms (12% speed-up) at 0.05dB
               performance loss.
      
      speed 1: runtime 65312ms  -> 61536ms, (7% speed-up) at 0.04dB
               performance loss.
      
      This operation is currently turned on in settings of speed 1.
      
      Change-Id: Ib689741dfff8dd38365d8c1b92860a3e176f56ec
      faff6ed0
  18. 12 Jul, 2013 2 commits
    • Dmitry Kovalev's avatar
      Adding struct tx_probs and struct tx_counts to cleanup the code. · cc662dd7
      Dmitry Kovalev authored
      Also removing unused declarations from vp9_entropymode.h file.
      
      Change-Id: Ib9c5826db3584a32f6bb3297a76c522b99d83402
      cc662dd7
    • Deb Mukherjee's avatar
      Some minor cleanups for efficiency · 94c481f9
      Deb Mukherjee authored
      Implements some of the helper functions more efficiently with
      lookups rathers than branches. Modeling function is consolidated
      to reduce some computations.
      
      Also merged the two enums BLOCK_SIZE_TYPES and BlockSize into
      one because there is no need to keep them separate (even though
      the semantics are a little different).
      
      No bitstream or output change.
      
      About 0.5% speedup
      
      Change-Id: I7d71a66e8031ddb340744dc493f22976052b8f9f
      94c481f9
  19. 10 Jul, 2013 2 commits
    • Deb Mukherjee's avatar
      Prunes out full-rd computation based on modeled rd · 53ff43ad
      Deb Mukherjee authored
      Adds a speed feature to eliminate full-rd computation if the modeled
      rd or rd based on a different parameter in the same mode is already
      a lot larger than the best rd yet.
      
      Specifically, only search the sharp and smooth filters if the modeled
      rd cost based on the  regular filter is within a certain factor of the
      best rd cost so far. Also, skip full-rd computation of non splitmv
      inter modes if the modeled rd cost based on pred error is within the
      same factor of the best rd cost so far.
      
      Also adds some enhancements in the rd search for splitmv mode to
      speed things up by early breakouts. Negligible impact on performance.
      
      Resuts on derfraw300:
      psnr:    -0.013% with the splitmv enhancements, -0.24% with the rd
               breakout feature on.
      speedup: 6% with splitmv enhancements, 20% with also residual breakout
               (tested on football sequence at 600 Kbps)
      
      Change-Id: I37abc308ea9f110c1679ce649b6a7e73ab1ad5fc
      53ff43ad
    • Yaowu Xu's avatar
      Add a feature to reduce chrome intra mode search · bed27a96
      Yaowu Xu authored
      Change-Id: I721ebdeef2b53ce3e5c3eba3f7462ae2103c95a8
      bed27a96
  20. 08 Jul, 2013 2 commits
    • Ronald S. Bultje's avatar
      Make frame-wide filter-type decision fully RD-based. · ed995afb
      Ronald S. Bultje authored
      Overall, on all test sets, this gains about +0.2% on all metrics.
      City is a clip where this really hurts (-1.0% on all metrics), I'm
      not quite sure why yet. Maybe interesting to look into in the future.
      
      Change-Id: I6f0eecb20e72f0194633270d30bf00d76d9eae78
      ed995afb
    • Deb Mukherjee's avatar
      Implements several heuristics to prune mode search · d9b62160
      Deb Mukherjee authored
      Skips mode searches for intra and compound inter modes depending
      on the best mode so far and the reference frames. The various
      heuristics to be used are selected by bits from a flag. The
      previous direction based intra mode search pruning is also absorbed
      in this framework.
      
      Specifically the flags and their impact are:
      
      1) FLAG_SKIP_INTRA_BESTINTER (skip intra mode search for oblique
      directional modes and TM_PRED if the best so far is
      an inter mode)
      derfraw300: -0.15%, 10% speedup
      
      2) FLAG_SKIP_INTRA_DIRMISMATCH (skip D27, D63, D117 and D153
      mode search if the best so far is not one of the closest
      hor/vert/diagonal directions.
      derfraw300: -0.05%, about 9% speedup
      
      3) FLAG_SKIP_COMP_BESTINTRA (skip compound prediction mode
      search if the best so far is an intra mode)
      derfraw300: -0.06%, about 7-8% speedup
      
      4) FLAG_SKIP_COMP_REFMISMATCH (skip compound prediction search
      if the best single ref inter mode does not have the same ref
      as one of the two references being tested in the compound mode)
      derfraw300: -0.56%, about 10% speedup
      
      Change-Id: I1a736cd29b36325489e7af9f32698d6394b2c495
      d9b62160
  21. 03 Jul, 2013 1 commit
    • Paul Wilkins's avatar
      Added two new skip experiments. · 72c5778e
      Paul Wilkins authored
      sf->unused_mode_skip_lvl. Tests modes as normal for all
      sizes at or below the given level. At larger sizes it skips
      all modes that were not chosen at any smaller size.
      Hence setting BLOCK_SIZE_SB64X64 is in effect off.
      Setting BLOCK_SIZE_AB4X4 will only consider modes that
      were chosen for one or more 4x4 blocks at larger sizes.
      
      sf->reference_masking.
      Do a test encode of the NONE partition at one size and create
      a reference frame mask based on the best rd choice. In the
      full search only allow this reference frame.
      Currently it is testing 64x64 and repeats this in the full search.
      This does not work well with Jim's Partition code just now and
      is disabled by default.
      
      Change-Id: I8f8c52d2ef4a0c08100150b0ea4155d1aaab93dd
      72c5778e
  22. 02 Jul, 2013 6 commits
    • Yaowu Xu's avatar
      Added a speed feature use_square_partition_only · 0d7b7c09
      Yaowu Xu authored
      This commit adds a speed feature where only squared partition are
      evaluated in partition picking. Enable this feature in cpu-used 2
      reduces encoding time by ~30%.
      
      loss of compression:
      -0.9% on cif set
      -1.23% on stdhd
      
      Change-Id: Ia6fad11210f0b78365abb889f9245604513be5b9
      0d7b7c09
    • Deb Mukherjee's avatar
      Speed feature to binary search dir intramodes · 37501d68
      Deb Mukherjee authored
      This speed feature will skip searching the directional intra prediction
      modes D63, D117, D27, D153 if the best intra mode so far is not one of
      the diagonal, horizontal or vertical directions closest to the respective
      directions being tested. In other words, this implements a sort of
      binary search in the angular domain.
      
      Speedup: about 9-10%
      Results: -0.05% only on derfraw300.
      
      Change-Id: I413584c41f2a3e8dabfbdeb40718c8fc4b1d63a2
      37501d68
    • Deb Mukherjee's avatar
      Tx size selection enhancements · 8d3d2b76
      Deb Mukherjee authored
      (1) Refines the modeling function and uses that to add some speed
      features. Specifically, intead of using a flag use_largest_txfm as
      a speed feature, an enum tx_size_search_method is used, of which
      two of the types are USE_FULL_RD and USE_LARGESTALL. Two other
      new types are added:
      USE_LARGESTINTRA (use largest only for intra)
      USE_LARGESTINTRA_MODELINTER (use largest for intra, and model for
      inter)
      
      (2) Another change is that the framework for deciding transform type
      is simplified to use a heuristic count based method rather than
      an rd based method using txfm_cache. In practice the new method
      is found to work just as well - with derf only -0.01 down.
      The new method is more compatible with the new framework where
      certain rd costs are based on full rd and certain others are
      based on modeled rd or are not computed. In this patch the existing
      rd based method is still kept for use in the USE_FULL_RD mode.
      In the other modes, the count based method is used.
      However the recommendation is to remove it eventually since the
      benefit is limited, and will remove a lot of complications in
      the code
      
      (3) Finally a bug is fixed with the existing use_largest_txfm speed feature
      that causes mismatches when the lossless mode and 4x4 WH transform is
      forced.
      
      Results on derf:
      USE_FULL_RD: +0.03% (due to change in the tables), 0% encode time reduction
      USE_LARGESTINTRA: -0.21%, 15% encode time reduction (this one is a
      pretty good compromise)
      USE_LARGESTINTRA_MODELINTER: -0.98%, 22% encode time reduction
      (currently the benefit of modeling is limited for txfm size selection,
      but keeping this enum as a placeholder) .
      USE_LARGESTALL: -1.05%, 27% encode-time reduction (same as existing
      use_largest_txfm speed feature).
      
      Change-Id: I4d60a5f9ce78fbc90cddf2f97ed91d8bc0d4f936
      8d3d2b76
    • Yunqing Wang's avatar
      Add speed feature to disable splitmv · b12e060b
      Yunqing Wang authored
      Added a speed feature in speed 1 to disable splitmv for HD (>=720)
      clips. Test result on stdhd set: 0.3% psnr loss and 0.07% ssim
      loss. Encoding speedup is 36%.
      
      (For reference: The test result on derf set showed 2% psnr loss
      and 1.6% ssim loss. Encoding speedup is 34%. SPLITMV should be
      enabled for small resolution videos.)
      
      Change-Id: I54f72b94f506c6d404b47c42e71acaa5374d6ee6
      b12e060b
    • Paul Wilkins's avatar
      Revert "New motion threshold factor - speed feature." · b7cd01ed
      Paul Wilkins authored
      This reverts commit 13772781.
      Also fixes a spelling mistake.
      
      Change-Id: I5be8aa4d8d3c0323d4a6f41968a7b2c048949c3f
      b7cd01ed
    • Jim Bankoski's avatar
      use partitioning from last frame · d4158283
      Jim Bankoski authored
      This cl converts use partition from last frame to do the following:
      
      if part is none,horz, vert -> try split
      if part != none and one of the children is not split - try none
      
      
      Change-Id: I5b6c659e35f3ac9f11c051b92ba98af6d7e8aa87
      Signed-off-by: default avatarJim Bankoski <jimbankoski@google.com>
      d4158283
  23. 01 Jul, 2013 2 commits
    • Ronald S. Bultje's avatar
      Quantize (64-bit only, for now) SSSE3 SIMD. · 7353ceab
      Ronald S. Bultje authored
      Total encoding time for first 50 frames of bus (speed 0) @ 1500kbps
      goes 2min34.8 to 2min14.4, i.e. a 10.4% overall speedup. The code is
      x86-64 only, it needs some minor modifications to be 32bit compatible,
      because it uses 15 xmm registers, whereas 32bit only has 8.
      
      Change-Id: I2df53770c2e850813ffa713e1a91b45b0082b904
      7353ceab
    • Paul Wilkins's avatar
      New motion threshold factor - speed feature. · 13772781
      Paul Wilkins authored
      Added a speed feature that focuses only on thresholds
      for new motion modes.
      
      Moved sf->comp_inter_joint_search_thresh into speed
      1.  This has ~+0.4% impact on quality at speed 0 as
      our quality reference baseline.
      
      Slight adjustment to baseline thresholds.
      
      Change-Id: I7ebf104f1fe29af77ed4837b2e84be065621bbe5
      13772781
  24. 28 Jun, 2013 3 commits
    • Dmitry Kovalev's avatar
      Removing CONFIG_DEBUG checks on assertions. · 8e6ce6bb
      Dmitry Kovalev authored
      Adding CHECK_MEM_ERROR macro to vp9_common.h and removing two duplicated
      ones from vp9_onyx_int.h and vp9_onyxd_int.h.
      
      Change-Id: I916afec61b3019f18193135dac7c35ed0f89b8b6
      8e6ce6bb
    • Ronald S. Bultje's avatar
      Make coefficient skip condition an explicit RD choice. · af660715
      Ronald S. Bultje authored
      This commit replaces zrun_zbin_boost, a method of biasing non-zero
      coefficients following runs of zero-coefficients to be rounded towards
      zero, with an explicit skip-block choice in the RD loop.
      
      The logic is basically that if individual coefficients should be rounded
      towards zero (from a RD point of view), the trellis/optimize loop should
      take care of it. If whole blocks should be zero (from a RD point of
      view), a single RD check is much more efficient than a complete
      serialization of the quantization loop.
      
      Quality change: derf +0.5% psnr, +1.6% ssim; yt +0.6% psnr, +1.1% ssim.
      SIMD for quantize will follow in a separate patch. Results for other
      test sets pending.
      
      Change-Id: Ife5fa641163ac5150ac428011e87188f1937c1f4
      af660715
    • Yaowu Xu's avatar
      Optimize partition search order · 1374a06b
      Yaowu Xu authored
      This commit change the partition search order to allow checking of
      rectangular partition to be done after square partitions. It also
      added a speed feature to skip rectangular partition check when
      NONE is better than SPLIT in RD sense.
      
      This feature roughly speed up encoder by 1.5X with loss on compression
      -0.91% on cif set
      -0.56% on stdhd set
      
      Change-Id: I0d2d06993041aa9ea9073fcc39c54f73a127dfa4
      1374a06b
  25. 27 Jun, 2013 1 commit
    • Jingning Han's avatar
      Make intra predictor reference buffer configurable · 861cb06c
      Jingning Han authored
      This commit enables configurable reference buffer pointer for intra
      predictor. This allows later removal of spatial dependency between
      blocks inside a 64x64 superblock in the rate-distortion optimization
      loop.
      
      Change-Id: I02418c2077efe19adc86e046a6b49364a980f5b1
      861cb06c