1. 24 Sep, 2013 1 commit
  2. 19 Sep, 2013 1 commit
  3. 16 Sep, 2013 1 commit
    • Paul Wilkins's avatar
      Minor clean up. · cb50dc7f
      Paul Wilkins authored
      Removed some unused code and minor cleanup
      / reordering.
      
      Change-Id: I4083ae56aeb8edfe9b85aa2f42a16aa28d19da94
      cb50dc7f
  4. 13 Sep, 2013 1 commit
    • Jingning Han's avatar
      Adaptive motion search control · c4826c59
      Jingning Han authored
      This commit enables adaptive constraint on motion search range for
      smaller partitions, given the motion vectors of collocated larger
      partition as a candidate initial search point.
      
      It makes speed 0 runtime of bus at CIF and 2000 kbps goes from
      167s down to 162s (3% speed-up), at 0.01dB performance gains. In
      the settings of speed 1, this makes the runtime goes from 33687 ms
      to 32142 ms (4.5% speed-up), at 0.03dB performance gains.
      
      Compression performance wise, it gains at speed 1:
      derf  0.118%
      yt    0.237%
      hd    0.203%
      stdhd 0.438%
      
      Change-Id: Ic8b34c67810d9504a9579bef2825d3fa54b69454
      c4826c59
  5. 11 Sep, 2013 2 commits
    • Deb Mukherjee's avatar
      Clean up of the search best filter speed feature · b9646467
      Deb Mukherjee authored
      Removes this speed feature since it is very slow and unlikely
      to be used in practice. This cleanup removes a bunch of unnecessary
      complications in the outer encode loop.
      
      Change-Id: I3c66ef1ca924fbfad7dadff297c9e7f652d308a1
      b9646467
    • Deb Mukherjee's avatar
      Changes in speed 2 settings · 69fe840e
      Deb Mukherjee authored
      Propose some changes to the speed 2 settings to improve quality.
      In particular, turns off the adjust_thresholds_by_speed feature
      which improves results by 6%. Also removes the code for
      adjust_thresholds_by_speed since it conflicts with the adaptive
      rd thresh feature.
      
      Overall, with this change speed 2 is -15.2% from speed 0 settings,
      on derf, which is significantly better than -21.6% down before.
      
      Change-Id: I6e90a563470979eb0c258ec32d6183ed7ce9a505
      69fe840e
  6. 10 Sep, 2013 2 commits
    • Yunqing Wang's avatar
      Modify encode breakout for static frames · 939791a1
      Yunqing Wang authored
      Thank Paul for the suggestions. While turning on static-thresh
      for static-image videos, a big jump on bitrate was seen. In this
      patch, we detected static frames in the video using first-pass
      stats. For different cases, disable encode breakout or reduce
      encode breakout threshold to limit the skipping.
      
      More modification need be done to break incorrect partition
      picking pattern for static frames while skipping happens.
      
      Change-Id: Ia25f47041af0f04e229c70a0185e12b0ffa6047f
      939791a1
    • Paul Wilkins's avatar
      Modified mode skip functionality. · 4f660cc0
      Paul Wilkins authored
      A previous speed feature skipped modes not used in earlier
      partitions but this not longer worked as intended following
      changes to the partition coding order and in conjunction
      with some other speed features (Especially speed 2 and above).
      
      This modified mode skip feature sets a mask after the first X
      modes have been tested in each partition depending on the
      reference frame of the current best case.
      
      This patch also makes some changes to the order modes are
      tested to fit better with this skip functionality.
      
      Initial testing suggests speed and rd hit count improvements
      of up to 20% at speed 1. Quality results. (derf -1.9%, std hd  +0.23%).
      
      Change-Id: Idd8efa656cbc0c28f06d09690984c1f18b1115e1
      4f660cc0
  7. 09 Sep, 2013 1 commit
    • Ivan Maltz's avatar
      API extensions and sample app for spacial scalable encoder · 01b35c3c
      Ivan Maltz authored
      Sample app: vp9_spatial_scalable_encoder
      vpx_codec_control extensions:
        VP9E_SET_SVC
        VP9E_SET_WIDTH, VP9E_SET_HEIGHT, VP9E_SET_LAYER
        VP9E_SET_MIN_Q, VP9E_SET_MAX_Q
      expanded buffer size for vp9_convolve
      
      modified setting of initial width in vp9_onyx_if.c so that layer size
      can be set prior to initial encode
      
      Default number of layers set to 3 (VPX_SS_DEFAULT_LAYERS)
      Number of layers set explicitly in vpx_codec_enc_cfg.ss_number_layers
      
      Change-Id: I2c7a6fe6d665113671337032f7ad032430ac4197
      01b35c3c
  8. 29 Aug, 2013 1 commit
    • Paul Wilkins's avatar
      Added per pixel inter rd hit count stats · 1f4bf79d
      Paul Wilkins authored
      Added some code to output normalized rd hit count stats.
      In effect this approximates to the average number of rd
      operations/tests per pixel for the sequence.
      
      The results are not quite accurate and I have not bothered
      to account for partial SB64s at frame edges and for key frames
      However they do give some idea of the number of modes /
      prediction methods being tested for each pixel across the
      different partition sizes. This indicates how much scope their
      is for further gains either by reducing the number of partitions
      examined or the modes per partition through heuristics.
      
      Patch 3 moved place where count incremented so partial rd
      tests that are aborted with INT_MAX return are also counted.
      
      Example numbers for first 50 frames of Akiyo.
      Speed 0 ~84.4 rd operations / pixel
      Speed 1 ~28.8
      Speed 2 ~11.9
      
      Change-Id: Ib956e787e12f7fa8b12d3a1a2f6cda19a65a6cb8
      1f4bf79d
  9. 28 Aug, 2013 1 commit
    • Deb Mukherjee's avatar
      Adds a speed feature for fast 1-loop forw updates · e02dc84c
      Deb Mukherjee authored
      Incorporates a speed feature for fast forward updates of
      coefficients. This feature takes 3 values:
      0 - use standard 2-loop version
      1 - use a 1-loop version
      2 - use a 1-loop version with reduced updates
      
      Results: derfraw300 +0.007% (on speed 0) at feature value = 1
                          -0.160% (on speed 0) at feature value = 2
      
      There is substantial speed up at speeds 2 and above for low
      resolution sequences where the entropy updates are a big part
      of the overall computations.
      
      Change-Id: Ie96fc50777088a5bd441288bca6111e43d03bcae
      e02dc84c
  10. 27 Aug, 2013 1 commit
  11. 24 Aug, 2013 2 commits
  12. 23 Aug, 2013 2 commits
    • Paul Wilkins's avatar
      Changes to adaptive inter rd thresholds. · aa5b67ad
      Paul Wilkins authored
      Values now carried over frame to frame.
      Change to algorithm for decreasing threshold after
      a hit and to max threshold (now based on speed)
      
      Removed some old commented out code relating to
      VP8 adaptive thresholds.
      
      The impact of these changes tested on Akiyo (50 frames)
      and measured in terms of unit rd hits is as follows:
      
      Speed 0 84.36 -> 84.67
      Speed 1 29.48 -> 22.22
      Speed 2 11.76 -> 8.21
      Speed 3 12.32 -> 7.21
      
      Encode speed impact is broadly in line with these.
      
      Change-Id: I5b886efee3077a11553fa950d796fd6d00c8cb19
      aa5b67ad
    • Paul Wilkins's avatar
      Limit Key frame Intra modes checks. · f76f52df
      Paul Wilkins authored
      Most of the focus so far has been on inter frames.
      
      At high speed settings the key frame is now taking a high %
      of the cycles.
      
      This patch puts in some masking to reduce the number
      of INTRA modes searched during key frame coding (as already
      happens for inter frames) at higher speed settings
      
      TODO: Develop this further with either adaptive rd thresholds
      when choosing which intra modes to consider or some other
      heuristic.
      
      Impact.
      At high speed settings on some clips the key frame was starting
      to dominate. In a coding of the first 50 frames of AKIYO at speed
      2 limiting the key frame intra modes to DC or TM_PRED resulted in
      ~30% overall speedup. For Bus the number was lower at ~4-5%.
      
      Change-Id: I7bde68aee04995f9d9beb13a1902143112e341e2
      f76f52df
  13. 20 Aug, 2013 1 commit
    • Deb Mukherjee's avatar
      Cleanup/enhancements of switchable filter search · 2ffe64ad
      Deb Mukherjee authored
      Cleans up the switchable filter search logic. Also adds a
      speed feature - a variance threshold - to disable filter search
      if source variance is lower than this value.
      
      Results: derfraw300
      threshold = 16, psnr -0.238%, 4-5% speedup (tested on football)
      threshold = 32, psnr -0.381%, 8-9% speedup (tested on football)
      threshold = 64, psnr -0.611%, 12-13% speedup (tested on football)
      threshold = 96, psnr -0.804%, 16-17% speedup (tested on football)
      
      Based on these results, the threshold is chosen as 16 for speed 1,
      32 for speed 2, 64 for speed 3 and 96 for speed 4.
      
      Change-Id: Ib630d39192773b1983d3d349b97973768e170c04
      2ffe64ad
  14. 15 Aug, 2013 1 commit
    • Deb Mukherjee's avatar
      Speed feature to skip split partition based on var · 24856b6a
      Deb Mukherjee authored
      Adds a speed feature to disable split partition search based on a
      given threshold on the source variance. A tighter threshold derived
      from the threshold provided is used to also disable horizontal and
      vertical partitions.
      
      Results on derfraw300:
      threshold = 16, psnr = -0.057%, speedup ~1% (football)
      threshold = 32, psnr = -0.150%, speedup ~4-5% (football)
      threshold = 64, psnr = -0.570%, speedup ~10-12% (football)
      
      Results on stdhdraw250:
      threshold = 32, psnr = -0.18%, speedup is somewhat more than derf
      because of a larger number of smoother blocks at higher resolution.
      
      Based on these results, a threshold of 32 is chosen for speed 1,
      and a threshold of 64 is chosen for speeds 2 and above.
      
      Change-Id: If08912fb6c67fd4242d12a0d094783a99f52f6c6
      24856b6a
  15. 13 Aug, 2013 1 commit
    • Paul Wilkins's avatar
      Trivial clean up. · 5459f68d
      Paul Wilkins authored
      Delete unused / commented out  variable references.
      
      Change-Id: Iaf20c0c3744f89adb296d153b516b5ea41b4f3b4
      5459f68d
  16. 10 Aug, 2013 1 commit
  17. 09 Aug, 2013 1 commit
  18. 08 Aug, 2013 1 commit
    • Deb Mukherjee's avatar
      Adds a new subpel motion function · 1ba91a84
      Deb Mukherjee authored
      Adds a new subpel motion estimation function that uses a 2-level
      tree-structured decision tree to eliminate redundant computations.
      It searches fewer points than iterative search (which can search
      the same point multiple times) but has the same quality roughly.
      
      This is made the default setting at speeds 0 and 1, while at
      speed 2 and above only a 1-level search is used.
      
      Also includes various cleanups for consistency and redundancy removal.
      
      Results:
      derf: +0.012% psnr
      stdhd: +0.09% psnr
      Speedup of about 2-3%
      
      Change-Id: Iedde4866f5475586dea0f0ba4cb7428fba24eee9
      1ba91a84
  19. 07 Aug, 2013 2 commits
    • Jingning Han's avatar
      Use low precision 32x32fdct for encodemb in speed1 · debb9c68
      Jingning Han authored
      The low precision 32x32 fdct has all the intermediate steps within
      16-bit depth, hence allowing faster SSE2 implementation, at the
      expense of larger round-trip error. It was used in the rate-distortion
      optimization search loop only.
      
      Using the low precision version, in replace of the high precision one,
      affects the compression performance by about 0.7% (derf, stdhd) at
      speed 0. For speed 1, it makes derf set down by only 0.017%.
      
      Change-Id: I4e7d18fac5bea5317b91c8e7dabae143bc6b5c8b
      debb9c68
    • Deb Mukherjee's avatar
      Clean ups of the subpel search functions · 71b43b0f
      Deb Mukherjee authored
      Removes some unused code and speed features, and organizes the
      interfaces for fractional mv step functions for use in new speed
      features to come.
      
      In the process a new speed feature - number of iterations per
      step during the subpel search - is exposed.
      
      No change when this parameter is set as the original value of 3.
      
      Results:
      subpel_iters_per_step = 3: baseline
      subpel_iters_per_step = 2: psnr -0.067%, 1% speedup
      subpel_iters_per_step = 1: psnr -0.331%, 3-4% speedup
      
      Change-Id: I2eba8a21f6461be8caf56af04a5337257a5693a8
      71b43b0f
  20. 06 Aug, 2013 1 commit
    • Deb Mukherjee's avatar
      Flexible support for various pattern searches · 15b5a6a2
      Deb Mukherjee authored
      Adds a few pattern searches to achieve various tradeoffs
      between motion estimation complexity and performance.
      The search framework is unified across these searches so that a
      common pattern search function is used for all. Besides it will
      be easier to experiment with various patterns or combinations
      thereof at different scales in the future.
      
      The new pattern search is multi-scale and is capable of using
      different patterns at different scales.
      
      The new hex search uses 8 points at the smallest scale
      and 6 points at other scales.
      Two other pattern searches - big-diamond and square are
      also added. Big diamond uses 4 points at the smallest scale and
      8 points in diamond shape at the larger scales.
      Square is very similar conceptually to the default n-step search
      but is somewhat faster since it keeps only one survivor across
      all scales.
      
      Psnr/speed-up results on derf300:
      
      hex: -1.6% psnr%, 6-8% speed-up
      big-diamond: -0.96% psnr, 4-5% speedup
      square: -0.93% psnr, 4-5% speedup
      
      Change-Id: I02a7ef5193f762601e0994e2c99399a3535a43d2
      15b5a6a2
  21. 05 Aug, 2013 1 commit
    • Deb Mukherjee's avatar
      Add variance based mode/skipping · 8b3faccb
      Deb Mukherjee authored
      Adds a speed feature to skip all intra modes other than
      DC_PRED if the source variance is small. This feature is
      made part of speed 1 and up.
      
      Results on derf300: psnr -0.07%, speedup about 1-2%
      
      Also uses the source variance to fine-tune the early
      termination criteria when FLAG_EARLY_TERMINATE is on.
      This feature is made part of speed 2 and up.
      
      Results on derf300: psnr -0.52%, speedup about 5-7%
      
      Change-Id: I59e38aa836557cfa5405ae706fc64815cbfe4232
      8b3faccb
  22. 01 Aug, 2013 1 commit
    • Yunqing Wang's avatar
      Comment out 2 unused speed features · 7965a6ea
      Yunqing Wang authored
      use_min_partition_size and use_max_partition_size are not used
      currently, and could be added back if needed later.
      
      Change-Id: Ib22a9c06b064567a7c1d6d5445567ed77e0d3acc
      7965a6ea
  23. 30 Jul, 2013 1 commit
  24. 29 Jul, 2013 2 commits
  25. 26 Jul, 2013 1 commit
    • Paul Wilkins's avatar
      Auto min and max partition size experiment. · fe5e2a91
      Paul Wilkins authored
      Speed feature experiment to set an upper and lower
      partition size limit based on what has been seen
      in spatial neighbors.
      
      This seems to gives quite reasonable speed gains in local
      (10-15%) and when used with speed 0 the losses are small
      (0.25% derf, 0.35% stdhd). However, for now I am only
      enabling it on speed 1 as there may be clashes with the existing
      temporal partition selection in speed 2.
      
      Using a tighter min / max around the range derived from the
      neighbors increases speed further but at the cost of a
      bigger quality loss. However,  I think this spatial method could
      be combined with data from either the last frame or a variance
      method (or both) to refine the range of minimum and maximum
      partition size. I.e. consider the min and max from spatial and
      temporal neighbors and the variance recommendation.
      
      Change-Id: I1b96bf8b84368d6aad0c7aa600fe141b4f07435f
      fe5e2a91
  26. 23 Jul, 2013 1 commit
    • Paul Wilkins's avatar
      Renaming of segment constants. · 32042af1
      Paul Wilkins authored
      Renamed:
        MAX_MB_SEGMENTS to MAX_SEGMENTS
        MB_SEG_TREE_PROBS to SEG_TREE_PROBS
      
      The minimum unit for segmentation in the segment map
      is now 8x8 so it is misleading to use MB_ as macro-block
      traditionally refers to a 16x16 region.
      
      Change-Id: I0b55a6f0426bb46dd13435fcfa5bae0a30a7fa22
      32042af1
  27. 22 Jul, 2013 1 commit
    • Paul Wilkins's avatar
      Re-order mode search in rd. · 1d189d64
      Paul Wilkins authored
      Mode search order in rd loop changed to better reflect
      observed hit counts.
      
      Also some adjustment of the baseline mode rd thresholds
      to reflect the order change and observed frequencies.
      
      Change-Id: I47a131cc83e11551df8add6d6d8d413d78d3a63c
      1d189d64
  28. 19 Jul, 2013 2 commits
    • Deb Mukherjee's avatar
      Reworked the auto_mv_step_size speed feature · 302698fb
      Deb Mukherjee authored
      This patch modifies the auto_mv_step_size speed feature to
      use a combination of the maximum magnitude mv from the last
      inter frame, and the maximum magnitude mv for the two reference
      mvs with the same reference. For arf frames, the max mav step
      for the resolution is used.
      The bounds therefore are slightly tighter. The feature is made
      a speed 1 feature.
      
      Rebased.
      
      Results (when this feature is turned on over speed 0):
      derfraw300: -0.046% psnr, about 5+% speedup
      (tested on football: goes from 4m30.760s to 4m17.410s).
      
      Change-Id: If492797a61b0b4b3e58c0b8f86afb880165fc9f6
      302698fb
    • Paul Wilkins's avatar
      Alignment of THR_MODES to vp9_mode_order[] · f3ed9f55
      Paul Wilkins authored
      Change-Id: I4032dd0442043543954dcb3724df974b7cc7e515
      f3ed9f55
  29. 18 Jul, 2013 1 commit
  30. 17 Jul, 2013 2 commits
    • Yunqing Wang's avatar
      Speed up motion estimation using small partitions' result(experiment) · df90d58f
      Yunqing Wang authored
      Current partition checking starts from small sizes, and then goes up
      to large sizes. This experiment uses the small partitions' motion
      estimation result, which is already available, to speed up the
      large partition's motion estimation. We can decide to skip some
      patition checkings if they are unlikely choices. We could use the
      motion vector(MV) result as current partition's prediction MV, limit
      the search range and reference frame.
      
      Current result at speed 1:
      psnr loss: 1.19% for stdhd, 0.287% for derf.
      speed gain: 14% for sunflower(hd), 11% for akiyo.
      
      Further improvement will be done later.
      
      Change-Id: I5abfd070e9cace2e91e2a0247d1325df313887ab
      df90d58f
    • Paul Wilkins's avatar
      Move uv intra mode selection in rd loop. · 2ee338ce
      Paul Wilkins authored
      Use an estimate based on DC_PRED for intra uv cost
      within the rd loop then only do a full uv mode analysis
      if an intra mode is chosen.
      
      Significant speed gains in some cases. Currently only
      enabled for speed 2 pending speed/quality tests.
      
      Change-Id: Ie851a12400d5483bce47ec0e3ccb8516041e91c0
      2ee338ce
  31. 16 Jul, 2013 1 commit
  32. 15 Jul, 2013 1 commit
    • Jingning Han's avatar
      Skip duplicate block encoding in the rd loop · faff6ed0
      Jingning Han authored
      This speed feature allows the encoder to largely remove the spatial
      dependency between blocks inside a 64x64 superblock, thereby removing
      the need to repeatedly encode superblocks per partition type in the
      rate-distortion optimization loop.
      
      A major challenge lies in the intra modes tested in the rate-distortion
      optimization loop. The subsequent blocks do not have access to the
      reconstructed boundary pixels without the intermediate coding steps.
      This was resolved by using the original pixels for intra prediction
      in the rd loop, followed by an appropriately designed distortion
      modeling on the quantization parameters. Experiments also suggested
      that the performance impact is more discernible at lower bit-rate/psnr
      settings. Hence a quantizer dependent threshold is applied to deactivate
      skip of block coding.
      
      For bus_cif at 2000 kbps,
      speed 0: runtime 269854ms -> 237774ms (12% speed-up) at 0.05dB
               performance loss.
      
      speed 1: runtime 65312ms  -> 61536ms, (7% speed-up) at 0.04dB
               performance loss.
      
      This operation is currently turned on in settings of speed 1.
      
      Change-Id: Ib689741dfff8dd38365d8c1b92860a3e176f56ec
      faff6ed0