1. 05 Dec, 2014 2 commits
    • Enable conditional skip path in rd_pick_intra_sby_mode · 74ded486
      Jingning Han authored
      These speed-up features for key frame coding are only turned on
      in the hybrid non-RD and RD mode decision settings. They provide
      about 20% speed-up to the hybrid key frame coding at the expense
      of some compression performance loss. For vidyo1, the key frame
      coding statistics change as follows:
      9838F, 35.020 dB, 61677 us -> 9920F, 34.834 dB, 47556 us
      
      Overall rtc set compression performance is down by 0.257%.
      
      Change-Id: I0025447fda26bb7855e982955642b5f55d71b51f
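      As a quick check on the vidyo1 figures above, the relative reduction
      in key frame encoding time can be computed directly from the two
      timings. This is a minimal sketch; the helper name
      time_reduction_percent is hypothetical and not part of the codebase.

```python
def time_reduction_percent(before_us: float, after_us: float) -> float:
    """Return the relative reduction in encoding time, in percent."""
    if before_us <= 0:
        raise ValueError("before_us must be positive")
    return 100.0 * (before_us - after_us) / before_us


# vidyo1 key frame timings quoted in the commit message (microseconds).
reduction = time_reduction_percent(61677, 47556)
print(f"{reduction:.1f}% less encoding time")
```

      The result is roughly in line with the "about 20%" figure quoted in
      the message.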
    • Use hybrid RD and non-RD coding flow for key frame coding · 07711e9b
      Jingning Han authored
      When the block size is below 16x16, the encoder switches from
      non-RD to RD mode for key frame coding. This largely brings back
      the key frame compression performance. For vidyo1 at 1000 kbps,
      the key frame coding statistics change as follows:
      
      9978F, 34.183 dB, 36807 us -> 9838F, 35.020 dB, 61677 us
      
      As compared to the full RD case
      7187F, 34.930 dB, 214470 us
      
      The overall rtc set coding performance (single key frame setting)
      is improved by 1.5%.
      
      Change-Id: I78a4ecf025d7b24ec911e85be94e01da05e77878
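      The block-size switch described above can be sketched as follows.
      This is an illustration only: the names ModeDecision and
      key_frame_mode_decision are hypothetical, and "below 16x16" is
      assumed to mean that both block dimensions are smaller than 16.

```python
from enum import Enum


class ModeDecision(Enum):
    NON_RD = "non-rd"
    RD = "rd"


def key_frame_mode_decision(block_width: int, block_height: int) -> ModeDecision:
    """Pick the mode decision path for one key frame block.

    Assumption: "below 16x16" is read as both dimensions being
    smaller than 16 pixels.
    """
    if block_width < 16 and block_height < 16:
        return ModeDecision.RD
    return ModeDecision.NON_RD


# Small blocks take the slower RD path, larger ones stay non-RD.
for size in (8, 16, 64):
    print(size, key_frame_mode_decision(size, size).value)
```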
  2. 03 Dec, 2014 2 commits
    • Enable non-rd mode coding on key frame, for speed 6. · 8fd3f9a2
      Marco authored
      For key frame at speed 6: enable the non-rd mode selection in speed setting
      and use the (non-rd) variance_based partition.
      
      Adjust some logic/thresholds in variance partition selection for key frame only (no change to delta frames),
      mainly to bias to selecting smaller prediction blocks, and also set max tx size of 16x16.
      
      Loss in key frame quality (~0.6-0.7dB) compared to rd coding,
      but speeds up key frame encoding by at least 6x.
      Average PSNR/SSIM metrics over RTC clips go down by ~1-2% for speed 6.
      
      Change-Id: Ie4845e0127e876337b9c105aa37e93b286193405
    • Rework coeff probability model update for rtc coding · 8fe50191
      Jingning Han authored
      This commit reworks the ONE_LOOP_REDUCED coefficient probability
      model update process. It allows model update for every coefficient
      across the spectrum at a coarser resolution, instead of performing
      precise update only for a certain subset of probability models.
      
      The overall runtime remains nearly the same (<1% change) for speed -6.
      The compression performance is improved by 7.5% in PSNR for speed
      -5 and 4.57% for speed -6, respectively.
      
      Change-Id: Ifb17136382ee7e39a9f34ff4a4f09a753125c8d1
  3. 26 Nov, 2014 1 commit
  4. 25 Nov, 2014 1 commit
    • vp9_ethread: modify VP9_COMP structure · edbd61e1
      Yunqing Wang authored
      This patch modifies struct VP9_COMP. It creates a struct ThreadData
      to hold the data that needs to be copied for each thread. In the
      multi-thread case, one thread processes one tile. All threads
      share one copy of VP9_COMP
      (refer to VP9_COMP *cpi in the code),
      but each thread has its own copy of ThreadData
      (refer to ThreadData *td in the code).
      Therefore, within the scope of encode_tiles(), both cpi and td
      need to be passed as function parameters.
      
      In the single-thread case, the FRAME_COUNTS pointer in ThreadData
      points to "counts" in VP9_COMMON.
      
      Change-Id: Ib37908b2d8e2c0f4f9c18f38017df5ce60e8b13e
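      The ownership split described above can be modeled roughly as
      follows. This is a toy sketch in Python, not the C structures
      themselves: the classes only stand in for VP9_COMMON, VP9_COMP and
      ThreadData, the helpers make_thread_data and encode_tile are
      hypothetical, and giving each thread a private counts table in the
      multi-thread case is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Common:
    """Stands in for VP9_COMMON, which owns the frame counts."""
    counts: Dict[str, int] = field(default_factory=dict)


@dataclass
class Comp:
    """Stands in for VP9_COMP; one instance is shared by all threads."""
    common: Common = field(default_factory=Common)


@dataclass
class ThreadData:
    """Stands in for ThreadData; one instance per thread."""
    counts: Dict[str, int]


def make_thread_data(cpi: Comp, num_threads: int) -> List[ThreadData]:
    if num_threads == 1:
        # Single thread: counts refers to the shared table in Common.
        return [ThreadData(counts=cpi.common.counts)]
    # Assumption: each thread gets its own private table otherwise.
    return [ThreadData(counts={}) for _ in range(num_threads)]


def encode_tile(cpi: Comp, td: ThreadData, tile_index: int) -> None:
    """Both the shared state (cpi) and per-thread state (td) are passed in."""
    td.counts["tiles"] = td.counts.get("tiles", 0) + 1


single = Comp()
tds = make_thread_data(single, 1)
encode_tile(single, tds[0], 0)
print(single.common.counts)
```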
  5. 24 Nov, 2014 1 commit
    • Key frame non-RD mode decision process · 2fbdfd2c
      Jingning Han authored
      This commit makes a non-RD coding mode decision process for key
      frame coding. It can be optionally turned on in speed -6 and above.
      
      Change-Id: I0847258b392877a0210b4768bef88ebc9ad009b5
  6. 21 Nov, 2014 2 commits
  7. 20 Nov, 2014 2 commits
  8. 17 Nov, 2014 1 commit
  9. 14 Nov, 2014 1 commit
  10. 13 Nov, 2014 1 commit
    • Prepare for dynamic frame resizing in the recode loop · 0d085ebc
      Adrian Grange authored
      Prepare for the introduction of frame-size change
      logic into the recode loop.
      
      Separated the speed dependent features into
      separate static and dynamic parts, the latter being
      those features that are dependent on the frame size.
      
      Change-Id: Ia693e28c5cf069a1a7bf12e49ecf83e440e1d313
  11. 11 Nov, 2014 1 commit
    • Use reconstructed pixels for intra prediction · e717d22b
      Jingning Han authored
      This commit makes speed -6 and above use the reconstructed
      boundary pixels for precise intra prediction. This allows more
      intra prediction modes to be tested in the non-RD coding process.
      
      Enabling horizontal and vertical intra prediction modes can
      improve the speed -6 compression performance for rtc set
      by 0.331%.
      
      Change-Id: I3a99f9d12c6af54de2bdbf28c76eab8e0905f744
  12. 01 Nov, 2014 1 commit
    • Fix speed 7 and speed 12 for rt · 0271ff77
      Yaowu Xu authored
      A recent change introduced big quality drops for speed 7 and 12
      in --rt mode. This change reverts the drop and improves quality
      by 9.5% for speed 7 and 13.4% for speed 12.
      
      Change-Id: I07b82e3bb6002a73af486a083458c88877bdad01
  13. 30 Oct, 2014 1 commit
  14. 29 Oct, 2014 1 commit
    • Enable mode search threshold update in non-RD coding mode · 9349a28e
      Jingning Han authored
      Adaptively adjust the mode thresholds after each mode search round
      to skip checking less likely selected modes. Local tests indicate
      5% - 10% speed-up in speed -5 and -6. The average coding performance
      change is -1.055%.
      
      speed -5
      vidyo1 720p 1000 kbps
      16533 b/f, 40.851 dB, 12607 ms -> 16556 b/f, 40.796 dB, 11831 ms
      
      nik 720p 1000 kbps
      33229 b/f, 39.127 dB, 11468 ms -> 33235 b/f, 39.131 dB, 10919 ms
      
      speed -6
      vidyo1 720p 1000 kbps
      16549 b/f, 40.268 dB, 10138 ms -> 16538 b/f, 40.212 dB, 8456 ms
      
      nik 720p 1000 kbps
      33271 b/f, 38.433 dB,  7886 ms -> 33279 b/f, 38.416 dB, 7843 ms
      
      Change-Id: I2c2963f1ce4ed9c1cf233b5b2c880b682e1c1e8b
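      One way to picture the adaptive threshold update is a per-mode
      scaling factor that grows for modes that keep losing and shrinks
      for the mode that wins. The sketch below is an assumption about the
      general shape of such a scheme, not the encoder's actual update
      rule: the constants MAX_FACTOR and STEP, and both function names,
      are hypothetical.

```python
from typing import Dict

MAX_FACTOR = 4.0  # hypothetical cap on the threshold scaling
STEP = 0.25       # hypothetical adjustment per search round


def update_mode_factors(factors: Dict[str, float], best_mode: str) -> None:
    """After one search round, favor the winner and penalize the rest."""
    for mode in factors:
        if mode == best_mode:
            factors[mode] = max(1.0, factors[mode] - STEP)
        else:
            factors[mode] = min(MAX_FACTOR, factors[mode] + STEP)


def should_skip_mode(best_cost_so_far: float, base_threshold: float,
                     factor: float) -> bool:
    """Skip a mode when the best cost is already under its scaled threshold."""
    return best_cost_so_far < base_threshold * factor


factors = {"DC_PRED": 1.0, "V_PRED": 1.0, "H_PRED": 1.0}
for _ in range(3):
    update_mode_factors(factors, "DC_PRED")
print(factors)
```

      After a few rounds in which DC_PRED wins, the other modes carry
      larger factors and are skipped more readily.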
  15. 21 Oct, 2014 3 commits
  16. 20 Oct, 2014 1 commit
    • Hybrid partition search for rtc coding mode · 9f128b3e
      Jingning Han authored
      This commit re-designs the recursive partition search scheme in
      rtc speed -5. It first checks if the current block is under cyclic
      refresh mode. If so, apply recursive partition search. Otherwise,
      perform sub-sampled pixel based partition selection. When the
      pre-selection finds the partition size should be 32x32 or above,
      use the partition size directly. Otherwise, apply partition search
      at nearby levels around the preset partition size.
      
      It is enabled in speed -5. The compression performance of rtc
      speed -5 is improved by 9.4%. Speed-wise, the run time increases
      by 1% to 10%.
      
      nik_720p, 1000 kbps
      33220 b/f, 38.977 dB, 10109 ms -> 33200 b/f, 39.119 dB, 10210 ms
      
      vidyo1_720p, 1000 kbps
      16536 b/f, 40.495 dB, 10119 ms -> 16536 b/f, 40.827 dB, 11287 ms
      
      Change-Id: I65adba352e3adc03bae50854ddaea1b421653c6c
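      The decision flow described above can be sketched as a function
      that returns the partition sizes to evaluate for one block. This is
      a simplified illustration: the function name, the restriction to
      square sizes, and the reading of "nearby levels" as one level on
      either side of the pre-selected size are all assumptions.

```python
from typing import List

# Square partition sizes, from largest to smallest.
PARTITION_LEVELS = [64, 32, 16, 8]


def partition_candidates(in_cyclic_refresh: bool,
                         preselected_size: int) -> List[int]:
    """Return the partition sizes to evaluate for one block."""
    if preselected_size not in PARTITION_LEVELS:
        raise ValueError(f"unsupported partition size: {preselected_size}")
    if in_cyclic_refresh:
        # Blocks under cyclic refresh get the full recursive search.
        return list(PARTITION_LEVELS)
    if preselected_size >= 32:
        # Large pre-selected partitions are used directly.
        return [preselected_size]
    # Otherwise search one level on either side of the pre-selection.
    idx = PARTITION_LEVELS.index(preselected_size)
    return PARTITION_LEVELS[max(0, idx - 1):idx + 2]


print(partition_candidates(False, 64))
print(partition_candidates(False, 16))
print(partition_candidates(True, 16))
```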
  17. 15 Oct, 2014 2 commits
    • Use rate/distortion thresholds to control non-RD partition search · 5e766cce
      Jingning Han authored
      Compare the estimated rate and distortion to the thresholds scaled
      according to the operating block size and determine if further
      split partition search will be run. The compression performance of
      speed -5 is changed by -0.074%. The encoding speed is 10% - 15%
      faster.
      
      vidyo1 720p
      16545 b/f, 40.492 dB, 11475 ms -> 16535 b/f, 40.486 dB, 10100 ms
      
      nik720p
      16624 b/f, 36.310 dB, 10071 ms -> 16617 b/f, 36.313 dB, 8346 ms
      
      Change-Id: Ic9197ab5761279ae55d2fb7813b2af0e0db497b8
    • Replace copy_partitioning use case with choose_partitioning · 89b8c7a5
      Jingning Han authored
      This commit replaces the use of copy_partitioning with
      choose_partitioning based on the sse of subsampled pixels, which
      provides significantly better coding performance and runs at
      similar speed, as compared to copy_partitioning. It improves rtc
      speed 5 coding performance by 3%.
      
      Change-Id: I52d3682a12dce0147f5e52383a594fc242ca3228
  18. 09 Oct, 2014 2 commits
    • Subpel search cleanups and enhancements · d78dbff0
      Deb Mukherjee authored
      - Some fixes to surface fit.
      - Returns variance function as cost rather than sad in the
        pattern search and diamond search functions. Only
        vp9_pattern_search_sad function used in bigdia search
        uses sad as integer 1-away costs.
      - Deploys SUBPEL_TREE_PRUNED_MORE for speed 4+.
      
      Results:
      derf [Speed 3]: About +0.036% in coding efficiency without any
      discernible speed loss.
      derf [Speed 4]: About 2-3% faster at -0.199% loss in coding efficiency.
      derf [Speed 5]: About 3-4% faster at -0.149% loss in coding efficiency.
      
      Change-Id: I8462f94f6adb46966ca964f2bd0400977357fd63
    • Allow mode search breakout at very low prediction errors · e18edd5e
      Yunqing Wang authored
      In model_rd_for_sb function, the spatial domain SSE and variance
      are checked to see if transform coefficients are quantized to 0.
      Besides that, this patch adds another set of thresholds that are
      much more strict. These thresholds are used to conduct a partition
      block level check to measure if all its TX blocks are skippable
      for YUV planes. If it is true, x->skip is set for this partition
      block, and thus its mode search is terminated.
      
      This speeds up encoding in the very low prediction error case,
      such as screen sharing applications. This patch covers what
      rd_encode_breakout_test() does, so that function is removed.
      
      Borg test at speed 3 shows:
      For stdhd set, psnr: +0.008%, ssim: +0.014%;
      For derf set, psnr: +0.018%, ssim: +0.025%.
      No noticeable speed change.
      
      Change-Id: I4e5f15cf10016a282a68e35175ff854b28195944
  19. 07 Oct, 2014 1 commit
    • experimental : partition using 1/8 x 1/8 image · 0ce51d82
      Jim Bankoski authored
      The concept:
      
      There's too much noise in source pixels for variance, and at low
      bitrate the reconstructed frame looks nothing like the source, so we
      have problems getting good partitionings with either. This skirts the
      issue by using a box-blurred, scaled-down version for variance
      calculations. To compare against source_var_, moved keyframe to be
      rd based like source_var.
      
      Change-Id: Ie3babdbfadae324b7b5a76bea192893af27f0624
  20. 03 Oct, 2014 1 commit
    • Rework partition search skip scheme · bb260d90
      Jingning Han authored
      This commit enables the encoder to skip split partition search if
      the bigger block size has all non-zero quantized coefficients in low
      frequency area and the total rate cost is below a certain threshold.
      It logarithmically scales the rate threshold according to the
      current block size. For speed 3, the compression performance loss:
      derf  -0.093%
      stdhd -0.066%
      
      Local experiments show 4% - 20% encoding speed-up for speed 3.
      blue_sky_1080p, 1500 kbps
      51051 b/f, 35.891 dB, 67236 ms ->
      50554 b/f, 35.857 dB, 59270 ms (12% speed-up)
      
      old_town_cross_720p, 1500 kbps
      14431 b/f, 36.249 dB, 57687 ms ->
      14108 b/f, 36.172 dB, 46586 ms (19% speed-up)
      
      pedestrian_area_1080p, 1500 kbps
      50812 b/f, 40.124 dB, 100439 ms ->
      50755 b/f, 40.118 dB,  96549 ms (4% speed-up)
      
      mobile_calendar_720p, 1000 kbps
      10352 b/f, 35.055 dB, 51837 ms ->
      10172 b/f, 35.003 dB, 44076 ms (15% speed-up)
      
      Change-Id: I412e34db49060775b3b89ba1738522317c3239c8
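      The skip test described above can be sketched as follows. The base
      threshold value, the choice of 8x8 as the reference size, and the
      exact scaling formula are assumptions made for illustration; only
      the idea of a logarithmic dependence on block size comes from the
      commit message.

```python
import math

BASE_RATE_THRESHOLD = 100  # hypothetical threshold for an 8x8 block


def rate_threshold(block_size: int) -> float:
    """Scale the rate threshold with log2 of the block size relative to 8."""
    if block_size < 8:
        raise ValueError("block_size must be at least 8")
    return BASE_RATE_THRESHOLD * (1 + math.log2(block_size / 8))


def skip_split_search(block_size: int, rate_cost: float,
                      nonzero_coeffs_all_low_freq: bool) -> bool:
    """Skip the split search when both conditions from the message hold."""
    return (nonzero_coeffs_all_low_freq
            and rate_cost < rate_threshold(block_size))


for size in (8, 16, 32, 64):
    print(size, rate_threshold(size))
```

      With this scaling, a 64x64 block tolerates a rate cost four times
      that of an 8x8 block before the split search is still attempted.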
  21. 29 Sep, 2014 1 commit
    • Adds two new subpel search methods · 4e9c0d2a
      Deb Mukherjee authored
      One is a more aggressive version of the pruned subpel tree
      search where only a single halfpel candidate is searched.
      The search candidate is based on a surface fit result.
      The other is a method to obtain the subpel position at one
      shot based on the same surface fit.
      
      The methods have not been deployed in any speed setting yet.
      
      Change-Id: I34fef3f2e34f11396c9d1ba97f4be8c4ffca62d3
  22. 26 Sep, 2014 1 commit
    • Skip the partition search for still frames · 1fcbf6ed
      Yunqing Wang authored
      This patch re-enables the feature in Pengchong's patch
      (commit 12861260). Originally, it
      was turned on when use_lastframe_partitioning > 0 (no longer used).
      Now it is added as a feature and turned on when speed >= 2.
      As described in the original patch, this feature helps speed up
      slideshows on YouTube.
      
      Change-Id: I1b0f18d65da1ee1c8d1e117dabba910c5207c471
  23. 23 Sep, 2014 2 commits
    • Adapt mode based rd_threshold for similar block size · 4a101310
      Yaowu Xu authored
      The rd_thresholds are adaptively changed based on the best mode
      tested. Previously they were only changed for the same block size;
      this commit extends the adaptation to similar block sizes too. The
      commit also makes minor adjustments and code cleanups.
      
      The impact on encoding time for _ped:
      118089 ms -> 111927 ms
      
      The impact on compression:
      derf:  -0.339%
      stdhd: -0.303%
      
      Change-Id: I8817fed1102350497f2ec631849e43f753878e5d
    • Pruned subpel search for speed 3. · c94b17f4
      Deb Mukherjee authored
      Adds code to return an integer cost list for NSTEP search. Then
      uses it for pruned subpel search in speed 3.
      
      derf: -0.06%
      Speed on mobcal 720p increases from 10.28 fps to 10.65 fps.
      [Subject to further testing].
      
      Change-Id: Ib591382d25b2c11bcaba9d3a27a93a9d1ab27a96
  24. 22 Sep, 2014 1 commit
    • Adaptive mode search scheduling · eee904c9
      Jingning Han authored
      This commit enables an adaptive mode search order scheduling scheme
      in the rate-distortion optimization. It changes the compression
      performance by -0.433% and -0.420% for derf and stdhd respectively.
      It provides speed improvement for speed 3:
      
      bus CIF 1000 kbps
      24590 b/f, 35.513 dB, 7864 ms ->
      24696 b/f, 35.491 dB, 7408 ms (6% speed-up)
      
      stockholm 720p 1000 kbps
      8983 b/f, 35.078 dB, 65698 ms ->
      8962 b/f, 35.054 dB, 60298 ms (8%)
      
      old_town_cross 720p 1000 kbps
      11804 b/f, 35.666 dB, 62492 ms ->
      11778 b/f, 35.609 dB, 56040 ms (10%)
      
      blue_sky 1080p 1500 kbps
      57173 b/f, 36.179 dB, 77879 ms ->
      57199 b/f, 36.131 dB, 69821 ms (10%)
      
      pedestrian_area 1080p 2000 kbps
      74241 b/f, 41.105 dB, 144031 ms ->
      74271 b/f, 41.091 dB, 133614 ms (8%)
      
      Change-Id: Iaad28cbc99399030fc5f9951eb5aa7fa633f320e
  25. 12 Sep, 2014 1 commit
    • Use bigdia search with pruned subpel search · 83c76118
      Deb Mukherjee authored
      Improves function to return sad of integer pels by reusing integer
      pels already visited in the smallest scale.
      Turns on BIGDIA search for speed 4. Also, turns on the
      first version of the pruned subpel search at this speed.
      
      derf: -0.32% (speed 4)
      
      Speed seems to improve by at least 5% but subject to verification.
      
      Change-Id: Iaec8eaffd61d6237ac029e6a2a1b0a88b2a35271
  26. 11 Sep, 2014 2 commits
    • Remove unused speed feature · 00fe92c2
      Jingning Han authored
      The speed feature that skips compound inter prediction modes was
      subsumed by other speed features and effectively was not in use.
      This commit removes it.
      
      Change-Id: I22b0c71a8ddd15d93b25d86fa63a1dce2ba6a1a9
    • Refactor to remove speed feature dependency on mode search order · f9f08797
      Jingning Han authored
      This commit refactors the rate-distortion optimization search for
      regular block sizes to remove the speed feature dependency on mode
      search order.
      
      Change-Id: Ied033ee484c2957e17baa7b6450b720fe7dd0e7d
  27. 09 Sep, 2014 1 commit
    • Remove the use of use_lastframe_partitioning at speed 4 · f10d7eed
      Yunqing Wang authored
      The use of use_lastframe_partitioning is totally removed in good-
      quality encoding. Its usage in real-time encoding needs to be
      evaluated to see if it can be removed too.
      
      The Borg tests at speed 4 showed:
      stdhd set: 0.220% psnr gain, 0.166% ssim gain;
      derf set:  0.329% psnr gain, 0.476% ssim gain.
      
      Speed test on selected clips showed 1.54% speedup. (Worst case:
      pedestrian_area_1080p25.y4m, speed loss: 1.5%)
      
      Change-Id: I1c844d329b0b5678558439b887297c1be7ddab00
  28. 05 Sep, 2014 1 commit
    • No longer use use_lastframe_partitioning speed feature · 10921403
      Yunqing Wang authored
      The speedup in rd_pick_partition() function makes it possible
      to drop use_lastframe_partitioning feature. By doing that, we
      achieve good PSNR gain with small speed loss. Also, this makes
      encoding loop less complicated. The code cleanup patch will
      follow.
      
      Borg tests showed:
      1. At speed 2,
         stdhd set: 0.201% PSNR gain, 0.133% SSIM gain;
         derf set:  0.262% PSNR gain, 0.276% SSIM gain.
      2. At speed 3,
         stdhd set: 0.139% PSNR gain, 0.109% SSIM gain;
         derf set:  0.447% PSNR gain, 0.442% SSIM gain.
      
      The average speed loss over selected test clips is within 1%
      with the worst case of 4%.
      
      Change-Id: Icfd2ded7869372b585a6972855d933b3d0280d90
  29. 03 Sep, 2014 2 commits
    • Change last_partition_redo_frequency for speed 3 · 7a337124
      Yaowu Xu authored
      Change it from 3 to 2, which seems to be slightly positive on
      compression for all test sets and also reduces encoding time by
      2%-5%, depending on the test clip.
      
      Change-Id: If045417bd27311700c919b4a335eff0dc1130ae0
    • Remove redundant code · cdda17ed
      Yaowu Xu authored
      Change-Id: I453b167f03811a3cd3592089593b3f2823f62ab3