1. 09 Oct, 2014 1 commit
    • Subpel search cleanups and enhancements · d78dbff0
      Deb Mukherjee authored
      - Some fixes to surface fit.
      - Returns variance function as cost rather than SAD in the
        pattern search and diamond search functions. Only the
        vp9_pattern_search_sad function, used in bigdia search,
        uses SAD as the integer 1-away costs.
      - Deploys SUBPEL_TREE_PRUNED_MORE for speed 4+.
      
      Results:
      derf [Speed 3]: About +0.036% in coding efficiency without any
      discernible speed loss.
      derf [Speed 4]: About 2-3% faster at -0.199% loss in coding efficiency.
      derf [Speed 5]: About 3-4% faster at -0.149% loss in coding efficiency.
      
      Change-Id: I8462f94f6adb46966ca964f2bd0400977357fd63
  2. 03 Oct, 2014 1 commit
    • Rework partition search skip scheme · bb260d90
      Jingning Han authored
      This commit enables the encoder to skip the split partition search if
      the bigger block size has all non-zero quantized coefficients in the low
      frequency area and the total rate cost is below a certain threshold.
      It logarithmically scales the rate threshold according to the
      current block size. For speed 3, the compression performance loss is:
      derf  -0.093%
      stdhd -0.066%
      
      Local experiments show 4% - 20% encoding speed-up for speed 3.
      blue_sky_1080p, 1500 kbps
      51051 b/f, 35.891 dB, 67236 ms ->
      50554 b/f, 35.857 dB, 59270 ms (12% speed-up)
      
      old_town_cross_720p, 1500 kbps
      14431 b/f, 36.249 dB, 57687 ms ->
      14108 b/f, 36.172 dB, 46586 ms (19% speed-up)
      
      pedestrian_area_1080p, 1500 kbps
      50812 b/f, 40.124 dB, 100439 ms ->
      50755 b/f, 40.118 dB,  96549 ms (4% speed-up)
      
      mobile_calendar_720p, 1000 kbps
      10352 b/f, 35.055 dB, 51837 ms ->
      10172 b/f, 35.003 dB, 44076 ms (15% speed-up)
      
      Change-Id: I412e34db49060775b3b89ba1738522317c3239c8
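
The skip rule described in this commit can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the libvpx code: the function names, the `base_rate_thresh` constant, and the exact form of the logarithmic scaling are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Integer log2 of a power-of-two block width (e.g. 64 -> 6). */
static int block_width_log2(int width) {
  int n = 0;
  assert(width > 0);
  while (width > 1) {
    width >>= 1;
    ++n;
  }
  return n;
}

/* Hypothetical skip test. base_rate_thresh is an assumed tuning constant;
 * the effective threshold grows with log2 of the block width, so larger
 * blocks tolerate a higher rate cost before the split search is skipped. */
static bool skip_split_search(int block_width, bool nonzero_coeffs_low_freq,
                              int rate_cost, int base_rate_thresh) {
  const int rate_thresh = base_rate_thresh * block_width_log2(block_width);
  return nonzero_coeffs_low_freq && rate_cost < rate_thresh;
}
```

With an assumed base threshold of 100, a 64-wide block with a rate cost of 500 would skip the split search, while an 8-wide block with the same cost would not.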
  3. 29 Sep, 2014 1 commit
    • Adds two new subpel search methods · 4e9c0d2a
      Deb Mukherjee authored
      One is a more aggressive version of the pruned subpel tree
      search where only a single halfpel candidate is searched.
      The search candidate is based on a surface fit result.
      The other is a method to obtain the subpel position at one
      shot based on the same surface fit.
      
      The methods have not been deployed in any speed setting yet.
      
      Change-Id: I34fef3f2e34f11396c9d1ba97f4be8c4ffca62d3
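
The commit does not spell out the surface fit. One common way to estimate a subpel position "at one shot" is to fit a parabola through the cost at the best integer position and its two 1-away neighbors along each axis. The sketch below assumes that separable fit and a 1/8-pel output; the function name and rounding are hypothetical and may differ from what the commit implements.

```c
#include <assert.h>

/* Fit a parabola through the costs at offsets -1, 0 and +1 along one axis
 * and return the offset of its minimum in 1/8-pel units.
 * Returns 0 when the three points are not strictly convex. */
static int subpel_offset_q3(int cost_minus, int cost_center, int cost_plus) {
  const int curvature = cost_minus + cost_plus - 2 * cost_center;
  double offset;
  assert(cost_minus >= 0 && cost_center >= 0 && cost_plus >= 0);
  if (curvature <= 0) return 0;
  /* Vertex of the parabola: (c(-1) - c(+1)) / (2 * curvature), in pels,
   * scaled by 8 to express it in 1/8-pel units. */
  offset = 8.0 * (cost_minus - cost_plus) / (2.0 * curvature);
  /* Clamp to half a pel on either side of the integer position. */
  if (offset > 4.0) offset = 4.0;
  if (offset < -4.0) offset = -4.0;
  /* Round to nearest, halves away from zero. */
  return (int)(offset >= 0.0 ? offset + 0.5 : offset - 0.5);
}
```

Calling the function once with the left/center/right costs and once with the above/center/below costs gives a candidate (column, row) subpel offset without evaluating any subpel positions.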
  4. 26 Sep, 2014 1 commit
    • Skip the partition search for still frames · 1fcbf6ed
      Yunqing Wang authored
      This patch re-enables the feature from Pengchong's patch
      (commit 12861260). Originally, it
      was turned on while use_lastframe_partitioning > 0 (no longer used).
      Now it is added as a separate feature, and turned on while speed >= 2.
      As described in the original patch, this feature helps speed up the
      encoding of YouTube slideshows.
      
      Change-Id: I1b0f18d65da1ee1c8d1e117dabba910c5207c471
  5. 23 Sep, 2014 2 commits
    • Adapt mode based rd_threshold for similar block size · 4a101310
      Yaowu Xu authored
      The rd_thresholds are adaptively changed based on the best mode tested.
      Previously they were changed only for the same block size; this commit
      applies the adaptation to similar block sizes too. The commit also makes
      minor adjustments and code cleanups.
      
      The impact on encoding time for _ped:
      118089 ms -> 111927 ms
      
      The impact on compression:
      derf:  -0.339%
      stdhd: -0.303%
      
      Change-Id: I8817fed1102350497f2ec631849e43f753878e5d
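
The adaptation described in this commit might look roughly like the sketch below. It assumes that "similar" means the block sizes one step smaller and one step larger, and it uses made-up constants and array dimensions; none of these come from the commit.

```c
#include <assert.h>

enum { NUM_BLOCK_SIZES = 6, NUM_MODES = 4 }; /* illustrative sizes only */
enum { FACT_INC = 1, FACT_MAX = 64 };        /* assumed constants */

/* fact[bs][mode] scales the RD threshold for trying `mode` at size `bs`.
 * After a block of size `bsize` picks `best_mode`, adapt the factors for
 * that size and for its immediate neighbors in the size order. */
static void update_thresh_fact(int fact[NUM_BLOCK_SIZES][NUM_MODES],
                               int bsize, int best_mode) {
  const int lo = bsize > 0 ? bsize - 1 : 0;
  const int hi = bsize < NUM_BLOCK_SIZES - 1 ? bsize + 1 : NUM_BLOCK_SIZES - 1;
  int bs, mode;
  assert(bsize >= 0 && bsize < NUM_BLOCK_SIZES);
  for (bs = lo; bs <= hi; ++bs) {
    for (mode = 0; mode < NUM_MODES; ++mode) {
      int *const f = &fact[bs][mode];
      if (mode == best_mode) {
        *f -= *f >> 2; /* winning mode: lower its threshold factor */
      } else if (*f + FACT_INC <= FACT_MAX) {
        *f += FACT_INC; /* losing mode: raise it, up to a cap */
      }
    }
  }
}
```

Sharing the update across neighboring sizes lets the thresholds adapt faster, since each coded block then contributes evidence to more than one table row.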
    • Pruned subpel search for speed 3. · c94b17f4
      Deb Mukherjee authored
      Adds code to return an integer cost list for NSTEP search. Then
      uses it for pruned subpel search in speed 3.
      
      derf: -0.06%
      Speed on mobcal 720p increases from 10.28 fps to 10.65 fps.
      [Subject to further testing].
      
      Change-Id: Ib591382d25b2c11bcaba9d3a27a93a9d1ab27a96
  6. 22 Sep, 2014 1 commit
    • Adaptive mode search scheduling · eee904c9
      Jingning Han authored
      This commit enables an adaptive mode search order scheduling scheme
      in the rate-distortion optimization. It changes the compression
      performance by -0.433% and -0.420% for derf and stdhd respectively.
      It provides speed improvement for speed 3:
      
      bus CIF 1000 kbps
      24590 b/f, 35.513 dB, 7864 ms ->
      24696 b/f, 35.491 dB, 7408 ms (6% speed-up)
      
      stockholm 720p 1000 kbps
      8983 b/f, 35.078 dB, 65698 ms ->
      8962 b/f, 35.054 dB, 60298 ms (8%)
      
      old_town_cross 720p 1000 kbps
      11804 b/f, 35.666 dB, 62492 ms ->
      11778 b/f, 35.609 dB, 56040 ms (10%)
      
      blue_sky 1080p 1500 kbps
      57173 b/f, 36.179 dB, 77879 ms ->
      57199 b/f, 36.131 dB, 69821 ms (10%)
      
      pedestrian_area 1080p 2000 kbps
      74241 b/f, 41.105 dB, 144031 ms ->
      74271 b/f, 41.091 dB, 133614 ms (8%)
      
      Change-Id: Iaad28cbc99399030fc5f9951eb5aa7fa633f320e
  7. 12 Sep, 2014 1 commit
    • Use bigdia search with pruned subpel search · 83c76118
      Deb Mukherjee authored
      Improves function to return sad of integer pels by reusing integer
      pels already visited in the smallest scale.
      Turns on BIGDIA search for speed 4. Also, turns on the
      first version of the pruned subpel search at this speed.
      
      derf: -0.32% (speed 4)
      
      Speed seems to improve by at least 5% but subject to verification.
      
      Change-Id: Iaec8eaffd61d6237ac029e6a2a1b0a88b2a35271
  8. 11 Sep, 2014 2 commits
    • Remove unused speed feature · 00fe92c2
      Jingning Han authored
      The speed feature that skips compound inter prediction modes was
      subsumed by other speed features and effectively was not in use.
      This commit removes it.
      
      Change-Id: I22b0c71a8ddd15d93b25d86fa63a1dce2ba6a1a9
    • Refactor to remove speed feature dependency on mode search order · f9f08797
      Jingning Han authored
      This commit refactors the rate-distortion optimization search for
      regular block sizes to remove the speed feature dependency on mode
      search order.
      
      Change-Id: Ied033ee484c2957e17baa7b6450b720fe7dd0e7d
  9. 09 Sep, 2014 1 commit
    • Remove the use of use_lastframe_partitioning at speed 4 · f10d7eed
      Yunqing Wang authored
      The use of use_lastframe_partitioning is totally removed in good-
      quality encoding. Its usage in real-time encoding needs to be
      evaluated to see if it can be removed too.
      
      The Borg tests at speed 4 showed:
      stdhd set: 0.220% psnr gain, 0.166% ssim gain;
      derf set:  0.329% psnr gain, 0.476% ssim gain.
      
      Speed test on selected clips showed 1.54% speedup. (Worst case:
      pedestrian_area_1080p25.y4m, speed loss: 1.5%)
      
      Change-Id: I1c844d329b0b5678558439b887297c1be7ddab00
  10. 05 Sep, 2014 1 commit
    • No longer use use_lastframe_partitioning speed feature · 10921403
      Yunqing Wang authored
      The speedup in the rd_pick_partition() function makes it possible
      to drop the use_lastframe_partitioning feature. By doing that, we
      achieve good PSNR gain with small speed loss. Also, this makes
      the encoding loop less complicated. The code cleanup patch will
      follow.
      
      Borg tests showed:
      1. At speed 2,
         stdhd set: 0.201% PSNR gain, 0.133% SSIM gain;
         derf set:  0.262% PSNR gain, 0.276% SSIM gain.
      2. At speed 3,
         stdhd set: 0.139% PSNR gain, 0.109% SSIM gain;
         derf set:  0.447% PSNR gain, 0.442% SSIM gain.
      
      The average speed loss over selected test clips is within 1%
      with the worst case of 4%.
      
      Change-Id: Icfd2ded7869372b585a6972855d933b3d0280d90
  11. 03 Sep, 2014 2 commits
    • Change last_partition_redo_frequency for speed 3 · 7a337124
      Yaowu Xu authored
      From 3 to 2, which seems to be slightly positive on compression for
      all test sets and also reduces encoding time by 2%-5%, depending on
      the test clip.
      
      Change-Id: If045417bd27311700c919b4a335eff0dc1130ae0
    • Remove redundant code · cdda17ed
      Yaowu Xu authored
      Change-Id: I453b167f03811a3cd3592089593b3f2823f62ab3
  12. 29 Aug, 2014 1 commit
    • Skip intra mode tests depending on inter residuals · 4282955e
      Jingning Han authored
      This commit allows the encoder to skip the intra coding mode test when
      the known inter residual is less than the source variance. It
      reduces the runtime of speed 3 for test clips:
      bus cif 1000 kbps: 8587 ms -> 8260 ms, 3.8% speed-up
      pedestrian 1080p 2000 kbps: 161381 ms -> 155241 ms, 3.7% speed-up.
      
      The compression performance is down by
      derf   -0.36%
      stdhd  -0.25%
      
      Change-Id: I75ce1e035b4da2153cb1ac14111d1a07c05a735d
  13. 28 Aug, 2014 2 commits
    • Early termination in encoding partition search · 4d2c3769
      Yunqing Wang authored
      In the partition search, the encoder checks all possible
      partitionings in the superblock's partition search tree.
      This patch proposes a set of criteria for partition search
      early termination, which decide whether or not to terminate
      the search in the current branch based on the
      "skippable" result of the quantized transform coefficients.
      The "skippable" information is gathered during the
      partition mode search, and no overhead calculations are
      introduced.
      
      This patch gives significant encoding speed gains without
      sacrificing the quality.
      
      Borg test results:
      1. At speed 1,
         stdhd set: psnr: +0.074%, ssim: +0.093%;
         derf set:  psnr: -0.024%, ssim: +0.011%;
      2. At speed 2,
         stdhd set: psnr: +0.033%, ssim: +0.100%;
         derf set:  psnr: -0.062%, ssim: +0.003%;
      3. At speed 3,
         stdhd set: psnr: +0.060%, ssim: +0.190%;
         derf set:  psnr: -0.064%, ssim: -0.002%;
      4. At speed 4,
         stdhd set: psnr: +0.070%, ssim: +0.143%;
         derf set:  psnr: -0.104%, ssim: +0.039%;
      
      The speedup ranges from several percent to 60+%.
                       speed1    speed2    speed3    speed4
      (1080p, 100f):
      old_town_cross:  48.2%     23.9%     20.8%     16.5%
      park_joy:        11.4%     17.8%     29.4%     18.2%
      pedestrian_area: 10.7%      4.0%      4.2%      2.4%
      (720p, 200f):
      mobcal:          68.1%     36.3%     34.4%     17.7%
      parkrun:         15.8%     24.2%     37.1%     16.8%
      shields:         45.1%     32.8%     30.1%      9.6%
      (cif, 300f):
      bus:              3.7%     10.4%     14.0%      7.9%
      deadline:        13.6%     14.8%     12.6%     10.9%
      mobile:           5.3%     11.5%     14.7%     10.7%
      
      Change-Id: I246c38fb952ad762ce5e365711235b605f470a66
    • Updates vp9_pattern search to return integer sads · 04b100b2
      Deb Mukherjee authored
      Updates the vp9_pattern_search function to return integer one-away
      neighbors' sad values, for subsequent use in speeding up the
      sub-pel search. Also, removes code for the do_refine option
      which is not being used currently.
      Updates the integer and subpel functions to pass in a 5-element
      sad list for output or input.
      
      A new pruned sub-pel search algorithm is implemented that uses
      the sad returned from the integer pel search. But it is not
      deployed yet.
      
      Change-Id: Ifa9f5ad024b5b660570366d2bd900343e1891520
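
To make the 5-element SAD list concrete, the sketch below computes the SAD at the best integer position and at its four 1-away neighbors. The function names and the list order (center, left, below, right, above) are assumptions for illustration. The commit returns these values from the pattern search, whereas this sketch simply recomputes them.

```c
#include <assert.h>
#include <stdlib.h>

/* Sum of absolute differences between a w x h source and reference block. */
static unsigned int block_sad(const unsigned char *src, int src_stride,
                              const unsigned char *ref, int ref_stride,
                              int w, int h) {
  unsigned int sad = 0;
  int r, c;
  for (r = 0; r < h; ++r) {
    for (c = 0; c < w; ++c) {
      sad += (unsigned int)abs(src[r * src_stride + c] -
                               ref[r * ref_stride + c]);
    }
  }
  return sad;
}

/* Fill sad_list with the SAD at the best integer position (index 0) and at
 * its left, below, right and above neighbors (indices 1..4, assumed order).
 * The caller must guarantee a 1-pixel border around the reference block. */
static void fill_sad_list(const unsigned char *src, int src_stride,
                          const unsigned char *ref_center, int ref_stride,
                          int w, int h, unsigned int sad_list[5]) {
  static const int d_row[5] = { 0, 0, 1, 0, -1 };
  static const int d_col[5] = { 0, -1, 0, 1, 0 };
  int i;
  assert(w > 0 && h > 0);
  for (i = 0; i < 5; ++i) {
    const unsigned char *ref = ref_center + d_row[i] * ref_stride + d_col[i];
    sad_list[i] = block_sad(src, src_stride, ref, ref_stride, w, h);
  }
}
```

A subpel search can use such a list to guess which side of the integer position the true minimum lies on, and prune candidates on the other side.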
  14. 26 Aug, 2014 1 commit
    • add a new interp filter search strategy. · 1144fee3
      Yaowu Xu authored
      This commit adds a new strategy to reduce the search for the optimal
      interpolation filter type. The encoder counts and stores how many times
      each filter type is selected and used for each of the reference frames.
      A filter type that is rarely used for all three reference frames is
      masked out to avoid computation.
      
      The impact on compression is negligible:
      -0.02% on derf
      +0.02% on stdhd
      
      Encoding time is seen to reduce by 2~3%.
      
      Change-Id: Ibafa92291b51185de40da513716222db4b230383
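
The counting and masking step can be sketched as follows. The array layout, the function name, and the `min_count` cutoff for "rarely used" are assumptions. The commit does not state the actual criterion.

```c
#include <assert.h>

enum { NUM_REFS = 3, NUM_FILTERS = 3 }; /* three reference frames, as in the
                                           commit; filter count is assumed */

/* counts[ref][filter] holds how often each filter type was selected for each
 * reference frame. Returns a bitmask with bit f set when filter f was chosen
 * fewer than min_count times for every reference frame, meaning the filter
 * search can skip it. */
static unsigned int filter_skip_mask(const int counts[NUM_REFS][NUM_FILTERS],
                                     int min_count) {
  unsigned int mask = 0;
  int ref, f;
  assert(min_count >= 0);
  for (f = 0; f < NUM_FILTERS; ++f) {
    int rare_everywhere = 1;
    for (ref = 0; ref < NUM_REFS; ++ref) {
      if (counts[ref][f] >= min_count) rare_everywhere = 0;
    }
    if (rare_everywhere) mask |= 1u << f;
  }
  return mask;
}
```

A filter is masked only when it is rare for all reference frames, so a filter that is popular for even one reference frame stays in the search.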
  15. 22 Aug, 2014 1 commit
  16. 21 Aug, 2014 1 commit
  17. 19 Aug, 2014 1 commit
  18. 18 Aug, 2014 2 commits
    • Add early termination in transform size search · ba70f160
      Yunqing Wang authored
      In the full-rd transform size search, we go through all transform
      sizes to choose the one with best rd score. In this patch, an
      early termination is added to stop the search once we see that the
      smaller size won't give better rd score than the larger size. Also,
      the search starts from largest transform size, then goes down to
      smallest size.
      
      A speed feature tx_size_search_breakout is added, which is turned off
      at speed 0, and on for other speeds. The transform size search is
      turned on at speed 1.
      
      Borg test results:
      1. At speed 1,
         derf set: psnr gain: 0.618%, ssim gain: 0.377%;
         stdhd set: psnr gain: 0.594%, ssim gain: 0.162%;
         No noticeable speed change.
      2. At speed 2,
         derf set: psnr loss: 0.157%, ssim loss: 0.175%;
         stdhd set: psnr loss: 0.090%, ssim loss: 0.101%;
         speed gain: ~4%.
      
      Change-Id: I22535cd2017b5e54f2a62bb6a38231aea4268b3f
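
The largest-to-smallest search with breakout can be sketched as below. The `rd_cost` array stands in for the per-size RD evaluation, and the function name and enum are hypothetical. The real breakout condition in the encoder may include further checks.

```c
#include <assert.h>
#include <stdint.h>

enum { TX_4X4, TX_8X8, TX_16X16, TX_32X32, TX_SIZES };

/* Walk transform sizes from max_tx down to TX_4X4 and return the size with
 * the lowest RD cost seen. With breakout enabled, stop as soon as a smaller
 * size fails to improve on the best cost so far. num_evaluated reports how
 * many sizes were actually examined. */
static int pick_tx_size(const int64_t rd_cost[TX_SIZES], int max_tx,
                        int breakout, int *num_evaluated) {
  int64_t best_rd = INT64_MAX;
  int best_tx = max_tx;
  int n;
  assert(max_tx >= TX_4X4 && max_tx < TX_SIZES);
  *num_evaluated = 0;
  for (n = max_tx; n >= TX_4X4; --n) {
    const int64_t rd = rd_cost[n];
    ++*num_evaluated;
    if (rd < best_rd) {
      best_rd = rd;
      best_tx = n;
    } else if (breakout) {
      break; /* smaller size did not help: assume even smaller ones won't */
    }
  }
  return best_tx;
}
```

The breakout is a heuristic: if the costs are not monotone, it can miss a better small transform size, which is consistent with the small quality loss reported at speed 2.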
    • Speed up mode search depending on relative ref frame position · 6a464eca
      Jingning Han authored
      This commit enables the encoder to record the location of the
      center frame used to generate the alternate reference frame. It can
      then skip checking prediction modes of other reference frame types
      when it encodes this frame.
      
      The speed 3 runtime is reduced for the test sequences:
      bus at CIF 1000 kbps, 9791 ms -> 9446 ms, i.e., 3.5% speed-up,
      pedestrian at 1080p 2000 kbps, 184043 ms -> 175730 ms, i.e., 4.5%
      speed-up.
      
      No compression performance change observed.
      
      Change-Id: Iacfde3bcc1445964e7a241f239bd6ea11cb94bd1
  19. 15 Aug, 2014 2 commits
    • Add a speed feature to give the tighter search range · eca93642
      Pengchong Jin authored
      Add a speed feature to give a tighter partition search
      range. Before partition search, calculate the histogram
      of the partition sizes of the left, above and previous
      co-located blocks of the current block. If the variance of
      observed partition sizes is small enough, adjust the search
      range around the mean partition size, which will be tighter.
      
      The feature is currently turned on at speed 2. Experiments on
      sample youtube clips show on average the runtime is reduced
      by 3-7%.
      
      For hard stdhd clips:
      park_joy_1080p @ 15000kbps:       509251 ms -> 491953 ms (3.3%)
      pedestrian_area_1080p @ 2000kbps: 223941 ms -> 214226 ms (4.3%)
      
      The PSNR performance is changed:
      derf: -0.112%
      yt:   -0.099%
      hd:   -0.090%
      stdhd:-0.102%
      
      Change-Id: Ie205ec5325bf92ec5676c243e30ba9d0adca10f2
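
The range adjustment can be sketched as below, working on log2 block sizes. The function name, the variance threshold, and the window of one size step on either side of the mean are assumptions. The commit does not give the actual values.

```c
#include <assert.h>

/* sizes_log2[] holds the log2 partition sizes observed in the left, above
 * and previous co-located blocks (e.g. 3 = 8x8 ... 6 = 64x64). If their
 * variance is at most var_thresh, narrow [*min_log2, *max_log2] to the mean
 * plus or minus one step and return 1. Otherwise leave the range unchanged
 * and return 0. */
static int tighten_partition_range(const int *sizes_log2, int count,
                                   int var_thresh, int *min_log2,
                                   int *max_log2) {
  int sum = 0, sum_sq = 0, mean, i;
  assert(count > 0);
  for (i = 0; i < count; ++i) {
    sum += sizes_log2[i];
    sum_sq += sizes_log2[i] * sizes_log2[i];
  }
  /* variance = sum_sq / count - (sum / count)^2; both sides are scaled by
   * count^2 so the comparison stays in integers. */
  if (count * sum_sq - sum * sum > var_thresh * count * count) return 0;
  mean = (sum + count / 2) / count; /* rounded mean */
  if (mean - 1 > *min_log2) *min_log2 = mean - 1;
  if (mean + 1 < *max_log2) *max_log2 = mean + 1;
  return 1;
}
```

When neighboring blocks agree on a partition size, the current block is likely to land near that size, so the search can skip sizes far from the mean.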
    • Remove an unused speed feature · 28b1437d
      Yunqing Wang authored
      Removed disable_split_var_thresh, which is not used anymore.
      
      Change-Id: I50119b150442e1571157433b5effc6aae0dbe0fd
  20. 14 Aug, 2014 2 commits
    • Mask out H_PRED and V_PRED for 32x32 blocks · 5966586a
      Yaowu Xu authored
      Change-Id: I2847af5062b5fa320629fcabb9fa6b23ba3e5513
    • Set max_intra_bsize to 32x32 · 4d6d0613
      Yaowu Xu authored
      At --good and speed 3 or above for resolution less than 720p. This
      disables the tests for 64x64 intra prediction modes. Encoding time
      reduction is about 1%.
      
      Change-Id: Ib396e3d1417fece416e3f0fee929b128acbb130f
  21. 13 Aug, 2014 2 commits
    • Allow full coeff probability model and cost update · ccef8842
      Jingning Han authored
      This commit moves the simplified coefficient probability model
      and costing update to speed 4, and turns on chessboard pattern
      mode search for sub 720p sequences. The overall coding performance
      of speed 3 is improved:
      derf  0.889%
      stdhd 1.744%
      
      The speed 3 runtime for test sequences is improved:
      bus cif at 1000 kbps 9823 ms -> 9642 ms
      pedestrian 1080p 2000 kbps 189559 ms -> 183284 ms
      
      Change-Id: Iecbc7496a68f31fd49fb09f8dfd97c028d675a5d
    • Enable motion field based mode search skip · 0daadeb6
      Jingning Han authored
      This commit allows the encoder to check the above and left neighbor
      blocks' reference frames and motion vectors. If they are all
      consistent, skip checking the NEARMV and ZEROMV modes. This is
      enabled in speed 3. The coding performance is improved:
      
      pedestrian area 1080p at 2000 kbps,
      from  74773 b/f, 41.101 dB, 198064 ms
      to    74795 b/f, 41.099 dB, 193078 ms
      
      park joy 1080p at 15000 kbps,
      from 290727 b/f, 30.640 dB, 609113 ms
      to   290558 b/f, 30.630 dB, 592815 ms
      
      Overall compression performance of speed 3 is changed
      derf  -0.171%
      stdhd -0.168%
      
      Change-Id: I8d47dd543a5f90d7a1c583f74035b926b6704b95
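
The consistency check can be sketched as below. The struct, the function name, and the definition of "consistent" (same reference frame and identical motion vector in both neighbors) are assumptions. The encoder's actual test may be looser or stricter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of a neighboring block's motion. */
typedef struct {
  int ref_frame;
  int mv_row;
  int mv_col;
} BlockMotion;

/* Returns true when NEARMV and ZEROMV can be skipped: both neighbors exist
 * and agree on reference frame and motion vector. A NULL pointer marks an
 * unavailable neighbor (e.g. at a frame edge). */
static bool skip_nearmv_zeromv(const BlockMotion *above,
                               const BlockMotion *left) {
  if (above == NULL || left == NULL) return false;
  return above->ref_frame == left->ref_frame &&
         above->mv_row == left->mv_row && above->mv_col == left->mv_col;
}
```

The idea is that when the local motion field is uniform, the remaining candidate modes are unlikely to beat the one implied by the neighbors.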
  22. 12 Aug, 2014 1 commit
    • intra blocks disallowed inadvertently · 5c55202c
      Jim Bankoski authored
      At speed 6 the smallest partitioning was 16x16 and biggest
      intra block was 8x8, essentially disallowing all intra blocks
      which produces ugly artifacts when revealing new video.
      
      Change-Id: I364042d4c64e09be0666ade64aac94d0a1b586cf
  23. 08 Aug, 2014 2 commits
    • Simplifying vp9_set_speed_features() function. · cd1fbc67
      Dmitry Kovalev authored
      Change-Id: I3e67230690b81ef54ef48ae26107fe7bc880ab8e
    • Moving pass from VP9_COMP to VP9EncoderConfig. · 91c2f1e4
      Dmitry Kovalev authored
      We had a very complicated way to initialize cpi->pass from
      cfg->g_pass:
      switch (cfg->g_pass) {
        case VPX_RC_ONE_PASS:
          oxcf->mode = ONE_PASS_GOOD;
          break;
        case VPX_RC_FIRST_PASS:
          oxcf->mode = TWO_PASS_FIRST;
          break;
        case VPX_RC_LAST_PASS:
          oxcf->mode = TWO_PASS_SECOND_BEST;
          break;
      }
      
      cpi->pass = get_pass(oxcf->mode).
      
      Now pass is moved to VP9EncoderConfig and initialization is simple:
      switch (cfg->g_pass) {
        case VPX_RC_ONE_PASS:
          oxcf->pass = 0;
          break;
        case VPX_RC_FIRST_PASS:
          oxcf->pass = 1;
          break;
        case VPX_RC_LAST_PASS:
          oxcf->pass = 2;
          break;
      }
      
      Change-Id: I8f582203a4575f5e39b071598484a8ad2b72e0d9
  24. 05 Aug, 2014 1 commit
  25. 30 Jul, 2014 1 commit
    • Chessboard pattern partition search · ca2dcb7f
      Jingning Han authored
      This commit enables a chessboard pattern constrained partition
      search for 720p and above resolutions. The scheme applies stricter
      partition search to alternate blocks based on their above/left
      neighboring blocks' partition range, as well as that of the
      collocated blocks in the previous frame. It is currently turned
      on at 16x16 block size level. The chessboard pattern is flipped
      per coding frame.
      
      The speed 3 runtime is reduced:
      park_joy_1080p, 652832 ms -> 607738 ms (7% speed-up)
      pedestrian_area_1080p, 215998 ms -> 200589 ms (8% speed-up)
      
      The compression performance is changed:
      hd     -0.223%
      stdhd  -0.295%
      
      Change-Id: I2d4d123ae89f7171562f618febb4d81789575b19
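
A chessboard pattern that flips every frame can be expressed with a parity test. In the sketch below, the function name is hypothetical and the row and column are assumed to be counted in units of 16x16 blocks, matching the block size level mentioned in the commit.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true for blocks that should use the constrained partition search.
 * Adding frame_index to the parity flips the pattern on every coded frame,
 * so each block position alternates between constrained and full search. */
static bool use_constrained_search(int block_row, int block_col,
                                   int frame_index) {
  assert(block_row >= 0 && block_col >= 0 && frame_index >= 0);
  return ((block_row + block_col + frame_index) & 1) != 0;
}
```

Because every constrained block is surrounded by fully searched neighbors, the above/left partition ranges it relies on come from blocks that were searched without the constraint.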
  26. 22 Jul, 2014 1 commit
    • Enable chessboard inter prediction filter type search · 54ad0958
      Jingning Han authored
      This commit enables a chessboard pattern prediction filter type
      search scheme for rate-distortion optimization speed-up. For the
      inferred motion vector modes, the encoder can re-use its above/left
      neighbor blocks' prediction filter type and skip a full test on
      all possible filter types. This operation is turned on and off
      alternately in a chessboard manner.
      
      It is turned on in speed 3. For test clip pedestrian 1080p, the
      runtime is reduced from 231500 ms -> 221700 ms. The compression
      performance is changed:
      derf:  -0.147%
      yt:    -0.134%
      hd:    -0.079%
      stdhd: -0.220%
      
      Change-Id: I1912f278e7576c2dc632688e3ad7a257410c605a
  27. 21 Jul, 2014 1 commit
    • Turn on adaptive pred filter scheme for sub8x8 below 720p · ffd948bb
      Jingning Han authored
      For sequences of resolution below 720p, the encoder will check
      intra prediction modes and inter prediction modes from LAST_FRAME.
      This commit turns on adaptive prediction filter scheme for sub8x8
      blocks, where inter prediction modes are enabled. For the test
      sequence bus at CIF, the speed 2 runtime goes down from 17879 ms
      to 16783 ms, i.e., a 6% speed-up. The compression performance of
      the derf set changes by -0.128%.
      
      Change-Id: I01d5321a5ceab4e0666ac5be56c52d896c7a8d45
  28. 16 Jul, 2014 1 commit
  29. 15 Jul, 2014 1 commit
    • Added a rt speed 12 · faa686bb
      Yaowu Xu authored
      We target this speed to achieve similar encoding speed and better
      compression than vp8 rt mode with cpu-used at -12.
      
      Change-Id: Ic1bb4371c81a17ea80e83459c1cbf4c09a3498e8
  30. 11 Jul, 2014 1 commit
  31. 09 Jul, 2014 1 commit
    • Remove repetitive code in mcomp.c · a581da21
      Yunqing Wang authored
      Deleted vp9_find_best_sub_pixel_comp_tree() and merged it into
      vp9_find_best_sub_pixel_tree().
      
      Change-Id: Ifb25763c8b19822df5537cc1daa76ce88dc3b056