1. 31 Jan, 2018 1 commit
    • Conditionally skip transform block partition search · eb8f5e87
      Jingning Han authored
      Speed up the recursive transform block partition search. When a
      transform block is found to have all-zero coefficients, skip the
      search over further split partitions.
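
      A minimal sketch of the early termination described here, not the
      libaom code. The function names, the toy cost model, and the
      list-of-lists block layout are hypothetical; a block counts as
      "all zero" when every stored coefficient is 0.

```python
def split4(block):
    """Split a square block (list of rows) into four equal quadrants."""
    h = len(block) // 2
    return [[row[c:c + h] for row in block[r:r + h]]
            for r in (0, h) for c in (0, h)]


def search_tx_partition(block, rd_cost, min_size=1, stats=None):
    """Return the best RD cost for `block`, trying recursive 4-way splits.

    Assumption: `rd_cost(block)` is a caller-supplied cost function and
    `stats` is an optional dict used only to count evaluated blocks.
    """
    if stats is not None:
        stats["evaluated"] += 1
    no_split_cost = rd_cost(block)
    all_zero = all(v == 0 for row in block for v in row)
    # Early exit: an all-zero block is not split any further.
    if all_zero or len(block) <= min_size:
        return no_split_cost
    split_cost = sum(search_tx_partition(sub, rd_cost, min_size, stats)
                     for sub in split4(block))
    return min(no_split_cost, split_cost)
```

      With a 4x4 all-zero block only the root is evaluated; without the
      early exit the same search would visit all 21 blocks of the tree.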
      
      Tested with txk-sel on, this makes speed 0 and speed 1 both 10-15%
      faster in the medium to high target bit-rate range. The coding
      performance change is neutral: 0.011% better for the lowres set.
      
      Change-Id: I1247f3d5a33d15bf4bc5f0bcbac2bf1f3e1aca2e
  2. 30 Jan, 2018 1 commit
    • Skip intra modes in interframe prediction · c30c18f5
      Cheng Chen authored
      Skip remaining intra modes in rd search, when current intra_rd
      is 1.5 times worse than current best_rd.
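
      The rule can be sketched as below. This is an illustration only:
      the function name and the (mode, cost) list are hypothetical, and
      "1.5 times worse" is read here as intra_rd > 1.5 * best_rd, which
      is an assumption about the intended comparison.

```python
def search_intra_modes(mode_costs, best_rd, ratio=1.5):
    """Scan intra modes in order and stop once one is far worse than best_rd.

    mode_costs: list of (mode_name, intra_rd) pairs in search order
                (hypothetical stand-in for the encoder's mode loop).
    Returns (best_rd, list_of_modes_actually_evaluated).
    """
    evaluated = []
    for mode, intra_rd in mode_costs:
        evaluated.append(mode)
        if intra_rd < best_rd:
            best_rd = intra_rd
        elif intra_rd > ratio * best_rd:
            # This mode is much worse than the best so far:
            # skip all remaining intra modes.
            break
    return best_rd, evaluated
```

      For example, with best_rd = 100 and costs DC = 120, V = 200,
      H = 90, the search stops after V and never evaluates H, which is
      the kind of small loss the commit trades for speed.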
      
      Performance: -0.01%
      Speed: 5% faster.
      
      Change-Id: I0265fe4618a23d676546b929cd5f694ce9a890f3
  3. 26 Jan, 2018 1 commit
    • Skip txfm search · 3c22260b
      Cheng Chen authored
      Skip transform type search.
      
      Without txk_sel:
      Skip remaining transform type search when all transform blocks inside
      the coding block have eob = 0.
      
      With txk_sel:
      For each transform block, whenever eob = 0, we skip remaining
      transform type search.
      
      Speed impact:
      On low bitrate, 25% speed up.
      On high bitrate, 15-20% speed up.
      
      Performance impact: Google test lowres, 30 frames
      With txk_sel: 0.15% drop
      Without txk_sel: 0.30% drop
      
      Change-Id: I5e8db730a19feec22e378611046b1ce1ab001c85
  4. 25 Jan, 2018 1 commit
  5. 24 Jan, 2018 1 commit
    • Skip RD search over last 2/3 frame for non-nearest neighbor mvs · 8db5f17b
      Jingning Han authored
      Skip the rate distortion search over last 2/3 reference frames for
      the reference motion vectors derived from non-nearest neighbors.
      The overall coding performance change is within the noise range (0.05%
      better). This speeds up the encoding process by 20%.
      
      Change-Id: I823b8ca2805ae332f4c9bc8ee255069a82db4331
  6. 15 Jan, 2018 1 commit
  7. 11 Jan, 2018 2 commits
  8. 10 Jan, 2018 1 commit
    • hash_based_trellis speed feature · fbab0621
      Michelle Findlay-Olynyk authored
      Add speed feature that uses hash tables to
      reuse previously found optimized coefficients
      in av1_optimize_txb. This skips some expensive
      optimize_txb calls.
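
      The reuse idea can be sketched as a memoized wrapper. This is a
      simplification and not the av1_optimize_txb code: it keys the
      table on the exact (context, qcoeff) pair rather than on two
      16-bit hashes, and make_cached_optimizer, context and max_eob are
      hypothetical names.

```python
def make_cached_optimizer(optimize, max_eob=16):
    """Wrap `optimize(context, qcoeff)` with a lookup table.

    Blocks whose end-of-block position exceeds `max_eob` bypass the table,
    mirroring the "maximum eob on which the feature activates" idea.
    """
    cache = {}
    stats = {"hits": 0, "misses": 0}

    def cached(context, qcoeff):
        qcoeff = tuple(qcoeff)
        # eob: index just past the last non-zero coefficient.
        eob = max((i + 1 for i, v in enumerate(qcoeff) if v != 0), default=0)
        if eob > max_eob:
            return optimize(context, qcoeff)
        key = (context, qcoeff)
        if key in cache:
            stats["hits"] += 1
        else:
            stats["misses"] += 1
            cache[key] = optimize(context, qcoeff)
        return cache[key]

    return cached, stats
```

      A real hash-based table can return a stored result for a
      different input when hashes collide; the exact-key table above
      cannot, so it shows only the speed side of the trade-off.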
      
      Currently shows no significant quality
      degradation or speed improvement, and is set to off
      by default. Requires hash_me, lv_map and
      lv_map_multi. Adding to speed features required
      changing AV1_COMMON *cm to AV1_COMP *cpi in a
      chain of functions.
      
      Variations that have been tried:
      -varying the maximum eob on which the feature
      activates: 16, 32, 64. 16 currently used. 64
      has best hit rate but longer execution time.
      -varying the data hashed and the length of hashes
      (first hash is 16 bit and based on context data,
      while second hash is 16 bit and based only on
      pre-optimized qcoeff values.)
      -softening the data used for the hashes: ideally
      this would raise the number of hits, without
      compromising quality too much.
      
      Change-Id: I94f22be82f3a46637c0489d512f2e334a307575f
  9. 04 Jan, 2018 1 commit
  10. 23 Dec, 2017 1 commit
    • Merge FINAL_PASS_TRELLIS_OPT with DISABLE_TRELLISQ_SEARCH · 792c2ec4
      Sarah Parker authored
      The speed feature FINAL_PASS_TRELLIS_OPT is meant to disable
      optimize_b during the transform search but allow it for the
      final encode of blocks. There was a previously existing macro
      called DISABLE_TRELLISQ_SEARCH, which does the same thing. This
      patch merges the functionality so the macro serves only to enable
      the speed feature.
      
      Change-Id: Ieee70f97f817998b7ca275f6e4647cc89a330ad6
  11. 18 Dec, 2017 1 commit
    • Speed up by dropping some ref frames in compound search · c683bf9b
      Cheng Chen authored
      Record the distortion for each single reference frame in the RD
      search and rank the reference frames by distortion. Then, in the
      compound search, drop the combination of the two reference frames
      with the largest and second largest distortions.
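
      A small sketch of the pruning step, assuming for simplicity that
      any two reference frames may form a compound pair (the encoder's
      actual set of allowed pairs is more restricted). The function
      name and the frame labels are illustrative.

```python
from itertools import combinations


def compound_candidates(single_ref_distortion):
    """Return compound reference pairs, minus the pair of the two worst refs.

    single_ref_distortion: dict mapping a reference frame name to the
    distortion recorded during the single-reference search.
    """
    # Rank references from largest to smallest distortion.
    ranked = sorted(single_ref_distortion,
                    key=single_ref_distortion.get, reverse=True)
    worst_pair = frozenset(ranked[:2])
    return [pair
            for pair in combinations(sorted(single_ref_distortion), 2)
            if frozenset(pair) != worst_pair]
```

      With distortions LAST = 10, GOLDEN = 30 and ALTREF = 20, the
      (ALTREF, GOLDEN) pair is dropped and the other two pairs are
      still searched.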
      
      This patch shows neutral performance on the Google test using lowres
      with 20 frames.
      
      Local tests show ~5% speed up over baseline.
      
      Change-Id: I722fe66a0551f5f8a044c57c55caa74e46db7ee8
  12. 14 Dec, 2017 1 commit
    • Add option for optimize_b only in final encode · 251c9dcb
      Sarah Parker authored
      This adds a third option to the optimize_coefficients speed
      feature, which turns off optimize_b in the search but uses
      it in the final encode. This option is not currently being
      used by default.
      
      Change-Id: Ic10c9fd8ef16bc453f5e232733cda34d0ddb7692
  13. 04 Dec, 2017 1 commit
    • Add the speed feature structure for codec dev · b49c6aea
      Jingning Han authored
      This commit restructures the speed feature setup for codec
      development purposes. Instead of progressively reducing encoder
      complexity at the expense of incremental coding loss, we allow a
      separate set of speed features, each corresponding to a certain
      category of coding units:
      
      1 << 0: transform coding
      1 << 1: inter prediction
      1 << 2: intra prediction
      1 << 3: block partition
      1 << 4: loop filters
      1 << 5: rd early skip
      
      [6 - 7] are left open for next adjustment.
      
      It is constructed to facilitate codec development.
      When working on a coding function, one could choose to turn on
      one or more less related coding units to speed up the evaluation
      process. For example, to test a transform related experiment, one
      could set
      --dev-sf=2, 6, or 22
      which corresponds to turning on:
      2 - inter prediction speed features,
      6 - both inter / intra speed features,
      22 - inter / intra, and loop filter features.
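
      The flag values are plain bit masks, so a --dev-sf value can be
      decoded as sketched below. The table mirrors the list in this
      message; the helper name decode_dev_sf is hypothetical and not
      part of the encoder.

```python
# Bit positions as listed in the commit message.
DEV_SF_CATEGORIES = {
    1 << 0: "transform coding",
    1 << 1: "inter prediction",
    1 << 2: "intra prediction",
    1 << 3: "block partition",
    1 << 4: "loop filters",
    1 << 5: "rd early skip",
}


def decode_dev_sf(value):
    """Return the speed-feature categories enabled by a --dev-sf value."""
    return [name for bit, name in DEV_SF_CATEGORIES.items() if value & bit]
```

      Here 22 = 2 + 4 + 16, so decode_dev_sf(22) lists inter
      prediction, intra prediction and loop filters, matching the
      example above.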
      
      The goal is to allow faster experimental verification during the
      development process. With the experiment in a stable state, we
      can evaluate its performance in speed 0 at higher confidence level.
      
      Change-Id: Ib46c7dea2d2a60204c399dc01f10262c976adf0d
  14. 30 Nov, 2017 1 commit
    • Add speed feature use_fast_interpolation_filter... · a3eb912b
      Michelle Findlay-Olynyk authored
      Applies to speed >=1. Instead of searching all dual filter space
      {R,Sm,Sh}x{R,Sm,Sh}, only check {R}x{R,Sm,Sh} followed by
      {R,Sm,Sh}x{best of prev R,Sm,Sh}.
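
      The two-pass search can be sketched as follows. Which filter
      direction is fixed first is an assumption here, and cost stands
      in for the encoder's RD evaluation of a filter pair; the function
      name is hypothetical.

```python
FILTERS = ("R", "Sm", "Sh")  # regular, smooth, sharp


def fast_dual_filter_search(cost):
    """Two-pass dual-filter search instead of the full 3x3 grid.

    cost(first, second) -> RD cost of a filter pair (caller-supplied).
    Returns (best_pair, number_of_distinct_pairs_evaluated).
    """
    evaluated = {}

    def evaluate(pair):
        # Reuse results so a pair shared by both passes is costed once.
        if pair not in evaluated:
            evaluated[pair] = cost(*pair)
        return evaluated[pair]

    # Pass 1: {R} x {R, Sm, Sh}
    best_second = min(FILTERS, key=lambda f: evaluate(("R", f)))
    # Pass 2: {R, Sm, Sh} x {best of pass 1}
    best_first = min(FILTERS, key=lambda f: evaluate((f, best_second)))
    return (best_first, best_second), len(evaluated)
```

      This evaluates 5 distinct pairs instead of 9. The result matches
      the full search whenever the best second filter does not depend
      on the first, and may differ otherwise.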
      
      Saves ~6% of cycles by reducing av1_convolve_2d_sse2, with 0.023
      overall psnr drop.
      
      Change-Id: I82d7a6321b335293124a007ff4c87f0e260052e1
  15. 29 Nov, 2017 2 commits
  16. 11 Nov, 2017 1 commit
    • Remove experimental flag of CDEF · 1aeee2e9
      Frederic Barbier authored
      This experiment has been adopted, so we can simplify the code
      by dropping the associated preprocessor conditionals.
      
      Change-Id: I17bd46ebad7796d04fb4065fb36da0e1c4eeaf9b
  17. 09 Nov, 2017 1 commit
  18. 05 Nov, 2017 1 commit
    • Misc. clean ups / refactor of speed 1 · d7338aa8
      Debargha Mukherjee authored
      With this patch, and the speed settings turned on for speed 1,
      the coding efficiency of speed 1 in default configuration should be
      only a little worse than speed 0, but it should roughly run at
      double the speed.
      
      Specifically, this patch makes various changes to make sure that
      speed 1 behaves exactly the same as speed 0 except for speed settings
      turned on or off in speed_features.c.
      
      This will change the bitstream generated a little for speeds
      1 or higher because of the following reasons:
      1. Removes a hacky speed setting correction factor in firstpass.c
      2. Fast cdef search is moved from speed 1+ to 2+, and a new speed
      feature is added to control that.
      3. Mesh search settings are pushed down one level so that speeds 0
      and 1 use the same settings.
      4. A disable_split_mask feature for animated content, previously
      turned on for speeds 1+, is moved down to speeds 2+.
      
      Change-Id: I0ec36556f157bdc42c5daa0cfb9518cf7ff65f6b
  19. 04 Nov, 2017 1 commit
    • Speed up of ext-partition types · c4b67641
      Debargha Mukherjee authored
      Search the new horz/vert a/b/4 partitions only if the best so far
      is either oriented along the same direction or split/none, or if
      the rd costs obtained from the previous partition searches indicate
      there is potential in searching these partitions.
      
      This brings about 25-30% speedup at less than 0.1% drop as seen on
      lowres 30 frames.
      
      Change-Id: I6c6c347e06c34ee0ca17479aeeb4075a66dc7e2c
  20. 03 Nov, 2017 3 commits
    • Add two levels for selective ref frame sp. feature · 06b40cc3
      Debargha Mukherjee authored
      The first level is turned on for speed 1.
      
      Change-Id: I3dba0f0250b97a25e174cacc2a46ca7f76572c85
    • Cleanup of speed 1 · 203016e8
      Debargha Mukherjee authored
      Removes features for now so that we only add features with very
      small loss.
      
      Change-Id: Ie50f6af2a6cc19dde5f682754a1f0adf4ec957a8
    • Introducing a model for pruning the TX size search · 79a37242
      Alexander Bokov authored
      Use a neural-network-based binary classifier to predict the first split
      decision on the highest level of the TX size RD search tree. Depending
      on how confident we are in the prediction we either keep full unmodified
      TX size search or use the largest possible TX size and stop any further
      search.
      
      Average speed-up: 3-4%
      Quality loss (lowres): 0.062%
      Quality loss (midres): 0.018%
      
      Change-Id: I64c0317db74cbeddfbdf772147c43e99e275891f
  21. 02 Nov, 2017 1 commit
    • Remove experimental flag of EXT_TX · 3bac9928
      Sebastien Alaiwan authored
      This experiment has been adopted, so we can simplify the code
      by dropping the associated preprocessor conditionals.
      
      Change-Id: I02ed47186bbc32400ee9bfadda17659d859c0ef7
  22. 01 Nov, 2017 1 commit
    • Add speed feature to reduce tx size search depth · edc7346f
      Debargha Mukherjee authored
      The speed feature simply restricts the number of depths
      searched. Currently it is turned on by default for speeds>=1.
      The coding efficiency impact (tested on lowres 30 frames) seems
      to be ~0.15% and the speedup is on the order of 15%.
      
      Change-Id: I514832bd7df937292875f73d9c9026e49ac576f2
  23. 31 Oct, 2017 1 commit
    • Adding a speed feature for tx_size search · 51666866
      Debargha Mukherjee authored
      This patch factors out a function that computes the rd cost for
      a given transform type given the transform partition already
      computed. This is then used to develop a speed feature where the
      transform size search disables trellis optimization but once the
      transform sizes are decided, a final search is conducted with
      optimization turned back on.
      This patch does not change anything in speed 0 yet.
      
      Change-Id: I30acfc5e2dd353d711e5f4260d5b344847b03ade
  24. 30 Oct, 2017 2 commits
    • Speed up inter frame rate-distortion optimization · cf842ad2
      Jingning Han authored
      The frame marker system makes it possible to map the reference frame
      index into the natural order. This allows direct checking of the
      efficacy of the reference frames given their relative locations
      with respect to the current coding frame.
      
      This commit uses that property to filter out reference frames
      less likely to contribute coding gains from the rate-distortion
      optimization process. For example, it takes out the check on
      last2 / 3 frames, when their actual location is further away
      from the golden frame.
      
      The AWCY results show 0.6% performance regression. The encoding
      speed gets doubled.
      
      To use the speed up, one needs to turn on frame-marker experiment
      before we turn it on by default, and enable selective_ref_frame
      entry in the speed feature.
      
      Change-Id: Ifb03ed90acd980bbc7ff1c2e17982e21e68d2588
    • Remove experimental flag of GLOBAL_MOTION · 48795807
      Sebastien Alaiwan authored
      This experiment has been adopted, so we can simplify the code
      by dropping the associated preprocessor conditionals.
      
      Change-Id: I9c9d6ef5317798cbf237307a9754fe7e03bdda47
  25. 17 Oct, 2017 1 commit
    • Improving the model for pruning the TX type search · 0c7eb10d
      Alexander Bokov authored
      Introduces two new TX type pruning modes that provide better
      speed-quality trade-off compared to the existing ones. A shallow
      neural network with one hidden layer trained separately for each
      block size is used as a prediction model. The new modes differ in
      thresholds applied to the output of the neural net, so that they
      prune a different number of TX types on average.
      
      Owing to its relatively low quality loss, PRUNE_2D_ACCURATE is used
      by default, regardless of speed settings. Starting with speed
      setting of 3 we switch to PRUNE_2D_FAST mode to get better
      speed-up.
      
      Evaluation results:
      ----------------------------------------------------------
      Prune mode | Avg. speed-up | Quality loss | Quality loss
                 |(high bitrates)|   (lowres)   |   (midres)
      ----------------------------------------------------------
      PRUNE_ONE  |     18.7%     |    0.396%    |    0.308%
      ----------------------------------------------------------
      PRUNE_TWO  |     27.2%     |    0.439%    |    0.389%
      ----------------------------------------------------------
      PRUNE_2D_  |     18.8%     |    0.032%    |    0.063%
      ACCURATE   |               |              |
      ----------------------------------------------------------
      PRUNE_2D_  |     33.3%     |    0.504%    |     ---
      FAST       |               |              |
      
      Change-Id: Ibd59f52eef493a499e529d824edad267daa65f9d
  26. 16 Oct, 2017 1 commit
  27. 06 Oct, 2017 1 commit
  28. 02 Oct, 2017 1 commit
  29. 28 Jul, 2017 1 commit
    • [CFL] New UV_PREDICTION_MODE for CFL · 6e1cd787
      Luc Trudeau authored
      CfL is now an independent mode.
      
      Results on Subset1 (Compared to 4266a7ed with CFL enabled)
      
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -0.1645 | -0.4017 |  0.2475 |  -0.1851 | -0.2179 | -0.2338 |    -0.2897
      
      Change-Id: I2e86e7ea7bfc12bb1d763e70a136ca992d57a3c5
  30. 26 Jul, 2017 1 commit
    • [CFL] UV_PREDICTION_MODE · d6d9eeeb
      Luc Trudeau authored
      A separate prediction mode struct is added to allow
      for uv-only modes (like CfL). Note: CfL will be
      added as a separate mode in an upcoming commit.
      
      Results on Subset1 (Compared to 4266a7ed with CfL enabled)
        PSNR | PSNR Cb | PSNR Cr | PSNR HVS |   SSIM | MS SSIM | CIEDE 2000
      0.0000 |  0.0000 |  0.0000 |   0.0000 | 0.0000 |  0.0000 |     0.0000
      
      Change-Id: Ie80711c641c97f745daac899eadce6201ed97fcc
  31. 12 Jul, 2017 1 commit
    • ext-partition-types: Add 4:1 partitions · 93c39e91
      Rupert Swarbrick authored
      This patch adds support for 4:1 rectangular blocks to various common
      data arrays, and adds new partition types to the EXT_PARTITION_TYPES
      experiment which will use them.
      
      This patch has the following restrictions, which can be lifted in
      future patches:
      
        * ext-partition-types is incompatible with fp_mb_stats and supertx
          for the moment
      
        * Currently only 32x32 superblocks can use the new partition types
      
      There's a slightly odd restriction about when we allow
      PARTITION_HORZ_4 or PARTITION_VERT_4. Since these both live in the
      EXT_PARTITION_TYPES CDF, read_partition() can only return them if both
      has_rows and has_cols are true. This means that at least half of the
      width and height of the block must be visible. It might be nice to
      relax that restriction but that would imply a change to how we encode
      partition types, which seems already to be in a state of flux, so
      maybe it's better to wait until that has settled down.
      
      Change-Id: Id7fc3fd0f762f35f63b3d3e3bf4e07c245c7b4fa
  32. 08 Jun, 2017 1 commit
  33. 09 May, 2017 1 commit
  34. 05 May, 2017 1 commit
    • Add speed feature to control global motion compute · 2a9d746c
      Debargha Mukherjee authored
      Adds a speed feature to control which references to use
      to compute global motion.
      Also adds logic to not compute duplicate sets of
      parameters when reference frames point to the same
      buffers.
      Includes some renaming of functions to set good speed
      features to make things clearer.
      
      Change-Id: I641d33441fde98af18cad8d4db49cf7d5d153ead
  35. 02 May, 2017 1 commit
    • intrabc: Relax exhaustive mesh search constraints · 3a169875
      Alex Converse authored
      Overall:
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -0.1042 | -0.0564 | -0.0941 |  -0.1142 | -0.1115 | -0.1071 |    -0.0795
      
      On wikipedia_420.y4m:
         PSNR | PSNR HVS |    SSIM | CIEDE 2000 | PSNR Cb | PSNR Cr | MS SSIM
      -2.9491 |  -3.3248 | -3.2374 |    -2.8735 | -2.9295 | -2.4755 | -3.3194
      
      Change-Id: Icf95d4afcb13118db41d51b5f7fb80e48908509a