1. 25 Jan, 2018 11 commits
    • Do MV scaling on the fly for memory and run time reduction · 7b6bb947
      Frank Bossen authored
      This change is not normative and produces the same results as before.
      TPL_MV_REF data structure is about 5x smaller.
      Observed overall decoder run time reduction is about 4%.
      No observed change in encoder run time.
      
      Change-Id: Id68a492bac3bf28f48b7ceeedf85cd29981238ee
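As a rough illustration of what scaling a motion vector on the fly can look like, the sketch below rescales a stored vector by a ratio of frame distances at the point of use, rather than keeping a pre-scaled copy per reference. The names `Mv`, `scale_mv_component`, and `project_mv` are hypothetical and the rounding rule is an assumption; this is not the libaom code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical motion vector type for this sketch; not the libaom definition. */
typedef struct {
  int16_t row;
  int16_t col;
} Mv;

/* Scale one component by num/den, rounding half away from zero.
 * den must be positive. */
static int scale_mv_component(int v, int num, int den) {
  assert(den > 0);
  const int prod = v * num;
  return prod >= 0 ? (prod + den / 2) / den : -((-prod + den / 2) / den);
}

/* Project a stored vector that spans `den` frames onto a span of `num`
 * frames. Only the unscaled vector has to be kept in memory. */
static Mv project_mv(Mv stored, int num, int den) {
  Mv out;
  out.row = (int16_t)scale_mv_component(stored.row, num, den);
  out.col = (int16_t)scale_mv_component(stored.col, num, den);
  return out;
}
```

Storing one unscaled vector and projecting it when needed trades a small amount of arithmetic for a smaller per-block record, which is the kind of trade the commit message describes.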
    • Add obu_sizing experiment. · 41150ad4
      Tom Finegan authored
      Writes unsigned LEB128 encoded integers, padded to PRE_OBU_SIZE_BYTES
      (currently 4) bytes, in OBU size fields when enabled:
      
      $ cmake path/to/aom -DCONFIG_OBU=1 -DCONFIG_OBU_SIZING=1 && cmake --build .
      
      Requires CONFIG_OBU.
      
      BUG=aomedia:1125
      
      Change-Id: I4d184ef0c8587d24e9c8c3e63237ea5003386c6a
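A minimal sketch of padded unsigned LEB128 follows, assuming the padding is done by setting the continuation bit on every byte except the last. `PRE_OBU_SIZE_BYTES` and its value of 4 come from the commit message; the function names `write_padded_uleb128` and `read_uleb128` are hypothetical and are not the libaom API.

```c
#include <stddef.h>
#include <stdint.h>

/* From the commit message: size fields are padded to 4 bytes. */
#define PRE_OBU_SIZE_BYTES 4

/* Write `value` as exactly `pad_to` LEB128 bytes. Returns 0 on success,
 * -1 if the arguments are invalid or the value needs more than
 * pad_to * 7 bits. */
static int write_padded_uleb128(uint32_t value, size_t pad_to, uint8_t *out) {
  if (out == NULL || pad_to == 0 || pad_to > 5) return -1;
  if (pad_to * 7 < 32 && (value >> (pad_to * 7)) != 0) return -1;
  for (size_t i = 0; i < pad_to; ++i) {
    uint8_t byte = (uint8_t)(value & 0x7f);
    value >>= 7;
    if (i + 1 < pad_to) byte |= 0x80; /* continuation bit on all but the last */
    out[i] = byte;
  }
  return 0;
}

/* Decode an unsigned LEB128 value. A padded encoding decodes like any
 * other because the extra bytes only contribute zero bits. */
static int read_uleb128(const uint8_t *in, size_t len, uint32_t *value) {
  uint64_t v = 0;
  if (in == NULL || value == NULL) return -1;
  for (size_t i = 0; i < len && i < 5; ++i) {
    v |= (uint64_t)(in[i] & 0x7f) << (7 * i);
    if ((in[i] & 0x80) == 0) {
      if (v > UINT32_MAX) return -1;
      *value = (uint32_t)v;
      return 0;
    }
  }
  return -1; /* ran out of bytes before the terminating byte */
}
```

A fixed-width size field lets a writer reserve the bytes up front and fill in the size once the payload length is known, without moving the payload.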
    • Give skip_mode priority over segmentation · b3bb318d
      Frederic Barbier authored
      BUG=aomedia:1266
      
      Change-Id: I7612e379aa7c63da56e975e95cd7266cd1f8c68d
    • Clean up and rework rates in motion_mode_rd() · c5024215
      Yue Chen authored
      Remove all *bmc variables, which were used to record basic motion
      search results (no advanced masked compound) when obmc and warped
      motion modes were allowed to work with compound ref.
      Remove the switchable rate that is passed in to motion_mode_rd(),
      since in most motion modes we need to recalculate the cost based on
      motion_mode and the refined mv. This change slightly improves the rd perf.
      
      Performance change: -0.024%
      
      Change-Id: I4afe0927e97cc7e7251022957f7665ed3032079c
    • Simplify txfm table · 0c7b8d84
      Angie Chiang authored
      Instead of listing all possible stage_range,
      we use set_fwd_txfm_non_scale_range() to generate 2d stage_range
      from 1d stage_range.
      
      This will reduce the complexity of txfm table significantly.
      
      This is a lossless change.
      The coding performance isn't changed.
      The txfm config is exactly the same as it was before.
      
      Change-Id: Ibd1d9e53772bb928faaeecc98d81cbc8f38b27ed
    • Refactor buf_offset in av1_inv_txfm2d.c · 0822557b
      Angie Chiang authored
      Change-Id: I73d1d15ab678242737432064d203c476057286ed
    • Simplify context identification for coding ref frames · fa8bad19
      Zoe Liu authored
      This patch simply aggregates the checking on the counts of certain
      reference frames in the neighboring above and left blocks. It does
      not incur any coding performance change.
      
      Change-Id: I59a962ba95e7ab16731ce97371ec5709a582a0ba
    • Move av1_search_txk_type() to rdopt.c · 4a5c6cf8
      Hui Su authored
      Change-Id: I4f9d014324b35e30f25cae5fa570620249640cf6
    • Reduce the size of av1_prob_cost[] · c1cd5194
      Hui Su authored
      Only half of it was necessary.
      
      Change-Id: I0b5fc9ae6a17f5d812e10ee903a12f23f1377d8e
    • Do not fail on deprecated --good option · 67adf42f
      Debargha Mukherjee authored
      Temporary quick fix for broken compatibility with testing
      infrastructure.
      
      Change-Id: I9af93690dd107fc79a79062f4d6ea7c53c8b4798
    • Return int from av1_pack_bitstream(). · e4099e38
      Tom Finegan authored
      - Stop relying on asserts for error checking.
      - Update callers to check for and return errors where required.
      
      Change-Id: Id6a39b14397394b85aaa9dc8b168f7a26f04919b
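The pattern described in the two points above can be sketched as follows. Everything here is hypothetical (`pack_payload`, `encode_frame`, and the `PACK_OK`/`PACK_ERROR` codes are invented for illustration and are not the av1_pack_bitstream() signature); the point is only that a failure is reported through the return value and checked by the caller instead of being asserted.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes for this sketch. */
enum { PACK_OK = 0, PACK_ERROR = -1 };

/* Hypothetical packer: copies `src` into `dst` and reports the bytes
 * written. A too-small buffer is reported as an error, not asserted. */
static int pack_payload(const uint8_t *src, size_t src_size, uint8_t *dst,
                        size_t dst_capacity, size_t *written) {
  if (src == NULL || dst == NULL || written == NULL) return PACK_ERROR;
  if (src_size > dst_capacity) return PACK_ERROR;
  for (size_t i = 0; i < src_size; ++i) dst[i] = src[i];
  *written = src_size;
  return PACK_OK;
}

/* Caller checks the status and passes a failure up to its own caller. */
static int encode_frame(const uint8_t *src, size_t src_size, uint8_t *dst,
                        size_t dst_capacity, size_t *frame_size) {
  if (pack_payload(src, src_size, dst, dst_capacity, frame_size) != PACK_OK) {
    return PACK_ERROR;
  }
  return PACK_OK;
}
```

Unlike an assert, the check stays active in release builds, so an undersized buffer surfaces as an error code rather than as undefined behavior.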
  2. 24 Jan, 2018 18 commits
    • Record total rate cost in trellis · 82775f61
      Cheng Chen authored
      Record total rate cost when computing trellis optimization.
      Reduce redundant rate computation in later stages.
      
      Speed impact: ~6% speed up
      Coding performance should not be affected.
      
      Change-Id: I9e940a2d126bb55930fcf22ea04d061eee1fc944
    • Adding timing info to sequence headers · 28e9ce29
      Andrey Norkin authored
      Change-Id: I0fdb09499196e02709e067f690dff71146ee5114
    • Added SSE4.1 and AVX2 implementations of FAST SGR. · 9d234571
      Imdad Sardharwalla authored
      The self-guided filter speed tests show that:
      - The SSE4.1 implementation of FAST SGR is ~35% faster than the corresponding
        implementation of SGR;
      - The AVX2 implementation of FAST SGR is ~28% faster than the corresponding
        implementation of SGR.
      
      Change-Id: Iecdc1f8cee79500084c71d06dbb02d804272aa99
    • Add a config flag/code for fast sgr computation · ed5e9673
      Debargha Mukherjee authored
      Adds an experiment for fast sgr computation where, for the r=2
      filter, the A and B stats are computed for every other row and
      averaged in between.
      The motivation is to improve software performance with hopefully
      minimal loss.
      
      Change-Id: Ie36687826524dc18c1fbb7f6becff244187bf8da
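One way to read "computed for every other row and averaged in between" is sketched below for a single column of one statistic. This is an interpretation, not the libaom filter: `fill_skipped_rows` is a hypothetical name, the values are assumed non-negative, and the handling of a trailing odd row is an assumption.

```c
#include <stddef.h>

/* Fill odd rows of a per-row statistic by averaging the rows above and
 * below. Even rows of `stat` must already hold computed, non-negative
 * values; odd rows are overwritten. */
static void fill_skipped_rows(int *stat, size_t rows) {
  for (size_t r = 1; r < rows; r += 2) {
    if (r + 1 < rows) {
      stat[r] = (stat[r - 1] + stat[r + 1] + 1) / 2; /* rounded average */
    } else {
      stat[r] = stat[r - 1]; /* no row below: reuse the row above */
    }
  }
}
```

Only half of the rows need the full statistic computation under this reading, which is where the hoped-for software speedup would come from.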
    • [loop-restoration, bugfix] Restrict sampling of deblocked pixels · dff901ff
      David Barker authored
      There is a special case with certain frame heights, where we
      end up with a loop restoration stripe which ends 1px above the
      crop border.
      
      Previously this case was handled in quite an ugly way, which also
      disagreed with the spec and wasn't great for hardware. This patch
      changes things to match the spec.
      
      Specifically, the old method was to sometimes upscale one extra
      row of deblocked pixels so that we could always have a 2px
      "below" border for each processing stripe. The new method is to
      only use rows inside the crop border, and to duplicate them if
      necessary.
      
      BUG=aomedia:1264
      
      Change-Id: Idf8ab510e1091dc3f5b257de60e16bca214d8dc4
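The "only use rows inside the crop border, and duplicate them if necessary" rule can be sketched as a row clamp. The helper names and the row-indexing convention (rows numbered from 0, `stripe_end` being the first row below the stripe) are assumptions made for this example and are not taken from the libaom source.

```c
/* Return the source row to read for border row `row`, never going past
 * the last row inside the crop border. */
static int border_source_row(int row, int crop_height) {
  const int last = crop_height - 1;
  return row > last ? last : row;
}

/* Pick the two source rows for the 2px "below" border of a stripe whose
 * first row below the stripe is `stripe_end`. */
static void below_border_rows(int stripe_end, int crop_height, int rows[2]) {
  for (int i = 0; i < 2; ++i) {
    rows[i] = border_source_row(stripe_end + i, crop_height);
  }
}
```

For a stripe that ends 1px above the crop border, for example `stripe_end = 63` with `crop_height = 64`, both border rows resolve to row 63, so the last valid row is duplicated instead of an extra row being upscaled.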
    • Remove deadline · 47cc2559
      Sean DuBois authored
      BUG=aomedia:13
      
      Change-Id: I9df343f4a6a809b09446ff1f2083c38771ab068b
    • Set input_shift properly · 913867b4
      Yaowu Xu authored
      Profile 0 now supports 10 bit, so profile 0 no longer implies an
      input_shift of 0.
      
      Change-Id: Idae429b88ee5c073ee6e939a88d569c5ffde2b0d
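A sketch of the idea, assuming the shift is derived from the two bit depths rather than inferred from the profile. `compute_input_shift` and its parameters are hypothetical names for this example, and clamping negative differences to zero is an assumption.

```c
/* Hypothetical helper: left shift needed to bring input samples up to
 * the codec bit depth, derived from the bit depths alone. */
static int compute_input_shift(int codec_bit_depth, int input_bit_depth) {
  const int shift = codec_bit_depth - input_bit_depth;
  return shift > 0 ? shift : 0;
}
```

With this rule, 8-bit input coded at 10 bits gets a shift of 2 regardless of which profile is selected.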
    • Simplify cos_bit setting in txfm · d4327bce
      Angie Chiang authored
      Move cos_bit from txfm 1d cfg to 2d cfg
      Each txfm stage only uses one cos_bit
      
      This is a lossless change and it speeds up the encoder by 2%.
      
      Change-Id: I45d398761e4729b8c4c37729571fe3765cb0c83f
    • Cleanup redundant assertion · dc3d916b
      Frederic Barbier authored
      Change-Id: I6532e20c958d5bf6f6d73a6f076664e1b74ba055
    • Skip RD search over last 2/3 frame for non-nearest neighbor mvs · 8db5f17b
      Jingning Han authored
      Skip the rate distortion search over last 2/3 reference frames for
      the reference motion vectors derived from non-nearest neighbors.
      The overall coding performance change is in the noise range (0.05%
      better). Speeds up the encoding process by 20%.
      
      Change-Id: I823b8ca2805ae332f4c9bc8ee255069a82db4331
    • Use split and horz/vert to predict horzA/B/vertA/B · 6001fb05
      Zoe Liu authored
      In rd_pick_partition(), the first one or two blocks for the partition
      types HORZ_A, HORZ_B, VERT_A, and VERT_B may already have been evaluated
      during the evaluation of SPLIT, HORZ, and VERT. This patch saves the
      RD pick mode results and tries to reuse them to remove the duplicate
      RD mode evaluation operations.
      
      This patch should not incur any coding performance loss.
      
      Testing on a few lowres frames: when CFL is off, this patch obtains
      >10% encoder speedup.
      
      Change-Id: I932e233bc93873de62a88230254df44494236dde
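The save-and-reuse idea can be sketched as a small memo around an expensive evaluation. All names here (`SavedRd`, `EvalFn`, `rd_cost_with_reuse`, `counting_eval`) are hypothetical, and the evaluator is a stand-in that only counts how often it runs; the real RD mode search is far more involved.

```c
#include <stdint.h>

/* Hypothetical saved result for one block position. */
typedef struct {
  int valid;       /* nonzero once rd_cost holds a saved result */
  int64_t rd_cost; /* result of the earlier evaluation */
} SavedRd;

/* Hypothetical evaluator signature: returns the RD cost of a block. */
typedef int64_t (*EvalFn)(int block_index, void *ctx);

/* Return the saved cost if one exists; otherwise evaluate and save it. */
static int64_t rd_cost_with_reuse(SavedRd *saved, int block_index,
                                  EvalFn eval, void *ctx) {
  if (!saved->valid) {
    saved->rd_cost = eval(block_index, ctx);
    saved->valid = 1;
  }
  return saved->rd_cost;
}

/* Stand-in evaluator for the sketch: counts calls through `ctx` and
 * returns a made-up cost. */
static int64_t counting_eval(int block_index, void *ctx) {
  ++*(int *)ctx;
  return 100 + block_index;
}
```

Calling `rd_cost_with_reuse` twice for the same block runs the evaluator once; the second call returns the saved cost, which mirrors how a block already evaluated for SPLIT, HORZ, or VERT would not need a second evaluation for the A/B partition types.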
    • Add AVX2 implementation for motion compensation function · 54cd8d76
      Yushin Cho authored
      AVX2 Code for av1_convolve_2d_sr_c()
      
      Change-Id: Id8a2192b78bbb2c6ac22da3134a7c256941985c8
    • remove deprecated cmake flags · ec254b77
      Johann authored
      These flags provided compatibility with configure but have
      no effect in cmake builds.
      
      Change-Id: I2dbb71d9aeaae759cc3c4a46917e3840d696328d
    • remove stale .gitignore entries · 4a9eda2c
      Johann authored
      In-tree builds are explicitly disallowed by cmake. Any of these files
      showing up in the source tree should be cause for concern.
      
      BUG=aomedia:1254
      
      Change-Id: Iae42c17cbadb6554c6a95bda14daf5ac67e352a7
    • adopt some clang 5.0.0 formatting · 123e8a60
      Johann authored
      At least the changes that don't conflict with 4.0.1
      
      Change-Id: Iaa2fda027b8ab2b023d608cf5ec7b377a72b851e
    • Add experiment aom_qm_ext and its dependency · e2994a5c
      Yaowu Xu authored
      Change-Id: I243e2a3cbae5b4eebe7fbabcb9f55552e9f13bd8
    • Support rd model in txk sel search · dd8600f5
      Jingning Han authored
      Make the per transform block kernel selection process unified with
      the rate distortion model used in preliminary mode search. This
      makes the txk-sel model search space the same as the baseline.
      
      Change-Id: I82a2d94e88a03c88154582575ced500197f8a409
    • Code cleanup in rdopt.h · 206d22f2
      Hui Su authored
      Change-Id: Iea0e8665cdd5b9bc0fe17930add7068443765ea9
  3. 23 Jan, 2018 11 commits
    • Remove av1_cost_bit() · 751a2335
      Hui Su authored
      It's more efficient to use av1_cost_literal() instead.
      
      Change-Id: I50727d4a4ee06492b373c2e7831c224c5eae8735
    • lv-map: replace read/write_bin with read/write_symbol · 41d61528
      Hui Su authored
      Change-Id: I9e16b5de0a3ae1814982660434812d417955d94f
    • Change tilesize to 256x256 for >CIF resolutions · 5f7f3677
      Debargha Mukherjee authored
      An improvement in coding efficiency for higher resolution
      sources. Plus, having this on by default will guard against
      256x256 LRU support being inadvertently broken.
      
      Change-Id: I171b3c310eab72e27390e9ad0aa9c362f7fbb508
    • Remove FRAME_ID_NUMBERS_PRESENT_FLAG · 6eb9da2c
      Yaowu Xu authored
      This commit replaces hard-coded FRAME_ID_NUMBERS_PRESENT_FLAG with
      error_resilient_mode, which properly reflects the intention of the
      experiment, i.e. "signal the complete state of the reference buffer
      explicitly for each frame" to deal with possible frame losses.
      
      Change-Id: I7130c110d26c6a8e1cf1266c05482b768cf352f9
    • Revert "add scalability experiment" · 8695e987
      Tom Finegan authored
      This reverts commit 2eeadab1.
      
      Reason for revert: Did not address final review comments before landing.
      
      Change-Id: I29089767857bd20b3a3e42322e3887fb7027559d
    • add scalability experiment · 2eeadab1
      Soo-Chul Han authored
      configure:  --enable-experimental --enable-scalability
      
      New applications:  scalable_encoder, scalable_decoder
      
      scalable_encoder:
        * Encodes inputs as 2-layer (same size) stream
        * Encodes as obu file (OBU_NO_IVF must be enabled)
        * Base layer encoded in IPPPP where each P frame references
          only the previous (in time) base layer
        * Enhancement layer encoded using its base layer as
          sole reference frame
        * Base layer encoded with fixed high QP
        * Enhancement layer encoded with fixed low QP
      
      scalable_decoder:
        * Able to decode scalable stream generated by
          scalable_encoder
        * Able to decode any single-layer stream encoded
          by aomenc
        * Outputs base layer as out_lyr0.yuv, and enhancement
          layers (if they exist) as out_lyrN.yuv (N = 1, 2, 3, ..)
        * Able to decode N layers (more than 2)
      
      Change-Id: I8555735db71e5b9b6f900ffdf978e0ad6f6bfc00
    • Fix build when obu is not enabled · a8975df5
      Yaowu Xu authored
      Change-Id: I2d2ce75c184011884de8a015a6666b5209de2082
    • Move encoder-specific function out of decoder · 57ddc51a
      Frederic Barbier authored
      Change-Id: I5ae45abe5145dedf9751adbeb81a111a49df7eb5
    • Let adst4's precision be adjustable · 8251736b
      Angie Chiang authored
      Change-Id: I6e251328b2934130992dbd355cfdffc3c721d357
    • Tune the inv_shift · 06250276
      Angie Chiang authored
      Let the second stage of 10 bit inv txfms fit within 16 bits
      
      Change-Id: Ia087d65484cd410651190dcd9d3292cce6594d34
    • Correct inv_start_range · a8b45c37
      Angie Chiang authored
      Change-Id: I08e4686b0bcf19a3c318a831bc338c9e58f3a127