1. 24 Jan, 2018 16 commits
    • Added SSE4.1 and AVX2 implementations of FAST SGR. · 9d234571
      Imdad Sardharwalla authored
      The self-guided filter speed tests show that:
      - The SSE4.1 implementation of FAST SGR is ~35% faster than the corresponding
        implementation of SGR;
      - The AVX2 implementation of FAST SGR is ~28% faster than the corresponding
        implementation of SGR.
      Change-Id: Iecdc1f8cee79500084c71d06dbb02d804272aa99
    • Add a config flag/code for fast sgr computation · ed5e9673
      Debargha Mukherjee authored
      Adds an experiment for fast sgr computation where, for the r=2
      filter, the A, B stats are computed for every other row and
      averaged in between.
      The motivation is to improve software performance with hopefully
      minimal loss.
      Change-Id: Ie36687826524dc18c1fbb7f6becff244187bf8da
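The "every other row, averaged in between" idea from the fast sgr commit above can be sketched roughly as follows. This is only an illustration, not the libaom code: the function name `fill_odd_rows_by_averaging`, the row-major layout, and the plain rounded average are all assumptions, and the stats are assumed to be non-negative.

```c
#include <stdint.h>

/* Hypothetical helper (not a libaom function).
 * `stats` is a row-major height x width array in which the even rows
 * already hold fully computed values. Each odd row is filled with the
 * rounded average of the rows directly above and below it. */
void fill_odd_rows_by_averaging(int32_t *stats, int width, int height) {
  for (int i = 1; i < height; i += 2) {
    const int32_t *above = stats + (i - 1) * width;
    /* If the last row is odd there is no row below, so reuse the row above. */
    const int32_t *below = (i + 1 < height) ? stats + (i + 1) * width : above;
    int32_t *cur = stats + i * width;
    for (int j = 0; j < width; ++j) {
      /* Widen to 64 bits so the sum cannot overflow. */
      cur[j] = (int32_t)(((int64_t)above[j] + below[j] + 1) / 2);
    }
  }
}
```

Only about half of the rows need the expensive computation in this scheme, which is where a speedup would come from; the cost is that the in-between rows are approximations.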
    • [loop-restoration, bugfix] Restrict sampling of deblocked pixels · dff901ff
      David Barker authored
      There is a special case with certain frame heights, where we
      end up with a loop restoration stripe which ends 1px above the
      crop border.
      Previously this case was handled in quite an ugly way, which also
      disagrees with the spec (+ isn't great for hardware). This patch
      changes things to match the spec.
      Specifically, the old method was to sometimes upscale one extra
      row of deblocked pixels so that we could always have a 2px
      "below" border for each processing stripe. The new method is to
      only use rows inside the crop border, and to duplicate them if
      Change-Id: Idf8ab510e1091dc3f5b257de60e16bca214d8dc4
    • Remove deadline · 47cc2559
      Sean DuBois authored
      Change-Id: I9df343f4a6a809b09446ff1f2083c38771ab068b
    • Set input_shift properly · 913867b4
      Yaowu Xu authored
      Profile 0 now supports 10-bit input, so profile 0 no longer implies
      an input_shift of 0.
      Change-Id: Idae429b88ee5c073ee6e939a88d569c5ffde2b0d
    • Simplify cos_bit setting in txfm · d4327bce
      Angie Chiang authored
      Move cos_bit from txfm 1d cfg to 2d cfg
      Each txfm stage only uses one cos_bit
      This is a lossless change and it speeds up the encoder by 2%
      Change-Id: I45d398761e4729b8c4c37729571fe3765cb0c83f
    • Cleanup redundant assertion · dc3d916b
      Frederic Barbier authored
      Change-Id: I6532e20c958d5bf6f6d73a6f076664e1b74ba055
    • Skip RD search over lst 2/3 frame for non-nearest neighbor mvs · 8db5f17b
      Jingning Han authored
      Skip the rate distortion search over last 2/3 reference frames for
      the reference motion vectors derived from non-nearest neighbors.
      The overall coding performance change is within the noise range
      (0.05% better). The encoding process speeds up by 20%.
      Change-Id: I823b8ca2805ae332f4c9bc8ee255069a82db4331
    • Use split and horz/vert to predict horzA/B/vertA/B · 6001fb05
      Zoe Liu authored
      In rd_pick_partition(), the first one or two blocks for the partition
      types HORZ_A, HORZ_B, VERT_A, and VERT_B may be already evaluated,
      during the evaluation of SPLIT, HORZ, and VERT. This patch saves the
      RD pick mode results and tries to reuse them to remove the duplicate
      RD mode evaluation operations.
      This patch should not incur any coding performance loss.
      Testing on a few lowres frames: when CFL is off, this patch obtains
      >10% encoder speedup.
      Change-Id: I932e233bc93873de62a88230254df44494236dde
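The reuse described in the commit above is a form of result caching. A minimal sketch of that general pattern follows; the type `CachedRd`, the functions `evaluate_block` and `get_rd`, and the counter `eval_count` are all hypothetical names invented for illustration and do not correspond to the actual code in rd_pick_partition().

```c
#include <stdint.h>

/* Hypothetical cache entry: one per block position. */
typedef struct {
  int valid;       /* nonzero once rd_cost has been computed */
  int64_t rd_cost; /* saved result of the expensive evaluation */
} CachedRd;

/* Counts how many expensive evaluations actually ran. */
static int eval_count = 0;

/* Stand-in for an expensive RD mode evaluation (assumption). */
static int64_t evaluate_block(int block_id) {
  ++eval_count;
  return 1000 + block_id;
}

/* Returns the saved result when one exists, otherwise evaluates and saves. */
int64_t get_rd(CachedRd *cache, int block_id) {
  if (!cache[block_id].valid) {
    cache[block_id].rd_cost = evaluate_block(block_id);
    cache[block_id].valid = 1;
  }
  return cache[block_id].rd_cost;
}
```

Because a cached result is identical to what a fresh evaluation would return, this kind of reuse changes only the run time, which is consistent with the commit's claim of no coding performance loss.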
    • Add AVX2 implementation for motion compensation function · 54cd8d76
      Yushin Cho authored
      AVX2 Code for av1_convolve_2d_sr_c()
      Change-Id: Id8a2192b78bbb2c6ac22da3134a7c256941985c8
    • remove deprecated cmake flags · ec254b77
      Johann authored
      These flags provided compatibility with configure but have
      no effect in cmake builds.
      Change-Id: I2dbb71d9aeaae759cc3c4a46917e3840d696328d
    • remove stale .gitignore entries · 4a9eda2c
      Johann authored
      In-tree builds are explicitly disallowed by cmake. Any of these files
      showing up in the source tree should be cause for concern.
      Change-Id: Iae42c17cbadb6554c6a95bda14daf5ac67e352a7
    • adopt some clang 5.0.0 formatting · 123e8a60
      Johann authored
      At least the changes that don't conflict with 4.0.1
      Change-Id: Iaa2fda027b8ab2b023d608cf5ec7b377a72b851e
    • Add experiment aom_qm_ext and its dependency · e2994a5c
      Yaowu Xu authored
      Change-Id: I243e2a3cbae5b4eebe7fbabcb9f55552e9f13bd8
    • Support rd model in txk sel search · dd8600f5
      Jingning Han authored
      Make the per transform block kernel selection process unified with
      the rate distortion model used in preliminary mode search. This
      makes the txk-sel model search space the same as the baseline.
      Change-Id: I82a2d94e88a03c88154582575ced500197f8a409
    • Code cleanup in rdopt.h · 206d22f2
      Hui Su authored
      Change-Id: Iea0e8665cdd5b9bc0fe17930add7068443765ea9
  2. 23 Jan, 2018 23 commits
    • Remove av1_cost_bit() · 751a2335
      Hui Su authored
      It's more efficient to use av1_cost_literal() instead.
      Change-Id: I50727d4a4ee06492b373c2e7831c224c5eae8735
    • lv-map: replace read/write_bin with read/write_symbol · 41d61528
      Hui Su authored
      Change-Id: I9e16b5de0a3ae1814982660434812d417955d94f
    • Change tilesize to 256x256 for >CIF resolutions · 5f7f3677
      Debargha Mukherjee authored
      An improvement in coding efficiency for higher resolution
      sources. Having this on by default also guards against
      256x256 LRU support being inadvertently broken.
      Change-Id: I171b3c310eab72e27390e9ad0aa9c362f7fbb508
    • Remove Frame_ID_NUMBERS_PRESENT_FLAG · 6eb9da2c
      Yaowu Xu authored
      This commit replaces hard coded FRAME_ID_NUMBERS_PRESENT_FLAG with
      error_resilient_mode, which properly reflects the intention of the
      experiment, i.e. "signal the complete state of the reference buffer
      explicitly for each frame" to deal with possible frame losses.
      Change-Id: I7130c110d26c6a8e1cf1266c05482b768cf352f9
    • Revert "add scalability experiment" · 8695e987
      Tom Finegan authored
      This reverts commit 2eeadab1.
      Reason for revert: Did not address final review comments before landing.
      Change-Id: I29089767857bd20b3a3e42322e3887fb7027559d
    • add scalability experiment · 2eeadab1
      Soo-Chul Han authored
      configure:  --enable-experimental --enable-scalability
      New applications:  scalable_encoder, scalable_decoder
        * Encodes inputs as 2-layer (same size) stream
        * Encodes as obu file (OBU_NO_IVF must be enabled)
        * Base layer encoded in IPPPP where P's reference
          only the previous (in time) base layer
        * Enhancement layer encoded using its base layer as
          sole reference frame
        * Base layer encoded with fixed high QP
        * Enhancement layer encoded with fixed low QP
        * Able to decode scalable stream generated by
        * Able to decode any single-layer stream encoded
          by aomenc
        * Outputs base layer as out_lyr0.yuv, and enhancement
          layers (if they exist) as out_lyrN.yuv (N = 1, 2, 3, ..)
        * Able to decode N layers (more than 2)
      Change-Id: I8555735db71e5b9b6f900ffdf978e0ad6f6bfc00
    • Fix build when obu is not enabled · a8975df5
      Yaowu Xu authored
      Change-Id: I2d2ce75c184011884de8a015a6666b5209de2082
    • Move encoder-specific function out of decoder · 57ddc51a
      Frederic Barbier authored
      Change-Id: I5ae45abe5145dedf9751adbeb81a111a49df7eb5
    • Let adst4's precision be adjustable · 8251736b
      Angie Chiang authored
      Change-Id: I6e251328b2934130992dbd355cfdffc3c721d357
    • Tune the inv_shift · 06250276
      Angie Chiang authored
      Let the second stage of 10 bit inv txfms fit within 16 bits
      Change-Id: Ia087d65484cd410651190dcd9d3292cce6594d34
    • Correct inv_start_range · a8b45c37
      Angie Chiang authored
      Change-Id: I08e4686b0bcf19a3c318a831bc338c9e58f3a127
    • Tune fwd txfm's config · a0d27597
      Angie Chiang authored
      Maximize cos_bit's precision
      Change-Id: Iad5d3915823f5c1c25a0caa3bd012d60caa2d521
    • Fix txfm_stage_range_check · 248f0557
      Angie Chiang authored
      Only check cos_bit range if cos_bit is not NULL
      Change-Id: I286fc056812b20242cc962a8b008af7093d05b1d
    • Move InvSqrt2 to the front of inv_txfm2d_add_c · 4b29ea86
      Angie Chiang authored
      This will simplify the range management of rect txfm
      Change-Id: Icf678fe735dd299c6c42a215c592611025e87ba6
    • Remove more code about probability based entropy coding · 9fdf2e2e
      Hui Su authored
      Change-Id: Ie0bc1dd68f7a5d81e49da0ae6f855e572e12aa10
    • Fix a bug in jnt_comp · 5b5f3d50
      Cheng Chen authored
      (1). index may go outside of range
      (2). when d0 <= d1, comparison is invalid.
      Performance impact on Google lowres testset:
      Turn on jnt_comp vs baseline,
      Without fix: -0.211% gain
      With fix: -0.357% gain
      Change-Id: I761522bba8396bba0d4108d710030b472939cf32
    • Added a test for monochrome encoding. · 26ac0478
      Imdad Sardharwalla authored
      The test encodes 5 frames of a video using the --monochrome flag and
      verifies that the decoded frames satisfy:
      - each frame's monochrome flag is set to 1
      - each frame's U and V planes are set to a constant, and this constant
        is the same for all decoded frames
      - the initial frame's Y PSNR value is 'high enough'
      - the Y PSNR values remain fairly constant across all of the frames
      Change-Id: I4239ddfb745ed9746547737b4bc99963c71e51c0
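One of the checks listed in the monochrome test commit above, that the U and V planes hold a single constant, could be expressed along these lines. This is a hedged sketch rather than the test that was actually added: the function name `plane_is_constant` and its parameters are assumptions, and 8-bit samples are assumed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check (not the libaom test code).
 * Returns 1 if every visible sample in the plane equals `expected`,
 * otherwise 0. `stride` is the distance in samples between row starts,
 * so any padding beyond `width` is ignored. */
int plane_is_constant(const uint8_t *plane, int width, int height,
                      int stride, uint8_t expected) {
  for (int r = 0; r < height; ++r) {
    const uint8_t *row = plane + (size_t)r * (size_t)stride;
    for (int c = 0; c < width; ++c) {
      if (row[c] != expected) return 0;
    }
  }
  return 1;
}
```

To verify that the constant is the same for all decoded frames, a caller could take the expected value from the first sample of the first frame and pass it to every later call.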
    • Don't calculate chroma data in monochrome mode · af8e2648
      Imdad Sardharwalla authored
      Encoder: Prior to this patch, some chroma data was calculated and
      later discarded when in monochrome mode. This patch ensures that
      the chroma planes are left uninitialised and that chroma
      calculations are not performed.
      Decoder: Prior to this patch, some chroma calculations were still
      being performed in monochrome mode (e.g. loop filtering). This
      patch ensures that calculations are only performed on the y
      plane, with the chroma planes being set to a constant.
      Change-Id: I394c0c9fc50f884e76a65e6131bd6598b8b21b10
    • Fix Valgrind warning in av1_pick_filter_restoration · b08544de
      Imdad Sardharwalla authored
      Some array elements were defined and left uninitialised. This wasn't causing a
      problem, as the elements were later ignored, but it did cause Valgrind to
      produce warnings.
      The function now initialises the full array immediately after its definition in
      order to quiet these warnings.
      Change-Id: I5083f1f4008cb3ab70a4af4d1d2573dee8793303
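Initialising a whole array at its definition, as the Valgrind fix above describes, might look like the following. The array name `costs`, the size `NUM_ENTRIES`, and the function `sum_used_entries` are hypothetical and chosen only to illustrate the pattern; the real array in av1_pick_filter_restoration is not shown here.

```c
#include <stdint.h>

enum { NUM_ENTRIES = 8 }; /* hypothetical array size */

/* Fills the first `used` entries, then sums the whole array.
 * The `= { 0 }` initialiser zeroes every element at the definition,
 * so entries that are never written still hold a defined value. */
int64_t sum_used_entries(int used) {
  int64_t costs[NUM_ENTRIES] = { 0 };
  for (int i = 0; i < used && i < NUM_ENTRIES; ++i) {
    costs[i] = i + 1;
  }
  int64_t total = 0;
  for (int i = 0; i < NUM_ENTRIES; ++i) {
    total += costs[i];
  }
  return total;
}
```

Without the initialiser, reading the unwritten entries would be flagged by tools such as Valgrind even if the values were never used in a meaningful way.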
    • Add SSE2 implementation of 1-D convolve functions · ffa57594
      Frank Bossen authored
      Can reduce decoder runtime by about 7 percent.
      Change-Id: I4ee3eea9de867d065d03a176f242e286a4899004
    • Remove the dct_only experiment · 7448fc24
      Hui Su authored
      Change-Id: I33bb6e902e3be2847ae8101199d9cbd0e1e5c38d
    • Move if statement outside for loops · 2e8eaddd
      Peng Bin authored
      By avoiding breaks in the CPU's pipeline,
      this patch achieves a small encoder
      speedup in the range of 0.2% to 0.71%.
      Change-Id: I398cb09f8eb91695e3258091ff2f82f06ab74145
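Moving a loop-invariant `if` outside a loop, as in the commit above, generally looks like the sketch below. The two functions are invented for illustration (they are not the code the patch touched) and compute the same result; the second tests the flag once instead of on every iteration.

```c
#include <stddef.h>
#include <stdint.h>

/* Branch inside the loop: `use_abs` is tested on every iteration. */
int64_t sum_branch_inside(const int32_t *v, size_t n, int use_abs) {
  int64_t total = 0;
  for (size_t i = 0; i < n; ++i) {
    if (use_abs) {
      total += v[i] < 0 ? -(int64_t)v[i] : v[i];
    } else {
      total += v[i];
    }
  }
  return total;
}

/* Branch hoisted: the flag is tested once and each loop body is simpler. */
int64_t sum_branch_hoisted(const int32_t *v, size_t n, int use_abs) {
  int64_t total = 0;
  if (use_abs) {
    for (size_t i = 0; i < n; ++i) {
      total += v[i] < 0 ? -(int64_t)v[i] : v[i];
    }
  } else {
    for (size_t i = 0; i < n; ++i) {
      total += v[i];
    }
  }
  return total;
}
```

Optimising compilers can sometimes perform this transformation on their own (loop unswitching), so any gain from doing it by hand depends on the compiler and the surrounding code.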
    • [segment_pred_last] fix resolution change issues · 85e8c797
      Soo-Chul Han authored
      explicitly disable segmentation when ref frame has different
      Change-Id: I6db51116db308514d572eb465c2453403e64e1f2
  3. 22 Jan, 2018 1 commit
    • Simplify operations · 76f34734
      Yaowu Xu authored
      This also avoids a UBsan warning of "left shift of negative values".
      Change-Id: Ifb8d74218b1c0bc7b924e752442de0ba1b50a869
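For context on the UBsan warning mentioned above: in C, left-shifting a negative signed value is undefined behaviour. One common way to avoid it is to multiply by a power of two instead. The commit does not say how it removed the shift, so the function `scale_by_pow2` below is only an assumed illustration, valid for 0 <= shift < 31 and for results that fit in 32 bits.

```c
#include <stdint.h>

/* Hypothetical helper: scales `value` by 2^shift.
 * `value << shift` would be undefined for negative `value`, whereas the
 * multiplication is well defined as long as the result does not overflow.
 * Only the non-negative constant 1 is shifted. */
int32_t scale_by_pow2(int32_t value, int shift) {
  return value * (int32_t)(1 << shift);
}
```

For example, `scale_by_pow2(-3, 4)` yields -48 without ever shifting a negative number.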