1. 24 Jan, 2018 7 commits
    • Added SSE4.1 and AVX2 implementations of FAST SGR. · 9d234571
      Imdad Sardharwalla authored
      The self-guided filter speed tests show that:
      - The SSE4.1 implementation of FAST SGR is ~35% faster than the corresponding
        implementation of SGR;
      - The AVX2 implementation of FAST SGR is ~28% faster than the corresponding
        implementation of SGR.
      
      Change-Id: Iecdc1f8cee79500084c71d06dbb02d804272aa99
    • Add a config flag/code for fast sgr computation · ed5e9673
      Debargha Mukherjee authored
      Adds an experiment for fast SGR computation where, for the r=2
      filter, the A and B stats are computed for every other row and
      averaged in between.
      The motivation is to improve software performance with hopefully
      minimal loss.
      
      Change-Id: Ie36687826524dc18c1fbb7f6becff244187bf8da
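The "every other row, averaged in between" idea from the fast sgr commit above can be sketched as follows. This is a minimal illustration only, not the libaom code: the helper name `sketch_fill_skipped_rows`, the buffer layout, and the choice to reuse the row above when no row below exists are all assumptions. The stats are assumed to be non-negative and already computed on the even rows.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a libaom function): fills the odd rows of a
 * stats plane by averaging the rows above and below. The even rows are
 * assumed to hold already-computed, non-negative values. */
void sketch_fill_skipped_rows(int32_t *stats, int width, int height,
                              int stride) {
  assert(stats != 0 && width > 0 && height > 0 && stride >= width);
  for (int i = 1; i < height; i += 2) {
    const int32_t *above = stats + (i - 1) * stride;
    /* Assumption: if there is no computed row below, reuse the row above. */
    const int32_t *below = (i + 1 < height) ? stats + (i + 1) * stride : above;
    int32_t *row = stats + i * stride;
    for (int j = 0; j < width; ++j) {
      /* Rounded average, using a 64-bit intermediate to avoid overflow. */
      row[j] = (int32_t)(((int64_t)above[j] + below[j] + 1) / 2);
    }
  }
}
```

Only half of the rows need the full stats computation in this scheme, which is where the speed-up reported in the later SIMD commit would come from.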
    • [loop-restoration, bugfix] Restrict sampling of deblocked pixels · dff901ff
      David Barker authored
      There is a special case with certain frame heights, where we
      end up with a loop restoration stripe which ends 1px above the
      crop border.
      
      Previously this case was handled in quite an ugly way, which also
      disagrees with the spec (+ isn't great for hardware). This patch
      changes things to match the spec.
      
      Specifically, the old method was to sometimes upscale one extra
      row of deblocked pixels so that we could always have a 2px
      "below" border for each processing stripe. The new method is to
      only use rows inside the crop border, and to duplicate them if
      necessary.
      
      BUG=aomedia:1264
      
      Change-Id: Idf8ab510e1091dc3f5b257de60e16bca214d8dc4
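The new border rule described in the loop-restoration commit above (use only rows inside the crop border and duplicate them if necessary) amounts to clamping the source row index. The sketch below is an assumption-laden illustration, not the libaom code: `sketch_below_border_src_row` and its parameters are hypothetical names, and the stripe is assumed to end strictly above the crop border.

```c
#include <assert.h>

/* Hypothetical helper (not a libaom function): returns the source row to
 * read for the k-th "below" border row (k = 0 or 1) of a stripe whose last
 * row is stripe_end - 1. Rows at or beyond crop_height are never read; the
 * last row inside the crop border is duplicated instead. */
int sketch_below_border_src_row(int stripe_end, int k, int crop_height) {
  assert(k >= 0 && k < 2);
  assert(stripe_end > 0 && stripe_end < crop_height);
  const int row = stripe_end + k;
  return row < crop_height ? row : crop_height - 1;
}
```

For a stripe that ends 1px above the crop border, both border rows map to the same source row, so no extra row of deblocked pixels has to be produced.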
    • Simplify cos_bit setting in txfm · d4327bce
      Angie Chiang authored
      Move cos_bit from txfm 1d cfg to 2d cfg
      Each txfm stage only uses one cos_bit
      
      This is a lossless change and it speeds up encoder by 2%
      
      Change-Id: I45d398761e4729b8c4c37729571fe3765cb0c83f
    • Cleanup redundant assertion · dc3d916b
      Frederic Barbier authored
      Change-Id: I6532e20c958d5bf6f6d73a6f076664e1b74ba055
    • Add AVX2 implementation for motion compensation function · 54cd8d76
      Yushin Cho authored
      AVX2 Code for av1_convolve_2d_sr_c()
      
      Change-Id: Id8a2192b78bbb2c6ac22da3134a7c256941985c8
    • adopt some clang 5.0.0 formatting · 123e8a60
      Johann authored
      At least the changes that don't conflict with 4.0.1
      
      Change-Id: Iaa2fda027b8ab2b023d608cf5ec7b377a72b851e
  2. 23 Jan, 2018 14 commits
    • Remove Frame_ID_NUMBERS_PRESENT_FLAG · 6eb9da2c
      Yaowu Xu authored
      This commit replaces hard coded FRAME_ID_NUMBERS_PRESENT_FLAG with
      error_resilient_mode, which properly reflects the intention of the
      experiment, i.e. "signal the complete state of the reference buffer
      explicitly for each frame" to deal with possible frame losses.
      
      Change-Id: I7130c110d26c6a8e1cf1266c05482b768cf352f9
    • Revert "add scalability experiment" · 8695e987
      Tom Finegan authored
      This reverts commit 2eeadab1.
      
      Reason for revert: Did not address final review comments before landing.
      
      Change-Id: I29089767857bd20b3a3e42322e3887fb7027559d
    • add scalability experiment · 2eeadab1
      Soo-Chul Han authored
      configure:  --enable-experimental --enable-scalability
      
      New applications:  scalable_encoder, scalable_decoder
      
      scalable_encoder:
        * Encodes inputs as 2-layer (same size) stream
        * Encodes as obu file (OBU_NO_IVF must be enabled)
        * Base layer encoded as IPPPP, where each P frame references
          only the previous (in time) base layer frame
        * Enhancement layer encoded using its base layer as
          sole reference frame
        * Base layer encoded with fixed high QP
        * Enhancement layer encoded with fixed low QP
      
      scalable_decoder:
        * Able to decode scalable stream generated by
          scalable_encoder
        * Able to decode any single-layer stream encoded
          by aomenc
        * Outputs base layer as out_lyr0.yuv, and enhancement
          layers (if they exist) as out_lyrN.yuv (N = 1, 2, 3, ..)
        * Able to decode N layers (more than 2)
      
      Change-Id: I8555735db71e5b9b6f900ffdf978e0ad6f6bfc00
    • Move encoder-specific function out of decoder · 57ddc51a
      Frederic Barbier authored
      Change-Id: I5ae45abe5145dedf9751adbeb81a111a49df7eb5
    • Let adst4's precision be adjustable · 8251736b
      Angie Chiang authored
      Change-Id: I6e251328b2934130992dbd355cfdffc3c721d357
    • Tune the inv_shift · 06250276
      Angie Chiang authored
      Let the second stage of 10 bit inv txfms fit within 16 bits
      
      Change-Id: Ia087d65484cd410651190dcd9d3292cce6594d34
    • Correct inv_start_range · a8b45c37
      Angie Chiang authored
      Change-Id: I08e4686b0bcf19a3c318a831bc338c9e58f3a127
    • Move InvSqrt2 to the front of inv_txfm2d_add_c · 4b29ea86
      Angie Chiang authored
      This will simplify the range management of rect txfm
      
      Change-Id: Icf678fe735dd299c6c42a215c592611025e87ba6
    • Remove more code about probability based entropy coding · 9fdf2e2e
      Hui Su authored
      Change-Id: Ie0bc1dd68f7a5d81e49da0ae6f855e572e12aa10
    • Fix a bug in jnt_comp · 5b5f3d50
      Cheng Chen authored
      (1). The index may go outside of the valid range.
      (2). When d0 <= d1, the comparison is invalid.
      
      Performance impact on Google lowres testset:
      Turn on jnt_comp vs baseline,
      Without fix: -0.211% gain
      With fix: -0.357% gain
      
      BUG=aomedia:1239
      
      Change-Id: I761522bba8396bba0d4108d710030b472939cf32
    • Don't calculate chroma data in monochrome mode · af8e2648
      Imdad Sardharwalla authored
      Encoder: Prior to this patch, some chroma data was calculated and
      later discarded when in monochrome mode. This patch ensures that
      the chroma planes are left uninitialised and that chroma
      calculations are not performed.
      
      Decoder: Prior to this patch, some chroma calculations were still
      being performed in monochrome mode (e.g. loop filtering). This
      patch ensures that calculations are only performed on the y
      plane, with the chroma planes being set to a constant.
      
      Change-Id: I394c0c9fc50f884e76a65e6131bd6598b8b21b10
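The decoder-side behaviour described in the monochrome commit above (per-plane work restricted to the y plane, chroma planes set to a constant) can be sketched roughly as below. Every name here is hypothetical and the "filter" is a placeholder for any per-plane stage such as loop filtering. The commit message does not state the constant, so it is passed in as a parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { SKETCH_NUM_PLANES = 3 }; /* Y, U, V */

/* Placeholder for a per-plane stage such as loop filtering (hypothetical). */
static void sketch_filter_plane(uint8_t *plane, size_t n) {
  for (size_t i = 0; i < n; ++i) plane[i] = (uint8_t)(plane[i] / 2);
}

/* Runs the per-plane stage on the planes that carry data and returns how
 * many planes were processed. In monochrome mode only plane 0 (Y) is
 * processed; the chroma planes are set to chroma_fill instead. */
int sketch_process_frame(uint8_t *planes[SKETCH_NUM_PLANES], size_t n,
                         int monochrome, uint8_t chroma_fill) {
  assert(planes != NULL && n > 0);
  const int active = monochrome ? 1 : SKETCH_NUM_PLANES;
  for (int p = 0; p < active; ++p) sketch_filter_plane(planes[p], n);
  for (int p = active; p < SKETCH_NUM_PLANES; ++p) {
    memset(planes[p], chroma_fill, n);
  }
  return active;
}
```

Deriving the plane count once and looping up to it keeps the monochrome check out of each individual stage.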
    • Add SSE2 implementation of 1-D convolve functions · ffa57594
      Frank Bossen authored
      Can reduce decoder runtime by about 7 percent.
      
      Change-Id: I4ee3eea9de867d065d03a176f242e286a4899004
    • Remove the dct_only experiment · 7448fc24
      Hui Su authored
      Change-Id: I33bb6e902e3be2847ae8101199d9cbd0e1e5c38d
    • [segment_pred_last] fix resolution change issues · 85e8c797
      Soo-Chul Han authored
      explicitly disable segmentation when ref frame has different
      resolution
      
      BUG=aomedia:1205
      BUG=aomedia:1223
      BUG=aomedia:1256
      
      Change-Id: I6db51116db308514d572eb465c2453403e64e1f2
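The guard described in the segment_pred_last commit above can be pictured as a plain dimension comparison. The sketch below is a guess at its shape, not the actual change: the type `SketchFrameSize` and the function `sketch_allow_segmentation` are hypothetical, and treating a missing reference frame as "disabled" is an added assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical frame-size record (not a libaom type). */
typedef struct {
  int width;
  int height;
} SketchFrameSize;

/* Returns 1 if segmentation may be used with this reference frame, and 0
 * if it must be disabled because the reference has a different resolution.
 * Assumption: a missing reference also disables it. */
int sketch_allow_segmentation(const SketchFrameSize *cur,
                              const SketchFrameSize *ref) {
  assert(cur != NULL);
  if (ref == NULL) return 0;
  return cur->width == ref->width && cur->height == ref->height;
}
```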
  3. 22 Jan, 2018 5 commits
  4. 20 Jan, 2018 1 commit
  5. 19 Jan, 2018 5 commits
  6. 18 Jan, 2018 5 commits
  7. 17 Jan, 2018 3 commits
    • Update convolve_sse2.c · 6f84e12d
      Linfeng Zhang authored
      to process width 4 case separately
      
      Change-Id: I18f5e026927c4d3d705586e9e0f8a6315931951c
    • Optimise self-guided restoration SIMD functions · f32dabd2
      Imdad Sardharwalla authored
      Improvements have been made to calc_ab for both the
      SSE4.1 and AVX2 versions of the self-guided filter.
      These result in an increase in the speed of between
      3% and 5% depending on the bit depth.
      
      Change-Id: I83a12ba452fcbb61cce5066801ae213e23c609cd
    • SIMD implementation of horz superres · 454697ca
      Imdad Sardharwalla authored
      SSE4.1 implementations of av1_convolve_horiz_rs and
      av1_highbd_convolve_horiz_rs have been added, along
      with the corresponding speed and correctness tests.
      
      The interp_taps argument was defunct and has now been
      removed and replaced with the UPSCALE_NORMATIVE_TAPS
      macro.
      
      Code associated with values of UPSCALE_NORMATIVE_TAPS
      that are no longer used has been removed.
      
      Change-Id: Ie74d8ca479a70c8d473ac12883cfe4f10b37a66d