1. 30 Oct, 2017 3 commits
    • [CFL] Sub8x8 Validation Code Rewrite · c7af36d4
      Luc Trudeau authored
      The Sub8x8 validation code is rewritten to be more robust. The scope of
      the validation is narrowed to checking that all of the required content
      in the storage buffer was stored between CfL predictions. The early
      termination used in the current mode decision code does not allow
      validating more than that.
      This change does not change encoder output.
      Change-Id: I7f1ed84da5037dcfaaf5da9cf33b4b8d664d2352
    • Remove experimental flag for rect-tx · 11812967
      Debargha Mukherjee authored
      Change-Id: I0cc53a03f07a11a6f7ea0570ff4ee8cf7c18c5aa
    • loop-restoration: Remove special case in Wiener filter · 3acd3b5c
      David Barker authored
      Remove the special case handling for the topmost/bottommost
      rows in each processing unit. This causes slightly different
      effects depending on whether striped-loop-restoration is enabled.
      With striped-loop-restoration:
        Now that we explicitly fill out 3 rows of above/below pixels
        for each stripe, we don't need to use stepdown_wiener_kernel.
        Instead, the duplication of the topmost/bottommost pixels
        accomplishes the same task, while making the code much cleaner.
        This patch should not cause a change in output, except in a
        couple of cases which were already questionable. In particular,
        it fixes bug #953, where the Wiener filter could not handle
        small processing units (<4 rows high).
      Without striped-loop-restoration:
        The Wiener filter returns to using a full 3 pixels above/below
        the processing unit. In order to make sure there are enough
        pixels, we need to expand WIENER_BORDER_VERT to 3 pixels.
        This will result in a slight change in output, but should be
        fairly minor.
      Change-Id: I9530ef55909246f7ba488b7ecfd92d59e776b2f9
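The edge handling described in the Wiener filter commit above can be illustrated with a minimal sketch. It assumes a 7-tap vertical filter whose taps sum to 128 and replaces the explicit border rows with index clamping, which reads the same values as duplicated topmost/bottommost rows would. The names `clamp_row` and `filter_column_7tap` are hypothetical and are not taken from libaom.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: clamp a row index into [0, rows - 1]. Reading a
 * clamped row gives the same value as a duplicated top/bottom row. */
static int clamp_row(int r, int rows) {
  if (r < 0) return 0;
  if (r >= rows) return rows - 1;
  return r;
}

/* Apply a 7-tap vertical filter to one pixel. The taps are assumed to sum
 * to 128 (1 << 7), so the result is rounded and shifted right by 7. */
static uint8_t filter_column_7tap(const uint8_t *src, int stride, int rows,
                                  int row, int col, const int16_t taps[7]) {
  assert(rows > 0 && row >= 0 && row < rows);
  int32_t sum = 0;
  for (int k = -3; k <= 3; ++k) {
    /* Rows outside the unit fall back to the nearest valid row. */
    sum += taps[k + 3] * src[clamp_row(row + k, rows) * stride + col];
  }
  sum = (sum + 64) >> 7;
  if (sum < 0) sum = 0;
  if (sum > 255) sum = 255;
  return (uint8_t)sum;
}
```

Because every out-of-range row resolves to a valid one, the same loop works for units that are fewer than 4 rows high, with no reduced kernel needed.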
  2. 28 Oct, 2017 3 commits
    • Add new 4-point Type-VII DST to daala_tx. · 14a9cb1f
      Nathan E. Egge authored
      Replaces the lifting-based orthonormal 4-point Type-IV DST with an
       orthonormal 4-point Type-VII DST that has no iterative multiplies.
      Change-Id: I0a1f1a8d8cecce1c8002b7891baea601bc088690
    • Extend the eob context model · 35deaa73
      Jingning Han authored
      Account for 1-D/2-D transform kernels for the eob modeling. To
      maintain a smaller context cardinality, set the two 1-D transform
      kernels in the same category. The difference in directions should
      be largely covered by the scan order.
      This and the previous CLs on nz_map context modeling together
      improve the compression performance of level-map coefficient coding
      system by 0.4% for lowres.
      Change-Id: I8c4f03ca01ce3d248950d04bd1266f445b4227a0
    • Account for rectangular transform block sizes in lv-map ctx · a24a6900
      Jingning Han authored
      Account for the rectangular transform block sizes in the non-zero
      map context model.
      Change-Id: I16cf21a4120c10c213df10950aeb4ef0ea40c477
  3. 27 Oct, 2017 11 commits
    • Ext-intra modification/tuning · 3ca43bf0
      Joe Young authored
      For ext-intra directional intra modes:
      1. Use neighbor block modes to modify edge filtering strength
         Coding gain (lowres/midres/hdres):
           (8 keyframes)
           PSNR: -0.19 -0.22 -0.10
           SSIM: -0.29 -0.27 -0.13
      2. Use context-based cdf to code angle_delta syntax
           (8 keyframes)
           PSNR: -0.20 -0.24 -0.27
           SSIM: -0.29 -0.33 -0.37
      3. Filter corner sample:
           (8 keyframes)
           PSNR: -0.01 -0.02 -0.05
           SSIM: -0.03 -0.04 -0.05
      Combined BD-rate improvement for 8 keyframes:
           PSNR: -0.40 -0.47 -0.40
           SSIM: -0.57 -0.60 -0.51
      Change-Id: Id47ac17b6bf91cd810b70cacfc5b457341f417f3
    • Superres: Fix writing/reading of denominator. · 8301018d
      Urvang Joshi authored
      Range is 9 to 16, and not 8 to 15.
      Change-Id: I7de6cea16a6377d9cd3b2af73efc841b42dad1fa
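The off-by-one range in the superres commit above can be sketched as an offset code. This is a minimal illustration, assuming the denominator is stored in a 3-bit field as an offset from the minimum value. The range 9 to 16 comes from the commit text, while the field width and the names `DENOM_MIN`, `DENOM_BITS`, `encode_denom` and `decode_denom` are assumptions made for the example.

```c
#include <assert.h>

#define DENOM_MIN 9  /* smallest coded denominator, per the commit text */
#define DENOM_BITS 3 /* assumed field width: 8 codes cover 9..16 */

/* Map a denominator in [9, 16] to a code in [0, 7]. */
static unsigned encode_denom(int denom) {
  assert(denom >= DENOM_MIN && denom < DENOM_MIN + (1 << DENOM_BITS));
  return (unsigned)(denom - DENOM_MIN);
}

/* Map a code in [0, 7] back to a denominator in [9, 16]. */
static int decode_denom(unsigned code) {
  assert(code < (1u << DENOM_BITS));
  return (int)code + DENOM_MIN;
}
```

Using an offset of 8 instead of 9 on either side would shift the representable range to 8 to 15, which is the kind of mismatch the commit message describes.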
    • 64X64: Keep top-left 32x32 only (other code path). · 693ae522
      Urvang Joshi authored
      Change-Id: Ib4faac1e7da40a351ec3abfe1f636a94c92ef0a3
    • Encoder: Reduce max resident set size by 23% · 5a69cd2d
      Urvang Joshi authored
      We reduce max stack size from 16 to 8.
      Memory reduction:
      - peak usage for 1080p video: 2.328 GB → 1.788 GB
      - sizeof(ref_mv_stack): 6144 → 3072
      - sizeof(MB_MODE_INFO_EXT): 6456 → 3384
      - sizeof(PICK_MODE_CONTEXT): 8056 → 5000
      - sizeof(PC_TREE): 201440 → 125040
      Compression performance is roughly neutral:
      - AWCY objective-1-fast: +0.03
      - Google lowres: 0.0
      - Google midres: -0.006
      Change-Id: Ifd38359c58e40b1c94552c5034618da8ce510f62
    • JNT_COMP: 4. add context and entropy read/write · 0a7f2f51
      Cheng Chen authored
      Change-Id: I0e6f7ab981e31f7120105515f6204568b6dc82d3
    • JNT_COMP: 3. rd select the best weight · ca6958c6
      Cheng Chen authored
      Select the best compound_idx in rd.
      The rate/cost for compound_idx and its ctx will be added in patch 4.
      For now, there is a bug if we don't encode one more time using the
      selected compound_idx. It remains an issue to be solved in the future.
      Change-Id: I5e1ba51da2b6ab5bacd8aba752dda43bd2257014
    • Add short_filter experiment · f02f8aef
      Zhijie Yang authored
      Reduce the number of motion interpolation filter taps for inter
      prediction blocks whose width or height is smaller than or equal to 4,
      to alleviate the memory bandwidth increase.
      AWCY HL: 0.01% Y, -0.20% U, -0.29% V (positive number means loss)
      Change-Id: Ic454340e20aea2f1aae622336990f24a9e5b54d8
    • striped-loop-restoration: Save/restore more context rows · fa1e4b2a
      David Barker authored
      Save and restore 3 rows above and below each stripe, instead of 2.
      The extra rows are filled with duplicates of the outermost context.
      This should not affect the encoder or decoder output in any way,
      as currently these outer rows are not used. But this will enable
      later patches to simplify the code and make it a closer match
      to the way things are described in the striped-loop-restoration
      design document.
      Change-Id: I8ae5433e321d6025c6dc1b473330f485f1599340
    • Accept all warped motion model settings · 163710c0
      Sebastien Alaiwan authored
      When needed, fall back to the regular interp filter at the reconstruction stage.
      Such bitstreams are valid.
      However, as we don't expect aomenc to generate them,
      print a helper warning.
      Change-Id: If30c8d8e478688d142abd857f4c35f3e8c68edb4
    • Fix bug when enabling 32-point DST in daala_tx. · 856d1798
      Nathan E. Egge authored
      Change-Id: I567420e45f54cfe991065614d0a8c0c4d637e116
    • Fixed build conflict (amvr,intrabc). · 10a0380a
      RogerZhou authored
      Change-Id: Ibfeb424bf0ebab7bbeb69f6f6df24a4f4924ec97
  4. 26 Oct, 2017 9 commits
    • striped-loop-restoration: Fix line buffer width · e7745025
      David Barker authored
      The last restoration unit in a tile is allowed to be up to 1.5x
      the nominal restoration unit size. This was not properly accounted
      for in the definition of RESTORATION_LINEBUFFER_WIDTH, leading to
      memory corruption whenever we hit a particularly wide restoration unit.
      Change-Id: I6e858278bf1e3304eedb5f974f1db6961245e7bf
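The 1.5x bound mentioned in the line buffer commit above can be reproduced with a small sketch. It assumes units are counted by rounding the tile length to the nearest multiple of the nominal unit size (with a minimum of one unit), so that a remainder shorter than half a unit is absorbed into the last unit. That rounding rule and the function names are assumptions for illustration, not a copy of the libaom code.

```c
#include <assert.h>

/* Assumed rule: round to the nearest whole number of units, at least 1. */
static int count_units(int length, int unit_size) {
  assert(length > 0 && unit_size > 0);
  int n = (length + unit_size / 2) / unit_size;
  return n > 0 ? n : 1;
}

/* The last unit takes whatever remains after the full-size units. */
static int last_unit_length(int length, int unit_size) {
  return length - (count_units(length, unit_size) - 1) * unit_size;
}

/* Upper bound a line buffer must accommodate: 1.5x the nominal size. */
static int max_unit_length(int unit_size) {
  return unit_size * 3 / 2;
}
```

With a nominal size of 64, a length of 159 gives two units with the last one 95 wide, just under the 96 bound, so a buffer sized for only 64 columns would overflow.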
    • Merge eob-first into lv-map · 3422ac17
      Jingning Han authored
      Change-Id: Ib36a8df1a3ebddbf4320fb7b9b5537041bddc3a3
    • Clean up br-node in lv-map · 36773c7a
      Jingning Han authored
      Use br-node approach, which can be easily turned into multi-symbol
      if desired.
      Change-Id: I40df5178ab299af24d347d91f01a88dbfc9305a6
    • Consolidate lv-map experiment · 00803a77
      Jingning Han authored
      Change-Id: I2ae2a33574bc3072561e696a31e0ea2e0770afa9
    • Remove dead functions · 2457ec8c
      Sebastien Alaiwan authored
      Change-Id: Idcb0a6660ac3b34eb79c216d71c8a71ffb863669
    • Collect coeff level distribution in symbolrate · 9c168370
      Angie Chiang authored
      Change-Id: If77800c0904b5e004508274acb32ae46a641405b
    • Count superblock num in symbol rate accounting · d9af8ac3
      Angie Chiang authored
      Change-Id: Id955e62c89b44781cef6b562fbc1e5782fccf95e
    • Stop loop rest units from straddling tile boundaries · bcb65fe6
      Rupert Swarbrick authored
      With this patch, restoration units are allocated within each tile as
      if it were its own image. Arrays of information that need one entry
      per restoration unit are laid out in tiles, with rsi->units_per_tile
      units for each tile.
      Change-Id: I485c17166f33e24d281079b3138b76f98f0fe081
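The per-tile array layout described in the commit above can be sketched as a simple index computation. Only `units_per_tile` is named in the commit message. The struct `RestLayout`, the field `horz_units_per_tile`, the row-major order inside a tile and the function `rest_unit_index` are hypothetical and chosen for illustration.

```c
#include <assert.h>

typedef struct {
  int units_per_tile;      /* entries reserved for each tile */
  int horz_units_per_tile; /* assumed: units per row inside one tile */
} RestLayout;

/* Return the flat array index of a restoration unit, given its tile and its
 * (row, col) position inside that tile. Each tile owns a contiguous block
 * of units_per_tile entries, so no unit is shared between tiles. */
static int rest_unit_index(const RestLayout *l, int tile_idx, int unit_row,
                           int unit_col) {
  assert(tile_idx >= 0 && unit_row >= 0);
  assert(unit_col >= 0 && unit_col < l->horz_units_per_tile);
  const int in_tile = unit_row * l->horz_units_per_tile + unit_col;
  assert(in_tile < l->units_per_tile);
  return tile_idx * l->units_per_tile + in_tile;
}
```

Since the tile index only contributes a multiple of `units_per_tile`, the units of each tile can be processed independently of every other tile.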
    • Fix a bug in the DAALA_TX 4-point DST functions. · b634e7ed
      Nathan E. Egge authored
      The OD_FDST_4() and OD_IDST_4() macros were written for use in the
       OD_FDCT_8_ASYM macro which took asymmetrically scaled input and
       after running an asymmetric butterfly step, passed it through to
       the 4-point Type-II DCT and 4-point Type-IV DST.
      Because the DST implementations were never tested as standalone
       transforms, some of the signs from the butterfly step ended up inside
       the DST macros.
      These extra operations will be addressed in a follow-up patch.
      Change-Id: I5ad1dee7b903d3a6dc3d512ae430841244851bc0
  5. 25 Oct, 2017 12 commits
    • Fix reference frame mvs access · 058d0889
      Jingning Han authored
      Resolve an enc/dec mismatch issue when tmv is off and mfmv is on.
      Change-Id: Ia64005acd85f51d3162baafab1540095ad06187d
    • av1_rtcd_defs.pl: deduplicate HBD/LBD · 27427722
      Sebastien Alaiwan authored
      There's no change to the generated file.
      Change-Id: I77e9d78d22d084bc77dbf1dc5b8b99368cd2444e
    • Optimizations for filter_intra · 57b8ff68
      Yue Chen authored
      Reduce number of modes from 10 to 6, and disable fi modes in UV.
      To reduce complexity, apply filter directly without subtracting
      the estimated means.
      Change-Id: Iaf78d92d31e4a7cc30ea7863b57a9611c5f503e6
    • striped_loop_restoration bug fixes · 54671902
      Ola Hugosson authored
      * The above/below buffers did not fit the extra replication pixels to the right and left
      * The Wiener filter stripe has to be at least 4 pixels high (because of the
        split into above/mid/below parts)
      Change-Id: I360bef114c7ceb439e11b76bd4724af15e051348
    • [CFL] Switch to txfm_rd_in_plane in alpha search · 1f8d0950
      David Michael Barr authored
      This is more precise than the dist functions it replaces.
      Results on Subset1 (compared with previous commit with CfL enabled)
        PSNR | PSNR Cb | PSNR Cr | PSNR HVS |   SSIM | MS SSIM | CIEDE 2000
      0.0634 | -0.9188 | -0.9429 |   0.0609 | 0.0722 |  0.0593 |    -0.3226
      Change-Id: I955a7d7eceea50482edb40b0d1041b300e3c9042
    • Remove dead struct member · dea4d313
      Sebastien Alaiwan authored
      Change-Id: Id228c94fbe6005ac37a59bb8c23cfb0f95f97af0
    • Avoid UB from misaligned loads in selfguided_sse4.c · 84ffea31
      Rupert Swarbrick authored
      This follows on from the previous patch, which corrects xx_loadl_32
      for misaligned addresses. Calls to xx_loadl_32 in selfguided_sse4.c
      are all followed by a zero-extend, so this patch packages the two into
      the inlinable functions xx_load_extend_8_16 and xx_load_extend_8_32.
      There were also some hand-rolled loads (which matched the old body of
      xx_loadl_32 and weren't strictly correct when the pointer was
      misaligned). This patch fixes them up to use xx_load_extend_8_32.
      Change-Id: I9c76dd4f41baa1343149aa9c432218a17df8b415
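The misaligned-load problem from the commit above can be shown with a portable scalar sketch. It is not the SSE4 code from selfguided_sse4.c. It assumes only standard C, and uses `memcpy` for the load because `memcpy` has no alignment requirement, whereas dereferencing a cast `uint32_t` pointer at a misaligned address is undefined behaviour. The names `load_u32_unaligned` and `load_extend_8_16` are hypothetical stand-ins for the helpers the commit mentions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read 4 bytes from any address without assuming alignment. Compilers
 * typically turn this memcpy into a single load instruction. */
static uint32_t load_u32_unaligned(const void *p) {
  uint32_t v;
  memcpy(&v, p, sizeof(v));
  return v;
}

/* Load 4 bytes and zero-extend each one to 16 bits: a scalar stand-in for
 * a vector load followed by a zero-extend. */
static void load_extend_8_16(const uint8_t *p, uint16_t out[4]) {
  assert(p != NULL && out != NULL);
  const uint32_t v = load_u32_unaligned(p);
  uint8_t bytes[4];
  memcpy(bytes, &v, sizeof(bytes)); /* keeps memory order on any endianness */
  for (int i = 0; i < 4; ++i) out[i] = bytes[i];
}
```

Packaging the load and the extend into one helper, as the commit does, means each call site cannot accidentally reintroduce the pointer cast.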
    • Reduce the MFMV_STACK_SIZE value · 380e37cd
      Jingning Han authored
      Drop it from 4 to 3 to reflect the actual use case.
      Change-Id: Ifdadaf053153c21b4b4fef40a3298a557fd2ef92
    • Re-arrange the tpl_mvs stack order · 71da481d
      Jingning Han authored
      Check the availability of motion field from the ARF frame first.
      Change-Id: I8adce9e604344ee860b5015ff6c755f173886678
    • Reduce the actual tpl_mvs stack size · 406591c2
      Jingning Han authored
      Guarantee that the tpl_mvs stack size is 3 regardless of whether ALT2
      will be deprecated.
      Change-Id: Ic8d19150051f87a4cfb25709feb4151b1e09a3e0
    • Define av1_foreach_rest_unit_in_frame · 33ed9e69
      Rupert Swarbrick authored
      This is the last stage in a quest to move all knowledge of the layout
      of restoration units across the frame into restoration.c. Now this is
      done, we can change how they are laid out (to split them properly at
      tile boundaries) without having to change code in any other file.
      Change-Id: Id5108d787d342f5070580d0e34d84b5ddcc53a86
    • Remove unused get_level_count() and get_mag() · a29cef91
      Linfeng Zhang authored
      Change-Id: I5df23dd4106ff18747116d083423da3bdf300c7a
  6. 24 Oct, 2017 2 commits
    • JNT_COMP: 1. Init version of experiment JNT_COMP · d867c9aa
      Cheng Chen authored
      Enable assigning distance-based weights for joint compound prediction.
      (w0, w1) are the weights for two predictors at different distances from
      the current frame.
      Use 4-bit precision for the quantized distance weights, e.g.
      the prediction is generated as
      value = (w0 * p0 + w1 * p1) >> n
      w0 + w1 = (1 << n), n = 4;
      Change-Id: Ib0ff0c41c82b9ebb033f498e90c18a03d18969e4
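The weighting rule in the JNT_COMP commit above can be sketched directly from its formula. This is a minimal example for 8-bit samples that follows the commit text literally, including the plain right shift with no rounding offset. The names `WEIGHT_BITS` and `blend_dist_weighted` are hypothetical, and how w0 is derived from the frame distances is not covered here.

```c
#include <assert.h>
#include <stdint.h>

#define WEIGHT_BITS 4 /* n = 4 in the commit text, so w0 + w1 = 16 */

/* Blend two predictor samples p0 and p1 with weights (w0, w1), where
 * w1 is implied by w0 + w1 = 1 << WEIGHT_BITS. */
static uint8_t blend_dist_weighted(uint8_t p0, uint8_t p1, int w0) {
  assert(w0 >= 0 && w0 <= (1 << WEIGHT_BITS));
  const int w1 = (1 << WEIGHT_BITS) - w0;
  /* value = (w0 * p0 + w1 * p1) >> n */
  return (uint8_t)((w0 * p0 + w1 * p1) >> WEIGHT_BITS);
}
```

Because the weights always sum to 16, the result stays within the range of the two inputs, and w0 = 8 reduces to a plain average.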
    • Enhance and refactor copying code · b90a97a8
      Yunqing Wang authored
      Modified the copying code. Profiling showed better performance than
      the previous implementation.
      Change-Id: I41f585e0b0eac7a0deb4dec197c178e412a48db9