1. 07 Nov, 2017 4 commits
    • Fix build for CONFIG_DAALA_TX and CONFIG_TX64X64 · 683f70e7
      Monty Montgomery authored
      The recent 64x32 and 32x64 patches break the build when
      CONFIG_DAALA_TX and CONFIG_TX64X64 are enabled simultaneously.  This
      is a minor correction that fixes the build problem.
      Change-Id: I53cd8df9160fc35b67f2ac16bddcfab08425cf8e
    • Back out pruning for horz_4 and vert_4 partitions · d75ea5b5
      Debargha Mukherjee authored
      This change seems to drop efficiency more than expected, so back
      it out for now until a better RD-based decision is found.
      Change-Id: I3791a13ba76cfa38dd0df2f1fd4119b42b12291d
    • Fix mismatches caused by filter_intra · 18f6c15c
      Yue Chen authored
      Return an invalid rate (previously only an invalid rdcost) if the
      mode combination to check pairs a < 8x8 tx_size with filter_intra mode.
      Change-Id: If90f431c7692473c88ac7a644bfa969a1acb3573
    • striped-loop-restoration: Respect tile experiments · 921b334f
      Rupert Swarbrick authored
      As of patch https://aomedia-review.googlesource.com/c/aom/+/28821 ,
      loop-restoration units cannot cross tile borders. But the context
      around each processing unit was still allowed to cross tile borders.
      This is fine in the usual case - but, when loop filtering across tiles
      is switched off, we're supposed to be able to decode each tile completely
      independently (each tile column, if dependent-horztiles is on).
      Roughly, the change we need to make is:
      When loop filtering across tiles is switched off, we treat each tile
      as if it were a full frame, and extend the CDEF output for that tile
      to form a 3-pixel border around the tile. We only use deblocked
      above/below pixels for processing unit boundaries which lie inside
      a tile.
      In terms of the code, this is implemented in two parts. This only
      applies when the loop_filter_across_tiles_flag is false; otherwise,
      we keep the old behaviour.
      * For processing units at the top edge of a tile, fill the above context
        with copies of the topmost line of CDEF output *from the same tile*,
        rather than using deblocked pixels from the tile above.
        The below context of processing units at the bottom edge of a tile
        is treated analogously.
      * When setting up the boundary for a processing stripe at the left edge
        of a tile, fill the stripe's left boundary with copies of the
        leftmost column of CDEF output from the same tile. Again, processing
        stripes at the right edge of a tile are treated analogously.
        Similarly to the above/below boundaries, we store the overwritten
        pixels into a pair of left/right context buffers, and restore them
        to their original values once we've dealt with that processing stripe.
      Change-Id: I53a0932793c1c56dc037683c6a4353a3f5dc4539
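      The save / overwrite / restore pattern described for the above
      context can be sketched as follows. This is only an illustration
      under stated assumptions, not the code from the patch: the function
      names, the CTX_ROWS constant (taken from the "3-pixel border"
      wording) and the flat 8-bit buffer layout are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CTX_ROWS 3 /* hypothetical: rows of context kept above a tile */

/* Save the CTX_ROWS rows directly above the tile into `saved`, then
 * overwrite them with copies of the tile's topmost row. `tile_top`
 * points at the first pixel of that row; the rows above it must be
 * addressable. `saved` must hold CTX_ROWS * width bytes. */
void fill_above_context(uint8_t *tile_top, int stride, int width,
                        uint8_t *saved) {
  assert(stride >= width && width > 0);
  for (int r = 1; r <= CTX_ROWS; ++r) {
    uint8_t *row = tile_top - r * stride;
    memcpy(saved + (r - 1) * width, row, (size_t)width);
    memcpy(row, tile_top, (size_t)width);
  }
}

/* Put the original pixels back once the processing unit is done. */
void restore_above_context(uint8_t *tile_top, int stride, int width,
                           const uint8_t *saved) {
  for (int r = 1; r <= CTX_ROWS; ++r)
    memcpy(tile_top - r * stride, saved + (r - 1) * width, (size_t)width);
}
```

      The left/right boundaries would follow the same idea with columns
      instead of rows.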
  2. 06 Nov, 2017 14 commits
    • [segment] Remove coding of seg->abs_delta · d728c216
      Yushin Cho authored
      Remove the option to choose between raw data and delta when coding
      the segment data; only delta coding is used from now on.
      Raw data coding of segment data was never used, yet the
      "raw or delta coding of seg_data" option was still coded into the bitstream.
      Change-Id: Iaf8f21692452d0c9a127b958812c6151d3c5db05
    • [segment] Remove unused function · accfe39a
      Yushin Cho authored
      Also move its comment on seg_data to another relevant function.
      Change-Id: I5d3282040862cd09565b9d4f7baadf0124b64823
    • [CFL] DC_PRED as a block instead of as single value · ace7ffb2
      Luc Trudeau authored
      This change does not alter the bitstream. This change simplifies a subsequent
      commit to remove the custom DC_PRED used by CfL. To use the DC_PRED in AV1, 
      CfL must consider the DC_PRED as a block instead of a single value.
      Results on Subset1 (Compared to Previous commit with CfL enabled)
        PSNR | PSNR Cb | PSNR Cr | PSNR HVS |   SSIM | MS SSIM | CIEDE 2000
      0.0000 |  0.0000 |  0.0000 |   0.0000 | 0.0000 |  0.0000 |     0.0000
      Change-Id: I75f981ab93ab1808450f8280bfbabde76ea5b7fe
    • Correct a loop restoration buffer size · efa76d7e
      Rupert Swarbrick authored
      On subsampled planes, the frame is narrower but the padding by
      RESTORATION_EXTRA_HORZ on each side is the same width as usual.
      Change-Id: Id68c0dd674efaa769412825b119ae5ebe56548ad
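      A small sketch of the sizing rule stated above: the plane width
      shrinks with subsampling, the horizontal padding does not. The
      function name and the EXTRA_HORZ value are hypothetical stand-ins,
      not the definitions used in the codebase.

```c
#include <assert.h>

#define EXTRA_HORZ 4 /* hypothetical stand-in for RESTORATION_EXTRA_HORZ */

/* Width in pixels of a restoration line buffer for one plane.
 * ss_x is 1 for a horizontally subsampled plane, 0 otherwise. */
int restoration_buf_width(int frame_width, int ss_x) {
  assert(frame_width > 0 && (ss_x == 0 || ss_x == 1));
  int plane_width = (frame_width + ss_x) >> ss_x; /* round up */
  return plane_width + 2 * EXTRA_HORZ; /* padding is not subsampled */
}
```

      Shifting the padded frame width as a whole, for instance
      (frame_width + 2 * EXTRA_HORZ) >> ss_x, would under-allocate on
      subsampled planes.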
    • Disallow to use temporal MVs while large_scale_tile=1 · 0edbdcde
      Yunqing Wang authored
      When large_scale_tile=1, do not use temporal MVs.
      Change-Id: I7107519595b79cbca45dfe72d5ada78cfdc39b00
    • Update the encoder flags for reference frames using and updating · 9a50fec3
      Yunqing Wang authored
      Updated the encoder flags for externally setting reference frames using
      and updating to include latest changes in AV1.
      1. For what reference frames to use, always initialize
      cpi->ref_frame_flags with AOM_REFFRAME_ALL at the beginning of encoding
      a frame. The internal ref_frame_flags starts from external flags. Added
      2. For what reference frames to update, added ext_refresh_bwd_ref_frame
      and ext_refresh_alt2_ref_frame for BWD and ALT2. Also, removed
      AOM_EFLAG_FORCE_GF and AOM_EFLAG_FORCE_ARF since these are never
      actually used. They can be added back if needed later.
      Change-Id: I1e4429290f09bfcd1b26f2babc0cf556fc6fbc6c
    • Accept all global motion model settings · 8d88b297
      Sebastien Alaiwan authored
      When needed, fall back to the regular interp filter at the reconstruction stage.
      Such bitstreams are valid.
      However, as we don't expect aomenc to generate them,
      print a helper warning.
      Change-Id: I7e818cf607d7d6f71df4ca7878d8976fb88c3282
    • Fix mismatch with striped loop restoration+superres · 8ce049e6
      Rupert Swarbrick authored
      When upscaling a frame, we extend frame borders to stop the upscale to
      save boundary lines convolving with uninitialised data off the edges,
      which was causing encode/decode mismatches. With this patch, we only
      do the extension when there's going to be an upscale (otherwise
      there's no need), which should give a small coding gain when not
      More importantly, it forces us to extend in the decode path whether or
      not we are using loop restoration, which matches what the encoder does
      and fixes a mismatch.
      Change-Id: Ie5a0791b0cbedbf254f9080f3cbf668318673f2f
    • A better trade-off for ext-partition-types search · b88f50a0
      Debargha Mukherjee authored
      Changes a pruning criterion that seems to give a little better
      compression efficiency at a little faster speed.
      Change-Id: I8e3f9aa552b093c4af4ba615bb6ce29587bc8c36
    • Turn on the max_tile experiment · ab8bb8b8
      Dominic Symes authored
      max_tile was provisionally adopted at the working group meeting 2017-Oct-10.
      This patch also enables support for 64x64 and 128x128 superblock size for max tile
      (rather than assuming 128). There is also one fix for max_tile in combination with
      loop restoration where the width/height was in the wrong units for max-tile specific code.
      Change-Id: Icb862a2738fea5fc6215819396e1afa4eb86e461
    • Use lower-precision filters in filter_intra · 00bc4aac
      Yue Chen authored
      Filter coefficients c0, c1, c2 are scaled by 8, and can each be
      represented by a 4-bit unsigned integer (c2 is always <=0)
      Change-Id: I93643bab6734214cef0b0175d6980ebabe9dfe10
    • JNT_COMP: Round the weighted sum · 7caa7382
      Cheng Chen authored
      Previously the weighted sums in convolve were right shifted without
      rounding. This patch adds a rounding value before the right shifts.
      Change-Id: Iea39aca419ac0ca0c32756f345293ce5e28dbd5b
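      The difference between a plain right shift and a rounded one can be
      sketched like this. The helper names and the weight convention
      (weights summing to 1 << shift) are assumptions made for the
      example; they are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Right shift with round-to-nearest: add half of the divisor first.
 * Intended for non-negative sums such as weighted pixel values. */
int32_t round_shift(int32_t sum, int shift) {
  assert(sum >= 0 && shift > 0 && shift < 31);
  return (sum + (1 << (shift - 1))) >> shift;
}

/* Weighted average of two predictions; the weights are assumed to sum
 * to 1 << shift (hypothetical convention for this sketch). */
int32_t weighted_avg(int32_t p0, int32_t p1, int w0, int w1, int shift) {
  return round_shift(p0 * w0 + p1 * w1, shift);
}
```

      For example, averaging 10 and 11 with equal weights of 8 and a
      shift of 4 gives 168 >> 4 = 10 without rounding and
      (168 + 8) >> 4 = 11 with it.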
    • JNT_COMP: add SIMD implementations for c functions · ef34fff7
      Cheng Chen authored
      Add SIMD implementations for c functions for low bit-depth, making
      the encoder 3~4x faster than with the c functions.
      Change-Id: Icca0b07b25489759be9504aaec09d1239076fc52
    • JNT_COMP: Refactor code · f78632e0
      Cheng Chen authored
      The refactoring serves two purposes:
      1. Separate code paths for jnt_comp and original compound average
      computation. It provides function interface for jnt_comp while leaving
      original compound average computation unchanged. In near future, SIMD
      functions can be added for jnt_comp using the interface.
      2. The previous implementation uses a hack on second_pred, which may
      cause a segmentation fault when the test clip is small, as reported
      in Issue 944. This refactoring removes the hack and makes it possible
      to address the seg fault problem in the future.
      Change-Id: Idd2cb99f6c77dae03d32ccfa1f9cbed1d7eed067
  3. 05 Nov, 2017 4 commits
    • Remove LGT_FROM_PRED experiment · 7fc6b2ac
      Sebastien Alaiwan authored
      This experiment has been abandoned for AV1.
      Change-Id: I18cf1354df928a0614a1e58b718cd96ee7999925
    • Add frame offset of references to dumped info · b4f31036
      Zoe Liu authored
      This patch also updates cm->frame_offset for show_existing_frame at
      the encoder.
      Change-Id: I863876675145ba663fc229a854b83b39759309a5
    • Misc. clean ups / refactor of speed 1 · d7338aa8
      Debargha Mukherjee authored
      With this patch, and the speed settings turned on for speed 1,
      the coding efficiency of speed 1 in default configuration should be
      only a little worse than speed 0, but it should roughly run at
      double the speed.
      Specifically, this patch makes various changes to make sure that
      speed 1 behaves exactly the same as speed 0 except for speed settings
      turned on or off in speed_features.c.
      This will change the bitstream generated a little for speeds
      1 or higher because of the following reasons:
      1. Removes a hacky speed setting correction factor in firstpass.c
      2. Fast cdef search is moved from speed 1+ to 2+, and a new speed
      feature is added to control that.
      3. Mesh search settings are pushed down one level so that speeds 0
      and 1 use the same settings.
      4. A disable_split_mask feature for animated content previously
      turned on for speeds 1+ is moved down to speeds 2+.
      Change-Id: I0ec36556f157bdc42c5daa0cfb9518cf7ff65f6b
    • Further speed-up of ext-partition-types · 6f77d081
      Debargha Mukherjee authored
      Removing the NONE partition check from horz_4 and
      vert_4 partition search conditions provides another
      5-10% speedup at very little loss.
      Change-Id: Ie5f14191efe6d2b0695b27021de96ad0a1550f26
  4. 04 Nov, 2017 8 commits
    • Refactor the code for reference frame flag setup · 368bf16d
      Zoe Liu authored
      At the encoder side, for the 7 reference frames, we always set up the
      priority rank as follows:
      That is, if two reference frames point to the same reference frame
      buffer, the flag for the latter frame in the rank will always be
      turned off.
      This patch does not change any coding performance / coding speed for
      the default configure setup. It only affects the following setup:
      one-sided-compound is on && ext-comp-refs is off
      As one-sided-compound is enabled by default when ext-comp-refs is
      enabled, and ext-comp-refs is enabled by default, above setup should
      not be considered.
      Change-Id: I6de18d3be938e1d4a8897e5ba0857b8d21e7f9d0
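      The rule "the flag for the latter frame in the rank is turned off
      when two references share a buffer" can be sketched as below. The
      priority order itself is not reproduced in the message above, so
      this sketch takes it as a caller-supplied array; the function name,
      the array layout and the bit assignment are hypothetical.

```c
#include <assert.h>

#define NUM_REFS 7 /* the 7 reference frames mentioned above */

/* rank[i] is the reference index with priority i (caller-supplied
 * order); buf_idx[r] is the buffer that reference r points to.
 * Returns a bitmask with bit r set if reference r stays enabled. */
unsigned dedup_ref_flags(const int rank[NUM_REFS],
                         const int buf_idx[NUM_REFS]) {
  unsigned flags = 0;
  for (int i = 0; i < NUM_REFS; ++i) {
    assert(rank[i] >= 0 && rank[i] < NUM_REFS);
    int dup = 0;
    /* A reference is dropped if any higher-priority one shares its buffer. */
    for (int j = 0; j < i; ++j) {
      if (buf_idx[rank[j]] == buf_idx[rank[i]]) {
        dup = 1;
        break;
      }
    }
    if (!dup) flags |= 1u << rank[i];
  }
  return flags;
}
```

      With the same buffer mapping, reversing the priority order changes
      which member of each duplicate pair survives.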
    • Use rtcd script to choose between implementations · 78a7bd7d
      Sebastien Alaiwan authored
      Change-Id: I752ad96a8b4349d4a437a97e30edc8e4c22f81b5
    • cdef_test: quiet implicit bool to int conv warning · 9feda796
      James Zern authored
      Change-Id: Ic7096fe85dc653c9c7d7d1f098df19daff27e1cf
    • Remove NCOBMC_ADAPT_WEIGHT from AV1 · 80daf0c4
      Yue Chen authored
      Development of this experiment will be deferred to AV2.
      Change-Id: I3c4615a21b59508500bed8aab0a5c54413b4f284
    • Speed up one-sided compound mode (ext-comp-refs) · 77fb5be1
      Zoe Liu authored
      One-sided compound ref prediction is used only when all reference
      frames are one-sided.
      This patch has demonstrated an encoder speedup of ~28%.
      Using the following configure setups, the coding performance has been
      dropped on Google test sets (50 frames) in BDRate by ~0.2% for lowres
      and by ~0.1% for midres (Corresponding performance impact should be
      smaller on AWCY):
      --enable-experimental --disable-convolve-round --disable-ext-partition
      --disable-ext-partition-types --disable-txk-sel --disable-txm
      Change-Id: I585bbffb2f8d154e8f52a1e79a84eff8bb4a471d
    • Separate ref frame mvs control from prev_frame_mvs · e17ebe92
      Jingning Han authored
      The control of using reference frame motion vector is a separate
      factor from the existence of previous frame motion vectors. This
      commit decouples these two, such that the encoder can control the
      use of reference motion vector. When it is used, one can further
      check whether the previous frame exists and decide whether
      use_prev_frame_mvs needs to be forced to zero.
      This solves the issue where the previous frame mvs is set to be
      0 and it accidentally shuts off the access to all other existing
      reference frames mvs in the mfmv system. It brings back the coding
      performance gains to normal.
      Change-Id: I2531f73e55582a9bb5b3e0ff47e361a199ec8082
    • Speed up of ext-partition types · c4b67641
      Debargha Mukherjee authored
      Search the new horz/vert a/b/4 partitions only if the best so far
      is either oriented along the same direction or split/none, or if
      the rd costs obtained from the previous partition searches indicate
      there is potential in searching these partitions.
      This brings about 25-30% speedup at less than 0.1% drop as seen on
      lowres 30 frames.
      Change-Id: I6c6c347e06c34ee0ca17479aeeb4075a66dc7e2c
    • Simplify tx_mode frame level bit · 923b73d8
      Debargha Mukherjee authored
      Adds a new experiment to simplify the tx_mode symbol.
      The existing frame level tx_mode information is converted to a single bit
      to select between largest tx_size for a prediction unit or specified
      at the block level. The less useful modes: ALLOW_8X8, ALLOW_16X16,
      etc. are removed.
      Change-Id: Ib9358e17b0158a167eb4edef79f36ff113aa56e1
  5. 03 Nov, 2017 10 commits
    • Allow to disable the probability update · 0e141b56
      Yunqing Wang authored
      Added the ability to disable the probability update when
      needed. This is needed when encoding with multiple tiles, where
      the probability update can be enabled or disabled separately for
      each individual tile.
      Change-Id: Ic3c64e6cebac89c483d48b874761bd2e902d81e6
    • [level map] set USE_CAUSAL_CTX to 1 · 2d9bd32e
      Dake He authored
      Per the codec WG call today, turn on Plan B for level map by default.
      Change-Id: Iae885b38917cf79e4f0b290cc2d73ac28321710f
    • Enable striped_loop_restoration · 201a2b49
      Ola Hugosson authored
      Change-Id: Idc5ead2db38562924f27796eb78a05b658b5a20e
    • Fix bug introduced by commit 01563088 · b44deca9
      David Barker authored
      When fixing one bug in av1_decode_tg_tiles_and_wrapup, I seem to have
      introduced another bug. This was due to checking the wrong condition
      on whether to update the frame context at the end of the frame.
      Change-Id: I929a710e2de31a89cc7899fb1605ca7edf968a87
    • Fix highbd striped loop restoration bug · ff090b86
      Rupert Swarbrick authored
      The "data8_bl" variable is a uint8_t* and will be scaled up
      later (with REAL_PTR) if it's pointing to highbd data. Don't scale up
      the x offset.
      Change-Id: I03e2ce8861e25e3a603e8f0ba2c8af585e08b9c5
    • Disable filter_intra mode in <8x8 tx blocks · 95e13e23
      Yue Chen authored
      0.159% gain on lowres 60 frames, compared to 0.236% gain if we don't
      restrict it in small tx blocks.
      (--disable-ext-partition --disable-ext-partition-types
       --disable-convolve-round --disable-ext-comp-refs)
      Change-Id: I1d1c5474ca27de9dec992ea30a9883afd7a56474
    • Correct has_bottom_left calculation for mixed vertical partitions · 8315daf7
      Rupert Swarbrick authored
      This patch regenerates the orders tables and generates both the normal
      ones and also those for vertical partitions. I've added a long comment
      above the definition of orders[] that explains how they work (there's
      no change, but it took me a while to understand, so it's probably a
      good thing to document). I've also slightly changed when we use the
      orders_vert tables: they are now used for both PARTITION_VERT_A and
      The patch also removes the #if around the partition argument to
      has_top_right and adds it to has_bottom_left. (I could have put it
      inside an #if, but I shouldn't imagine there's any measurable
      performance cost and the code is cleaner this way).
      The tables were regenerated with a Haskell script which I've included
      at the bottom of the commit message (so the next person doesn't have
      to write it from scratch yet again). The output looks reasonably
      clean, but clang-format does change it somewhat so you need to run
      that afterwards. The tables are also output in a different order, so
      you'll need to clean that up by hand too.
      -- orders.hs: Print tables to stdout by calling printOrders
      import Data.Foldable
      import Data.List (findIndex)
      import Data.Maybe
      import System.Environment
      import Text.Printf
      import Text.Read
      data Block = Block { lbw :: Int, lbh :: Int, vert :: Bool }
      minLogBlockSize :: Bool -> Int
      minLogBlockSize v = if v then 3 else 2
      maxLogBlockSize = 7 :: Int
      -- This code generates the inverse of what we want: a mapping from visit order
      -- to raster order. That is, element i of the list will be the raster index of
      -- the block that we visit at step i.
      vrSplit :: Block -> Int -> Int -> Int -> [Int]
      vrSplit b stride lsz off
        | lbw b >= lsz && lbh b >= lsz = [off]
        -- Some form of horizontal partition
        | lbw b < lsz && lbh b >= lsz =
            [off,off + 1..off + 2^(lsz - lbw b) - 1]
        -- Some form of vertical partition
        | lbw b >= lsz && lbh b < lsz =
            [off,off + stride..off + (2^(lsz - lbh b) - 1)*stride]
        -- PARTITION_VERT_*
        | vert b && lbh b + 1 == lsz && lbw b + 1 == lsz =
            [off, off + stride, off + 1, off + stride + 1]
        | otherwise =
          concatMap (vrSplit b stride (lsz - 1))
          [off, off + 2^(lsz - lbw b - 1), off + 2^(lsz - lbh b - 1) * stride,
           off + 2^(lsz - lbw b - 1) + 2^(lsz - lbh b - 1) * stride]
      vrOrders :: Block -> [Int]
      vrOrders b = vrSplit b (2 ^ (maxLogBlockSize - lbw b)) maxLogBlockSize 0
      -- A simple function to invert the bijection generated by vrOrders (it's very
      -- naive, but the list isn't exactly long)
      invertList :: [Int] -> [Int]
      invertList is = map (\ i -> fromJust $ findIndex ((==) i) is) [0..length is - 1]
      genOrders :: Block -> [Int]
      genOrders = invertList . vrOrders
      -- Code to print everything out in the style used in the AOM codebase
      forButLast_ :: Applicative f => [a] -> (a -> f b) -> f ()
      forButLast_ [] f = pure ()
      forButLast_ (a : as) f = fbl a as f
        where fbl a [] f = pure ()
              fbl a (a' : as) f = f a *> fbl a' as f
      numDigits :: Int -> Int
      numDigits n =
        if n == 0 then 1
        else ceiling $ logBase 10 $ fromIntegral $ 1 + n
      printRow :: Int -> Int -> [Int] -> Bool -> IO ()
      printRow indent fw as islast = do
        { if null as then return ()
          else do
            { printf "%*s" indent ""
            ; forButLast_ as (\ a -> printf "%d,%*s" a (postDent a) "")
            ; printf "%d%s" (last as) (if islast then "\n" else ",\n") } }
        where postDent a = 1 + fw - numDigits a
      printInts :: Int -> Int -> Int -> [Int] -> IO ()
      printInts width indent fw [] = return ()
      printInts width indent fw as =
        let (row, rest) = splitAt eltsPerLine as in
          printRow indent fw row (null rest) >> printInts width indent fw rest
        where eltsPerLine = quot (width - indent + 1) (fw + 2)
      printBlockOrders :: Block -> IO ()
      printBlockOrders b = do
        { printf "static const uint16_t orders_%s%dx%d[%d] = {\n"
          (if vert b then "vert_" else "")
          ((2 :: Int) ^ lbw b) ((2 :: Int) ^ lbh b) numElts
        ; printInts 79 2 intWidth (genOrders b)
        ; printf "};\n" }
        where lsz = maxLogBlockSize
              numElts = (2 :: Int) ^ (lsz - lbw b + lsz - lbh b)
              intWidth = max 1 $ ceiling $ logBase 10 $ fromIntegral (numElts - 1)
      blocksForWidth :: Bool -> Int -> [Block]
      blocksForWidth v lbw = map (\ lbh -> Block lbw lbh v) [minLbh..maxLbh]
        where maxLogAspectRatio = if v then 0 else 2
              minLbh = max (minLogBlockSize v) (lbw - maxLogAspectRatio)
              maxLbh = min maxLogBlockSize (lbw + maxLogAspectRatio)
      blocksForV :: Bool -> [Block]
      blocksForV v = concatMap (blocksForWidth v) [minLbw..maxLbw]
        where minLbw = (minLogBlockSize v)
              maxLbw = maxLogBlockSize
      blocks :: [Block]
      blocks = blocksForV False ++ blocksForV True
      printOrders :: IO ()
      printOrders = traverse_ printBlockOrders blocks
      -- Ends orders.hs
      Change-Id: I6c53e80caa0d203cdc11f88471b6c117c633baa6
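      For readers more at home in C than Haskell, the inversion step
      performed by invertList in the script above (turning a "visit order
      to raster index" list into a "raster index to visit order" table)
      can be sketched as follows. The function name and signature are
      hypothetical; unlike the script's deliberately naive search, this
      version does a single pass.

```c
#include <assert.h>
#include <stddef.h>

/* Given perm, a permutation of 0..n-1 where perm[i] is the raster index
 * of the block visited at step i, write its inverse into inv so that
 * inv[raster_index] is the step at which that block is visited. */
void invert_permutation(const int *perm, int *inv, size_t n) {
  for (size_t i = 0; i < n; ++i) {
    assert(perm[i] >= 0 && (size_t)perm[i] < n);
    inv[perm[i]] = (int)i;
  }
}
```

      For the visit order {2, 0, 3, 1}, the resulting table is
      {1, 3, 0, 2}.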
    • Avoid some mismatched braces in bitstream.c · cf772765
      Rupert Swarbrick authored
      These play havoc with editors' "jump to start of function"
      commands. There should be no change to generated code.
      Change-Id: Ib6961bb952da02081a675d0a4fa01eea2c1ff6d1
    • Simplify FRAME_ID constants · d418f68e
      Sebastien Alaiwan authored
      Change-Id: I75890c0f64f93f48299895d1e0bcfbf91846a4ab
    • Add two levels for selective ref frame sp. feature · 06b40cc3
      Debargha Mukherjee authored
      The first level is turned on for speed 1.
      Change-Id: I3dba0f0250b97a25e174cacc2a46ca7f76572c85