1. 16 Nov, 2017 8 commits
    • intrabc: replace left shift with multiply · 30483c9b
      Hui Su authored
      In read_intrabc_info() and assign_dv().
      
      BUG=aomedia:1037
      
      Change-Id: Ic430147a9a15024d942bde361be0c4a603f812e4
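      A likely motivation (an assumption; the commit message does not state
      it) is that left-shifting a negative signed value is undefined
      behaviour in C, while multiplying by the equivalent power of two is
      well defined as long as the result fits. The sketch below uses a
      hypothetical helper, scale_dv_component, and a hypothetical scale
      factor of 8; neither is taken from read_intrabc_info() or assign_dv().

```c
#include <stdint.h>

/* Hypothetical helper: scale a displacement-vector component by 8.
 * Writing `v << 3` would be undefined behaviour in C when v is negative;
 * `v * 8` is well defined provided the product fits in the result type. */
static int32_t scale_dv_component(int16_t v) {
  return (int32_t)v * 8;
}
```

      For example, scale_dv_component(-3) evaluates to -24 without relying
      on how the compiler treats shifts of negative values.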
    • Improve filter_intra throughput · 11bac017
      Yue Chen authored
      The prediction can be done in 2x2 or 4x4 processing unit, within
      which there is no dependency and the computation can be fully
      parallelized.
      Also turn < 8x8 filter_intra on, and disable it in > 32x32 txbs.
      
      Change-Id: I4f8a3104019cbb35e88f342d97516f81b19152b0
    • loop-restoration: Fix overflow in self-guided filter · 9c1f92ba
      David Barker authored
      A while ago, I calculated some bounds on the intermediate values inside
      the self-guided filter. These bounds turned out to be not quite correct
      in one particular instance (when we have a large region of max-value
      pixels).
      
      This caused a variable to overflow a uint32_t when decoding 12-bit
      streams in the reference decoder, and would force 8/10-bit-only
      hardware to use wider buffers than intended in order to match the
      reference code.
      
      Fortunately, this can be fixed quite easily, with minimal changes
      to the filter output. See comments within the patch for the exact
      details.
      
      Also re-instate a Wikipedia link which seems to have gone missing
      but which provided useful context for the derivation of the bounds.
      
      Change-Id: I83d4a277a37eff048af9989cccf19202fafb17b5
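      The general hazard can be illustrated with a generic worst-case
      calculation. This is a sketch under stated assumptions, not the
      intermediate value fixed by the patch (the commit defers those details
      to comments in the code): the window size of 25 pixels and the product
      n * (sum of squares) are chosen here purely for illustration, and the
      function names are hypothetical.

```c
#include <stdint.h>

/* Illustration only: n * (sum of squares) for a window of n pixels that all
 * hold the maximum value for the given bit depth. The arithmetic is done in
 * 64 bits so the true magnitude is visible. */
static uint64_t worst_case_product(int bit_depth, uint32_t n) {
  const uint64_t max_pixel = ((uint64_t)1 << bit_depth) - 1;
  const uint64_t sum_sq = n * max_pixel * max_pixel;
  return n * sum_sq;
}

/* Returns 1 if the value can be held in a uint32_t, 0 otherwise. */
static int fits_in_u32(uint64_t v) {
  return v <= UINT32_MAX;
}
```

      Under these assumptions the 8-bit and 10-bit worst cases fit in 32
      bits, while the 12-bit worst case (10480640625) does not, which is the
      kind of bound that has to be checked for every intermediate.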
    • loop-restoration: Fix + refactor stripe boundary setup · 16ff7ef3
      David Barker authored
      * Setup and restore the correct number of left/right boundary
        pixels at vertical tile edges, and save them in the correct
        buffers.
        Also fix the restore process in high-bitdepth mode.
      
      * When loop filtering across tiles is enabled, we were previously
        acting inconsistently at horizontal tile borders: The stripe
        just above the boundary would use CDEF pixels from the tile below
        for context, while the stripe just below would use deblocked
        pixels from the stripe above.
      
        The intended design appears to have been to use CDEF pixels on
        both sides (so we logically have a 64-pixel high stripe, it's just
        split into an 8-pixel and a 56-pixel high stripe in order to keep
        the coefficient sets aligned to tiles)
      
        Implement that behaviour by disabling the context setup process
        when at a horizontal tile border.
      
      * Pull some common calculations out of
        {setup,restore}_processing_stripe_boundary and into their
        common caller. This allows us to reduce the number of arguments
        going into each function and their internal complexity.
      
      * Add more design comments around stripe boundary setup,
        as there are quite a lot of constraints to be aware of
      
      Change-Id: Ic1586c149b7f764b9c1a711df3f11fb0f130b38a
    • Eliminate tx_size dependent shifts for Daala TX · a26262c3
      Monty Montgomery authored
      Short-circuit av1_get_tx_scale to always return zero when
      CONFIG_DAALA_TX is enabled, and remove it from the actual Daala TX
      toplevel.
      
      This has potential overflow consequences for any metrics computation
      based on pixels; as such, also force use of the high-bitdepth path in
      each of these cases.
      
      subset-1:
      monty-rest-of-stack-baseline-s1@2017-11-13T00:39:03.881Z ->
      monty-rest-of-stack-noshift-s1@2017-11-13T14:37:42.541Z
      
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -0.0030 | -0.0523 |  0.2656 |  -0.0239 | -0.0033 | -0.0029 |     0.0067
      
      objective-1-fast --limit=4:
      monty-rest-of-stack-baseline-o1f4@2017-11-13T00:37:06.999Z ->
      monty-rest-of-stack-noshift-o1f4@2017-11-13T14:37:16.992Z
      
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -0.0264 |  0.2303 |  0.0822 |  -0.0109 | -0.0395 | -0.0709 |     0.0538
      
      Change-Id: I57da71861f105dc7a404fa75a75bde573855ef79
    • Fix the spacing format for frame_refs · 0b7756b7
      Zoe Liu authored
      Change-Id: Ifd8275b9368139c8f1ab1e60ea09e177edd03bd1
    • Modify lightfield encoding example · b041d8a7
      Yunqing Wang authored
      Modified the lightfield encoding example to accommodate HW implementation
      requirements. Fixed the encoding scheme, generated a bitstream of a list
      of references followed by the surrounding large scale tile coded frames.
      All large scale tile coded frames use the same uncompressed frame header
      and the same set of frame contexts. This example also wrote out the frame
      header and frame contexts while encoding a large scale tile frame and
      setting EXT_TILE_DEBUG to 1.
      
      Change-Id: I7cc19099195d0a20335d5c6bfb9f493f1bf3a7b2
    • Force to have a common frame header in large scale tile coding · b6e23bc4
      Yunqing Wang authored
      In large scale tile coding (namely, large_scale_tile = 1), forced
      all frames to generate a bit-exact uncompressed frame header.
      This patch modified parameters that could change from one frame to
      another.
      
      Change-Id: Ibe72519da0b8a4f5a4ef30a4303ad7d7e4992a65
  2. 15 Nov, 2017 15 commits
  3. 14 Nov, 2017 17 commits
    • Temporarily turn off sse4_1 code for sgr · 256e1d23
      Debargha Mukherjee authored
      Until a valgrind error coming from the sse4 code is fixed.
      This should resolve the valgrind error in the bug below.
      
      BUG=aomedia:1021
      
      Change-Id: Ic461edb1da017d703a098bf5f9491fa51d0debcc
    • Move encoder-only code to av1/encoder · 95137bde
      Sebastien Alaiwan authored
      Change-Id: Ic4e16f30827e2e2e2dd140aee94d309b049dd063
    • enable segment_globalmv (adopted) · b65e470a
      Soo-Chul Han authored
      Change-Id: I56a67e75d9f366dff8d92c9185b879365de437a7
    • Change mv projection to signed rounding · 11273449
      Zoe Liu authored
      The numerator in the mv projection can be negative, e.g. cur_to_bwd
      or cur_to_alt2, since either bwdref or altref2 can be a forward
      predictive reference, whereas the denominator always stays positive.
      The rounding inside mv projection hence should use signed operation.
      
      Change-Id: I42a105835754a002dd31fcfa7c845e4c105ec54f
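      The difference between the two rounding styles can be sketched as
      follows. This is a minimal illustration that assumes a plain integer
      division; the helper names are hypothetical and the actual projection
      code may compute the quotient differently.

```c
/* Round-to-nearest division by a positive denominator that treats negative
 * numerators symmetrically (ties are rounded away from zero). */
static int divide_round_signed(int num, int den) {
  return num >= 0 ? (num + den / 2) / den : -((-num + den / 2) / den);
}

/* Rounding that is only correct for non-negative numerators, shown for
 * contrast: the bias is always added in the positive direction. */
static int divide_round_unsigned_style(int num, int den) {
  return (num + den / 2) / den;
}

/* Hypothetical projection of one mv component by the ratio num / den,
 * where num may be negative and den is always positive. */
static int project_mv_component(int mv, int num, int den) {
  return divide_round_signed(mv * num, den);
}
```

      With a numerator of -5 and a denominator of 4 (exact value -1.25), the
      signed version returns -1 while the unsigned-style version returns 0,
      so positive and negative projections would no longer mirror each other.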
    • Don't send chroma data in monochrome mode · dcb3cff5
      Rupert Swarbrick authored
      This is still a rather inefficient black+white encoder, since it carefully
      computes some chroma data, but just doesn't write it. However, at least the
      bitstream is now monochrome.
      
      Change-Id: Ie8a89bf329e7b41441032fb0d9e9011385bc12ff
    • intrabc: use its own mv cost table · dfcbfbd4
      Hui Su authored
      To facilitate using intrabc on interframes.
      
      Change-Id: Ibfe376190adf24d15198c5fb548e1050e191a3d6
    • Replace force*split with has_rows/has_cols in rd_pick_partition · 1c2dfae3
      Rupert Swarbrick authored
      I think the result is a little easier to reason about (you now talk
      about a property of the block, rather than the behaviour that should
      be enforced). It also matches the code in read_partition in
      decodeframe.c
      
      Change-Id: I13ba06b1504fa153b8b6b60fa14b373483639718
    • Remove nested #if !CONFIG_NEW_MULTISYMBOL lines · 3b48a6d4
      Rupert Swarbrick authored
      No change to the code, but these #if !CONFIG_NEW_MULTISYMBOL lines are
      all in the #else part of an #if CONFIG_NEW_MULTISYMBOL...
      
      Change-Id: Ibf11b1f0711113d9ee52927dcaf243d74e3f9d28
    • Save right # of lines in save_deblock_boundary_lines · 7a7fffef
      Rupert Swarbrick authored
      The "src_height" computed in save_deblock_boundary_lines didn't match
      the one in save_tile_row_boundary_lines, which meant that the wrapper
      function assumed the deblock code was saving some lines and that code
      thought that save_cdef_boundary_lines would do it.
      
      This patch fixes up the logic to match, and also completely gets rid
      of the lines_to_save variable (after all, bad things would happen if
      lines_to_save was 1 because we'll still read both boundary lines
      later)
      
      The tile height gets rounded up to a multiple of 8 luma pixels in
      save_tile_row_boundary_lines to avoid nasty corner cases. This will
      only have any effect for rows at the bottom of the frame (where
      av1_get_tile_rect clips to the frame boundary).
      
      BUG=aomedia:1020
      
      Change-Id: I55adb53fa8ba9c7f97fb2fd5b328a3f2f5065464
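      The rounding step mentioned above can be sketched as below. The helper
      name is hypothetical; the patch itself may express the same rounding
      differently.

```c
/* Round a non-negative height (in luma pixels) up to the next multiple
 * of 8. Heights that are already a multiple of 8 are returned unchanged. */
static int round_up_to_multiple_of_8(int height) {
  return (height + 7) & ~7;
}
```

      A tile clipped to 57 luma rows at the bottom of the frame would be
      treated as 64 rows, while a 64-row tile is unaffected.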
    • WIP: lv_map_multi: make br multi symbol · e72a2091
      Ola Hugosson authored
      Replace the br_cdf and lps_cdf with a new 4-state symbol br_cdf.
      The br symbol indicates whether the level is k, k+1, k+2 or >k+2.
      In the latter case, a new br symbol is read. Up to 4 br symbols are
      read, which will reach level 14 at most. Levels greater than 14 are
      Golomb coded.
      
      The adapted symbol count is reduced further by this commit.
      E.g. for the I-frame of ducks_take_off at cq=12, the number of adapted symbols
      is reduced from 4.27M to 3.85M. About 10% reduction.
      
      Gains seem about neutral on a limited subset.
      
      Change-Id: I294234dbd63fb0fa26aef297a371cba80bd67383
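      The level decomposition described above can be sketched as follows.
      This is an illustration, not the experiment's code: it assumes the
      first br symbol starts at k = 3 (inferred from the base symbol covering
      0, 1, 2 and >2 in the lv_map_multi experiment), and the struct, the
      function name and the offset of the Golomb remainder are hypothetical.

```c
#define BR_MAX_SYMBOLS 4 /* from the commit message */
#define BR_BASE_LEVEL 3  /* assumption: levels 0..2 are handled elsewhere */

typedef struct {
  int num_escapes;      /* number of ">k+2" symbols emitted (0..4) */
  int last_symbol;      /* 0..2 for k, k+1, k+2; -1 if the level exceeds 14 */
  int golomb_remainder; /* level - 15 if the level exceeds 14, else -1 */
} BrCoding;

/* Decompose a level >= BR_BASE_LEVEL into br symbols. Each escape symbol
 * advances k by 3, so four symbols cover levels 3..14. */
static BrCoding code_br_level(int level) {
  BrCoding out = { 0, -1, -1 };
  int k = BR_BASE_LEVEL;
  for (int i = 0; i < BR_MAX_SYMBOLS; ++i) {
    if (level - k <= 2) {
      out.last_symbol = level - k;
      return out;
    }
    ++out.num_escapes;
    k += 3;
  }
  out.golomb_remainder = level - k; /* k is 15 at this point */
  return out;
}
```

      Under these assumptions level 14 takes three escapes followed by the
      symbol for k+2, which is the fourth and last br symbol, and level 15 is
      the first one that needs a Golomb-coded remainder.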
    • WIP: lv_map_multi: New experiment · 13892108
      Ola Hugosson authored
      This experiment modifies lv_map to make use of multi symbol.
      
      Replace the nz_map and coeff_base binary CDF with a new multi-symbol
      CDF of size 4. The new base_cdf indicates for each coeff if the level
      is 0, 1, 2 or >2. Two new special contexts are added to be used for the
      last coefficient (the EOB coeff). For the EOB coefficient we already know
      that it is non-zero. We use one context for DC EOB and one for AC EOB
      (this can potentially be refined more).
      
      The new symbol is read/written by special bitreader/bitwriter functions.
      Those functions reduce the probability precision from 15bit to 9bit before
      the invocation of the arithmetic coding engine.
      
      The adapted symbol count is significantly reduced by this experiment.
      E.g. for the I-frame of ducks_take_off at cq=12, the number of adapted symbols
      is reduced from 6.7M to 4.3M.
      
      Change-Id: Ifc3927d81ad044fb9b0733f1e54d713cb71a1572
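      Two of the ideas above can be sketched in a few lines. Both helpers are
      hypothetical: the mapping follows the description of base_cdf, while
      the precision reduction is shown as a plain truncating shift, which is
      an assumption since the commit message does not say how the reduction
      is performed.

```c
#include <stdint.h>

/* Map a coefficient level to the 4-state base symbol: 0, 1, 2, or 3
 * meaning "greater than 2". */
static int base_symbol(int level) {
  return level > 2 ? 3 : level;
}

/* Assumption: reduce a 15-bit probability to 9 bits by dropping the six
 * least significant bits. A real coder would also have to keep every
 * symbol's probability non-zero, which this sketch does not attempt. */
static uint16_t reduce_precision_15_to_9(uint16_t p15) {
  return (uint16_t)(p15 >> 6);
}
```

      For instance, the largest 15-bit value, 32767, becomes 511, the
      largest 9-bit value.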
    • aom_convolve.c: extract functions 'scalar_product' · b093b144
      Sebastien Alaiwan authored
      And make variables const when possible.
      
      Change-Id: I823e56327e338ba669a0edd68c6c6d67077ebb7e
    • q_segmentation: disable delta_q encoding when enabled · da06779c
      Rostislav Pehlivanov authored
      The decoder side correctly disabled delta_q but the encoder didn't.
      
      Change-Id: I9f720c678d9e99d723c632095c058eaecd1a639d
    • q_segmentation: set seg->q_lvls to 0 when disabled · 30556193
      Rostislav Pehlivanov authored
      Otherwise the previous value ended up being used, creating a desync.
      
      Change-Id: I42d466474ce1a2567045720b8dfd413625f21cfa
    • Simplify Daala inverse TX toplevel for constant shift · 359854fe
      Monty Montgomery authored
      Rather than backing out all the LGT-related shifting matrices
      throughout the existing TX code, separate out and simplify Daala
      inverse TX into a single dedicated entry point.  When DAALA_TX is
      enabled, CONFIG_HIGHBITDEPTH is also forced, and all of Daala TX
      (lowbd and highbd) uses this single TX dispatch.
      
      This patch is purely non-functional changes.
      
      subset 1:
      monty-TXtesting-fwd-s1@2017-11-12T05:25:09.557Z ->
       monty-TXtesting-inv-s1@2017-11-12T05:25:43.878Z
      
        PSNR | PSNR Cb | PSNR Cr | PSNR HVS |   SSIM | MS SSIM | CIEDE 2000
      0.0000 |  0.0000 |  0.0000 |   0.0000 | 0.0000 |  0.0000 |     0.0000
      
      objective-1-fast:
      monty-TXtesting-fwd-o1f@2017-11-12T05:25:29.386Z ->
       monty-TXtesting-inv-o1f@2017-11-12T05:25:58.897Z
      
        PSNR | PSNR Cb | PSNR Cr | PSNR HVS |   SSIM | MS SSIM | CIEDE 2000
      0.0000 |  0.0000 |  0.0000 |   0.0000 | 0.0000 |  0.0000 |     0.0000
      
      Change-Id: I790e8d7ac08eb214eb712f5441d6e5f76ebddf17
    • Turn on q_adapt_probs by default · dc71d8c8
      Hui Su authored
      Change-Id: Idc201abd06cb1ac351a71bc723d9fed99c215b8e
    • JNT_COMP: reduce context model number · c87b340e
      Cheng Chen authored
      Reduce the number of context models from 9 to 6.
      Split the contexts into two kinds, according to whether the two
      reference frames are at equal distance or not.
      Also, give the equal-distance case compound weights {9, 7}/16 instead
      of {8, 8}/16.
      
      Reducing the number of context models gives neutral performance.
      The new compound weights provide a -0.14% gain.
      
      Change-Id: I8a3f3021eac9e446ac826e5992f42931af4c8962
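      The effect of the new weights can be sketched as a weighted average
      with a denominator of 16. This is an illustration only: the helper
      name, the 8-bit pixel type, the rounding offset and which prediction
      receives the larger weight are all assumptions.

```c
#include <stdint.h>

/* Blend two 8-bit predictions with weights w0 / 16 and (16 - w0) / 16,
 * rounding to nearest. w0 = 8 gives the plain average {8, 8}; w0 = 9 gives
 * the {9, 7} weighting described in the commit message. */
static uint8_t blend_16(uint8_t p0, uint8_t p1, int w0) {
  const int w1 = 16 - w0;
  return (uint8_t)((w0 * p0 + w1 * p1 + 8) >> 4);
}
```

      With p0 = 16 and p1 = 0, the {8, 8} weights give 8 and the {9, 7}
      weights give 9, so the first prediction is favoured slightly.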