1. 06 Sep, 2017 3 commits
    • Adjust chroma position in warp filter · a60dc9d6
      David Barker authored
      When using chroma subsampling, the warp filter currently behaves
      strangely when projecting chroma pixels, especially when the
      subsamplings are not equal along the x and y axes.
      
      For example, when subsampling_x = 1 and subsampling_y = 0, we
      calculate the destination coordinates (dx, dy) from the source
      coordinates (sx, sy) as:
      dx = project(2*sx+0.5, 2*sy+0.5)/2 - 0.5
      dy = project(sx, sy)
      where project() applies the affine warp model.
      
      This patch changes to a simpler and more consistent model,
      where we:
      * Project the chroma sample into luma coordinates, taking
        the chroma sample to be co-located with the top-left luma
        sample in its (2x2, or 2x1, or 1x2) subsampling block
        (this is done for simplicity; we don't expect the exact
         position to make much difference to the output quality)
      * Apply the transformation in luma coordinates
      * Project the resulting luma sample back into chroma coordinates
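      The three steps above can be sketched as follows. This is a
      floating-point illustration only: AffineModel and
      project_chroma_sample are hypothetical names, and the real filter
      works in fixed point.

```c
#include <assert.h>

/* Hypothetical affine model in luma coordinates:
 *   x' = a*x + b*y + tx,  y' = c*x + d*y + ty */
typedef struct {
  double a, b, tx;
  double c, d, ty;
} AffineModel;

/* Project one chroma sample (cx, cy) using the three-step scheme. */
static void project_chroma_sample(const AffineModel *m, int ss_x, int ss_y,
                                  double cx, double cy,
                                  double *out_cx, double *out_cy) {
  assert(ss_x == 0 || ss_x == 1);
  assert(ss_y == 0 || ss_y == 1);
  const double scale_x = (double)(1 << ss_x);
  const double scale_y = (double)(1 << ss_y);

  /* Step 1: chroma -> luma, co-located with the top-left luma sample
   * of the subsampling block (no half-sample offset). */
  const double lx = cx * scale_x;
  const double ly = cy * scale_y;

  /* Step 2: apply the warp in luma coordinates. */
  const double wx = m->a * lx + m->b * ly + m->tx;
  const double wy = m->c * lx + m->d * ly + m->ty;

  /* Step 3: project the warped luma position back to chroma. */
  *out_cx = wx / scale_x;
  *out_cy = wy / scale_y;
}
```

      With subsampling_x = 1 and subsampling_y = 0, both axes go through
      the same three steps; only the scale factor differs.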
      
      Change to software speed is in the noise, but this approach
      should be simpler in hardware, and should slightly improve
      quality for 4:2:2 and 4:4:0 videos.
      
      Change-Id: Idd455fdd3897594ca7d4edff5b85b78961d1638d
    • Round up subsampled frame size in av1_loop_restoration_corners_in_sb · 7380b25e
      Rupert Swarbrick authored
      The previous code converted a frame_w (say) of 1 to zero for a plane
      where subsampling was enabled, causing a division by zero in
      av1_get_rest_ntiles. This doesn't match the spec, which says
      subsampling rounds up.
      
      The patch adds the rounding, and also adds an assertion to
      av1_get_rest_ntiles to help diagnose any other broken callsites.
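      A minimal sketch of the rounding, assuming a hypothetical helper
      named subsampled_size (not the function in the patch):

```c
#include <assert.h>

/* Size of a plane dimension after subsampling by 2^ss, rounded up.
 * A truncating shift (size >> ss) would map size 1 to 0 when ss == 1. */
static int subsampled_size(int size, int ss) {
  assert(size > 0);
  assert(ss == 0 || ss == 1);
  return (size + (1 << ss) - 1) >> ss;
}
```

      With rounding up, a frame_w of 1 stays 1 on a subsampled plane, so
      a later division by the plane width cannot divide by zero.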
      
      Change-Id: Ia6c249fa935c3a16d122ba6e7b450fe99f412fde
    • Make loop-restoration use 64x64 processing units · 7a5587a8
      Debargha Mukherjee authored
      Changes loop restoration to use a processing unit size of 64x64
      for luma. For chroma, the processing unit is coupled to the 64x64
      luma support region, so its size is 32x32 for 4:2:0, 32x64 for
      4:2:2, 64x64 for 4:4:4, and so on.
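      The chroma sizes follow from shifting the 64x64 luma unit by the
      subsampling factors. A small sketch, with UnitSize and
      chroma_proc_unit as hypothetical names:

```c
#include <assert.h>

enum { LUMA_PROC_UNIT = 64 };

typedef struct {
  int width;
  int height;
} UnitSize;

/* Chroma processing unit covering the same area as a 64x64 luma unit. */
static UnitSize chroma_proc_unit(int ss_x, int ss_y) {
  assert(ss_x == 0 || ss_x == 1);
  assert(ss_y == 0 || ss_y == 1);
  UnitSize u = { LUMA_PROC_UNIT >> ss_x, LUMA_PROC_UNIT >> ss_y };
  return u;
}
```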
      
      While the Wiener filter output should not change with this patch,
      the sgr filter will change since the boundary pixel handling in
      sgr is internal within the filter.
      
      Change-Id: I65a9e2df88927a19445420ce400acb1fcf7afa93
  2. 05 Sep, 2017 8 commits
  3. 04 Sep, 2017 3 commits
    • Static local functions in mfmv · 5c700910
      Jingning Han authored
      Change-Id: I0fefe099b314295583e8e17e55e4d8fc375a5b0c
    • Constrain motion vector projection range · b74a72bf
      Jingning Han authored
      Constrain the maximum motion vector projection range to be within
      +/-32 pixels in the vertical direction and +/-64 pixels in the
      horizontal direction.
      
      These constraints fix the amount of reference motion vector data
      loaded into SRAM for each 64x64 block, independent of the frame
      size. The wider horizontal range can be kept in SRAM and reused by
      the next 64x64 block. The compression performance loss is 0.03%
      for lowres and 0.04% for midres.
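      One way to picture the constraint is as a range check on the
      projected offset. The sketch below is an assumption about the shape
      of such a check: the names are hypothetical, and whether the patch
      rejects or clamps out-of-range projections is not stated above.

```c
#include <assert.h>

/* Limits from the description: +/-64 pixels horizontally and
 * +/-32 pixels vertically. */
enum { MAX_PROJ_OFFSET_X = 64, MAX_PROJ_OFFSET_Y = 32 };

/* Returns 1 and writes the projected position if the offset is within
 * the allowed window, otherwise returns 0 and leaves the outputs alone. */
static int projected_position(int pos_x, int pos_y, int offset_x,
                              int offset_y, int *out_x, int *out_y) {
  assert(out_x != 0 && out_y != 0);
  if (offset_x < -MAX_PROJ_OFFSET_X || offset_x > MAX_PROJ_OFFSET_X) return 0;
  if (offset_y < -MAX_PROJ_OFFSET_Y || offset_y > MAX_PROJ_OFFSET_Y) return 0;
  *out_x = pos_x + offset_x;
  *out_y = pos_y + offset_y;
  return 1;
}
```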
      
      Change-Id: I7f1c136363b136b1f2fa9f7c962a791c8e91a976
    • apply clang-format · 4eafefe0
      clang-format authored and James Zern committed
      Change-Id: If0b48a4ee1f7902d8c6154945ccef68a2b5aabb5
  4. 03 Sep, 2017 1 commit
    • Move loop restoration coefficients to within the frame · 6c545216
      Rupert Swarbrick authored
      Rather than encoding the loop restoration coefficients at the start of
      the frame header, this patch moves them to occur just after certain
      top-level superblocks.
      
      You might hope that we could just encode coefficients on top-level
      superblocks where the top-left corner of the superblock was also the
      top-left corner of the loop restoration tile. Unfortunately, this
      can't work with the superres experiment, where the loop restoration
      tiles don't necessarily line up with the superblocks. Indeed, in
      general there can be multiple different loop restoration coefficients
      that apply in a given top-level superblock. This patch defines a
      function, av1_loop_restoration_corners_in_sb, which yields the
      rectangle [rrow0, rrow1) x [rcol0, rcol1) of loop restoration tiles
      whose top left corners lie in this top-level superblock.
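      Along one axis, the half-open range of tiles can be sketched as
      below. This is a simplified illustration: corners_in_range is a
      hypothetical helper, tile k is assumed to start at k * tile_size,
      and the superres scaling handled by the real function is left out.

```c
#include <assert.h>

/* Ceiling division for non-negative a and positive b. */
static int ceil_div(int a, int b) { return (a + b - 1) / b; }

/* Computes [*r0, *r1): the restoration tiles whose top-left corner lies
 * in the superblock span [sb_start, sb_end) along one axis. */
static void corners_in_range(int sb_start, int sb_end, int tile_size,
                             int ntiles, int *r0, int *r1) {
  assert(0 <= sb_start && sb_start < sb_end);
  assert(tile_size > 0 && ntiles > 0);
  int lo = ceil_div(sb_start, tile_size); /* first k with k*size >= start */
  int hi = ceil_div(sb_end, tile_size);   /* first k with k*size >= end */
  if (lo > ntiles) lo = ntiles;
  if (hi > ntiles) hi = ntiles;
  *r0 = lo;
  *r1 = hi;
}
```

      The range is empty when no tile corner falls inside the superblock
      and holds several tiles when the tiles are smaller than the
      superblock, which is why a single superblock may carry zero, one,
      or many sets of coefficients.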
      
      The total file size should be unchanged by this patch: the bits have
      just been moved from the frame header and spread out among the rest of
      the frame.
      
      Change-Id: Icf43b0560964a63dea0d2cd801313f04139188d7
  5. 02 Sep, 2017 3 commits
  6. 01 Sep, 2017 2 commits
    • Fix the bug described in bug report 723 · a97c897b
      Ryan authored
      Link: https://bugs.chromium.org/p/aomedia/issues/detail?id=723
      
      BUG=aomedia:723
      
      Change-Id: Iece3abcd88de69ab410674615965687abb5e4579
    • Miscellaneous fixes for var-tx · 16c64e33
      David Barker authored
      Lots of small bug fixes, mainly around the transform size coding:
      
      * The loop filter was accidentally using the non-subsampled
        block size for the V plane, due to comparing a plane index
        (0, 1, or 2) against PLANE_TYPE_UV (== 1)
      
      * We allowed an initial update of the transform partition probabilities
        even on frames where we know they will never be used
        (because tx_mode != TX_MODE_SELECT).
        Further, these probabilities would not be reverted at the end
        of the frame, leading to the probability delta persisting across frames.
      
        Change this to behave more like the non-var-tx transform size coding,
        where probability deltas are only coded for frames with
        tx_mode == TX_MODE_SELECT, and the deltas only apply for one frame.
      
      * Fix decoder for the case where the video as a whole isn't lossless,
        and we have tx_mode == TX_MODE_SELECT, but the current segment
        *is* lossless.
        Note that the encoder already does the right thing in this case.
      
      * Don't allow the transform splitting to recurse "below" 4x4.
        This is really just a refactor, but means we can increase the
        maximum depth when subdividing rectangular transforms if we
        want to, whereas the previous code would have needed special cases
        for 4x8 and 8x4 transforms.
      
      * Finally, when we hit the maximum splitting depth, don't update
        the counts as if we had coded a 'no split' symbol.
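      The first fix in the list above (plane index compared against
      PLANE_TYPE_UV) can be illustrated with a small sketch. The enum
      values follow the description; the three helper functions are
      hypothetical and are not the loop filter code itself.

```c
#include <assert.h>

typedef enum { PLANE_TYPE_Y = 0, PLANE_TYPE_UV = 1 } PlaneType;

/* Map a plane index (0 = Y, 1 = U, 2 = V) to its plane type. */
static PlaneType plane_type_of(int plane) {
  assert(plane >= 0 && plane <= 2);
  return plane == 0 ? PLANE_TYPE_Y : PLANE_TYPE_UV;
}

/* Buggy pattern: compares the plane index with a plane type, so only
 * plane 1 (U) is treated as chroma and plane 2 (V) is missed. */
static int is_chroma_buggy(int plane) { return plane == PLANE_TYPE_UV; }

/* Fixed pattern: convert the index to a plane type before comparing. */
static int is_chroma_fixed(int plane) {
  return plane_type_of(plane) == PLANE_TYPE_UV;
}
```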
      
      Change-Id: Iaebdacc9de81d2e93d3c49241e719bbc02e32682
  7. 31 Aug, 2017 8 commits
    • signed char -> int8_t for consistency · 0fbe33d6
      Yaowu Xu authored
      Change-Id: I5cf978071fbb55040d2be88f627b600484988520
    • avoid operation on invalid ref_row · fc377967
      Yaowu Xu authored
      BUG=aomedia:718
      
      Change-Id: Ib3fc5e83dd915d6869ee2d7e0bf40427111c6499
    • Use 7 neighbors for nz_map ctx · 2b38deff
      Angie Chiang authored
      This causes a slight drop in coding performance:
      lowres 0.093%
      
      It increases encoder speed by 24% and reduces the nz_map context
      size by 20%.
      
      Change-Id: I871c18a7e0341e066afc334556b9998194b3f8c9
    • Using CDFs for read_partition special case · 8711cf5f
      Stanislav Vitvitskyy authored
      Test results:
      akiyo       -0.05%
      bowing      -0.072%
      bridge      -0.042%
      bus         -0.156%
      coastguard  -0.645%
      container   -0.087%
      deadline     0.007%
      flower       0.02%
      football    -0.009%
      foreman      0.03%
      hall         0.087%
      highway     -0.041%
      husky       -0.031%
      mad900       0.015%
      mobile      -0.007%
      mother       0.012%
      news         0.039%
      pamphlet     0.061%
      paris       -0.003%
      sign        -0.148%
      silent       0.003%
      students    -0.009%
      tempete     -0.061%
      waterfall    0.666%
      
      Change-Id: I96c2fd3a6fbc5f8e5cf7f3b881ef89335e58d5ac
    • [CFL] Asserts for chroma_sub8x8 · c84c21c4
      Luc Trudeau authored
      When Chroma from Luma is combined with chroma_sub8x8, the prediction
      used for sub8x8 blocks originates from multiple luma blocks. Extra
      asserts are added to validate that the prediction buffer contains all
      the required information.
      
      Change-Id: I305c46ce9b8292697e1d5b181d123461026da11c
    • Remove probability model for coefficient tokens · b53682f5
      hui su authored
      Remove the token prob tables and counters.
      
      Change-Id: Ic63d52d80bb922fc10b586c27a20f2378618168c
    • Enable motion field estimation in DRL · ffbb0f91
      Jingning Han authored
      Enable the use of motion field estimation in the dynamic motion
      vector referencing system. With default experiments on, it improves
      the compression performance:
      
      lowres 1.2%
      midres 1.5%
      
      Change-Id: Ifc5b15a7239b5c3212ea50f326ab99d372034658
    • Add frame index to the decoded frames · c723b348
      Jingning Han authored
      Add frame index to the decoded frames. Store this information in
      the reference frame buffer pool. This design lets each frame
      know its index in natural order, as well as the positions of its
      reference frames.
      
      Change-Id: I5bb36928dc5750a4fdcc582dca0d244d6482f400
  8. 30 Aug, 2017 4 commits
    • Refactor setup_ref_mv_list · d797ea9e
      Yunqing Wang authored
      This patch eliminates the is_inside() check for each neighbouring
      block in scan_row_mbmi() and scan_col_mbmi(). Instead,
      setup_ref_mv_list() finds the maximum above row_offset and left
      col_offset for the current block and uses them to decide which
      above rows and left columns to search. This patch doesn't change
      the bitstream. No noticeable speedup is seen.
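      The idea of replacing per-neighbour bounds checks with one
      precomputed limit can be sketched as follows. All names here are
      hypothetical, and offsets are taken as positive distances from the
      current block, which may differ from the sign convention in the
      actual code.

```c
#include <assert.h>

/* Largest usable offset above (or left of) a block at mi_pos, given the
 * first valid position tile_start and the deepest row/column the scan
 * would ever visit (max_scan). */
static int max_usable_offset(int mi_pos, int tile_start, int max_scan) {
  assert(mi_pos >= tile_start);
  assert(max_scan >= 0);
  const int available = mi_pos - tile_start;
  return available < max_scan ? available : max_scan;
}

/* Counts the above rows a scan would visit. The limit is computed once,
 * so the loop body needs no per-row bounds check. */
static int count_scanned_rows(int mi_row, int tile_row_start,
                              int max_scan_rows) {
  const int max_row_offset =
      max_usable_offset(mi_row, tile_row_start, max_scan_rows);
  int scanned = 0;
  for (int offset = 1; offset <= max_row_offset; ++offset) {
    /* A real scan would read the neighbour at mi_row - offset here. */
    ++scanned;
  }
  return scanned;
}
```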
      
      Change-Id: Ic4ae74412605d86e9e675f86d23de3a69c04e8f3
    • Highbd parallel_deblocking sse2 optimization · 6f5569f3
      Yi Luo authored
      - Decoder speed improves ~13.7% (baseline + parallel_deblocking).
      - Highbd loopfilter AVX2 version works when this experiment is
        disabled.
      
      Change-Id: I5d56b137a1d52236a4735656c370d57ef71ae043
    • [CFL] Fixed negative rounding in scaled_luma · 9c0e9eac
      Luc Trudeau authored
      Since the scaled luma can be negative, ROUND_POWER_OF_TWO_SIGNED must be used.
      This changes the behavior from rounding toward -infinity to rounding toward 0.
      
      Results for Subset1 (compared with 35545dd5 with CfL enabled)
        PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      0.0082 | -0.1061 | -0.0119 |  -0.0126 | -0.0011 | -0.0121 |     0.0094
      
      Change-Id: Ie7258a17a199368339d4794fba6b5916e607c95b
    • Add cdfs and mask buffers for mrc-tx · 5c6744b5
      Sarah Parker authored
      These are not currently being used for anything so there is
      no impact on performance.
      
      Change-Id: Ida4e0afcc10bee665f8daa379314cd18b3a4ea28
  9. 29 Aug, 2017 1 commit
  10. 28 Aug, 2017 2 commits
    • [CFL] Move store flag to CFL_CTX · fcca37a4
      Luc Trudeau authored
      With recent changes, it is now possible to store the storage
      flag inside the CFL_CTX. This simplifies the implementation
      and will allow reuse in the decoder.
      
      This change does not alter the bitstream.
      
      Change-Id: Ibb8aebdd3d06f8765d40248ece8a038892e87032
    • Refactor zero ref mv check · 2484fa05
      Jingning Han authored
      Unify and simplify the logic for both single and compound modes.
      
      Change-Id: If781aac66b47c1a707f4f9a647cb8a3294477a48
  11. 25 Aug, 2017 4 commits
    • Add support for 16x4 partitions · 6a93b155
      Rupert Swarbrick authored
      When updating default_partition_cdf, this sums the probabilities that
      were divided evenly across the pairs PARTITION_HORZ_A/PARTITION_HORZ_B
      and PARTITION_VERT_A/PARTITION_VERT_B. Those summed probabilities now
      get distributed evenly across the triples you get by adding
      PARTITION_HORZ_4 and PARTITION_VERT_4, respectively.
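      The redistribution described above amounts to pooling the mass of a
      pair and splitting it three ways. A floating-point sketch, with
      spread_pair_over_triple as a hypothetical helper (the real tables
      hold integer CDF values):

```c
#include <assert.h>

/* Pools the probabilities of an A/B pair (for example PARTITION_HORZ_A
 * and PARTITION_HORZ_B) and spreads the total evenly over three entries:
 * A, B, and the new 4-way split. */
static void spread_pair_over_triple(double p_a, double p_b, double out[3]) {
  assert(p_a >= 0.0 && p_b >= 0.0);
  const double share = (p_a + p_b) / 3.0;
  out[0] = share; /* A */
  out[1] = share; /* B */
  out[2] = share; /* 4-way split */
}
```

      The total mass assigned to the group is unchanged, so the other
      partition types keep their probabilities.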
      
      Rather than implement 2X8/8X2 blocks for now, ss_size_lookup returns
      4X8/8X4 block sizes to use as chroma transform sizes for 4X16/16X4
      blocks.
      
      The changes in setup_pred_plane and set_skip_context are because this
      is presumably the first time we've had to deal with 16x4 or 4x16
      blocks. Since BLOCK_16X4 is not less than BLOCK_8X8, the existing
      logic didn't work (and the "shuffle back one" logic should probably be
      done for small widths and heights separately).
      
      Change-Id: If28d8954da42d6c726f2bcce2cb5242154b0870c
    • Force C implementations when using Daala DCTs. · e030936c
      Nathan E. Egge authored
      This patch fixes a regression introduced in 1d190950 where the encoder
      was using the 4x4 VP9/AV1 transforms for RDO, but then used the Daala
      transforms for encoding.
      The ~2% improvement below comes from forcing the C implementation of the
      4x4 and 8x8 transforms to be used when CONFIG_DAALA_DCT4 and
      CONFIG_DAALA_DCT8 are enabled respectively.
      
      subset-1 (--enable-experimental --enable-daala_dct4):
      
      master@2017-08-21T21:41:18.302Z ->
      master_daala_dct4_use_c@2017-08-22T02:39:14.457Z
      
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -2.1953 | -1.2044 | -1.1865 |  -1.6173 | -1.7029 | -1.6784 |    -1.7235
      
      Change-Id: I44d2b24094e89b2857ae03d743180e706cef45eb
    • Fix av1_get_tx_scale() for 32x8 and 8x32 tx · aa0d90f0
      Yue Chen authored
      Make it 0 to run at higher precision
      
      Change-Id: I51decbf9179efa18a1a06dcc3f0e939d9895a5cd
    • Fix tile boundary calculation · 5c06a646
      David Barker authored
      Fix a rare case in which the tile boundary information was not
      set up properly in the decoder when using LOOPFILTERING_ACROSS_TILES.
      
      The situation was:
      * One frame uses loop filtering across tiles. Then its tile
        boundary information is not needed, so is not calculated.
      * The next frame (in decode order) has the same size and the
        same tile layout, but doesn't use loop filtering across tiles.
      * Now the tile boundary information *is* needed, but we weren't
        recalculating it. This resulted in the loop filter being
        applied across tile boundaries even though we signalled not to.
      
      Since the conditions on when we can reuse the previous frame's
      boundary information are complex, and the overhead of calculating
      the tile boundaries is low, we avoid this issue by simply
      recalculating the boundary information each frame.
      
      Change-Id: I1f3cbb0537535bf38faaed4c21c07142e747f962
  12. 24 Aug, 2017 1 commit
    • get_sqr_tx_size(): fix for tx64x64 · dd3206fc
      Urvang Joshi authored
      When 64x64 transforms are enabled, it should return TX_64x64.
      
      Midres set:
      Small PSNR improvement overall (-0.061%), but 3 clips have large
      gains (-1.0% to -0.4% range).
      
      Change-Id: Ic2a1f0213449f81213219479c6b6aa0acfaac2e7