1. 26 Jun, 2017 1 commit
    • Fix daala_dist to handle visible pixels only · 75b01004
      Yushin Cho authored
      - For invisible pixels, av1_daala_dist() simply uses source pixels for dst.
      - Added av1_daala_dist_diff(), which takes the diff signal as input instead of dst.

      - Refactored daala_dist code so that av1_daala_dist() and _diff()
      are called inside av1's distortion calculation functions, pixel_sse() and
      sum_squares_visible().
      
      Change-Id: Id857db52fe19856d92c46a9e84ac2962c01ae045
      75b01004
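The visible-pixel handling described in this commit can be illustrated with a minimal sketch. It is not the code from the commit: `sse_visible` and `sum_squares_visible_sketch` are hypothetical names, and plain squared error stands in for the daala-dist metric. The point is only that substituting source pixels for dst in the invisible region makes those pixels contribute zero distortion, which is the same as summing over the visible region alone.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not libaom's av1_daala_dist()): squared error over
 * the visible part of a bw x bh block. Skipping invisible pixels is
 * equivalent to using src in place of dst there, since (src - src)^2 == 0. */
uint64_t sse_visible(const uint8_t *src, int src_stride,
                     const uint8_t *dst, int dst_stride,
                     int bw, int bh, int visible_w, int visible_h) {
  assert(visible_w >= 0 && visible_w <= bw);
  assert(visible_h >= 0 && visible_h <= bh);
  uint64_t sse = 0;
  for (int r = 0; r < visible_h; ++r) {
    for (int c = 0; c < visible_w; ++c) {
      const int d = src[r * src_stride + c] - dst[r * dst_stride + c];
      sse += (uint64_t)(d * d);
    }
  }
  return sse;
}

/* Same idea when the caller already holds the diff signal (src - dst). */
uint64_t sum_squares_visible_sketch(const int16_t *diff, int diff_stride,
                                    int visible_w, int visible_h) {
  uint64_t sum = 0;
  for (int r = 0; r < visible_h; ++r) {
    for (int c = 0; c < visible_w; ++c) {
      const int d = diff[r * diff_stride + c];
      sum += (uint64_t)(d * d);
    }
  }
  return sum;
}
```

Both helpers take the visible width and height explicitly, so a block that straddles the frame edge is measured only over pixels that exist in the frame.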
  2. 24 Jun, 2017 6 commits
  3. 23 Jun, 2017 1 commit
  4. 22 Jun, 2017 7 commits
    • Prevent divide-by-zero · 9180b6e8
      Yaowu Xu authored
      Change-Id: Id22615d461bf16272d1b2e2c72ae7e00db8bcb5c
      9180b6e8
    • Convert to int before applying sign · bdda9d4e
      Yaowu Xu authored
      Avoids overflow of unsigned integer.
      
      Change-Id: Ic92974b508bb0cd6fc680203ffa6cff14d644ff7
      bdda9d4e
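The commit message does not show the affected expression, so the following is only a generic sketch of this class of bug, with hypothetical function names. Multiplying an unsigned magnitude by a signed sign converts the sign to unsigned, so -1 wraps to the largest unsigned value. Converting the magnitude to int first keeps the arithmetic signed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example: apply a sign (+1 or -1) to an unsigned magnitude.
 * The magnitude is converted to int first, so the multiply is signed. */
int apply_sign(uint32_t magnitude, int sign) {
  assert(sign == 1 || sign == -1);
  assert(magnitude <= (uint32_t)INT32_MAX);
  return (int)magnitude * sign;
}

/* The problematic pattern: the sign is converted to unsigned, the product
 * wraps modulo 2^32, and the wrapped value survives widening to 64 bits. */
int64_t apply_sign_unsigned(uint32_t magnitude, int sign) {
  return (int64_t)(magnitude * (uint32_t)sign);
}
```

With a magnitude of 5 and a sign of -1, the first function yields -5, while the second yields 2^32 - 5.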
    • Fix compiler warning in joint_motion_search · cb637674
      Jingning Han authored
      Avoid compiler warning when global-motion is off.
      
      Change-Id: Ie6a0d3e4efc0e06b263e8c8c0c0dc153738c3804
      cb637674
    • Add entropy stats dump out for individual frame context type · a56f916e
      Zoe Liu authored
      Change-Id: Id0cd184e8b3cea085ecc3adbc7fea7bb765c7986
      a56f916e
    • Add avx2 highbd_quantize_b · 193422e7
      Yi Luo authored
      - First pass encoding time is reduced by ~10.9% on i7-6700
        at 100 frames, 1080p.
      - avx2 handles the coeff number >= 8 cases; the coeff number < 8
        case will be implemented by sse2.
      - Unit tests are added for types B/FP/DC.
      
      Change-Id: Ibe5b7807c64e6dfc2d59c470ed50a6e8ca94ef7c
      193422e7
    • Fix daala-dist, rd tx search · 04eb9594
      Yushin Cho authored
      Previously, for block >=8x8, and tx < 8x8,
      we skipped setting the early-exit flag in block_rd_txfm() because
      distortion for sub8x8 tx block is from MSE but reference (best)
      is from daala-dist.
      However, not setting early-exit flag turned out to be the reason
      for a regression in MSE probe mode of daala-dist because
      it loses the chance to set rd_stats properly.
      
      On the other hand, there is still a small regression, say 0.05% psnr bd-rate,
      which seems to occur in the case that a tx block in a partition has chosen
      the skipped rd_cost since it is smaller than non-skip rd_cost and
      set the early-exit flag to 0 (so, not exit), but the daala-dist applied
      to the whole partition cannot access the same info but can choose from
      two kinds of rd_costs:
      1) all tx blocks are skipped (even if a tx block has non-zero coeff) and 0 bits
      2) sum of final distortion of all tx blocks (i.e. non-zero coeff decoded)
      and bits to encode coeffs.
      
      Change-Id: I2ec69972aa1f22d465293cb9e8d5e18ef2c6f7f3
      04eb9594
    • Add missing accumulation across threads · a0cc9aa8
      Yaowu Xu authored
      BUG=aomedia:618
      
      Change-Id: Ie96ccc363462a28527c99a72e97b7acaf2ab0ff8
      a0cc9aa8
  5. 21 Jun, 2017 4 commits
    • Add chroma tilesize option in loop-restoration · 84f567c7
      Debargha Mukherjee authored
      Adds an option bit in the bitstream syntax to allow chroma to
      have a restoration tilesize that is coupled to luma based on
      subsampling of the color components.
      
      This is meant to ease encoder hardware implementation.
      
      Change-Id: Ic3cc2b68c0f33701ed3ff2fe19cf57cd864da67f
      84f567c7
    • cb4x4: Move sub-4X4 block sizes behind chroma flags. · 81ec2619
      Timothy B. Terriberry authored
      cb4x4 itself should not require these sizes.
      
      This simplifies compatibility with other experiments, since we can
      first make them work with cb4x4 (which is now on by default), and
      then worry about chroma_sub8x8 and chroma_2x2 (which are not) in
      separate steps.
      
      Encoder and decoder output should remain unchanged.
      
      Change-Id: Iff2a5494cab3b7d96f881e8bd9cd4bf18c817cfa
      81ec2619
    • ext_inter: Skip compound type probs. for small block sizes. · 4a81001b
      Timothy B. Terriberry authored
      When writing the compressed header, prob_diff_update() was called
      for compound_type_prob[] for every defined block size, even though
      luma never uses block sizes smaller than 4x4.
      
      This fixes is_any_masked_compound_used() and
      is_interinter_compound_used() to properly return 0 for chroma-only
      block sizes, and then uses these functions to guard the probability
      updates in write_compressed_header() and read_compressed_header(),
      the same way the actual compound type values are guarded in
      read_inter_block_mode_info() and pack_inter_mode_mvs().
      
      Change-Id: Ib521cf53f9ec166ef634609c8b47c5814b6a9ff5
      4a81001b
    • Use last_show_frame in use_prev_frame_mvs calc · 2b4ea11a
      Fergus Simpson authored
      Without tempmv-signaling configured, using the previous frame's MVs
      requires that the last frame was a show frame. With tempmv-signaling
      configured, cm->last_show_frame is not checked when calculating
      use_prev_frame_mvs. This patch adds that check and resolves mismatches
      seen with random resizing and random superres.
      
      Includes a couple of fixes too: cm's last_width, last_height, and
      last_show_frame were updated under different conditions. Now they're all
      updated at the same time.
      
      Change-Id: Ibdfb196cb6e9d002fd57cb4df10a899b60faac00
      2b4ea11a
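A minimal sketch of the check and the unified update described in this commit follows. It assumes a simplified, hypothetical `FrameState` struct rather than the codec's real common state. The frame-size comparison is an assumption added for illustration. Only the last_show_frame check and the simultaneous update of the three "last" fields come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the relevant fields of the codec's
 * common state; the real structure has many more members. */
typedef struct {
  int width, height;           /* current frame size */
  int last_width, last_height; /* size of the previous frame */
  bool last_show_frame;        /* whether the previous frame was shown */
} FrameState;

/* Previous-frame MVs are usable only if the previous frame was shown.
 * The size comparison is an illustrative assumption. */
bool can_use_prev_frame_mvs(const FrameState *cm) {
  assert(cm != NULL);
  return cm->width == cm->last_width && cm->height == cm->last_height &&
         cm->last_show_frame;
}

/* Update all three "last" fields in one place so they cannot drift apart. */
void update_last_frame_state(FrameState *cm, bool show_frame) {
  assert(cm != NULL);
  cm->last_width = cm->width;
  cm->last_height = cm->height;
  cm->last_show_frame = show_frame;
}
```

Keeping the three assignments in a single function is one way to guarantee they are always updated under the same condition.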
  6. 20 Jun, 2017 4 commits
    • Declare rate_mv_bmc in warped motion · 562a3937
      Yunqing Wang authored
      A motion refinement step was added in warped motion, which requires the
      declaration of rate_mv_bmc in warped motion.
      
      BUG=aomedia:613
      
      Change-Id: I74dfc396f915a5cc4599bfbdccad758fa630505f
      562a3937
    • Add high bit depth fast path quantizer avx2 · 6faf349a
      Yi Luo authored
      - User-level encoder time is reduced by ~4.3% in the
        following test: 1080p, 10-bit, 4Mbps, 4 frames,
        profile=2, i7-6700.
      
      Change-Id: Ib4a579d10cbd705cb7b1c4f0d619159a76bb34d7
      6faf349a
    • [CFL] drop skip logic, always write alpha · 23198661
      David Michael Barr authored
      
      
      Results on Subset 1 (Compared to a0f8c145 with CfL)
      
        PSNR | PSNR Cb | PSNR Cr | PSNR HVS |   SSIM | MS SSIM | CIEDE 2000
      0.0677 | -0.3359 | -0.2115 |   0.0529 | 0.0735 |  0.0495 |    -0.0907
      
      Change-Id: Ib61ff862e8cfbdf0c693a4eba5f2712a6e9ab819
      Signed-off-by: David Michael Barr <b@rr-dav.id.au>
      23198661
    • [CFL] RDO Loop Rework · 14fc5045
      Luc Trudeau authored
      CfL performs an extra loop iteration during luma mode selection. Recent
      changes have broken the extra iteration. Remove previous approach.
      
      The new approach adds the extra iteration right before uv parameter
      selection. Interestingly, if the best luma intra mode already has
      worse RD performance than the best inter mode found so far (if any),
      then the entire chroma intra search is skipped, including the extra
      iteration.
      
      Results on Subset1 (compared to 3e18e4ae with CfL)
      
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -0.3090 | -2.7271 | -2.3521 |  -0.3369 | -0.3463 | -0.3525 |    -1.1868
      
      Change-Id: If67b0badd2c8ea25c61685483d39d622c1729b18
      14fc5045
  7. 19 Jun, 2017 7 commits
    • [intra-edge] Convert 4x4 VP9 to ext-intra; upsample edge samples · 830d4ce4
      Joe Young authored
      Updates to intra-edge experiment
      
      - Convert VP9-style intra pred to Ext-intra style
      - Upsample edge predictors by 2x based on angle and edge size
      
      BD-rate, 1-kf AWCY
        360p:  -0.11%
        720p:  -0.54%
        1080p: -0.96%
      
      Change-Id: Ib73805d31d5d286e607a7ee7470fcbdf11edbbff
      830d4ce4
    • [CFL] Compute Luma Average Over Partition Unit · 3e18e4ae
      Luc Trudeau authored
      Extract the computation of the luma reconstructed average out of cfl_load
      and into cfl_compute_average. The reconstructed luma average is stored
      in the CFL_CONTEXT to avoid computing it for each transform block and
      for each plane.
      
      Results on subset1 (compared to 803bea26 with CfL)
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -0.0474 | -0.1486 | -0.2931 |  -0.0358 | -0.0397 | -0.0127 |    -0.1162
      
      Change-Id: I9e34af0fe5961ce8dbe70cb80aea2a16221d0d92
      3e18e4ae
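The caching idea in this commit can be sketched as follows. The `CflSketch` struct and both functions are hypothetical simplifications, not libaom's CFL_CONTEXT, cfl_compute_average, or cfl_load. The rounding rule is also an assumption. The sketch only shows the average being computed once per partition unit and then reused by every later lookup.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified context (not libaom's CFL_CONTEXT). */
typedef struct {
  const uint8_t *y_pix; /* reconstructed luma for the partition unit */
  int stride;
  int width, height;
  int y_average;        /* cached by cfl_sketch_compute_average() */
} CflSketch;

/* Compute the rounded luma average once per partition unit and cache it. */
void cfl_sketch_compute_average(CflSketch *cfl) {
  assert(cfl->width > 0 && cfl->height > 0);
  int sum = 0;
  for (int r = 0; r < cfl->height; ++r) {
    for (int c = 0; c < cfl->width; ++c) {
      sum += cfl->y_pix[r * cfl->stride + c];
    }
  }
  const int n = cfl->width * cfl->height;
  cfl->y_average = (sum + n / 2) / n;
}

/* Per-transform-block, per-plane code reads the cached average instead of
 * recomputing it: here, the luma sample with the average removed. */
int cfl_sketch_ac(const CflSketch *cfl, int r, int c) {
  return cfl->y_pix[r * cfl->stride + c] - cfl->y_average;
}
```

The caller is expected to run the average computation once before any lookups for that partition unit.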
    • encoder: Remove 64x upsampled reference buffers · 5d24b6f0
      Timothy B. Terriberry authored
      They do not handle border extension correctly (interpolation and
      border extension do not commute unless you upsample into the
      border), nor do they handle crop dimensions that are not a multiple
      of 8 (the upsampled version is not sufficiently large), in addition
      to using massive amounts of memory and being a criminal waste of
      cache (1 byte used for every 8 bytes fetched).
      
      This commit reimplements use_upsampled_references by computing the
      subpixel samples on the fly. This implementation not only corrects
      the border handling, but is also faster, while maintaining the
      same quality.
      
      HL AWCY results are basically noise:
          PSNR | PSNR HVS |   SSIM | MS SSIM | CIEDE 2000
        0.0188 |   0.0187 | 0.0045 |  0.0063 |     0.0228
      
      Change-Id: I7527db9f83b87a7bb8b35342f7e6457cd0bef9cd
      5d24b6f0
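Computing subpixel samples on the fly, as this commit describes, can be illustrated with a small sketch. It swaps in plain bilinear interpolation at 1/8-pel precision for brevity, which is not the codec's actual interpolation filter, and all function names are hypothetical. Border extension is modelled by clamping full-pel coordinates before interpolating, so positions outside the frame are handled without a pre-upsampled buffer.

```c
#include <assert.h>
#include <stdint.h>

static int clamp_int(int v, int lo, int hi) {
  return v < lo ? lo : (v > hi ? hi : v);
}

/* Floor division by 8 that is well defined for negative positions. */
static int floor_div8(int v) {
  return v >= 0 ? v / 8 : -((-v + 7) / 8);
}

/* Full-pel fetch with border extension done by clamping coordinates. */
static int ref_pixel(const uint8_t *ref, int stride, int w, int h,
                     int x, int y) {
  return ref[clamp_int(y, 0, h - 1) * stride + clamp_int(x, 0, w - 1)];
}

/* Sample at position (x8, y8), given in 1/8-pel units. Bilinear weights
 * are an illustrative simplification of a real interpolation filter. */
int sample_subpel(const uint8_t *ref, int stride, int w, int h,
                  int x8, int y8) {
  assert(w > 0 && h > 0);
  const int x = floor_div8(x8), y = floor_div8(y8);
  const int fx = x8 - x * 8, fy = y8 - y * 8; /* fractions in [0, 7] */
  const int top = ref_pixel(ref, stride, w, h, x, y) * (8 - fx) +
                  ref_pixel(ref, stride, w, h, x + 1, y) * fx;
  const int bot = ref_pixel(ref, stride, w, h, x, y + 1) * (8 - fx) +
                  ref_pixel(ref, stride, w, h, x + 1, y + 1) * fx;
  return (top * (8 - fy) + bot * fy + 32) >> 6; /* round, divide by 64 */
}
```

Only the full-pel reference is kept in memory. Each subpixel value is derived from at most four neighbouring samples when it is needed.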
    • Fix a bug for non 420 formats and some refactoring · 887069f3
      Debargha Mukherjee authored
      BUG=aomedia:607
      
      Change-Id: I5a5fb893f0237e7ca6e0d807e825f8d4e26949b2
      887069f3
    • Add new coding tool of ext-comp-refs · c082bbcb
      Zoe Liu authored
      The ext-comp-refs tool adds uni-directional compound reference
      prediction. Specifically, three pairs of uni-directional compound
      references are added for comp ref prediction:
      (LAST_FRAME, LAST2_FRAME),
      (LAST_FRAME, GOLDEN_FRAME), and
      (BWDREF_FRAME, ALTREF_FRAME).

      The ext-comp-refs tool will eventually supersede
      one-sided-compound, merging the two coding tools into one.
      
      It achieves -0.35 ~ -0.55% coding gains in BDRate, compared against
      AV1 baseline with the default experiments on, but without
      one-sided-compound. It achieves -0.2% ~ -0.3% coding gains when
      one-sided-compound is on. It achieves larger gains on higher
      resolution.
      
      Change-Id: Icbdb16e97b96aaebaf2213f5f72d5331e2e358eb
      c082bbcb
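The three uni-directional pairs listed in this commit can be written as a small lookup table. This is a sketch only: the enum values, the table name, and the helper function are hypothetical and need not match the codec's own definitions or numbering. Only the frame labels and the three pairs come from the commit message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Reference frame labels named in the commit message. The numeric values
 * are illustrative only. */
typedef enum {
  LAST_FRAME = 1,
  LAST2_FRAME,
  GOLDEN_FRAME,
  BWDREF_FRAME,
  ALTREF_FRAME
} SketchRefFrame;

/* The three uni-directional compound pairs listed in the commit message. */
static const SketchRefFrame kUniCompRefPairs[][2] = {
  { LAST_FRAME, LAST2_FRAME },
  { LAST_FRAME, GOLDEN_FRAME },
  { BWDREF_FRAME, ALTREF_FRAME },
};

/* Returns true if (ref0, ref1) is one of the listed pairs, in that order. */
bool is_uni_comp_ref_pair(SketchRefFrame ref0, SketchRefFrame ref1) {
  const size_t n = sizeof(kUniCompRefPairs) / sizeof(kUniCompRefPairs[0]);
  for (size_t i = 0; i < n; ++i) {
    if (kUniCompRefPairs[i][0] == ref0 && kUniCompRefPairs[i][1] == ref1) {
      return true;
    }
  }
  return false;
}
```

A pair such as (LAST_FRAME, ALTREF_FRAME) mixes a forward and a backward reference, so it is not in the table and the helper returns false for it.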
    • Unify the checking on compound mode prediction · 0c634c70
      Zoe Liu authored
      Change-Id: Id9c025febf21aeb67cbc719f585661b715bdb9ce
      0c634c70
    • Add macro to disable trellis optimization in rdopt · 345366ac
      Sarah Parker authored
      Turning off the trellis optimization gives a performance
      drop of 0.726% on the lowres set.
      
      Change-Id: I4fdd1e20fb6f671162cd32b3abe699cd2aee1919
      345366ac
  8. 17 Jun, 2017 2 commits
  9. 16 Jun, 2017 8 commits