  1. 27 Dec, 2017 6 commits
    • Fix warnings: unused variable ‘plane_bsize’ · 9abda591
      Linfeng Zhang authored
      Change-Id: I8e205e5b6310b345065200cfdac23f30badc3caa
      9abda591
    • Fix msvc compiling errors and warnings · 96fa7575
      Yaowu Xu authored
      Change-Id: I69916bb6390dd9275341d8cd3fae2d8961e1cae3
      96fa7575
    • Optimize get_txb_ctx() · 4ab9a5dc
      Linfeng Zhang authored
      In a 720p encoding test, av1_cost_coeffs() takes 18% less time.
      
      Change-Id: If6de7c539c4b01a3066bdc267fb375dfe77c2c50
      4ab9a5dc
    • Limit the mfmv reference region · 4c864e0c
      Jingning Han authored
      Sub8x8 blocks will not check the extended region in motion field.
      For regular block sizes, limit the extended region to a 3-point
      check, down from 9 points.
      
      Change-Id: I70f2631aa726ad01ee6bb83fffdf71ef82505888
      4c864e0c
    • Drop mvs with magnitude above 4096 from mvs reference frame · 05102b52
      Jingning Han authored
      When either component of a motion vector is above 4096, drop this
      motion vector from the motion vector reference frame for later
      motion field projection use. The coding performance change is close
      to 0 for lowres and midres. This ensures that the motion vector
      and reference frame tuple can be efficiently stored within 32 bits.
      
      Change-Id: I9ae60a5caab2d3f49200abb5415532d82986839f
      05102b52
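
      As a rough illustration of why the 4096 limit helps, the sketch
      below packs a motion vector and a reference frame index into one
      32-bit word. The bit layout (14 bits per component, 4 bits for the
      reference frame) and all function names are assumptions made for
      this example, not the layout used in the codebase.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MV_LIMIT 4096 /* magnitude threshold quoted in the commit message */

/* Nonzero when neither component exceeds the threshold in magnitude. */
static int mv_is_storable(int row, int col) {
  return abs(row) <= MV_LIMIT && abs(col) <= MV_LIMIT;
}

/* Hypothetical layout: bits [31:18] row, [17:4] col, [3:0] reference frame.
 * Each component is biased by MV_LIMIT so it lands in 0..8192 (14 bits). */
static uint32_t pack_mv_ref(int row, int col, unsigned ref_frame) {
  assert(mv_is_storable(row, col));
  assert(ref_frame < 16);
  const uint32_t r = (uint32_t)(row + MV_LIMIT);
  const uint32_t c = (uint32_t)(col + MV_LIMIT);
  return (r << 18) | (c << 4) | ref_frame;
}

/* Inverse of pack_mv_ref(). */
static void unpack_mv_ref(uint32_t packed, int *row, int *col,
                          unsigned *ref_frame) {
  *row = (int)((packed >> 18) & 0x3FFF) - MV_LIMIT;
  *col = (int)((packed >> 4) & 0x3FFF) - MV_LIMIT;
  *ref_frame = packed & 0xF;
}
```

      A vector that fails mv_is_storable() would simply be left out of
      the stored motion field in this sketch.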
    • Always prefix OBUs with a size field. · ff86395f
      Tom Finegan authored
      - Make the add_4bytes_obusize experiment part of the obu experiment.
      - Remove the add_4bytes_obusize experiment flags.
      - Update the encoder, decoder, and tooling sources.
      
      BUG=aomedia:1125
      
      Change-Id: Ia5c443c855e52618257b39c44ca2632703bf83fd
      ff86395f
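
      A minimal sketch of a size-prefixed unit, assuming a 4-byte field
      (suggested by the add_4bytes_obusize experiment name). The
      little-endian byte order, the choice to count only the payload
      bytes, and the function names are assumptions for illustration;
      the commit message does not state them.

```c
#include <stdint.h>
#include <string.h>

#define OBU_SIZE_FIELD_BYTES 4 /* from the experiment name add_4bytes_obusize */

/* Writes a 4-byte size field followed by the payload. Returns the number of
 * bytes written, or 0 if dst is too small. Byte order is an assumption. */
static size_t write_sized_obu(uint8_t *dst, size_t dst_cap,
                              const uint8_t *payload, uint32_t payload_size) {
  if (dst_cap < OBU_SIZE_FIELD_BYTES ||
      dst_cap - OBU_SIZE_FIELD_BYTES < payload_size) {
    return 0;
  }
  for (int i = 0; i < OBU_SIZE_FIELD_BYTES; ++i) {
    dst[i] = (uint8_t)(payload_size >> (8 * i));
  }
  memcpy(dst + OBU_SIZE_FIELD_BYTES, payload, payload_size);
  return OBU_SIZE_FIELD_BYTES + (size_t)payload_size;
}

/* Reads the size field and returns a pointer to the payload, or NULL when
 * the buffer is shorter than the size field claims. */
static const uint8_t *read_sized_obu(const uint8_t *src, size_t src_size,
                                     uint32_t *payload_size) {
  if (src_size < OBU_SIZE_FIELD_BYTES) return NULL;
  uint32_t size = 0;
  for (int i = 0; i < OBU_SIZE_FIELD_BYTES; ++i) {
    size |= (uint32_t)src[i] << (8 * i);
  }
  if (src_size - OBU_SIZE_FIELD_BYTES < size) return NULL;
  *payload_size = size;
  return src + OBU_SIZE_FIELD_BYTES;
}
```

      With every unit carrying its own length, a reader can skip from
      one unit to the next without parsing the payload.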
  2. 26 Dec, 2017 1 commit
    • Do not signal reference_mode if one ref is available · c67d98c6
      Zoe Liu authored
      Use the frame ID that indicates the frame display order to identify
      whether two different reference frames exist for inter-coded frames.
      If there is only one unique reference valid in the reference buffer,
      there is no need to signal reference_mode. Instead, the decoder may
      identify such a scenario and set reference_mode to SINGLE_REFERENCE.
      
      Change-Id: If7d374f5355f153c50b408be5a9956a833c976c3
      c67d98c6
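
      The inference rule described above can be sketched as follows.
      Only SINGLE_REFERENCE is named in the commit message; the other
      enum value, the function names, and the array of display-order
      IDs are hypothetical stand-ins for the real decoder state.

```c
#include <stddef.h>

/* SINGLE_REFERENCE comes from the commit message; the second value is an
 * illustrative placeholder for "mode is read from the bitstream". */
typedef enum { SINGLE_REFERENCE, REFERENCE_MODE_SELECT } ReferenceMode;

/* Returns 1 if at least two entries of display_order[] differ, i.e. the
 * reference buffer holds more than one unique frame. */
static int has_two_distinct_refs(const int *display_order, size_t num_refs) {
  for (size_t i = 1; i < num_refs; ++i) {
    if (display_order[i] != display_order[0]) return 1;
  }
  return 0;
}

/* signaled_mode stands in for the value that would be parsed from the
 * bitstream. It is ignored when only one unique reference exists, because
 * nothing is signaled in that case. */
static ReferenceMode infer_reference_mode(const int *display_order,
                                          size_t num_refs,
                                          ReferenceMode signaled_mode) {
  if (!has_two_distinct_refs(display_order, num_refs)) {
    return SINGLE_REFERENCE;
  }
  return signaled_mode;
}
```

      Encoder and decoder must apply the same test, since the encoder
      omits the syntax element exactly when the decoder infers it.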
  3. 25 Dec, 2017 2 commits
  4. 24 Dec, 2017 2 commits
  5. 23 Dec, 2017 5 commits
    • Add optimized convolve functions for single reference case · 94e3fe3b
      Yunqing Wang authored
      Added optimized convolve functions for single reference case, so that no
      separate post rounding is needed and the result is written to the
      destination buffer directly. Duplicate code will be cleaned up later.
      
      Change-Id: Iffc0cc6e135b8b6f45a95c314d63368f5aa35f34
      94e3fe3b
    • Remove unused binary-symbol coding and tree-based coding · b101935f
      Yue Chen authored
      Change-Id: I70ebb6ada7ec4a975a8984a2e1ea2fa51664a786
      b101935f
    • Reduce the ref mv search region for sub8x8 blocks · 818b0064
      Jingning Han authored
      Reduce the reference motion vector search region over the spatial
      neighbor blocks for sub8x8 block sizes, in order to reduce the
      worst case context model parsing latency.
      
      Change-Id: I77a2a25483836cc02cf1784c93566fa7cff40fc8
      818b0064
    • Add stage range configurations for inv transforms · f5a5987f
      Debargha Mukherjee authored
      Only the col transforms are needed since the inverse transform
      is designed to do row first and then col. So the row
      transform can reuse the same configuration as the row transform of
      a square transform of the same size.
      
      Change-Id: I55e0bd6fca2765679be90364a65393e1787f42fe
      f5a5987f
    • Replace hbd adst4 with lbd adst4 · 95f52605
      Sarah Parker authored
      0.05% drop in performance for 10 bit
      0.03% drop in performance for 12 bit
      
      Updated relevant tests:
      - Use the fadst4 function from VP9 as the reference.
      - Update some max/avg error thresholds
      
      Change-Id: Ic8c5b591eea3309427d2bb42828d44e640f718a1
      95f52605
  6. 22 Dec, 2017 11 commits
    • Palette: enable all partitions no larger than 64x64 · 8b618f62
      Hui Su authored
      Enable palette mode for
      4x4, 4x8, 8x4, 4x16, 16x4, 8x32, 32x8, 16x64, 64x16
      
      0.8% gain on screen_content keyframe coding.
      
      Change-Id: Ic3c089b74171ace9082a0d3ad9e27c8a27553789
      8b618f62
    • Add stage range configurations for fwd transforms · b31ff9b2
      Debargha Mukherjee authored
      Only the row transforms are needed since the forward transform
      is designed to do col first and then row. So the col transform
      can reuse the same configuration as the col transform of a
      square block of the same size.
      
      Change-Id: I35d88146d8f8afeb685e958cb8df447f4d2b7aa1
      b31ff9b2
    • Add av1_get_nz_map_contexts_sse2() · 0ba23e86
      Linfeng Zhang authored
      10x - 50x faster than C code.
      
      av1_cost_coeffs_txb() is about 6% faster.
      
      av1_cost_coeffs() is about 3% faster.
      
      Change-Id: Ib9cbed02a65b9cb0c5deb7a5d99c95d0d8ba32c0
      0ba23e86
    • Make chroma loopfiltering tx_sizes consistent · 8aec7f30
      Debargha Mukherjee authored
      Removes existing inconsistencies between chroma tx_sizes
      used for chroma loopfiltering.
      Includes various refactoring to remove the uv_txsize_lookup
      array eventually.
      
      BUG=aomedia:1090
      
      Change-Id: Ib74299b41280ca3ebeaf9a9293242d531d68ad28
      8aec7f30
    • Option to disable small tx size for intra chroma · 80592c72
      Debargha Mukherjee authored
      This is essentially an implementation of Mozilla's big_chroma_tx
      proposal, and CFL is already using this.
      
      The option is turned on by default.
      Also includes some associated refactoring.
      
      AWCY Subset1 results:
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -0.0136 | -1.0317 | -1.3525 |  -0.0140 | -0.0188 | -0.0156 |    -0.4665
      Link:
      https://beta.arewecompressedyet.com/?job=debargha-base-lvmap%402017-12-21T06%3A08%3A35.079Z&job=debargha-nosmltxi-lvmap%402017-12-21T06%3A10%3A57.767Z
      
      Also resolves the bug below:
      
      BUG=aomedia:1158
      
      Change-Id: I9b806b57c008b7a9bb79357f0bc44dbb091e5278
      80592c72
    • Set AV1 convolve function pointers in JNT_COMP · 26b75145
      Yunqing Wang authored
      Set function pointers for AV1 convolve functions in JNT_COMP.
      
      Change-Id: I9042b09c7c0222660b18b3a9ebb1379fd05b52c8
      26b75145
    • Take out drl index control from opt-ref-mv · b4fc74da
      Jingning Han authored
      Removing the drl dependency on the candidate list length appears
      to incur more than 0.3% compression performance loss. Hence remove
      this option from opt-ref-mv to allow better latency vs compression
      performance trade off.
      
      Change-Id: I6edaeb2d437996082b7bdd6cda7351426c5584b9
      b4fc74da
    • Remove lpf_sb · 07365c9a
      Cheng Chen authored
      As loopfilter is not needed for intrabc, clean up related code.
      
      Change-Id: If89d4969a7795cd8993e6add8fd03ef1296699ef
      07365c9a
    • Add the syntax/decoder support for fwd-kf · a7c1b196
      Zoe Liu authored
      A forward-coded KEY_FRAME, serving as a backward reference frame, is
      coded as intra-only. A show_existing_frame that shows the buffered
      forward KEY_FRAME needs to reset the frame context as well as the
      reference frame buffer.
      
      One binary symbol, namely reset_decoder_state, is added to the frame
      header. Whenever a frame is a show_existing_frame, the decoder reads
      this binary symbol from the bitstream. When this binary symbol is
      1, it indicates that the existing frame to show shall be an intra
      coded frame and will serve as a KEY_FRAME. The frame context is set
      to default and the reference buffer is updated the same way as a
      normal KEY_FRAME.
      
      Change-Id: I8b641220689459a104d2f5a03bbdb6820af8f990
      a7c1b196
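
      A minimal sketch of the conditional read described above. The
      field names show_existing_frame and reset_decoder_state come from
      the commit message; the toy bit reader, its MSB-first bit order,
      and the struct are assumptions. Every other header field,
      including the one selecting which buffered frame to show, is
      omitted.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy MSB-first bit reader, used only to make the sketch self-contained. */
typedef struct {
  const uint8_t *data;
  size_t size;    /* buffer size in bytes */
  size_t bit_pos; /* next bit to read */
} BitReader;

static int read_bit(BitReader *br) {
  if (br->bit_pos >= 8 * br->size) return 0; /* treat truncation as zeros */
  const uint8_t byte = br->data[br->bit_pos >> 3];
  const int bit = (byte >> (7 - (br->bit_pos & 7))) & 1;
  ++br->bit_pos;
  return bit;
}

typedef struct {
  int show_existing_frame;
  int reset_decoder_state; /* only present when show_existing_frame == 1 */
} FrameHeaderSketch;

/* reset_decoder_state is read only for a show_existing_frame. When it is 1,
 * a decoder would reset the frame context to defaults and update the
 * reference buffers as for a normal KEY_FRAME (not modeled here). */
static void read_show_existing(BitReader *br, FrameHeaderSketch *hdr) {
  hdr->show_existing_frame = read_bit(br);
  hdr->reset_decoder_state = 0;
  if (hdr->show_existing_frame) {
    hdr->reset_decoder_state = read_bit(br);
  }
}
```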
    • Make space for range config for 2D transforms · 867f3120
      Debargha Mukherjee authored
      Change-Id: I62117adde6f403c02667903a31454b2e3cfea4aa
      867f3120
    • Set AV1 convolve function pointers · d790c809
      Yunqing Wang authored
      Set function pointers for AV1 convolve functions.
      
      Change-Id: I9241ef31fcd060a6b76e0cac8e2452b0207df929
      d790c809
  7. 21 Dec, 2017 11 commits
    • Swap new size 8 and 16 fwd/inv transforms for ADST · 4d5cf537
      Urvang Joshi authored
      This is to make them similar to the ones in VP9.
      
      Change-Id: Iaebf625f2dce4f159b8a8615f37003d773ee6450
      4d5cf537
    • intrabc: enable 16x4 and 4x16 blocks · eb2fd5c5
      Hui Su authored
      0.15% gain on the screen_content testset.
      
      BUG=aomedia:998
      
      Change-Id: Ia6484a90b92a00bb0073ecf988b5c164fe8ba84c
      eb2fd5c5
    • [CFL] SSE2/AVX2 versions of subtract_average · b4faea73
      Luc Trudeau authored
      Includes unit tests for conformance and speed.
      
      SSE2/CFLAverageSpeedTest:
      4x4: C time = 499 us, SIMD time = 156 us (~3.2x)
      8x8: C time = 1124 us, SIMD time = 221 us (~5.1x)
      16x16: C time = 4228 us, SIMD time = 620 us (~6.8x)
      32x32: C time = 8743 us, SIMD time = 2236 us (~3.9x)
      
      AVX2/CFLAverageSpeedTest:
      4x4: C time = 482 us, SIMD time = 180 us (~2.7x)
      8x8: C time = 1007 us, SIMD time = 227 us (~4.4x)
      16x16: C time = 3471 us, SIMD time = 324 us (~11x)
      32x32: C time = 8758 us, SIMD time = 1443 us (~6.1x)
      
      Change-Id: Id5ae80142a9764f388c0770ebcff4e46fa3a4dad
      b4faea73
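
      For context, a scalar routine of the kind the SIMD versions would
      be checked against might look like the sketch below. The function
      name, signature, and rounding rule are assumptions, and samples
      are assumed to be non-negative; this is not the reference code
      from the commit.

```c
#include <stdint.h>

/* Subtracts the rounded block average from every sample, in place.
 * Samples are assumed non-negative so that the integer division below
 * rounds to nearest. */
static void subtract_average_c(int16_t *buf, int stride, int width,
                               int height) {
  const int num_pels = width * height;
  int sum = 0;
  for (int r = 0; r < height; ++r) {
    for (int c = 0; c < width; ++c) sum += buf[r * stride + c];
  }
  const int avg = (sum + num_pels / 2) / num_pels;
  for (int r = 0; r < height; ++r) {
    for (int c = 0; c < width; ++c) {
      buf[r * stride + c] = (int16_t)(buf[r * stride + c] - avg);
    }
  }
}
```

      A conformance test of the kind mentioned above would run this and
      the SIMD version on the same input and compare the buffers.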
    • Silence compiler warning · 0105c604
      Jingning Han authored
      Clear compiler warning when high bd is off.
      
      Change-Id: I46e35aa03ea7c50c8b98a75cd6d210b15ec5d9c4
      0105c604
    • Palette: modify the context slightly · c1f411bc
      Hui Su authored
      Use the number of pixels in a block as context, rather than the bsize
      itself. The rectangular blocks therefore share the same context, e.g.
      BLOCK_8X16 and BLOCK_16X8.
      
      The number of contexts is reduced from 10 to 7.
      Almost no coding performance changes.
      
      Change-Id: Ib3241194580c2b93ad0e953957cdc9188393d055
      c1f411bc
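
      One way to derive a context from the pixel count is sketched
      below. It assumes block dimensions are powers of two and that the
      smallest palette block is 8x8, so that sizes from 8x8 to 64x64
      yield the 7 contexts mentioned above. The function names are
      hypothetical.

```c
/* log2 of a power-of-two dimension, e.g. 8 -> 3, 64 -> 6. */
static int log2_pow2(int n) {
  int shift = 0;
  while (n > 1) {
    n >>= 1;
    ++shift;
  }
  return shift;
}

/* Context index from the number of pixels: log2(width * height) minus
 * log2(8 * 8). Blocks with equal area, such as 8x16 and 16x8, share one
 * context. */
static int palette_size_ctx(int width, int height) {
  return log2_pow2(width) + log2_pow2(height) - 6;
}
```

      Under these assumptions the ten block sizes from 8x8 to 64x64 map
      to pixel counts 64, 128, ..., 4096, which gives contexts 0 to 6.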
    • Use SIMD function for smooth interintra blending · 592d19d0
      Yue Chen authored
      Tiny speedup: ~0.48%
      No coding performance change
      
      Change-Id: Icad3c3d25424a6570d1f134aa33d8d015e5b4a10
      592d19d0
    • Remove all_zero check in read/write_inter_mode · ec9bebc1
      Angie Chiang authored
      This is a bitstream simplification.
      It will reduce motion vector context model generating latency.
      
      Change-Id: I98a496f5d72402ff51a478d5387a0653fa306dc1
      ec9bebc1
    • Remove CDEF_SINGLEPASS defines · 8322ff04
      Steinar Midtskogen authored
      The experiment has been adopted and enabled by default for a while.
      The alternative code path has not been maintained for a long time
      and is now removed.
      
      Change-Id: Iaf22f2969b45b71b2bf67707e131ab4c439b7fa6
      8322ff04
    • Remove DISABLE_VARTX_FOR_CHROMA = 2 option. · 27b5136f
      Debargha Mukherjee authored
      Removing code for this option since it is not better than the
      DISABLE_VARTX_FOR_CHROMA = 1 option and is more complex.
      
      Change-Id: Id39d23bc6130bbed0ac008c1c76a2ba5aaee4d22
      27b5136f
    • Do not use length-64 transform for chroma · 1a8664ea
      Debargha Mukherjee authored
      Adds missing logic to the get_vartx_max_txsize() function for
      64x16 and 16x64 transforms.
      
      Change-Id: I60bf4f5b49be674f103e30a2e35fa0a43ba1f7e6
      1a8664ea
    • Fix build when HIGHBITDEPTH and TXMG are off. · 49bcbac0
      Urvang Joshi authored
      Change-Id: I9cedde11c45d84a9604a588cef3ad1ce9888499e
      49bcbac0
  8. 20 Dec, 2017 2 commits
    • Fix bustage caused by 8089315a with daala_tx. · 501acee3
      Timothy B. Terriberry authored
      The inverse transform API was changed to pass in an unpadded 32x32
      block of coefficients for transforms larger than 32x32, but the
      code path actually used for daala_tx was not modified to pad it out
      to the full size like the others were.
      
      Change-Id: Ibda5d20a9d839ba41f8a1a0308c414111219da92
      501acee3
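
      The padding step described above can be sketched as follows,
      assuming row-major storage, a packed source block whose stride
      equals its width, and zero fill outside the copied region. The
      function name and the int32_t coefficient type are assumptions.

```c
#include <stdint.h>
#include <string.h>

/* Copies a packed src_w x src_h block into the top-left corner of a
 * dst_w x dst_h block and zero-fills everything else. */
static void pad_coeffs(const int32_t *src, int src_w, int src_h,
                       int32_t *dst, int dst_w, int dst_h) {
  memset(dst, 0, sizeof(*dst) * (size_t)dst_w * (size_t)dst_h);
  for (int r = 0; r < src_h; ++r) {
    memcpy(dst + (size_t)r * (size_t)dst_w, src + (size_t)r * (size_t)src_w,
           sizeof(*src) * (size_t)src_w);
  }
}
```

      For a 64x64 transform this would be called with a 32x32 source and
      a 64x64 destination before running the inverse transform.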
    • Add is_compound in ConvolveParams · 17be4d8b
      Yunqing Wang authored
      Added is_compound in ConvolveParams, so that later we could handle
      single ref and compound ref differently in optimization.
      
      Change-Id: If36d1634c5dbd9e6e1962c8017db470bf78738fa
      17be4d8b