1. 10 Oct, 2017 12 commits
    • lgt-from-pred: transforms based on prediction · 432012f6
      Lester Lu authored
      In this experiment, sharp image discontinuity in the predicted
      block is detected. Based on this discontinuity, we choose
      particular LGTs as row and column transforms.
      
      Bitstream syntax, entropy coding, and RD search for LGT are added.
      One binary symbol is used to signal whether LGT is used. This
      experiment can work independently of the lgt experiment.
      
      lowres: -0.414% for key frames, -0.151% overall
      midres: -0.413% for key frames, -0.161% overall
      
      Change-Id: Iaa2f2c2839c34ca4134fa55e77870dc3f1fa879f
    • Turn off limit_nb_scan_distance() temporarily · 63647c02
      Angie Chiang authored
      Change-Id: Idb1a4bf4dd655bde22862d76f6fa70457381a770
    • Add REDUCE_CONTEXT_DEPENDENCY flag · 4408aad9
      Angie Chiang authored
      This flag allows us to calculate the context indexes of
      any two consecutive non-zero binaries in parallel.
      
      Moreover, we can set MIN_SCAN_IDX_REDUCE_CONTEXT_DEPENDENCY to X,
      which exempts the first X coefficients from the context
      dependency reduction.
      
      Change-Id: I75b71452996161ba06ec449021c7dea8e3899800
    • Pass scan_idx and scan into get_nz_map_ctx · f9711f88
      Angie Chiang authored
      This facilitates the experiment on reducing context
      dependency.
      
      Change-Id: I3d026bda1118cf613001efa32deed62997d5e3bb
    • Add frame-level flag to turn on/off adapt_scan · 6dbffbf1
      Angie Chiang authored
      Change-Id: I7a73dbe72b618e795191cc31bc32e31ad99d8587
    • Refine do_adapt_scan's logic · fe533ec6
      Angie Chiang authored
      Change-Id: I6d68f03e3f9b1e40b05503f6bb4055e2fd870893
    • Process OBMC pred in max unit of 64x64 · 7eb7679d
      Yue Chen authored
      Make the codec account for the 64x64 processing unit constraint
      when generating secondary predictions and applying overlapped
      filter.
      
      This issue was addressed in commits 440d4254 and 501294ce, but
      some of that handling was not fully retained in a later obmc
      refactoring commit.
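
      As a rough illustration of the 64x64 processing unit constraint, the
      sketch below splits a block into units no larger than 64x64. The names
      unit_rect, MAX_UNIT_SIZE and split_into_units are hypothetical and are
      not taken from the codec; they only show the kind of iteration such a
      constraint implies.

```c
#include <assert.h>

#define MAX_UNIT_SIZE 64 /* assumed name for the 64x64 unit limit */

/* Hypothetical rectangle describing one processing unit inside a block. */
typedef struct {
  int x, y, w, h;
} unit_rect;

static int min_int(int a, int b) { return a < b ? a : b; }

/* Split a bw x bh block into units of at most 64x64, in raster order.
 * Writes up to max_out rectangles and returns the total number of units. */
int split_into_units(int bw, int bh, unit_rect *out, int max_out) {
  int count = 0;
  assert(bw > 0 && bh > 0);
  for (int y = 0; y < bh; y += MAX_UNIT_SIZE) {
    for (int x = 0; x < bw; x += MAX_UNIT_SIZE) {
      if (count < max_out) {
        out[count].x = x;
        out[count].y = y;
        out[count].w = min_int(MAX_UNIT_SIZE, bw - x);
        out[count].h = min_int(MAX_UNIT_SIZE, bh - y);
      }
      ++count;
    }
  }
  return count;
}
```

      Under these assumptions a 128x128 block yields four 64x64 units, while
      a block of 64x64 or smaller is processed as a single unit.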
      
      Change-Id: I6f16e6fccb966d45034d5b55447c9d9cb70e02cb
    • Avoid Visual Studio compile error in loopfilter · a1befa51
      Rupert Swarbrick authored
      If you have a structure, foo_t, with an alignment request then Visual
      Studio won't allow you to declare a function
      
        void use_foo(foo_t x);
      
      The reasoning is that x might be passed on the stack, and their ABI
      doesn't allow them to guarantee that x is aligned appropriately. More
      strangely, this isn't allowed either:
      
        void use_some_foos(foo_t x[10]);
      
      This is functionally equivalent to:
      
        void use_some_foos(foo_t *x);
      
      (except that you can't tell how long the array should be from the
      function signature).
      
      Since Visual Studio is supposed to allow the latter form, use that
      instead.
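
      A minimal sketch of the pointer form follows. The structure layout, the
      16-byte alignment and the name sum_foos are assumptions for
      illustration (the commit does not show the real type), and C11 alignas
      stands in for whatever alignment macro the codebase uses. Because only
      an address is passed, no over-aligned copy has to be placed on the
      stack.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical over-aligned structure standing in for foo_t. */
typedef struct {
  alignas(16) int32_t v[4];
} foo_t;

/* Pointer form: the array length is not visible in the signature,
 * so the caller passes it explicitly. */
int32_t sum_foos(const foo_t *x, size_t count) {
  int32_t total = 0;
  assert(x != NULL);
  for (size_t i = 0; i < count; ++i) {
    for (size_t j = 0; j < 4; ++j) total += x[i].v[j];
  }
  return total;
}
```

      A caller declares the array with its natural alignment and passes it as
      sum_foos(foos, 2); the array decays to a pointer at the call site.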
      
      Change-Id: Icd449fc1058606fa7e48a6f791091bbb42a73b2c
    • Turn on 32x64 and 64x32 transforms for real · cce6692a
      Debargha Mukherjee authored
      Change-Id: Ie4382b8a1c0f87ce50e9afefd1cef8ca55435c61
    • Compute global refmv candidate at center of current block · 0a5cc5fd
      Sarah Parker authored
      When a neighboring block uses global motion, use the mv
      computed at the center of the current block as the candidate vector
      rather than the mv computed at the center of the neighboring block.
      
      0.15% improvement on cam_lowres
      
      Change-Id: I79eff8bf27a7aa84ae4a6d56e4a10c41a4438fb9
    • Add an SSE4.1 implementation of av1_highbd_convolve_2d_scale · 724d31eb
      Rupert Swarbrick authored
      For large blocks this is about 8x the speed of the C version. The code
      needs SSE 4.1 for the PMULLD instruction that we use to do SIMD 32-bit
      multiplies.
      
      The patch uses av1_convolve_scale_test (written already to test the
      low bit depth path) to make sure the optimised code matches the C
      version.
      
      Change-Id: I9304d6bb3d2cb31390de93ed08ff1a852e3ace86
    • Add an SSE4.1 implementation of av1_convolve_2d_scale · 98dc22b8
      Rupert Swarbrick authored
      For large blocks this is almost 8x the speed of the C version. The
      code needs SSE 4.1 for the PMULLD instruction that we use to do SIMD
      32-bit multiplies.
      
      This patch also makes av1_convolve_scale_test actually test something,
      making sure the optimised code matches the C version. The slightly
      excessive generality in the test (all the templating) is because of a
      following patch, which is for the high bit depth path and can then use
      most of the same test code.
      
      Change-Id: I6732bc6b2378ffaadae5aa6441100cf660f7ee11
  2. 09 Oct, 2017 7 commits
    • Avoid updating non-used transform · ca8016ef
      Angie Chiang authored
      Since the 32x32 transform uses DCT only, we can avoid updating the
      other transform types.
      
      Change-Id: I51dd8ec71975187d249d7e25130e994a48cac5c1
    • Change rectangular vartx recursion depth to 2 · d25ef8c6
      Sarah Parker authored
      0.15% improvement on lowres set
      
      Change-Id: If16a8e07797c64508f9e2d9b26ae874ac53c57a4
    • Revert wrong uses of TX_SIZE enum. · ab8840eb
      Urvang Joshi authored
      Introduced by: https://aomedia-review.googlesource.com/c/aom/+/25181
      
      Change-Id: I1f25178d6b273fbeade4c33f153b5f2bac4a8b99
    • Enable 32x64 and 64x32 transforms · 1a86b013
      Debargha Mukherjee authored
      Change-Id: I73e9d2d327b062828a75bc99fb348441dd32174a
    • Resolve some static analysis warnings · e36a08c4
      Debargha Mukherjee authored
      Change-Id: Iaff923f34100ecdce76d2319fab67cde59d485ae
    • Define block_signals_txsize function · fcff0b25
      Rupert Swarbrick authored
      The function returns true if a block signals tx_size in the stream.
      This patch uses it in the bitstream writing code and the decoder.
      
      Note that we can't quite use it in pack_inter_mode_mvs when
      CONFIG_VAR_TX && !CONFIG_RECT_TX but I've switched the code to using
      it the rest of the time since rect-tx is adopted and eventually the
      other code path should be deleted.
      
      Also use the helper function in tx_size_cost in rdopt.c, where the
      test was wrong and caused underestimates of block
      costs. (Specifically, the code that subtracts tx_size_cost from
      this_rate_tokenonly in rd_pick_intra_sby_mode ended up subtracting
      zero for a 4x8 block).
      
      The behaviour of the decoder should be unchanged. The only change in
      the encoder's behaviour should be in tx_size_cost where it should now
      match the rest of the code.
      
      Change-Id: I97236c9ce444993afe01ac5c6f4a0bb9e5049217
    • Add encoder/decoder support to frame_sign_bias · 17af2748
      Zoe Liu authored
      The frame sign bias value will not be signaled in the frame header.
      Instead, the sign bias of each reference frame is derived from its
      corresponding frame offset at both the encoder and the decoder.
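
      One plausible form of such a derivation is sketched below. The rule
      (a reference displayed after the current frame gets sign bias 1) and
      the name derive_sign_bias are assumptions for illustration; the commit
      message does not spell out the exact comparison, and offset wraparound
      is ignored here.

```c
#include <assert.h>

/* Hypothetical derivation: returns 1 when the reference frame's offset is
 * later than the current frame's offset (a backward reference), else 0.
 * Offsets are assumed to be non-negative display-order values. */
int derive_sign_bias(int ref_frame_offset, int cur_frame_offset) {
  assert(ref_frame_offset >= 0 && cur_frame_offset >= 0);
  return ref_frame_offset > cur_frame_offset ? 1 : 0;
}
```

      Because both the encoder and the decoder know the frame offsets, each
      side can evaluate the same comparison and no bit needs to be sent.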
      
      The 'frame_sign_bias' tool depends on 'frame_marker'. Compared
      against the baseline, enabling both tools yields a small coding gain
      of -0.08% to -0.11% in BDRate over the Google lowres/midres tests.
      
      Change-Id: I8d85dc427ced0b2152712ccf61be4be6068075b9
  3. 08 Oct, 2017 4 commits
    • Use arithmetic coding (cdf) to code sb filter lvl · 41d37c20
      Cheng Chen authored
      Change-Id: I5446327378938128f27186015619a079c2845d53
    • Fix a compile warning with Windows build · 5c3b0f86
      Debargha Mukherjee authored
      Change-Id: I71c07652565c0e1ca44d73f3731459949271fe45
    • Modify storing and using of the temporal frame MVs · d1d511f3
      Yunqing Wang authored
      Add an experiment "tmp", which includes:
      1. Always use a larger block size when storing frame MVs, and make
      it consistent for the CB4X4 and non-CB4X4 cases. Namely, use 8x8 for
      4x4 mi size and 16x16 for 8x8 mi size.
      2. Allocate a smaller buffer for frame MVs to save memory.
      3. Use the previous frame MVs at the nearby 8x8 or 16x16 location,
      which simplifies the logic.
      4. Reduce the number of frame MV copies, which are very costly
      in the decoder.
      
      The baseline decoder got a 5+% speedup. A Borg test on the lowres set
      showed a +0.009% PSNR difference before/after the patch.
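
      The storage scheme in items 1 and 2 can be pictured with the sketch
      below, which keeps one MV entry per 2x2 group of mi units (for example
      8x8 storage for a 4x4 mi size). The function names and the rounding-up
      of odd dimensions are assumptions for illustration, not the patch's
      actual code.

```c
#include <assert.h>

/* Number of coarse cells needed to cover n_mi mi units when each cell
 * spans two mi units in that dimension. */
int coarse_dim(int n_mi) { return (n_mi + 1) >> 1; }

/* Total entries in the coarse frame MV buffer, roughly a quarter of a
 * buffer that stores one entry per mi unit. */
int mv_store_count(int mi_rows, int mi_cols) {
  return coarse_dim(mi_rows) * coarse_dim(mi_cols);
}

/* Index of the coarse cell that covers position (mi_row, mi_col). */
int mv_store_index(int mi_row, int mi_col, int mi_cols) {
  assert(mi_row >= 0 && mi_col >= 0 && mi_col < mi_cols);
  return (mi_row >> 1) * coarse_dim(mi_cols) + (mi_col >> 1);
}
```

      With this mapping, all four mi positions inside one 2x2 group read and
      write the same entry, which is what lets the buffer shrink.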
      
      Change-Id: I61e14e95fd35bea88f338931b4f43c44f4e4cf1f
    • Fix pvq build · 35fb461e
      Debargha Mukherjee authored
      Various fixes for pvq build.
      
      Change-Id: Ideebdb072ed5786f3224e93ded5ec75a23e68dab
  4. 07 Oct, 2017 8 commits
    • [CFL] Support for 4:4:4 High Bit Depth · 69d9e878
      Luc Trudeau authored
      Change-Id: I13ba0dbe57297b540b78512d21a119f05a86a849
    • [CFL] Support for 4:2:0 High Bit Depth · 056d1f40
      Luc Trudeau authored
      Add high bit depth (_hbd) and low bit depth (_lbd) versions
      of the cfl functions sum_above_row, sum_left_col,
      cfl_build_prediction, cfl_luma_subsampling_420 (4:4:4 will
      be added in a subsequent commit) and cfl_alpha_dist. For
      cfl_alpha_dist, special care is taken to scale the SSE
      according to the bit depth.
      
      BUG=aomedia:835
      
      Change-Id: I5b72845100d88fb8a438efe665bcae7fe1ba50b8
    • Add a macro SCALE_HORIZONTAL_ONLY. · e58b564d
      Urvang Joshi authored
      When enabled, scaling through resize and superres will occur only in the
      frame's width; the height will not be scaled.
      
      Macro is off by default.
      
      Change-Id: I501b2b0b2766aa4a86da5937b57c4d5aee4e34c4
    • FRAME_SUPERRES: Rework to use scale factor of 8/D · de71d142
      Urvang Joshi authored
      Earlier, the superres scale was in the form of:
      N/16, where N ranged from 8 to 16.
      
      We change this to the form:
      8/D, where D ranges from 8 to 16.
      
      This helps on the decoder side, by making it possible to work on 8x8
      blocks at a time.
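
      A small sketch of the 8/D form is given below. The macro and function
      names and the round-to-nearest choice are assumptions for illustration
      only. One way to read the 8x8 remark is that, with a numerator fixed at
      8, every 8 coded samples correspond to exactly D upscaled samples.

```c
#include <assert.h>

#define SUPERRES_SCALE_NUMERATOR 8 /* assumed name for the fixed numerator */

/* Coded (downscaled) width for a given upscaled width and denominator D.
 * Rounding to nearest is an assumption made for this sketch. */
int superres_scaled_width(int upscaled_width, int denom) {
  assert(upscaled_width > 0);
  assert(denom >= 8 && denom <= 16);
  return (upscaled_width * SUPERRES_SCALE_NUMERATOR + denom / 2) / denom;
}
```

      For example, D = 8 leaves a 1920-wide frame unscaled, D = 12 gives a
      coded width of 1280, and D = 16 halves the width to 960.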
      
      Change-Id: I6c72d4b3e8d1c830e61d4bb8d7f6337a100c3064
    • Superres: Remove unused variable. · e61923fa
      Urvang Joshi authored
      cm->superres_scale_numerator is used for both keyframes and
      non-keyframes, and is initialized from either
      oxcf->superres_scale_numerator or oxcf->superres_kf_scale_numerator as
      appropriate.
      
      Change-Id: Ie46df576ef3830e181643ae591d836449a4bd38f
    • Upscale frame correctly in foreach_rtile_in_tile · b66894ac
      Rupert Swarbrick authored
      The restoration tiles (rtiles) divide the upscaled frame, not the
      encoded one.
      
      Change-Id: I2d08fe926d694fee7064461685289d3fd1c1de0c
    • [intra-edge] Pad intra edge samples to avoid valgrind warning · 7cfd5343
      Joe Young authored
      The SSE4 function filter_intra_edge_sse4_1() reads data slightly
      past the initialized part of the array. The data is discarded
      later, but the read causes a valgrind warning. This change avoids
      the warning by initializing an extra 16 positions of the array.
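
      The padding idea can be sketched as follows. The names EDGE_PAD and
      fill_padded_edge, and the choice to replicate the last sample, are
      assumptions for illustration; the commit only states that 16 extra
      positions are initialized.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EDGE_PAD 16 /* extra positions initialized past the real samples */

/* Copy n edge samples into buf, then initialize EDGE_PAD more positions so
 * that a vector load reaching slightly past position n only touches
 * initialized memory. Replicating the last sample is an assumption; the
 * extra values are discarded by the filter anyway. */
void fill_padded_edge(uint8_t *buf, size_t buf_len, const uint8_t *src,
                      size_t n) {
  assert(n > 0 && buf_len >= n + EDGE_PAD);
  memcpy(buf, src, n);
  memset(buf + n, src[n - 1], EDGE_PAD);
}
```

      The caller sizes the buffer as n + EDGE_PAD, so the filter's results
      are unchanged and only the previously uninitialized tail is affected.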
      
      BUG=aomedia:868
      
      Change-Id: Ib610492cff91492ae379c5d62895773f8747c4bc
    • [CFL] Extract sum top row and left col from custom DC_PRED · 13281d4f
      Luc Trudeau authored
      To simplify the high bit depth commit, the summing of the top row and
      the left column is extracted out of cfl_dc_pred. This does not change
      the bitstream.
      
      Change-Id: I5c9fe91df4942f736c5af29c1d93abb3a6c8501f
  5. 06 Oct, 2017 7 commits
    • Rework key frame intra mode context model · a45d842d
      Jingning Han authored
      Reduce the context model size for key frame modes from 30240 bits
      to 4500 bits, i.e., less than 1/6 of the original context model.
      The coding performance loss on key frames is 0.14% for lowres, and
      at noise level for whole video sequences. The loss on key frames
      for midres is 0.05%, and at noise level for whole videos. The change
      in hdres key frame coding is 0.015%.
      
      Change-Id: I9e36825e5c5ee6ba35038c3ca349ad1ad3429910
    • Avoid large stack allocations · 5d108a36
      Debargha Mukherjee authored
      When ext-partition and ncobmc-adapt-weight are on, avoid overly
      large stack allocations.
      
      Change-Id: I8db74e45cac80c4e5dfd9e20cfc73d9978d1578e
    • Reduce division operations in update_scan_prob · bbcd8f76
      Angie Chiang authored
      Change-Id: I923931a9dbf828eb13670511852d55c953b479c1
    • Simplify the ALL_ZERO_FLAG logic in av1_rd_pick_intra_mode_sb · 799ff701
      Rupert Swarbrick authored
      Since the CONFIG_EXT_INTER #if/#endif lines have been removed, it's a
      bit clearer what's going on here and this patch cleans up the code.
      
      Firstly, the patch pulls the cheap checks on best_mbmode.ref_frame out
      to the front of the block, so we needn't call gm_get_motion_vector at
      all for compound predictions.
      
      Next, the second element of the zeromv array is never used, so we needn't
      compute it.
      
      Finally, the patch removes the calls to lower_mv_precision. These
      shouldn't be needed, but it's not exactly obvious why not so the patch
      adds some comments to gm_get_motion_vector to explain what's going on
      and adds an assertion to make sure they are true. It also adds a call
      to integer_mv_precision on the early return path of
      gm_get_motion_vector, correcting an apparent bug when CONFIG_AMVR is
      true.
      
      This patch shouldn't make any difference to encoder or decoder
      behaviour.
      
      Change-Id: I0b4a01063574d080bbf6d30187f4e1748c60939d
    • Implement non-recursive av1_update_scan_order · da4bbb51
      Angie Chiang authored
      The performance difference is
      lowres: 0.02% gain
      midres: 0.07% gain
      
      Change-Id: I68a74462f41db3bf24573cf2a08c8b5b8aa13f5f
    • Fix warning about bitwise 'not' on boolean · cf26ee5a
      Sebastien Alaiwan authored
      Change-Id: I4732dbbb71a0db9ac284a4b2ae5f10816e0e9264
    • Extend IntraBC to 4x4 · ca86546f
      RogerZhou authored
      Change-Id: I3f30c35bcd1bc623ad0c34c4b954ff71b2fcfd00
  6. 05 Oct, 2017 2 commits