1. 12 Jul, 2016 1 commit
    • HBD convolution filtering (10/12 taps) SSE4.1 optimization · 8cacca73
      Yi Luo authored
      - For experiment EXT_INTERP under high bit depth.
      - Add unit test to verify bit-exactness.
      - Speed performance improvement:
        On Xeon E5-2680, park_joy_1080p_12.y4m, 50 frames, encoding time
        drops from 6682503 ms to 5390270 ms.
      
      Change-Id: Iea4debf5414f3accf1eb5672abeab56a0539ac77
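As a quick check of the timing figures quoted in this message, the relative saving can be computed directly from the two reported values. This is only arithmetic on the numbers above; the variable names are illustrative.

```python
# Encoding times reported in the commit message (milliseconds).
before_ms = 6682503
after_ms = 5390270

# Fraction of encoding time saved, and the equivalent speedup factor.
time_saved = (before_ms - after_ms) / before_ms
speedup_factor = before_ms / after_ms

print(f"time saved: {time_saved:.1%}")
print(f"speedup factor: {speedup_factor:.2f}x")
```

That works out to roughly a 19% reduction in encoding time for this clip.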
  2. 09 Jul, 2016 1 commit
    • Fix assertion failures in mips+msa setting · 4ab19eac
      Yue Chen authored
      Call the C functions directly. When EXT_TX is enabled, hybrid
      transforms other than combinations of DCT/ADST have not been implemented
      for MSA, which causes assertion failures in the switch statements in
      vp10_fhtnxn_msa() and vp10_ihtnxn_nxn_add_msa().
      
      BUG=webm:1239
      
      Change-Id: I2379a07e5406f9489edcd2f3205682f679c9b091
  3. 08 Jul, 2016 1 commit
  4. 07 Jul, 2016 2 commits
  5. 30 Jun, 2016 1 commit
    • Reject ext-inter compound modes based on modelled RD. · 532304e4
      Geza Lore authored
      Reject ext-inter compound modes before doing full rate distortion
      evaluation, if the corresponding single reference modes had a lower
      modelled RD.
      
      ext-inter speedup up to TBD.
      
      Coding performance: TBD
      
      Change-Id: I358bfb879c5ebe5e7afbf6f540cc784f8de14857
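The early-rejection idea described in this message can be sketched as follows. This is a minimal illustration, not the encoder's code: the function name, the argument layout, and the choice to reject when any corresponding single-reference mode has a lower modelled RD are all assumptions.

```python
def should_skip_compound(modelled_rd_compound, modelled_rd_single_refs):
    """Return True if full RD evaluation of a compound mode can be skipped.

    Assumption: the compound mode is rejected when at least one of its
    corresponding single-reference modes had a lower modelled RD cost.
    """
    return any(rd < modelled_rd_compound for rd in modelled_rd_single_refs)


# Hypothetical modelled RD costs (lower is better).
single_ref_costs = [1000, 1500]
print(should_skip_compound(1200, single_ref_costs))
print(should_skip_compound(900, single_ref_costs))
```

The point of the check is that the modelled RD is cheap to obtain, so unpromising compound candidates never reach the expensive full rate distortion search.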
  6. 29 Jun, 2016 2 commits
  7. 27 Jun, 2016 1 commit
    • Fix bugs in convolution filter optimization · 8404253f
      Yi Luo authored
      - Fix the over-writing bug in horizontal filtering when width = 2.
      - Fix 10-tap vertical filtering, which no longer reads one row of
        pixels above the block.
      - Fix 10-tap filter zero padding.
      - Encoder speed slows down by ~4.0% compared to
        81ad9536 Convolution vertical filter SSSE3 optimization
      
      Change-Id: I9bb294a4529300081c29bf284e6bc6eb081cc536
  8. 24 Jun, 2016 1 commit
  9. 23 Jun, 2016 1 commit
    • Convolution vertical filter SSSE3 optimization · 81ad9536
      Yi Luo authored
      - Apply 8-pixel vertical filtering direction parallelism.
      - Add unit tests to verify bit-exactness.
      - Encoder speed improves ~29% (enable EXT_INTERP) on Xeon E5-2680.
      - Combinational cycle count of vp10_convolve() drops from 26.06%
        to 6.73%.
      
      Change-Id: Ic1ae48f8fb1909991577947a8c00d07832737e57
  10. 22 Jun, 2016 2 commits
    • Refactor reference frame type defs · b605de07
      Jingning Han authored
      Move the reference frame type definitions to common/enums.h file.
      Replace hard coded numbers.
      Combine repeated definitions.
      
      Change-Id: I288e079a03e448014cc181bcdb3f88ee8ec8d139
    • Make drl support bi-directional reference frames · c2195c5b
      Jingning Han authored
      This commit refactors the reference frame structure used in the
      dynamic motion vector referencing system, and makes it support
      the bi-directional reference frames. This resolves unit test
      failure (enc/dec mismatch) when both are turned on.
      
      The compression performance (ref-mv + ext-refs) is improved by
      0.2% for lowres.
      
      Change-Id: I233624d8fccc1f69e82295f94de984ff056365dc
  11. 21 Jun, 2016 1 commit
  12. 20 Jun, 2016 1 commit
    • Convolution horizontal filter SSSE3 optimization · 229690a9
      Yi Luo authored
      - Apply signal direction/4-pixel vertical/8-pixel vertical
        parallelism.
      - Add unit test to verify the bit exact result.
      - Overall encoding time improves ~24% on Xeon E5-2680 CPU.
      
      Change-Id: I104dcbfd43451476fee1f94cd16ca5f965878e59
  13. 17 Jun, 2016 1 commit
    • Merge bi-predictive frames to EXT_REFS · 5805a14c
      Zoe Liu authored
      This patch removed the experiment of BIDIR_PRED and merged the feature
      into the experiment of EXT_REFS:
      
      (1) Each frame now has up to 6 reference frames, namely
          LAST_FRAME, LAST2_FRAME, LAST3_FRAME, GOLDEN_FRAME, (forward) and
          BWDREF_FRAME, ALTREF_FRAME (backward);
          LAST4_FRAME has been removed;
      (2) First pass still keeps the 8 updates:
          KF_UPDATE, LF_UPDATE, GF_UPDATE, ARF_UPDATE, OVERLAY_UPDATE, and
          BRF_UPDATE, LAST_BIPRED_UPDATE, BI_PRED_UPDATE;
      (3) show_existing_frame==1 is supported in the experiment of EXT_REFS;
      (4) New encoding modes are added for both single-ref and compound cases,
          through the use of the 2 extra forward references (LAST2 & LAST3)
          and the 1 extra backward reference (BWDREF).
      
      RD performance wise, using Overall PSNR: Avg/BDRate
              Bipred only      Prev EXT_REFS    Current EXT_REFS with bipred
      lowres: -3.474/-3.324    -1.748/-1.586    -4.613/-4.387
      derflr: -2.097/-1.353    -1.439/-1.215    -3.120/-2.252
      midres: -2.129/-1.901    -1.345/-1.185    -2.898/-2.636
      
      If BFG_INTERVAL in vp10/encoder/firstpass.h is changed from 2 to 3, i.e.
      to use 2 bi-predictive frames rather than 1, a further improvement may be
      obtained:
                       Current EXT_REFS with bipred
              1 bi-predictive frame    2 bi-predictive frames
      lowres: -4.613/-4.387            -4.675/-4.465
      derflr: -3.120/-2.252            -3.333/-2.516
      midres: -2.898/-2.636            -3.406/-3.095
      
      Change-Id: Ib06fe9ea0a5cfd7418a1d79b978ee9d80bf191cb
  14. 14 Jun, 2016 3 commits
    • Select segment based loopfilter strength for supertx blocks. · 44b91a0e
      Geza Lore authored
      Segment based loopfilter strength for supertx coded blocks is now
      selected based on the minimum of all segment IDs within a supertx
      coded block (same as the quantiser settings).
      
      Change-Id: Ib056bd0d05f6a1d3b512a76deb4e2ad4db0f7dc4
    • Rework supertx segment handling and adaptive quantization. · 7dd90c9d
      Geza Lore authored
      Segment level quantizer settings for supertx coded blocks are now
      selected based on the minimum of all segment IDs within a supertx
      coded block.
      
      This also fixes the 3 adaptive quantization modes with supertx.
      
      Change-Id: Ib5db099539d4f82f240e1d745d6e5264f8b34cde
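The selection rule described in this message (take the minimum segment ID inside the supertx block, then use that segment's settings) can be sketched as below. The segment map layout, the helper name, and the per-segment quantizer table are hypothetical and only illustrate the rule.

```python
def supertx_segment_id(segment_map, row, col, height, width):
    """Smallest segment ID among the map entries covered by the block."""
    return min(
        segment_map[r][c]
        for r in range(row, row + height)
        for c in range(col, col + width)
    )


# Hypothetical 4x4 segment map (one entry per minimum-size block).
segment_map = [
    [3, 3, 2, 2],
    [3, 1, 2, 2],
    [0, 0, 2, 2],
    [0, 0, 2, 2],
]
# Hypothetical per-segment quantizer indices, indexed by segment ID.
segment_qindex = [40, 60, 80, 100]

seg = supertx_segment_id(segment_map, 0, 0, 2, 2)
qindex = segment_qindex[seg]
print(seg, qindex)
```

Using one deterministic rule for the whole supertx block means the encoder and decoder agree on the settings without extra signalling; the same minimum would also drive a per-segment loopfilter table.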
    • Fix enc/dec mismatch in non-420 settings · a4ea8fd8
      Jingning Han authored
      This commit makes the dual filter experiment work with non-420
      settings. It fixes unit test failure in EndToEndTestLarge.
      
      Change-Id: I04f7afdee78f91389d9ff72947efa152098af930
  15. 13 Jun, 2016 1 commit
  16. 10 Jun, 2016 4 commits
  17. 09 Jun, 2016 1 commit
  18. 08 Jun, 2016 1 commit
  19. 06 Jun, 2016 3 commits
    • Updated loop restoration · 99d9a8fe
      Aamir Anis authored
      1. Wiener restoration filter now has normalization and evaluation of
      quantization procedure.
      2. Corrected scaling of bits in RD cost computation.
      3. Changed dynamic range and number of bits for Wiener filter.
      Observed gains: Overall 0.58% for low_res, 0.7% for mid_res sequences.
      
      Change-Id: I8928b3ea493bfe1790926b00388d6c4bafc08e19
    • Fix build failure happened in reconinter.c · 2250c6b0
      Angie Chiang authored
      Change-Id: Ifd5ed91e4e91238fb53a202c8d76c11fbb9ccf7c
    • Optimize wedge partition selection. · efda2831
      Geza Lore authored
      We can optimize wedge partition selection by pre-computing the
      residuals of the 2 underlying predictors, and then blend these
      to compute the sse of the compound predictor, without actually
      having to compute and subtract the compound predictor.
      
      Similarly we can pre-compute a proxy array which we can use to
      cheaply check which mask sign would have lower sse.
      
      Details are in wedge_utils.c.
      
      Mathematically these are equivalence transformations, but due to the
      finite precision the encoder output will be perturbed, though on
      average this should make 0% difference.
      
      ext-inter gains a ~4.5% speedup.
      
      Change-Id: Ib2657c3209ae161b4090b58b4b6c392641bf2792
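The equivalence this message relies on can be illustrated with a small sketch. It assumes 6-bit mask weights in [0, 64] and works on values scaled by 64 so that everything stays in exact integer arithmetic; the function names and sample data are hypothetical, and the real code in wedge_utils.c may differ. With source s, predictors p0 and p1, and residuals r0 = s - p0, r1 = s - p1, the scaled compound residual 64*s - (m*p0 + (64 - m)*p1) equals m*r0 + (64 - m)*r1.

```python
MAX_MASK = 64  # assumption: 6-bit mask weights in [0, 64]


def sse_direct(src, p0, p1, mask):
    """Scaled SSE computed by forming the (unrounded) compound predictor."""
    total = 0
    for s, a, b, m in zip(src, p0, p1, mask):
        scaled_pred = m * a + (MAX_MASK - m) * b  # 64 * compound predictor
        total += (MAX_MASK * s - scaled_pred) ** 2
    return total


def sse_from_residuals(r0, r1, mask):
    """Same scaled SSE, obtained by blending the two pre-computed residuals."""
    total = 0
    for a, b, m in zip(r0, r1, mask):
        total += (m * a + (MAX_MASK - m) * b) ** 2
    return total


src = [100, 102, 98, 97]
p0 = [101, 100, 99, 90]
p1 = [95, 104, 98, 99]

# Residuals are computed once and reused for every candidate mask.
r0 = [s - p for s, p in zip(src, p0)]
r1 = [s - p for s, p in zip(src, p1)]

masks = {"wedge_a": [64, 48, 16, 0], "wedge_b": [0, 16, 48, 64]}
scores = {name: sse_from_residuals(r0, r1, m) for name, m in masks.items()}
best = min(scores, key=scores.get)
print(best, scores)
```

In this exact-arithmetic form the two functions agree for every mask. A real encoder rounds the compound predictor to integer pixels, which is where the small perturbation mentioned above would come from.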
  20. 03 Jun, 2016 1 commit
    • Pre-compute and use contiguous wedge masks. · ab29978e
      Geza Lore authored
      This is purely a refactoring patch and has no functional effect.
      
      Uses of these masks can be arranged such that all input blocks are
      contiguous in memory (stride == block width). In this case 1D versions
      of operations can be used. 1D vector operations have superior performance
      over 2D block equivalents as they are more processor cache friendly and
      they can do away with a second loop overhead.
      
      Change-Id: I2b76c9888aea2c857cc497e8a4b2841fd3dad54e
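The difference between the 2D and 1D forms can be sketched as below. Python does not show the cache or loop-overhead effects, so this only illustrates the indexing: when stride equals the block width, the rows are back to back in memory and the nested loops collapse into one. The function names are hypothetical.

```python
def block_sum_2d(buf, offset, stride, width, height):
    """Sum a width x height block stored with an arbitrary row stride."""
    total = 0
    for row in range(height):
        base = offset + row * stride
        for col in range(width):
            total += buf[base + col]
    return total


def block_sum_1d(buf, offset, count):
    """Sum `count` consecutive values; valid when stride == block width."""
    total = 0
    for i in range(offset, offset + count):
        total += buf[i]
    return total


width, height = 4, 3
contiguous = list(range(width * height))  # stride == width

print(block_sum_2d(contiguous, 0, width, width, height))
print(block_sum_1d(contiguous, 0, width * height))
```

Both calls visit the same elements in the same order; the 1D version simply has no inner-loop bookkeeping.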
  21. 02 Jun, 2016 1 commit
    • Use standard rounding in combine_interintra. · 888e90e8
      Geza Lore authored
      Use the same rounding method that is used throughout the codebase,
      where the halfway value is rounded up rather than down.
      
      Change-Id: I04e92850bc69a7d7a07b06e3d2ce97f6f2ada321
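The rounding convention mentioned in this message can be sketched as follows. The first function follows the usual add-half-then-shift pattern (the same shape as libvpx's ROUND_POWER_OF_TWO macro); the second is a hypothetical round-half-down variant included only for contrast, and is not taken from the patch.

```python
def round_power_of_two(value, n):
    """Divide by 2**n (n >= 1), rounding halfway values up."""
    return (value + (1 << (n - 1))) >> n


def round_half_down_power_of_two(value, n):
    """Hypothetical contrast: divide by 2**n, rounding halfway values down."""
    return (value + (1 << (n - 1)) - 1) >> n


# 5 / 2 = 2.5 is a halfway case; 7 / 4 = 1.75 is not.
print(round_power_of_two(5, 1), round_half_down_power_of_two(5, 1))
print(round_power_of_two(7, 2), round_half_down_power_of_two(7, 2))
```

The two variants differ only on exact halfway values, so switching between them perturbs a small fraction of blended pixels by one level.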
  22. 31 May, 2016 1 commit
  23. 28 May, 2016 1 commit
    • Make the bi-predictive frame group interval adjustable · e89ca180
      Zoe Liu authored
      This is for the bidir-pred experiment. Previously the length of the
      bi-predictive frame group interval was fixed at 2, i.e. one
      bi-predictive frame could be inserted every other frame. This patch
      makes the length adjustable, i.e. any positive number may be
      specified, but the use of the backward ref will be turned off if the
      bi-predictive frame group interval is larger than the golden frame
      group.
      
      Further, an additional rate factor level has been added: INTER_LOW,
      which applies to LAST_BIPRED_UPDATE frames that are not used as
      references.
      
      Change-Id: I5514d34a64dd486bbb5756c2d0612946f598a789
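The interval rule described in this message can be sketched as a pair of small helpers. The names are hypothetical, and the relation "bi-predictive frames per group = interval - 1" is an assumption inferred from the statement that an interval of 2 gives one bi-predictive frame every other frame.

```python
def backward_ref_enabled(bipred_group_interval, golden_group_length):
    """Backward ref is turned off when the interval exceeds the golden group."""
    if bipred_group_interval < 1:
        raise ValueError("interval must be a positive number")
    return bipred_group_interval <= golden_group_length


def bipred_frames_per_group(bipred_group_interval):
    """Assumed relation: an interval of N holds N - 1 bi-predictive frames."""
    return bipred_group_interval - 1


print(backward_ref_enabled(2, 16), bipred_frames_per_group(2))
print(backward_ref_enabled(20, 16), bipred_frames_per_group(3))
```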
  24. 25 May, 2016 2 commits
    • Add a quick path in build_intra_predictors · bad6e169
      hui su authored
      For the cases where no reference data is available.
      
      Change-Id: Ibf1ac9b7073acc2c7fc44da893f3d608dc74bc1e
    • Integrate HBD inverse HT flip types sse4.1 optimization · bfe4c0ae
      Yi Luo authored
      - tx_size: 4x4, 8x8, 16x16.
      - tx_type: FLIPADST_DCT, DCT_FLIPADST, FLIPADST_FLIPADST,
        ADST_FLIPADST, FLIPADST_ADST.
      - Encoder speed improvement:
        park_joy_1080p_12: ~11%, crowd_run_1080p_12: ~7%.
      - Add unit test cases for bit-exact against C.
      
      Change-Id: Ia69d069031fa76c4625e845bfbfe7e6f6ed6e841
  25. 24 May, 2016 4 commits
    • Added an experiment "bidir_pred" for backward prediction · cf5083d4
      Zoe Liu authored
      Major parts have been implemented as follows:
      (1) Added BRF_UPDATE, LASTNRF_UPDATE, and NRF_UPDATE in firstpass.c;
      (2) Added the handling for the scenario of
      "cpi->common.show_existing_frame == 1" at the encoder;
      (3) Added a new reference frame of BWDREF_FRAME;
      (4) Have bwd-ref work with upsampled references.
      
      Note that when the "ext_refs" experiment is turned on, this experiment
      is currently turned off automatically.
      
      RD performance in Overall PSNR has been improved, compared against the
      VP10 baseline:
      
      lowres: Avg -3.312; BDRate -3.154
      derflr: Avg -1.927; BDRate -1.176
      midres: Avg -2.149; BDRate -2.001
      hdres : Avg -0.567; BDRate -0.588
      
      Change-Id: I4c06ff51cc20194bffbd4d2346e57ba3dcf6b62c
    • HBD inverse HT 8x8 and 16x16 sse4.1 optimization · 28cdee44
      Yi Luo authored
      - Covers tx_type: DCT_DCT, DCT_ADST, ADST_DCT, ADST_ADST.
      - Encoding speed improves ~27% on crowd_run_1080p_12.
      - Merge 4x4, 8x8, 16x16 unit tests in one test file.
      
      Change-Id: I058ef5254d068a9523a826480c78ebbdd231824c
    • Remove redundant memcpy from wedge predictor. · 2935b4db
      Geza Lore authored
      Removing redundant calls to memcpy from
      build_wedge_inter_predictor_from_buf yields a net 4% encoder speedup
      with ext-inter only. The output is identical.
      
      Change-Id: If97d4e323a5c8aca90c84a25a72085e006b05446
    • Pick up bit-depth from the right place · 62b63317
      Geza Lore authored
      Change-Id: Icbdb036d7927b77b84bd78e8348ec8b5be88df08
  26. 23 May, 2016 1 commit
    • Add optimized vpx_blend_mask6 · a661bc87
      Geza Lore authored
      This is to replace vp10/common/reconinter.c:build_masked_compound.
      Functionality is equivalent, but the interface is slightly more
      generic.
      
      Total encoder speedup with ext-inter: ~7.5%
      
      Change-Id: Iee18b83ae324ffc9c7f7dc16d4b2b06adb4d4305