1. 10 Jan, 2014 6 commits
    • explain speed features · 6439aa5a
      Jim Bankoski authored
      Added comments to explain what the various speed features do, and removed
      one that was clearly unused.
      
      Change-Id: Icd37a536072ddafedbfaefcecbe48979f6d10faf
    • Removing mi_height_log2_lookup table. · 96be0a50
      Dmitry Kovalev authored
      Change-Id: I1f0ae2edc3a96b33c0494d165ae756a8feba6184
    • Don't use gf_update by default for 1-pass CBR. · c46538d4
      Marco Paniconi authored
      Change-Id: I5df6abceb0a2a69706feadeb820b593cae88f573
    • Revert "SSSE3 convolution optimization" · b6452571
      Paul Wilkins authored
      This reverts commit 511d218c.
      
      In their current form, the intrinsics break the borg build.
      
      Change-Id: Ied37936af841250ecff449802e69a3d3761c91b9
    • Enable skipping reference frame check in rd loop · d66c7486
      Jingning Han authored
      This commit allows the encoder to compare the SAD cost associated with
      the best motion vector predictor, per frame. If this cost for one
      reference frame is more than 4 times the best SAD cost given by the other
      reference frames, the NEARESTMV, NEARMV, and ZEROMV mode checks for that
      reference frame are skipped.
      
      This setting is turned on in speed 2 and above. Compression quality
      change in speed 2:
      derf  -0.014%
      yt    -0.097%
      hd    -0.023%
      stdhd  0.046%
      
      It reduces the speed 2 runtime of test sequences:
      pedestrian_area_1080p 4000 kbps 310763 ms -> 303595 ms
      bluesky_1080p 6000 kbps         259852 ms -> 251920 ms
      
      Change-Id: I7f59cf79503d51836d61d56d50dc5bdf0e502e22
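The skip rule described in this commit can be sketched roughly as follows. This is a minimal illustration, not the encoder's actual code: the function name `ref_frame_skip_mask`, the `pred_mv_sad` array, and the `NUM_REFS` constant are hypothetical names chosen for the example.

```c
#include <limits.h>

#define NUM_REFS 3 /* hypothetical: number of candidate reference frames */

/* Returns a bit mask in which bit i is set when reference frame i should
 * skip its NEARESTMV/NEARMV/ZEROMV checks. pred_mv_sad[i] is assumed to hold
 * the SAD cost of the best motion vector predictor for reference frame i. */
static unsigned int ref_frame_skip_mask(
    const unsigned int pred_mv_sad[NUM_REFS]) {
  unsigned int mask = 0;
  int i, j;
  for (i = 0; i < NUM_REFS; ++i) {
    unsigned int best_other = UINT_MAX;
    /* Best (lowest) SAD cost among the other reference frames. */
    for (j = 0; j < NUM_REFS; ++j) {
      if (j != i && pred_mv_sad[j] < best_other) best_other = pred_mv_sad[j];
    }
    /* Skip when this frame's cost is more than 4 times that best cost.
     * The multiplication is done in 64 bits to avoid overflow. */
    if ((unsigned long long)best_other * 4 < pred_mv_sad[i]) mask |= 1u << i;
  }
  return mask;
}
```

The comparison is strict, so a frame whose cost is exactly 4 times the best cost of the others is still checked.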
    • Cleanups on refresh flags · 412e4954
      Deb Mukherjee authored
      Cleanups on frame refresh flags and external overrides.
      
      Change-Id: Ia6a56fe1bde906b1dc3fcbf4ef1c7b207cd2df2d
  2. 09 Jan, 2014 7 commits
    • Keep buffer clipped to maximum in change_config. · 193fa5c8
      Marco Paniconi authored
      Under a configuration change where the bitrate suddenly decreases,
      the buffer level may exceed the maximum allowed (for the first frame
      encoded after change_config).
      This change keeps it clipped to its maximum level.
      
      Change-Id: I4d0b5b3d1fd8148600dd39e02bd630c9464baba5
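The clipping described here might look something like the sketch below. The struct, its field names, the function name, and the way the maximum is derived from the bitrate are all assumptions made for illustration; they are not taken from the encoder source.

```c
#include <stdint.h>

/* Hypothetical stand-in for the encoder's rate-control buffer state. */
typedef struct {
  int64_t buffer_level;        /* current buffer fullness, in bits */
  int64_t maximum_buffer_size; /* upper bound under the current config */
} BufferState;

/* Recompute the maximum for a new target bitrate, then clip the level.
 * Assumption: the maximum is target_bandwidth (bits/s) times a buffer
 * duration in milliseconds. */
static void apply_config_change(BufferState *s, int64_t target_bandwidth,
                                int64_t maximum_buffer_ms) {
  s->maximum_buffer_size = target_bandwidth * maximum_buffer_ms / 1000;
  /* After a bitrate drop the old level can exceed the new maximum. */
  if (s->buffer_level > s->maximum_buffer_size)
    s->buffer_level = s->maximum_buffer_size;
}
```

A level that is already below the new maximum is left unchanged.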
    • Simplify set_rt_speed_feature() · 2d381d76
      Yaowu Xu authored
      1. Made speed choices progressive
      2. Adjusted rt speed settings to achieve better speed/quality
      
      Overall, rt-5 gained 2.5% in compression/quality, and the encoding time
      of the 720p niklas clip drops from 137,052 ms to 121,874 ms.
      
      Change-Id: Ia6e7e1e15225395a868a2f1059c3db8e266e1600
    • Optimize inv 16x16 DCT with 10 non-zero coeffs - P2 · af31b27a
      Jingning Han authored
      This commit further optimizes the SSE2 operations in the second 1-D
      inverse 16x16 DCT, with <10 non-zero coefficients. The average
      runtime of this module drops from 779 cycles to 725 cycles.
      
      Change-Id: Iac31b123640d9b1e8f906e770702936b71f0ba7f
    • SSSE3 convolution optimization · 511d218c
      levytamar82 authored
      Optimizing all SSSE3 assembly for convolution:
      1. vp9_filter_block1d4_h8_sse2
      2. vp9_filter_block1d8_h8_sse2
      3. vp9_filter_block1d16_h8_sse2
      4. vp9_filter_block1d4_v8_sse2
      5. vp9_filter_block1d8_v8_sse2
      6. vp9_filter_block1d16_v8_sse2
      The optimizations include:
      - processing 2x8 elements in one 128-bit register instead of processing
        8 elements in one 128-bit register.
      - removing unnecessary loads.
      This optimization gives a user-level gain of 2.4% for 480p input
      and 1.6% for 720p.
      This optimization is done only for 64-bit.
      
      Change-Id: Icb586dc0c938b56699864fcee6c52fd43b36b969
    • Use the correct member for initialization · 719dadf3
      Johann authored
      On Windows this fails with:
      error C2440: 'initializing': cannot convert from int_mv to uint32_t
      
      Change-Id: I51630efd0e83a0ce620c91aa7859dd6fc1572e99
    • Using VP9_COMMON instead of VP9_COMP. · b16fac42
      Dmitry Kovalev authored
      Change-Id: If7d3958653104f3e170853e931f8489de3ecf3cc
    • Adding {get, set}_rate_correction_factor() functions. · c01fe86c
      Dmitry Kovalev authored
      Change-Id: Ib3212832953a3445fc5f021af0e1de7886f09b4f
  3. 08 Jan, 2014 10 commits
    • Optimize inv 16x16 DCT with 10 non-zero coeffs - P1 · ba6ab46c
      Jingning Han authored
      This commit is the first patch optimizing the SSE2 implementation of the
      inverse 16x16 DCT with <10 non-zero coefficients. It focuses on the first
      1-D (row) transformation. It exploits the fact that only the top-left 4x4
      block contains non-zero coefficients in a 2-D inverse 16x16 DCT with <10
      coefficients.
      
      The average runtime of the idct16x16_10 unit is reduced from
      883 cycles -> 779 cycles (12% faster).
      
      For pedestrian_area_1080p 300 frames at 4000 kbps, the speed 2 runtime goes
      down from 310651 ms -> 305910 ms. The decoding speed goes up from
      80.37 fps -> 80.87 fps.
      
      Change-Id: Ic6f3ac5a637a76c07ba73ddaafe318a699fea645
    • Removing direct references to {lst_fb, gld_fb, alt_fb}_idx fields. · 510a8282
      Dmitry Kovalev authored
      Change-Id: Ib1d9628d2b538b6dc27b0db1fa7f40f70ff2072f
    • Cleanups around cpi->common. · 0ecd583d
      Dmitry Kovalev authored
      Change-Id: I0c42a729038d0f4cb7bc07f587d066fcb1dfe9d9
    • Renaming 'Mode' to 'mode'. · 962c8b24
      Dmitry Kovalev authored
      Change-Id: I6cdd670d66288dbd66228f38bba6b30502d25362
    • Renaming 'Sharpness' to 'sharpness'. · 57be8136
      Dmitry Kovalev authored
      Change-Id: I54513dc3b3321e0c0bb6b15ea5c34085ed80b4a4
    • Add a C fallback for get_msb() and change inline to INLINE. · ce7ff3b6
      Alex Converse authored
      The fallback is for systems without __builtin_clz() or _BitScanReverse(),
      and is taken from libwebp.
      
      Change-Id: Iead257efc1772c466c79e1dc0356ed571d38d43e
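A portable fallback of this kind is typically a short binary search over shift widths. The sketch below is one way to write it, assuming a 32-bit `unsigned int`; the name `get_msb_fallback` is chosen for the example and is not taken from the source.

```c
#include <assert.h>

/* Returns the index (0..31) of the most significant set bit of n.
 * Assumes a 32-bit unsigned int; n must be non-zero. */
static int get_msb_fallback(unsigned int n) {
  int log = 0;
  unsigned int value = n;
  int i;

  assert(n != 0);

  /* Try shifts of 16, 8, 4, 2, 1: whenever the upper part is non-zero,
   * keep it and add the shift width to the running result. */
  for (i = 4; i >= 0; --i) {
    const int shift = 1 << i;
    const unsigned int x = value >> shift;
    if (x != 0) {
      value = x;
      log += shift;
    }
  }
  return log;
}
```

On compilers that provide it, `31 ^ __builtin_clz(n)` gives the same result for non-zero 32-bit inputs, which is why a plain C version only matters where no such intrinsic exists.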
    • Add initial intra frame neon optimization. 1~2% gain. · 691111aa
      hkuang authored
      More intra optimizations will be added.
      
      Change-Id: I33ae8d93f6002bf7b64cc2669602d9e6bfa5a6e8
    • AVX2 Variance Optimization · 357b6536
      levytamar82 authored
      Optimizing the variance functions vp9_variance16x16, vp9_variance32x32,
      vp9_variance64x64, vp9_variance32x16, vp9_variance64x32, and
      vp9_mse16x16 by migrating them to AVX2.
      Some of the functions were optimized by processing 32 elements instead of 16.
      Some of the functions were optimized by processing 2 loop strides of 16
      elements in a single 256-bit register.
      This optimization gives a 2.4% - 2.7% user-level performance gain
      and a 42% function-level gain.
      
      Change-Id: I265ae08a2b0196057a224a86450153ef3aebd85d
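For orientation, the quantity that a block variance function computes can be written as a plain scalar loop. The sketch below assumes the usual definition (sum of squared differences minus the squared sum divided by the pixel count); the name `block_variance` and its signature are hypothetical, and this is not the AVX2 code from the commit.

```c
#include <stdint.h>

/* Scalar reference for the variance of the difference between two
 * w x h blocks of 8-bit pixels. Writes the sum of squared differences
 * to *sse and returns sse - sum^2 / (w * h). */
static unsigned int block_variance(const uint8_t *a, int a_stride,
                                   const uint8_t *b, int b_stride,
                                   int w, int h, unsigned int *sse) {
  int64_t sum = 0; /* signed sum of differences */
  uint64_t sq = 0; /* sum of squared differences */
  int r, c;
  for (r = 0; r < h; ++r) {
    for (c = 0; c < w; ++c) {
      const int diff = a[r * a_stride + c] - b[r * b_stride + c];
      sum += diff;
      sq += (uint64_t)(diff * diff);
    }
  }
  *sse = (unsigned int)sq;
  return (unsigned int)(sq - (uint64_t)((sum * sum) / (w * h)));
}
```

A vectorized version accumulates the same two sums, just over many pixels per instruction, so a scalar loop like this is a convenient reference to compare against.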
    • Replace RD modeling with a fixed point approximation. · f2ca665f
      Alex Converse authored
      Change-Id: I44eb44eb3f36c05d916ef140ef42cc84f72f99ec
    • Fix rate allocation bug. · d7b49b28
      Paul Wilkins authored
      Fix misalignment of the frames contributing to the
      error score and bit allocation for gf/arf groups.
      
      Initial results are slightly positive.
      
      Change-Id: Ie508bdcfdac52e592d48e1f13e01b3551b523deb
  4. 07 Jan, 2014 8 commits
  5. 06 Jan, 2014 8 commits
  6. 04 Jan, 2014 1 commit