1. 10 Jan, 2014 3 commits
  2. 09 Jan, 2014 7 commits
    • Keep buffer clipped to maximum in change_config. · 193fa5c8
      Marco Paniconi authored
      Under a configuration change where the bitrate suddenly decreases,
      the buffer level may exceed the maximum allowed for the first frame
      encoded after change_config.
      This change keeps the buffer level clipped to its maximum.
      
      Change-Id: I4d0b5b3d1fd8148600dd39e02bd630c9464baba5
      193fa5c8
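The clipping described in this commit can be sketched as follows. This is a minimal illustration and not the libvpx code: the struct `RateControlSketch`, its fields `buffer_level` and `maximum_buffer_size`, and the function `clip_buffer_level` are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical rate-control state; the field names are assumptions. */
typedef struct {
  int64_t buffer_level;        /* current buffer fullness, in bits */
  int64_t maximum_buffer_size; /* capacity under the current config, in bits */
} RateControlSketch;

/* After a configuration change has lowered the maximum, make sure the
 * level seen by the next encoded frame does not exceed that maximum. */
static void clip_buffer_level(RateControlSketch *rc) {
  assert(rc != NULL);
  if (rc->buffer_level > rc->maximum_buffer_size) {
    rc->buffer_level = rc->maximum_buffer_size;
  }
}
```

A level that is already within the maximum is left unchanged, so the helper could be called unconditionally after every configuration change.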
    • Simplify set_rt_speed_feature() · 2d381d76
      Yaowu Xu authored
      1. Made speed choices progressive.
      2. Adjusted rt speed settings to achieve a better speed/quality trade-off.
      
      Overall, rt-5 gained 2.5% in compression/quality, and the encoding time of
      the 720p niklas clip went from 137,052 ms to 121,874 ms.
      
      Change-Id: Ia6e7e1e15225395a868a2f1059c3db8e266e1600
      2d381d76
    • Optimize inv 16x16 DCT with 10 non-zero coeffs - P2 · af31b27a
      Jingning Han authored
      This commit further optimizes the SSE2 operations in the second 1-D
      inverse 16x16 DCT with <10 non-zero coefficients. The average
      runtime of this module goes down from 779 cycles to 725 cycles.
      
      Change-Id: Iac31b123640d9b1e8f906e770702936b71f0ba7f
      af31b27a
    • SSSE3 convolution optimization · 511d218c
      levytamar82 authored
      Optimizing all SSSE3 assembly for convolution:
      1. vp9_filter_block1d4_h8_sse2
      2. vp9_filter_block1d8_h8_sse2
      3. vp9_filter_block1d16_h8_sse2
      4. vp9_filter_block1d4_v8_sse2
      5. vp9_filter_block1d8_v8_sse2
      6. vp9_filter_block1d16_v8_sse2
      The optimizations include:
      - processing 2x8 elements in one 128-bit register instead of processing
        8 elements in one 128-bit register;
      - removing unnecessary loads.
      This optimization gives a user-level gain of 2.4% for 480p input
      and 1.6% for 720p input.
      This optimization was done for 64-bit only.
      
      Change-Id: Icb586dc0c938b56699864fcee6c52fd43b36b969
      511d218c
    • Use the correct member for initialization · 719dadf3
      Johann authored
      On Windows this fails with:
      error C2440: 'initializing': cannot convert from int_mv to uint32_t
      
      Change-Id: I51630efd0e83a0ce620c91aa7859dd6fc1572e99
      719dadf3
    • Using VP9_COMMON instead of VP9_COMP. · b16fac42
      Dmitry Kovalev authored
      Change-Id: If7d3958653104f3e170853e931f8489de3ecf3cc
      b16fac42
    • Adding {get, set}_rate_correction_factor() functions. · c01fe86c
      Dmitry Kovalev authored
      Change-Id: Ib3212832953a3445fc5f021af0e1de7886f09b4f
      c01fe86c
  3. 08 Jan, 2014 10 commits
    • Optimize inv 16x16 DCT with 10 non-zero coeffs - P1 · ba6ab46c
      Jingning Han authored
      This commit is the first patch optimizing the SSE2 implementation of the inverse
      16x16 DCT with <10 non-zero coefficients. It focuses on the first 1-D (row)
      transformation. It exploits the fact that only the top-left 4x4 block contains
      non-zero coefficients in a 2-D inverse 16x16 DCT with <10 coefficients.
      
      The average runtime of idct16x16_10 unit is reduced from
      883 cycles -> 779 cycles (12% faster).
      
      For pedestrian_area_1080p 300 frames at 4000 kbps, the speed 2 runtime goes
      down from 310651 ms  -> 305910 ms. The decoding speed goes up from
      80.37 fps -> 80.87 fps.
      
      Change-Id: Ic6f3ac5a637a76c07ba73ddaafe318a699fea645
      ba6ab46c
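The idea of exploiting the all-zero rows can be sketched in scalar C. This is an illustration under stated assumptions, not the SSE2 code from the commit: `basis` is an arbitrary integer stand-in for the real DCT basis (only linearity matters for the argument), and all function names are hypothetical. Because a linear 1-D transform maps an all-zero row to an all-zero row, the row pass only has to visit the first 4 rows.

```c
#include <assert.h>
#include <string.h>

enum { SIZE = 16 };

/* Stand-in basis: arbitrary small integers, NOT the DCT coefficients. */
static int basis(int i, int j) { return ((i * j + i + 2 * j) % 7) - 3; }

/* Generic linear 1-D transform over SIZE strided elements. */
static void transform_1d(const int *in, int in_step, int *out, int out_step) {
  int i, j;
  for (i = 0; i < SIZE; ++i) {
    int acc = 0;
    for (j = 0; j < SIZE; ++j) acc += basis(i, j) * in[j * in_step];
    out[i * out_step] = acc;
  }
}

/* Reference 2-D transform: all 16 rows, then all 16 columns. */
static void inverse_2d_full(const int *in, int *out) {
  int tmp[SIZE * SIZE];
  int r, c;
  for (r = 0; r < SIZE; ++r) transform_1d(in + r * SIZE, 1, tmp + r * SIZE, 1);
  for (c = 0; c < SIZE; ++c) transform_1d(tmp + c, SIZE, out + c, SIZE);
}

/* Sparse variant: assumes non-zero inputs lie only in the top-left 4x4
 * block. Rows 4..15 are all zero, so their row-pass output is zero and
 * the row transform is skipped for them. */
static void inverse_2d_top_left_4x4(const int *in, int *out) {
  int tmp[SIZE * SIZE];
  int r, c;
  memset(tmp, 0, sizeof(tmp));
  for (r = 0; r < 4; ++r) transform_1d(in + r * SIZE, 1, tmp + r * SIZE, 1);
  for (c = 0; c < SIZE; ++c) transform_1d(tmp + c, SIZE, out + c, SIZE);
}
```

Both functions produce identical output whenever the input satisfies the top-left 4x4 assumption; the sparse variant simply performs 4 row transforms instead of 16.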
    • Removing direct references to {lst_fb, gld_fb, alt_fb}_idx fields. · 510a8282
      Dmitry Kovalev authored
      Change-Id: Ib1d9628d2b538b6dc27b0db1fa7f40f70ff2072f
      510a8282
    • Cleanups around cpi->common. · 0ecd583d
      Dmitry Kovalev authored
      Change-Id: I0c42a729038d0f4cb7bc07f587d066fcb1dfe9d9
      0ecd583d
    • Renaming 'Mode' to 'mode'. · 962c8b24
      Dmitry Kovalev authored
      Change-Id: I6cdd670d66288dbd66228f38bba6b30502d25362
      962c8b24
    • Renaming 'Sharpness' to 'sharpness'. · 57be8136
      Dmitry Kovalev authored
      Change-Id: I54513dc3b3321e0c0bb6b15ea5c34085ed80b4a4
      57be8136
    • Add a C fallback for get_msb() and change inline to INLINE. · ce7ff3b6
      Alex Converse authored
      For systems without __builtin_clz() or _BitScanReverse(). Taken from libwebp.
      
      Change-Id: Iead257efc1772c466c79e1dc0356ed571d38d43e
      ce7ff3b6
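A portable fallback of this kind can be sketched as below. The function name `get_msb_fallback` is hypothetical, and the sketch assumes a 32-bit `unsigned int`; it returns the index of the most significant set bit, that is, floor(log2(n)), using a binary search over shift widths rather than a compiler intrinsic.

```c
#include <assert.h>

/* Index of the most significant set bit of n (floor(log2(n))).
 * Requires n != 0; assumes unsigned int is 32 bits wide. */
static int get_msb_fallback(unsigned int n) {
  int log = 0;
  int i;
  assert(n != 0);
  /* Try shifts of 16, 8, 4, 2, 1: whenever the upper part is non-zero,
   * keep it and add the shift to the running bit index. */
  for (i = 4; i >= 0; --i) {
    const int shift = 1 << i;
    const unsigned int x = n >> shift;
    if (x != 0) {
      n = x;
      log += shift;
    }
  }
  return log;
}
```

The loop always runs five iterations, so the cost is fixed regardless of the input value.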
    • Add initial intra frame neon optimization. 1~2% gain. · 691111aa
      hkuang authored
      More intra optimizations will be added.
      
      Change-Id: I33ae8d93f6002bf7b64cc2669602d9e6bfa5a6e8
      691111aa
    • AVX2 Variance Optimization · 357b6536
      levytamar82 authored
      Optimizing the variance functions vp9_variance16x16, vp9_variance32x32,
      vp9_variance64x64, vp9_variance32x16, vp9_variance64x32, and
      vp9_mse16x16 by migrating them to AVX2.
      Some of the functions were optimized by processing 32 elements instead of 16.
      Some of the functions were optimized by processing 2 loop strides of 16
      elements in a single 256-bit register.
      This optimization gives a user-level performance gain of 2.4% to 2.7%
      and a function-level gain of 42%.
      
      Change-Id: I265ae08a2b0196057a224a86450153ef3aebd85d
      357b6536
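For orientation, a scalar sketch of what a block variance function of this kind computes is shown below. It is not the AVX2 code from the commit: the name `variance_scalar` and its parameter list are assumptions made for the example. The result is the sum of squared differences minus the squared sum of differences divided by the pixel count, and the raw sum of squared differences is returned through `sse`.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar reference for a w x h block (hypothetical signature).
 * Returns sse - sum^2 / (w * h); stores the raw sse through *sse. */
static unsigned int variance_scalar(const uint8_t *src, int src_stride,
                                    const uint8_t *ref, int ref_stride,
                                    int w, int h, unsigned int *sse) {
  int64_t sum = 0; /* sum of pixel differences */
  int64_t sq = 0;  /* sum of squared pixel differences */
  int r, c;
  assert(w > 0 && h > 0);
  for (r = 0; r < h; ++r) {
    for (c = 0; c < w; ++c) {
      const int diff = src[c] - ref[c];
      sum += diff;
      sq += diff * diff;
    }
    src += src_stride;
    ref += ref_stride;
  }
  *sse = (unsigned int)sq;
  return (unsigned int)(sq - (sum * sum) / (w * h));
}
```

A SIMD version performs the same accumulation on many pixels per instruction; the scalar form is mainly useful as a reference to test such a version against.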
    • Replace RD modeling with a fixed point approximation. · f2ca665f
      Alex Converse authored
      Change-Id: I44eb44eb3f36c05d916ef140ef42cc84f72f99ec
      f2ca665f
    • Fix rate allocation bug. · d7b49b28
      Paul Wilkins authored
      Fix misalignment of the frames contributing to the
      error score and bit allocation for gf/arf groups.
      
      Initial results are slightly positive.
      
      Change-Id: Ie508bdcfdac52e592d48e1f13e01b3551b523deb
      d7b49b28
  4. 07 Jan, 2014 8 commits
  5. 06 Jan, 2014 8 commits
  6. 04 Jan, 2014 2 commits
  7. 03 Jan, 2014 2 commits
    • Tune IDCT8_1D macro function interface · 3e0c62b5
      Jingning Han authored
      This commit adds input/output ports to the IDCT8_1D macro function to
      provide more flexibility in variable use. This allows several
      buffer swap operations to be skipped.
      
      Change-Id: I21f3450509537322293043b3281bfd3949868677
      3e0c62b5
    • Adding RefBuffer struct. · ba41e9d4
      Dmitry Kovalev authored
      Adding RefBuffer to simplify reference buffer management. The struct has a
      pointer to image data and scale factors relative to the current frame.
      
      Change-Id: If38eb1491ff687cc11428aee339f3e052e2c5d9e
      ba41e9d4