1. 24 Jan, 2014 1 commit
  2. 22 Jan, 2014 2 commits
  3. 21 Jan, 2014 1 commit
    • Separate the border size for encoder and decoder. · 437004c7
      hkuang authored
      The encoder's border is still 160, while the decoder's border will be 32.
      With on-demand border extension and a separate border buffer,
      the decoder's border no longer needs to be 160.
      Change-Id: I93d5aaff15a33a2213e9761eaa37c5f2870747db
  4. 18 Jan, 2014 2 commits
    • Deprecate best_mv from encoder · b461c088
      Jingning Han authored
      This commit deprecates the use of best_mv from encoding and bit-stream
      writing stages. It hence removes the definition from MACROBLOCKD.
      Change-Id: I8e5302775a2aa4a18900726df407bff881f2dfb1
    • Use a temp buffer for reconstruction when · 7459fee8
      hkuang authored
      reference buffer is out of border.
      Change-Id: Ic7ad136e54a4d68abe0fd4345146a86b0ba824e1
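The idea of reconstructing from a temporary buffer when the reference block falls outside the stored border can be sketched in C as below. This is a minimal illustration, not the code from this commit: the function name `build_edge_extended_block`, its parameters, and the use of edge replication by coordinate clamping are assumptions made for the example, and the frame is treated as having no stored border at all to keep it short.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp v into [lo, hi]. */
static int clamp_int(int v, int lo, int hi) {
  return v < lo ? lo : (v > hi ? hi : v);
}

/* Hypothetical helper: copy a bw x bh block whose top-left corner is at
 * (x0, y0) in the reference frame into dst. Coordinates that fall outside
 * the frame are clamped, which replicates the nearest edge pixel. */
static void build_edge_extended_block(const uint8_t *frame, int frame_stride,
                                      int frame_w, int frame_h,
                                      int x0, int y0,
                                      uint8_t *dst, int dst_stride,
                                      int bw, int bh) {
  int x, y;
  assert(frame_w > 0 && frame_h > 0);
  for (y = 0; y < bh; ++y) {
    const int sy = clamp_int(y0 + y, 0, frame_h - 1);
    for (x = 0; x < bw; ++x) {
      const int sx = clamp_int(x0 + x, 0, frame_w - 1);
      dst[y * dst_stride + x] = frame[sy * frame_stride + sx];
    }
  }
}
```

A predictor would then read from the temporary block instead of the frame, so the frame itself only needs a small stored border.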
  5. 17 Jan, 2014 3 commits
  6. 15 Jan, 2014 3 commits
  7. 14 Jan, 2014 1 commit
  8. 13 Jan, 2014 1 commit
  9. 10 Jan, 2014 2 commits
  10. 09 Jan, 2014 2 commits
    • Optimize inv 16x16 DCT with 10 non-zero coeffs - P2 · af31b27a
      Jingning Han authored
      This commit further optimizes SSE2 operations in the second 1-D
      inverse 16x16 DCT, with (<10) non-zero coefficients. The average
      runtime of this module goes down from 779 cycles -> 725 cycles.
      Change-Id: Iac31b123640d9b1e8f906e770702936b71f0ba7f
    • SSSE3 convolution optimization · 511d218c
      levytamar82 authored
      Optimizing all SSSE3 assembly for convolution:
      1. vp9_filter_block1d4_h8_sse2
      2. vp9_filter_block1d8_h8_sse2
      3. vp9_filter_block1d16_h8_sse2
      4. vp9_filter_block1d4_v8_sse2
      5. vp9_filter_block1d8_v8_sse2
      6. vp9_filter_block1d16_v8_sse2
      The optimizations include:
      - processing 2x8 elements in one 128-bit register instead of processing
      8 elements in one 128-bit register.
      - removing unnecessary loads.
      This optimization gives a 2.4% user-level gain for 480p input
      and a 1.6% user-level gain for 720p.
      This optimization is done only for 64-bit.
      Change-Id: Icb586dc0c938b56699864fcee6c52fd43b36b969
  11. 08 Jan, 2014 7 commits
    • Optimize inv 16x16 DCT with 10 non-zero coeffs - P1 · ba6ab46c
      Jingning Han authored
      This commit is the first patch optimizing the SSE2 implementation of the inverse
      16x16 DCT with <10 non-zero coefficients. It focuses on the first 1-D (row)
      transformation. It exploits the fact that only the top-left 4x4 block contains
      non-zero coefficients in a 2-D inverse 16x16 DCT with <10 coefficients.
      The average runtime of the idct16x16_10 unit is reduced from
      883 cycles -> 779 cycles (12% faster).
      For pedestrian_area_1080p 300 frames at 4000 kbps, the speed 2 runtime goes
      down from 310651 ms  -> 305910 ms. The decoding speed goes up from
      80.37 fps -> 80.87 fps.
      Change-Id: Ic6f3ac5a637a76c07ba73ddaafe318a699fea645
    • Renaming 'Mode' to 'mode'. · 962c8b24
      Dmitry Kovalev authored
      Change-Id: I6cdd670d66288dbd66228f38bba6b30502d25362
    • Renaming 'Sharpness' to 'sharpness'. · 57be8136
      Dmitry Kovalev authored
      Change-Id: I54513dc3b3321e0c0bb6b15ea5c34085ed80b4a4
    • Add a C fallback for get_msb() and change inline to INLINE. · ce7ff3b6
      Alex Converse authored
      For systems without __builtin_clz() or _BitScanReverse(). Taken from libwebp.
      Change-Id: Iead257efc1772c466c79e1dc0356ed571d38d43e
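A portable fallback of this kind can be sketched as follows. This is an illustrative version, not necessarily the code added by the commit: the name `get_msb_fallback` is hypothetical, and the sketch assumes a 32-bit `unsigned int` and a non-zero argument. It finds the index of the highest set bit by a binary search over shift widths.

```c
#include <assert.h>

/* Hypothetical portable fallback: returns the zero-based index of the most
 * significant set bit of n. Assumes n != 0 and a 32-bit unsigned int. */
static int get_msb_fallback(unsigned int n) {
  int log = 0;
  unsigned int value = n;
  int i;

  assert(n != 0);
  /* Try shifts of 16, 8, 4, 2 and 1 bits; keep each shift that still
   * leaves a non-zero value and accumulate it into the result. */
  for (i = 4; i >= 0; --i) {
    const int shift = 1 << i;
    const unsigned int x = value >> shift;
    if (x != 0) {
      value = x;
      log += shift;
    }
  }
  return log;
}
```

Where a compiler intrinsic is available, it would normally be preferred, with a loop like this used only when neither intrinsic exists.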
    • Add initial intra frame NEON optimization. 1~2% gain. · 691111aa
      hkuang authored
      More intra optimizations will be added.
      Change-Id: I33ae8d93f6002bf7b64cc2669602d9e6bfa5a6e8
    • AVX2 Variance Optimization · 357b6536
      levytamar82 authored
      Optimizing the variance functions vp9_variance16x16, vp9_variance32x32,
      vp9_variance64x64, vp9_variance32x16, vp9_variance64x32, and
      vp9_mse16x16 by migrating them to AVX2.
      Some of the functions were optimized by processing 32 elements instead of 16.
      Some of the functions were optimized by processing 2 loop strides of 16
      elements in a single 256-bit register.
      This optimization gives a 2.4% - 2.7% user-level performance gain
      and a 42% function-level gain.
      Change-Id: I265ae08a2b0196057a224a86450153ef3aebd85d
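For orientation, the quantity such variance functions compute can be written as a plain scalar loop. The sketch below assumes the common block-matching formulation (sum of squared differences minus the squared sum of differences divided by the pixel count); the name `block_variance_ref` and its signature are hypothetical and are not taken from the commit. A SIMD version produces the same result by accumulating many pixels per instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar reference: returns SSE - sum^2 / (w * h) for the
 * difference between two w x h blocks, and stores the SSE in *sse. */
static unsigned int block_variance_ref(const uint8_t *src, int src_stride,
                                       const uint8_t *ref, int ref_stride,
                                       int w, int h, unsigned int *sse) {
  int64_t sum = 0;   /* signed sum of pixel differences */
  uint64_t sq = 0;   /* sum of squared pixel differences */
  int r, c;

  assert(w > 0 && h > 0);
  for (r = 0; r < h; ++r) {
    for (c = 0; c < w; ++c) {
      const int diff = src[r * src_stride + c] - ref[r * ref_stride + c];
      sum += diff;
      sq += (uint64_t)(diff * diff);
    }
  }
  *sse = (unsigned int)sq;
  return (unsigned int)(sq - (uint64_t)((sum * sum) / (w * h)));
}
```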
    • Replace RD modeling with a fixed point approximation. · f2ca665f
      Alex Converse authored
      Change-Id: I44eb44eb3f36c05d916ef140ef42cc84f72f99ec
  12. 03 Jan, 2014 4 commits
    • Tune IDCT8_1D macro function interface · 3e0c62b5
      Jingning Han authored
      This commit adds input/output ports for the IDCT8_1D macro function to
      provide more flexibility in variable use. This allows several
      buffer swap operations to be skipped.
      Change-Id: I21f3450509537322293043b3281bfd3949868677
    • Adding RefBuffer struct. · ba41e9d4
      Dmitry Kovalev authored
      Adding RefBuffer to simplify reference buffer management. The struct has a
      pointer to image data and scale factors relative to the current frame.
      Change-Id: If38eb1491ff687cc11428aee339f3e052e2c5d9e
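A struct of the shape described above might look roughly like the sketch below. Every name here (`image_sketch`, `scale_sketch`, `ref_buffer_sketch`, `SKETCH_SCALE_SHIFT`) and the 14-bit fixed-point representation of the scale factors are assumptions made for illustration; the commit message only states that the struct holds a pointer to image data and scale factors relative to the current frame.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SCALE_SHIFT 14 /* assumed fixed-point precision */

/* Hypothetical image descriptor. */
typedef struct {
  const uint8_t *data;
  int width;
  int height;
  int stride;
} image_sketch;

/* Hypothetical scale factors: reference size / current size, fixed point. */
typedef struct {
  int x_scale_fp;
  int y_scale_fp;
} scale_sketch;

/* Hypothetical reference buffer: image pointer plus its scale factors. */
typedef struct {
  const image_sketch *img;
  scale_sketch sf;
} ref_buffer_sketch;

/* Bind an image to a reference slot and derive the scale factors relative
 * to a current frame of cur_w x cur_h pixels. */
static void ref_buffer_setup(ref_buffer_sketch *ref, const image_sketch *img,
                             int cur_w, int cur_h) {
  assert(cur_w > 0 && cur_h > 0);
  ref->img = img;
  ref->sf.x_scale_fp = (img->width << SKETCH_SCALE_SHIFT) / cur_w;
  ref->sf.y_scale_fp = (img->height << SKETCH_SCALE_SHIFT) / cur_h;
}

/* Map a non-negative coordinate in the current frame to the reference. */
static int ref_scale_coord(int scale_fp, int v) {
  return (int)(((int64_t)v * scale_fp) >> SKETCH_SCALE_SHIFT);
}
```

Keeping the image pointer and the scale factors together means code that handles a reference frame only has to pass one object around.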
    • Reduce num of buffer swap calls in idct8_1d_sse2 · 0b1a2713
      Jingning Han authored
      This commit merges the initial buffer swap operations in idct8_1d_sse2
      into the array transpose step, hence reducing the number of instructions.
      Change-Id: I219f6f50813390d2ec3ee37eecf2a4a2b44ae479
    • Rework idct8x8_10 SSE2 implementation · 1bb11781
      Jingning Han authored
      This commit optimizes the SSE2 implementation of idct8x8_10. It exploits
      the fact that only the top-left 4x4 block contains non-zero coefficients,
      and hence reduces the instructions needed.
      The runtime of idct8x8_10_sse2 goes down from 216 to 198 CPU cycles,
      estimated by averaging over 100000 runs. For pedestrian_area_1080p 300
      frames coded at 4000kbps, the average decoding speed goes up from
      79.3 fps to 79.7 fps.
      Change-Id: I6d277bbaa3ec9e1562667906975bae06904cb180
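Why a top-left 4x4 restriction saves work can be seen in a generic separable 2-D transform. The sketch below is not the SSE2 code from this commit and does not use the real DCT constants: `transform_1d`, `transform_2d`, and the integer basis passed in are hypothetical stand-ins. It only shows that when all non-zero inputs sit in the first `nz` rows and columns, the row pass can skip the remaining rows and every 1-D pass can skip the remaining inputs, with an identical result.

```c
#include <assert.h>
#include <string.h>

#define N 8

/* 1-D pass: out[i] = sum over k < nz of basis[k][i] * in[k].
 * Inputs at index nz and above are assumed to be zero and are skipped. */
static void transform_1d(int basis[N][N], const int *in, int nz, int *out) {
  int i, k;
  for (i = 0; i < N; ++i) {
    int acc = 0;
    for (k = 0; k < nz; ++k) acc += basis[k][i] * in[k];
    out[i] = acc;
  }
}

/* 2-D pass: rows first, then columns. nz is the number of leading rows and
 * columns that may hold non-zero coefficients (nz == N is the full case). */
static void transform_2d(int basis[N][N], int coeffs[N][N], int nz,
                         int out[N][N]) {
  int tmp[N][N];
  int col_in[N], col_out[N];
  int r, c;

  assert(nz >= 0 && nz <= N);
  memset(tmp, 0, sizeof(tmp));
  /* Row pass: rows nz..N-1 are all zero, so their outputs stay zero. */
  for (r = 0; r < nz; ++r) transform_1d(basis, coeffs[r], nz, tmp[r]);
  /* Column pass: every column is needed, but only nz inputs are non-zero. */
  for (c = 0; c < N; ++c) {
    for (r = 0; r < nz; ++r) col_in[r] = tmp[r][c];
    transform_1d(basis, col_in, nz, col_out);
    for (r = 0; r < N; ++r) out[r][c] = col_out[r];
  }
}
```

Calling `transform_2d` with `nz = 4` on a block whose non-zero coefficients are confined to the top-left 4x4 gives the same output as `nz = N` while doing half the row passes and half the multiplications in each 1-D pass.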
  13. 26 Dec, 2013 1 commit
  14. 20 Dec, 2013 3 commits
  15. 19 Dec, 2013 2 commits
    • Call set_scaled_offsets() just before scale_mv() call. · c872d2be
      Dmitry Kovalev authored
      Before mv scaling, x_offset_q4/y_offset_q4 must be calculated
      by calling set_scaled_offsets(). The offset configuration can no longer be
      missed because it now happens just before scale_mv().
      Change-Id: I7dd1a85b85811a6cc67c46c9b01e6ccbbb06ce3a
    • Code clean up · 09faf559
      Yunqing Wang authored
      Removed unused filter coefficients.
      Change-Id: Ib395a51305e23ff41ab69c1808d56946d25961cd
  16. 18 Dec, 2013 3 commits
  17. 17 Dec, 2013 2 commits