1. 24 Jan, 2014 1 commit
  2. 22 Jan, 2014 2 commits
  3. 21 Jan, 2014 1 commit
    • Separate the border size for encoder and decoder. · 437004c7
      hkuang authored
      The encoder's border is still 160, while the decoder's border is now 32.
      With on-demand border extension into a separate border buffer,
      the decoder's border no longer needs to be 160.
      
      Change-Id: I93d5aaff15a33a2213e9761eaa37c5f2870747db
      437004c7
  4. 18 Jan, 2014 2 commits
    • Deprecate best_mv from encoder · b461c088
      Jingning Han authored
      This commit deprecates the use of best_mv from encoding and bit-stream
      writing stages. It hence removes the definition from MACROBLOCKD.
      
      Change-Id: I8e5302775a2aa4a18900726df407bff881f2dfb1
      b461c088
    • Use a temp buffer for reconstruction when · 7459fee8
      hkuang authored
      reference buffer is out of border.
      
      Change-Id: Ic7ad136e54a4d68abe0fd4345146a86b0ba824e1
      7459fee8
  5. 17 Jan, 2014 3 commits
  6. 15 Jan, 2014 3 commits
  7. 14 Jan, 2014 1 commit
  8. 13 Jan, 2014 1 commit
  9. 10 Jan, 2014 2 commits
  10. 09 Jan, 2014 2 commits
    • Optimize inv 16x16 DCT with 10 non-zero coeffs - P2 · af31b27a
      Jingning Han authored
      This commit further optimizes SSE2 operations in the second 1-D
      inverse 16x16 DCT, with (<10) non-zero coefficients. The average
      runtime of this module goes down from 779 cycles to 725 cycles.
      
      Change-Id: Iac31b123640d9b1e8f906e770702936b71f0ba7f
      af31b27a
    • SSSE3 convolution optimization · 511d218c
      levytamar82 authored
      Optimizing all SSSE3 assembly for convolution:
      1. vp9_filter_block1d4_h8_sse2
      2. vp9_filter_block1d8_h8_sse2
      3. vp9_filter_block1d16_h8_sse2
      4. vp9_filter_block1d4_v8_sse2
      5. vp9_filter_block1d8_v8_sse2
      6. vp9_filter_block1d16_v8_sse2
      The optimizations include:
      - Processing 2x8 elements in one 128-bit register instead of
      8 elements in one 128-bit register.
      - Removing unnecessary loads.
      This optimization gives a 2.4% user-level gain for 480p input
      and a 1.6% user-level gain for 720p.
      This optimization is done only for 64-bit.
      
      Change-Id: Icb586dc0c938b56699864fcee6c52fd43b36b969
      511d218c
  11. 08 Jan, 2014 7 commits
    • Optimize inv 16x16 DCT with 10 non-zero coeffs - P1 · ba6ab46c
      Jingning Han authored
      This commit is the first patch optimizing the SSE2 implementation of inverse
      16x16 DCT with <10 non-zero coefficients. It focuses on the first 1-D (row)
      transformation. It exploits the fact that only the top-left 4x4 block contains
      non-zero coefficients in a 2-D inverse 16x16 DCT with <10 coefficients.
      
      The average runtime of the idct16x16_10 unit is reduced from
      883 cycles -> 779 cycles (12% faster).
      
      For pedestrian_area_1080p 300 frames at 4000 kbps, the speed 2 runtime goes
      down from 310651 ms -> 305910 ms. The decoding speed goes up from
      80.37 fps -> 80.87 fps.
      
      Change-Id: Ic6f3ac5a637a76c07ba73ddaafe318a699fea645
      ba6ab46c
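The row-pass shortcut described in this commit can be illustrated with a minimal C sketch. It is not the SSE2 code from the patch: `toy_transform_1d` is a hypothetical stand-in (a running sum) for the real 1-D inverse DCT, and all names here are assumptions. The only property it relies on is linearity: a linear 1-D transform maps an all-zero row to an all-zero row, so rows known to be zero can be skipped.

```c
#include <string.h>

#define DIM 16 /* transform size: 16x16 */

/* Hypothetical stand-in for a 1-D inverse transform. Any linear map
 * works for this sketch; a running sum keeps it short. */
static void toy_transform_1d(const int *in, int *out) {
  int acc = 0;
  int i;
  for (i = 0; i < DIM; ++i) {
    acc += in[i];
    out[i] = acc;
  }
}

/* Column pass shared by both variants: transforms each column of tmp. */
static void column_pass(const int *tmp, int *out) {
  int col_in[DIM], col_out[DIM];
  int r, c;
  for (c = 0; c < DIM; ++c) {
    for (r = 0; r < DIM; ++r) col_in[r] = tmp[r * DIM + c];
    toy_transform_1d(col_in, col_out);
    for (r = 0; r < DIM; ++r) out[r * DIM + c] = col_out[r];
  }
}

/* Reference: row pass over all 16 rows, then the column pass. */
static void transform_2d_full(const int *in, int *out) {
  int tmp[DIM * DIM];
  int r;
  for (r = 0; r < DIM; ++r) toy_transform_1d(in + r * DIM, tmp + r * DIM);
  column_pass(tmp, out);
}

/* Shortcut: only the first nz_rows rows can hold non-zero input, so the
 * remaining rows of the intermediate buffer are simply left at zero. */
static void transform_2d_rows_limited(const int *in, int *out, int nz_rows) {
  int tmp[DIM * DIM];
  int r;
  memset(tmp, 0, sizeof(tmp));
  for (r = 0; r < nz_rows; ++r) toy_transform_1d(in + r * DIM, tmp + r * DIM);
  column_pass(tmp, out);
}
```

When the input is non-zero only in its top-left 4x4 block, `transform_2d_rows_limited(in, out, 4)` produces the same output as `transform_2d_full(in, out)` while running a quarter of the row transforms.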
    • Renaming 'Mode' to 'mode'. · 962c8b24
      Dmitry Kovalev authored
      Change-Id: I6cdd670d66288dbd66228f38bba6b30502d25362
      962c8b24
    • Renaming 'Sharpness' to 'sharpness'. · 57be8136
      Dmitry Kovalev authored
      Change-Id: I54513dc3b3321e0c0bb6b15ea5c34085ed80b4a4
      57be8136
    • Add a C fallback for get_msb() and change inline to INLINE. · ce7ff3b6
      Alex Converse authored
      For systems without __builtin_clz() or _BitScanReverse(); taken from libwebp.
      
      Change-Id: Iead257efc1772c466c79e1dc0356ed571d38d43e
      ce7ff3b6
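A portable fallback of the kind this commit describes might look like the following C sketch. The function name `get_msb_fallback` is an assumption, and this is not the code from the patch or from libwebp; it only shows one common way to find the most significant set bit without compiler intrinsics.

```c
#include <stdint.h>

/* Returns the index (0..31) of the most significant set bit of n.
 * Precondition: n != 0. Hypothetical name, for illustration only. */
static int get_msb_fallback(uint32_t n) {
  int log = 0;
  int shift;
  /* Binary search over bit positions: check whether anything is set
   * above the low 16, 8, 4, 2, then 1 bits, narrowing each time. */
  for (shift = 16; shift > 0; shift >>= 1) {
    const uint32_t x = n >> shift;
    if (x != 0) {
      n = x;
      log += shift;
    }
  }
  return log;
}
```

The loop runs exactly five iterations regardless of the input, which keeps the fallback branch-light on targets without a count-leading-zeros instruction.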
    • Add initial intra frame neon optimization. 1~2% gain. · 691111aa
      hkuang authored
      More intra optimizations will be added.
      
      Change-Id: I33ae8d93f6002bf7b64cc2669602d9e6bfa5a6e8
      691111aa
    • AVX2 Variance Optimization · 357b6536
      levytamar82 authored
      Optimizing the variance functions vp9_variance16x16, vp9_variance32x32,
      vp9_variance64x64, vp9_variance32x16, vp9_variance64x32, and
      vp9_mse16x16 by migrating them to AVX2.
      Some of the functions were optimized by processing 32 elements instead of 16.
      Others were optimized by processing 2 loop strides of 16
      elements in a single 256-bit register.
      This optimization gives a 2.4% - 2.7% user-level performance gain
      and a 42% function-level gain.
      
      Change-Id: I265ae08a2b0196057a224a86450153ef3aebd85d
      357b6536
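For orientation, a scalar reference for the kind of block variance these functions compute could be sketched in C as follows. The name `block_variance` and its signature are assumptions, not the libvpx API; the sketch accumulates the sum and the sum of squares of pixel differences and returns SSE minus the squared sum divided by the pixel count.

```c
#include <stdint.h>

/* Scalar sketch of a w x h block variance metric (hypothetical name).
 * Writes the sum of squared differences to *sse and returns
 * sse - (sum * sum) / (w * h). */
static unsigned int block_variance(const uint8_t *src, int src_stride,
                                   const uint8_t *ref, int ref_stride,
                                   int w, int h, unsigned int *sse) {
  int64_t sum = 0; /* signed sum of differences */
  uint64_t sq = 0; /* sum of squared differences */
  int r, c;
  for (r = 0; r < h; ++r) {
    for (c = 0; c < w; ++c) {
      const int diff = src[c] - ref[c];
      sum += diff;
      sq += (uint64_t)(diff * diff);
    }
    src += src_stride;
    ref += ref_stride;
  }
  *sse = (unsigned int)sq;
  return (unsigned int)(sq - (uint64_t)((sum * sum) / (w * h)));
}
```

A SIMD version vectorizes the inner loop; with 256-bit registers, 32 8-bit pixels fit in one load, which is the widening the commit message refers to.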
    • Replace RD modeling with a fixed point approximation. · f2ca665f
      Alex Converse authored
      Change-Id: I44eb44eb3f36c05d916ef140ef42cc84f72f99ec
      f2ca665f
  12. 03 Jan, 2014 4 commits
    • Tune IDCT8_1D macro function interface · 3e0c62b5
      Jingning Han authored
      This commit adds input/output ports for the IDCT8_1D macro function to
      provide more flexibility in variable use. This allows several
      buffer swap operations to be skipped.
      
      Change-Id: I21f3450509537322293043b3281bfd3949868677
      3e0c62b5
    • Adding RefBuffer struct. · ba41e9d4
      Dmitry Kovalev authored
      Adding RefBuffer to simplify reference buffer management. The struct has a
      pointer to image data and scale factors relative to the current frame.
      
      Change-Id: If38eb1491ff687cc11428aee339f3e052e2c5d9e
      ba41e9d4
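A struct of the shape described above might look like this C sketch. Every type, field name, and the Q14 fixed-point scale are assumptions made for illustration; they are not copied from the libvpx headers.

```c
#include <stdint.h>

#define SKETCH_SCALE_SHIFT 14 /* assumed: scale factors stored in Q14 */

/* Hypothetical, simplified image descriptor. */
typedef struct {
  uint8_t *data;
  int width;
  int height;
  int stride;
} ImageSketch;

/* Hypothetical scale factors of a reference relative to the current frame. */
typedef struct {
  int x_scale_fp;
  int y_scale_fp;
} ScaleFactorsSketch;

/* One reference: a pointer to the image data plus its scale factors. */
typedef struct {
  ImageSketch *buf;
  ScaleFactorsSketch sf;
} RefBufferSketch;

/* Binds an image to a reference slot and derives the scale factors
 * from the reference size and the current frame size. */
static void ref_buffer_setup(RefBufferSketch *ref, ImageSketch *img,
                             int cur_w, int cur_h) {
  ref->buf = img;
  ref->sf.x_scale_fp = (img->width << SKETCH_SCALE_SHIFT) / cur_w;
  ref->sf.y_scale_fp = (img->height << SKETCH_SCALE_SHIFT) / cur_h;
}
```

Keeping the scale factors next to the image pointer means code that consumes a reference never has to look them up in a separate table.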
    • Reduce num of buffer swap calls in idct8_1d_sse2 · 0b1a2713
      Jingning Han authored
      This commit merges the initial buffer swap operations in idct8_1d_sse2
      into the array transpose step, hence reducing number of instructions
      therein.
      
      Change-Id: I219f6f50813390d2ec3ee37eecf2a4a2b44ae479
      0b1a2713
    • Rework idct8x8_10 SSE2 implementation · 1bb11781
      Jingning Han authored
      This commit optimizes the SSE2 implementation of idct8x8_10. It exploits
      the fact that only top-left 4x4 block contains non-zero coefficients,
      and hence reduces the instructions needed.
      
      The runtime of idct8x8_10_sse2 goes down from 216 to 198 CPU cycles,
      estimated by averaging over 100000 runs. For pedestrian_area_1080p 300
      frames coded at 4000kbps, the average decoding speed goes up from
      79.3 fps to 79.7 fps.
      
      Change-Id: I6d277bbaa3ec9e1562667906975bae06904cb180
      1bb11781
  13. 26 Dec, 2013 1 commit
  14. 20 Dec, 2013 3 commits
  15. 19 Dec, 2013 2 commits
    • Call set_scaled_offsets() just before scale_mv() call. · c872d2be
      Dmitry Kovalev authored
      Before mv scaling, x_offset_q4/y_offset_q4 must be calculated
      by calling set_scaled_offsets(). Now the offset configuration cannot be
      missed because it happens just before scale_mv().
      
      Change-Id: I7dd1a85b85811a6cc67c46c9b01e6ccbbb06ce3a
      c872d2be
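The ordering constraint in this commit can be sketched in C by folding the offset setup into the scaling helper, so a caller cannot scale a motion vector against stale offsets. All names, the Q14 scale, and the offset arithmetic below are assumptions for illustration, not the libvpx implementation.

```c
#include <stdint.h>

#define SKETCH_SCALE_SHIFT 14 /* assumed: scale factors stored in Q14 */
#define SKETCH_SUBPEL_BITS 4  /* q4 means 1/16-pel units */
#define SKETCH_SUBPEL_MASK ((1 << SKETCH_SUBPEL_BITS) - 1)

typedef struct {
  int row;
  int col;
} MvSketch;

typedef struct {
  int x_scale_fp;
  int y_scale_fp;
  int x_offset_q4;
  int y_offset_q4;
} ScaleSketch;

/* Multiplies by a Q14 scale factor. The right shift of a negative
 * product assumes an arithmetic shift, as on common compilers. */
static int scale_value(int v, int scale_fp) {
  return (int)(((int64_t)v * scale_fp) >> SKETCH_SCALE_SHIFT);
}

/* Computes the sub-pixel offsets of block position (row, col). */
static void set_offsets_sketch(ScaleSketch *sf, int row, int col) {
  sf->x_offset_q4 =
      scale_value(col << SKETCH_SUBPEL_BITS, sf->x_scale_fp) & SKETCH_SUBPEL_MASK;
  sf->y_offset_q4 =
      scale_value(row << SKETCH_SUBPEL_BITS, sf->y_scale_fp) & SKETCH_SUBPEL_MASK;
}

/* Sets the offsets immediately before scaling, so the two steps
 * cannot drift apart at call sites. */
static MvSketch scale_mv_at(ScaleSketch *sf, const MvSketch *mv,
                            int row, int col) {
  MvSketch out;
  set_offsets_sketch(sf, row, col);
  out.row = scale_value(mv->row, sf->y_scale_fp) + sf->y_offset_q4;
  out.col = scale_value(mv->col, sf->x_scale_fp) + sf->x_offset_q4;
  return out;
}
```

With a unit scale (`1 << 14`) the offsets are zero and the motion vector comes back unchanged; with a non-integer scale such as 1.5 the fractional part of the scaled position shows up in the q4 offsets.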
    • Code clean up · 09faf559
      Yunqing Wang authored
      Removed unused filter coefficients.
      
      Change-Id: Ib395a51305e23ff41ab69c1808d56946d25961cd
      09faf559
  16. 18 Dec, 2013 3 commits
  17. 17 Dec, 2013 2 commits