  1. 08 May, 2014 2 commits
    • Change eob threshold for partial inverse 8x8 2D-DCT to 12 · 41a350a8
      Jingning Han authored
      In the scanning order, the first 12 coefficients of the 8x8 2D-DCT
      sit in the top-left 4x4 block. Hence the partial inverse 8x8
      2D-DCT can handle cases with eob up to 12.
      
      The overall runtime of the inverse 8x8 2D-DCT unit is reduced from
      166 cycles (using SSE2) to 150 cycles (using SSSE3).
      
      Change-Id: I4514f9748042809ac84df4c14382c00f313f1cd2
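
The threshold change can be pictured as a dispatch on eob. The following is a minimal sketch, not the libvpx source: the enum, the constant, and the function name are hypothetical, and it assumes that eob counts coefficients in scan order, so that eob <= 12 means every non-zero coefficient lies in the top-left 4x4 block.

```c
/* Hypothetical labels for the three inverse-transform paths. */
typedef enum {
  IDCT8X8_DC_ONLY, /* only the DC coefficient is non-zero */
  IDCT8X8_PARTIAL, /* non-zero coefficients confined to the top-left 4x4 */
  IDCT8X8_FULL     /* any of the 64 coefficients may be non-zero */
} idct8x8_path;

/* Assumed threshold, taken from the commit title. */
enum { PARTIAL_EOB_THRESHOLD = 12 };

/* Pick the cheapest inverse 8x8 transform that is still exact for this eob. */
static idct8x8_path select_idct8x8_path(int eob) {
  if (eob == 1) return IDCT8X8_DC_ONLY;
  if (eob <= PARTIAL_EOB_THRESHOLD) return IDCT8X8_PARTIAL;
  return IDCT8X8_FULL;
}
```

Under these assumptions, raising the threshold routes more blocks to the partial path, which is where the reported cycle savings would come from.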
    • SSSE3 8x8 inverse 2D-DCT with first 10 coeffs non-zero · 9e7b09bc
      Jingning Han authored
      This commit enables the SSSE3 assembly implementation of the 8x8
      inverse 2D-DCT for the case where only the first 10 coefficients
      are non-zero. The average runtime for this unit goes down from
      198 cycles to 129 cycles (34.8% faster).
      
      Change-Id: Ie7fa4386f6d3a2fe0d47a2eb26fc2a6bbc592ac7
  2. 06 May, 2014 1 commit
  3. 05 May, 2014 2 commits
    • Add an MMX fwht4x4 · 89fbf3de
      Alex Converse authored
      7% faster when encoding a desktop clip losslessly at RT speed 4.
      
      Change-Id: I41627f5b737752616b6512bb91a36ec45995bf64
    • SSSE3 implementation of full inverse 8x8 2D-DCT · 52ae97b6
      Jingning Han authored
      This commit enables the SSSE3 version of the full inverse 8x8
      2D-DCT and reconstruction. It brings the runtime of
      vp9_idct8x8_64_add down from 256 cycles (SSE2) to 246 cycles.
      
      Change-Id: I0600feac894d6a443a3c9d18daf34156d4e225c3
  4. 29 Apr, 2014 2 commits
    • Enable SSSE3 implementation of 8x8 forward 2D-DCT · 1eaa3a76
      Jingning Han authored
      SSSE3 assembly implementation of the 8x8 forward 2D-DCT. The current
      version is turned on only for x86_64. The average unit runtime
      goes from 157 cycles down to 136 cycles, i.e., about 12.8% faster.
      This translates into about 1.5% speed-up for pedestrian_area 1080p
      at speed 2.
      
      Change-Id: I0f12435857e9425ed7ce12541344dfa16837f4f4
    • Adding search_site_config struct. · aa464eca
      Dmitry Kovalev authored
      Change-Id: I2ad333553e673dbabcdc0f0366aea311e90849bf
  5. 25 Apr, 2014 1 commit
  6. 24 Apr, 2014 1 commit
  7. 11 Apr, 2014 1 commit
  8. 09 Apr, 2014 1 commit
    • Use source frame difference to make partition decision · 4e66293f
      Yunqing Wang authored
      Calculate the variance of the difference between the last source
      frame and the current source frame. The variance is calculated at
      the 16x16 block level. The variances are compared to several
      thresholds to decide the final partition sizes.
      
      An adaptive strategy chooses between SOURCE_VAR_BASED_PARTITION
      and FIXED_PARTITION based on the motion in the video. The
      switching test is done once every search_type_check_frequency
      frames.
      
      The selection of source_var_thresh needs further investigation.
      
      The Borg test on the RTC set showed a 0.424% overall PSNR gain
      and a 0.357% SSIM gain. For clips with a large enough static
      area, the encoding speedup is around 2% to 15%.
      
      Change-Id: Id7d268f1d8cbca7fb8026aa4a53b3c77459dc156
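
The block-level decision described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the encoder's code: the function names, the partition labels, and the two-threshold scheme are hypothetical, the variance is left unnormalized (sum of squared differences minus the squared sum divided by 256), and mapping low variance to larger partitions is an assumption about how the thresholds are used.

```c
#include <stdint.h>

/* Hypothetical partition labels; not libvpx's BLOCK_SIZE enum. */
typedef enum { PART_64X64, PART_32X32, PART_16X16 } partition_choice;

/* Unnormalized variance of (cur - last) over one 16x16 block:
 * sse - sum^2 / 256, where 256 is the number of pixels. */
static uint32_t diff_variance_16x16(const uint8_t *cur, int cur_stride,
                                    const uint8_t *last, int last_stride) {
  int64_t sum = 0;
  uint64_t sse = 0;
  for (int r = 0; r < 16; ++r) {
    for (int c = 0; c < 16; ++c) {
      const int d = cur[r * cur_stride + c] - last[r * last_stride + c];
      sum += d;
      sse += (uint64_t)(d * d);
    }
  }
  return (uint32_t)(sse - (uint64_t)((sum * sum) >> 8));
}

/* Assumed rule: a nearly static block (small variance) keeps a large
 * partition, a changing block is split further. Thresholds are inputs
 * because the commit leaves their selection open. */
static partition_choice choose_partition(uint32_t var, uint32_t low_thresh,
                                         uint32_t high_thresh) {
  if (var < low_thresh) return PART_64X64;
  if (var < high_thresh) return PART_32X32;
  return PART_16X16;
}
```

A uniform brightness shift between frames gives zero variance in this sketch, because the mean difference is subtracted out before the result is compared against the thresholds.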
  9. 21 Mar, 2014 1 commit
    • AVX2 SAD Optimization: · 0fa8b668
      levytamar82 authored
      Two functions were optimized for AVX2 by using the full 256-bit
      registers to handle 32 elements in parallel instead of only 16:
      1. vp9_sad32x32x4d
      2. vp9_sad64x64x4d
      
      The function level gain is 66% and the user level gain is ~1%.
      
      Change-Id: I4efbb3bc7d8bc03b64b6c98f5cd5c4a9dd3212cb
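
For reference, the quantity these functions compute can be written as a scalar sketch. It assumes that the x4d suffix means the sum of absolute differences of one source block against four candidate reference blocks; the function names and the generic width/height parameters are hypothetical and do not match the libvpx signatures.

```c
#include <stdint.h>
#include <stdlib.h>

/* Sum of absolute differences between one source block and one reference. */
static unsigned int sad_block(const uint8_t *src, int src_stride,
                              const uint8_t *ref, int ref_stride,
                              int width, int height) {
  unsigned int sad = 0;
  for (int r = 0; r < height; ++r) {
    for (int c = 0; c < width; ++c) {
      sad += (unsigned int)abs(src[r * src_stride + c] -
                               ref[r * ref_stride + c]);
    }
  }
  return sad;
}

/* SAD of the same source block against four references in one call. */
static void sad_x4d(const uint8_t *src, int src_stride,
                    const uint8_t *const ref[4], int ref_stride,
                    int width, int height, unsigned int sad_out[4]) {
  for (int i = 0; i < 4; ++i) {
    sad_out[i] = sad_block(src, src_stride, ref[i], ref_stride, width, height);
  }
}
```

A SIMD version replaces the inner loop with wide loads and a packed absolute-difference instruction, so a 256-bit register covers a full 32-pixel row at once where a 128-bit register covers 16.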
  10. 03 Mar, 2014 1 commit
    • build: convert rtcd.sh to perl · 805078a1
      James Zern authored
      significantly speeds up file generation.
      
      the goal of this change is to convert rtcd.sh to perl as directly as
      possible to allow for simple comparison. future changes can make it more
      perl-like.
      
      ---
      Linux
          [CREATE] vpx_scale_rtcd.h
      real    0m0.485s ->    0m0.022s
          [CREATE] vp8_rtcd.h
      real    0m4.619s ->    0m0.060s
          [CREATE] vp9_rtcd.h
      real    0m10.102s ->    0m0.087s
      
      Windows
          [CREATE] vpx_scale_rtcd.h
      real    0m8.360s ->    0m0.080s
          [CREATE] vp8_rtcd.h
      real    1m8.083s ->    0m0.160s
          [CREATE] vp9_rtcd.h
      real    2m6.489s ->    0m0.233s
      
      Change-Id: Idfb71188206c91237d6a3c3a81dfe00d103f11ee