  1. 12 Jun, 2014 1 commit
    • Fast computation path for forward transform and quantization · ccba289f
      Jingning Han authored
      This commit enables a fast computation path for the forward
      transform. It checks the SSE and variance of the prediction
      residuals to decide whether the quantized coefficients are all
      zero, DC only, or more. It then selects the corresponding coding
      path in the forward transform and quantization stage.
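
The decision described above can be sketched roughly as follows. This is only an illustration, not the code from the commit: the enum, the function name `select_tx_path`, and both thresholds are hypothetical, and the sketch assumes the usual convention that the variance equals the SSE minus the squared-mean (DC) term.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical labels for the three coding paths named in the commit. */
typedef enum { PATH_SKIP_ALL_ZERO, PATH_DC_ONLY, PATH_FULL } TxPath;

/*
 * Pick a path from the residual statistics of one block.
 * sse: sum of squared residuals; var: sse minus the squared-mean term.
 * ac_thresh and dc_thresh are made-up tuning parameters; a real encoder
 * would derive them from the quantizer step sizes.
 */
TxPath select_tx_path(uint64_t sse, uint64_t var,
                      uint64_t ac_thresh, uint64_t dc_thresh) {
  assert(var <= sse); /* the variance term never exceeds the raw energy */
  if (var < ac_thresh) {
    /* AC energy is negligible, so only the DC term can survive. */
    const uint64_t dc_energy = sse - var;
    return dc_energy < dc_thresh ? PATH_SKIP_ALL_ZERO : PATH_DC_ONLY;
  }
  return PATH_FULL;
}
```

A caller would compute `sse` and `var` once per block and then run only the transform and quantization work that the returned path requires.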
      
      It is currently enabled only in rtc coding mode; rd coding mode
      will follow.
      
      In speed -6, the runtime for pedestrian_area 1080p at 1000 kbps
      goes down from 14234 ms to 13704 ms, i.e., about 4% speed-up.
      Overall coding performance for rtc set is changed by -0.18%.
      
      Change-Id: I0452da1786d59bc8bcbe0a35fdae9f623d1d44e1
  2. 10 Jun, 2014 5 commits
    • vp9_rtcd: correct avx2 references · 9f3a0dbb
      James Zern authored
      s/"\$avx2_x86inc"/"avx2"/
      
      avx2 code is all intrinsics and as a result doesn't rely on x86inc.asm
      
      Change-Id: I76ad39474d8a00658f3e43131830ef0f4f34772a
    • vp9_sub_pixel_*variance*: disable avx2 variants · 520cb3f3
      James Zern authored
      tests failing under Win32/Win64
      
      + variance_test: add missing avx2 functions (partially disabled)
      
      Change-Id: I6abc0657ea076379ab9ca65c12678b9ea199849d
    • vp9_sad*x4d: disable avx2 variants · d3ff009d
      James Zern authored
      tests failing under Win32/Win64
      
      + sad_test: add missing avx2 functions (disabled)
      
      Change-Id: I8224fba2b270f6039ab1877d71e1e512f0081856
    • vp9_f(dct|ht): disable avx2 variants · dd9f5029
      James Zern authored
      tests failing under Win32/Win64
      
      + dct16x16_test: add missing avx2 functions (partially disabled)
      
      exercises the forward transforms
      no idct/iht implementations, so the c-code is used
      
      Change-Id: I04f64a457fa0828a00f32b5c9fe4f55294f21f61
    • convolve: disable avx2 variants · 5704578f
      James Zern authored
      tests failing under Win32/Win64
      
      Change-Id: I5d49d11911bcda3a832b14efe5500d22597bedcf
  3. 02 Jun, 2014 1 commit
  4. 01 Jun, 2014 1 commit
  5. 29 May, 2014 1 commit
  6. 28 May, 2014 1 commit
    • Enable SSSE3 inverse 2D-DCT with 10 non-zero coeffs · 6d21cbd2
      Jingning Han authored
      This commit enables the SSSE3 implementation of the inverse 2D-DCT
      for blocks where only the first 10 coefficients are non-zero. It
      reduces the runtime from 745 cycles (SSE2 version) to 538 cycles,
      i.e., a 27% speed-up.
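
A caller might choose between a partial and a full inverse transform along these lines. This is a hedged sketch: the enum and `choose_idct_path` are hypothetical names, and the separate DC-only branch is an assumption that this commit message does not mention.

```c
#include <assert.h>

/* Hypothetical labels for the inverse-transform variants. */
typedef enum { IDCT_DC_ONLY, IDCT_PARTIAL, IDCT_FULL } IdctKind;

/*
 * eob is the count of coefficients, in scan order, up to and including
 * the last non-zero one. Blocks with eob == 0 carry no residual and are
 * assumed never to reach this function.
 */
IdctKind choose_idct_path(int eob) {
  assert(eob >= 1);
  if (eob == 1) return IDCT_DC_ONLY;  /* only the DC coefficient is set */
  if (eob <= 10) return IDCT_PARTIAL; /* first 10 coefficients at most */
  return IDCT_FULL;
}
```

The partial variant can skip arithmetic on inputs that are known to be zero, which is where a speed-up of this kind would come from.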
      
      Change-Id: I18ba4128859b09c704a6ee361d69a86c09fe8dfe
  7. 27 May, 2014 1 commit
  8. 23 May, 2014 2 commits
    • Inverse 16x16 2D-DCT SSSE3 implementation · 48b08913
      Jingning Han authored
      This commit enables the SSSE3 implementation of full inverse 16x16
      2D-DCT. The unit runtime goes down from 1642 cycles to 1519 cycles,
      about 7% speed-up.
      
      Change-Id: I14d2fdf9da1fb4ed1e5db7ce24f77a1bfc8ea90d
    • Remove Wextra warnings from vp9_sad.c · 91655042
      Deb Mukherjee authored
      As a side-effect, the sad unit tests for VP8 and VP9
      had to be separated.
      
      Change-Id: I068cc2391eed51e9b140ea6aba78338c5fec8d71
  9. 20 May, 2014 1 commit
  10. 15 May, 2014 1 commit
  11. 14 May, 2014 2 commits
    • AVX2 To VP9 Block Error Optimization · 1fbab853
      levytamar82 authored
      vp9_block_error_sse2 handles only 16 bytes per register, but the
      function has to process 32 bytes at a time, so each 16-byte half
      is handled in a separate register. With AVX2, the 32 bytes fit in
      one register instead of the two needed by SSE2.
      The vp9_block_error function was improved by 85%; the user-level
      gain is 1.2%.
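
For orientation, here is a scalar sketch of the kind of computation being vectorized. It assumes that block error means the sum of squared differences between the original and the dequantized 16-bit coefficients, with the sum of squared original coefficients returned through `ssz`. The name `block_error_ref` and the chunking constant are hypothetical; the inner loop marks the 32 bytes that one 256-bit register would cover.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16 int16_t coefficients = 32 bytes = one 256-bit register's worth. */
#define LANES 16

int64_t block_error_ref(const int16_t *coeff, const int16_t *dqcoeff,
                        size_t n, int64_t *ssz) {
  int64_t error = 0;
  int64_t sqcoeff = 0;
  assert(n % LANES == 0);
  for (size_t i = 0; i < n; i += LANES) {
    /* One pass of this inner loop covers what a single AVX2 register
       holds; SSE2 needs two 16-byte registers for the same span. */
    for (size_t j = 0; j < LANES; ++j) {
      const int32_t c = coeff[i + j];
      const int32_t d = c - dqcoeff[i + j];
      error += (int64_t)d * d;
      sqcoeff += (int64_t)c * c;
    }
  }
  *ssz = sqcoeff;
  return error;
}
```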
      
      Change-Id: Ia8fffe60e61eff7432a5fbd538757894f6c319fd
    • Remove Wextra warnings from vp9_sad.c · 7ab9a958
      Deb Mukherjee authored
      As a side-effect, the max_sad check is removed from the
      C-implementation of VP8, for consistency with VP9, and to
      ensure that the SAD tests common to VP8/VP9 pass.
      That will make the VP8 C implementation of sad a little slower,
      but given that it is rarely used in practice, the impact will be
      minimal.
      
      Change-Id: I7f43089fdea047fbf1862e40c21e4715c30f07ca
  12. 12 May, 2014 1 commit
  13. 08 May, 2014 3 commits
    • Add an x86inc MMX fwht4x4. · b5422fab
      Alex Converse authored
      Change-Id: Ib0a73d4863478f9b8a00976379d25d2f6ebbb197
    • Change eob threshold for partial inverse 8x8 2D-DCT to 12 · 41a350a8
      Jingning Han authored
      The scanning order places the first 12 coefficients of the 8x8
      2D-DCT in the top-left 4x4 block. Hence the partial inverse 8x8
      2D-DCT can handle cases with eob below 12.
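
The threshold follows from the scan table, and the check can be sketched as below. The helper `leading_in_top_left_4x4` is hypothetical and takes any raster-indexed scan order for an 8x8 block; it does not embed VP9's own scan table.

```c
#include <assert.h>

/*
 * Count how many leading entries of an 8x8 scan order fall inside the
 * top-left 4x4 corner. scan[i] is a raster index (row * 8 + col).
 */
int leading_in_top_left_4x4(const int *scan, int len) {
  int n = 0;
  assert(len >= 0 && len <= 64);
  while (n < len) {
    const int row = scan[n] / 8;
    const int col = scan[n] % 8;
    if (row >= 4 || col >= 4) break; /* first position outside the corner */
    ++n;
  }
  return n;
}
```

As a point of comparison, the classic JPEG-style zig-zag order (0, 1, 8, 16, 9, 2, 3, 10, 17, 24, 32, ...) leaves the corner at its eleventh entry, so the function returns 10 for it. The commit reports 12 for the scan order used here.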
      
      The overall runtime of the inverse 8x8 2D-DCT unit is reduced from
      166 cycles (using SSE2) to 150 cycles (using SSSE3).
      
      Change-Id: I4514f9748042809ac84df4c14382c00f313f1cd2
    • SSSE3 8x8 inverse 2D-DCT with first 10 coeffs non-zero · 9e7b09bc
      Jingning Han authored
      This commit enables ssse3 assembly implementation of the 8x8
      inverse 2D-DCT with only first 10 coefficients non-zero. The
      average runtime for this unit goes down from 198 cycles to 129
      cycles (34.8% faster).
      
      Change-Id: Ie7fa4386f6d3a2fe0d47a2eb26fc2a6bbc592ac7
  14. 07 May, 2014 1 commit
    • Revert "Add an MMX fwht4x4" · 33b1c457
      Paul Wilkins authored
      Includes changes that are not compatible with VS Windows builds.
      Amongst other things, stdint.h is not supported in VS.
      
      This reverts commit 89fbf3de.
      
      Change-Id: Ifa86d7df250578d1ada9b539c9ff12ed0c523cdd
  15. 06 May, 2014 1 commit
  16. 05 May, 2014 2 commits
    • Add an MMX fwht4x4 · 89fbf3de
      Alex Converse authored
      7% faster encoding a desktop lossless at RT speed 4.
      
      Change-Id: I41627f5b737752616b6512bb91a36ec45995bf64
    • SSSE3 implementation of full inverse 8x8 2D-DCT · 52ae97b6
      Jingning Han authored
      This commit enables the SSSE3 version of the full inverse 8x8
      2D-DCT and reconstruction. It brings the runtime of
      vp9_idct8x8_64_add down from 256 cycles (SSE2) to 246 cycles.
      
      Change-Id: I0600feac894d6a443a3c9d18daf34156d4e225c3
  17. 29 Apr, 2014 2 commits
    • Enable SSSE3 implementation of 8x8 forward 2D-DCT · 1eaa3a76
      Jingning Han authored
      Assembly implementation of ssse3 8x8 forward 2D-DCT. The current
      version is turned on only for x86_64. The average unit runtime
      goes from 157 cycles down to 136 cycles, i.e., about 12.8% faster.
      This translates into about 1.5% speed-up for pedestrian_area 1080p
      at speed 2.
      
      Change-Id: I0f12435857e9425ed7ce12541344dfa16837f4f4
    • Adding search_site_config struct. · aa464eca
      Dmitry Kovalev authored
      Change-Id: I2ad333553e673dbabcdc0f0366aea311e90849bf
  18. 25 Apr, 2014 1 commit
  19. 24 Apr, 2014 1 commit
  20. 11 Apr, 2014 1 commit
  21. 09 Apr, 2014 1 commit
    • Use source frame difference to make partition decision · 4e66293f
      Yunqing Wang authored
      Calculate the variance of the difference between the last source
      frame and the current source frame. The variance is calculated at
      the 16x16 block level. The variances are compared to several
      thresholds to decide the final partition sizes.
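
A minimal sketch of this idea follows, with loud caveats: `diff_variance_16x16`, `choose_partition`, the `BlockChoice` enum, and the two thresholds are all hypothetical, and mapping a single block's variance straight to a size simplifies whatever combination of 16x16 variances the encoder actually uses. The variance is left scaled by the 256 pixels of the block.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Variance (scaled by 256) of the per-pixel difference between the
 * current and the last source frame over one 16x16 block.
 */
uint32_t diff_variance_16x16(const uint8_t *cur, const uint8_t *last,
                             int stride) {
  int64_t sum = 0;
  int64_t sse = 0;
  for (int r = 0; r < 16; ++r) {
    for (int c = 0; c < 16; ++c) {
      const int d = cur[r * stride + c] - last[r * stride + c];
      sum += d;
      sse += d * d;
    }
  }
  /* sse - sum^2 / N, with N = 256 pixels; never negative. */
  return (uint32_t)(sse - (sum * sum) / 256);
}

/* Hypothetical outcome of the threshold comparison. */
typedef enum { BLK_64X64, BLK_32X32, BLK_16X16 } BlockChoice;

BlockChoice choose_partition(uint32_t var, uint32_t low, uint32_t high) {
  assert(low <= high);
  if (var < low) return BLK_64X64;  /* nearly static content */
  if (var < high) return BLK_32X32; /* moderate change */
  return BLK_16X16;                 /* strong change, finest size here */
}
```

Note that a uniform brightness shift between the two frames yields zero variance, so it does not by itself push the choice toward smaller blocks.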
      
      An adaptive strategy decides whether to use
      SOURCE_VAR_BASED_PARTITION or FIXED_PARTITION based on the motion
      in the video. The switching test is done once every
      search_type_check_frequency frames.
      
      The selection of source_var_thresh needs further investigation.
      
      RTC set Borg test showed 0.424% overall psnr gain, and 0.357%
      ssim gain. For clips with large enough static area, the
      encoding speedup is around 2% to 15%.
      
      Change-Id: Id7d268f1d8cbca7fb8026aa4a53b3c77459dc156
  22. 21 Mar, 2014 1 commit
    • AVX2 SAD Optimization: · 0fa8b668
      levytamar82 authored
      Two functions were optimized for AVX2 by using the full 256-bit
      registers, handling 32 elements in parallel instead of only 16:
      1. vp9_sad32x32x4d
      2. vp9_sad64x64x4d
      
      The function level gain is 66% and the user level gain is ~1%.
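
A scalar reference for an "x4d" SAD, sketched under assumptions: the name `sad_x4d_ref` and its signature are hypothetical, and the sketch takes "x4d" to mean one source block compared against four candidate reference blocks in a single call. A 256-bit register holds 32 8-bit pixels, which is where the 32-versus-16 figure in the commit comes from.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Sum of absolute differences between one source block and four
 * reference blocks. All four references share ref_stride.
 */
void sad_x4d_ref(const uint8_t *src, int src_stride,
                 const uint8_t *const ref[4], int ref_stride,
                 int width, int height, uint32_t sad[4]) {
  assert(width > 0 && height > 0);
  for (int k = 0; k < 4; ++k) {
    uint32_t acc = 0;
    for (int r = 0; r < height; ++r) {
      for (int c = 0; c < width; ++c) {
        /* A SIMD version would process a whole row segment per step. */
        acc += (uint32_t)abs(src[r * src_stride + c] -
                             ref[k][r * ref_stride + c]);
      }
    }
    sad[k] = acc;
  }
}
```

Calling it with `width = height = 32` or `64` corresponds to the two block sizes listed above.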
      
      Change-Id: I4efbb3bc7d8bc03b64b6c98f5cd5c4a9dd3212cb
  23. 03 Mar, 2014 1 commit
    • build: convert rtcd.sh to perl · 805078a1
      James Zern authored
      significantly speeds up file generation.
      
      the goal of this change is to convert rtcd.sh to perl as directly as
      possible to allow for simple comparison. future changes can make it more
      perl-like.
      
      ---
      Linux
          [CREATE] vpx_scale_rtcd.h
      real    0m0.485s ->    0m0.022s
          [CREATE] vp8_rtcd.h
      real    0m4.619s ->    0m0.060s
          [CREATE] vp9_rtcd.h
      real    0m10.102s ->    0m0.087s
      
      Windows
          [CREATE] vpx_scale_rtcd.h
      real    0m8.360s ->    0m0.080s
          [CREATE] vp8_rtcd.h
      real    1m8.083s ->    0m0.160s
          [CREATE] vp9_rtcd.h
      real    2m6.489s ->    0m0.233s
      
      Change-Id: Idfb71188206c91237d6a3c3a81dfe00d103f11ee