1. 20 Jun, 2014 1 commit
  2. 13 Jun, 2014 1 commit
  3. 12 Jun, 2014 1 commit
    • Fast computation path for forward transform and quantization · ccba289f
      Jingning Han authored
      This commit enables a fast path computational flow for forward
      transformation. It checks the sse and variance of prediction
      residuals and decides if the quantized coefficients are all
      zero, dc only, or more. It then selects the corresponding coding
      path in the forward transformation and quantization stage.
      
      It is currently enabled in rtc coding mode only. Support for
      rd coding mode will follow in a later change.
      
      In speed -6, the runtime for pedestrian_area 1080p at 1000 kbps
      goes down from 14234 ms to 13704 ms, i.e., about 4% speed-up.
      Overall coding performance for rtc set is changed by -0.18%.
      
      Change-Id: I0452da1786d59bc8bcbe0a35fdae9f623d1d44e1
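The three-way decision described in this commit message can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the commit: the names `TxPath` and `select_tx_path`, the two caller-supplied thresholds, and the exact comparison rule are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical labels for the three coding paths named in the commit message. */
typedef enum {
  TX_PATH_ALL_ZERO, /* all quantized coefficients expected to be zero */
  TX_PATH_DC_ONLY,  /* only the DC coefficient expected to be non-zero */
  TX_PATH_FULL      /* regular transform and quantization */
} TxPath;

/* Assumed rule (not taken from the commit): sse bounds the total residual
 * energy and var bounds the AC energy. Both thresholds are illustrative
 * parameters that a caller would derive from the quantizer. */
static TxPath select_tx_path(uint32_t sse, uint32_t var,
                             uint32_t dc_thresh, uint32_t ac_thresh) {
  assert(var <= sse); /* variance never exceeds the sum of squared errors */
  if (sse < dc_thresh) return TX_PATH_ALL_ZERO;
  if (var < ac_thresh) return TX_PATH_DC_ONLY;
  return TX_PATH_FULL;
}
```

A caller would compute `sse` and `var` for the prediction residual once, then use the returned path to skip the transform entirely, compute only the DC term, or fall through to the regular path.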
  4. 10 Jun, 2014 6 commits
    • vp9_rtcd: correct avx2 references · 9f3a0dbb
      James Zern authored
      s/"\$avx2_x86inc"/"avx2"/
      
      avx2 code is all intrinsics and as a result doesn't rely on x86inc.asm
      
      Change-Id: I76ad39474d8a00658f3e43131830ef0f4f34772a
    • vp9_sub_pixel_*variance*: disable avx2 variants · 520cb3f3
      James Zern authored
      tests failing under Win32/Win64
      
      + variance_test: add missing avx2 functions (partially disabled)
      
      Change-Id: I6abc0657ea076379ab9ca65c12678b9ea199849d
    • vp9_sad*x4d: disable avx2 variants · d3ff009d
      James Zern authored
      tests failing under Win32/Win64
      
      + sad_test: add missing avx2 functions (disabled)
      
      Change-Id: I8224fba2b270f6039ab1877d71e1e512f0081856
    • Add mode info arrays and mode info index. · cdffeaaa
      hkuang authored
      In non-frame-parallel decoding, this works the same way as
      the current decoding scheme. Each time the decoder finishes
      decoding a frame, it swaps the current mode info pointer
      and the previous mode info pointer if the decoded frame needs
      to be shown. Both the mode info pointer and the previous mode
      info pointer point into the mode info arrays.
      
      In frame-parallel decoding, this becomes more complicated,
      as the current frame's mode info pointer is shared with the
      next frame as its previous mode info pointer. When a decoder
      thread finishes decoding one frame and starts to work on the
      next available frame, it needs to retain the decoded frame's
      mode info pointers until the next frame finishes decoding. The
      mode info index serves this purpose: the decoder uses a
      different buffer in the mode info arrays for the new frame and
      keeps the other buffer for the previously decoded frame's mode
      info.
      
      Change-Id: If11d57d8eb0ee38c8876158e5482177fcb229428
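The pointer swap described for the non-frame-parallel case might look roughly like the sketch below. It assumes two buffers and an index that alternates between them; the type and function names (`ModeInfo`, `ModeInfoContext`, `mi_init`, `mi_swap`) are hypothetical and are not taken from the commit.

```c
#include <assert.h>

#define MI_BUFFERS 2 /* assumption: two mode info buffers */

typedef struct { int mode; } ModeInfo; /* placeholder payload */

typedef struct {
  ModeInfo *mi_array[MI_BUFFERS]; /* the mode info arrays */
  int mi_idx;                     /* index of the current frame's buffer */
  ModeInfo *mi;                   /* current mode info pointer */
  ModeInfo *prev_mi;              /* previous mode info pointer */
} ModeInfoContext;

static void mi_init(ModeInfoContext *ctx, ModeInfo *buf0, ModeInfo *buf1) {
  ctx->mi_array[0] = buf0;
  ctx->mi_array[1] = buf1;
  ctx->mi_idx = 0;
  ctx->mi = ctx->mi_array[0];
  ctx->prev_mi = ctx->mi_array[1];
}

/* Called after a frame is decoded. The pointers are exchanged only when
 * the frame is shown, as the commit message describes. */
static void mi_swap(ModeInfoContext *ctx, int show_frame) {
  if (!show_frame) return;
  ctx->prev_mi = ctx->mi_array[ctx->mi_idx];
  ctx->mi_idx = (ctx->mi_idx + 1) % MI_BUFFERS;
  ctx->mi = ctx->mi_array[ctx->mi_idx];
}
```

Keeping the index alongside the pointers means a thread can tell which buffer still holds the previously decoded frame's mode info, which is the role the commit message assigns to the mode info index in frame-parallel decoding.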
    • vp9_f(dct|ht): disable avx2 variants · dd9f5029
      James Zern authored
      tests failing under Win32/Win64
      
      + dct16x16_test: add missing avx2 functions (partially disabled)
      
      exercises the forward transforms
      no idct/iht implementations, so the c-code is used
      
      Change-Id: I04f64a457fa0828a00f32b5c9fe4f55294f21f61
    • convolve: disable avx2 variants · 5704578f
      James Zern authored
      tests failing under Win32/Win64
      
      Change-Id: I5d49d11911bcda3a832b14efe5500d22597bedcf
  5. 02 Jun, 2014 1 commit
  6. 01 Jun, 2014 1 commit
  7. 29 May, 2014 2 commits
  8. 28 May, 2014 1 commit
    • Enable SSSE3 inverse 2D-DCT with 10 non-zero coeffs · 6d21cbd2
      Jingning Han authored
      This commit enables an SSSE3 implementation of the inverse
      2D-DCT for blocks where only the first 10 coefficients are
      non-zero. It reduces the runtime from 745 cycles (SSE2 version)
      to 538 cycles, i.e., a 27% speed-up.
      
      Change-Id: I18ba4128859b09c704a6ee361d69a86c09fe8dfe
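A specialized kernel like this is typically chosen by a small dispatcher that looks at how many coefficients are non-zero. The sketch below only illustrates that idea; `IdctKernel`, `select_idct_kernel`, and the use of an end-of-block count `eob` as the selector are assumptions, not details from the commit.

```c
#include <assert.h>

/* Hypothetical kernel choices for an inverse 2D-DCT. */
typedef enum {
  IDCT_FIRST_10, /* specialized path: only the first 10 coefficients set */
  IDCT_FULL      /* general path: any coefficient may be non-zero */
} IdctKernel;

/* eob is assumed to be the number of coefficients, in scan order, up to
 * and including the last non-zero one. */
static IdctKernel select_idct_kernel(int eob) {
  assert(eob >= 0);
  return (eob <= 10) ? IDCT_FIRST_10 : IDCT_FULL;
}
```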
  9. 27 May, 2014 2 commits
  10. 23 May, 2014 5 commits
  11. 22 May, 2014 2 commits
  12. 21 May, 2014 2 commits
    • Renames x86_64 specific asm files · e2722734
      Deb Mukherjee authored
      Renames all x86_64 specific assembly files to consistently
      end in _x86_64.asm. This will be useful for build systems to
      handle these files differently.
      All new 64-bit specific assembly files should use the new
      naming convention.
      
      Change-Id: I36c89584967c82ffc4088b1b5044ac15d2bb7536
    • Moving itxm_add pointer from MACROBLOCKD to MACROBLOCK. · 35a83677
      Dmitry Kovalev authored
      The eventual goal is to get rid of both itxm_add and fwd_txm4x4.
      This patch does it in the decoder.
      
      Change-Id: Ibb3db57efbcbb1ac387c6742538a9fcf2c6f24a5
  13. 20 May, 2014 2 commits
    • Extends temporal filtering to work for 422 data · a185bc33
      Deb Mukherjee authored
      This is needed for profiles 1 and 2.
      
      Change-Id: I5dd7644c2932d055ab89e050d4be7d4117cd1028
    • Refactor decode_tiles and loopfilter code. · 20c1edf6
      hkuang authored
      The current decode_tiles decodes the frame one tile at a time
      and then loopfilters the whole frame, or uses another worker
      thread to do the loopfiltering.
      
      |------|------|------|------|
      |Tile1-|Tile2-|Tile3-|Tile4-|
      |------|------|------|------|
      
      For example, if a tiled video has one tile row and four tile
      columns, decode_tiles will decode Tile1, then Tile2, then
      Tile3, then Tile4. While decoding each tile, decode_tile
      decodes it row by row.
      
      For frame-parallel decoding, decode_tiles will decode the video
      in row order across the tiles. So the order will be:
      "Decode 1st row of Tile1" -> "Decode 1st row of Tile2"
      -> "Decode 1st row of Tile3" -> "Decode 1st row of Tile4"
      -> "Decode 2nd row of Tile1" -> "Decode 2nd row of Tile2"
      -> "Decode 2nd row of Tile3" -> "Decode 2nd row of Tile4"
      -> "loopfilter 1st row"
      
      Change-Id: I2211f9adc6d142fbf411d491031203cb8a6dbf6b
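The two decode orders described in this commit message can be compared with a small schedule generator. This is a sketch under assumptions: the `Step` type and both function names are hypothetical, tiles and rows are numbered from 0 rather than 1, and filtering the last row once decoding finishes is an assumed way to flush the one-row lag (the message only shows the first loopfilter step).

```c
#include <assert.h>
#include <stddef.h>

typedef enum { STEP_DECODE, STEP_LOOPFILTER } StepKind;

typedef struct {
  StepKind kind;
  int tile; /* 0-based tile column, or -1 for loopfilter steps */
  int row;  /* 0-based row */
} Step;

static size_t push_step(Step *out, size_t n, StepKind kind, int tile, int row) {
  out[n].kind = kind;
  out[n].tile = tile;
  out[n].row = row;
  return n + 1;
}

/* Tile-by-tile: decode every row of a tile before moving to the next
 * tile, then loopfilter the whole frame. out needs room for
 * tile_cols * rows + rows steps. */
static size_t order_tile_by_tile(int tile_cols, int rows, Step *out) {
  size_t n = 0;
  for (int t = 0; t < tile_cols; ++t)
    for (int r = 0; r < rows; ++r)
      n = push_step(out, n, STEP_DECODE, t, r);
  for (int r = 0; r < rows; ++r)
    n = push_step(out, n, STEP_LOOPFILTER, -1, r);
  return n;
}

/* Row-interleaved: decode row r of every tile, then loopfilter row r - 1,
 * so filtering trails decoding by one row. */
static size_t order_row_interleaved(int tile_cols, int rows, Step *out) {
  size_t n = 0;
  for (int r = 0; r < rows; ++r) {
    for (int t = 0; t < tile_cols; ++t)
      n = push_step(out, n, STEP_DECODE, t, r);
    if (r > 0) n = push_step(out, n, STEP_LOOPFILTER, -1, r - 1);
  }
  if (rows > 0) n = push_step(out, n, STEP_LOOPFILTER, -1, rows - 1);
  return n;
}
```

With four tile columns and two rows, the row-interleaved schedule decodes row 0 of all four tiles, then row 1 of all four tiles, and only then filters row 0, which matches the sequence quoted in the message.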
  14. 16 May, 2014 1 commit
  15. 15 May, 2014 3 commits
  16. 14 May, 2014 4 commits
    • Hiding vp9_sub_pel_filters_{8, 8s, 8lp} filters in *.c file. · 021eaabd
      Dmitry Kovalev authored
      Change-Id: Id401da740b0a0141caaef9e1bcccd981e5cef4a4
    • AVX2 To VP9 Block Error Optimization · 1fbab853
      levytamar82 authored
      vp9_block_error_sse2 can only handle 16 bytes at a time, but
      the function needs to process a sequence of 32 bytes at a time,
      so each 16 bytes is handled in a different register.
      With the AVX2 optimization, the 32 bytes can be handled in one
      register instead of the two used in the SSE2 version.
      vp9_block_error was optimized by 85%.
      At the user level, the improvement is 1.2%.
      
      Change-Id: Ia8fffe60e61eff7432a5fbd538757894f6c319fd
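For reference, the computation being vectorized can be written in scalar C as below. This is a sketch, not the libvpx source: it assumes the block error is the sum of squared differences between the original and dequantized 16-bit coefficients, with the sum of squared original coefficients returned through `ssz`. The name `block_error_sketch` is hypothetical. The inner loop covers 16 coefficients (32 bytes), the amount the commit message says one AVX2 register holds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar reference sketch. n must be a multiple of 16 so that every outer
 * iteration covers one 32-byte group of int16_t coefficients. */
static int64_t block_error_sketch(const int16_t *coeff, const int16_t *dqcoeff,
                                  size_t n, int64_t *ssz) {
  int64_t error = 0;
  int64_t sqcoeff = 0;
  assert(n % 16 == 0);
  for (size_t i = 0; i < n; i += 16) {
    /* One 32-byte group: two SSE2 registers, or one AVX2 register. */
    for (size_t j = 0; j < 16; ++j) {
      const int32_t c = coeff[i + j];
      const int32_t d = c - dqcoeff[i + j];
      error += (int64_t)d * d;
      sqcoeff += (int64_t)c * c;
    }
  }
  *ssz = sqcoeff;
  return error;
}
```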
    • vp9_decodeframe.c: cleanup -Wextra warnings · ed095807
      Yaowu Xu authored
      Change-Id: I0315cea6a5e58182bc2556e9825ec2ef0b1480c3
    • Remove Wextra warnings from vp9_sad.c · 7ab9a958
      Deb Mukherjee authored
      As a side effect, the max_sad check is removed from the
      C implementation of VP8, for consistency with VP9, and to
      ensure that the SAD tests common to VP8/VP9 pass.
      That will make the VP8 C implementation of SAD a little slower,
      but given that it is rarely used in practice, the impact will
      be minimal.
      
      Change-Id: I7f43089fdea047fbf1862e40c21e4715c30f07ca
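The trade-off described here can be illustrated with two SAD variants. Both functions are hypothetical sketches rather than the libvpx code, and the per-row placement of the early-exit check is an assumption. The variant with a `max_sad` check can stop early and return a partial sum, which is why it is slightly faster but yields different results from a variant without the check.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Full sum of absolute differences over a w x h block. */
static unsigned int sad_full(const uint8_t *src, int src_stride,
                             const uint8_t *ref, int ref_stride,
                             int w, int h) {
  unsigned int sad = 0;
  for (int r = 0; r < h; ++r) {
    for (int c = 0; c < w; ++c)
      sad += (unsigned int)abs(src[c] - ref[c]);
    src += src_stride;
    ref += ref_stride;
  }
  return sad;
}

/* Same computation with an assumed per-row early exit: once the running
 * sum exceeds max_sad, the remaining rows are skipped. */
static unsigned int sad_early_exit(const uint8_t *src, int src_stride,
                                   const uint8_t *ref, int ref_stride,
                                   int w, int h, unsigned int max_sad) {
  unsigned int sad = 0;
  for (int r = 0; r < h; ++r) {
    for (int c = 0; c < w; ++c)
      sad += (unsigned int)abs(src[c] - ref[c]);
    if (sad > max_sad) break;
    src += src_stride;
    ref += ref_stride;
  }
  return sad;
}
```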
  17. 13 May, 2014 2 commits
    • Silence -Wextra warnings in vp9_reconintra.c · 806fa6aa
      Jingning Han authored
      The warning messages complained that there are unused arguments
      in a few prediction modes. This structure was designed on purpose,
      such that a wrapper function can cover all prediction mode cases
      and make them readily accessible as a pointer array.
      
      This commit silences such warnings.
      
      Change-Id: I7036b6bdb70747e5327d8f6fceb154f100abc4c0
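The design described here, one shared signature so that all predictors fit in a function pointer array, can be sketched as follows. The names and the two example modes are hypothetical, and casting the unused parameter to `void` is one common way to silence the warning; the commit may have used a different mechanism.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Every predictor shares this signature, even if it ignores an argument. */
typedef void (*intra_pred_fn)(uint8_t *dst, ptrdiff_t stride, int bs,
                              const uint8_t *above, const uint8_t *left);

/* Vertical prediction copies the row above; it never reads left. */
static void v_pred(uint8_t *dst, ptrdiff_t stride, int bs,
                   const uint8_t *above, const uint8_t *left) {
  (void)left; /* intentionally unused: kept for the common signature */
  for (int r = 0; r < bs; ++r)
    for (int c = 0; c < bs; ++c)
      dst[r * stride + c] = above[c];
}

/* Horizontal prediction repeats the left column; it never reads above. */
static void h_pred(uint8_t *dst, ptrdiff_t stride, int bs,
                   const uint8_t *above, const uint8_t *left) {
  (void)above; /* intentionally unused: kept for the common signature */
  for (int r = 0; r < bs; ++r)
    for (int c = 0; c < bs; ++c)
      dst[r * stride + c] = left[r];
}

enum { PRED_V, PRED_H, PRED_MODES };

/* The pointer array that the shared signature makes possible. */
static const intra_pred_fn pred_table[PRED_MODES] = { v_pred, h_pred };
```

A caller can then select a mode with `pred_table[mode](dst, stride, bs, above, left)` without a per-mode switch statement.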
    • vp9_convolve.c: cleanup -Wextra warnings · fd6bf31b
      Adrian Grange authored
      Change-Id: I04930aca2293ebbaeb96dfedd2f9c5a55762fd2e
  18. 12 May, 2014 2 commits
  19. 08 May, 2014 1 commit