1. 02 Jul, 2014 1 commit
    • Re-design quantization process · 9ac2f663
      Jingning Han authored
      This commit re-designs the quantization process for transform
      coefficient blocks of size 4x4 to 16x16. It improves compression
      performance for speed 7 by 3.85%. The SSSE3 version for the
      new quantization process is included.
      
      The average runtime of the 8x8 block quantization is reduced
      from 285 cycles to 255 cycles, i.e., over 10% faster.
      
      Change-Id: I61278aa02efc70599b962d3314671db5b0446a50
  2. 28 Jun, 2014 1 commit
    • vp9: disable postproc buffer alloc when unnecessary · 44472cde
      James Zern authored
      The buffer is only used in encoding, and only when
      CONFIG_INTERNAL_STATS or CONFIG_VP9_POSTPROC is enabled.
      A future change should decouple this from the frame buffer allocation
      and make it conditional based on runtime flags when the above config
      options are enabled.
      Reduces decode heap usage by at least 12%.
      
      Change-Id: Id0b97620d4936afefa538d3aadf32106743d9caf
  3. 27 Jun, 2014 1 commit
    • Better validation of invalid files · 9f37d149
      Jim Bankoski authored
      This patch checks that a decoder never tries to reference a frame whose
      size is outside the range of 2x to 1/16th the size of the current frame.
      Any attempt to do so causes a failure.
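
      The size check described above can be sketched as follows. This is an
      illustrative Python sketch, not the libvpx C code: the function name and
      the per-dimension interpretation of "2x to 1/16th" are assumptions.

```python
def valid_ref_frame_size(ref_w, ref_h, this_w, this_h):
    """Return True if the reference frame is usable for the current frame.

    Assumed rule, applied to width and height separately: the reference
    may be at most 2x larger and at most 16x smaller than the current frame.
    """
    not_too_large = ref_w <= 2 * this_w and ref_h <= 2 * this_h
    not_too_small = this_w <= 16 * ref_w and this_h <= 16 * ref_h
    return not_too_large and not_too_small
```

      A decoder following this rule would reject the frame as soon as the
      function returns False for any reference it is asked to use.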
      
      Change-Id: I5c98fa7bb95ac4f29146f29dd92b62fe96164e4c
  4. 26 Jun, 2014 1 commit
    • Enable real-time version reference motion vector search · 46ea9ec7
      Jingning Han authored
      This commit enables a fast reference motion vector search scheme.
      It checks the nearest top and left neighboring blocks to decide the
      most probable predicted motion vector. If it finds the two have
      the same motion vector, it then skips finding the exterior range for
      the second most probable motion vector, and correspondingly skips
      the check for NEARMV.
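
      A minimal Python sketch of the decision described above. The names
      (`fast_ref_mv_search`, `RefMvResult`) are hypothetical, and the choice of
      the above neighbor as the most probable vector when the two differ is an
      assumption; the message does not say which neighbor is preferred.

```python
from typing import NamedTuple, Optional, Tuple

MV = Tuple[int, int]  # (row, col) motion vector components


class RefMvResult(NamedTuple):
    nearest: MV            # most probable predicted motion vector
    near: Optional[MV]     # second most probable, or None when skipped
    check_nearmv: bool     # whether the NEARMV mode still needs testing


def fast_ref_mv_search(above_mv: MV, left_mv: MV) -> RefMvResult:
    """Pick reference MVs from the nearest top and left neighbors only."""
    if above_mv == left_mv:
        # Both neighbors agree: skip the wider search for a second
        # candidate and skip the NEARMV check altogether.
        return RefMvResult(above_mv, None, False)
    # Assumption: the above neighbor is treated as the most probable MV.
    return RefMvResult(above_mv, left_mv, True)
```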
      
      The runtime of speed -5 goes down
      pedestrian at 1080p 29377 ms -> 27783 ms
      vidyo at 720p       11830 ms -> 10990 ms
      i.e., 6%-8% speed-up.
      
      For the rtc set, the compression performance
      goes down by about 1.3% for both speed -5 and -6.
      
      Change-Id: I2a7794fa99734f739f8b30519ad4dfd511ab91a5
  5. 24 Jun, 2014 1 commit
    • Fix test on maximum downscaling limits · 8357292a
      Adrian Grange authored
      There is a normative scaling range of (x1/2, x16)
      for VP9. This patch fixes the maximum downscaling
      tests that are applied in the convolve function.
      
      The code used a maximum downscaling limit of x1/5
      for historic reasons related to the scalable
      coding work. Since the downsampling in this
      application is non-normative, it will revert to
      using a separate non-normative scaler.
      
      Change-Id: Ide80ed712cee82fe5cb3c55076ac428295a6019f
  6. 23 Jun, 2014 1 commit
    • Allocate buffers based on correct chroma format · 8c1f071f
      Adrian Grange authored
      The encoder currently allocates frame buffers before
      it establishes what the chroma sub-sampling factor is,
      always allocating based on the 4:4:4 format.
      
      This patch detects the chroma format as early as
      possible, allowing the encoder to allocate buffers of
      the correct size.
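
      To illustrate why the chroma format matters for allocation, here is a
      rough Python sketch of the sample count per frame. The function name is
      hypothetical, and the calculation ignores the borders and alignment
      padding that a real frame buffer would include.

```python
def frame_buffer_bytes(width, height, ss_x, ss_y, bytes_per_sample=1):
    """Approximate size of one frame: a luma plane plus two chroma planes.

    ss_x / ss_y are the chroma subsampling shifts:
    4:4:4 -> (0, 0), 4:2:2 -> (1, 0), 4:2:0 -> (1, 1).
    """
    luma = width * height
    # Round up so odd dimensions still get a full chroma sample.
    chroma_w = (width + ss_x) >> ss_x
    chroma_h = (height + ss_y) >> ss_y
    return (luma + 2 * chroma_w * chroma_h) * bytes_per_sample
```

      Under these assumptions a 1920x1080 frame needs 6220800 bytes in 4:4:4
      but only 3110400 bytes in 4:2:0, so always allocating for 4:4:4 doubles
      the footprint for the common 4:2:0 case.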
      
      Future patches will change the encoder to allocate
      frame buffers on demand to further reduce the memory
      profile of the encoder and rationalize the buffer
      management in the encoder and decoder.
      
      Change-Id: Ifd41dd96e67d0011719ba40fada0bae74f3a0d57
  7. 20 Jun, 2014 2 commits
    • Don't return value for void functions · d6582162
      Johann authored
      Clears "warning: 'return' with a value, in function returning void"
      
      Change-Id: I93972610d67e243ec772a1021d2fdfcfc689c8c2
    • Include type defines · baef0b89
      Johann authored
      Clears error: unknown type name 'uint8_t'
      
      Change-Id: I9b6eff66a5c69bc24aeaeb5ade29255a164ef0e2
  8. 18 Jun, 2014 2 commits
    • BITSTREAM: Handle transform size and motion vectors more logically for non-420. · 7557a65d
      Alex Converse authored
      This breaks the profile 1 bitstream.
      
      Don't force non420 uv transform size to 1/4 y size. In the 4:2:0 case the
      chroma corresponding to a luma block is 1/4 its size. In the 4:4:4 case
      chroma and luma planes are the same size. Disallowing larger transforms
      can result in a loss of compression efficiency and is inconsistent.
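
      One way to express the idea is to limit the chroma transform only by the
      size of the chroma block itself. The Python sketch below is written under
      that assumption; the names and the exact rule are illustrative and are
      not taken from the patch.

```python
TX_SIZES = (4, 8, 16, 32)  # square transform sizes, in samples


def largest_tx_for_block(block_w, block_h):
    """Largest square transform that fits in the block (never below 4x4)."""
    side = max(4, min(block_w, block_h))
    return max(tx for tx in TX_SIZES if tx <= side)


def uv_tx_size(y_tx, block_w, block_h, ss_x, ss_y):
    """Chroma transform size for a luma block of block_w x block_h.

    Assumed rule: use the luma transform size unless the subsampled
    chroma block is too small to hold it.
    """
    chroma_w = block_w >> ss_x
    chroma_h = block_h >> ss_y
    return min(y_tx, largest_tx_for_block(chroma_w, chroma_h))
```

      With this rule a 16x16 luma block using a 16x16 transform gets an 8x8
      chroma transform in 4:2:0, where the chroma block is 8x8, and keeps the
      full 16x16 transform in 4:4:4.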
      
      For sub-8x8 blocks only average corresponding motion vectors.
      
      4:2:0 and profile 0 behavior remains unchanged.
      
      Change-Id: I560ae07183012c6734dd1860ea54ed6f62f3cae8
    • Remove unused vp9_init_quant_tables function · 3b9c19aa
      Jingning Han authored
      This function is effectively unused, so it is removed.
      
      Change-Id: I2e8e48fa07c7518931690f3b04bae920cb360e49
  9. 13 Jun, 2014 1 commit
  10. 12 Jun, 2014 1 commit
    • Fast computation path for forward transform and quantization · ccba289f
      Jingning Han authored
      This commit enables a fast path computational flow for forward
      transformation. It checks the sse and variance of prediction
      residuals and decides if the quantized coefficients are all
      zero, dc only, or more. It then selects the corresponding coding
      path in the forward transformation and quantization stage.
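
      The three-way decision can be sketched as below. This is a hedged Python
      illustration: the function name and both thresholds are hypothetical, and
      it assumes `var` is the unnormalized sum of squared deviations, so that
      `sse - var` approximates the DC energy of the residual.

```python
def classify_residual(sse, var, ac_thr, dc_thr):
    """Guess which coefficients survive quantization for a residual block.

    Returns "full", "dc_only", or "all_zero". ac_thr and dc_thr stand in
    for thresholds that would be derived from the quantizer step sizes.
    """
    ac_energy = var        # energy around the block mean
    dc_energy = sse - var  # energy carried by the block mean itself
    if ac_energy >= ac_thr:
        return "full"      # run the full transform and quantization
    if dc_energy >= dc_thr:
        return "dc_only"   # only the DC coefficient is expected to survive
    return "all_zero"      # skip transform and quantization entirely
```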
      
      It is currently enabled in rtc coding mode; rd coding mode
      will follow.
      
      In speed -6, the runtime for pedestrian_area 1080p at 1000 kbps
      goes down from 14234 ms to 13704 ms, i.e., about 4% speed-up.
      Overall coding performance for rtc set is changed by -0.18%.
      
      Change-Id: I0452da1786d59bc8bcbe0a35fdae9f623d1d44e1
  11. 10 Jun, 2014 6 commits
    • vp9_rtcd: correct avx2 references · 9f3a0dbb
      James Zern authored
      s/"\$avx2_x86inc"/"avx2"/
      
      avx2 code is all intrinsics and as a result doesn't rely on x86inc.asm
      
      Change-Id: I76ad39474d8a00658f3e43131830ef0f4f34772a
    • vp9_sub_pixel_*variance*: disable avx2 variants · 520cb3f3
      James Zern authored
      tests failing under Win32/Win64
      
      + variance_test: add missing avx2 functions (partially disabled)
      
      Change-Id: I6abc0657ea076379ab9ca65c12678b9ea199849d
    • vp9_sad*x4d: disable avx2 variants · d3ff009d
      James Zern authored
      tests failing under Win32/Win64
      
      + sad_test: add missing avx2 functions (disabled)
      
      Change-Id: I8224fba2b270f6039ab1877d71e1e512f0081856
    • Add mode info arrays and mode info index. · cdffeaaa
      hkuang authored
      In non-frame-parallel decoding, this works the same way as the
      current decoding scheme. Each time the decoder finishes
      decoding a frame, it swaps the current mode info pointer
      and the previous mode info pointer if the decoded frame needs
      to be shown. Both the mode info pointer and the previous mode info
      pointer come from the mode info arrays.
      
      In frame-parallel decoding, this becomes more complicated,
      as the current frame's mode info pointer is shared with the next
      frame as its previous mode info pointer. But when one decoder
      thread finishes decoding one frame and starts to work on the next
      available frame, it needs to retain the decoded frame's mode
      info pointers until the next frame finishes decoding. The mode info
      index serves this purpose. The decoder will use a different
      buffer in the mode info arrays and use the other buffer to save
      the previously decoded frame's mode info.
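
      The two-buffer arrangement can be pictured with the small Python sketch
      below. The class and method names are hypothetical and only model the
      non-frame-parallel swap described above; the real decoder works with C
      pointers into the mode info arrays.

```python
class ModeInfoBuffers:
    """Two mode info arrays plus an index selecting the current one."""

    def __init__(self, num_mi):
        self.arrays = [[None] * num_mi, [None] * num_mi]
        self.idx = 0  # index of the array used by the frame being decoded

    @property
    def cur(self):
        return self.arrays[self.idx]

    @property
    def prev(self):
        return self.arrays[self.idx ^ 1]

    def finish_frame(self, show_frame):
        """Swap current and previous buffers only when the frame is shown."""
        if show_frame:
            self.idx ^= 1
```

      Flipping the index instead of exchanging pointers keeps both arrays
      addressable, which is what lets a thread hold on to the previous frame's
      mode info while it moves on to the next frame.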
      
      Change-Id: If11d57d8eb0ee38c8876158e5482177fcb229428
    • vp9_f(dct|ht): disable avx2 variants · dd9f5029
      James Zern authored
      tests failing under Win32/Win64
      
      + dct16x16_test: add missing avx2 functions (partially disabled)
      
      exercises the forward transforms
      no idct/iht implementations, so the c-code is used
      
      Change-Id: I04f64a457fa0828a00f32b5c9fe4f55294f21f61
    • convolve: disable avx2 variants · 5704578f
      James Zern authored
      tests failing under Win32/Win64
      
      Change-Id: I5d49d11911bcda3a832b14efe5500d22597bedcf
  12. 02 Jun, 2014 1 commit
  13. 01 Jun, 2014 1 commit
  14. 29 May, 2014 2 commits
  15. 28 May, 2014 1 commit
    • Enable SSSE3 inverse 2D-DCT with 10 non-zero coeffs · 6d21cbd2
      Jingning Han authored
      This commit enables the SSSE3 implementation of the inverse 2D-DCT
      with only the first 10 coefficients non-zero. It reduces the runtime
      from 745 cycles (SSE2 version) to 538 cycles, i.e., a 27% speed-up.
      
      Change-Id: I18ba4128859b09c704a6ee361d69a86c09fe8dfe
  16. 27 May, 2014 2 commits
  17. 23 May, 2014 5 commits
  18. 22 May, 2014 2 commits
  19. 21 May, 2014 2 commits
    • Renames x86_64 specific asm files · e2722734
      Deb Mukherjee authored
      Renames all x86_64 specific assembly files to consistently
      end in _x86_64.asm. This will be useful for build systems to
      handle these files differently.
      All new 64-bit specific assembly files should use the new
      naming convention.
      
      Change-Id: I36c89584967c82ffc4088b1b5044ac15d2bb7536
    • Moving itxm_add pointer from MACROBLOCKD to MACROBLOCK. · 35a83677
      Dmitry Kovalev authored
      The final goal is eventually to get rid of both itxm_add and fwd_txm4x4.
      This patch does it in the decoder.
      
      Change-Id: Ibb3db57efbcbb1ac387c6742538a9fcf2c6f24a5
  20. 20 May, 2014 2 commits
    • Extends temporal filtering to work for 422 data · a185bc33
      Deb Mukherjee authored
      This is needed for profiles 1 and 2.
      
      Change-Id: I5dd7644c2932d055ab89e050d4be7d4117cd1028
    • Refactor decode_tiles and loopfilter code. · 20c1edf6
      hkuang authored
      The current decode_tiles decodes the frame one tile at a time
      and then loop-filters the whole frame, or uses another worker thread to
      do the loop filtering.
      
      |------|------|------|------|
      |Tile1-|Tile2-|Tile3-|Tile4-|
      |------|------|------|------|
      
      For example, if a tiled video has one row and four columns, decode_tiles
      will decode Tile1, then Tile2, then Tile3, then Tile4.
      While decoding each tile, decode_tile decodes it row by row.
      
      For frame parallel decoding, decode_tiles will decode video in row order
      across the tiles. So the order will be:
      "Decode 1st row of Tile1" -> "Decode 1st row of Tile2"
      -> "Decode 1st row of Tile3" -> "Decode 1st row of Tile4"
      -> "Decode 2nd row of Tile1" -> "Decode 2nd row of Tile2"
      -> "Decode 2nd row of Tile3" -> "Decode 2nd row of Tile4"-> "loopfilter 1st row"
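
      The two orderings can be compared with the short Python sketch below. The
      function names and step tuples are hypothetical. The one-row lag of the
      loop filter follows the example above; filtering the last row after all
      decoding is done is an assumption.

```python
def tile_by_tile_order(num_tiles, rows_per_tile):
    """Existing scheme: finish each tile, then loop-filter the whole frame."""
    steps = [("decode", tile, row)
             for tile in range(1, num_tiles + 1)
             for row in range(1, rows_per_tile + 1)]
    steps.append(("loopfilter_frame",))
    return steps


def row_interleaved_order(num_tiles, rows_per_tile):
    """Frame-parallel scheme: decode one row across all tiles at a time."""
    steps = []
    for row in range(1, rows_per_tile + 1):
        steps.extend(("decode", tile, row)
                     for tile in range(1, num_tiles + 1))
        if row > 1:
            # The previous row is filtered once the row below it is decoded.
            steps.append(("loopfilter", row - 1))
    if rows_per_tile > 0:
        # Assumption: the final row is filtered after all decoding.
        steps.append(("loopfilter", rows_per_tile))
    return steps
```

      For four tiles with two rows each, `row_interleaved_order(4, 2)` yields
      the eight decode steps in the order listed above, followed by the loop
      filter for row 1 and then row 2.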
      
      Change-Id: I2211f9adc6d142fbf411d491031203cb8a6dbf6b
  21. 16 May, 2014 1 commit
  22. 15 May, 2014 3 commits