1. 04 Dec, 2015 2 commits
  2. 30 Nov, 2015 1 commit
    • SSE2 speed up of h_predictor_4x4 · 9d29d762
      Jian Zhou authored
      Relocate h_predictor_4x4 from SSSE3 to SSE2 with XMM registers.
      Speed up by ~25% in ./test_intra_pred_speed.
      Change-Id: I64e14c13b482a471449be3559bfb0da45cf88d9d
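For readers unfamiliar with the predictor being optimized: a horizontal intra predictor fills each row of the block with the pixel immediately to its left. The following is a minimal scalar sketch of that idea, not the SSE2 routine from the commit. The function name `h_predictor_4x4_sketch` and its signature are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar reference for a 4x4 horizontal predictor.
 * dst:    top-left pixel of the 4x4 block to fill
 * stride: distance in bytes between consecutive rows of dst
 * left:   the 4 reconstructed pixels in the column left of the block */
static void h_predictor_4x4_sketch(uint8_t *dst, ptrdiff_t stride,
                                   const uint8_t *left) {
  for (int r = 0; r < 4; ++r) {
    /* Every pixel in row r copies the left neighbour of that row. */
    memset(dst, left[r], 4);
    dst += stride;
  }
}
```

A SIMD version would typically broadcast each `left[r]` byte across a register and store one row per write; the sketch only shows the result such a routine has to reproduce.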
  3. 25 Nov, 2015 1 commit
    • add vp9_satd_neon · eb1d0f8d
      James Zern authored
      ~60-65% faster at the function level across block sizes
      Change-Id: Iaf8cbe95731c43fdcbf68256e44284ba51a93893
  4. 23 Nov, 2015 1 commit
  5. 20 Nov, 2015 2 commits
    • fix vp9_satd_sse2 · 60760f71
      James Zern authored
      accumulate satd in 32-bits
      + add unit test
      Change-Id: I6748183df3662ddb9d635f9641f9586f2fd38ad5
    • vp9_satd: return an int · 3e0138ed
      James Zern authored
      the final sum may use up to 26 bits
      + add a unit test
      + disable the sse2 version as the result will roll over; this will be
      fixed in a future commit
      Change-Id: I2a49811dfaa06abfd9fa1e1e65ed7cd68e4c97ce
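The "up to 26 bits" remark in the two satd commits above can be checked with a small scalar sketch. It assumes, for illustration, that the function sums the absolute values of 16-bit coefficients over a block of up to 32x32 = 1024 entries; the name `satd_sketch` and its signature are hypothetical and are not the library's API.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar sum of absolute coefficient values.
 * The accumulator is a plain int (at least 32 bits on the targets
 * considered here), so the worst case below does not roll over. */
static int satd_sketch(const int16_t *coeff, int length) {
  int satd = 0;
  for (int i = 0; i < length; ++i) {
    /* coeff[i] is promoted to int before abs(), so -32768 is safe. */
    satd += abs(coeff[i]);
  }
  return satd;
}
```

With 1024 coefficients each of magnitude 2^15, the sum is 2^10 * 2^15 = 2^25, which needs 26 bits to represent. A 16-bit accumulator or return value would wrap long before that.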
  6. 19 Nov, 2015 1 commit
    • Speed up tm_predictor_4x4 · 79b68626
      Jian Zhou authored
      tm_predictor_4x4 is implemented with SSE2 using XMM registers.
      Speed up by ~25% in ./test_intra_pred_speed.
      Change-Id: I25074b78d476a2cb17f81cf654bdfd80df2070e0
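As background for the commit above, a TrueMotion (TM) predictor estimates each pixel as left + above - top-left, clipped to the 8-bit range. The sketch below is a scalar illustration under that definition, not the SSE2 code from the commit. The names `tm_predictor_4x4_sketch` and `clip_pixel_sketch` are hypothetical, and the top-left pixel is passed explicitly here as a simplifying assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp an intermediate value to the valid 8-bit pixel range. */
static uint8_t clip_pixel_sketch(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Hypothetical scalar reference for a 4x4 TM predictor.
 * above:    the 4 pixels in the row above the block
 * left:     the 4 pixels in the column left of the block
 * top_left: the pixel diagonally above-left of the block */
static void tm_predictor_4x4_sketch(uint8_t *dst, ptrdiff_t stride,
                                    const uint8_t *above,
                                    const uint8_t *left, uint8_t top_left) {
  for (int r = 0; r < 4; ++r) {
    for (int c = 0; c < 4; ++c) {
      /* Gradient-style prediction: left + above - top-left, then clip. */
      dst[c] = clip_pixel_sketch(left[r] + above[c] - top_left);
    }
    dst += stride;
  }
}
```

The intermediate value can range from -255 to 510, so a SIMD version has to widen to 16 bits before the add and subtract and then saturate back to 8 bits.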
  7. 14 Nov, 2015 1 commit
  8. 13 Nov, 2015 2 commits
  9. 10 Nov, 2015 1 commit
  10. 09 Nov, 2015 2 commits
  11. 06 Nov, 2015 3 commits
  12. 05 Nov, 2015 1 commit
  13. 03 Nov, 2015 1 commit
  14. 31 Oct, 2015 1 commit
  15. 30 Oct, 2015 1 commit
  16. 29 Oct, 2015 1 commit
  17. 28 Oct, 2015 2 commits
  18. 22 Oct, 2015 1 commit
  19. 21 Oct, 2015 1 commit
    • Optimize vp9_highbd_block_error_8bit assembly. · aa8f8522
      Geza Lore authored
      A new version of vp9_highbd_block_error_8bit is now available which is
      optimized with AVX assembly. AVX itself does not buy us too much, but
      the non-destructive 3 operand format encoding of the 128bit SSEn integer
      instructions helps to eliminate move instructions. The Sandy Bridge
      micro-architecture cannot eliminate move instructions in the processor
      front end, so AVX will help on these machines.
      Two further optimizations are applied:
      1. The common case of computing block error on 4x4 blocks is optimized
      as a special case.
      2. All arithmetic is speculatively done in 32 bits only. At the end of
      the loop, the code detects if overflow might have happened and if so,
      the whole computation is re-executed using higher precision arithmetic.
      This case however is extremely rare in real use, so we can achieve a
      large net gain here.
      The optimizations rely on the fact that the coefficients are in the
      range [-(2^15-1), 2^15-1], and that the quantized coefficients always
      have the same sign as the input coefficients (in the worst case they are
      0). These are the same assumptions that the old SSE2 assembly code for
      the non high bitdepth configuration relied on. The unit tests have been
      updated to take this constraint into consideration when generating test
      input data.
      Change-Id: I57d9888a74715e7145a5d9987d67891ef68f39b7
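The speculative 32-bit strategy described in the commit above can be illustrated in portable C. This is a sketch of the general idea only, not a transcription of the AVX assembly: the function names are hypothetical, and the overflow check (a sticky record of the accumulator's top bit) is one possible way to detect that a wrap might have happened. It relies on the stated assumption that each difference has magnitude at most 2^15-1, so each squared term is below 2^30.

```c
#include <stddef.h>
#include <stdint.h>

/* Slow path: accumulate squared differences in 64 bits. */
static int64_t block_error_wide_sketch(const int16_t *coeff,
                                       const int16_t *dqcoeff, size_t n) {
  int64_t error = 0;
  for (size_t i = 0; i < n; ++i) {
    const int64_t diff = (int64_t)coeff[i] - dqcoeff[i];
    error += diff * diff;
  }
  return error;
}

/* Fast path: accumulate in 32 bits, then re-run in 64 bits if the
 * accumulator might have wrapped. *used_fallback reports which path
 * produced the result (hypothetical output, for illustration). */
static int64_t block_error_speculative_sketch(const int16_t *coeff,
                                              const int16_t *dqcoeff,
                                              size_t n, int *used_fallback) {
  uint32_t acc = 0;
  uint32_t seen = 0; /* OR of every intermediate accumulator value */
  for (size_t i = 0; i < n; ++i) {
    const int32_t diff = coeff[i] - dqcoeff[i];
    /* Unsigned multiply avoids signed-overflow undefined behaviour. */
    acc += (uint32_t)diff * (uint32_t)diff;
    seen |= acc;
  }
  /* Each term is below 2^30, so as long as the top bit was never set
   * the accumulator stayed below 2^31 and cannot have wrapped. */
  if (seen & 0x80000000u) {
    *used_fallback = 1;
    return block_error_wide_sketch(coeff, dqcoeff, n);
  }
  *used_fallback = 0;
  return (int64_t)acc;
}
```

The check is conservative: it may trigger the slow path when no wrap actually occurred, but it never misses one under the stated range assumption. Since large errors are rare in practice, the fast path handles almost every call.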
  20. 16 Oct, 2015 1 commit
  21. 09 Oct, 2015 2 commits
  22. 08 Oct, 2015 1 commit
    • Optimization of 8bit block error for high bitdepth · 0134764f
      Geza Lore authored
      If high bit depth configuration is enabled, but encoding in profile 0,
      the code now falls back on optimized SSE2 assembler to compute the
      block errors, similar to when high bit depth is not enabled.
      Change-Id: I471d1494e541de61a4008f852dbc0d548856484f
  23. 07 Oct, 2015 1 commit
  24. 06 Oct, 2015 1 commit
    • invalid_file_test: loosen error check w/tile-threading · fb209003
      James Zern authored
      The serial decode check is too strict for tile-threaded decoding as
      there is no guarantee on the decode order nor which specific error
      will take precedence. Currently a tile-level error is not forwarded so
      the frame will simply be marked corrupt.
      Change-Id: I51cf1e39e44bedeac93746154b36a4ccb2f059b1
  25. 30 Sep, 2015 4 commits
  26. 26 Sep, 2015 2 commits
    • vp9/10: improve support for render_width/height. · 812945a8
      Ronald S. Bultje authored
      In the decoder, map this to the output variable vpx_image_t.r_w/h.
      This is intended as an improved version of VP9D_GET_DISPLAY_SIZE,
      which doesn't work with parallel frame decoding. In the encoder,
      map this to a codec control func (VP9E_SET_RENDER_SIZE) that takes
      a w/h pair argument in an int[2] (identical to VP9D_GET_DISPLAY_SIZE).
      Also add render_size to the encoder_param_get_to_decoder unit test.
      See issue 1030.
      Change-Id: I12124c13602d832bf4c44090db08c1009c94c7e8
    • comment out fdct32 · 6a382101
      Angie Chiang authored
      comment out fdct32
      remove fdct32 test
      Change-Id: I31c47fb435377465cd3265e39621ca50d3aae656
  27. 25 Sep, 2015 1 commit
  28. 24 Sep, 2015 1 commit