1. 20 Apr, 2011 2 commits
  2. 19 Apr, 2011 1 commit
    • modify SAVE_XMM for potential 64bit use · 4a2b684e
      Johann authored
      the win64 abi requires saving and restoring xmm6:xmm15. currently
      SAVE_XMM and RESTORE_XMM only allow for saving xmm6:xmm7. allow
      specifying the highest register used and whether the stack is unaligned.
      
      Change-Id: Ica5699622ffe3346d3a486f48eef0206c51cf867
      4a2b684e
  3. 18 Apr, 2011 3 commits
    • Add save/restore xmm registers in x86 assembly code · c7cfde42
      Johann authored
      Went through the code and fixed it. Verified on Windows.
      
      Where possible, remove dependencies on xmm[67]
      
      Current code relies on pushing rbp to the stack to get 16 byte
      alignment. This broke when rbp wasn't pushed
      (vp8/encoder/x86/sad_sse3.asm). Work around this by using unaligned
      memory accesses. Revisit this and the offsets in
      vp8/encoder/x86/sad_sse3.asm in another change to SAVE_XMM.
      
      Change-Id: I5f940994d3ebfd977c3d68446cef20fd78b07877
      c7cfde42
    • Use sub-pixel search's SSE in mode selection · b8f0b599
      Yunqing Wang authored
      Pass the SSE from sub-pixel search back to the pick_inter_mode
      function, where it is compared with encode_breakout to see
      whether evaluation of the remaining modes can be skipped.
      
      Change-Id: I4a86442834f0d1b880a19e21ea52d17d505f941d
      b8f0b599
    • Removed unused timers · e1a8b6c8
      Scott LaVarnway authored
      Change-Id: I209803b9dbed2b2f6d02258fd7a3963a6645f4ab
      e1a8b6c8
  4. 15 Apr, 2011 4 commits
    • Handle long delay between video frames in multi-thread decoder(issue 312) · 8ba58951
      Yunqing Wang authored
      This is reported by m...@hesotech.de (see issue 312):
      "The decoder causes an access violation
      when you decode the first frame, then make a pause of about
      60 seconds and then decode further frames. But only if
      vpx_codec_dec_cfg_t.threads> 1.
      
      This is caused by a timeout of WaitForSingleObject.
      When I change the definition of VPXINFINITE to INFINITE(0xFFFFFFFF),
      the problem is solved."
      
      Reproduced the crash and verified the changes on Windows platform.
      This brings the behavior in line with the other platforms using sem_wait().
      
      Change-Id: I27b32f90bce05846ef2684b50f7a88f292299da1
      8ba58951
    • remove executable bit · f64f425a
      Johann authored
      source files are not executable
      
      Change-Id: Id2c7294695a22217468426423979f68f02d82340
      f64f425a
    • remove dead code, add missing RESTORE_XMM · 487c0299
      Johann authored
      vp8_filter_block1d16_h4_ssse3 was never called
      
      because UNSHADOW_ARGS moves the stack by 'mov rsp, rbp', the issue was
      masked. however, if/when win64 uses those registers for persistent data,
      issues could/will arise.
      
      Change-Id: I56d6effca0aeba1f86082689771cb10145d39651
      487c0299
    • Fix off-by-one in copy_and_extend_plane · a3399291
      John Koleszar authored
      Should only copy h lines, not h+1.
      
      Change-Id: I802a85686635900459c6dc79596189033e5298d8
      a3399291
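The off-by-one described above can be illustrated with a minimal sketch. The helper name `copy_rows` and its signature are hypothetical and are not taken from the repository; the sketch only shows a loop bound that copies exactly h rows.

```c
#include <string.h>

/* Hypothetical helper, not the repository's function: copy exactly `h`
 * rows of `w` bytes from `src` to `dst`, each with its own stride. */
void copy_rows(const unsigned char *src, int src_stride,
               unsigned char *dst, int dst_stride,
               int w, int h)
{
    int row;

    /* `row < h` visits rows 0..h-1; `row <= h` would copy one row too many. */
    for (row = 0; row < h; row++) {
        memcpy(dst, src, (size_t)w);
        src += src_stride;
        dst += dst_stride;
    }
}
```

With a 4-byte stride and h = 2, only the first 8 bytes of the destination are written; a third row would be left untouched.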
  5. 14 Apr, 2011 2 commits
    • Reduce unnecessary distortion computation · 918fb548
      Yunqing Wang authored
      In vp8_pick_inter_mode(), for NEWMV mode, use the error result obtained
      from motion search as distortion. This helps performance in
      real-time mode.
      
      Change-Id: I398c4e46cc5381f7d874e748cf78827ef0e0860c
      918fb548
    • Fix usage of value returned by vp8_pick_intra4x4mby_modes · 8608de1c
      Adrian Grange authored
      The value of distortion2 returned by vp8_pick_intra4x4mby_modes
      was being overwritten by the value returned by get16x16prederror
      before it was tested.
      
      Change-Id: If00e80332b272c5545c3a7e381c8041e8319b41a
      8608de1c
  6. 13 Apr, 2011 4 commits
    • Use consistent delimiters. · 33cefd6f
      Fritz Koenig authored
      opsnr.stt file was using \t for delimiters on everything
      except between VPXSSIM and Time.
      
      Change-Id: I6284c4e40c05ff642bf4b0170dca062c279a42df
      33cefd6f
    • Fixed use of early breakout in vp8_pick_intra4x4mby_modes · 88611746
      Adrian Grange authored
      Index i is used to detect early breakout from the first loop, but
      its value is lost due to reuse in the second for loop. I moved
      the position of the second loop and did some format cleanup.
      
      Change-Id: I02780eae1bd89df4b6c000fb8a018b0837aac2e5
      88611746
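The hazard described above, where a loop index that records an early breakout is clobbered by a later loop, can be sketched as follows. The commit fixed it by moving the second loop; this sketch instead gives the second loop its own counter, which avoids the same problem. The function `scan_costs` and its parameters are hypothetical.

```c
/* Hypothetical illustration, not the repository's code.
 * Returns 1 if all `n` costs stayed under `limit`, 0 if the scan broke out
 * early. `total` receives the sum of all costs. */
int scan_costs(const int *costs, int n, int limit, int *total)
{
    int i, j;
    int sum = 0;

    /* First loop: may break out before reaching n. */
    for (i = 0; i < n; i++) {
        if (costs[i] >= limit)
            break;
    }

    /* Second loop uses its own counter, so `i` still records where the
     * first loop stopped. Reusing `i` here would always leave i == n. */
    for (j = 0; j < n; j++)
        sum += costs[j];

    *total = sum;
    return i == n;
}
```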
    • Refactor lookahead ring buffer · 88841f10
      John Koleszar authored
      This patch cleans up the source buffer storage and copy mechanism to
      allow access through a standard push/pop/peek interface. This approach
      also avoids an extra copy in the case where the source is not a
      multiple of 16, fixing issue #102.
      
      Change-Id: I05808c39f5743625cb4c7af54cc841b9b10fdbd9
      88841f10
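A push/pop/peek interface over a ring buffer, as the message above describes, might look roughly like the sketch below. All names (`struct ring`, `ring_push`, `ring_pop`, `ring_peek`), the capacity, and the use of plain ints instead of frame buffers are assumptions made for illustration; the actual lookahead code is not shown in this log.

```c
#include <stddef.h>

#define RING_CAPACITY 4   /* illustrative size, not taken from the source */

struct ring {
    int items[RING_CAPACITY];
    size_t read_idx;   /* next slot to pop */
    size_t count;      /* number of stored items */
};

/* Append an item; returns 0 on success, -1 if the ring is full. */
int ring_push(struct ring *r, int item)
{
    if (r->count == RING_CAPACITY)
        return -1;
    r->items[(r->read_idx + r->count) % RING_CAPACITY] = item;
    r->count++;
    return 0;
}

/* Remove the oldest item; returns 0 on success, -1 if the ring is empty. */
int ring_pop(struct ring *r, int *out)
{
    if (r->count == 0)
        return -1;
    *out = r->items[r->read_idx];
    r->read_idx = (r->read_idx + 1) % RING_CAPACITY;
    r->count--;
    return 0;
}

/* Read the item `ahead` positions past the oldest without removing it;
 * returns 0 on success, -1 if fewer than `ahead + 1` items are stored. */
int ring_peek(const struct ring *r, size_t ahead, int *out)
{
    if (ahead >= r->count)
        return -1;
    *out = r->items[(r->read_idx + ahead) % RING_CAPACITY];
    return 0;
}
```

Keeping a read index plus a count, rather than separate read and write indices, makes the full and empty states unambiguous without sacrificing a slot.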
    • store quant_shift as an unsigned char · 70f30aa9
      Johann authored
      in encodeframe.c, quant_shift is set to 0 or 1 in vp8cx_invert_quant
      
      only use 8 bits to store this, instead of 16. will allow saving an
      xmm register in an updated version of the regular quantize
      
      Change-Id: Ie88c47fe2aff5af0283dab1147fb2791e4b12f90
      70f30aa9
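One plausible reading of the register saving mentioned above, assuming there is one shift value per coefficient of a 4x4 block (16 values, an assumption not stated in the message): sixteen 16-bit values occupy 32 bytes, while sixteen 8-bit values occupy 16 bytes, which is exactly one 128-bit xmm register. The helper names below are hypothetical.

```c
#include <stdint.h>

#define COEFFS_PER_BLOCK 16   /* a 4x4 block; assumed for illustration */
#define XMM_BYTES 16          /* one xmm register is 128 bits */

/* Number of xmm registers needed to hold `bytes` bytes, rounded up. */
unsigned xmm_registers_needed(unsigned bytes)
{
    return (bytes + XMM_BYTES - 1) / XMM_BYTES;
}

/* Sixteen 0/1 shifts stored as 16-bit values span two registers... */
unsigned regs_for_16bit_shifts(void)
{
    return xmm_registers_needed(COEFFS_PER_BLOCK * (unsigned)sizeof(int16_t));
}

/* ...while the same values stored as 8-bit values fit in one. */
unsigned regs_for_8bit_shifts(void)
{
    return xmm_registers_needed(COEFFS_PER_BLOCK * (unsigned)sizeof(uint8_t));
}
```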
  7. 12 Apr, 2011 2 commits
    • Bugfix for error accumulator stats · e689a27d
      John Koleszar authored
      Prior to commit de4e9e3b, there was an early return in the alt-ref
      case that was inadvertently removed when the function was refactored
      to return void. This patch restores the prior behavior.
      
      Change-Id: I783ffd594a4690297e2742f99526fd7ad67698b2
      e689a27d
    • Fix encoder range check for frame width and height · 1aadcedc
      Attila Nagy authored
      14 bits available in the bitstream => valid range [1..16383]
      Removed unused local vars.
      
      Change-Id: Icf3385e47a9fa13af70053129c2248671f285583
      1aadcedc
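The range in the message above follows from the field width: 14 bits can represent values up to 2^14 - 1 = 16383, and a zero dimension is not usable. A minimal sketch of such a check, with a hypothetical function name and macros that are not taken from the encoder:

```c
#define DIM_BITS 14
#define DIM_MAX ((1u << DIM_BITS) - 1u)   /* 16383 */

/* Returns 1 if `value` is a usable frame dimension, 0 otherwise. */
int dimension_in_range(unsigned int value)
{
    return value >= 1u && value <= DIM_MAX;
}
```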
  8. 11 Apr, 2011 3 commits
  9. 08 Apr, 2011 2 commits
    • Fix input MV for full search · 4b43167a
      Yunqing Wang authored
      Input MV needs to be modified to full-pixel precision.
      
      Change-Id: Ic5d78e41bf27077e325024332b9fe89f76c44f0c
      4b43167a
    • Error accumulator stats bug. · de4e9e3b
      Paul Wilkins authored
      The error accumulator stats values cpi->prediction_error and
      cpi->intra_error were being populated with rd values not
      distortion values.
      
      These are only "currently" used in a limited way for RT compress
      key frame detection.
      
      Change-Id: I2702ba1cab6e49ab8dc096ba75b6b34ab3573021
      de4e9e3b
  10. 07 Apr, 2011 3 commits
    • fixed an overflow in ssim calculation · d4cdb683
      Jim Bankoski authored
      This commit fixes an overflow in the ssim calculation and adds register
      save and restore so that the assembly code works on x64 platforms.
      It also changes the sampling points to every 4x4 instead of 8x8 and
      adjusts the constants in the SSIM calculation to match the scale of
      the previous VPXSSIM.
      
      Change-Id: Ia4dbb8c69eac55812f4662c88ab4653b6720537b
      d4cdb683
    • use asm_offsets with vp8_fast_quantize_b_sse3 · 08702002
      Johann Koenig authored
      on the same order as the sse2 fast quantize change: ~2%
      except for 32bit. only a slight improvement there.
      
      Change-Id: Iff80e5f1ce7e646eebfdc8871405458ff911986b
      08702002
    • Use correct 32 bit comparisons for SAD breakout. · aec5487c
      James Berry authored
      Rax updated to eax to avoid uninitialized memory
      usage.
      
      Change-Id: Iedb953f104329ede2a786fc648a47f1be2f3798a
      aec5487c
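The difference between a 64-bit and a 32-bit comparison can be sketched in C. This assumes the problem was that the upper half of the 64-bit register held unrelated bits while the meaningful value lived in the low 32 bits; the message above does not spell out the mechanism, and the function names are hypothetical.

```c
#include <stdint.h>

/* Compare only the low 32 bits of a 64-bit "register" value against a
 * 32-bit limit, analogous to comparing eax. */
int sad_below_limit(uint64_t reg, uint32_t limit)
{
    uint32_t sad = (uint32_t)reg;  /* discard the upper 32 bits */
    return sad < limit;
}

/* Full-width comparison, analogous to comparing rax: any junk in the
 * upper 32 bits changes the result. */
int sad_below_limit_64(uint64_t reg, uint32_t limit)
{
    return reg < (uint64_t)limit;
}
```

For a value such as 0xDEADBEEF00000010 and a limit of 0x20, the 32-bit comparison sees 0x10 and reports it as below the limit, while the 64-bit comparison does not.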
  11. 06 Apr, 2011 1 commit
  12. 04 Apr, 2011 3 commits
  13. 01 Apr, 2011 3 commits
    • Use full-pixel MV in mvsadcost calculation · 3d681581
      Yunqing Wang authored
      MV sad cost error is only used in full-pixel motion search,
      which only needs full-pixel resolution instead of quarter-pixel
      resolution. This change reduces the mvsadcost table size and
      removes unnecessary parameter passing, since this table is
      constant once it is generated.
      
      Change-Id: I9f931e55f6abc3c99011321f1dfb2f3562e6f6b0
      3d681581
    • tweak vp8_regular_quantize_b_sse2 · 8520b5c7
      Johann authored
      rather than look up rc in the zig zag table, embed it in the macro. this
      also allows us to shuffle some values in the macro and keep *d in rsi
      
      gains of about the same order as the obj_int_extract implementation: ~2%
      
      Change-Id: Ib7252dd10eee66e0af8b0e567426122781dc053d
      8520b5c7
    • Wrapper function removed from vp8_subtract_b_neon function call · cec76a36
      Tero Rintaluoma authored
      Address calculations moved from encodemb_arm.c file to neon
      optimized assembly function to save cycles in function calls.
       - vp8_subtract_b_neon_func replaced with vp8_subtract_b_neon
         that contains all needed address calculations
       - unnecessary file encodemb_arm.c removed
       - consistent with ARMv6 optimized version
      
      Change-Id: I6cbc1a2670b56c2077f59995fcf8f70786b4990b
      cec76a36
  14. 30 Mar, 2011 1 commit
  15. 29 Mar, 2011 3 commits
  16. 28 Mar, 2011 2 commits
    • add asm_enc_offsets.c for all targets · 4be062bb
      Johann authored
      now that we need asm_enc_offsets.c for x86 and arm and it is
      harmless to build it for other targets, add it unconditionally
      
      Change-Id: I320c5220afd94fee2b98bda9ff4e5e34c67062f3
      4be062bb
    • Half pixel variance further optimized for ARMv6 · f5e43346
      Tero Rintaluoma authored
      Half pixel interpolations optimized in variance calculations. Separate
      function calls to vp8_filter_block2d_bil_x_pass_armv6 are avoided. On
      average, performance improvement is 6-7% for VGA@30fps sequences.
      
      Change-Id: Idb5f118a9d51548e824719d2cfe5be0fa6996628
      f5e43346
  17. 24 Mar, 2011 1 commit
    • use asm_offsets with vp8_regular_quantize_b_sse2 · 8edaf6e2
      Johann authored
      remove helper function and avoid shadowing all the arguments to the
      stack on 64bit systems
      
      when running with --good --cpu-used=0:
      ~2% on linux x86 and x86_64
      ~2% on win32 x86 msys and visual studio
      more on darwin10 x86_64
      significantly more on x86_64-win64-vs9
      
      Change-Id: Ib7be12edf511fbf2922f191afd5b33b19a0c4ae6
      8edaf6e2