1. 17 Feb, 2016 8 commits
  2. 16 Feb, 2016 2 commits
  3. 12 Feb, 2016 1 commit
  4. 04 Feb, 2016 1 commit
  5. 28 Jan, 2016 1 commit
    • Fix some typos. · 5afc4e4c
      hui su authored
      Change-Id: I32aacd014df6c927cf2893dc096cbe6ec7604b9b
  6. 20 Jan, 2016 1 commit
  7. 19 Jan, 2016 1 commit
    • Loop filter search resets on overlay frame. · 733bbab5
      paulwilkins authored
      This patch fixes a bug that causes the loop filter search to reset to
      a low value or zero after each arf overlay frame. We expect the overlay
      frames to need little or no loop filtering but this should not propagate.
      
      Change-Id: I895b28474cf200f20d82793f3de40b60b19579fd
  8. 13 Jan, 2016 2 commits
  9. 12 Jan, 2016 1 commit
    • VP9: Eliminate unnecessary nearest/near searches · d8aa4063
      Scott LaVarnway authored
      Prior to this patch, read_inter_block_mode_info() would
      find the nearmv and nearestmv for all modes.  Now it does not
      search for ZEROMV modes and breaks out early for NEARMV and
      NEWMV modes.
      
      Change-Id: Ifa7b1eaf58bb03b9c7792ea5012fef477527d0fd
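The early exit described in this commit can be sketched as follows. This is only an illustration: `scan_candidates`, the `CANDIDATES_NEEDED` table, and the number of candidates each mode is assumed to need are hypothetical and are not taken from read_inter_block_mode_info().

```python
# Assumption for illustration: how many distinct candidate MVs a mode needs
# before the scan can stop. Modes not listed here scan every neighbor.
CANDIDATES_NEEDED = {"ZEROMV": 0, "NEWMV": 1, "NEARMV": 2}


def scan_candidates(mode, neighbors):
    """Collect distinct neighbor MVs, stopping once `mode` is satisfied.

    Returns (found, visited): the distinct MVs collected and the number of
    neighbors that were examined.
    """
    needed = CANDIDATES_NEEDED.get(mode)  # None means "no early exit"
    found, visited = [], 0
    if needed == 0:
        return found, visited  # ZEROMV: no search at all
    for mv in neighbors:
        visited += 1
        if mv not in found:
            found.append(mv)
            if needed is not None and len(found) == needed:
                break  # early exit: the remaining neighbors are irrelevant
    return found, visited
```

With four neighbors, a mode with no entry in the table visits all four, while the modes listed stop as soon as they have what they need.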
  10. 05 Jan, 2016 2 commits
    • Assert no mv clamping for scaled references · 2bd4f444
      Yaowu Xu authored
      Under --enable-better-hw-compatibility, this commit adds asserts
      that no mv clamping is applied for scaled references, so when built
      with this configure option, the decoder will assert if an input
      bitstream triggers mv clamping for scaled reference frames.
      
      Change-Id: I786e86a2bbbfb5bc2d2b706a31b0ffa8fe2eb0cb
    • Assert no 8x4/4x8 partition for scaled references · 03a021a6
      Yaowu Xu authored
      This commit adds a new configure option:
      
      --enable-better-hw-compatibility
      
      The purpose of the configure option is to provide information on known
      hardware decoder implementation bugs, so encoder implementers may
      choose to implement their encoders in a way to avoid triggering these
      decoder bugs.
      
      The WebM team was made aware that a number of hardware decoders
      have trouble handling the combination of scaled reference frames
      and 8x4 or 4x8 partitions. This commit adds asserts to the vp9
      decoder, so when built with the above configure option, the decoder
      asserts if an input bitstream triggers such a decoder bug.
      
      Change-Id: I386204cfa80ed16b50ebde57f886121ed76200bf
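A check of this kind, active only when the option is enabled, might look roughly like the sketch below. The function name, its parameters, and the `BETTER_HW_COMPATIBILITY` flag are hypothetical stand-ins for the build-time option and are not the decoder's actual code.

```python
# Hypothetical stand-in for building with --enable-better-hw-compatibility.
BETTER_HW_COMPATIBILITY = True


def check_partition(block_w, block_h, ref_is_scaled,
                    enabled=BETTER_HW_COMPATIBILITY):
    """Fail loudly if a scaled reference is combined with 8x4 or 4x8."""
    if not enabled:
        return  # without the option, the check is compiled out
    if ref_is_scaled and (block_w, block_h) in {(8, 4), (4, 8)}:
        raise AssertionError(
            f"{block_w}x{block_h} partition used with a scaled reference")
```

The check is diagnostic only: it reports bitstreams that would trip the affected hardware decoders and does not change how a conforming stream is decoded.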
  11. 14 Dec, 2015 1 commit
  12. 09 Dec, 2015 1 commit
  13. 02 Dec, 2015 1 commit
  14. 25 Nov, 2015 1 commit
    • add vp9_satd_neon · eb1d0f8d
      James Zern authored
      ~60-65% faster at the function level across block sizes
      
      Change-Id: Iaf8cbe95731c43fdcbf68256e44284ba51a93893
  15. 20 Nov, 2015 2 commits
    • fix vp9_satd_sse2 · 60760f71
      James Zern authored
      accumulate satd in 32-bits
      + add unit test
      
      Change-Id: I6748183df3662ddb9d635f9641f9586f2fd38ad5
    • vp9_satd: return an int · 3e0138ed
      James Zern authored
      the final sum may use up to 26 bits
      
      + add a unit test
      + disable the sse2 version as the result will roll over; this will be
      fixed in a future commit
      
      Change-Id: I2a49811dfaa06abfd9fa1e1e65ed7cd68e4c97ce
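The 26-bit figure can be reproduced with a short calculation. The sketch assumes the worst case is 1024 coefficients (a 32x32 block), each a 16-bit signed value; that input size is an assumption, not something the commit message states. The 16-bit accumulator is used purely to illustrate how a narrow sum wraps.

```python
def satd(coeffs):
    """Sum of absolute values of the transform coefficients."""
    return sum(abs(c) for c in coeffs)


def wrap16(x):
    """Emulate storing x in a signed 16-bit accumulator."""
    return (x + (1 << 15)) % (1 << 16) - (1 << 15)


INT16_MIN = -(1 << 15)

# Assumed worst case: 1024 coefficients, all at the most negative int16.
worst_case = satd([INT16_MIN] * 1024)
bits_needed = worst_case.bit_length()
```

Under these assumptions `worst_case` is 2^25, which takes 26 bits to represent, so a 32-bit return type has ample headroom while a 16-bit one does not.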
  16. 13 Nov, 2015 1 commit
    • Changes to exhaustive motion search. · 0149fb3d
      paulwilkins authored
      This change alters the nature and use of exhaustive motion search.
      
      Firstly any exhaustive search is preceded by a normal step search.
      The exhaustive search is only carried out if the distortion resulting
      from the step search is above a threshold value.
      
      Secondly the simple +/- 64 exhaustive search is replaced by a
      multi stage mesh based search where each stage has a range
      and step/interval size. Subsequent stages use the best position from
      the previous stage as the center of the search but use a reduced range
      and interval size.
      
      For example:
        stage 1: Range +/- 64 interval 4
        stage 2: Range +/- 32 interval 2
        stage 3: Range +/- 15 interval 1
      
      This process, especially when it follows on from a normal step
      search, has shown itself to be almost as effective as a full range
      exhaustive search with step 1 but greatly lowers the computational
      complexity such that it can be used in some cases for speeds 0-2.
      
      This patch also removes a double exhaustive search for sub 8x8 blocks
      which also contained a bug (the two searches used different distortion
      metrics).
      
      In best quality mode, on my test animation sequence, this patch has
      almost no impact on quality but improves encode speed by more than 5X.
      
      Restricted use in good quality speeds 0-2 yields significant quality gains
      on the animation test of 0.2 - 0.5 dB with only a small impact on encode
      speed. On most clips though the quality gain and speed impact are small.
      
      Change-Id: Id22967a840e996e1db273f6ac4ff03f4f52d49aa
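The two ideas in this commit, a thresholded gate in front of the exhaustive search and a multi-stage mesh, can be sketched as below. This is a minimal illustration, not the encoder's implementation: `mesh_search`, `search`, the `cost` callback, and the `step_search` callback are hypothetical names, and only the stage table comes from the example in the message.

```python
def mesh_search(cost, center, stages=((64, 4), (32, 2), (15, 1))):
    """Multi-stage mesh search; each stage is (range, interval)."""
    best = center
    best_cost = cost(best)
    for rng, step in stages:
        cx, cy = best  # each stage is centered on the previous best position
        for dy in range(-rng, rng + 1, step):
            for dx in range(-rng, rng + 1, step):
                cand = (cx + dx, cy + dy)
                c = cost(cand)
                if c < best_cost:
                    best, best_cost = cand, c
    return best, best_cost


def search(cost, start, step_search, threshold):
    """Run the normal step search first; go exhaustive only if it did badly."""
    pos = step_search(cost, start)
    if cost(pos) > threshold:
        pos, _ = mesh_search(cost, pos)
    return pos
```

With these three stages the mesh visits 33 x 33 + 33 x 33 + 31 x 31 = 3139 positions, against 129 x 129 = 16641 for a full +/- 64 search at interval 1.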
  17. 11 Nov, 2015 1 commit
    • Add AVX vectorized vp9_diamond_search_sad · 5eefd3eb
      Geza Lore authored
      This function now has an AVX intrinsics version which is about 80%
      faster compared to the C implementation. This provides a 2-4% total
      speed-up for encode, depending on encoding parameters. The function
      utilizes 3 properties of the cost function lookup table, constructed
      in 'cal_nmvjointsadcost' and 'cal_nmvsadcosts'.
      For the joint cost:
        - mvjointsadcost[1] == mvjointsadcost[2] == mvjointsadcost[3]
      For the component costs:
        - For all i: mvsadcost[0][i] == mvsadcost[1][i]
              (equal per component cost)
        - For all i: mvsadcost[0][i] == mvsadcost[0][-i]
              (Cost function is even)
      These must hold, otherwise the AVX version of the function cannot be used.
      
      Change-Id: I6c2791d43022822a9e6ab43cd124a773946d0bdc
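The three preconditions listed in this commit can be checked mechanically. The sketch below is an illustration only: `avx_preconditions_hold` and `max_mv` are hypothetical names, and dictionaries keyed by the signed component value stand in for C pointers into the middle of a cost array.

```python
def avx_preconditions_hold(mvjointsadcost, mvsadcost, max_mv):
    """Return True if the cost tables satisfy all three properties.

    mvjointsadcost: sequence of 4 joint costs.
    mvsadcost: two dicts mapping a component value i in [-max_mv, max_mv]
               to its cost.
    """
    # Joint cost: entries 1, 2 and 3 must be identical.
    joint_ok = mvjointsadcost[1] == mvjointsadcost[2] == mvjointsadcost[3]
    # Component cost: both components use the same cost for every i.
    same_per_component = all(
        mvsadcost[0][i] == mvsadcost[1][i]
        for i in range(-max_mv, max_mv + 1))
    # Component cost: the cost function is even, cost(i) == cost(-i).
    even = all(
        mvsadcost[0][i] == mvsadcost[0][-i]
        for i in range(1, max_mv + 1))
    return joint_ok and same_per_component and even
```

A caller could run a check like this once after building the tables and fall back to the C version of the search when it returns False.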
  18. 06 Nov, 2015 1 commit
    • Revert "Add AVX vectorized vp9_diamond_search_sad" · 30466f26
      James Zern authored
      This reverts commit f1342a7b.
      
      This breaks 32-bit builds:
       runtime error: load of misaligned address 0xf72fdd48 for type 'const
      __m128i' (vector of 2 'long long' values), which requires 16 byte
      alignment
      
      + _mm_set1_epi64x is incompatible with some versions of Visual Studio
      
      Change-Id: I6f6fc3c11403344cef78d1c432cdc9147e5c1673
  19. 05 Nov, 2015 1 commit
    • Add AVX vectorized vp9_diamond_search_sad · f1342a7b
      Geza Lore authored
      This function now has an AVX intrinsics version which is about 80%
      faster compared to the C implementation. This provides a 2-4% total
      speed-up for encode, depending on encoding parameters. The function
      utilizes 3 properties of the cost function lookup table, constructed
      in 'cal_nmvjointsadcost' and 'cal_nmvsadcosts'.
      For the joint cost:
        - mvjointsadcost[1] == mvjointsadcost[2] == mvjointsadcost[3]
      For the component costs:
        - For all i: mvsadcost[0][i] == mvsadcost[1][i]
              (equal per component cost)
        - For all i: mvsadcost[0][i] == mvsadcost[0][-i]
              (Cost function is even)
      These must hold, otherwise the AVX version of the function cannot be used.
      
      Change-Id: I184055b864c5a2dc37b2d8c5c9012eb801e9daf6
  20. 03 Nov, 2015 1 commit
  21. 02 Nov, 2015 1 commit
  22. 29 Oct, 2015 1 commit
  23. 28 Oct, 2015 1 commit
  24. 26 Oct, 2015 1 commit
  25. 21 Oct, 2015 1 commit
    • Optimize vp9_highbd_block_error_8bit assembly. · aa8f8522
      Geza Lore authored
      A new version of vp9_highbd_block_error_8bit is now available which is
      optimized with AVX assembly. AVX itself does not buy us too much, but
      the non-destructive 3-operand encoding of the 128-bit SSEn integer
      instructions helps to eliminate move instructions. The Sandy Bridge
      micro-architecture cannot eliminate move instructions in the processor
      front end, so AVX will help on these machines.
      
      Two further optimizations are applied:
      
      1. The common case of computing block error on 4x4 blocks is optimized
      as a special case.
      2. All arithmetic is speculatively done on 32 bits only. At the end of
      the loop, the code detects if overflow might have happened and if so,
      the whole computation is re-executed using higher precision arithmetic.
      This case however is extremely rare in real use, so we can achieve a
      large net gain here.
      
      The optimizations rely on the fact that the coefficients are in the
      range [-(2^15-1), 2^15-1], and that the quantized coefficients always
      have the same sign as the input coefficients (in the worst case they are
      0). These are the same assumptions that the old SSE2 assembly code for
      the non high bitdepth configuration relied on. The unit tests have been
      updated to take this constraint into consideration when generating test
      input data.
      
      Change-Id: I57d9888a74715e7145a5d9987d67891ef68f39b7
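The speculative 32-bit path with a high-precision fallback can be sketched as follows. The message does not say how the assembly detects a possible overflow, so the conservative bound used here (element count times the largest squared difference) is an assumption, as are the names `block_error` and `wrap32`. Python integers stand in for the wider arithmetic of the fallback.

```python
INT32_MAX = (1 << 31) - 1
COEFF_MAX = (1 << 15) - 1  # coefficients lie in [-COEFF_MAX, COEFF_MAX]


def wrap32(x):
    """Emulate a signed 32-bit accumulator that wraps on overflow."""
    return (x + (1 << 31)) % (1 << 32) - (1 << 31)


def block_error(coeff, dqcoeff):
    """Sum of squared differences; returns (error, used_fallback)."""
    acc32, max_abs, count = 0, 0, 0
    for c, d in zip(coeff, dqcoeff):
        diff = c - d  # same sign, so |diff| <= COEFF_MAX
        acc32 = wrap32(acc32 + diff * diff)  # each term is below 2^30
        max_abs = max(max_abs, abs(diff))
        count += 1
    # After the loop: could the narrow accumulator have overflowed?
    if count * max_abs * max_abs <= INT32_MAX:
        return acc32, False  # fast path result is exact
    # Rare case: redo the whole computation with wide arithmetic.
    wide = sum((c - d) ** 2 for c, d in zip(coeff, dqcoeff))
    return wide, True
```

Because a quantized coefficient shares the sign of its input, the difference never exceeds the coefficient range, so every squared term fits in 32 bits and only the running sum can overflow.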
  26. 16 Oct, 2015 1 commit
  27. 08 Oct, 2015 1 commit
    • Optimization of 8bit block error for high bitdepth · 0134764f
      Geza Lore authored
      If high bit depth configuration is enabled, but encoding in profile 0,
      the code now falls back on optimized SSE2 assembler to compute the
      block errors, similar to when high bit depth is not enabled.
      
      Change-Id: I471d1494e541de61a4008f852dbc0d548856484f
  28. 06 Oct, 2015 1 commit
  29. 01 Oct, 2015 1 commit
    • Small cleanup · 06bdc7f6
      hui su authored
      Change-Id: I5aeaa94b743f84738d288f8b027fec4c164f2ec3