1. 15 Oct, 2013 2 commits
  2. 12 Oct, 2013 1 commit
  3. 11 Oct, 2013 8 commits
  4. 10 Oct, 2013 5 commits
    • Removing vp9_idct4_1d_sse2 function. · ddf1b762
      Dmitry Kovalev authored
      We have two SSE2-optimized functions for idct4_1d:
        vp9_idct4_1d_sse2 <-- removing this one
        idct4_1d_sse2
      
      vp9_idct4_1d_sse2 was used only by the following functions which already
      have SSE2 optimized variants:
        vp9_idct4x4_16_add_c   -> vp9_idct4x4_16_add_sse2
        idct8_1d               -> vp9_idct8x8_{16, 10, 1}_sse2
        vp9_short_iht4x4_add_c -> vp9_short_iht4x4_add_sse2
      
      Change-Id: Ib0a7f6d1373dbaf7a4a41208cd9d0671fdf15edb
    • d207 intra prediction ssse3 using bytes · 83936e8c
      Scott LaVarnway authored
      Byte version of Ronald's d207 SSSE3 optimizations
      (commit: f891f84d3ba9345b0074e682f0fea09b8ddf4f1e)
      
      Change-Id: If15f71a589ea16f78ac86a501b0c5c6231dc9af1
    • SSE2 8-tap sub-pixel filter optimization · 3fb728c7
      Yunqing Wang authored
      To ensure fast encoding/decoding on devices without SSSE3 support,
      the sub-pixel filters were optimized with SSE2. A test using a 1080p
      clip showed decoder speeds of ~70 fps with the SSSE3 filters, ~60 fps
      with the SSE2 filters, and ~15 fps with the C filters.
      
      Change-Id: Ie2088f87d83a889fba80a613e4d0e287aadd785c
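For reference, the scalar work that the SIMD versions speed up can be sketched as below. This is a minimal illustration of an 8-tap horizontal filter, assuming 7-bit kernels (taps summing to 128) with rounding and clamping to 8 bits; the function names are hypothetical and this is not the libvpx C code.

```c
#include <assert.h>
#include <stdint.h>

#define TAPS 8
#define FILTER_BITS_SKETCH 7 /* assumption: taps sum to 1 << 7 */

/* Clamp an intermediate sum to the 8-bit pixel range. */
static uint8_t clip_pixel_sketch(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Filters `width` output pixels; reads src[0 .. width + TAPS - 2].
 * Hypothetical name, written for illustration only. */
static void filter_row_8tap(const uint8_t *src, uint8_t *dst, int width,
                            const int16_t kernel[TAPS]) {
  assert(src != 0 && dst != 0 && width >= 0);
  for (int x = 0; x < width; ++x) {
    int sum = 0;
    for (int k = 0; k < TAPS; ++k) sum += src[x + k] * kernel[k];
    /* Round to nearest, then drop the 7 fractional bits. Negative sums
     * rely on arithmetic right shift and are clamped to 0 afterwards. */
    sum = (sum + (1 << (FILTER_BITS_SKETCH - 1))) >> FILTER_BITS_SKETCH;
    dst[x] = clip_pixel_sketch(sum);
  }
}
```

Each output pixel costs eight multiply-adds in this form, which is the inner loop that SSE2 and SSSE3 implementations process several pixels at a time.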
    • Giving consistent names to IDCT 32x32 functions. · 1e766b50
      Dmitry Kovalev authored
      Renames:
        vp9_short_idct32x32_add   -> vp9_idct32x32_1024_add
        vp9_short_idct32x32_1_add -> vp9_idct32x32_1_add
        vp9_idct_add_32x32        -> vp9_idct32x32_add
      
      Change-Id: Id85306f5814bac6c47463a6b5901a93082510666
    • Adding const to several pointers. · d9d7040e
      Dmitry Kovalev authored
      Change-Id: I7231589bda71d0d23c730283febd5bb58585a0da
  5. 09 Oct, 2013 2 commits
  6. 08 Oct, 2013 2 commits
    • All zero coeff skip in IDCT 32x32 · 6594ca88
      Jingning Han authored
      When all coefficients are zero, skip the corresponding 1-D inverse
      transform. This practice is already used in the SSE2 implementation of
      the inverse 32x32 DCT. This commit imports the algorithm into the C code.
      
      Change-Id: I0f58bfcb183a569fab85d524d5d9cf8ae8653f86
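The skip described above can be sketched as follows. This is a minimal illustration, not the libvpx code: `toy_inverse_1d` is a hypothetical stand-in (a prefix sum) for the real 1-D inverse DCT, chosen only because, like any linear transform, it maps an all-zero row to an all-zero row, which is what makes the shortcut safe.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define N 32

/* Hypothetical stand-in for the 1-D inverse transform (NOT the VP9 IDCT):
 * a running prefix sum, which is linear, so zeros in give zeros out. */
static void toy_inverse_1d(const int16_t *in, int16_t *out) {
  int16_t acc = 0;
  for (int i = 0; i < N; ++i) {
    acc = (int16_t)(acc + in[i]);
    out[i] = acc;
  }
}

/* Row pass with the all-zero shortcut. Returns how many rows actually
 * went through the 1-D transform. */
static int inverse_rows_skip_zero(const int16_t *input, int16_t *output) {
  int transformed = 0;
  assert(input != NULL && output != NULL);
  for (int r = 0; r < N; ++r) {
    const int16_t *row = input + r * N;
    int nonzero = 0;
    for (int c = 0; c < N; ++c) nonzero |= row[c];
    if (nonzero) {
      toy_inverse_1d(row, output + r * N);
      ++transformed;
    } else {
      /* All-zero row: the result is known to be zero, so skip the math. */
      memset(output + r * N, 0, N * sizeof(*output));
    }
  }
  return transformed;
}
```

For sparse blocks, where most rows carry no coefficients, the check is far cheaper than the transform it avoids.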
    • Removing inv_txm4x4_1_add and inv_txm4x4_add function pointers. · c983c966
      Dmitry Kovalev authored
      We already have an itxm_add member in the MACROBLOCKD structure. Both
      inv_txm4x4_1_add and inv_txm4x4_add are just its special cases for
      different eob values, and the eob logic is already implemented in
      vp9_iwht4x4_add and vp9_idct4x4_add (which is why
      inverse_transform_b_4x4_add is also removed).
      
      Change-Id: I80bec9b6f7d40c5e5033c613faca5c819c3e6326
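The eob-based dispatch that makes the two extra function pointers redundant might look roughly like the sketch below. The stubs and the `eob > 1` threshold are assumptions for illustration; the real vp9_idct4x4_1_add and vp9_idct4x4_16_add reconstruct pixels, whereas these stubs only record which path was chosen.

```c
#include <assert.h>
#include <stdint.h>

/* Records which hypothetical stub ran last. */
enum { PATH_NONE = 0, PATH_DC_ONLY = 1, PATH_FULL = 16 };
static int last_path = PATH_NONE;

/* Stub for the DC-only variant (at most one coefficient is non-zero). */
static void idct4x4_1_add_stub(const int16_t *input, uint8_t *dest,
                               int stride) {
  (void)input; (void)dest; (void)stride;
  last_path = PATH_DC_ONLY;
}

/* Stub for the full 16-coefficient variant. */
static void idct4x4_16_add_stub(const int16_t *input, uint8_t *dest,
                                int stride) {
  (void)input; (void)dest; (void)stride;
  last_path = PATH_FULL;
}

/* Wrapper in the spirit of vp9_idct4x4_add: eob is the position just past
 * the last non-zero coefficient in scan order, so eob <= 1 means only the
 * DC coefficient can be non-zero. The threshold is an assumption. */
static void idct4x4_add_sketch(const int16_t *input, uint8_t *dest,
                               int stride, int eob) {
  assert(eob >= 0 && eob <= 16);
  if (eob > 1)
    idct4x4_16_add_stub(input, dest, stride);
  else
    idct4x4_1_add_stub(input, dest, stride);
}
```

With the choice made inside one wrapper, callers need a single function pointer and pass eob along, rather than selecting between per-eob pointers themselves.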
  7. 07 Oct, 2013 7 commits
  8. 06 Oct, 2013 1 commit
    • Giving consistent names to IDCT 8x8 functions. · c6ad70d5
      Dmitry Kovalev authored
      Renames:
        vp9_short_idct8x8_add    -> vp9_idct8x8_64_add
        vp9_short_idct8x8_1_add  -> vp9_idct8x8_1_add
        vp9_short_idct8x8_10_add -> vp9_idct8x8_10_add
        vp9_idct_add_8x8         -> vp9_idct8x8_add
      
      Change-Id: Ifb8d3a45b4c0397aa805b30463f3d14581bf72c1
  9. 04 Oct, 2013 3 commits
    • Cleaning up foreach_predicted_block_in_plane() function. · ee74054e
      Dmitry Kovalev authored
      Change-Id: Ibb3d9667eba56621667412f62097aa7a392659c2
    • Giving consistent names to IDCT/IWHT functions. · 3a060257
      Dmitry Kovalev authored
      The idea is to have the following names for each transform size:
      
      vp9_idct4x4_add
        vp9_idct4x4_1_add
        vp9_idct4x4_10_add
        vp9_idct4x4_16_add
      
      vp9_idct8x8_add
        vp9_idct8x8_1_add
        vp9_idct8x8_10_add
        vp9_idct8x8_64_add
      
      etc for 16x16, 32x32
      
      The actual list of renames in this patch:
      
      vp9_idct_add_lossless     -> vp9_iwht4x4_add
      vp9_short_iwalsh4x4_add   -> vp9_iwht4x4_16_add
      vp9_short_iwalsh4x4_1_add -> vp9_iwht4x4_1_add
      
      vp9_idct_add            -> vp9_idct4x4_add
      vp9_short_idct4x4_add   -> vp9_idct4x4_16_add
      vp9_short_idct4x4_1_add -> vp9_idct4x4_1_add
      
      Change-Id: I6f43f7437c68dd30cdd05d72e213765578ed30b1
    • Adding vp9_get_filter_kernel() function. · 9ec09700
      Dmitry Kovalev authored
      Moving INTERPOLATIONFILTERTYPE enum and subpix_fn_table struct to
      vp9_filter.h. Adding convenient typedef for subpel kernels.
      
      Besides setting xd->subpix.filter_x and xd->subpix.filter_y,
      vp9_setup_interp_filters() has the side effect of also setting scale
      factors. This is not required inside decode_modes_b() because the scale
      factors have already been set by set_ref() calls. That's why the
      vp9_setup_interp_filters() call is replaced with a call to the newly
      created vp9_get_filter_kernel(). The behavior of
      vp9_setup_interp_filters() is unchanged (it is used by the encoder).
      
      Change-Id: I3f36d3f7cd8d15195a6e2fafd1777cdaf9ecb847
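The difference between a pure kernel lookup and a setup call with side effects can be sketched as below. All names, the two-entry tables, and the tap values are hypothetical placeholders, not the VP9 filter tables or the actual libvpx signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SUBPEL_TAPS_SKETCH 8

/* Convenience typedef for one 8-tap kernel (name is an assumption). */
typedef int16_t subpel_kernel_sketch[SUBPEL_TAPS_SKETCH];

typedef enum { FILTER_REGULAR, FILTER_SHARP } filter_type_sketch;

/* Placeholder tables with two sub-pixel positions each. */
static const subpel_kernel_sketch regular_kernels[2] = {
  { 0, 0, 0, 128, 0, 0, 0, 0 },
  { 0, 0, 0, 64, 64, 0, 0, 0 }
};
static const subpel_kernel_sketch sharp_kernels[2] = {
  { 0, 0, 0, 128, 0, 0, 0, 0 },
  { 0, 0, -8, 72, 72, -8, 0, 0 }
};

/* Pure lookup: returns a table and touches no other state. */
static const subpel_kernel_sketch *get_filter_kernel_sketch(
    filter_type_sketch type) {
  switch (type) {
    case FILTER_SHARP: return sharp_kernels;
    case FILTER_REGULAR:
    default: return regular_kernels;
  }
}

typedef struct {
  const subpel_kernel_sketch *filter_x;
  const subpel_kernel_sketch *filter_y;
  int x_scale; /* stands in for the scale factors */
} subpix_sketch;

/* Setup call: selects kernels AND overwrites the scale factor. */
static void setup_interp_filters_sketch(subpix_sketch *s,
                                        filter_type_sketch type, int scale) {
  assert(s != NULL);
  s->x_scale = scale; /* the side effect the decoder path does not need */
  s->filter_x = s->filter_y = get_filter_kernel_sketch(type);
}
```

A caller that has already configured its scale factors can assign the lookup result to its filter pointers directly and leave the rest of the state alone.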
  10. 03 Oct, 2013 2 commits
    • Change b_mode_info definition from union to struct · 4093192e
      Jingning Han authored
      This commit defines b_mode_info as a struct type. This will allow
      us to further remove the use of PARTITION_INFO in the encoding process.
      
      Change-Id: I975b0f7d557b5e0f66545a61b472def76b671cce
    • Rewrite HORIZx4 and HORIZx8 in subpixel filter functions · ed22179a
      Yunqing Wang authored
      In subpixel filters, prefetched source data, unrolled loops,
      and interleaved instructions.
      
      In HORIZx4, integrated the idea in Scott's CL (commit:
      d22a504d), which was suggested by Erik/Tamar from Intel. Further
      tweaking was done to combine rows 0 and 2, and rows 1 and 3, in
      registers to do more 2-row-in-1 operations until the last add.
      
      Test showed a ~2% decoder speedup.
      
      Change-Id: Ib53d04ede8166c38c3dc744da8c6f737ce26a0e3
  11. 02 Oct, 2013 5 commits
  12. 01 Oct, 2013 2 commits
    • Making decode_modes_b function more straightforward. · aeb603f2
      Dmitry Kovalev authored
      Moving out decode_tokens function calls and adding decode_blocks boolean
      variable. We only have to decode if eobtotal > 0, i.e. we have at least one
      non-zero coefficient. Also inlining and removing the
      vp9_set_pred_flag_mbskip function.
      
      Change-Id: I7be38b12ee8206faf0beea2bbf4d52be42575b03
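The eobtotal test can be sketched as follows. The helper names and the per-block eob array are assumptions made for illustration; only the decode_blocks variable and the `eobtotal > 0` condition come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: eobs[i] is the end-of-block position of block i,
 * which is 0 when that block has no non-zero coefficients. */
static int sum_eobs(const int *eobs, int num_blocks) {
  int eobtotal = 0;
  assert(num_blocks >= 0);
  for (int i = 0; i < num_blocks; ++i) eobtotal += eobs[i];
  return eobtotal;
}

/* Returns true when residual decoding (inverse transform and add) is
 * needed, i.e. at least one coefficient is non-zero. */
static bool need_decode_blocks(const int *eobs, int num_blocks) {
  const int eobtotal = sum_eobs(eobs, num_blocks);
  const bool decode_blocks = eobtotal > 0;
  return decode_blocks;
}
```

When the flag is false, the prediction alone is the reconstructed block, so the inverse transform stage can be skipped entirely.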
    • Modify HORIZx16 macro in subpixel filter functions · df8e1564
      Yunqing Wang authored
      Interleaved the instructions, reduced register dependency, and
      prefetched the source data. This improved the decoder speed
      by 0.6% - 2%.
      
      Change-Id: I568067aa0c629b2e58219326899c82aedf7eccca