1. 12 Aug, 2013 2 commits
    • SSE2 high precision 32x32 forward DCT · 78136edc
      Jingning Han authored
      Enable the SSE2 implementation of the high precision 32x32 forward DCT.
      The intermediate stacks are 32 bits wide. The run-time goes down from
      32126 cycles to 13442 cycles.
      
      Change-Id: Ib5ccafe3176c65bd6f2dbdef790bd47bbc880e56
    • Removing foreach_predicted_block_uv function. · 76d166e4
      Dmitry Kovalev authored
      Adding function build_inter_predictors_for_planes to build inter
      predictors for specified planes. This function makes it possible to remove
      the "#if CONFIG_ALPHA" condition and use MAX_MB_PLANE for the general case.
      Renaming 'which_mv' local var to 'ref', and 'weight' argument to 'ref'.
      
      Change-Id: I1a97160c9263006929d38953f266bc68e9c56c7d
  2. 10 Aug, 2013 1 commit
  3. 09 Aug, 2013 3 commits
  4. 08 Aug, 2013 1 commit
  5. 07 Aug, 2013 1 commit
    • Adding ss_size_lookup table. · 8db2675b
      Dmitry Kovalev authored
      Removing the old bsize_from_dim_lookup table. We can now determine the
      block size for a plane from its subsampling values (ss_size_lookup), and
      then find the number of pixels in the block (num_pels_log2_lookup).
      
      Change-Id: I6fc981da2ae093de81741d3d78eaefed11015db9
  6. 06 Aug, 2013 1 commit
    • Using only one scale function in scale_factors struct. · 1c552e79
      Dmitry Kovalev authored
      Functions scale_mv_q4 and scale_mv_q3_to_q4 were almost identical except
      for the q3->q4 conversion in scale_mv_q3_to_q4. Now the q3->q4 conversion
      happens directly in vp9_build_inter_predictor.
      
      Also adding useful constants: SUBPEL_BITS and SUBPEL_MASK.
      
      Change-Id: Ia0a6ad2ac07c45fdf95a5139ece6286c035e9639
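
The q3->q4 conversion and the sub-pixel constants mentioned above can be sketched roughly as follows. This is only an illustration: the DEMO_ names, the helper functions, and the constant values (4 fractional bits, i.e. 1/16-pel precision) are assumptions, not the definitions used in the commit.

```c
#include <assert.h>

/* Assumed values for illustration: 4 fractional bits (1/16-pel, "q4"). */
#define DEMO_SUBPEL_BITS 4
#define DEMO_SUBPEL_MASK ((1 << DEMO_SUBPEL_BITS) - 1)

/* q3 (1/8-pel) to q4 (1/16-pel): one extra fractional bit, so multiply by 2.
 * Multiplication is used instead of a shift so negative values stay defined. */
static int demo_q3_to_q4(int mv_q3) {
  return mv_q3 * 2;
}

/* Whole-pixel part of a non-negative q4 position. */
static int demo_whole_pel(int pos_q4) {
  assert(pos_q4 >= 0);
  return pos_q4 >> DEMO_SUBPEL_BITS;
}

/* Fractional (sub-pixel) part of a non-negative q4 position. */
static int demo_subpel(int pos_q4) {
  assert(pos_q4 >= 0);
  return pos_q4 & DEMO_SUBPEL_MASK;
}
```

With these assumed values, a q4 position of 37 splits into 2 whole pixels and a sub-pixel offset of 5/16.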
  7. 05 Aug, 2013 2 commits
  8. 02 Aug, 2013 3 commits
  9. 30 Jul, 2013 1 commit
  10. 25 Jul, 2013 1 commit
  11. 24 Jul, 2013 1 commit
  12. 22 Jul, 2013 1 commit
    • Optimize operation flow in sub8x8 rd loop · 409e77f2
      Jingning Han authored
      Stack the rate-distortion statistics in the sub8x8 rd loop. This allows
      the encoder to skip the forward transform, quantization, and coeff cost
      estimation, in the sub8x8 rd optimization search, if the motion
      vector(s) are of integer pixel value, and have been tested in the
      previous prediction filter type rd loops of the same block.
      
      This gives about 2% speed-up for bus_cif at 2000 kbps, for speed 0.
      Its efficacy depends on how frequently the motion search selects an
      integer motion vector.
      
      Change-Id: Iee15d4283ad4adea05522c1d40b198b127e6dd97
  13. 19 Jul, 2013 3 commits
  14. 18 Jul, 2013 2 commits
    • Using VP9_REF_NO_SCALE instead of (1 << VP9_REF_SCALE_SHIFT). · 0b562b2d
      Dmitry Kovalev authored
      Change-Id: Ide58a74d31ff948319445a6337d2c05e98720e34
    • Merge scale_factors and scale_factors_uv. · 5ebe503f
      Ronald S. Bultje authored
      This prevents a duplicate memcpy of a 128-byte struct every time
      set_scale_factors() is called (which is a lot), thus leading to a
      decrease from 3.7 MB to 1.85 MB of struct copying per 64x64 block
      RD/partition loop.
      
      Overall, this decreases encoding time of the first 50 frames of bus
      @ 1500kbps (speed 0) from 1min5.9 to 1min4.9, i.e. about a 1.5%
      overall speedup. We can likely get more gains by removing the copy
      of the other struct (and replacing it with an indexing) as well.
      
      Change-Id: I3dceb7e79f71e6fe911b11cc994cf89a869dde7a
  15. 16 Jul, 2013 3 commits
  16. 14 Jul, 2013 1 commit
  17. 12 Jul, 2013 2 commits
    • vp9: consistent 'log2' variable naming · 0195fb53
      James Zern authored
      lg2 -> log2
      
      Change-Id: I0602ddff49e42c9c40c29c084d04b7592b9f8edf
    • Some minor cleanups for efficiency · 94c481f9
      Deb Mukherjee authored
      Implements some of the helper functions more efficiently with
      lookups rather than branches. The modeling function is consolidated
      to reduce some computations.
      
      Also merged the two enums BLOCK_SIZE_TYPES and BlockSize into
      one because there is no need to keep them separate (even though
      the semantics are a little different).
      
      No bitstream or output change.
      
      About 0.5% speedup
      
      Change-Id: I7d71a66e8031ddb340744dc493f22976052b8f9f
  18. 11 Jul, 2013 1 commit
    • Moving segmentation related vars into separate struct. · c4ad3273
      Dmitry Kovalev authored
      Adding segmentation struct to vp9_seg_common.h. Struct members are from
      macroblockd and VP9Common structs. Moving segmentation related constants
      and enums to vp9_seg_common.h.
      
      Change-Id: I23fabc33f11a359249f5f80d161daf569d02ec03
  19. 10 Jul, 2013 4 commits
    • remove warnings when NDEBUG is set · 6591cf2f
      Jim Bankoski authored
      Change-Id: Ie0cb732fdcb98616a422c4463bff80642248d136
    • Prunes out full-rd computation based on modeled rd · 53ff43ad
      Deb Mukherjee authored
      Adds a speed feature to eliminate full-rd computation if the modeled
      rd or rd based on a different parameter in the same mode is already
      a lot larger than the best rd yet.
      
      Specifically, only search the sharp and smooth filters if the modeled
      rd cost based on the regular filter is within a certain factor of the
      best rd cost so far. Also, skip full-rd computation of non splitmv
      inter modes if the modeled rd cost based on pred error is not within the
      same factor of the best rd cost so far.
      
      Also adds some enhancements in the rd search for splitmv mode to
      speed things up by early breakouts. Negligible impact on performance.
      
      Results on derfraw300:
      psnr:    -0.013% with the splitmv enhancements, -0.24% with the rd
               breakout feature on.
      speedup: 6% with splitmv enhancements, 20% with also residual breakout
               (tested on football sequence at 600 Kbps)
      
      Change-Id: I37abc308ea9f110c1679ce649b6a7e73ab1ad5fc
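
The pruning test described above might look roughly like the sketch below. The function name, the use of a floating-point factor, and the argument types are all hypothetical and chosen for illustration; the commit's actual threshold and data types are not shown in this log.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if the cheap modeled rd cost is within `factor` times the best
 * rd cost found so far, i.e. the expensive full-rd computation is still worth
 * running. Returns 0 if the candidate can be pruned.
 * All names and the factor representation are assumptions. */
static int demo_should_run_full_rd(int64_t modeled_rd, int64_t best_rd,
                                   double factor) {
  assert(factor > 0.0);
  return (double)modeled_rd <= (double)best_rd * factor;
}
```

A caller would evaluate the modeled cost first and only fall through to the transform, quantization, and rate estimation steps when this check passes.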
    • mi_width_log2 & mi_height_log2 · 863204e6
      Jim Bankoski authored
      converted to lookup to avoid unnecessary code
      
      Change-Id: I2ee6a01f06984cc2c4ba74b3fffd215318f749d2
    • b_width_log2 and b_height_log2 lookups · 6c8170af
      Jim Bankoski authored
          Replace case statement with lookup.
          Small speed gain at low speed settings but at speed 2+ where the
          number of motion searches etc. falls the impact rises to ~3-4%.
      
          Change-Id: Idff639b7b302ee65e042b7bf836943ac0a06fad8
      
      Change-Id: I5940719a4a161f8c26ac9a6753f1678494cec644
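
Replacing a case statement with a lookup, as described above, can be sketched as follows. The enum, table, and function names are hypothetical, and the values are plain log2 pixel widths chosen for illustration rather than the units used by b_width_log2.

```c
#include <assert.h>

/* Hypothetical block-size enum for illustration; not the libvpx definition. */
typedef enum {
  DEMO_BLOCK_4X4,
  DEMO_BLOCK_8X8,
  DEMO_BLOCK_16X16,
  DEMO_BLOCK_32X32,
  DEMO_BLOCK_64X64,
  DEMO_BLOCK_COUNT
} DemoBlockSize;

/* Branching version: one case per block size. */
static int demo_width_log2_switch(DemoBlockSize bsize) {
  switch (bsize) {
    case DEMO_BLOCK_4X4:   return 2;
    case DEMO_BLOCK_8X8:   return 3;
    case DEMO_BLOCK_16X16: return 4;
    case DEMO_BLOCK_32X32: return 5;
    case DEMO_BLOCK_64X64: return 6;
    default: assert(0 && "invalid block size"); return -1;
  }
}

/* Table version: the enum value indexes directly into a constant array. */
static const unsigned char demo_width_log2_table[DEMO_BLOCK_COUNT] = {
  2, 3, 4, 5, 6
};

static int demo_width_log2_lookup(DemoBlockSize bsize) {
  assert((int)bsize >= 0 && (int)bsize < DEMO_BLOCK_COUNT);
  return demo_width_log2_table[bsize];
}
```

Both versions return the same values; the table version trades a chain of comparisons or a jump for a single indexed load, which matters for helpers called in hot loops.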
  20. 02 Jul, 2013 3 commits
    • Removing redundant struct from union b_mode_info. · be77f6bb
      Dmitry Kovalev authored
      Change-Id: I08fc6e474ff2c12cfa065bae4989c724276e2c83
    • Tx size selection enhancements · 8d3d2b76
      Deb Mukherjee authored
      (1) Refines the modeling function and uses that to add some speed
      features. Specifically, instead of using a flag use_largest_txfm as
      a speed feature, an enum tx_size_search_method is used, of which
      two of the types are USE_FULL_RD and USE_LARGESTALL. Two other
      new types are added:
      USE_LARGESTINTRA (use largest only for intra)
      USE_LARGESTINTRA_MODELINTER (use largest for intra, and model for
      inter)
      
      (2) Another change is that the framework for deciding transform type
      is simplified to use a heuristic count based method rather than
      an rd based method using txfm_cache. In practice the new method
      is found to work just as well - with derf only -0.01 down.
      The new method is more compatible with the new framework where
      certain rd costs are based on full rd and certain others are
      based on modeled rd or are not computed. In this patch the existing
      rd based method is still kept for use in the USE_FULL_RD mode.
      In the other modes, the count based method is used.
      However, the recommendation is to remove it eventually, since the
      benefit is limited and removing it will eliminate a lot of complications
      in the code.
      
      (3) Finally, a bug is fixed in the existing use_largest_txfm speed feature
      that causes mismatches when the lossless mode and 4x4 WH transform are
      forced.
      
      Results on derf:
      USE_FULL_RD: +0.03% (due to change in the tables), 0% encode time reduction
      USE_LARGESTINTRA: -0.21%, 15% encode time reduction (this one is a
      pretty good compromise)
      USE_LARGESTINTRA_MODELINTER: -0.98%, 22% encode time reduction
      (currently the benefit of modeling is limited for txfm size selection,
      but keeping this enum as a placeholder).
      USE_LARGESTALL: -1.05%, 27% encode-time reduction (same as existing
      use_largest_txfm speed feature).
      
      Change-Id: I4d60a5f9ce78fbc90cddf2f97ed91d8bc0d4f936
    • Removing vp9_mbpitch.c, moving vp9_setup_block_dptrs to vp9_block.h. · 1ac05402
      Dmitry Kovalev authored
      Change-Id: Ia547a5dd7650b771fd00edd673ab9f920270731c
  21. 29 Jun, 2013 1 commit
    • fix test compile error · a63e31e8
      James Zern authored
      since:
      92479d95 Make update_partition_context faster
      
      fixes:
      vp9/common/vp9_blockd.h:408:22: error:
      non-constant-expression cannot be narrowed from type 'int' to 'char' in
      initializer list [-Wc++11-narrowing]
        char pcvalue[2] = {~(0xe << boffset), ~(0xf <<boffset)};
                           ^~~~~~~~~~~~~~~~~
      
      Change-Id: Id5b00b9a72d00a2b314081a23879bd1fa3ce983b
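
One common way to avoid this kind of narrowing diagnostic is to compute each value separately and convert it with an explicit cast, as in the sketch below. This is not necessarily the change made in this commit; the function name and the bound on boffset are assumptions for illustration.

```c
#include <assert.h>

/* Fills out[0] and out[1] with the two inverted masks from the error message.
 * The explicit (char) casts state the int-to-char conversion openly, so the
 * same code is accepted when the header is included from C++11 test code.
 * The range check on boffset is an assumed bound for this demo only. */
static void demo_fill_masks(int boffset, char out[2]) {
  assert(boffset >= 0 && boffset < 4);
  out[0] = (char)~(0xe << boffset);
  out[1] = (char)~(0xf << boffset);
}
```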
  22. 27 Jun, 2013 1 commit
    • Make update_partition_context faster · 92479d95
      Jingning Han authored
      Use vpx_memset for updating the partition contexts. Thanks to Noah
      for pointing out the need for refactoring this part.
      
      Change-Id: I67fb78429d632298f1cd8a0be346cc76f79392a6
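
The idea of updating a run of context entries with a single memset instead of a per-entry loop can be sketched as below. The function names and the flat char array are hypothetical, and the standard memset is used here in place of vpx_memset.

```c
#include <assert.h>
#include <string.h>

/* Per-entry version: writes the same value into each context slot. */
static void demo_set_context_loop(char *ctx, int count, char value) {
  int i;
  assert(ctx != NULL && count >= 0);
  for (i = 0; i < count; ++i) ctx[i] = value;
}

/* memset version: one library call covers the whole run of slots. */
static void demo_set_context_memset(char *ctx, int count, char value) {
  assert(ctx != NULL && count >= 0);
  memset(ctx, value, (size_t)count);
}
```

Both functions leave the array in the same state; the memset form lets the C library use its optimized fill routine.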
  23. 26 Jun, 2013 1 commit