1. 27 Sep, 2013 1 commit
  2. 26 Sep, 2013 1 commit
  3. 25 Sep, 2013 1 commit
  4. 23 Sep, 2013 1 commit
    • Enable per transformed block zero coeffs forcing · a517343c
      Jingning Han authored
      This commit enables forcing all coefficients of a transformed
      block to zero when doing so gives a lower rate-distortion cost
      than regular coefficient quantization.
      
      The overall performance improvement (including its parent patch on
      calculating rd cost per transformed block) at speed 1:
      derf:  0.298%
      yt:    0.452%
      hd:    0.741%
      stdhd: 0.006%
      
      Change-Id: I66005fe0fd7af192c3eba32e02fd6d77952accb5
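The per-block decision described in this commit can be sketched as a comparison of two Lagrangian costs. This is only an illustration: the cost form `dist + lambda * rate` without fixed-point scaling, and the names `rd_cost` and `force_zero_block`, are assumptions and not the libvpx implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical Lagrangian cost J = D + lambda * R (no fixed-point scaling). */
static int64_t rd_cost(int64_t lambda, int64_t rate, int64_t dist) {
  assert(lambda >= 0 && rate >= 0 && dist >= 0);
  return dist + lambda * rate;
}

/* Returns 1 when coding the transformed block as all-zero is strictly
 * cheaper than keeping the regularly quantized coefficients. */
static int force_zero_block(int64_t lambda,
                            int64_t rate_coded, int64_t dist_coded,
                            int64_t rate_zero, int64_t dist_zero) {
  return rd_cost(lambda, rate_zero, dist_zero) <
         rd_cost(lambda, rate_coded, dist_coded);
}
```

With a large lambda the rate saved by zeroing the block outweighs the extra distortion; with a small lambda the quantized coefficients are kept.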
  5. 19 Sep, 2013 1 commit
  6. 11 Sep, 2013 1 commit
    • New mode_info_context storage -- undo revert · ac6093d1
      Scott LaVarnway authored
      mode_info_context was stored as a grid of MODE_INFO structs.
      The grid now consists of pointers to MODE_INFO structs.  The
      MODE_INFO structs are now stored as a stream (decoder only),
      which eliminates unnecessary copies and is a little more cache
      friendly.
      
      Change-Id: I031d376284c6eb98a38ad5595b797f048a6cfc0d
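A rough sketch of the storage layout this commit describes: a grid of pointers into a densely packed stream of structs. The types `ModeInfo` and `ModeInfoStore`, the function `store_block`, and the assumption that every grid cell covered by a block shares one struct are illustrative, not taken from libvpx.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for libvpx's MODE_INFO. */
typedef struct { int mode; } ModeInfo;

typedef struct {
  ModeInfo **grid;   /* rows * cols pointers; NULL where nothing is stored */
  ModeInfo *stream;  /* structs packed in the order blocks are decoded */
  size_t next;       /* next free slot in the stream */
  size_t capacity;   /* number of slots in the stream */
  int cols;          /* grid width in cells */
} ModeInfoStore;

/* Takes the next stream slot and points every grid cell covered by the
 * block at it, so the struct is written once instead of copied per cell. */
static ModeInfo *store_block(ModeInfoStore *s, int row, int col,
                             int h, int w, int mode) {
  assert(s->next < s->capacity);
  ModeInfo *mi = &s->stream[s->next++];
  mi->mode = mode;
  for (int r = 0; r < h; ++r)
    for (int c = 0; c < w; ++c)
      s->grid[(row + r) * s->cols + (col + c)] = mi;
  return mi;
}
```

Consecutive blocks end up adjacent in memory, which is one plausible reading of the "more cache friendly" remark.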
  7. 09 Sep, 2013 1 commit
  8. 07 Sep, 2013 1 commit
    • Fix overflow issue in 16x16 quantization SSSE3 · 09bc942b
      Jingning Han authored
      The 16x16 transform unit test suggested that the peak coefficient
      value can reach 32639. This could cause an overflow in the SSSE3
      implementation of 16x16 block quantization. This commit fixes the
      issue by replacing addition with saturated addition.
      
      Change-Id: I6d5bb7c5faad4a927be53292324bd2728690717e
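The difference between wrapping and saturated 16-bit addition can be shown in scalar C. This is a sketch of the arithmetic only, not the SSSE3 code; the addend 200 used in the usage note is a hypothetical offset chosen to push 32639 past the 16-bit limit.

```c
#include <assert.h>
#include <stdint.h>

/* Plain 16-bit add: the result wraps modulo 2^16. The final conversion to
 * int16_t is implementation-defined in C, but wraps on common compilers. */
static int16_t add_wrap16(int16_t a, int16_t b) {
  return (int16_t)(uint16_t)((uint16_t)a + (uint16_t)b);
}

/* Saturated 16-bit add: the result is clamped to [INT16_MIN, INT16_MAX]. */
static int16_t add_sat16(int16_t a, int16_t b) {
  int32_t sum = (int32_t)a + (int32_t)b;
  if (sum > INT16_MAX) return INT16_MAX;
  if (sum < INT16_MIN) return INT16_MIN;
  return (int16_t)sum;
}
```

For example, `add_wrap16(32639, 200)` comes out negative on typical platforms, while `add_sat16(32639, 200)` is clamped to 32767.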
  9. 06 Sep, 2013 1 commit
    • New mode_info_context storage · dae17734
      Scott LaVarnway authored
      mode_info_context was stored as a grid of MODE_INFO structs.
      The grid now consists of a pointer to a MODE_INFO struct and
      an "in the image" flag.  The MODE_INFO structs are now stored
      as a stream, which eliminates unnecessary copies and is a
      little more cache friendly.
      
      For the test clips used, the decoder performance improved
      by ~4.3% (1080p) and ~9.7% (720p).
      
      Patch Set 2: Re-encoded clips with latest. Now ~1.7% (1080p)
      and 5.9% (720p).
      
      Change-Id: I846f29e88610fce2523ca697a9a9ef2a182e9256
  10. 28 Aug, 2013 1 commit
  11. 27 Aug, 2013 1 commit
  12. 21 Aug, 2013 1 commit
  13. 19 Aug, 2013 2 commits
    • Passing plane_bsize to foreach_transformed_block_visitor. · 82d4d9a0
      Dmitry Kovalev authored
      Updating all foreach_transformed_block_visitor functions to work with
      the plane block size instead of the general block size. Removing a lot
      of duplicated code.
      
      Change-Id: I6a9069e27528c611f5a648e1da0c5a5fd17f1bb4
    • Using plane_bsize instead of bsize. · 2e3478a5
      Dmitry Kovalev authored
      This change set is intermediate. The next one will remove all repetitive
      plane_bsize calculations, because it will be passed as argument to
      foreach_transformed_block_visitor.
      
      Change-Id: Ifc12e0b330e017c6851a28746b3a5460b9bf7f0b
  14. 16 Aug, 2013 2 commits
    • Removing unused or redundant arguments from *_args structures. · 26e5b5e2
      Dmitry Kovalev authored
      Removes the redundant dst and pre[2] from build_inter_predictors_args
      and the unused cm from encode_b_args.
      
      Change-Id: I2c476cd328c5c0cca4c78ba451ca6ba2a2c37e2d
    • Moving from ss_txfrm_size to tx_size. · afd9bd3e
      Dmitry Kovalev authored
      Updating foreach_transformed_block_visitor and corresponding functions
      to accept tx_size instead of ss_txfrm_size. List of functions per file:
      
      vp9_decodframe.c
        decode_block
        decode_block_intra
      
      vp9_detokenize.c
        decode_block
      
      vp9_encodemb.c
        optimize_block
        vp9_xform_quant
        vp9_encode_block_intra
      
      vp9_rdopt.c
        dist_block
        rate_block
        block_yrd_txfm
      
      vp9_tokenize.c
        set_entropy_context_b
        tokenize_b
        is_skippable
      
      Change-Id: I351bf563eb36cf34db71c3f06b9bbc9a61b55b73
  15. 15 Aug, 2013 1 commit
  16. 14 Aug, 2013 1 commit
  17. 09 Aug, 2013 1 commit
  18. 08 Aug, 2013 1 commit
  19. 07 Aug, 2013 1 commit
    • Use low precision 32x32fdct for encodemb in speed1 · debb9c68
      Jingning Han authored
      The low precision 32x32 fdct has all the intermediate steps within
      16-bit depth, hence allowing faster SSE2 implementation, at the
      expense of larger round-trip error. It was used in the rate-distortion
      optimization search loop only.
      
      Using the low precision version in place of the high precision one
      affects the compression performance by about 0.7% (derf, stdhd) at
      speed 0. For speed 1, it lowers the derf set by only 0.017%.
      
      Change-Id: I4e7d18fac5bea5317b91c8e7dabae143bc6b5c8b
  20. 05 Aug, 2013 1 commit
  21. 02 Aug, 2013 1 commit
    • Adding is_inter_block function. · 680ec32d
      Dmitry Kovalev authored
      Using it instead of the long, verbose, and unclear check
      "mbmi->ref_frame[0] != INTRA_FRAME".
      
      Change-Id: I9c7b4b3797942fa962bf3ba7460fff3084beabe9
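A helper of this kind might look like the sketch below. The check itself is the one quoted in the commit message; the reduced types `RefFrame` and `MbModeInfo` and their enum values are hypothetical stand-ins for the real libvpx definitions.

```c
#include <assert.h>

/* Hypothetical, reduced versions of the libvpx types. */
typedef enum { INTRA_FRAME = 0, LAST_FRAME = 1 } RefFrame;
typedef struct { RefFrame ref_frame[2]; } MbModeInfo;

/* Names the intent of the check instead of repeating it at every call site. */
static int is_inter_block(const MbModeInfo *mbmi) {
  assert(mbmi != 0);
  return mbmi->ref_frame[0] != INTRA_FRAME;
}
```

Call sites then read `if (is_inter_block(mbmi))` rather than spelling out the reference-frame comparison.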
  22. 01 Aug, 2013 1 commit
  23. 29 Jul, 2013 1 commit
    • 16x16 inverse 2D-DCT with DC only · a7c4de22
      Jingning Han authored
      This commit adds special handling for the 16x16 inverse 2D-DCT when
      only the DC coefficient is quantized to a non-zero value.
      
      Change-Id: I7bf71be7fa13384fab453dc8742b5b50e77a277c
  24. 27 Jul, 2013 2 commits
    • Inverse dimension order in token_cost array. · 118ccdcd
      Ronald S. Bultje authored
      This allows us to increment the position at the band-level only as
      we go from one band to the next; more importantly, that allows us to
      use an add instead of multiply instruction, and omit the instruction
      altogether if the band doesn't change from one coef to the next, thus
      being slightly faster (probably more noticeable on systems where a
      multiply is expensive, like ARM).
      
      Change-Id: I4343fe35b9f9a47fa00b217bdcbf5f91ff96c381
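The indexing idea can be sketched with a small cost table whose outermost dimension is the band. Moving to the next band is then a single pointer increment, and nothing is recomputed while the band stays the same. The table sizes, the `BandSlab` type, and the `sum_costs` function are illustrative assumptions, not the libvpx layout.

```c
#include <assert.h>

enum { BANDS = 3, CTXS = 2, TOKENS = 4 }; /* hypothetical table sizes */

/* One band's worth of costs, indexed as [context][token]. */
typedef unsigned int BandSlab[CTXS][TOKENS];

/* Sums token costs over n coefficients. band_len[b] is the number of
 * coefficients in band b; the lengths must add up to at least n. */
static unsigned int sum_costs(const BandSlab *cost, const int *band_len,
                              const int *ctx, const int *tok, int n) {
  unsigned int total = 0;
  int left = *band_len++;
  for (int i = 0; i < n; ++i) {
    while (left == 0) { /* band exhausted: step to the next slab */
      ++cost;           /* a pointer add, no multiply by the band index */
      left = *band_len++;
    }
    total += (*cost)[ctx[i]][tok[i]];
    --left;
  }
  return total;
}
```

With the band as an inner dimension, each lookup would instead need the band index scaled into the address computation.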
    • Shortcut 8x8/16x16 inverse 2D-DCT · 38fa4871
      Jingning Han authored
      This commit brought back the shortcut implementation of 8x8/16x16
      inverse 2D-DCT. When the eob <= 10, it skips the inverse transform
      operations on row 4:7/4:15 in the first round. For bus_cif at 1000
      kbps, this provides about 2% speed-up at speed 0.
      
      Change-Id: I453e2d72956467d75be4ad8c04b4482ab889d572
  25. 26 Jul, 2013 1 commit
    • Special handle on DC only inverse 8x8 2D-DCT · 325e0aa6
      Jingning Han authored
      This commit enables special handling for the 8x8 inverse 2D-DCT
      when only the DC coefficient is quantized to be non-zero. For bus_cif
      at 2000 kbps, it provides about 1% speed-up at speed 0.
      
      Change-Id: I2523222359eec26b144cf8fd4c63a4ad63b1b011
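When only the DC coefficient is non-zero, the inverse 2D-DCT yields the same residual for every pixel, so the whole transform reduces to computing one value and adding it to the block. The sketch below assumes a 14-bit fixed-point constant for cos(pi/4) and a final shift of 5 for the 8x8 scaling; these constants and the function names are assumptions for illustration and may not match the libvpx code exactly.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed fixed-point setup: round(2^14 * cos(pi/4)) with 14 fraction bits. */
#define COSPI_16_64 11585
#define DCT_CONST_BITS 14

/* Rounding right shift; relies on arithmetic shift for negative values. */
static int round_shift(int x, int bits) {
  return (x + (1 << (bits - 1))) >> bits;
}

static uint8_t clip_pixel(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* DC-only 8x8 inverse DCT plus add: one scaled value for all 64 pixels. */
static void idct8x8_dc_add(int16_t dc, uint8_t *dest, int stride) {
  int out = round_shift(dc * COSPI_16_64, DCT_CONST_BITS); /* row pass */
  out = round_shift(out * COSPI_16_64, DCT_CONST_BITS);    /* column pass */
  const int a1 = round_shift(out, 5);                      /* 8x8 scaling */
  for (int r = 0; r < 8; ++r)
    for (int c = 0; c < 8; ++c)
      dest[r * stride + c] = clip_pixel(dest[r * stride + c] + a1);
}
```

Two multiplies and 64 adds replace the full row and column passes, which is where the speed-up would come from.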
  26. 25 Jul, 2013 1 commit
    • Make coeff_optimize initialized per-plane · 2f58faff
      Jingning Han authored
      This commit makes the initialization of trellis coeff optimization
      a per-plane operation, thereby eliminating the redundant steps in
      encode_sby and encode_sbuv. It makes the encoder at speed 0 slightly
      faster.
      
      Change-Id: Iffe9faca6a109dafc0dd69dc7273cbdec19b17cd
  27. 24 Jul, 2013 1 commit
  28. 23 Jul, 2013 4 commits
    • Unify the use of encode_b_args/optimize_block_args · ab77828b
      Jingning Han authored
      The struct optimize_block_args is defined identically to encode_b_args.
      Remove this redundant definition and use encode_b_args consistently.
      
      Change-Id: I1703aeeb3bacf92e98a34f4355202712110173d9
    • Make xform_quant operations tx_type independent · e9e2fe8e
      Jingning Han authored
      The xform_quant() module is only used by inter modes, hence removing
      the redundant switches therein conditioned on tx_type.
      
      Change-Id: Ib87ce5b2f2e4cbf3ceb133a1108afa173c933a3f
    • Skip inverse transform when eob is zero · 0359ad7f
      Jingning Han authored
      When all the transform coefficients were quantized to zero, skip
      the inverse transform operation. For bus_cif at 1000 kbps, the
      runtime goes from 154967ms -> 149842ms, i.e., about 3% speed-up,
      at speed 0.
      
      Change-Id: Ic0a813fff5e28972d4888ee42d8747846a6c3cc6
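The early exit can be sketched as follows. Here `eob` is taken to mean the end-of-block position, so `eob == 0` indicates that no non-zero coefficient exists. The functions `inverse_transform_add` and `reconstruct_block` and the call counter are hypothetical placeholders, not libvpx functions.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder standing in for a real inverse transform; it only counts
 * how often it is invoked so the skip can be observed. */
static int g_inverse_calls = 0;
static void inverse_transform_add(const int16_t *coeffs, uint8_t *dest) {
  (void)coeffs;
  (void)dest;
  ++g_inverse_calls;
}

/* With eob == 0 the residual is all zero, so the prediction already in
 * dest is the reconstruction and the inverse transform can be skipped. */
static void reconstruct_block(const int16_t *coeffs, int eob, uint8_t *dest) {
  assert(eob >= 0);
  if (eob == 0) return;
  inverse_transform_add(coeffs, dest);
}
```

The more zero blocks a clip produces, the more transform calls this check avoids.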
    • clean up bw, bh · 86a9dec7
      Jim Bankoski authored
      Many structures use bw and bh, and they have different meanings. This CL
      starts to clean this up and removes unnecessary two-step operations
      (a log lookup followed by a shift).
      
      Also removed the partition type multiple operation code in bitstream.c.
      
      Change-Id: I7e03e552bdfc0939738e430862e3073d30fdd5db
  29. 16 Jul, 2013 1 commit
    • Inline vp9_quantize() in xform_quant(). · 1ff94fea
      Ronald S. Bultje authored
      Cycle times:
      4x4:    151 to  131 cycles (15% faster)
      8x8:    334 to  306 cycles (9% faster)
      16x16: 1401 to 1368 cycles (2.5% faster)
      32x32: 7403 to 7367 cycles (0.5% faster)
      
      Total encode time of first 50 frames of bus @ 1500kbps (speed 0)
      goes from 1min39.2 to 1min38.6, i.e. a 0.67% overall speedup.
      
      Change-Id: I799a49460e5e3fcab01725564dd49c629bfe935f
  30. 15 Jul, 2013 3 commits
    • Inline xform_quant() in encode_block_intra(). · 6fb41874
      Ronald S. Bultje authored
      Also inline some of the block calculations to assist the compiler to
      not do silly things like calculating the same offset (or converting
      between raster/transform block offset or block, mi and pixel unit)
      many, many, many times.
      
      Cycle times:
      4x4:     584 ->   505 cycles (16% faster)
      8x8:    1651 ->  1560 cycles (6% faster)
      16x16:  7897 ->  7704 cycles (2.5% faster)
      32x32: 16096 -> 15852 cycles (1.5% faster)
      
      Overall, this saves about 0.5 seconds (1min49.8 -> 1min49.3) on the
      first 50 frames of bus (speed 0) @ 1500kbps, i.e. 0.5% overall.
      
      Change-Id: If3dd62453f8e2ab9d4ee616bc4ea956fb8874b80
    • Skip inter-coded block reconstruction in rd loop · 043e0f9d
      Jingning Han authored
      Skip the inverse transform and reconstruction of inter-mode coded
      blocks in the rate-distortion optimization loop when the skip_encode_sb
      feature is turned on. This provides about 1% speed-up at speed 0,
      and 1.5% speed-up at speed 1. No performance change in either setting.
      
      Change-Id: I2932718bf4d007163702b61b16b6ff100cf9d007
    • Skip duplicate block encoding in the rd loop · faff6ed0
      Jingning Han authored
      This speed feature allows the encoder to largely remove the spatial
      dependency between blocks inside a 64x64 superblock, thereby removing
      the need to repeatedly encode superblocks per partition type in the
      rate-distortion optimization loop.
      
      A major challenge lies in the intra modes tested in the rate-distortion
      optimization loop. The subsequent blocks do not have access to the
      reconstructed boundary pixels without the intermediate coding steps.
      This was resolved by using the original pixels for intra prediction
      in the rd loop, followed by appropriately designed distortion
      modeling on the quantization parameters. Experiments also suggested
      that the performance impact is more discernible at lower bit-rate/psnr
      settings. Hence a quantizer-dependent threshold is applied to deactivate
      the skipping of block coding.
      
      For bus_cif at 2000 kbps,
      speed 0: runtime 269854ms -> 237774ms (12% speed-up) at 0.05dB
               performance loss.
      
      speed 1: runtime 65312ms  -> 61536ms (7% speed-up) at 0.04dB
               performance loss.
      
      This operation is currently turned on in the speed 1 settings.
      
      Change-Id: Ib689741dfff8dd38365d8c1b92860a3e176f56ec
  31. 11 Jul, 2013 1 commit
  32. 08 Jul, 2013 1 commit