1. 17 Jul, 2013 6 commits
    • Best_rd breakout in rd partition search. · 9f427bfe
      Ronald S. Bultje authored
      About 15% faster for bus (speed 0) first 50 frames @ 1500kbps, which
      goes from 1min36 to 1min24. Results become slightly better (+0.2% on
      derf/yt, +0.4% on hd), probably because of a bugfix for skipmode in
      super_block_yrd(). Overall speed change (on derfraw300) is roughly
      -13%. This can probably be improved further by caching best_yrd
      between partition searches. Also, we might be able to get more
      speedups by always doing PARTITION_NONE before PARTITIONS_SPLIT, not
      just at the sb8x8 level.
      Change-Id: I83736949ebd5b4a3b400ee688d7661913fefc98b
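The breakout idea in the commit above can be illustrated with a minimal sketch. It is not the libvpx code: `sum_rd_with_breakout` and its parameters are hypothetical names chosen for illustration. It accumulates per-block RD costs for one partition candidate and abandons the candidate as soon as the running total can no longer beat the best cost found so far.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a libvpx function: sums the RD cost of the
 * sub-blocks of one partition candidate and gives up as soon as the running
 * total reaches best_rd. Returns INT64_MAX when the candidate is abandoned.
 * *evaluated reports how many sub-blocks were actually costed. */
static int64_t sum_rd_with_breakout(const int64_t *block_rd, size_t n,
                                    int64_t best_rd, size_t *evaluated) {
  int64_t sum = 0;
  size_t i;
  assert(block_rd != NULL && evaluated != NULL);
  *evaluated = 0;
  for (i = 0; i < n; ++i) {
    sum += block_rd[i];
    ++*evaluated;
    if (sum >= best_rd) return INT64_MAX; /* cannot win any more: stop early */
  }
  return sum;
}
```

In this sketch the saving comes from the sub-blocks that are never costed once the candidate has already lost.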
    • Do a skip-block check for sub8x8 partitions also. · 83c7e13a
      Ronald S. Bultje authored
      +0.2% SSIM and glbPSNR on derfraw300.
      Change-Id: I9cba0bca55e606a22f557c7732b064f738efe84d
    • Speed up motion estimation using small partitions' result(experiment) · df90d58f
      Yunqing Wang authored
      Current partition checking starts from small sizes, and then goes up
      to large sizes. This experiment uses the small partitions' motion
      estimation result, which is already available, to speed up the
      large partition's motion estimation. We can decide to skip some
      partition checks if they are unlikely choices. We could use the
      motion vector (MV) result as the current partition's prediction MV,
      and limit the search range and reference frame.
      Current result at speed 1:
      psnr loss: 1.19% for stdhd, 0.287% for derf.
      speed gain: 14% for sunflower(hd), 11% for akiyo.
      Further improvement will be done later.
      Change-Id: I5abfd070e9cace2e91e2a0247d1325df313887ab
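One way to picture "limit the search range" from the commit above is a window centred on the MV already found for the small partitions. The sketch below is an assumption-laden illustration, not the libvpx implementation: `SearchWindow`, `limit_search_window` and the `range` parameter are hypothetical names.

```c
#include <assert.h>

/* Hypothetical search window in motion-vector units. */
typedef struct {
  int row_min, row_max, col_min, col_max;
} SearchWindow;

static int clamp_int(int v, int lo, int hi) {
  return v < lo ? lo : (v > hi ? hi : v);
}

/* Builds a window of +/-range around the predicted MV (pred_row, pred_col),
 * clipped so that it never leaves the full window the encoder would
 * otherwise search. */
static SearchWindow limit_search_window(int pred_row, int pred_col, int range,
                                        SearchWindow full) {
  SearchWindow w;
  assert(range >= 0);
  w.row_min = clamp_int(pred_row - range, full.row_min, full.row_max);
  w.row_max = clamp_int(pred_row + range, full.row_min, full.row_max);
  w.col_min = clamp_int(pred_col - range, full.col_min, full.col_max);
  w.col_max = clamp_int(pred_col + range, full.col_min, full.col_max);
  return w;
}
```

A smaller `range` trades search quality for speed, which matches the psnr loss and speed gain the commit reports as a pair.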
    • Move uv intra mode selection in rd loop. · 2ee338ce
      Paul Wilkins authored
      Use an estimate based on DC_PRED for intra uv cost
      within the rd loop then only do a full uv mode analysis
      if an intra mode is chosen.
      Significant speed gains in some cases. Currently only
      enabled for speed 2 pending speed/quality tests.
      Change-Id: Ie851a12400d5483bce47ec0e3ccb8516041e91c0
    • Limit transform sizes searched for uv intra. · 6c667f0f
      Paul Wilkins authored
      Apply limit if search_method == USE_LARGESTALL
      to the range of UV tx sizes searched.
      Change-Id: I6db29f0dd237285ffc50d75a37e8b68151ad821c
    • Skip redundant motion search in 4x4 level rd loop · a142d6fc
      Jingning Han authored
      This commit makes the encoder perform motion search only once
      per reference frame type for each 4x4/4x8/8x4 block. For bus_cif
      at 2000 kbps, the runtime goes from 253812ms -> 217817ms
      (14% speed-up) for speed 0.
      Change-Id: I5f17599ccc8cfaf93ccb4f98fcb6008af6d79e92
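The "search only once per reference frame type" behaviour described in the commit above can be sketched as a small per-block cache. This is a hypothetical illustration rather than the libvpx code: `MvCache`, `get_mv` and the placeholder `run_motion_search` are invented names, and `NUM_REFS` assumes the three VP9 inter reference types (LAST, GOLDEN, ALTREF).

```c
#include <assert.h>

#define NUM_REFS 3 /* assumption: LAST, GOLDEN, ALTREF */

typedef struct {
  int row, col;
} Mv;

/* Hypothetical per-block cache: one motion vector per reference frame. */
typedef struct {
  Mv mv[NUM_REFS];
  int valid[NUM_REFS];
  int searches_run; /* counts how often the expensive search really ran */
} MvCache;

/* Placeholder for the expensive motion search; returns a dummy vector. */
static Mv run_motion_search(int ref) {
  Mv mv;
  mv.row = ref + 1;
  mv.col = -(ref + 1);
  return mv;
}

/* Returns the cached vector for ref, running the search only on first use. */
static Mv get_mv(MvCache *cache, int ref) {
  assert(cache != 0 && ref >= 0 && ref < NUM_REFS);
  if (!cache->valid[ref]) {
    cache->mv[ref] = run_motion_search(ref);
    cache->valid[ref] = 1;
    ++cache->searches_run;
  }
  return cache->mv[ref];
}
```

A zero-initialised `MvCache` would be created per 4x4/4x8/8x4 block, so every prediction mode that uses the same reference frame reuses the first search result.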
  2. 16 Jul, 2013 16 commits
    • Removing MV_GROUP_UPDATE define and corresponding code. · 3997da0d
      Dmitry Kovalev authored
      Change-Id: I4884cdc2557d25d50c7c4f7e19b1ad8bdb93cd63
    • Cleaning up tile code. · 9482a0bf
      Dmitry Kovalev authored
      Removing tile_rows and tile_columns from VP9Common, removing redundant
      constants MIN_TILE_WIDTH and MAX_TILE_WIDTH, changing signature of
      Change-Id: I8ff3104a38179b2c6900df965c144c1d6f602267
    • Loop filter code cleanup. · 2de3c8d2
      Dmitry Kovalev authored
      Cosmetic code changes, renaming 'flat' local var to 'mask', removing
      unused field 'blim' from loopfilter_info_n and loop_filter_info structs.
      Change-Id: I51e6ccf727fe361ad9a08e29e1201aa7abd4987f
    • use consistent framerate naming · 9581eb6e
      James Zern authored
      Change-Id: I6fc3e088e419c5f46e3a9390dd8a2cad2677a2fc
    • delete vp9_loopfilter_sse2.asm · 50015f6e
      James Zern authored
      sse2 functions are provided by vp9_loopfilter_intrin_sse2.c
      Change-Id: I40454d26034e3ef915eeaf889937fe7d1b519b9b
    • vp9_loopfilter_intrin_sse2: cosmetics: fix indent · 8f4787a3
      James Zern authored
      Change-Id: I892e76d5ad1443b2ea0d1a7839fe26afe9c68ffb
    • delete x86/vp9_loopfilter_x86.h · af582542
      James Zern authored
      also remove prototype_loopfilter{,_block} defines from vp9_loopfilter.h
      Change-Id: I865ab3f9436c7b1ca166f76630328abf01389405
    • SSE2 16x16 inverse ADST/DCT hybrid transform · d05f66aa
      Jingning Han authored
      This commit enables SSE2 implementation of 16x16 inverse ADST/DCT
      hybrid transform. The runtime goes from 5742 cycles -> 1821 cycles.
      This provides about 1% encoding speed-up at speed 0.
      Change-Id: I1678d0988bf30b9efd524877705bbb3645edb17b
    • Replace generated quant tables with static lookup tables. · e965cccc
      Ronald S. Bultje authored
      This prevents possible float rounding issues between architectures.
      Change-Id: I6ed260aebd49feb4cfb5596a5370c44be5f72167
    • Moving vp9_kf_default_bmode_probs to vp9_entropymode.c. · baf0c959
      Dmitry Kovalev authored
      Removing vp9_modelcontext.c.
      Change-Id: If2316c58dead2708d9f95b52d9494ba4c1dd7427
    • Rewriting vp9_set_pred_flag_{seg_id, mbskip}. · 863138a2
      Dmitry Kovalev authored
      Making implementation of vp9_set_pred_flag_{seg_id, mbskip} consistent
      with vp9_get_segment_id without using confusing sub(a, b) macro. Passing
      mi_row and mi_col to functions explicitly instead of relying on
      mb_to_right_edge and mb_to_bottom_edge.
      Change-Id: I54c1087dd2ba9036f8ba7eb165b073e807d00435
    • Minor cleanup in code to find uv tx_size. · 30d2ea45
      Paul Wilkins authored
      Change-Id: I94b97a966b5efbc9a243048f1f5ddbbdc4b1846e
    • Fix above context pointers · 5efd9609
      John Koleszar authored
      In the prior code, the above context pointers used for entropy
      decoding were initialized on the first frame, and not updated when
      the frame size changed. The per-frame code which initializes the
      contexts assumes that the contexts are contiguous, leading to an
      incomplete initialization when the frame is smaller. This commit
      updates the pointers so that the context is contiguous whenever
      the frame size changes.
      Change-Id: I08b53e3a30c8289491212311682ff1b8028cff6c
    • Change to extend full border only when needed · 5b915ebd
      Yaowu Xu authored
      This is a short-term optimization until we work out a decoder
      implementation requiring no frame border extension.
      Change-Id: I02d15bfde4d926b50a4e58b393d8c4062d1be70f
    • Removing and moving around constant definitions. · ca75f125
      Dmitry Kovalev authored
      Removing unused and duplicated constants, moving them from *.h to *.c
      if possible.
      Change-Id: Ief4d6b984a3ca2e9b38504f0d855ed072cf7133f
    • Inline vp9_quantize() in xform_quant(). · 1ff94fea
      Ronald S. Bultje authored
      Cycle times:
      4x4:    151 to  131 cycles (15% faster)
      8x8:    334 to  306 cycles (9% faster)
      16x16: 1401 to 1368 cycles (2.5% faster)
      32x32: 7403 to 7367 cycles (0.5% faster)
      Total encode time of first 50 frames of bus @ 1500kbps (speed 0)
      goes from 1min39.2 to 1min38.6, i.e. a 0.67% overall speedup.
      Change-Id: I799a49460e5e3fcab01725564dd49c629bfe935f
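The per-size percentages in the cycle-time list above are close to what the ratio old/new - 1 gives. That formula is an assumption, since the commit message does not state how the figures were derived. A minimal sketch, with the hypothetical helper name `speedup_percent`:

```c
#include <assert.h>

/* Hypothetical helper: speedup in percent, computed as old/new - 1.
 * The commit does not say which formula it used; this is one reading
 * that roughly reproduces the listed per-size figures. */
static double speedup_percent(double old_cycles, double new_cycles) {
  assert(old_cycles > 0.0 && new_cycles > 0.0);
  return (old_cycles / new_cycles - 1.0) * 100.0;
}
```

For example, `speedup_percent(151, 131)` is a little over 15, in line with the 4x4 entry, and `speedup_percent(7403, 7367)` is just under 0.5, in line with the 32x32 entry.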
  3. 15 Jul, 2013 6 commits
    • Consistent naming for loop-filter filters. · e973b4e2
      Dmitry Kovalev authored
      Renaming flatmask4 to flat_mask4, flatmask5 to flat_mask5, hevmask to
      hev_mask, filter to filter4, mbfilter to filter8, wide_mbfilter to
      Change-Id: Ic61c73e59c2eee505257584867aafac99833cea1
    • Inline xform_quant() in encode_block_intra(). · 6fb41874
      Ronald S. Bultje authored
      Also inline some of the block calculations to assist the compiler to
      not do silly things like calculating the same offset (or converting
      between raster/transform block offset or block, mi and pixel unit)
      many, many, many times.
      Cycle times:
      4x4:     584 ->   505 cycles (16% faster)
      8x8:    1651 ->  1560 cycles (6% faster)
      16x16:  7897 ->  7704 cycles (2.5% faster)
      32x32: 16096 -> 15852 cycles (1.5% faster)
      Overall, this saves about 0.5 seconds (1min49.8 -> 1min49.3) on the
      first 50 frames of bus (speed 0) @ 1500kbps, i.e. 0.5% overall.
      Change-Id: If3dd62453f8e2ab9d4ee616bc4ea956fb8874b80
    • Code cleanup inside vp9_decodeframe.c. · 2c317298
      Dmitry Kovalev authored
      Removing unused DEC_DEBUG define and dec_debug variable. Changing function
      signatures to eliminate code duplication, renaming function
      mb_init_dequantizer to init_dequantizer. Also removing redundant curly
      braces and comments.
      Change-Id: Ia56ee1b0be5f24abb0e878581845be8a4773c298
    • Neon: Update mbfilter if all vectors follow one branch. · f4f60f60
      Frank Galligan authored
      Change the mbfilter Neon code so that it no longer executes both
      branches when all vectors follow only one branch.
      The code is about 5% faster when executing only one branch and about
      1% slower when executing both branches.
      -PS5: Remove local stack space from mbfilter.
      Change-Id: I6a23f9b318a9f4568a2718b4c9348db988fe2182
    • Skip inter-coded block reconstruction in rd loop · 043e0f9d
      Jingning Han authored
      Skip the inverse transform and reconstruction of inter-mode coded
      blocks in the rate-distortion optimization loop, when skip_encode_sb
      feature is turned on. This provides about 1% speed-up at speed 0,
      and 1.5% speed-up at speed 1. No performance change in either setting.
      Change-Id: I2932718bf4d007163702b61b16b6ff100cf9d007
    • Skip duplicate block encoding in the rd loop · faff6ed0
      Jingning Han authored
      This speed feature allows the encoder to largely remove the spatial
      dependency between blocks inside a 64x64 superblock, thereby removing
      the need to repeatedly encode superblocks per partition type in the
      rate-distortion optimization loop.
      A major challenge lies in the intra modes tested in the rate-distortion
      optimization loop. The subsequent blocks do not have access to the
      reconstructed boundary pixels without the intermediate coding steps.
      This was resolved by using the original pixels for intra prediction
      in the rd loop, followed by an appropriately designed distortion
      modeling on the quantization parameters. Experiments also suggested
      that the performance impact is more discernible at lower bit-rate/psnr
      settings. Hence a quantizer dependent threshold is applied to deactivate
      skip of block coding.
      For bus_cif at 2000 kbps,
      speed 0: runtime 269854ms -> 237774ms (12% speed-up) at 0.05dB
               performance loss.
      speed 1: runtime 65312ms  -> 61536ms (7% speed-up) at 0.04dB
               performance loss.
      This operation is currently turned on in settings of speed 1.
      Change-Id: Ib689741dfff8dd38365d8c1b92860a3e176f56ec
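The quantizer-dependent threshold mentioned in the commit above can be pictured as a simple gate. The sketch below is hypothetical and not the libvpx implementation: `skip_encode_allowed` and `q_threshold` are illustrative names, and it assumes the VP9 convention that the quantizer index runs from 0 to 255 with larger values meaning coarser quantization (lower bit-rate), where the commit says the quality impact is more noticeable.

```c
#include <assert.h>

/* Hypothetical decision helper: allows skipping block encoding in the rd
 * loop only when the speed feature is on and quantization is fine enough.
 * q_threshold is an illustrative tuning value, not taken from the commit. */
static int skip_encode_allowed(int feature_enabled, int q_index,
                               int q_threshold) {
  assert(q_index >= 0 && q_index <= 255);
  return feature_enabled && q_index < q_threshold;
}
```

With an illustrative threshold of 64, a block coded at `q_index` 40 would take the fast path, while one at `q_index` 200 would fall back to full encoding in the rd loop.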
  4. 14 Jul, 2013 6 commits
  5. 13 Jul, 2013 5 commits
  6. 12 Jul, 2013 1 commit