- 16 Jul, 2013 35 commits
-
-
Dmitry Kovalev authored
Change-Id: I4884cdc2557d25d50c7c4f7e19b1ad8bdb93cd63
-
James Zern authored
* changes:
  delete vp9_loopfilter_sse2.asm
  vp9_loopfilter_intrin_sse2: cosmetics: fix indent
  delete x86/vp9_loopfilter_x86.h
  vp9_loopfilter_intrin_sse2: make some funcs static
  vp9_loopfilter_intrin_sse2: remove unused uv funcs
  vp9_loopfilter: remove uv function typedef
  filter_block_plane: reuse some constants
  vp9_loopfilter.c: make some functions static
-
James Zern authored
-
James Zern authored
s/frame_rate/framerate/g Change-Id: I6fc3e088e419c5f46e3a9390dd8a2cad2677a2fc
-
Jingning Han authored
-
Dmitry Kovalev authored
-
Dmitry Kovalev authored
-
James Zern authored
sse2 functions are provided by vp9_loopfilter_intrin_sse2.c Change-Id: I40454d26034e3ef915eeaf889937fe7d1b519b9b
-
James Zern authored
Change-Id: I892e76d5ad1443b2ea0d1a7839fe26afe9c68ffb
-
James Zern authored
also remove prototype_loopfilter{,_block} defines from vp9_loopfilter.h Change-Id: I865ab3f9436c7b1ca166f76630328abf01389405
-
James Zern authored
-
James Zern authored
-
Jingning Han authored
This commit enables the SSE2 implementation of the 16x16 inverse ADST/DCT hybrid transform. The runtime goes from 5742 to 1821 cycles, which provides about a 1% encoding speed-up at speed 0. Change-Id: I1678d0988bf30b9efd524877705bbb3645edb17b
-
James Zern authored
-
James Zern authored
-
James Zern authored
-
Ronald S. Bultje authored
-
Ronald S. Bultje authored
This prevents possible float rounding issues between architectures. Change-Id: I6ed260aebd49feb4cfb5596a5370c44be5f72167
-
John Koleszar authored
-
Jingning Han authored
-
Dmitry Kovalev authored
Removing vp9_modelcontext.c. Change-Id: If2316c58dead2708d9f95b52d9494ba4c1dd7427
-
Dmitry Kovalev authored
Making the implementation of vp9_set_pred_flag_{seg_id, mbskip} consistent with vp9_get_segment_id, without using the confusing sub(a, b) macro. Passing mi_row and mi_col to functions explicitly instead of relying on mb_to_right_edge and mb_to_bottom_edge. Change-Id: I54c1087dd2ba9036f8ba7eb165b073e807d00435
-
John Koleszar authored
In the prior code, the above-context pointers used for entropy decoding were initialized on the first frame and not updated when the frame size changed. The per-frame code that initializes the contexts assumes they are contiguous, which led to incomplete initialization when the frame became smaller. This commit updates the pointers so that the context is contiguous whenever the frame size changes. Change-Id: I08b53e3a30c8289491212311682ff1b8028cff6c
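The failure mode above can be illustrated with a minimal C sketch, assuming a hypothetical layout in which three per-plane above-context pointers index into one backing allocation (the struct, the function names and the one-byte-per-column sizing are illustrative, not libvpx API). Re-pointing every plane on each resize keeps the region contiguous, so a single memset over the combined length clears all planes; if the pointers kept their first-frame offsets, that memset would miss part of the chroma planes after a shrink.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_PLANES 3 /* hypothetical: Y, U, V */

typedef struct {
  unsigned char *above_context[MAX_PLANES]; /* per-plane views into buf */
  unsigned char *buf;                       /* single backing allocation */
  int cols[MAX_PLANES];                     /* context entries per plane */
} above_ctx_t;

/* Re-point every per-plane pointer into one contiguous buffer sized for the
 * current frame. Must run whenever the frame size changes (luma_cols > 0).
 * Returns 0 on success, -1 on allocation failure. */
static int above_ctx_resize(above_ctx_t *ctx, int luma_cols, int ss_x) {
  const int chroma_cols = luma_cols >> ss_x;
  const size_t total = (size_t)luma_cols + 2 * (size_t)chroma_cols;
  unsigned char *p = realloc(ctx->buf, total);
  if (p == NULL) return -1;
  ctx->buf = p;
  ctx->cols[0] = luma_cols;
  ctx->cols[1] = ctx->cols[2] = chroma_cols;
  ctx->above_context[0] = p;
  ctx->above_context[1] = p + luma_cols;
  ctx->above_context[2] = p + luma_cols + chroma_cols;
  return 0;
}

/* Per-frame reset: one memset is only correct because the planes are
 * packed back to back starting at above_context[0]. */
static void above_ctx_clear(above_ctx_t *ctx) {
  const size_t total = (size_t)ctx->cols[0] + ctx->cols[1] + ctx->cols[2];
  memset(ctx->above_context[0], 0, total);
}
```

For example, resizing from 64 to 32 luma columns with 4:2:0 subsampling (ss_x = 1) moves the V-plane pointer from offset 96 to offset 48, so the per-frame clear of 64 bytes covers all three planes.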
-
Johann authored
-
Jingning Han authored
-
Dmitry Kovalev authored
-
Yaowu Xu authored
-
Yaowu Xu authored
This is a short-term optimization until we work out a decoder implementation that requires no frame border extension. Change-Id: I02d15bfde4d926b50a4e58b393d8c4062d1be70f
-
Dmitry Kovalev authored
Removing unused and duplicated constants, moving them from *.h to *.c if possible. Change-Id: Ief4d6b984a3ca2e9b38504f0d855ed072cf7133f
-
Dmitry Kovalev authored
-
Johann authored
-
Ronald S. Bultje authored
This is required because, upon downscaling, if a motion vector points partially into the UMV (e.g. all but 1 of the 64+7 = 71 pixels, i.e. 70), then it can point up to 140 pixels into the UMV of the larger-resolution (2x) reference buffer. The UMV for reference buffers used in downscaling therefore needs to be 140 rounded up to the nearest multiple of 32, i.e. 160. Longer-term, we should probably handle the UMV differently, by detecting edge coverage on the fly and using a temporary buffer for edge extension instead of adding 160 pixels on all sides of the image (which means a CIF image uses 3x its own area for borders). Change-Id: I5184443e6731cd6721fc6a5d430a53e7d91b4f7e
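The border arithmetic above can be checked with a small C sketch, assuming a 64-pixel block, 7 extra pixels of subpel filter support (an 8-tap filter), 2x downscaling and 32-pixel alignment; the helper names are hypothetical. The same sketch checks the CIF remark: with a 160-pixel border, a 352x288 frame is padded to 672x608, and the border alone is about 3x the image area.

```c
#include <assert.h>

/* Round v up to the next multiple of align (align > 0). */
static int round_up(int v, int align) {
  assert(align > 0);
  return ((v + align - 1) / align) * align;
}

/* Border needed in the larger reference when downscaling by `scale`:
 * block + taps_extra pixels are fetched and all but one may lie in the UMV
 * (64 + 7 - 1 = 70), which maps to 70 * scale pixels in the reference. */
static int downscale_border(int block, int taps_extra, int scale, int align) {
  const int into_umv = block + taps_extra - 1;
  return round_up(into_umv * scale, align);
}

/* Border area divided by image area, in tenths (truncated). */
static long border_area_ratio_x10(long w, long h, long border) {
  const long padded = (w + 2 * border) * (h + 2 * border);
  return (padded - w * h) * 10 / (w * h);
}
```

With these inputs, downscale_border(64, 7, 2, 32) yields 160, and border_area_ratio_x10(352, 288, 160) yields 30, i.e. the border is roughly 3x the CIF image area.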
-
Ronald S. Bultje authored
Cycle times:
  4x4: 151 to 131 cycles (15% faster)
  8x8: 334 to 306 cycles (9% faster)
  16x16: 1401 to 1368 cycles (2.5% faster)
  32x32: 7403 to 7367 cycles (0.5% faster)
Total encode time of first 50 frames of bus @ 1500kbps (speed 0) goes from 1min39.2 to 1min38.6, i.e. a 0.67% overall speedup.
Change-Id: I799a49460e5e3fcab01725564dd49c629bfe935f
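The "faster" percentages in these cycle tables appear to be computed as old/new - 1 rather than as the share of time saved: for 4x4, 151/131 is about 1.153, while (151 - 131)/151 is only about 13.2%. A small C sketch of both conventions, in integer tenths of a percent to avoid float rounding (the helper names are hypothetical):

```c
/* Speedup as (old / new - 1), in tenths of a percent, rounded to nearest. */
static int speedup_permille(long old_cycles, long new_cycles) {
  return (int)((old_cycles * 1000 + new_cycles / 2) / new_cycles - 1000);
}

/* Alternative convention: share of the old runtime saved, rounded. */
static int time_saved_permille(long old_cycles, long new_cycles) {
  return (int)(((old_cycles - new_cycles) * 1000 + old_cycles / 2) /
               old_cycles);
}
```

speedup_permille(151, 131) gives 153 (15.3%) and speedup_permille(334, 306) gives 92 (9.2%), matching the 4x4 and 8x8 figures above, whereas time_saved_permille(151, 131) gives 132 (13.2%).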
-
Ronald S. Bultje authored
-
Frank Galligan authored
-
- 15 Jul, 2013 5 commits
-
-
Dmitry Kovalev authored
Renaming flatmask4 to flat_mask4, flatmask5 to flat_mask5, hevmask to hev_mask, filter to filter4, mbfilter to filter8, wide_mbfilter to filter16. Change-Id: Ic61c73e59c2eee505257584867aafac99833cea1
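For readers meeting the renamed helpers, here is a scalar sketch of what hev_mask and flat_mask4 test and how such results could choose among filter4, filter8 and filter16. It assumes the conventional VP8/VP9-style definitions (high edge variance if either side's first difference exceeds thresh; flat if the outer pixels lie within thresh of p0/q0) and is not copied from libvpx: it returns 0/1 instead of the byte masks SIMD-oriented code uses, and pick_filter is a hypothetical helper.

```c
#include <stdlib.h>

/* High edge variance: true if either side of the edge is "busy". */
static int hev_mask(int thresh, int p1, int p0, int q0, int q1) {
  return abs(p1 - p0) > thresh || abs(q1 - q0) > thresh;
}

/* Flatness over 4 pixels on each side: all within thresh of p0 / q0. */
static int flat_mask4(int thresh, int p3, int p2, int p1, int p0,
                      int q0, int q1, int q2, int q3) {
  return abs(p1 - p0) <= thresh && abs(q1 - q0) <= thresh &&
         abs(p2 - p0) <= thresh && abs(q2 - q0) <= thresh &&
         abs(p3 - p0) <= thresh && abs(q3 - q0) <= thresh;
}

/* Hypothetical dispatcher, assuming the basic filter mask already passed.
 * flat comes from flat_mask4; flat2 is assumed to come from flat_mask5
 * over the wider neighborhood. Returns the filter width that would run. */
static int pick_filter(int flat, int flat2) {
  if (flat && flat2) return 16; /* filter16 */
  if (flat) return 8;           /* filter8 */
  return 4;                     /* filter4 */
}
```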
-
Ronald S. Bultje authored
Also inline some of the block calculations to help the compiler avoid silly things like calculating the same offset (or converting between raster/transform block offset or block, mi and pixel unit) many, many, many times.
Cycle times:
  4x4: 584 -> 505 cycles (16% faster)
  8x8: 1651 -> 1560 cycles (6% faster)
  16x16: 7897 -> 7704 cycles (2.5% faster)
  32x32: 16096 -> 15852 cycles (1.5% faster)
Overall, this saves about 0.5 seconds (1min49.8 -> 1min49.3) on the first 50 frames of bus (speed 0) @ 1500kbps, i.e. 0.5% overall.
Change-Id: If3dd62453f8e2ab9d4ee616bc4ea956fb8874b80
-
Dmitry Kovalev authored
Removing unused DEC_DEBUG define and dec_debug variable. Changing function signatures to eliminate code duplication; renaming function mb_init_dequantizer to init_dequantizer. Also removing redundant curly braces and comments. Change-Id: Ia56ee1b0be5f24abb0e878581845be8a4773c298
-
Frank Galligan authored
Change the mbfilter Neon code to stop executing both branches when all vectors follow only one branch. The code is about 5% faster when executing only one branch and about 1% slower when executing both branches. -PS5: Remove local stack space from mbfilter. Change-Id: I6a23f9b318a9f4568a2718b4c9348db988fe2182
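The branch-skipping idea can be sketched in scalar C, treating an array as the vector lanes: check whether every lane, or no lane, selects the wide path, run only that path, and fall back to computing both results and selecting per lane (analogous to a NEON bitwise select) only for mixed vectors. The lane count, the stand-in filters and all names are hypothetical; the real mbfilter arithmetic differs. The all/any check is the small fixed cost that would explain a slight slowdown when both branches still have to run.

```c
#include <stdint.h>

#define LANES 8 /* hypothetical vector width */

/* Stand-ins for the two filter paths (not the real filter math). */
static uint8_t narrow_path(uint8_t v) { return (uint8_t)(v + 1); }
static uint8_t wide_path(uint8_t v) { return (uint8_t)(v + 2); }

/* flat[i] must be 0 or 1. Filters px in place and returns how many paths
 * were evaluated (1 or 2), so the branch choice is observable. */
static int filter_lanes(uint8_t px[LANES], const uint8_t flat[LANES]) {
  int any = 0, all = 1;
  for (int i = 0; i < LANES; ++i) {
    any |= flat[i];
    all &= flat[i];
  }
  if (all) { /* every lane takes the wide path: skip the narrow one */
    for (int i = 0; i < LANES; ++i) px[i] = wide_path(px[i]);
    return 1;
  }
  if (!any) { /* every lane takes the narrow path: skip the wide one */
    for (int i = 0; i < LANES; ++i) px[i] = narrow_path(px[i]);
    return 1;
  }
  /* Mixed lanes: compute both results and select per lane. */
  for (int i = 0; i < LANES; ++i) {
    const uint8_t n = narrow_path(px[i]);
    const uint8_t w = wide_path(px[i]);
    px[i] = flat[i] ? w : n;
  }
  return 2;
}
```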
-
Jingning Han authored
Make the code consistent with coding conventions. Change-Id: Id044ed8382f83a3c3f54f9edd569f00bcd0523db
-