- 20 Nov, 2014 1 commit
-
-
Peter de Rivaz authored
Also includes block error. (This patch is mostly cherry picked from commit db7192e0b014a331a1dcb102c8a1148e9f0e1081) Change-Id: Idef18f90b111a0d0c9546543d3347e551908fd78
-
- 14 Nov, 2014 1 commit
-
-
Peter de Rivaz authored
Change-Id: I446bdf3a405e4e9d2aa633d6281d66ea0cdfd79f (cherry picked from commit d7422b2b1eb9f0011a8c379c2be680d6892b16bc) (cherry picked from commit 6d741e4d76a7d9ece69ca117d1d9e2f9ee48ef8c)
-
- 12 Nov, 2014 1 commit
-
-
Peter de Rivaz authored
Change-Id: I1a74a1b032b198793ef9cc526327987f7799125f (cherry picked from commit b1a6f6b9cb47eafe0ce86eaf0318612806091fe5)
-
- 19 Oct, 2014 1 commit
-
-
levytamar82 authored
All sad function that process above 32 consecutive elements are optimized for AVX2: vp9_sad64x64 vp9_sad64x32 vp9_sad32x64 vp9_sad32x32 vp9_sad32x16 vp9_sad64x64_avg vp9_sad64x32_avg vp9_sad32x64_avg vp9_sad32x32_avg vp9_sad32x16_avg The functions that appeared as a hotspot is vp9_sad32x32 and vp9_sad64x64 vp9_sad32x32 was optimized by 68% and vp9_sad64x64 was optimized by 90% both of them gave and overall ~2.3% user level gain Change-Id: Iccf86b375a2b54c5fbbe685902ead0c9a561b9fd
-
- 14 Oct, 2014 1 commit
-
-
Alex Converse authored
This is based on the 64-bit ssse3 quantizer. 1.1x speedup for screen content at speed 7. Change-Id: I57d15415ef97c49165954bbe3daaaf9318e37448
-
- 07 Oct, 2014 1 commit
-
-
Jim Bankoski authored
The concept: There's too much noise in source pixels for variance and at low bitrate the reconstructed looks nothing like the source so we have problems getting good partitionings with either. This skirts the issue by using a box blur scaled down version for variance calculations. To compare against source_var_ moved keyframe to be rd based like source_var. Change-Id: Ie3babdbfadae324b7b5a76bea192893af27f0624
-
- 06 Oct, 2014 1 commit
-
-
JackyChen authored
This SSE2 is based on VP8 denoiser's SSE2 code. In VP8, there are only 16x16 blocks in denoiser, while in VP9, there are 13 different block sizes. By adding this SSE2 code, the improvement of encoder speed is around 20%(using C code vs using SSE2 code), vary for different clips. The unit test for VP9 denoiser is to confirm that the SSE2 code is bit-exact with the C code. The unit test covers all block size. Change-Id: Ic8d8ac26db4ea40a5f146b5678a065af07eaaa3d
-
- 06 Sep, 2014 1 commit
-
-
Dmitry Kovalev authored
Change-Id: Ib4f5dd733eb2939b108070a01e83da5d9990bac0
-
- 02 Sep, 2014 1 commit
-
-
Dmitry Kovalev authored
Removed functions: * vp9_sad_16x16_mmx * vp9_sad_8x16_mmx * vp9_sad_16x8_mmx * vp9_sad_8x8_mmx * vp9_sad_4x4_mmx Change-Id: Ic5174b93b64d65d846f0c11e72cab149e9472bc3
-
- 29 Aug, 2014 1 commit
-
-
Dmitry Kovalev authored
Removed functions: * vp9_mse16x16_mmx * vp9_get_mb_ss_mmx * vp9_get4x4var_mmx * vp9_get8x8var_mmx * vp9_variance4x4_mmx * vp9_variance8x8_mmx * vp9_variance16x16_mmx * vp9_variance16x8_mmx * vp9_variance8x16_mmx They all have SSE2 equivalent. Change-Id: I3796f2477c4f59b35b4828f46a300c16e62a2615
-
- 31 Jul, 2014 1 commit
-
-
Scott LaVarnway authored
On a Nexus 7, vpxenc (in realtime mode, speed -12) reported a performance improvement of ~3.2% Change-Id: I8862497264142171b7efc32df1a67714a23539f4
-
- 30 Jul, 2014 2 commits
-
-
Scott LaVarnway authored
On a Nexus 7, vpxenc (in realtime mode, speed -12) reported a performance improvement of ~12.4% Change-Id: Id29d215acf58bb108489e218a259adf74b4768d7
-
Scott LaVarnway authored
vp9_variance16x16(), and vp9_get16x16var(). On a Nexus 7, vpxenc (in realtime mode, speed -12) reported a performance improvement of ~16.7%. Change-Id: Ib163aa99f56e680194aabe00dacdd7f0899a4ecb
-
- 29 Jul, 2014 1 commit
-
-
Scott LaVarnway authored
On a Nexus 7, vpxenc (in realtime mode, speed -12) reported a performance improvement of ~3.7%. Change-Id: I428c72c40df82c6d537955e320a8debf99343004
-
- 24 Jul, 2014 1 commit
-
-
Tim Kopp authored
This should prevent confusion with the VP8 CONFIG_TEMPORAL_DENOISING and other flags. Change-Id: I1fe4e2977895b7966841d861ab74317ad875b6c8
-
- 16 Jul, 2014 1 commit
-
-
Scott LaVarnway authored
and vp9_sad16x16_neon() On a Nexus 7, vpxenc (in realtime mode, speed -6) reported a performance improvement of ~17%. Change-Id: I91e070cde2973451083d3f3d63b49b7886de9a85
-
- 02 Jul, 2014 1 commit
-
-
Alex Converse authored
vp9_rdopt is for making rd optimal mode decisions. vp9_rd is for all other rd related routines. Anything used outside of making an rd optimal decision belongs in rd. Change-Id: I772a3073f7588bdf139f551fb9810b6864d8e64b
-
- 25 Jun, 2014 1 commit
-
-
James Zern authored
same reasoning as: 9f3a0dbb vp9_rtcd: correct avx2 references these are all intrinsics, so don't depend on x86inc.asm Change-Id: I915beaef318a28f64bfa5469e5efe90e4af5b827
-
- 12 Jun, 2014 1 commit
-
-
Tim Kopp authored
Change-Id: Iccf6ede4c4f85646b0f8daec47050ce93e267c90
-
- 21 May, 2014 1 commit
-
-
Deb Mukherjee authored
Renames all x86_64 specific assembly files to consistently end in _x86_64.asm. This will be useful for build systems to handle these files differently. All new 64-bit specific assembly files should use the new naming convention. Change-Id: I36c89584967c82ffc4088b1b5044ac15d2bb7536
-
- 14 May, 2014 1 commit
-
-
levytamar82 authored
vp9_block_error_sse2 can only handle 16 bytes at a time but the function requires to handle a sequence of 32 bytes at a time so each 16 bytes is handled in a different register. With AVX2 optimization the 32 bytes can be handled in one register instead of two in the SSE2 The vp9_block_error was optimized by 85%. The user level was optimized by 1.2% Change-Id: Ia8fffe60e61eff7432a5fbd538757894f6c319fd
-
- 08 May, 2014 1 commit
-
-
Alex Converse authored
Change-Id: Ib0a73d4863478f9b8a00976379d25d2f6ebbb197
-
- 07 May, 2014 1 commit
-
-
Paul Wilkins authored
Includes changes that are not compatible with VS windows builds. Amongst other things stdint.h is not supported in VS. This reverts commit 89fbf3de. Change-Id: Ifa86d7df250578d1ada9b539c9ff12ed0c523cdd
-
- 05 May, 2014 1 commit
-
-
Alex Converse authored
7% faster encoding a desktop lossless at RT speed 4. Change-Id: I41627f5b737752616b6512bb91a36ec45995bf64
-
- 30 Apr, 2014 1 commit
-
-
Dmitry Kovalev authored
Corresponding C functions were removed in I99695564a3aa9bc8c79ac0a551d257e2ff3ad3c3 Change-Id: I50a5575065a7a9e41904eb2161afd739def927db
-
- 29 Apr, 2014 1 commit
-
-
Jingning Han authored
Assembly implementation of ssse3 8x8 forward 2D-DCT. The current version is turned on only for x86_64. The average unit runtime goes from 157 cycles down to 136 cycles, i.e., about 12.8% faster. This translates into about 1.5% speed-up for pedestrian_area 1080p at speed 2. Change-Id: I0f12435857e9425ed7ce12541344dfa16837f4f4
-
- 22 Apr, 2014 1 commit
-
-
Dmitry Kovalev authored
Actual renames: vp9_onyx_if.c -> vp9_encoder.c vp9_onyx_int.h -> vp9_encoder.h Change-Id: I80532a80b118d0060518e6c6a0d640e3f411783c
-
- 17 Apr, 2014 1 commit
-
-
Jim Bankoski authored
This patch sets up a quad_tree structure (pc_tree) for holding all of pick_mode_context data we use at any square block size during encoding or picking modes. That includes contexts for 2 horizontal and 2 vertical splits, one none, and pointers to 4 sub pc_tree nodes corresponding to split. It also includes a pointer to the current chosen partitioning. This replaces code that held an index for every level in the pick modes array including: sb_index, mb_index, b_index, ab_index. These were used as stateful indexes that pointed to the current pick mode contexts you had at each level stored in the following arrays array ab4x4_context[][][], sb8x4_context[][][], sb4x8_context[][][], sb8x8_context[][][], sb8x16_context[][][], sb16x8_context[][][], mb_context[][], sb32x16[][], sb16x32[], sb32_context[], sb32x64_context[], sb64x32_context[], sb64_context and the partitioning that had been stored in the following: b_partitioning, mb_partitioning, sb_partitioning, and sb64_partitioning. Prior to this patch before doing an encode you had to set the appropriate index for your block size ( switch statement), update it ( up to 3 lookups for the index array value) and then make your call into a recursive function at which point you'd have to call get_context which then had to do a switch statement based on the blocksize, and then up to 3 lookups based upon the block size to find the context to use. With the new code the context for the block size is passed around directly avoiding the extraneous switch statements and multi dimensional array look ups that were listed above. At any level in the search all of the contexts are local to the pc_tree you are working on (in?). In addition in most places code that used to call sub functions and then check if the block size was 4x4 and index was > 0 and return now don't preferring instead to call the right none function on the inside. Change-Id: I06e39318269d9af2ce37961b3f95e181b57f5ed9
-
- 14 Apr, 2014 1 commit
-
-
Dmitry Kovalev authored
We don't use declarations from this file. The real declarations (differently named) are in vp9_rtcd_defs.pl, e.g. vp9_full_search_sad. Change-Id: I73cbf064305710ba20747233cfdbe67366f069a0
-
- 08 Apr, 2014 1 commit
-
-
Dmitry Kovalev authored
Change-Id: Ib3b3864a6018c62ac1ea18e30795af74464596cd
-
- 28 Mar, 2014 1 commit
-
-
Dmitry Kovalev authored
Change-Id: I7d9874da8ff78a2d7e0cf11073af9c30538bc9a6
-
- 27 Mar, 2014 1 commit
-
-
Marco Paniconi authored
Change-Id: Iffa45b9b04196c1ded6037622a8644a2500a62de
-
- 24 Mar, 2014 1 commit
-
-
Jim Bankoski authored
Change-Id: I12c29a630da1fbc5508f11b61d182f9b527b3a35
-
- 21 Mar, 2014 2 commits
-
-
Marco Paniconi authored
Change-Id: Id76a628495c822e23825b66a7589b4a3279680e2
-
levytamar82 authored
2 functions were optimized for avx2 by using full 256 bit register In order to handle 32 elements in parallel instead of only 16 in parallel: 1. vp9_sad32x32x4d 2. vp9_sad64x64x4d The function level gain is 66% and the user level gain is ~1%. Change-Id: I4efbb3bc7d8bc03b64b6c98f5cd5c4a9dd3212cb
-
- 18 Mar, 2014 1 commit
-
-
Marco Paniconi authored
Activated using aq_mode=3. Change-Id: Ied628b9e7bd0e88b0c75790276bca75b19eb5c07
-
- 13 Mar, 2014 1 commit
-
-
Marco Paniconi authored
Change-Id: Ie47c139d48cb18409d71f98f6a5b9eeb9f9437a9
-
- 05 Mar, 2014 1 commit
-
-
Dmitry Kovalev authored
Change-Id: If90c1bc822873156d4e38fca1938e4907f6c95f0
-
- 27 Feb, 2014 1 commit
-
-
Dmitry Kovalev authored
Removing all copies of identical vp8_mse2psnr/vp9_mse2psnr functions. Using vpx_sse_to_psnr() instead in all places. Change-Id: I15beef9834d43d8fc8a8a7a2d1fc5de3d658fed8
-
- 14 Feb, 2014 1 commit
-
-
levytamar82 authored
Optimizing 2 functions to process 32 elements in parallel instead of 16: 1. vp9_sub_pixel_variance64x64 2. vp9_sub_pixel_variance32x32 both of those function were calling vp9_sub_pixel_variance16xh_ssse3 instead of calling that function, it calls vp9_sub_pixel_variance32xh_avx2 that is written in avx2 and process 32 elements in parallel. This Optimization gave 70% function level gain and 2% user level gain Change-Id: I4f5cb386b346ff6c878a094e1c3b37e418e50bde
-