- 09 Oct, 2014 1 commit
-
-
Deb Mukherjee authored
Uses highbd_ prefix convention consistently. Change-Id: I58f7f799a7ff8e32701bcd71c955bcf1cdd4581e
-
- 18 Sep, 2014 1 commit
-
-
Deb Mukherjee authored
Change-Id: Ie51c352a6b250547207cbc1ebba833a01ed053e3
-
- 11 Sep, 2014 1 commit
-
-
Johann authored
If optimizations use more than one CPU feature, allow specifying them so that '--disable-X' still works. https://code.google.com/p/webm/issues/detail?id=854 Change-Id: I3108ea37b397371a2be84dd5f2380b304db23f18
-
- 23 May, 2014 2 commits
-
-
Yunqing Wang authored
In 8-tap filtering, to guarantee the intermediate results fit in 16 bits, the order of accumulating the products needs to be done correctly, and the largest product should be added last. This patch fixed the problem using the method in commit "Correct ssse3 8/16-pixel wide sub-pixel filter calculation". Change-Id: I79d0ad60c057b15011ece84cda9648eee0809423
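The ordering issue can be illustrated in scalar C. The sketch below is not the libvpx assembly: it emulates saturating 16-bit adds (what `paddsw` does per lane) on four pair products. The helper names (`sat_add16`, `pair_product`, `filter8_naive`, `filter8_largest_last`) are hypothetical, and the sketch assumes the second tap pair carries the largest product.

```c
#include <assert.h>
#include <stdint.h>

/* Saturating signed 16-bit add, emulating one lane of paddsw. */
static int16_t sat_add16(int32_t a, int32_t b) {
  int32_t s = a + b;
  if (s > INT16_MAX) s = INT16_MAX;
  if (s < INT16_MIN) s = INT16_MIN;
  return (int16_t)s;
}

/* Sum of two adjacent pixel*tap products, saturated to 16 bits. */
static int16_t pair_product(const uint8_t *px, const int8_t *k) {
  return sat_add16(px[0] * k[0], px[1] * k[1]);
}

/* Round, shift by 7 bits and clamp to the 8-bit pixel range. */
static uint8_t finish(int16_t sum) {
  int v = sat_add16(sum, 64);
  if (v < 0) return 0;
  v >>= 7;
  return (uint8_t)(v > 255 ? 255 : v);
}

/* Pairs accumulated left to right: an intermediate sum may saturate. */
static uint8_t filter8_naive(const uint8_t *px, const int8_t *k) {
  int16_t s = sat_add16(pair_product(px, k), pair_product(px + 2, k + 2));
  s = sat_add16(s, pair_product(px + 4, k + 4));
  s = sat_add16(s, pair_product(px + 6, k + 6));
  return finish(s);
}

/* Outer pairs first, the largest pair (assumed to be pair 1) last. */
static uint8_t filter8_largest_last(const uint8_t *px, const int8_t *k) {
  assert(k[2] + k[3] >= k[4] + k[5]); /* assumption of this sketch */
  int16_t s = sat_add16(pair_product(px, k), pair_product(px + 6, k + 6));
  s = sat_add16(s, pair_product(px + 4, k + 4));
  s = sat_add16(s, pair_product(px + 2, k + 2));
  return finish(s);
}
```

With all eight pixels at 255 and an illustrative kernel {-3, -1, 32, 64, 38, 1, -3, 0} (chosen only so that the taps sum to 128), the exact sum is 255 * 128 = 32640, which fits in 16 bits. The naive order saturates at 32767 after the third pair and then subtracts 765, giving 250 instead of 255. The largest-last order never saturates and returns 255.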
-
Yaowu Xu authored
Because mismatches were found between the intrinsic version and the C-only version, this commit temporarily reverts to the matching assembly version to allow further investigation. Change-Id: I08436c47d4888b562c0eac8e8856d90a831442df
-
- 14 Feb, 2014 1 commit
-
-
levytamar82 authored
Optimizing all SSSE3 assembly for convolution: 1. vp9_filter_block1d4_h8_sse2 2. vp9_filter_block1d8_h8_sse2 3. vp9_filter_block1d16_h8_sse2 4. vp9_filter_block1d4_v8_sse2 5. vp9_filter_block1d8_v8_sse2 6. vp9_filter_block1d16_v8_sse2. The optimizations include: processing 2x8 elements in one 128-bit register instead of 8 elements in one 128-bit register, and removing unnecessary loads. This optimization gives a user-level gain of 2.4% for 480p input and 1.6% for 720p. This optimization is done only for 64-bit. Change-Id: Ic07fce2f9360329b4f2d956efda1480ae958766b
-
- 13 Feb, 2014 1 commit
-
-
levytamar82 authored
Two convolve functions were optimized for AVX2: 1. vp9_filter_block1d16_h8 2. vp9_filter_block1d16_v8. vp9_filter_block1d16_h8 was optimized for AVX2 by halving the number of loop strides, with two strides processed in parallel. vp9_filter_block1d16_v8 was optimized in the same way; in addition, some of the loads were moved outside the loop to prevent redundant loads. This optimization gives a 43% function-level gain and a 1.3% user-level gain. It can now be compiled on Windows. Change-Id: I2714124cfb0c14a77d7a0ce126a20db92ffbf92c
-
- 10 Feb, 2014 1 commit
-
-
Tom Finegan authored
Update filter_1dfunction definition to match usage. Change-Id: Ie3cae13dc1ec3f5838c5f29d1c76a1a98a9217fa
-
- 04 Feb, 2014 1 commit
-
-
Yunqing Wang authored
This patch added ssse3 optimization of bilinear sub-pixel filters. The real-time encoder was sped up by ~1%. Change-Id: Ie82e98976f411183cb8c61ab8d2ba0276e55a338
-
- 03 Feb, 2014 1 commit
-
-
Yunqing Wang authored
Using bilinear filters could speed up the codec in real-time mode. This patch added sse2 optimizations of bilinear filters that operate on different-sized blocks. Tests showed that the real-time encoder was sped up by 3%. Change-Id: If99a7ee4385fcc225c3ee7445d962d5752e57c3f
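For orientation, a bilinear sub-pixel filter is a 2-tap weighted average of neighbouring pixels, which is why it is cheaper than an 8-tap filter. The scalar sketch below assumes 1/16-pel offsets and 7-bit tap precision (weights summing to 128). The function names `bilinear_tap` and `bilinear_row` are hypothetical and not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define FILTER_BITS 7 /* assumed: taps sum to 1 << FILTER_BITS = 128 */

/* Blend two neighbouring pixels; frac is the offset in sixteenths. */
static uint8_t bilinear_tap(uint8_t a, uint8_t b, int frac) {
  assert(frac >= 0 && frac < 16);
  const int w1 = frac * 8;   /* weight of the far neighbour */
  const int w0 = 128 - w1;   /* weight of the near neighbour */
  return (uint8_t)((a * w0 + b * w1 + 64) >> FILTER_BITS);
}

/* Filter one row; src must have width + 1 readable pixels. */
static void bilinear_row(const uint8_t *src, uint8_t *dst, int width,
                         int frac) {
  for (int x = 0; x < width; ++x) {
    dst[x] = bilinear_tap(src[x], src[x + 1], frac);
  }
}
```

A SIMD version would compute many such blends per instruction; the scalar form only shows the arithmetic being vectorized.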
-
- 29 Jan, 2014 1 commit
-
-
Yunqing Wang authored
Added macros to reduce the code duplication. Change-Id: I1916aa5a386ea07d961d4ec439ab09bb8c45487d
-
- 17 Jan, 2014 1 commit
-
-
Yunqing Wang authored
This reverts commit f9404f24. This patch caused some ASAN errors. Change-Id: If15b7e581310e19061d111c69f2931809662ed19
-
- 13 Jan, 2014 1 commit
-
-
Yunqing Wang authored
This reverts commit b6452571. Change-Id: I60d1bf57ae8e9eb6127f42f2d5a780124ac51b45
-
- 10 Jan, 2014 1 commit
-
-
Paul Wilkins authored
This reverts commit 511d218c. In their current form, the intrinsics break the borg build. Change-Id: Ied37936af841250ecff449802e69a3d3761c91b9
-
- 09 Jan, 2014 1 commit
-
-
levytamar82 authored
Optimizing all SSSE3 assembly for convolution: 1. vp9_filter_block1d4_h8_sse2 2. vp9_filter_block1d8_h8_sse2 3. vp9_filter_block1d16_h8_sse2 4. vp9_filter_block1d4_v8_sse2 5. vp9_filter_block1d8_v8_sse2 6. vp9_filter_block1d16_v8_sse2. The optimizations include: processing 2x8 elements in one 128-bit register instead of 8 elements in one 128-bit register, and removing unnecessary loads. This optimization gives a user-level gain of 2.4% for 480p input and 1.6% for 720p. This optimization is done only for 64-bit. Change-Id: Icb586dc0c938b56699864fcee6c52fd43b36b969
-
- 19 Dec, 2013 1 commit
-
-
Yunqing Wang authored
Removed unused filter coefficients. Change-Id: Ib395a51305e23ff41ab69c1808d56946d25961cd
-
- 15 Oct, 2013 1 commit
-
-
Jingning Han authored
Change-Id: Iac55891ac9e6f13718c9f822aa099b5ca491832a
-
- 10 Oct, 2013 1 commit
-
-
Yunqing Wang authored
To ensure fast encoding/decoding on devices without ssse3 support, SSE2 optimization of sub-pixel filters was done. A test using a 1080p clip showed decoder speeds of ~70 fps with ssse3 filters, ~60 fps with sse2 filters, and ~15 fps with C filters. Change-Id: Ie2088f87d83a889fba80a613e4d0e287aadd785c
-
- 13 Sep, 2013 1 commit
-
-
James Zern authored
This is incompatible with most toolchains other than gcc. Revert "Deleted #include <inttypes.h>" This reverts commit 4d018be9. This reverts commit d22a504d. Change-Id: I1751dc6831f4395ee064e6748281418e967e1dcf
-
- 11 Sep, 2013 1 commit
-
-
Scott LaVarnway authored
Reformatted version of a patch submitted by Erik/Tamar from Intel. For the test clips used, the decoder performance improved by ~2%. Change-Id: Ifbc37ac6311bca9ff1cfefe3f2e9b7f13a4a511b
-
- 11 Jul, 2013 2 commits
-
-
Johann authored
Independent horizontal and vertical implementations. Requires that blocks be built from 4x4 and [xy]_step_q4 == 16. 6-10% improvement. CIF improved the least. Change-Id: I137f5ceae4440adc0960bf88e4453e55a618bcda
-
Ronald S. Bultje authored
Change-Id: I3ce849452ed4f08527de9565a9914d5ee36170aa
-
- 19 Apr, 2013 1 commit
-
-
John Koleszar authored
The C code was being used as a fallback for the >16 case, but only for 2D. Change-Id: I1e2e6da9e4b28bd88bde9ba4dd32724ce466cf6f
-
- 18 Apr, 2013 1 commit
-
-
John Koleszar authored
Updates the common convolution code to support blocks larger than 16x16, and rectangular blocks. This uncovered a bug in the SSSE3 filtering routines due to the order of application of saturation. This commit fixes that bug, adjusts the unit test to bias its random values towards the extremes, and adds a test to ensure that all filters conform to the expected pairwise addition structure. Change-Id: I81f69668b1de0de5a8ed43f0643845641525c8f0
-
- 13 Feb, 2013 1 commit
-
-
Scott LaVarnway authored
Adds initial ssse3 convolve avg functions, which is one step closer to using x86inc.asm. The decoder performance improved by 8% for the test clip used. This should be revisited later to see if averaging outside the loop is better than having many similar filter functions. Change-Id: Ice3fafb423b02710b0448ffca18b296bcac649e9
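As a rough illustration of the "averaging outside the loop" alternative mentioned above, the sketch below blends an already filtered row into the destination as a separate pass. It assumes the avg variants combine the new prediction with the existing destination using a round-to-nearest average. The names `avg_round` and `convolve_avg_row` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Round-to-nearest average of two pixels, the per-byte result of pavgb. */
static uint8_t avg_round(uint8_t a, uint8_t b) {
  return (uint8_t)((a + b + 1) >> 1);
}

/* Blend a filtered row into dst instead of overwriting it. */
static void convolve_avg_row(const uint8_t *filtered, uint8_t *dst, int w) {
  assert(w > 0);
  for (int x = 0; x < w; ++x) {
    dst[x] = avg_round(dst[x], filtered[x]);
  }
}
```

Keeping the average as its own pass needs only one copy of each filter function, at the cost of an extra trip through the destination block; fusing it into the filter loop avoids that trip but multiplies the number of near-identical functions.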
-
- 09 Feb, 2013 1 commit
-
-
Scott LaVarnway authored
A 16-bit overflow condition occurs when using the EIGHTTAP_SMOOTH filters (vp9_sub_pel_filters_8lp). Changed the order of the adds to fix this problem. Also added ssse3 support for 4x4 subpixel filtering. Change-Id: I475eaadae920794c2de5e01e9735c059a856518e
-
- 08 Feb, 2013 1 commit
-
-
John Koleszar authored
This commit adds the 8 tap SSSE3 subpixel filters back into the code underneath the convolve API. The C code is still called for 4x4 blocks, as well as compound prediction modes. This restores the encode performance to be within about 8% of the baseline. Change-Id: Ife0d81477075ae33c05b53c65003951efdc8b09c
-
- 05 Feb, 2013 1 commit
-
-
John Koleszar authored
Update the code to call the new convolution functions to do subpixel prediction rather than the existing functions. Remove the old C and assembly code, since it is unused. This causes a 50% performance reduction on the decoder, but that will be resolved when the asm for the new functions is available. There is no consensus on whether 6-tap or 2-tap predictors will be supported in the final codec, so these filters are implemented in terms of the 8-tap code, so that quality testing of these modes can continue. Implementing the lower complexity algorithms is a simple exercise, should it be necessary. This code produces slightly better results in the EIGHTTAP_SMOOTH case, since the filter is now applied in only one direction when the subpel motion is only in one direction. Like the previous code, the filtering is skipped entirely on full-pel MVs. This combination seems to give the best quality gains, but this may be indicative of a bug in the encoder's filter selection, since the encoder could achieve the result of skipping the filtering on full-pel by selecting one of the other filters. This should be revisited. Quality gains on derf are positive on almost all clips. The only clip that seemed to be hurt at all datarates was football (-0.115% PSNR average, -0.587% min). Overall averages are 0.375% PSNR and 0.347% SSIM. Change-Id: I7d469716091b1d89b4b08adde5863999319d69ff
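Expressing a shorter filter "in terms of the 8-tap code" can be pictured as zero-padding its taps into an 8-entry kernel and running the generic loop. The sketch below shows this for a 2-tap bilinear filter on one row, together with the full-pel shortcut. The tap positions, the 1/16-pel precision and all function names are assumptions for illustration, not the actual libvpx code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Place the two bilinear taps in the middle of an 8-tap kernel. */
static void bilinear_as_8tap(int frac, int16_t kernel[8]) {
  assert(frac >= 0 && frac < 16);
  memset(kernel, 0, 8 * sizeof(kernel[0]));
  kernel[3] = (int16_t)(128 - 8 * frac);
  kernel[4] = (int16_t)(8 * frac);
}

/* Generic horizontal 8-tap filter. src needs 3 readable pixels to the
 * left of src[0] and 4 to the right of src[w - 1]. */
static void convolve8_row(const uint8_t *src, uint8_t *dst, int w,
                          const int16_t kernel[8]) {
  for (int x = 0; x < w; ++x) {
    int sum = 0;
    for (int k = 0; k < 8; ++k) sum += src[x + k - 3] * kernel[k];
    int v = sum + 64;              /* round before the 7-bit shift */
    v = v < 0 ? 0 : v >> 7;
    dst[x] = (uint8_t)(v > 255 ? 255 : v);
  }
}

/* Predict one row; full-pel motion (frac == 0) skips filtering. */
static void predict_row(const uint8_t *src, uint8_t *dst, int w, int frac) {
  if (frac == 0) {
    memcpy(dst, src, (size_t)w);
    return;
  }
  int16_t kernel[8];
  bilinear_as_8tap(frac, kernel);
  convolve8_row(src, dst, w, kernel);
}
```

The six multiplications by zero are wasted work, which is consistent with the commit's remark that dedicated lower-complexity implementations could be added later if these modes are kept.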
-
- 26 Dec, 2012 1 commit
-
-
John Koleszar authored
Various fixups to resolve issues when building vp9-preview under the more stringent checks placed on the experimental branch. Change-Id: I21749de83552e1e75c799003f849e6a0f1a35b07
-
- 21 Dec, 2012 2 commits
-
-
Scott LaVarnway authored
These filters will not work with VP9. Change-Id: Ic26c77961084fcea6bfa97f4cd95afdea2282e85
-
Jim Bankoski authored
Change-Id: Ibc077cf1c1da0c86063f88c6d3073c6876989119
-
- 03 Dec, 2012 1 commit
-
-
Jim Bankoski authored
Change-Id: I467bf0fdf3b35326bcce58d5459e6d2dbfd6c5e5
-
- 29 Nov, 2012 1 commit
-
-
Jim Bankoski authored
Change-Id: I20c426e91ee49666db42e20eb074095ab6b8ec5d
-
- 27 Nov, 2012 1 commit
-
-
John Koleszar authored
Support for gyp which doesn't support multiple objects in the same static library having the same basename. Change-Id: Ib947eefbaf68f8b177a796d23f875ccdfa6bc9dc
-
- 01 Nov, 2012 3 commits
-
-
Ronald S. Bultje authored
Change-Id: Ic084c475844b24092a433ab88138cf58af3abbe4
-
Ronald S. Bultje authored
Most of these were picked up by jenkins in the commit that changed the vp8 namespace to vp9 in common/. Change-Id: I5cbd56ffc753b92ef805133cda6acc1713a13878
-
Ronald S. Bultje authored
For non-static functions, change the prefix to vp9_. For static functions, remove the prefix. Also fix some comments, remove unused code or unused function prototypes. Change-Id: I1f8be05362f66060fe421c3d4c9a906fdf835de5
-
- 31 Oct, 2012 2 commits
-
-
Ronald S. Bultje authored
For local symbols, make them static instead. Change-Id: I13d60947a46f711bc8991e16100cea2a13e3a22e
-
Ronald S. Bultje authored
Change-Id: Ic5a5f60e1ff9d9ccae4174160d36529466eeb509
-
- 26 Oct, 2012 1 commit
-
-
Scott LaVarnway authored
Quickly modified the ssse3 sixtap filters to support eight taps. For the test clip used, a 23+% boost in decoder performance was seen. We can revisit later and improve further. Change-Id: I5f59860459e80d6fa23e6cc0fd91296a969f5240
-