- Oct 25, 2010
-
-
Johann Koenig authored
clean up compiler warnings, man in the yellow hat warnings, and start to remove unused #includes Change-Id: I6267e98d9b3024b6fb1ef2732b29067a33cb96f6
-
Johann Koenig authored
there were four versions for the regular and macroblock loopfilters: horizontal [y|uv] vertical [y|uv] this moves all the common code into 2 functions: vp8_loop_filter_neon vp8_mbloop_filter_neon this provides no gain in performance. there's a bit of jitter, but it trends down ~0.25-0.5%. however, this is a huge gain maintenance. also, there is the potential to drop some stack usage in the macroblock loopfilter. Change-Id: I91506f07d2f449631ff67ad6f1b3f3be63b81a92
-
Timothy B. Terriberry authored
The primary goal is to allow a binary to be built which supports NEON, but can fall back to non-NEON routines, since some Android devices do not have NEON, even if they are otherwise ARMv7 (e.g., Tegra). The configure-generated flags HAVE_ARMV7, etc., are used to decide which versions of each function to build, and when CONFIG_RUNTIME_CPU_DETECT is enabled, the correct version is chosen at run time. In order for this to work, the CFLAGS must be set to something appropriate (e.g., without -mfpu=neon for ARMv7, and with appropriate -march and -mcpu for even earlier configurations), or the native C code will not be able to run. The ASFLAGS must remain set for the most advanced instruction set required at build time, since the ARM assembler will refuse to emit them otherwise. I have not attempted to make any changes to configure to do this automatically. Doing so will probably require the addition of new configure options. Many of the hooks for RTCD on ARM were already there, but a lot of the code had bit-rotted, and a good deal of the ARM-specific code is not integrated into the RTCD structs at all. I did not try to resolve the latter, merely to add the minimal amount of protection around them to allow RTCD to work. Those functions that were called based on an ifdef at the calling site were expanded to check the RTCD flags at that site, but they should be added to an RTCD struct somewhere in the future. The functions invoked with global function pointers still are, but these should be moved into an RTCD struct for thread safety (I believe every platform currently supported has atomic pointer stores, but this is not guaranteed). The encoder's boolhuff functions did not even have _c and armv7 suffixes, and the correct version was resolved at link time. The token packing functions did have appropriate suffixes, but the version was selected with a define, with no associated RTCD struct. However, for both of these, the only armv7 instruction they actually used was rbit, and this was completely superfluous, so I reworked them to avoid it. The only non-ARMv4 instruction remaining in them is clz, which is ARMv5 (not even ARMv5TE is required). Considering that there are no ARM-specific configs which are not at least ARMv5TE, I did not try to detect these at runtime, and simply enable them for ARMv5 and above. Finally, the NEON register saving code was completely non-reentrant, since it saved the registers to a global, static variable. I moved the storage for this onto the stack. A single binary built with this code was tested on an ARM11 (ARMv6) and a Cortex A8 (ARMv7 w/NEON), for both the encoder and decoder, and produced identical output, while using the correct accelerated functions on each. I did not test on any earlier processors. Change-Id: I45cbd63a614f4554c3b325c45d46c0806f009eaa
-
Johann Koenig authored
onyx_if is getting pretty big. split out the temporal code to make it easier to look at. Change-Id: I207c3a94c90e91b32e3ea5e1836a53b7a990fabd
-
- Oct 22, 2010
-
-
John Koleszar authored
Change-Id: Icef5226a70260607c190126c1c0cc28b796e759c
-
Timothy B. Terriberry authored
The code was not checking for frame sizes smaller than 3 bytes, and the partition size checks might have failed if the input buffer was within 16MB of the top of the heap. In addition, the reference count on the current frame buffer was not being decremented on error, so after a small number of errors, no new frame buffer could be found and it would run off the list of them. Change-Id: I0c60dba6adb1e2a29df39754f72a56ab6c776b46
-
Timothy B. Terriberry authored
Most of the code that actually uses these matrices indexes them as if they were a single contiguous array, and coverity produces reports about the resulting accesses that overflow the static bounds of the first row. This is perfectly legal in C, but converting them to actual [16] arrays should eliminate the report, and removes a good deal of extraneous indexing and address operators from the code. Change-Id: Ibda479e2232b3e51f9edf3b355b8640520fdbf23
-
- Oct 21, 2010
-
-
Frank Galligan authored
Change the pts of the altref frame to be as close as possible to the pts of the preceding frame and still be strictly increasing. Change-Id: Iae3033a4c89ae5a9d0e5c4198e9196e5f3ee57c7
-
John Koleszar authored
-
John Koleszar authored
The first implementation of the firstpass motion map for motion compensated temporal filtering created a file, fpmotionmap.stt, in the current working directory. This was not safe for multiple encoder instances. This patch merges this data into the first pass stats packet interface, so that it is handled like the other (numerical) firstpass stats. The new stats packet is defined as follows: Numerical Stats (16 doubles) -- 128 bytes Motion Map -- 1 byte / Macroblock Padding -- to align packet to 8 bytes The fpmotionmap.stt file can still be generated for debugging purposes in the same way that the textual version of the stats are available (defining OUTPUT_FPF in firstpass.c) Change-Id: I083ffbfd95e7d6a42bb4039ba0e81f678c8183ca
-
Yunqing Wang authored
Change-Id: Ia649b500ef020225d8bbf611799d0f47658dc2ac
-
Yunqing Wang authored
-
Yunqing Wang authored
-
Yunqing Wang authored
This rewriting reflects changes made in commit "Improve the accuracy of forward walsh-hadamard transform". Since this function is not called much, only a small encoder performance gain (~0.5% ) is seen. Change-Id: Ie9df58a43028a11fd5b115c4bbe3141f7596578b
-
- Oct 20, 2010
-
-
John Koleszar authored
-
Frank Galligan authored
Change-Id: I8eb49c56f7509f0a8074d440e8345b9e3344b85b
-
- Oct 19, 2010
- Oct 18, 2010
-
-
Yunqing Wang authored
Instead of doing 8-bit data unpack and 16-bit subtraction, use psubb to do 16 8-bit subtractions and pcmpgtb to preserve the sign information. This does not bring noticable gain since these functions are not called frequently. Change-Id: I90a0dfaa3db9d422e4ada324076596ffb178548e
-
Johann Koenig authored
generic version got fixed, but not the arm version. fixes: vp8/encoder/arm/mcomp_arm.c: In function 'vp8_full_search_sadx3': vp8/encoder/arm/mcomp_arm.c:1208: warning: pointer targets in passing argument 5 of 'fn_ptr->sdx3f' differ in signedness vp8/encoder/arm/mcomp_arm.c:1208: note: expected 'unsigned int *' but argument is of type 'int *' and another unsigned change to keep the files similar Change-Id: I1b6255dc3a03b90394a791ee0d15d8167d9454db
-
- Oct 15, 2010
-
-
Johann Koenig authored
vp8_diamond_search_sadx4 isn't used in arm because there is no corrosponding sdx4df as in x86. rather than keep it in sync with ../mcomp.c, delete it vp8_hex_search had the original, more readable/understandable code if`d out. it's also available in ../mcomp.c, so remove the dead copy Change-Id: Ia42aa6e23b3a2e88040f467280befec091ec080e
-
Yaowu Xu authored
when a subsequent frame is encoded as an alt reference frame, it is unlikely that any mb in current frame will be used as reference for future frames, so we can enable quantization optimization even when the RD constant is slightly rate-biased. The change has an overall benefit between 0.1% to 0.2% bit savings on the test sets based on vpxssim scores. Change-Id: I9aa7bc5cd573ea84e3ee655d2834c18c4460ceea
-
- Oct 14, 2010
-
-
Yunqing Wang authored
-
Yunqing Wang authored
../libvpx/vp8/encoder/bitstream.c: In function ‘pack_inter_mode_mvs’: ../libvpx/vp8/encoder/bitstream.c:1026: warning: array subscript has type ‘char’ Change-Id: Ic77491e0a172fa1821e5b3e914d0dc41fe87c00f
-
Yunqing Wang authored
-
Yunqing Wang authored
In order to know if all 4/8 neighbor points are within the bounds, 4 bounds checking are enough instead of checking 4 bounds for each points (16/32 checkings). This improvement reduces cost of vp8_diamond_search_sadx4() by 30%, and gives encoder a 1.5% performance gain (test options: 1 pass, good, speed=4). Change-Id: Ie8da29d18a6ecfc9829e74ac02f6fa70e042331a
-
Fritz Koenig authored
Typo had function defined as _ssse2 and prototyped as _sse2. Change-Id: If9f19da1a83cff40774a90cf936d601c0bf1b7fe
-
- Oct 13, 2010
-
-
Fritz Koenig authored
QWORD was being undefined because it was being used incorrectly. Change-Id: I3610cefa3d6f0da4054316760f78b9694cde3876
-
Fritz Koenig authored
Use cpuid to check the vendor string against known architectures. Change-Id: I3fbd7f73638d71857a0c4a44a6275eb295fb4cef
-
- Oct 12, 2010
-
-
Fritz Koenig authored
=r was not restrictive enough and the compiler was not returning ebx correctly. Change-Id: I7606e384067bd5fb69189802f1ff64ccc5aa02d6
-
John Koleszar authored
This patch moves the scattered updates to the mb skip state (mode_info_context->mbmi.mb_skip_coeff) to vp8_tokenize_mb. Recent changes to the quantizer exposed a bug where if a macroblock could be coded as a skip but isn't, the encoder would run the loopfilter but the decoder wouldn't, causing a reference buffer mismatch. The loopfilter is controlled by a flag called dc_diff. The decoder looks at the number of decoded coefficients when setting this flag. The encoder sets this flag based on the skip state, since any skippable macroblock should be transmitted as a skip. The coefficient optimization pass (vp8_optimize_b()) could change the coefficients such that a block that was not a skip becomes one. The encoder was not updating the skip state in this situation for intra coded blocks. The underlying issue predates it, but this bug was recently triggered by enabling trellis quantization on the Y2 block in commit dcd29e36, and by changing the quantizer range control in commit 305be4e4. Change-Id: I5cce5da0dbc2d22f7d79ee48149f01e868a64802
-
John Koleszar authored
-
Timothy B. Terriberry authored
These functions should never change their input, and there's no reason not to declare that. This allows them to be passed static const data. Change-Id: Ia49fe4b01e80e9afcb24b4844817694d4da5995c
-
John Koleszar authored
-
John Koleszar authored
-
- Oct 11, 2010
-
-
Timothy B. Terriberry authored
There is currently no inexact version of this function, so do not even compile it without EXACT_QUANT. This will prevent someone from inadvertently trying to use it without the proper EXACT_QUANT setup. Change-Id: Ia13491e0128afb281c05c9222ee5987101e4010d
-
Timothy B. Terriberry authored
This is just eliminating some cruft. Although a number of variables are declared only when INTRARDOPT is defined, they are used elsewhere without that protection, and no longer just for intra RDO. The intra_rd_opt flag was hard-coded to 1 and never checked. Change-Id: I83a81554ecee8053e7b4ccd8aa04e18fa60f8e4f
-
Scott LaVarnway authored
-
John Koleszar authored
-
John Koleszar authored
An earlier automatic transform changed eg '\nOptions' to '\n_options' which is incorrect in these printfs. Fix these. Change-Id: I7e0f37931ef82b79fadddd7058ce0df5572e2ca1
-