- May 24, 2013
-
-
Jean-Marc Valin authored
Also creates a new hardcoded fifth-order FIR filter.
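The entry does not show the new filter. To illustrate what hardcoding the order buys, here is a float sketch. It assumes the form y[i] = x[i] + num[0]*x[i-1] + ... + num[4]*x[i-5], with the filter state carried in mem between calls. The name fir5_sketch and this exact tap layout are assumptions, not the code from the commit.

```c
#include <assert.h>

/* Hypothetical fifth-order FIR sketch (assumed form, not the committed code):
 *   y[i] = x[i] + num[0]*x[i-1] + num[1]*x[i-2] + ... + num[4]*x[i-5]
 * mem[0..4] holds x[-1]..x[-5] on entry and the last five inputs on exit. */
static void fir5_sketch(const float *x, const float *num, float *y, int n,
                        float *mem)
{
    assert(n >= 0);
    /* With the order fixed at 5, coefficients and history can live in
       locals (typically registers) instead of being re-read from arrays. */
    const float n0 = num[0], n1 = num[1], n2 = num[2], n3 = num[3], n4 = num[4];
    float m0 = mem[0], m1 = mem[1], m2 = mem[2], m3 = mem[3], m4 = mem[4];
    for (int i = 0; i < n; i++) {
        float xi = x[i]; /* read before y[i] is written, so x == y is safe */
        float sum = xi + n0 * m0 + n1 * m1 + n2 * m2 + n3 * m3 + n4 * m4;
        /* shift the delay line by one sample */
        m4 = m3; m3 = m2; m2 = m1; m1 = m0; m0 = xi;
        y[i] = sum;
    }
    /* save state for the next call */
    mem[0] = m0; mem[1] = m1; mem[2] = m2; mem[3] = m3; mem[4] = m4;
}
```

Fixing the order lets the compiler unroll the tap loop completely, which a general order-N loop cannot do as easily.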
-
Ralph Giles authored
-
Ralph Giles authored
-
Ralph Giles authored
If there's no complete author tag, we need to add an opening quote character manually. See the EBU entry.
-
Ralph Giles authored
-
Ralph Giles authored
-
Ralph Giles authored
The normative reference is now the channel configurations given directly in the draft.
-
Ralph Giles authored
Previous markup was invalid.
-
Ralph Giles authored
-
Ralph Giles authored
I've done some editing for clarity, but more needs to be done. The language needs clean-up, we should forward-reference the LPC Extrapolation section, and we need a reference for actually computing linear prediction coefficients.
-
Ralph Giles authored
More recent versions of draft-sheffer-running-code suggest referring to a wiki. We'd like to try maintaining the implementation status separately.
-
Jean-Marc Valin authored
-
Jean-Marc Valin authored
-
Jean-Marc Valin authored
About 30% faster on x86.
-
Timothy B. Terriberry authored
Introduced in c152d602. Thanks to Pedro Becerra for the report.
-
- May 23, 2013
-
-
Timothy B. Terriberry authored
-
Timothy B. Terriberry authored
Introduced in c152d602. Thanks to Pedro Becerra for the report.
-
Timothy B. Terriberry authored
Thanks to John Ridges for the report.
-
- May 22, 2013
-
-
Timothy B. Terriberry authored
Define ARMv4_ASM to 1 like the other ARM defines.
-
Timothy B. Terriberry authored
Remove a redundant include and some dead stores. Patch by Aurélien Zanelli <aurelien.zanelli@parrot.com>.
-
Timothy B. Terriberry authored
Measures a 0.1% speedup on 96 kbps stereo encode+decode on a Cortex A8.
-
Reorder register usage to take advantage of early termination on multiplications and reorder a load instruction to hide its latency on ARM9. Speeds up decoding of a 64 kbps test file by 0.1 MHz on an ARM7TDMI and 0.2 MHz on an ARM9TDMI.
Signed-off-by: Timothy B. Terriberry <tterribe@xiph.org>
-
Uses a C implementation with a 32×32 → 64-bit multiplication, which ARM has. Speeds up decoding of a 64 kbps test file by 0.5 MHz on an ARM7TDMI and 1.0 MHz on an ARM9TDMI. 0.2% speedup on a 96 kbps encode+decode test on a Cortex A8.
Signed-off-by: Timothy B. Terriberry <tterribe@xiph.org>
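The log does not name the macro this entry changes. As an illustration, here is a 32×16 multiply that keeps the top 32 bits, (a32 * (int16_t)b32) >> 16, written two ways. One uses two 16-bit partial products. The other uses a single widening 32×32 → 64 multiply, which the compiler can map to ARM's long multiply. Treating this silk_SMULWB-style operation as the one in question is an assumption, and both function names are hypothetical. The sketch also assumes arithmetic right shifts of negative values and wrap-around narrowing to int16_t, as GCC and Clang provide.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: (a32 * (int16_t)b32) >> 16, i.e. a 32x16 multiply
 * keeping the top 32 bits of the 48-bit product. Only the low 16 bits of
 * b32 are used. */

/* Portable form: split a32 into a signed high half and an unsigned low half,
 * so every partial product fits in 32 bits. */
static int32_t smulwb_split(int32_t a32, int32_t b32)
{
    assert((-65536 >> 16) == -1); /* relies on arithmetic right shift */
    int32_t b16 = (int16_t)b32;
    return (a32 >> 16) * b16 + (((a32 & 0x0000FFFF) * b16) >> 16);
}

/* Widening form: one 32x32 -> 64 multiply, then keep bits 16..47. */
static int32_t smulwb_wide(int32_t a32, int32_t b32)
{
    int64_t prod = (int64_t)a32 * (int16_t)b32;
    return (int32_t)(prod >> 16);
}
```

Both return the same value for every input, because floor(a*b / 2^16) splits exactly into the high-half product plus floor(low-half product / 2^16). The widening form simply trades two multiplies and an add for one long multiply.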
-
Timothy B. Terriberry authored
This splits out the non-arch-specific portions of a patch written by Aurélien Zanelli <aurelien.zanelli@parrot.com> (http://lists.xiph.org/pipermail/opus/2013-May/002088.html). I also added support for odd n, for custom modes. 0.25% speedup on 96 kbps stereo encode+decode on a Cortex A8.
-
Timothy B. Terriberry authored
Missed the armv5e extension on a couple of functions.
-
Timothy B. Terriberry authored
58.4% speedup (2.4x faster) on test_unit_cwrs32 (no custom modes). Gives a 3.2% speedup on ./opus_demo restricted-lowdelay 48000 2 96000 comp48-stereo.sw /dev/null on a 600 MHz Cortex A8.
-
- May 21, 2013
-
-
Timothy B. Terriberry authored
Otherwise make dist does not include these files in the source tarball.
-
Timothy B. Terriberry authored
-
Timothy B. Terriberry authored
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0068b/CIHBJEHG.html says that "Rd cannot be the same as Rm." http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0068b/CIHBJEHG.html says that "RdLo, RdHi, and Rm must all be different registers." This means that some of the early clobbers I removed really should have been there (to prevent aliasing Rd, RdLo, or RdHi with Rm). It also means that we should reverse some of the operands in the FFT's complex multiplies. This should only affect the ARMv4 optimizations. Thanks to Nils Wallménius for the report. While we're here, audit the commutative pair flags again, since I screwed up at least one of them, and eliminate some dead code.
-
- May 20, 2013
-
-
Timothy B. Terriberry authored
-
Ron authored
-
Ron authored
Needed by commit 972a34ec. Use autoreconf in autogen.sh instead of the handwritten version: it's simpler, and it also updates things that we weren't handling. Drop the handwritten INSTALL file. Its information content was ~zero, and autotools wants to overwrite it with its own version, so don't fight that; just .gitignore it.
-
Timothy B. Terriberry authored
In most cases these will use __builtin_clz(). In a follow-up, we should audit usage of silk_CLZ32() and convert the places where its argument must be non-zero to use EC_ILOG() directly to avoid the test for zero (which is necessary on x86).
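Here is a minimal sketch of the two flavors this entry distinguishes. It assumes GCC or Clang (for __builtin_clz()) and a 32-bit unsigned int. The helper names ilog_nz() and clz32() are hypothetical stand-ins for EC_ILOG() and silk_CLZ32(). Because __builtin_clz(0) is undefined, a CLZ that must accept zero pays for a branch. Callers that know the argument is non-zero can call the log directly and skip that branch.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Number of bits needed to represent x (1 -> 1, 255 -> 8, 256 -> 9).
 * x must be non-zero: __builtin_clz(0) is undefined. */
static int ilog_nz(uint32_t x)
{
    assert(x != 0);
    return (int)(sizeof(unsigned) * CHAR_BIT) - __builtin_clz(x);
}

/* Count of leading zeros that also defines clz(0) = 32, at the cost of a
 * test for zero before the builtin is used. */
static int clz32(int32_t x)
{
    return x ? 32 - ilog_nz((uint32_t)x) : 32;
}
```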
-
Timothy B. Terriberry authored
Since the last patch originally had them mangled (presumably by mailer, http server, or something else), let's just get rid of them.
-
Timothy B. Terriberry authored
Original patch by Aurélien Zanelli <aurelien.zanelli@parrot.com>: http://lists.xiph.org/pipermail/opus/2013-May/002078.html
Revised version:
- Add autoconf detection (ported from libtheora).
- Rename ARM5E to ARMv5E (an ARM5 is not the same thing as ARMv5!).
- Use actual macros so they can still be selectively overridden.
- Split out ARMv4 parts and add a few more ARMv4 macros.
- Label blocks to make them easy to find in generated assembly.
- Fix MULT16_32_Q15() so we can pass make check. The MDCT test passes in values larger than 2**30 for b. The new version should be just as fast (or faster, since it's easier to merge the shift with following instructions), and there's no appreciable impact on accuracy (FFT/MDCT SNR actually goes up in most cases).
- Fix register constraints. We were using early-clobber flags in a bunch of places that didn't need them, and commutative-pair flags in a bunch of places that weren't actually commutative. This was Jean-Marc's fault (the original code came from Speex).
- Simplify silk_CLZ16().
- Port over iFFT C_MULC asm by Andree Buschmann <AndreeBuschmann@t-online.de> from Rockbox.
- Speed up the C_MULC asm by using LDRD, allowing more flexible addressing, reordering instructions to avoid some stalls, allowing more flexible register allocation, and getting things out of the inline asm block so the compiler can schedule them better.
- Add C_MUL and C_MUL4 asm for the encoder's FFT, based on the new C_MULC.
In total, this patch gives a 22.3% speed-up on test_opus_encoder on a 600 MHz Cortex A8 using gcc 4.2.1. When restricted to ARMv4 optimizations, it gives a 9.6% speed-up on the same processor/compiler.
On the conformance test vectors:
Average mono quality is 97.0583 %
Average stereo quality is 97.775 %
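As a float reference for what the FFT's complex-multiply macros compute, here is a sketch of a plain complex multiply and a multiply by the complex conjugate. C_MULC is assumed here to follow kiss_fft's convention of a times conj(b). The cpx type and the function names are hypothetical. The fixed-point macros in the tree are built on Q15 multiplies such as MULT16_32_Q15() rather than on float arithmetic.

```c
#include <assert.h>

/* Hypothetical complex type, laid out like kiss_fft's {r, i} pair. */
typedef struct { float r, i; } cpx;

/* m = a * b */
static cpx c_mul(cpx a, cpx b)
{
    cpx m = { a.r * b.r - a.i * b.i, a.r * b.i + a.i * b.r };
    return m;
}

/* m = a * conj(b): same four products, two signs flipped */
static cpx c_mulc(cpx a, cpx b)
{
    cpx m = { a.r * b.r + a.i * b.i, a.i * b.r - a.r * b.i };
    return m;
}
```

Multiplying by the conjugate of a twiddle factor e^(-j*theta) yields e^(+j*theta). One common reason for a conjugate multiply is that an inverse transform can then reuse the forward twiddle table.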
-
- May 19, 2013
-
-
Jean-Marc Valin authored
-
- May 18, 2013
-
-
Jean-Marc Valin authored
-
Ron authored
We shouldn't ever have any trailing newlines that need trimming here, and the _s version wasn't added to m4sugar.m4 until autoconf 2.63b, so this will let it work with 2.13 again.
-
Jean-Marc Valin authored
A fixed shift factor was insufficient to properly estimate the decay factor, resulting in extreme attenuation of the PLC excitation.
-