Commits · fbf99981a6a5acdb032f42d6377ca5b5dff19a20 · Alexander Traud / Opus

May 24, 2013
- Merges the 4th order FIR with the first order FIR in pitch_downsample() · fbf99981
  Jean-Marc Valin authored 11 years ago
  
  Also creates a new hardcoded 5th order fir.
- Try to clarify that opus maps to flac/wav but wav doesn't map to opus. · 1b0552bf
  Ralph Giles authored 11 years ago
  
- Reference before period. · bd5cfda8
  Ralph Giles authored 11 years ago
  
- Hack quoting of hanning article. · 4a0bf960
  Ralph Giles authored 11 years ago
  
  If there's no complete author tag, we need to add an opening quote character manually. See the EBU entry.
- Wrap lookahead code example in a figure. · b243dca3
  Ralph Giles authored 11 years ago
  
- Add a wikipedia reference for the Hanning window. · 9e85220f
  Ralph Giles authored 11 years ago
  
- Move the vorbis channel mapping to informative references. · 6bdbd26c
  Ralph Giles authored 11 years ago
  
  The normative reference is now the channel configurations given directly in the draft.
- Fix Ogg draft formatting. · 7918ac13
  Ralph Giles authored 11 years ago
  
  Previous markup was invalid.
- Remove an unnecessary comma. · 5b6fe646
  Ralph Giles authored 11 years ago
  
- Merge JM's encoder suggestions. · 2ad6eafc
  Ralph Giles authored 11 years ago
  
  I've done some editing for clarity, but more needs to be done. The language needs clean-up, we should forward-reference the LPC Extrapolation section, and we need a reference for actually computing linear prediction coefficients.
- Bump Ogg draft version and date. · 25ffd5cd
  Ralph Giles authored 11 years ago
  
  Tag: draft-ietf-codec-oggopus-01
  
- Move implementation status details to wiki.xiph.org. · dfda81eb
  Ralph Giles authored 11 years ago
  
  More recent versions of draft-sheffer-running-code suggest referring to a wiki. We'd like to try maintaining the implementation status separately.
- Make pitch_xcorr() work when len and max_pitch aren't multiples of 4. · 85a6618a
  Jean-Marc Valin authored 11 years ago
  
- oops, removed a minus sign that should never have appeared · 088929d1
  Jean-Marc Valin authored 11 years ago
  
- Unrolled version of the pitch correlation · 559fbe8b
  Jean-Marc Valin authored 11 years ago
  
  About 30% faster on x86.
- Move misplaced RESTORE_STACK. · e3ad4ea1
  Timothy B. Terriberry authored 11 years ago
  
  Introduced in c152d602. Thanks to Pedro Becerra for the report.
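The fbf99981 entry merges a 4th-order FIR with a 1st-order FIR into one 5th-order filter. That works because cascading two FIR filters is equivalent to a single FIR whose coefficients are the convolution of the two coefficient sets. A minimal sketch in plain float C, assuming both filters are stored with all taps explicit (including any leading 1); `fir_cascade_coeffs()` and `fir_apply()` are hypothetical helpers, not the code in pitch_downsample():

```c
#include <stddef.h>

/* Convolve FIR coefficient vectors a (order na, na+1 taps) and b (order nb,
   nb+1 taps) into out (order na+nb, na+nb+1 taps). Filtering with out is
   equivalent to filtering with a and then with b (or b then a). */
void fir_cascade_coeffs(const float *a, size_t na,
                        const float *b, size_t nb, float *out)
{
    for (size_t k = 0; k <= na + nb; k++)
        out[k] = 0.0f;
    for (size_t i = 0; i <= na; i++)
        for (size_t j = 0; j <= nb; j++)
            out[i + j] += a[i] * b[j];
}

/* Direct-form FIR with zero initial state:
   y[n] = sum_k h[k] * x[n-k], treating x[n-k] as 0 for n < k. */
void fir_apply(const float *h, size_t order,
               const float *x, float *y, size_t len)
{
    for (size_t n = 0; n < len; n++) {
        float acc = 0.0f;
        for (size_t k = 0; k <= order && k <= n; k++)
            acc += h[k] * x[n - k];
        y[n] = acc;
    }
}
```

For example, convolving the taps {1, 2, 3, 4, 5} with {1, 0.5} gives the six taps {1, 2.5, 4, 5.5, 7, 2.5}, and filtering a signal with those six taps matches filtering it with the two original filters in sequence, at the cost of one pass over the data instead of two.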
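The unrolled pitch correlation (559fbe8b) and the fix for len and max_pitch that are not multiples of 4 (85a6618a) follow a common pattern: compute several correlation lags per pass, then finish the leftover lags separately. A minimal sketch, assuming 16-bit inputs, 32-bit accumulators that do not overflow, and a y buffer of at least len + max_pitch - 1 samples; `xcorr_unrolled()` is hypothetical, unrolls only over lags (so any len works), and is not the actual pitch_xcorr():

```c
#include <stdint.h>

/* xcorr[i] = sum_{j=0}^{len-1} x[j] * y[i + j] for 0 <= i < max_pitch. */
void xcorr_unrolled(const int16_t *x, const int16_t *y, int32_t *xcorr,
                    int len, int max_pitch)
{
    int i = 0;
    /* Main loop: four lags per pass, so each x[j] is loaded once and
       reused for four products. */
    for (; i + 4 <= max_pitch; i += 4) {
        int32_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
        for (int j = 0; j < len; j++) {
            int32_t xj = x[j];
            s0 += xj * y[i + j];
            s1 += xj * y[i + j + 1];
            s2 += xj * y[i + j + 2];
            s3 += xj * y[i + j + 3];
        }
        xcorr[i] = s0;
        xcorr[i + 1] = s1;
        xcorr[i + 2] = s2;
        xcorr[i + 3] = s3;
    }
    /* Tail: the remaining max_pitch % 4 lags, one at a time. */
    for (; i < max_pitch; i++) {
        int32_t s = 0;
        for (int j = 0; j < len; j++)
            s += (int32_t)x[j] * y[i + j];
        xcorr[i] = s;
    }
}
```

Without the tail loop, a max_pitch that is not a multiple of 4 would leave the last few lags uncomputed; the results must match a plain double loop for every len and max_pitch.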
May 23, 2013
- Make dump_modes compile again. · d19fb79b
  Timothy B. Terriberry authored 11 years ago
  
- Move misplaced RESTORE_STACK. · 7e783b14
  Timothy B. Terriberry authored 11 years ago
  
  Introduced in c152d602. Thanks to Pedro Becerra for the report.
- Remove an unused variable added in 85ede2c6. · 7c74bc39
  Timothy B. Terriberry authored 11 years ago
  
  Thanks to John Ridges for the report.
May 22, 2013

Minor configure adjustment. · 33511f74
Timothy B. Terriberry authored 11 years ago
```
Define ARMv4_ASM to 1 like the other ARM defines.
```

Timothy B. Terriberry authored 11 years ago

Remove a redundant include and some dead stores.

Patch by Aurélien Zanelli <aurelien.zanelli@parrot.com>.

cc6e26a2

Port 1ed17cc2 to C_MUL and C_MUL4. · cd3850c1
Timothy B. Terriberry authored 11 years ago
```
Measures a 0.1% speedup on 96 kbps stereo encode+decode on a
 Cortex A8.
```

Slightly faster C_MULC for ARMv4. · 7cb54537

Nils Wallménius authored 11 years ago and Timothy B. Terriberry committed 11 years ago


Reorder register usage to take advantage of early termination on
 multiplications and reorder a load instruction to hide its
 latency on ARM9.
Speeds up decoding of a 64 kbps test file by 0.1 MHz on an ARM7TDMI
 and 0.2 MHz on an ARM9TDMI.

Signed-off-by: Timothy B. Terriberry <tterribe@xiph.org>

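C_MUL and C_MULC, which several entries here port to ARM assembly, are the FFT's complex multiply and conjugate multiply. A floating-point sketch of the arithmetic only, assuming a hypothetical `cpx` type; the real macros operate on fixed-point values with their own scaling, and C_MUL4 is not shown:

```c
/* Hypothetical complex type for the sketch. */
typedef struct { double r, i; } cpx;

/* m = a * b */
static cpx cmul(cpx a, cpx b)
{
    cpx m = { a.r * b.r - a.i * b.i, a.r * b.i + a.i * b.r };
    return m;
}

/* m = a * conj(b): same four products, different signs. This is the
   conjugate form the iFFT path needs. */
static cpx cmulc(cpx a, cpx b)
{
    cpx m = { a.r * b.r + a.i * b.i, a.i * b.r - a.r * b.i };
    return m;
}
```

For example, (1 + 2i)(3 + 4i) = -5 + 10i, while (1 + 2i) times the conjugate of (3 + 4i) is 11 + 2i. Both need four multiplies, which is why the register scheduling of those multiplies dominates the asm tuning described above.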

Faster MULT32_32_Q31 for ARM. · 70485d89

Nils Wallménius authored 11 years ago and Timothy B. Terriberry committed 11 years ago


Uses a C implementation with a 32*32 => 64 multiplication, which
 ARM has.
Speeds up decoding of a 64 kbps test file by 0.5 MHz on an ARM7TDMI
 and 1.0 MHz on an ARM9TDMI.
0.2% speedup on a 96 kbps enc+dec test on a Cortex A8.

Signed-off-by: Timothy B. Terriberry <tterribe@xiph.org>

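The 70485d89 entry computes MULT32_32_Q31 from a single 32x32 => 64-bit product. A minimal portable sketch of that idea; `mult32_32_q31()` is a hypothetical name, and it assumes that right-shifting a negative int64_t is an arithmetic shift (true for GCC and Clang, implementation-defined in ISO C). On ARM a compiler can usually map the widening multiply to one long-multiply instruction:

```c
#include <stdint.h>

/* Q31 x Q31 -> Q31: form the exact 64-bit product, then drop 31
   fractional bits. (-1.0) * (-1.0) is not representable in Q31 and is
   not handled here. */
static int32_t mult32_32_q31(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) >> 31);
}
```

For example, 0.5 * 0.5 in Q31 is 0x40000000 * 0x40000000, giving 0x20000000 (0.25). The alternative without a widening multiply splits each operand into 16-bit halves and needs several multiplies and shifts.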

Use more MAC16_16's and unroll a loop. · 85ede2c6

Timothy B. Terriberry authored 11 years ago

This splits out the non-arch-specific portions of a patch written
 by Aurélien Zanelli <aurelien.zanelli@parrot.com>:
 http://lists.xiph.org/pipermail/opus/2013-May/002088.html

I also added support for odd n, for custom modes.

0.25% speedup on 96 kbps stereo encode+decode on a Cortex A8.

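The 85ede2c6 message does not say which loop it unrolls, so the following only illustrates the general pattern it describes: MAC16_16-style multiply-accumulates unrolled by two, with a tail for odd n. A minimal sketch; `MAC16_16_SKETCH` and `inner_prod_unrolled()` are hypothetical, with the macro following the generic fixed-point meaning of MAC16_16 (a 16x16-bit product added to a 32-bit accumulator):

```c
#include <stdint.h>

/* Hypothetical stand-in for MAC16_16(c, a, b): c + a*b, 16-bit operands,
   32-bit accumulator. */
#define MAC16_16_SKETCH(c, a, b) ((c) + (int32_t)(int16_t)(a) * (int16_t)(b))

/* Inner product unrolled by two. Two independent accumulators shorten
   the dependency chain; an odd trailing element is handled last. */
static int32_t inner_prod_unrolled(const int16_t *x, const int16_t *y, int n)
{
    int32_t s0 = 0, s1 = 0;
    int i;
    for (i = 0; i + 1 < n; i += 2) {
        s0 = MAC16_16_SKETCH(s0, x[i], y[i]);
        s1 = MAC16_16_SKETCH(s1, x[i + 1], y[i + 1]);
    }
    if (i < n) /* odd n, as can occur with custom modes */
        s0 = MAC16_16_SKETCH(s0, x[i], y[i]);
    return s0 + s1;
}
```

On cores with a multiply-accumulate instruction, each MAC16_16 can compile to a single instruction, and the odd-n tail keeps the unrolled loop correct for frame sizes the standard modes never produce.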

Minor ARMv5E cleanups. · 2040606f
Timothy B. Terriberry authored 11 years ago
```
Missed the armv5e extension on a couple of functions.
```

Use a table for PVQ encoding/decoding. · 006273c5

Timothy B. Terriberry authored 11 years ago

58.4% speedup (2.4x faster) on test_unit_cwrs32 (no custom modes).
Gives a 3.2% speedup on
 ./opus_demo restricted-lowdelay 48000 2 96000 comp48-stereo.sw /dev/null
 on a 600 MHz Cortex A8.

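PVQ codewords are enumerated using counts V(N, K) of integer vectors of dimension N whose absolute values sum to K, and a table lets the code look these counts up rather than recompute them. A minimal sketch that fills such a table from the standard recurrence, assuming small hypothetical bounds (`PVQ_MAX_N`, `PVQ_MAX_K`) and 64-bit entries; it does not reproduce the layout or contents of the table the commit adds:

```c
#include <stdint.h>

#define PVQ_MAX_N 16 /* hypothetical bounds for the sketch */
#define PVQ_MAX_K 16

/* pvq_v[n][k] = V(n, k). 64-bit entries are ample for these bounds. */
static uint64_t pvq_v[PVQ_MAX_N + 1][PVQ_MAX_K + 1];

/* Fill the table from V(n,0) = 1, V(0,k) = 0 for k > 0, and
   V(n,k) = V(n-1,k) + V(n,k-1) + V(n-1,k-1). */
static void pvq_init_table(void)
{
    for (int n = 0; n <= PVQ_MAX_N; n++) {
        for (int k = 0; k <= PVQ_MAX_K; k++) {
            if (k == 0)
                pvq_v[n][k] = 1;
            else if (n == 0)
                pvq_v[n][k] = 0;
            else
                pvq_v[n][k] = pvq_v[n - 1][k] + pvq_v[n][k - 1]
                            + pvq_v[n - 1][k - 1];
        }
    }
}
```

For instance, V(2, 2) = 8: the vectors (±2, 0), (0, ±2) and (±1, ±1). Once the table exists, each lookup is a single memory access instead of a loop over the recurrence.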

May 21, 2013

Add new ARM headers to top-level file lists. · 9d056284
Timothy B. Terriberry authored 11 years ago
```
Otherwise make dist does not include these files in the source
 tarball.
```
Move ARM asm into its own directories. · e095c3eb
Timothy B. Terriberry authored 11 years ago


Clean up register constraints. · b518b56f

Timothy B. Terriberry authored 11 years ago

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0068b/CIHBJEHG.html
 says that "Rd cannot be the same as Rm."
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0068b/CIHBJEHG.html
 says that "RdLo, RdHi, and Rm must all be different registers."
This means that some of the early clobbers I removed really should
 have been there (to prevent aliasing Rd, RdLo, or RdHi with Rm).
It also means that we should reverse some of the operands in the
 FFT's complex multiplies.
This should only affect the ARMv4 optimizations.

Thanks to Nils Wallménius for the report.

While we're here, audit the commutative pair flags again, since I
 screwed up at least one of them, and eliminate some dead code.


May 20, 2013

Fix bustage in a16cef62. · 9880c4cd
Timothy B. Terriberry authored 11 years ago

Make autogen.sh cut and paste proof · 41ce6e35
Ron authored 11 years ago


Add support for autoconf macros in m4/ · 50b395bf

Ron authored 11 years ago

Needed by commit 972a34ec.

Use autoreconf in autogen.sh instead of the handwritten version;
it's simpler and also updates things that we weren't handling.

Drop the hand-written INSTALL file.  Its information content was
~zero, and autotools wants to overwrite it with its own version,
so don't fight that, just .gitignore it.


Replace silk_CLZ functions with EC_ILOG(). · a16cef62

Timothy B. Terriberry authored 11 years ago

In most cases these will use __builtin_clz().
In a follow-up, we should audit usage of silk_CLZ32() and convert
 the places where its argument must be non-zero to use EC_ILOG()
 directly to avoid the test for zero (which is necessary on x86).

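The a16cef62 entry rewrites the silk_CLZ functions in terms of EC_ILOG(). The relationship is clz(x) = 32 - ilog(x) for non-zero x, with zero handled separately. A minimal sketch under those assumptions; `ilog32_nz()` and `clz32()` are hypothetical stand-ins, not the Opus macros, and they rely on the GCC/Clang `__builtin_clz()` builtin and a 32-bit unsigned int:

```c
#include <stdint.h>

/* Number of bits needed to represent x, i.e. floor(log2(x)) + 1.
   x must be non-zero: __builtin_clz(0) is undefined. */
static int ilog32_nz(uint32_t x)
{
    return 32 - __builtin_clz(x);
}

/* Leading zero bits, defined for 0 (returns 32). The zero test is the
   extra cost that callers with a known non-zero argument can avoid by
   calling ilog32_nz() directly. */
static int clz32(uint32_t x)
{
    return x ? 32 - ilog32_nz(x) : 32;
}
```

For example, clz32(0x00010000) is 15 and ilog32_nz(0x00010000) is 17; the two always sum to 32 for non-zero input.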

Convert quotes in license headers to ASCII. · 80ad3837

Timothy B. Terriberry authored 11 years ago

Since the last patch originally had them mangled (presumably by
 mailer, http server, or something else), let's just get rid of
 them.


Add ARMv4/ARMv5E macros. · 972a34ec

Timothy B. Terriberry authored 11 years ago

Original patch by Aurélien Zanelli <aurelien.zanelli@parrot.com>:
 http://lists.xiph.org/pipermail/opus/2013-May/002078.html

Revised version:
- Add autoconf detection (ported from libtheora).
- Rename ARM5E to ARMv5E (an ARM5 is not the same thing as ARMv5!).
- Use actual macros so they can still be selectively overridden.
- Split out ARMv4 parts and add a few more ARMv4 macros.
- Label blocks to make them easy to find in generated assembly.
- Fix MULT16_32_Q15() so we can pass make check.
  The MDCT test passes in values larger than 2**30 for b.
  The new version should be just as fast (or faster, since it's
   easier to merge the shift with following instructions), and
   there's no appreciable impact on accuracy (FFT/MDCT SNR actually
   goes up in most cases).
- Fix register constraints.
  We were using early-clobber flags in a bunch of places that
   didn't need them, and commutative-pair flags in a bunch of
   places that weren't actually commutative.
  This was Jean-Marc's fault (the original code came from Speex).
- Simplify silk_CLZ16().
- Port over iFFT C_MULC asm by Andree Buschmann
   <AndreeBuschmann@t-online.de> from Rockbox.
- Speed up the C_MULC asm by using LDRD, allowing more flexible
   addressing, re-ordering instructions to avoid some stalls,
   allowing more flexible register allocation, and getting things
   out of the inline asm block so the compiler can schedule them
   better.
- Add C_MUL and C_MUL4 asm for the FFT to the encoder, based on the
   new C_MULC.

In total, this patch gives a 22.3% speed-up on test_opus_encoder on
 a 600 MHz Cortex A8 using gcc 4.2.1.
When restricted to ARMv4 optimizations, it gives a 9.6% speed-up
 on the same processor/compiler.
On the conformance test vectors:
 Average mono quality is 97.0583 %
 Average stereo quality is 97.775 %

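The 972a34ec entry fixes MULT16_32_Q15() after the MDCT test passed values of b larger than 2**30. When checking a faster variant against such inputs, a reference that forms the full 64-bit product is useful. A minimal sketch; `mult16_32_q15_ref()` is a hypothetical name, and it assumes an arithmetic right shift of negative int64_t values (GCC and Clang behaviour):

```c
#include <stdint.h>

/* Reference Q15 multiply of a 16-bit a by a 32-bit b: (a * b) >> 15,
   computed from the exact 64-bit product so no intermediate value can
   overflow, whatever the magnitude of b. */
static int32_t mult16_32_q15_ref(int16_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) >> 15);
}
```

For example, 0.5 in Q15 (0x4000) times b = 0x50000000 yields 0x28000000, half of b, and -1.0 in Q15 (-32768) times the same b yields exactly -b. Comparing an optimized macro with this reference over large |b| exposes overflow in its intermediate steps.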

May 19, 2013
- celt_maxabs16() now returns an opus_val32 to avoid problems with -32768 · b7bd4c20
  Jean-Marc Valin authored 11 years ago
  
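The b7bd4c20 entry widens celt_maxabs16()'s return type because the magnitude of -32768 is 32768, one more than the largest opus_int16 value. A simplified sketch of the idea with a hypothetical `maxabs16_sketch()` that computes magnitudes directly in 32 bits (the real function is not reproduced here):

```c
#include <stdint.h>

/* Largest |x[i]| over 16-bit samples. The result is 32 bits wide because
   |-32768| = 32768 does not fit in int16_t. */
static int32_t maxabs16_sketch(const int16_t *x, int len)
{
    int32_t maxval = 0;
    for (int i = 0; i < len; i++) {
        int32_t v = x[i] < 0 ? -(int32_t)x[i] : x[i];
        if (v > maxval)
            maxval = v;
    }
    return maxval;
}
```

With a 16-bit return type, an input containing -32768 would wrap back to -32768, so a caller normalizing by the peak would see a negative magnitude.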
May 18, 2013

Change few remaining instances of short to opus_int16 · 35930698
Jean-Marc Valin authored 11 years ago


Use m4_esyscmd instead of m4_esyscmd_s · 918acd15

Ron authored 11 years ago

We shouldn't ever have any trailing newlines that need trimming here,
and the _s version wasn't added to m4sugar.m4 until autoconf 2.63b,
so this will let it work with 2.13 again.


Fixes fixed-point PLC issue reported in trac ticket #1954 · efdd3143

Jean-Marc Valin authored 11 years ago

A fixed shift factor was insufficient to properly estimate the decay
factor, resulting in extreme attenuation of the PLC excitation.
