Commits · b518b56fe11bf53f88fe30d57ea9d668337983a9 · Alexander Traud / Opus

May 21, 2013

Clean up register constraints. · b518b56f

Timothy B. Terriberry authored 11 years ago

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0068b/CIHBJEHG.html
 says that "Rd cannot be the same as Rm."
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0068b/CIHBJEHG.html
 says that "RdLo, RdHi, and Rm must all be different registers."
This means that some of the early clobbers I removed really should
 have been there (to prevent aliasing Rd, RdLo, or RdHi with Rm).
It also means that we should reverse some of the operands in the
 FFT's complex multiplies.
This should only affect the ARMv4 optimizations.

Thanks to Nils Wallménius for the report.

While we're here, audit the commutative pair flags again, since I
 screwed up at least one of them, and eliminate some dead code.
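
The aliasing rule quoted above can be illustrated with GCC-style inline assembly. This is a minimal sketch, not the code from this commit: `smull_hi` is a hypothetical helper name, and the guard macros and the plain-C fallback are assumptions added so the example builds on non-ARM targets. The `=&r` (early-clobber) constraint tells the compiler that an output is written before the inputs are consumed, so it will not place RdLo or RdHi in the same register as Rm.

```c
#include <stdint.h>

/* Hypothetical helper (not an Opus macro): returns the top 32 bits of the
   signed 64-bit product a*b. */
static inline int32_t smull_hi(int32_t a, int32_t b)
{
#if defined(__GNUC__) && defined(__arm__) && !defined(__thumb__)
    int32_t lo;
    int32_t hi;
    /* SMULL RdLo, RdHi, Rm, Rs: on ARMv4, RdLo, RdHi and Rm must all be
       different registers. "=&r" marks both outputs as early-clobber, so
       neither can share a register with the inputs a or b. */
    __asm__("smull %0, %1, %2, %3"
            : "=&r"(lo), "=&r"(hi)
            : "r"(a), "r"(b));
    (void)lo; /* low half is not needed here */
    return hi;
#else
    /* Portable fallback; assumes >> on a negative value is an arithmetic
       shift, which holds for the common compilers. */
    return (int32_t)(((int64_t)a * b) >> 32);
#endif
}
```

Dropping the `&` would let the register allocator reuse an input register for an output, which is the aliasing the ARM documentation forbids for these instructions.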

b518b56f

May 20, 2013

Add ARMv4/ARMv5E macros. · 972a34ec

Timothy B. Terriberry authored 11 years ago

Original patch by Aurélien Zanelli <aurelien.zanelli@parrot.com>:
 http://lists.xiph.org/pipermail/opus/2013-May/002078.html

Revised version:
- Add autoconf detection (ported from libtheora).
- Rename ARM5E to ARMv5E (an ARM5 is not the same thing as ARMv5!).
- Use actual macros so they can still be selectively overridden.
- Split out ARMv4 parts and add a few more ARMv4 macros.
- Label blocks to make them easy to find in generated assembly.
- Fix MULT16_32_Q15() so we can pass make check.
  The MDCT test passes in values larger than 2**30 for b.
  The new version should be just as fast (or faster, since it's
   easier to merge the shift with following instructions), and
   there's no appreciable impact on accuracy (FFT/MDCT SNR actually
   goes up in most cases).
- Fix register constraints.
  We were using early-clobber flags in a bunch of places that
   didn't need them, and commutative-pair flags in a bunch of
   places that weren't actually commutative.
  This was Jean-Marc's fault (the original code came from Speex).
- Simplify silk_CLZ16().
- Port over iFFT C_MULC asm by Andree Buschmann
   <AndreeBuschmann@t-online.de> from Rockbox.
- Speed up the C_MULC asm by using LDRD, allowing more flexible
   addressing, re-ordering instructions to avoid some stalls,
   allowing more flexible register allocation, and getting things
   out of the inline asm block so the compiler can schedule them
   better.
- Add C_MUL and C_MUL4 asm for the FFT to the encoder, based on the
   new C_MULC.
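
As a point of reference for the MULT16_32_Q15() item above, the arithmetic the macro is expected to perform can be sketched in portable C. This is an illustrative reference only, under the assumption that the operation is a Q15 16-bit factor times a 32-bit value; `mult16_32_q15_ref` is a hypothetical name, and the code is not the ARM assembly from this patch.

```c
#include <stdint.h>

/* Illustrative reference (hypothetical name, not the Opus macro):
   multiply a Q15 16-bit factor a by a 32-bit value b and rescale by 2^15. */
static int32_t mult16_32_q15_ref(int16_t a, int32_t b)
{
    /* The product needs at most 47 bits, so a 64-bit intermediate cannot
       overflow, including for |b| larger than 2^30. The final result fits
       in 32 bits except for the corner case a = -32768, b = INT32_MIN.
       Assumes >> on a negative value is an arithmetic shift. */
    return (int32_t)(((int64_t)a * b) >> 15);
}
```

For example, with a = 16384 (0.5 in Q15) and b = 0x50000000, which is above 2**30, the result is 0x28000000, exactly half of b.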

In total, this patch gives a 22.3% speed-up on test_opus_encoder on
 a 600 MHz Cortex A8 using gcc 4.2.1.
When restricted to ARMv4 optimizations, it gives a 9.6% speed-up
 on the same processor/compiler.
On the conformance test vectors:
 Average mono quality is 97.0583 %
 Average stereo quality is 97.775 %

972a34ec

May 19, 2013
- celt_maxabs16() now returns an opus_val32 to avoid problems with -32768 · b7bd4c20
  Jean-Marc Valin authored 11 years ago
  
  b7bd4c20
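
The -32768 problem mentioned in this commit comes from the fact that the absolute value of the most negative 16-bit sample, 32768, does not fit in a signed 16-bit type. A minimal sketch of the idea follows; `maxabs16_sketch` is a hypothetical name and `int32_t` stands in for `opus_val32`, so this is not necessarily how the Opus function is written.

```c
#include <stdint.h>

/* Hypothetical sketch: largest absolute value in a buffer of 16-bit samples,
   returned as a 32-bit value so that |-32768| = 32768 is representable. */
static int32_t maxabs16_sketch(const int16_t *x, int len)
{
    int32_t maxval = 0;
    int32_t minval = 0;
    int i;
    for (i = 0; i < len; i++) {
        if (x[i] > maxval) maxval = x[i];
        if (x[i] < minval) minval = x[i];
    }
    /* The negation happens in 32 bits, so -(-32768) does not overflow. */
    return maxval > -minval ? maxval : -minval;
}
```

With a 16-bit return type, a buffer containing -32768 could not report its true peak; the 32-bit return value reports 32768.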
May 16, 2012
- Revert "Adds 3rd clause to CELT license" · 88f22f2d
  Jean-Marc Valin authored 12 years ago
  
  This reverts commit 9f407afa.
  88f22f2d
Apr 24, 2012
- Adds 3rd clause to CELT license · 9f407afa
  Jean-Marc Valin authored 12 years ago
  
  9f407afa
Apr 20, 2012
- s/FOUNDATION/COPYRIGHT OWNER/ in CELT code and "glue code" · cb05e7cd
  Jean-Marc Valin authored 12 years ago
  
  Also added 3rd clause to "master" COPYING file
  cb05e7cd
Sep 14, 2011
- renames the libcelt/ directory to celt/ · c3749909
  Jean-Marc Valin authored 13 years ago
  
  c3749909
Aug 02, 2011
- Remove many unused defines and convert some double constants to float. · 662587d9
  Gregory Maxwell authored 13 years ago
  
  662587d9
Jul 31, 2011
- Correct many whitespace errors under libcelt/ and remove · 71d39ad8
  Gregory Maxwell authored 13 years ago and Jean-Marc Valin committed 13 years ago
  
  non-ASCII characters from the source.
  71d39ad8
Jul 29, 2011
- Renamed celt_word* to opus_val* · ff5f7228
  Jean-Marc Valin authored 13 years ago
  
  ff5f7228
Feb 10, 2011
- Relicensing under the simplified (2-clause) BSD license · 3806c1d7
  Jean-Marc Valin authored 14 years ago
  
  Got authorization from all copyright holders
  3806c1d7
Oct 18, 2009
- Removed the _t from all the celt*_t types to avoid clashing with POSIX · 234969c9
  Jean-Marc Valin authored 15 years ago
  
  234969c9
Apr 10, 2008
- Defining IMUL32 for 32x32=>32 int multiplications and using it in the range · 821945d9
  Jean-Marc Valin authored 16 years ago
  
  coder
  821945d9
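
A 32x32=>32 multiplication keeps only the low 32 bits of the product, in contrast to a 32x32=>64 multiply. The sketch below shows that behaviour with a function rather than a macro; `imul32_sketch` is a hypothetical name, and the use of unsigned operands is an assumption made so that the wraparound is well defined in C. The commit log does not show the actual IMUL32 definition.

```c
#include <stdint.h>

/* Hypothetical sketch of a 32x32=>32 multiply: the product is reduced
   modulo 2^32, so only the low 32 bits survive. Unsigned arithmetic makes
   that wraparound well defined. */
static uint32_t imul32_sketch(uint32_t a, uint32_t b)
{
    return a * b;
}
```

For instance, 0x10000 times 0x10000 is 2^32, whose low 32 bits are zero, so the function returns 0.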
Mar 27, 2008
- Revert ABS16/32 on C55 -- ended up being slower · 9c50c6bc
  Jean-Marc Valin authored 17 years ago
  
  9c50c6bc
- ABS16 and ABS32 for the C55 · 4fd989e8
  Jean-Marc Valin authored 17 years ago
  
  4fd989e8
Mar 23, 2008
- include "dsplib.h" in fixed_c5x.h · a75e25da
  Jean-Marc Valin authored 17 years ago
  
  a75e25da
Mar 22, 2008
- defined find_max16 and overrode it for C55x · 17ad401c
  Jean-Marc Valin authored 17 years ago
  
  17ad401c
Mar 21, 2008
- fix for TI version of celt_maxabs16() · 59f42b5d
  Jean-Marc Valin authored 17 years ago
  
  59f42b5d
- fixed-point: defined celt_maxabs16() as basic operator · 9901cb9e
  Jean-Marc Valin authored 17 years ago
  
  9901cb9e
Mar 20, 2008
- fixed-point: MULT16_32_Q15 for TI DSP (not entirely happy with it) · 948dabc7
  Jean-Marc Valin authored 17 years ago
  
  948dabc7
- fixed-point: using TI intrinsic for celt_ilog2() if available. · 83006eec
  Jean-Marc Valin authored 17 years ago
  
  83006eec
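
The log does not name the TI intrinsic used for celt_ilog2(), so only a portable fallback is sketched here. `ilog2_sketch` is a hypothetical name; it returns the index of the highest set bit, which is the integer base-2 logarithm for positive inputs. A platform-specific build would presumably select an intrinsic with a preprocessor check and fall back to code like this otherwise.

```c
#include <stdint.h>

/* Hypothetical portable fallback: index of the highest set bit of x,
   i.e. floor(log2(x)) for x > 0. Returns 0 for x == 0 by convention here. */
static int ilog2_sketch(uint32_t x)
{
    int r = 0;
    while (x >>= 1) {
        r++;
    }
    return r;
}
```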
Mar 16, 2008
- fixed-point: more TI macros. Comments on the existing ones. · b311554c
  Jean-Marc Valin authored 17 years ago
  
  b311554c
Mar 14, 2008
- New C55 macro · bfcbd184
  Jean-Marc Valin authored 17 years ago
  
  bfcbd184
Mar 11, 2008
- Added macro definitions for the TI C5x family (untested) · 72e8003f
  Jean-Marc Valin authored 17 years ago
  
  72e8003f
Mar 05, 2008

fixed-point: changed find_spectral_pitch() to use single-precision (16-bit) FFT. · f93747c4

Jean-Marc Valin authored 17 years ago

This involved adding kfft_single.[ch] that redefines kiss_fft a second time
with a different prefix. All this is still a bit of a mess now. The mask
had to be converted to 16-bit input, but we're still using floats to apply it.

f93747c4

Feb 27, 2008

fixed-point: log-energy for previous frame now a 16-bit value. This currently · 5d561834

Jean-Marc Valin authored 17 years ago

introduces a bit of an encoder-decoder mismatch (Q8 in dB), but it'll be
reduced when the internals of quant_energy_mono() are properly converted to
fixed-point and oldEBands gets rounded instead of truncated.

5d561834

Feb 26, 2008
- fixed-point: added a celt_ener_t type for band energy. · e901fe35
  Jean-Marc Valin authored 17 years ago
  
  e901fe35
Feb 13, 2008
- Introducing a (very) crude budget for the energy encoder. · c9cc6d3e
  Jean-Marc Valin authored 17 years ago
  
  c9cc6d3e
Dec 07, 2007
- No more cheating, everything fully quantised · 98d2a491
  Jean-Marc Valin authored 17 years ago
  
  98d2a491
- energy decoding partially done (cheating a bit) · 8143be30
  Jean-Marc Valin authored 17 years ago
  
  8143be30
Dec 05, 2007
- Quantisation of band energies (adding files) · 8b0137aa
  Jean-Marc Valin authored 17 years ago
  
  8b0137aa
- conversion to modes complete · 96870d93
  Jean-Marc Valin authored 17 years ago
  
  96870d93
- Converting the code to use the modes instead of global arrays. · 73e51b3e
  Jean-Marc Valin authored 17 years ago
  
  73e51b3e
Dec 04, 2007
- Adding mode infrastructure (still incomplete) · ecb36a33
  Jean-Marc Valin authored 17 years ago
  
  ecb36a33
Nov 30, 2007
- Added pitch analysis. Doesn't crash, but otherwise untested. · 14191b3c
  Jean-Marc Valin authored 17 years ago
  
  14191b3c
- Got MDCT analysis-synthesis to work · 013c31d6
  Jean-Marc Valin authored 17 years ago
  
  013c31d6