  1. May 20, 2013
      Add ARMv4/ARMv5E macros. · 972a34ec
      Timothy B. Terriberry authored
      Original patch by Aurélien Zanelli <aurelien.zanelli@parrot.com>:
       http://lists.xiph.org/pipermail/opus/2013-May/002078.html
      
      Revised version:
      - Add autoconf detection (ported from libtheora).
      - Rename ARM5E to ARMv5E (an ARM5 is not the same thing as ARMv5!).
      - Use actual macros so they can still be selectively overridden.
      - Split out ARMv4 parts and add a few more ARMv4 macros.
      - Label blocks to make them easy to find in generated assembly.
      - Fix MULT16_32_Q15() so we can pass make check.
        The MDCT test passes in values larger than 2**30 for b.
        The new version should be just as fast (or faster, since it's
         easier to merge the shift with following instructions), and
         there's no appreciable impact on accuracy (FFT/MDCT SNR actually
         goes up in most cases).
      - Fix register constraints.
        We were using early-clobber flags in a bunch of places that
         didn't need them, and commutative-pair flags in a bunch of
         places that weren't actually commutative.
        This was Jean-Marc's fault (the original code came from Speex).
      - Simplify silk_CLZ16().
      - Port over iFFT C_MULC asm by Andree Buschmann
         <AndreeBuschmann@t-online.de> from Rockbox.
      - Speed up the C_MULC asm by using LDRD, allowing more flexible
         addressing, re-ordering instructions to avoid some stalls,
         allowing more flexible register allocation, and getting things
         out of the inline asm block so the compiler can schedule them
         better.
      - Add C_MUL and C_MUL4 asm for the FFT to the encoder, based on the
         new C_MULC.
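
For reference, the arithmetic that MULT16_32_Q15() and C_MULC() are expected to compute can be sketched in portable C. This is not the ARM asm from this patch: the names mult16_32_q15_ref, c_mulc_ref, cpx32_ref and cpx16_ref are hypothetical, and the exact 64-bit product used here is an assumption about the intended semantics (the asm may round slightly differently). It may be useful as a baseline when checking inputs such as b larger than 2**30.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference types: 32-bit data, 16-bit (Q15) twiddle. */
typedef struct { int32_t r, i; } cpx32_ref;
typedef struct { int16_t r, i; } cpx16_ref;

/* (a * b) >> 15 with a 16-bit a and a full-range 32-bit b.
   The product is formed in 64 bits, so no intermediate overflows
   even when |b| exceeds 2**30. */
static int32_t mult16_32_q15_ref(int16_t a, int32_t b)
{
    /* Assumption: >> on a negative signed value is an arithmetic
       shift (true on the usual compilers, but implementation-defined). */
    assert((-1 >> 1) == -1);
    return (int32_t)(((int64_t)a * (int64_t)b) >> 15);
}

/* m = a * conj(b), with b a Q15 twiddle factor. */
static cpx32_ref c_mulc_ref(cpx32_ref a, cpx16_ref b)
{
    cpx32_ref m;
    m.r = mult16_32_q15_ref(b.r, a.r) + mult16_32_q15_ref(b.i, a.i);
    m.i = mult16_32_q15_ref(b.r, a.i) - mult16_32_q15_ref(b.i, a.r);
    return m;
}
```

With a = 16384 (0.5 in Q15) and b = 2**30 + 2, mult16_32_q15_ref returns 2**29 + 1, which is the kind of large-b input the MDCT test exercises.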
      
      In total, this patch gives a 22.3% speed-up on test_opus_encoder on
       a 600 MHz Cortex A8 using gcc 4.2.1.
      When restricted to ARMv4 optimizations, it gives a 9.6% speed-up
       on the same processor/compiler.
      On the conformance test vectors:
       Average mono quality is 97.0583 %
       Average stereo quality is 97.775 %