    Add ARMv4/ARMv5E macros. · 972a34ec
    Timothy B. Terriberry authored
    Original patch by Aurélien Zanelli <aurelien.zanelli@parrot.com>:
     http://lists.xiph.org/pipermail/opus/2013-May/002078.html
    
    Revised version:
    - Add autoconf detection (ported from libtheora).
    - Rename ARM5E to ARMv5E (an ARM5 is not the same thing as ARMv5!).
    - Use actual macros so they can still be selectively overridden.
    - Split out ARMv4 parts and add a few more ARMv4 macros.
    - Label blocks to make them easy to find in generated assembly.
    - Fix MULT16_32_Q15() so we can pass make check.
      The MDCT test passes in values larger than 2**30 for b.
      The new version should be just as fast (or faster, since it's
       easier to merge the shift with following instructions), and
       there's no appreciable impact on accuracy (FFT/MDCT SNR actually
       goes up in most cases).
    - Fix register constraints.
      We were using early-clobber flags in a bunch of places that
       didn't need them, and commutative-pair flags in a bunch of
       places that weren't actually commutative.
      This was Jean-Marc's fault (the original code came from Speex).
    - Simplify silk_CLZ16().
    - Port over iFFT C_MULC asm by Andree Buschmann
       <AndreeBuschmann@t-online.de> from Rockbox.
    - Speed up the C_MULC asm by using LDRD, allowing more flexible
       addressing, re-ordering instructions to avoid some stalls,
       allowing more flexible register allocation, and getting things
       out of the inline asm block so the compiler can schedule them
       better.
    - Add C_MUL and C_MUL4 asm for the FFT to the encoder, based on the
       new C_MULC.
    
    In total, this patch gives a 22.3% speed-up on test_opus_encoder on
     a 600 MHz Cortex A8 using gcc 4.2.1.
    When restricted to ARMv4 optimizations, it gives a 9.6% speed-up
     on the same processor/compiler.
    On the conformance test vectors:
     Average mono quality is 97.0583 %
     Average stereo quality is 97.775 %
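
    The MULT16_32_Q15() note above (inputs larger than 2**30 for b, and the
     shift being easy to merge with following instructions) can be
     illustrated with a portable C model. This is only a sketch, not the
     code from the patch: the function names are hypothetical, smulwb_model()
     imitates what the ARMv5E SMULWB instruction computes, and the
     "pre-shift" variant is an assumed example of how such an overflow could
     arise, since the message does not say what the earlier version did.

```c
#include <assert.h>
#include <stdint.h>

/* Reference result: the exact product shifted right by 15, done in 64 bits.
   Assumes >> on a negative value is an arithmetic shift (true for gcc). */
static int32_t mult16_32_q15_ref(int16_t a, int32_t b) {
    return (int32_t)(((int64_t)a * b) >> 15);
}

/* Model of SMULWB: the top 32 bits of the 48-bit product of a 32-bit
   value and a signed 16-bit value. */
static int32_t smulwb_model(int32_t x, int16_t y) {
    return (int32_t)(((int64_t)x * y) >> 16);
}

/* Shift AFTER the multiply: b is used as-is, so any 32-bit b is safe.
   The lowest result bit is always zero (one bit of precision is lost). */
static int32_t mult16_32_q15_post(int16_t a, int32_t b) {
    return (int32_t)((uint32_t)smulwb_model(b, a) << 1);
}

/* Hypothetical variant that shifts b BEFORE the multiply: b << 1 wraps
   around once |b| reaches 2**30, which corrupts the result. */
static int32_t mult16_32_q15_pre(int16_t a, int32_t b) {
    int32_t b2 = (int32_t)((uint32_t)b << 1);
    return smulwb_model(b2, a);
}

/* Small self-check using a value of b above 2**30. */
static void check_examples(void) {
    const int16_t a = 16384;        /* 0.5 in Q15 */
    const int32_t b = 0x50000000;   /* larger than 2**30 */
    assert(mult16_32_q15_post(a, b) == mult16_32_q15_ref(a, b));
    assert(mult16_32_q15_pre(a, b) != mult16_32_q15_ref(a, b));
}
```

    In this model the post-shift form differs from the exact result by at
     most one unit in the last place, which fits the remark that there is no
     appreciable impact on accuracy.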