Forked from Xiph.Org / Opus
Timothy B. Terriberry authored
This patch makes all symbols conditional on whether or not there's
 enough space left in the buffer to code them, and eliminates much
 of the redundancy in the side information.

A summary of the major changes:
* The isTransient flag is moved up to before the coarse energy.
  If there are not enough bits to code the coarse energy, the flag
   would get forced to 0, meaning what energy values were coded
   would get interpreted incorrectly.
  This might not be the end of the world, and I'd be willing to
   move it back given a compelling argument.
* Coarse energy switches coding schemes when there are less than 15
   bits left in the packet:
  - With at least 2 bits remaining, the change in energy is forced
     to the range [-1...1] and coded with 1 bit (for 0) or 2 bits
     (for +/-1).
  - With only 1 bit remaining, the change in energy is forced to
     the range [-1...0] and coded with one bit.
  - If there is less than 1 bit remaining, the change in energy is
     forced to -1.
    This effectively low-passes bands whose energy is consistently
     starved; this might be undesirable, but letting the default be
     zero is unstable, which is worse.
* The tf_select flag gets moved back after the per-band tf_res
   flags again, and is now skipped entirely when none of the
   tf_res flags are set, and the default value is the same for
   either alternative.
* dynalloc boosting is now limited so that it stops once it's given
   a band all the remaining bits in the frame, or when it hits the
   "stupid cap" of (64<<LM)*(C<<BITRES) used during allocation.
* If dynalloc boosting has allocated all the remaining bits in the
   frame, the alloc trim parameter does not get encoded (it would
   have no effect).
* The intensity stereo offset is now limited to the range
   [start...codedBands], and thus doesn't get coded until after
   all of the skip decisions.
  Some space is reserved for it up front, and gradually given back
   as each band is skipped.
* The dual stereo flag is coded only if intensity>start, since
   otherwise it has no effect.
  It is now coded after the intensity flag.
* The space reserved for the final skip flag, the intensity stereo
   offset, and the dual stereo flag is now redistributed to all
   bands equally if it is unused.
  Before, the skip flag's bit was given to the band that stopped
   skipping without it (usually a dynalloc boosted band).
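
The low-budget coarse energy rules above can be sketched as follows. This
is only an illustration, not the code in this patch: the function names are
hypothetical, and the budget is counted in whole bits for simplicity, which
is an assumption of the sketch.

```c
#include <assert.h>

/* Hypothetical sketch: clamp a quantized coarse-energy change (delta) to the
   range that the remaining whole-bit budget can represent. */
int clamp_coarse_delta(int delta, int bits_left)
{
    assert(bits_left >= 0);
    if (bits_left >= 15)
        return delta;              /* normal coding scheme, no forced range */
    if (bits_left >= 2) {          /* forced to [-1...1] */
        if (delta < -1) return -1;
        if (delta > 1) return 1;
        return delta;
    }
    if (bits_left == 1)            /* forced to [-1...0] */
        return delta < 0 ? -1 : 0;
    return -1;                     /* nothing left: forced to -1 */
}

/* Bits spent on an already-clamped delta in the low-budget schemes.
   Returns -1 when the normal scheme applies; this sketch does not model it. */
int coarse_delta_cost(int clamped_delta, int bits_left)
{
    if (bits_left >= 15)
        return -1;
    if (bits_left >= 2)
        return clamped_delta == 0 ? 1 : 2;   /* 1 bit for 0, 2 bits for +/-1 */
    return bits_left == 1 ? 1 : 0;           /* 1 bit, or nothing coded */
}
```

For instance, with 10 bits left a change of +3 is clamped to +1 and costs
2 bits, while with no bits left every band is forced to -1 at no cost.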

In order to enable simple interaction between VBR and these
 packet-size enforced limits, many of which are encountered before
 VBR is run, the maximum packet size VBR will allow is computed at
 the beginning of the encoding function, and the buffer reduced to
 that size immediately.
Later, when it is time to make the VBR decision, the minimum packet
 size is set high enough to ensure that no decision made thus far
 will have been affected by the packet size.
As long as this is smaller than the up-front maximum, all of the
 encoder's decisions will remain in-sync with the decoder.
If it is larger than the up-front maximum, the packet size is kept
 at that maximum, also ensuring sync.
The minimum used now is slightly larger than it used to be, because
 it also includes the bits added for dynalloc boosting.
Such boosting is shut off by the encoder at low rates, and so
 should not cause any serious issues at the rates where we would
 actually run out of room before compute_allocation().
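
The interaction between the up-front maximum and the later minimum can be
sketched as a clamp. This is a minimal illustration, not the encoder's
actual code: vbr_packet_bytes() and its parameters are hypothetical names,
and sizes are assumed to be whole bytes.

```c
#include <assert.h>

/* Hypothetical sketch of the final VBR packet-size decision.
   max_bytes:    maximum computed at the start of the encoding function.
   min_bytes:    minimum that keeps every earlier decision unaffected.
   target_bytes: size the VBR logic would like to use. */
int vbr_packet_bytes(int target_bytes, int min_bytes, int max_bytes)
{
    assert(min_bytes >= 0 && max_bytes >= 0);
    if (min_bytes > max_bytes)
        return max_bytes;          /* minimum exceeds the cap: stay at the cap */
    if (target_bytes < min_bytes)
        return min_bytes;          /* never shrink below the safe minimum */
    if (target_bytes > max_bytes)
        return max_bytes;          /* the buffer was already reduced to this */
    return target_bytes;
}
```

In both branches the returned size is one the earlier decisions were
consistent with, which is the property that keeps encoder and decoder in
sync.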
76469c64
CELT is a very low delay audio codec designed for high-quality communications.

Traditional full-bandwidth codecs such as Vorbis and AAC can offer high
quality but they require codec delays of hundreds of milliseconds, which
makes them unsuitable for real-time interactive applications like
teleconferencing. Speech-targeted codecs, such as Speex or G.722, have lower
20-40ms delays but their speech focus and limited sampling rates restrict
their quality, especially for music.

Additionally, the other mandatory components of a full network audio system
(audio interfaces, routers, jitter buffers) each add their own delay. On lower
speed networks, serializing a packet onto the network cable takes considerable
time, and over long distances the speed of light imposes a significant delay.

In teleconferencing, it is important to keep delay low so that the participants
can communicate fluidly without talking on top of each other and so that their
own voices don't return after a round trip as an annoying echo.

For network music performance, research has shown that the total one-way delay
must be kept under 25ms to avoid degrading the musicians' performance.

Since many of the sources of delay in a complete system are outside of the
user's control (such as the speed of light) it is often only possible to
reduce the total delay by reducing the codec delay.

Low delay has traditionally been considered a challenging area in audio codec
design, because as a codec is forced to work on the smaller chunks of audio
required for low delay it has access to less redundancy and less perceptual
information which it can use to reduce the size of the transmitted audio.

CELT is designed to bridge the gap between "music" and "speech" codecs,
permitting new very high quality teleconferencing applications, and to go
further, permitting latencies much lower than speech codecs normally provide
to enable applications such as remote musical collaboration even over long
distances.  

In keeping with the Xiph.Org mission, CELT is also designed to accomplish
this without copyright or patent encumbrance. Only by keeping the formats
that drive our Internet communication free and unencumbered can we maximize
innovation, collaboration, and interoperability.  Fortunately, CELT is ahead
of the adoption curve in its target application space, so there should be 
no reason for someone who needs what CELT provides to go with a proprietary
codec.

CELT has been tested on x86, x86_64, ARM, and the TI C55x DSPs, and should
be portable to any platform with a working C compiler and on the order of
100 MIPS of processing power. 

The code is still at an early stage, so it may be broken from time to time, and
the bit-stream is not frozen yet, so it may differ from one version to
another. Oh, and don't complain if it sets your house on fire.

Complaints and accolades can be directed to the CELT mailing list:
http://lists.xiph.org/mailman/listinfo/celt-dev/

To compile:
% ./configure
% make

For platforms without fast floating point support (such as ARM) use the
--enable-fixed argument to configure to build a fixed-point version of CELT.

There are Ogg-based encode/decode tools in tools/. These are quite similar to
the speexenc/speexdec tools. Use the --help option for details.

There is also a basic tool for testing the encoder and decoder called
"testcelt" located in libcelt/: 

% testcelt <rate> <channels> <frame size> <bytes per packet> input.sw output.sw

where input.sw is a 16-bit (machine endian) audio file sampled at 32000 Hz to 
96000 Hz. The output file is already decompressed.  

For example, for a 44.1 kHz mono stream at ~64kbit/sec and with 256 sample
frames:

% testcelt 44100 1 256 46 input.sw output.sw

Since 44100/256*46*8 = 63393.75 bits/sec.
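
The arithmetic above can be reproduced with a small helper when choosing
the <bytes per packet> argument for other rates or frame sizes. This is a
sketch only: the function names are made up for illustration and are not
part of libcelt, and it assumes one fixed-size packet per frame.

```c
#include <assert.h>

/* Bit rate in bits per second for fixed-size packets, one packet per frame. */
double packet_bitrate_bps(int sample_rate, int frame_size, int bytes_per_packet)
{
    assert(sample_rate > 0 && frame_size > 0 && bytes_per_packet >= 0);
    return (double)sample_rate / frame_size * bytes_per_packet * 8.0;
}

/* Inverse: packet size in whole bytes nearest to a target bit rate. */
int bytes_for_bitrate(int sample_rate, int frame_size, double target_bps)
{
    assert(sample_rate > 0 && frame_size > 0 && target_bps >= 0.0);
    double bytes = target_bps * frame_size / (8.0 * sample_rate);
    return (int)(bytes + 0.5);     /* round to the nearest whole byte */
}
```

Here packet_bitrate_bps(44100, 256, 46) gives 63393.75, and
bytes_for_bitrate(44100, 256, 64000.0) gives the 46 bytes used in the
example.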

All even frame sizes from 64 to 512 are currently supported, although
power-of-two sizes are recommended and most CELT development is done
using a size of 256. The delay imposed by CELT is 1.25x - 1.5x the
frame duration depending on the frame size and some details of CELT's
internal operation. For 256 sample frames the delay is 1.5x or 384
samples, so the total codec delay in the above example is 8.71ms
(1000/(44100/384)).
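
The delay calculation can be written out as below. This is an illustrative
sketch with hypothetical function names that are not part of libcelt. The
delay factor is left to the caller because the text only gives it for 256
sample frames (1.5); values for other frame sizes are not assumed here.

```c
#include <assert.h>

/* Codec delay in samples for a caller-supplied delay factor (1.25 to 1.5). */
double codec_delay_samples(int frame_size, double delay_factor)
{
    assert(frame_size > 0 && delay_factor > 0.0);
    return frame_size * delay_factor;
}

/* The same delay expressed in milliseconds at the given sampling rate. */
double codec_delay_ms(int sample_rate, int frame_size, double delay_factor)
{
    assert(sample_rate > 0);
    return 1000.0 * codec_delay_samples(frame_size, delay_factor) / sample_rate;
}
```

For the example above, codec_delay_samples(256, 1.5) is 384 and
codec_delay_ms(44100, 256, 1.5) is about 8.7075.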