This document contains a detailed description of both the encoder and the decoder, along with a reference implementation. In most circumstances, and unless otherwise stated, the calculations in other implementations do NOT need to produce results that are bit-identical to those of the reference implementation, so alternative algorithms can sometimes be used. However, there are a few (clearly identified) cases where bit-exactness is required. An implementation is considered to be compatible if, for any valid bit-stream, the decoder's output is perceptually very close to the output produced by the reference decoder.
</t>
<t>
The CELT codec does not use a standard <spanx style="emph">bit-packer</spanx>,
but rather uses a range coder to pack both integers and entropy-coded symbols.
The bit-stream generated by the encoder contains (in the same order) the
following parameters:
</t>
<t>
<list style="symbols">
<t>Feature flags (2-4 bits)</t>
<t>if P=1
<list style="symbols">
<t>Pitch period</t>
</list></t>
<t>if S=1
<list style="symbols">
<t>Transient scalefactor</t>
<t>if scalefactor=(1 or 2) AND more than 2 short MDCTs
<list style="symbols">
<t>ID of block before transient</t>
</list></t>
<t>if scalefactor=3
<list style="symbols">
<t>Transient time</t>
</list></t>
</list></t>
<t>Coarse energy encoding (for each band)</t>
<t>Fine energy encoding (for each band)</t>
<t>For each band
<list style="symbols">
<t>if P=1 and band is at the beginning of a pitch band
<list>
<t>Pitch gain bit</t>
</list></t>
<t>PVQ indices</t>
</list></t>
<t>More fine energy (using all remaining bits)</t>
</list>
</t>
<t>Note that due to the use of a range coder, all the parameters have to be encoded and decoded in order. </t>
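<t>As an illustration of the ordering above, the following minimal sketch lists which top-level parameters appear in a frame, in stream order, for a given set of flags. The structure, field names, and function name are hypothetical and are not part of the reference implementation; the sketch only mirrors the conditions in the list above.</t>
<figure><artwork><![CDATA[
```c
#include <stddef.h>

/* Hypothetical description of one frame; names are illustrative only. */
struct frame_flags {
    int pitch;           /* P flag (0 or 1) */
    int short_blocks;    /* S flag (0 or 1) */
    int scalefactor;     /* transient scalefactor, used only if S=1 */
    int num_short_mdcts; /* number of short MDCTs in the frame */
};

/* Store a name if there is room and return the updated count. */
static size_t push_field(const char **out, size_t n, size_t max,
                         const char *name)
{
    if (n < max)
        out[n] = name;
    return n + 1;
}

/* Write the names of the parameters present in the frame, in the
 * order they are coded. Returns the number of parameters; at most
 * max names are stored (9 entries are always enough). */
size_t list_stream_fields(const struct frame_flags *f,
                          const char **out, size_t max)
{
    size_t n = 0;

    n = push_field(out, n, max, "feature flags");
    if (f->pitch)
        n = push_field(out, n, max, "pitch period");
    if (f->short_blocks) {
        n = push_field(out, n, max, "transient scalefactor");
        if ((f->scalefactor == 1 || f->scalefactor == 2)
            && f->num_short_mdcts > 2)
            n = push_field(out, n, max, "ID of block before transient");
        if (f->scalefactor == 3)
            n = push_field(out, n, max, "transient time");
    }
    n = push_field(out, n, max, "coarse energy");
    n = push_field(out, n, max, "fine energy");
    n = push_field(out, n, max, "per-band pitch gain bits and PVQ indices");
    n = push_field(out, n, max, "more fine energy");
    return n;
}
```
]]></artwork></figure>
<t>Because the range coder state depends on every previously coded symbol, a decoder has to evaluate these conditions in exactly the same order as the encoder.</t>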
</section>
</section>
<section anchor="CELT Modes" title="CELT Modes">
<t>
The operation of both the encoder and decoder depends on the mode data. A mode
definition can be created by celt_create_mode() (<xref target="modes.h">modes.h</xref>)
based on three parameters:
<list style="symbols">
<t>frame size (number of samples)</t>
<t>sampling rate (samples per second)</t>
<t>number of channels (1 or 2)</t>
</list>
</t>
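<t>A minimal sketch of the three inputs is given below. The structure and helper functions are hypothetical; they do not reproduce the signature of celt_create_mode() and only illustrate the quantities involved and the constraint on the number of channels.</t>
<figure><artwork><![CDATA[
```c
/* Hypothetical container for the three mode inputs; this is not the
 * reference API. */
struct mode_params {
    int frame_size;     /* number of samples per frame */
    int sampling_rate;  /* samples per second */
    int channels;       /* 1 (mono) or 2 (stereo) */
};

/* Return 1 if the parameters are plausible, 0 otherwise. Only the
 * constraints stated in the text are checked. */
int mode_params_valid(const struct mode_params *p)
{
    if (p->frame_size <= 0 || p->sampling_rate <= 0)
        return 0;
    return p->channels == 1 || p->channels == 2;
}

/* Frame duration in microseconds, rounded down. */
long long frame_duration_us(const struct mode_params *p)
{
    return (long long)p->frame_size * 1000000LL / p->sampling_rate;
}
```
]]></artwork></figure>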
<t>The mode data that is created defines how the encoder and the decoder operate. More specifically, the following information is contained in the mode object:
<list style="symbols">
<t>Frame size</t>
<t>Sampling rate</t>
...
...
@@ -155,6 +217,11 @@ mode data. This data includes:
<t>Pulse allocation data</t>
</list>
</t>
<t>
The windowing overlap is the amount of overlap between consecutive frames. CELT uses a low-overlap window whose overlap is typically half of the frame size. For a frame size of 256 samples, the overlap is 128 samples, so the total algorithmic delay is 256+128=384 samples. CELT divides the audio into frequency bands, for which the energy is preserved. These bands are chosen to follow the ear's critical bands (Bark scale), with the exception that each band has to contain at least 3 frequency bins.
The CELT codec has several optional features that can be switched on or off, some of which are mutually exclusive. The four main flags are intra-frame energy (I), pitch (P), short blocks (S), and folding (F). These are described in more detail below. There are eight valid combinations of these four features, and they are encoded first into the stream using a variable length code (<xref target="flags-encoding"></xref>). It is left to the implementor to choose which flags to enable, with the only restriction that the combination of the four flags needs to correspond to a valid entry in <xref target="flags-encoding"></xref>.
</t>
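<t>The delay arithmetic above can be sketched as follows. The function names are hypothetical, and the overlap is assumed to be exactly half of the frame size, which the text describes as the typical case rather than a requirement.</t>
<figure><artwork><![CDATA[
```c
/* Overlap in samples, assuming the typical case where the overlap
 * is half of the frame size. */
int typical_overlap(int frame_size)
{
    return frame_size / 2;
}

/* Total algorithmic delay in samples: one frame plus the overlap. */
int algorithmic_delay(int frame_size)
{
    return frame_size + typical_overlap(frame_size);
}
```
]]></artwork></figure>
<t>With a frame size of 256 samples this gives an overlap of 128 samples and a delay of 384 samples, matching the figures above.</t>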
<texttable anchor="flags-encoding">
...
...
@@ -473,6 +502,12 @@ The last pulse is the only one considering the pitch and minimizes the cost func
J = -g_f * R^T*y + (g_f)^2 * ||y||^2
</t>
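<t>As an illustration, the cost function J can be evaluated directly from its definition. In the sketch below, R is assumed to be a real-valued vector and y an integer pulse vector of the same length, with g_f a scalar gain; these interpretations and the function name are assumptions made for the example and are not taken from the reference implementation.</t>
<figure><artwork><![CDATA[
```c
#include <stddef.h>

/* Evaluate J = -g_f * R^T*y + (g_f)^2 * ||y||^2 for vectors of
 * length n. R is assumed real-valued and y integer-valued. */
double pvq_cost(double g_f, const double *R, const int *y, size_t n)
{
    double ry = 0.0;  /* inner product R^T*y */
    double yy = 0.0;  /* squared norm ||y||^2 */
    size_t i;

    for (i = 0; i < n; i++) {
        ry += R[i] * y[i];
        yy += (double)y[i] * y[i];
    }
    return -g_f * ry + g_f * g_f * yy;
}
```
]]></artwork></figure>
<t>A search would evaluate this cost for each candidate position of the pulse and keep the position with the smallest J; a practical implementation would normally update the two sums incrementally rather than recompute them.</t>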
<t>
The search described above is considered to be a good trade-off between quality
and computational cost. However, there are other possible ways to search the PVQ
codebook, and implementors MAY use any other search method.