diff --git a/doc/draft-ietf-codec-opus.xml b/doc/draft-ietf-codec-opus.xml index 4b6c03a1a82e85f1f49a8dbc749e6135d90e8604..033a55c1cf9b8abfc3ef28339db3472ccd97ada4 100644 --- a/doc/draft-ietf-codec-opus.xml +++ b/doc/draft-ietf-codec-opus.xml @@ -506,25 +506,26 @@ Insert decoder figure. <ttcol align='center'>Symbol(s)</ttcol> <ttcol align='center'>PDF</ttcol> <ttcol align='center'>Condition</ttcol> -<c>silence</c> <c>logp=15</c> <c></c> -<c>post-filter</c> <c>logp=1</c> <c></c> +<c>silence</c> <c>[32767, 1]/32768</c> <c></c> +<c>post-filter</c> <c>[1, 1]/2</c> <c></c> <c>octave</c> <c>uniform (6)</c><c>post-filter</c> <c>period</c> <c>raw bits (4+octave)</c><c>post-filter</c> <c>gain</c> <c>raw bits (3)</c><c>post-filter</c> <c>tapset</c> <c>[2, 1, 1]/4</c><c>post-filter</c> -<c>transient</c> <c>logp=3</c><c></c> +<c>transient</c> <c>[7, 1]/8</c><c></c> +<c>intra</c> <c>[7, 1]/8</c><c></c> <c>coarse energy</c><c><xref target="energy-decoding"/></c><c></c> <c>tf_change</c> <c><xref target="transient-decoding"/></c><c></c> -<c>tf_select</c> <c>logp=1</c><c><xref target="transient-decoding"/></c> +<c>tf_select</c> <c>[1, 1]/2</c><c><xref target="transient-decoding"/></c> <c>spread</c> <c>[7, 2, 21, 2]/32</c><c></c> <c>dyn. alloc.</c> <c><xref target="allocation"/></c><c></c> <c>alloc. 
trim</c> <c>[2, 2, 5, 10, 22, 46, 22, 10, 5, 2, 2]/128</c><c></c> -<c>skip (*)</c> <c>logp=1</c><c><xref target="allocation"/></c> +<c>skip (*)</c> <c>[1, 1]/2</c><c><xref target="allocation"/></c> <c>intensity (*)</c><c>uniform</c><c><xref target="allocation"/></c> -<c>dual (*)</c> <c>logp=1</c><c></c> +<c>dual (*)</c> <c>[1, 1]/2</c><c></c> <c>fine energy</c> <c><xref target="energy-decoding"/></c><c></c> <c>residual</c> <c><xref target="PVQ-decoder"/></c><c></c> -<c>anti-collapse</c><c>logp=1</c><c>stereo && transient</c> +<c>anti-collapse</c><c>[1, 1]/2</c><c>stereo && transient</c> <c>finalize</c> <c><xref target="energy-decoding"/></c><c></c> <postamble>Order of the symbols in the CELT section of the bit-stream</postamble> </texttable> @@ -555,23 +556,71 @@ tf_change flags. </section> <section anchor="energy-decoding" title="Energy Envelope Decoding"> + <t> -The energy of each band is extracted from the bit-stream in two steps according -to the same coarse-fine strategy used in the encoder. First, the coarse energy is -decoded in unquant_coarse_energy() (quant_bands.c) -based on the probability of the Laplace model used by the encoder. -</t> +It is important to quantize the energy with sufficient resolution because +any energy quantization error cannot be compensated for at a later +stage. Regardless of the resolution used for encoding the shape of a band, +it is perceptually important to preserve the energy in each band. CELT uses a +three-step coarse-fine-fine strategy for encoding the energy in the base-2 log +domain, as implemented in quant_bands.c.</t> +<section anchor="coarse-energy-decoding" title="Coarse energy decoding"> <t> -After the coarse energy is decoded, the same allocation function as used in the -encoder is called. This determines the number of -bits to decode for the fine energy quantization. The decoding of the fine energy bits -is performed by unquant_fine_energy() (quant_bands.c). 
-Finally, like the encoder, the remaining bits in the stream (that would otherwise go unused) -are decoded using unquant_energy_finalise() (quant_bands.c). +Coarse quantization of the energy uses a fixed resolution of 6 dB +(integer part of base-2 log). To minimize the bitrate, prediction is applied +both in time (using the previous frame) and in frequency (using the previous +bands). The part of the prediction that is based on the +previous frame can be disabled, creating an "intra" frame where the energy +is coded without reference to prior frames. The decoder first reads the intra flag +to determine what prediction is used. +The 2-D z-transform of +the prediction filter is: A(z_l, z_b)=(1-a*z_l^-1)*(1-z_b^-1)/(1-b*z_b^-1) +where b is the band index and l is the frame index. The prediction coefficients +applied depend on the frame size in use when not using intra energy, and are a=0, b=4915/32768 +when using intra energy. +The time-domain prediction is based on the final fine quantization of the previous +frame, while the frequency domain (within the current frame) prediction is based +on coarse quantization only (because the fine quantization has not been computed +yet). The prediction is clamped internally so that fixed-point implementations with +limited dynamic range do not suffer desynchronization. +We approximate the ideal +probability distribution of the prediction error using a Laplace distribution +with separate parameters for each frame size in intra and inter-frame modes. The +coarse energy quantization is performed by unquant_coarse_energy() and +unquant_coarse_energy_impl() (quant_bands.c). The decoding of the Laplace-distributed values is +implemented in ec_laplace_decode() (laplace.c). </t> + </section> +<section anchor="fine-energy-decoding" title="Fine energy quantization"> +<t> +The number of bits assigned to fine energy quantization in each band is determined +by the bit allocation computation described in <xref target="allocation"></xref>. 
+Let B_i be the number of fine energy bits +for band i; the refinement is an integer f in the range [0,2^B_i-1]. The mapping between f +and the correction applied to the coarse energy is equal to (f+1/2)/2^B_i - 1/2. Fine +energy decoding is implemented in unquant_fine_energy() (quant_bands.c). +</t> +<t> +When some bits are left "unused" after all other flags have been decoded, these bits +are assigned to a "final" step of fine allocation. In effect, these bits are used +to add one extra fine energy bit per band per channel. The allocation process +determines two <spanx style="emph">priorities</spanx> for the final fine bits. +Any remaining bits are first assigned only to bands of priority 0, starting +from band 0 and going up. If all bands of priority 0 have received one bit per +channel, then bands of priority 1 are assigned an extra bit per channel, +starting from band 0. If any bits are left after this, they are left unused. +This is implemented in unquant_energy_finalise() (quant_bands.c). +</t> + +</section> <!-- fine energy --> + +</section> <!-- Energy decode --> + + + <section anchor="allocation" title="Bit allocation"> <t>Bit allocation is performed based only on information available to both the encoder and decoder. The same calculations are performed in a bit-exact