From 84846910c5133b2f53833c2c6a7a56add6de6df4 Mon Sep 17 00:00:00 2001 From: Jean-Marc Valin <jmvalin@jmvalin.ca> Date: Thu, 27 Oct 2011 15:34:21 -0400 Subject: [PATCH] draft: CELT encoder description for tf_analysis() and spreading_decision() --- doc/draft-ietf-codec-opus.xml | 50 ++++++++++++++++++++++++++--------- 1 file changed, 37 insertions(+), 13 deletions(-) diff --git a/doc/draft-ietf-codec-opus.xml b/doc/draft-ietf-codec-opus.xml index 5536a6eea..02b6d87a4 100644 --- a/doc/draft-ietf-codec-opus.xml +++ b/doc/draft-ietf-codec-opus.xml @@ -5778,17 +5778,17 @@ A block diagram of the encoder is illustrated below. <figure> <artwork> <![CDATA[ - +----------+ +-------+ - | sample | | SILK | - +->| rate |--->|encoder|--+ - +-----------+ | |conversion| | | | - | Optional | | +----------+ +-------+ | +-------+ --->| high-pass |---+ +--->| Range | - + filter + | +------------+ +-------+ |encoder|----> - +-----------+ | | Delay | | CELT | +--->| | bitstream - +->|compensation|->|encoder|--+ +-------+ - | | | | - +------------+ +-------+ + +----------+ +-------+ + | sample | | SILK | + +->| rate |--->|encoder|--+ + +-----------+ | |conversion| | | | + | Optional | | +----------+ +-------+ | +-------+ +->| high-pass |--+ +-->| Range | + + filter + | +------------+ +-------+ |encoder|----> + +-----------+ | | Delay | | CELT | +-->| | bit- + +->|compensation|->|encoder|--+ +-------+ stream + | | | | + +------------+ +-------+ ]]> </artwork> </figure> @@ -6388,7 +6388,7 @@ encoder are described here. </t> <section anchor="pitch-prefilter" title="Pitch Prefilter"> -<t>The pitch prefilter is applied after the pre-emphasis and before the de-emphasis. It's applied +<t>The pitch prefilter is applied after the pre-emphasis. It is applied in such a way as to be the inverse of the decoder's post-filter. The main non-obvious aspect of the prefilter is the selection of the pitch period. 
The pitch search should be optimised for the following criteria: @@ -6425,6 +6425,30 @@ the coding rate, the available bit-rate, and the current rate of packet loss. </t> </section> <!-- Energy quant --> +<section title="Time-Frequency Decision"> +<t> +The choice of time-frequency resolution used in <xref target="tf-change"></xref> is based on +rate-distortion (RD) optimization. The distortion is the L1 norm (sum of absolute values) of each band +after applying each TF resolution under consideration. The L1 norm is used because it serves as a +proxy for the entropy of a Laplacian source. Coding a change in TF resolution between two adjacent +bands costs more bits than keeping the same resolution for both bands, which is why the decision +requires an RD optimization rather than an independent per-band choice. The optimal decision is computed using the Viterbi algorithm. +See tf_analysis() in celt/celt.c. +</t> +</section> + +<section title="Spreading Values Decision"> +<t> +The choice of the spreading value in <xref target="spread values"></xref> affects +the nature of the coding noise introduced by CELT. The larger the f_r value, the +lower the impact of the rotation, and the more tonal the coding noise. The +more tonal the signal, the more tonal the noise should be, so the CELT encoder determines +the optimal value for f_r by estimating how tonal the signal is. The tonality estimate +is based on a discrete pdf (4-bin histogram) of each band. Bands that have a large number of small +values are considered more tonal, and the decision is made by combining the estimates of all bands with more than +8 samples. See spreading_decision() in celt/bands.c. +</t> +</section> <section anchor="pvq" title="Spherical Vector Quantization"> <t>CELT uses a Pyramid Vector Quantization (PVQ) <xref target="PVQ"></xref> @@ -6473,7 +6497,7 @@ J = -X * y / ||y|| <t> The search described above is considered to be a good trade-off between quality and computational cost.
However, there are other possible ways to search the PVQ -codebook and the implementers MAY use any other search methods. +codebook and the implementers MAY use any other search methods. See alg_quant() in celt/vq.c. </t> </section> -- GitLab
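The two encoder decisions described in this patch (a Viterbi search over per-band TF choices, and a histogram-based tonality estimate for spreading) can be illustrated with a minimal C sketch. It is not the libopus code: tf_viterbi(), band_tonality() and spread_level() are hypothetical names, the per-band costs are assumed to be precomputed L1 norms, and the transition penalty lambda, the histogram bin edges (1/4, 1/16 and 1/64 of the band's mean energy per sample), and the mapping from score to spreading level are illustrative assumptions rather than values taken from tf_analysis() or spreading_decision().

```c
#include <assert.h>

/* Hypothetical upper bound on the number of bands in this sketch. */
#define MAX_BANDS 32

/*
 * Viterbi search over a binary per-band TF choice (0 or 1).
 * cost[b][s]: distortion of band b under choice s, assumed precomputed
 *             (for example, the L1 norm of the band's coefficients).
 * lambda:     hypothetical penalty for changing the choice between
 *             adjacent bands (stands in for the extra signalling bits).
 * Writes the chosen path to tf_out and returns its total cost.
 */
float tf_viterbi(float cost[][2], int nb_bands, float lambda, int *tf_out)
{
    float acc[2];           /* best accumulated cost ending in each state */
    int from[MAX_BANDS][2]; /* back-pointer: best predecessor state       */
    assert(nb_bands > 0 && nb_bands <= MAX_BANDS);

    acc[0] = cost[0][0];
    acc[1] = cost[0][1];
    for (int b = 1; b < nb_bands; b++) {
        float next[2];
        for (int s = 0; s < 2; s++) {
            float stay = acc[s];                /* keep previous choice */
            float change = acc[1 - s] + lambda; /* switch, pay penalty  */
            int keep = (stay <= change);
            from[b][s] = keep ? s : 1 - s;
            next[s] = (keep ? stay : change) + cost[b][s];
        }
        acc[0] = next[0];
        acc[1] = next[1];
    }

    /* Trace back from the cheaper final state. */
    int s = (acc[0] <= acc[1]) ? 0 : 1;
    float best = acc[s];
    for (int b = nb_bands - 1; b >= 0; b--) {
        tf_out[b] = s;
        if (b > 0)
            s = from[b][s];
    }
    return best;
}

/*
 * Tonality score of one band: a 4-bin histogram of each coefficient's
 * squared magnitude relative to the band's mean energy per sample.
 * The bin edges 1/4, 1/16 and 1/64 are assumptions for this sketch.
 * Returns 0 (noise-like) to 3 (many small values, i.e. tonal).
 */
int band_tonality(const float *x, int n)
{
    float energy = 0.f;
    int hist[4] = {0, 0, 0, 0};
    for (int i = 0; i < n; i++)
        energy += x[i] * x[i];
    if (n <= 0 || energy <= 0.f)
        return 0;
    for (int i = 0; i < n; i++) {
        float r = x[i] * x[i] * (float)n / energy;
        if (r < 1.f / 64) hist[3]++;
        else if (r < 1.f / 16) hist[2]++;
        else if (r < 1.f / 4) hist[1]++;
        else hist[0]++;
    }
    /* Count the edges below which at least half of the values fall. */
    int below64 = hist[3];
    int below16 = below64 + hist[2];
    int below4 = below16 + hist[1];
    return (2 * below64 >= n) + (2 * below16 >= n) + (2 * below4 >= n);
}

/*
 * Combine the scores of all bands with more than 8 samples into a
 * hypothetical spreading level: 0 = most spreading (noise-like signal),
 * 3 = least spreading (tonal signal, larger f_r). Returns -1 if no band
 * qualifies, so the caller can keep its previous decision.
 */
int spread_level(const float *bands[], const int band_len[], int nb_bands)
{
    int sum = 0, count = 0;
    for (int b = 0; b < nb_bands; b++) {
        if (band_len[b] <= 8)
            continue; /* too few samples for a meaningful histogram */
        sum += band_tonality(bands[b], band_len[b]);
        count++;
    }
    if (count == 0)
        return -1;
    return (2 * sum + count) / (2 * count); /* mean score, rounded */
}
```

For example, with per-band costs {1, 5}, {1, 5}, {5, 1}, {5, 1} and lambda = 0.5, tf_viterbi() switches once and returns the choices 0, 0, 1, 1 at a total cost of 4.5; with lambda = 100 the switch no longer pays off and every band keeps choice 0 (total cost 12). This is the trade-off the RD optimization captures: a locally better resolution is only worth selecting if it saves more than the cost of signalling the change.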