CELT can use a pitch predictor (also known as a long-term predictor) to improve the voice quality at lower bit-rates. While the pitch period can be estimated in any way, it is RECOMMENDED for performance reasons to estimate it using a frequency-domain correlation between the current frame and the history buffer, as implemented in find_spectral_pitch() (<xref target="pitch.c">pitch.c</xref>). When the <spanx style="emph">P</spanx> bit is set, the pitch period is encoded after the flag bits. The value encoded is an integer in the range [0, 1024-N-overlap-1].
</t>
</section>
...
...
using the projected allocation. In the reference implementation this is
performed by compute_allocation() (<xref target="rate.c">rate.c</xref>).
The target computation begins by calculating the available space as the
number of whole bits which can be fit in the frame after Q1 is stored according
to the range coder (ec_[enc/dec]_tell()), subtracting the number of pitch bands
if and only if the frame has pitch prediction, and then multiplying the result by 16.
Then the two projected prototype allocations whose sums multiplied by 16 are nearest
to that value are determined. These two projected prototype allocations are then interpolated
by finding the highest integer interpolation coefficient in the range 0-16
such that the sum of the higher prototype times the coefficient, plus the
sum of the lower prototype multiplied by
the difference of 16 and the coefficient, is less than or equal to the
available space.
The pitch period T is computed in the frequency domain using a generalized
cross-correlation, as implemented in find_spectral_pitch()
(<xref target="pitch.c">pitch.c</xref>). An MDCT is then computed on the
synthesis signal memory at offset T.
If there is sufficient energy in this
part of the signal, the pitch gain for each pitch band
is computed as g_a = X^T*p, where X is the normalized (non-quantized) signal and
p is the normalized pitch MDCT.
The gain is computed by compute_pitch_gain() (<xref target="bands.c">bands.c</xref>),
and if a sufficient number of bands have a high enough gain, the pitch bit is set.
Otherwise, the pitch bit is not set and pitch prediction is not used.
This section needs to be updated.
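A minimal sketch of the per-band gain g_a = X^T*p and of the pitch-bit decision follows. It assumes X and p are already normalized to unit norm within each band and omits the energy check. The names pitch_band_gains(), pitch_bit(), min_gain and min_bands are hypothetical: the text does not give the thresholds or the exact rule used by compute_pitch_gain().

```c
/* band_edges has nbands+1 entries; band b covers bins
   [band_edges[b], band_edges[b+1]). X and p are assumed to be
   normalized to unit norm within each band. */
static void pitch_band_gains(const float *X, const float *p,
                             const int *band_edges, int nbands,
                             float *gains)
{
    for (int b = 0; b < nbands; b++) {
        double g = 0.0;
        for (int i = band_edges[b]; i < band_edges[b + 1]; i++)
            g += (double)X[i] * p[i];   /* g_a = X^T * p for this band */
        gains[b] = (float)g;
    }
}

/* Hypothetical decision rule: set the pitch bit when at least min_bands
   bands have a gain above min_gain. */
static int pitch_bit(const float *gains, int nbands,
                     float min_gain, int min_bands)
{
    int count = 0;
    for (int b = 0; b < nbands; b++)
        if (gains[b] > min_gain)
            count++;
    return count >= min_bands;
}
```

Because both vectors have unit norm per band, each g_a lies in [-1, 1], so a single gain threshold is meaningful across bands of different widths.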
</t>
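The prototype interpolation described above for compute_allocation() can be sketched as follows. This is not the reference code: the array of prototype sums and the helper names are hypothetical, and "nearest" is assumed to mean the adjacent pair of prototypes whose scaled sums bracket the available space.

```c
/* Weights run from 0 to 16, matching the "multiplied by 16" in the text. */
#define ALLOC_STEPS 16

/* Available space: whole bits left after Q1, minus the number of pitch bands
   when pitch prediction is used, multiplied by 16. */
static int available_space(int whole_bits, int has_pitch, int pitch_bands)
{
    return (whole_bits - (has_pitch ? pitch_bands : 0)) * ALLOC_STEPS;
}

/* proto_sums[] holds the sum of each projected prototype allocation in
   increasing order (hypothetical layout). Assumption: the "nearest" pair is
   the lower/higher pair whose scaled sums bracket `space`. Returns the
   highest c in 0..16 with c*sum_hi + (16-c)*sum_lo <= space. */
static int interp_coefficient(const int *proto_sums, int nprotos, int space,
                              int *lo_out, int *hi_out)
{
    int lo = 0;
    while (lo + 1 < nprotos && proto_sums[lo + 1] * ALLOC_STEPS <= space)
        lo++;
    int hi = (lo + 1 < nprotos) ? lo + 1 : lo;
    *lo_out = lo;
    *hi_out = hi;
    for (int c = ALLOC_STEPS; c >= 0; c--)
        if (c * proto_sums[hi] + (ALLOC_STEPS - c) * proto_sums[lo] <= space)
            return c;
    return 0; /* even the lower prototype alone does not fit */
}
```

For example, with prototype sums {0, 40, 100, 200} and 70 whole bits without pitch, the available space is 1120, the bracketing pair is (40, 100), and the coefficient is 8, since 8*100 + 8*40 = 1120 while 9*100 + 7*40 = 1180 exceeds the budget.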
<t>
For frequencies above the highest pitch band (~6374 Hz), the pitch prediction is replaced by
spectral folding if and only if the folding bit is set. Spectral folding is implemented in
intra_fold() (<xref target="vq.c">vq.c</xref>). If the folding bit is not set, then
the prediction is simply set to zero.
The folding prediction uses the quantized spectrum at lower frequencies with a gain that depends
both on the width of the band, N, and the number of pulses allocated, K:
</t>
<t>
g_a = N / (N + 2*K*(K+1)).
</t>
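As a quick check of the formula: a band of N = 8 bins with K = 1 pulse gets g_a = 8/(8+4) = 2/3, and with K = 2 it drops to 8/20 = 0.4. The helper name fold_gain() below is hypothetical.

```c
/* Folding gain g_a = N / (N + 2*K*(K+1)) for a band of N bins with K pulses. */
static double fold_gain(int N, int K)
{
    return (double)N / (N + 2.0 * K * (K + 1));
}
```

With no pulses (K = 0) the gain is exactly 1, and it decreases monotonically as more pulses are allocated to the band.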
<t>
When the short block bit is not set, the spectral copy is performed starting with bin 0 (DC) and going up. When the short block bit is set, the starting point is chosen between 0 and B-1, where B is the number of short MDCTs in the frame, in such a way that the source and destination bins belong to the same MDCT (i.e., so that the folding does not cause pre-echo). Before the folding operation, each band of the source spectrum is multiplied by sqrt(N), where N is the width of that source band; since each band is normalized to unit norm, this makes the expected squared value of each bin equal to 1. The copied spectrum is then renormalized so that its norm equals the folding gain (||p|| = g_a).
</t>
<t>For stereo streams, the folding is performed independently for each channel.</t>