@@ -6029,7 +6038,6 @@ fl=sum(f(i),i<k), fh=fl+f(i), and ft=sum(f(i)).
12: LTP state scaling coefficient. Controlling error propagation
/ prediction gain trade-off
13: Quantized signal
14: Range encoded bitstream
]]>
</artwork>
...
...
@@ -6059,18 +6067,9 @@ fl=sum(f(i),i<k), fh=fl+f(i), and ft=sum(f(i)).
</t>
</section>
<section title='High-Pass Filter'>
<t>
The input signal is filtered by a high-pass filter to remove the lowest part of the spectrum, which contains little speech energy and may contain background noise. This is a second-order Auto Regressive Moving Average (ARMA) filter with a cut-off frequency around 70 Hz.
</t>
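<t>
As a non-normative illustration only, the following C sketch shows one way to realize such a second-order ARMA (biquad) high-pass stage. The coefficient derivation (a standard Butterworth-style bilinear transform with the cut-off set near 70 Hz) and all identifiers are assumptions made for this example; they do not reproduce the filter actually used by SILK. A typical use would be hp_biquad_init(&f, 70.0f, 16000.0f) followed by one hp_biquad_process() call per sample.
</t>
<figure>
<artwork>
<![CDATA[
/* Illustrative sketch only: a generic second-order (biquad) high-pass
 * filter run as a direct-form I ARMA structure.  The coefficients are
 * NOT those of the SILK encoder. */
#include <math.h>

typedef struct {
    float b0, b1, b2;   /* feed-forward (MA) coefficients */
    float a1, a2;       /* feedback (AR) coefficients     */
    float x1, x2;       /* previous input samples         */
    float y1, y2;       /* previous output samples        */
} hp_biquad;

static void hp_biquad_init(hp_biquad *f, float fc_hz, float fs_hz)
{
    /* Butterworth-style high-pass via bilinear transform, Q = 1/sqrt(2). */
    const float pi    = 3.14159265358979f;
    const float w0    = 2.0f * pi * fc_hz / fs_hz;
    const float alpha = sinf(w0) / (2.0f * 0.70710678f);
    const float a0    = 1.0f + alpha;

    f->b0 =  (1.0f + cosf(w0)) / (2.0f * a0);
    f->b1 = -(1.0f + cosf(w0)) / a0;
    f->b2 =  f->b0;
    f->a1 = (-2.0f * cosf(w0)) / a0;
    f->a2 =  (1.0f - alpha)    / a0;
    f->x1 = f->x2 = f->y1 = f->y2 = 0.0f;
}

static float hp_biquad_process(hp_biquad *f, float x)
{
    /* y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2] */
    float y = f->b0 * x + f->b1 * f->x1 + f->b2 * f->x2
            - f->a1 * f->y1 - f->a2 * f->y2;
    f->x2 = f->x1;  f->x1 = x;
    f->y2 = f->y1;  f->y1 = y;
    return y;
}
]]>
</artwork>
</figure>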
<t>
In the future, a music detector may also be used to lower the cut-off frequency when the input signal is detected to be music rather than speech.
For a frame of voiced speech, the pitch pulses will remain dominant in the pre-whitened input signal. Further whitening is desirable as it leads to higher quality at the same available bitrate. To achieve this, a Long-Term Prediction (LTP) analysis is carried out to estimate the coefficients of a fifth-order LTP filter for each of the four subframes. The LTP coefficients are used to compute an LTP residual signal, with the simulated output signal as input, to obtain better modeling of the output signal. This LTP residual signal is the input to an LPC analysis in which the LPCs are estimated using Burg's method, such that the residual energy is minimized. The estimated LPCs are converted to a Line Spectral Frequency (LSF) vector and quantized as described in <xref target='lsf_quantizer_overview_section'/>. After quantization, the quantized LSF vector is converted back to LPC coefficients using the full procedure in <xref target="silk_nlsfs"/>. By using LPC coefficients derived from the quantized LSF coefficients, the encoder remains fully synchronized with the decoder. The LTP coefficients are quantized using the method described in <xref target='ltp_quantizer_overview_section'/>. The quantized LPC and LTP coefficients are then used to filter the high-pass filtered input signal and measure residual energy for each of the four subframes.
For a frame of voiced speech, the pitch pulses will remain dominant in the pre-whitened input signal. Further whitening is desirable as it leads to higher quality at the same available bitrate. To achieve this, a Long-Term Prediction (LTP) analysis is carried out to estimate the coefficients of a fifth-order LTP filter for each of the four subframes. The LTP coefficients are used to compute an LTP residual signal, with the simulated output signal as input, to obtain better modeling of the output signal. This LTP residual signal is the input to an LPC analysis in which the LPCs are estimated using Burg's method, such that the residual energy is minimized. The estimated LPCs are converted to a Line Spectral Frequency (LSF) vector and quantized as described in <xref target='lsf_quantizer_overview_section'/>. After quantization, the quantized LSF vector is converted back to LPC coefficients using the full procedure in <xref target="silk_nlsfs"/>. By using LPC coefficients derived from the quantized LSF coefficients, the encoder remains fully synchronized with the decoder. The LTP coefficients are quantized using the method described in <xref target='ltp_quantizer_overview_section'/>. The quantized LPC and LTP coefficients are then used to filter the input signal and measure residual energy for each of the four subframes.
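<t>
Purely as a non-normative sketch, the following C fragment illustrates how a fifth-order LTP filter could be applied per subframe to form an LTP residual. The coefficient ordering, the buffer layout, and all identifiers are assumptions made for this illustration; the actual coefficients and pitch lags come from the LTP analysis described above.
</t>
<figure>
<artwork>
<![CDATA[
/* Illustrative sketch only: remove long-term (pitch) correlation with a
 * five-tap LTP filter centered on the pitch lag, one subframe at a time.
 * All names and conventions here are hypothetical. */

#define LTP_ORDER 5

/* x      : input signal; at least (lag + 2) samples of history must be
 *          available before index 0
 * res    : output LTP residual for this subframe
 * b      : the 5 LTP coefficients for this subframe
 * lag    : pitch lag in samples for this subframe
 * length : subframe length in samples */
static void ltp_residual_subframe(const float *x, float *res,
                                  const float b[LTP_ORDER],
                                  int lag, int length)
{
    int n, k;
    for (n = 0; n < length; n++) {
        /* Predict from the samples around one pitch period in the past. */
        float pred = 0.0f;
        for (k = 0; k < LTP_ORDER; k++)
            pred += b[k] * x[n - lag + 2 - k];
        res[n] = x[n] - pred;
    }
}
]]>
</artwork>
</figure>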
For a speech signal that has been classified as unvoiced, there is no need for LTP filtering, as it has already been determined that the pre-whitened input signal is not periodic enough within the allowed pitch period range for LTP analysis to be worth the cost in terms of complexity and rate. The pre-whitened input signal is therefore discarded, and instead the high-pass filtered input signal is used for LPC analysis using Burg's method. The resulting LPC coefficients are converted to an LSF vector and quantized as described in the following section. They are then transformed back to obtain quantized LPC coefficients, which are then used to filter the high-pass filtered input signal and measure residual energy for each of the four subframes.
For a speech signal that has been classified as unvoiced, there is no need for LTP filtering, as it has already been determined that the pre-whitened input signal is not periodic enough within the allowed pitch period range for LTP analysis to be worth the cost in terms of complexity and rate. The pre-whitened input signal is therefore discarded, and instead the input signal is used for LPC analysis using Burg's method. The resulting LPC coefficients are converted to an LSF vector and quantized as described in the following section. They are then transformed back to obtain quantized LPC coefficients, which are then used to filter the input signal and measure residual energy for each of the four subframes.
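<t>
As a final non-normative sketch, the following C fragment shows how quantized LPC coefficients might be used to filter a frame and measure the residual energy of each of the four subframes. The sign convention and all identifiers are assumptions made for this example and do not describe the reference implementation.
</t>
<figure>
<artwork>
<![CDATA[
/* Illustrative sketch only: compute the short-term prediction residual
 * with quantized LPCs and accumulate its energy per subframe. */

/* x         : input signal; 'order' samples of history must be available
 *             before index 0
 * a         : quantized LPC coefficients, a[0] applied to x[n-1]
 * order     : LPC order
 * subfr_len : subframe length in samples
 * n_subfr   : number of subframes (four per frame in this description)
 * energy    : output, one residual energy value per subframe */
static void lpc_residual_energies(const float *x, const float *a,
                                  int order, int subfr_len, int n_subfr,
                                  float *energy)
{
    int s, n, k;
    for (s = 0; s < n_subfr; s++) {
        float nrg = 0.0f;
        for (n = s * subfr_len; n < (s + 1) * subfr_len; n++) {
            /* Short-term prediction from the previous 'order' samples. */
            float pred = 0.0f;
            for (k = 0; k < order; k++)
                pred += a[k] * x[n - 1 - k];
            nrg += (x[n] - pred) * (x[n] - pred);   /* residual energy */
        }
        energy[s] = nrg;
    }
}
]]>
</artwork>
</figure>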