diff --git a/doc/build_draft.sh b/doc/build_draft.sh
index 4254f166d494ef3ff237a531e2ce07d1b2cb66c4..4990191650b6df4ed5c57aa3d6cecb97c53f1663 100755
--- a/doc/build_draft.sh
+++ b/doc/build_draft.sh
@@ -50,6 +50,11 @@ cat opus_source.tar.gz| base64 | tr -d '\n' | fold -w 64 | \
 #echo '</artwork>' >> opus_compare_escaped.c
 #echo '</figure>' >> opus_compare_escaped.c
 
+if [[ ! -d ../opus_testvectors ]] ; then
+ echo "Downloading test vectors..."
+ wget 'http://www.opus-codec.org/testvectors/opus_testvectors-draft11.tar.gz'
+ tar -C .. -xvzf opus_testvectors-draft11.tar.gz
+fi
 echo '<figure>' > testvectors_sha1
 echo '<artwork>' >> testvectors_sha1
 echo '<![CDATA[' >> testvectors_sha1
diff --git a/doc/draft-ietf-codec-opus.xml b/doc/draft-ietf-codec-opus.xml
index f516920f2a9109df8752d8a54b3875e83a976377..145ac4fdc671698c878e85b17aa40e540be19e7c 100644
--- a/doc/draft-ietf-codec-opus.xml
+++ b/doc/draft-ietf-codec-opus.xml
@@ -4827,13 +4827,46 @@ bands that (roughly) follow the Bark scale, i.e. the scale of the ear's
 critical bands. The normal CELT layer uses 21 of those bands, though Opus
 Custom (see <xref target="opus-custom"/>) may use a different number of bands.
 A band can contain as little as one MDCT bin per channel, and as many as 176
-bins per channel.
+bins per channel, as detailed in <xref target="celt_band_sizes"/>.
 In each band, the gain (energy) is coded separately from the shape of the
 spectrum. Coding the gain explicitly makes it easy to preserve the spectral
 envelope of the signal. The remaining unit-norm shape vector is encoded using a
 Pyramid Vector Quantizer (PVQ) <xref target='PVQ-decoder'/>.
 </t>
 
+<texttable anchor="celt_band_sizes"
+ title="MDCT Bins Per Channel Per Band for Each Frame Size">
+<ttcol>Frame Size:</ttcol>
+<ttcol align="right">2.5 ms</ttcol>
+<ttcol align="right">5 ms</ttcol>
+<ttcol align="right">10 ms</ttcol>
+<ttcol align="right">20 ms</ttcol>
+<ttcol align="right">Start Frequency</ttcol>
+<ttcol align="right">Stop Frequency</ttcol>
+<c>Band</c> <c>Bins:</c> <c/> <c/> <c/> <c/> <c/>
+ <c>0</c> <c>1</c> <c>2</c> <c>4</c> <c>8</c> <c>0 Hz</c> <c>200 Hz</c>
+ <c>1</c> <c>1</c> <c>2</c> <c>4</c> <c>8</c> <c>200 Hz</c> <c>400 Hz</c>
+ <c>2</c> <c>1</c> <c>2</c> <c>4</c> <c>8</c> <c>400 Hz</c> <c>600 Hz</c>
+ <c>3</c> <c>1</c> <c>2</c> <c>4</c> <c>8</c> <c>600 Hz</c> <c>800 Hz</c>
+ <c>4</c> <c>1</c> <c>2</c> <c>4</c> <c>8</c> <c>800 Hz</c> <c>1000 Hz</c>
+ <c>5</c> <c>1</c> <c>2</c> <c>4</c> <c>8</c> <c>1000 Hz</c> <c>1200 Hz</c>
+ <c>6</c> <c>1</c> <c>2</c> <c>4</c> <c>8</c> <c>1200 Hz</c> <c>1400 Hz</c>
+ <c>7</c> <c>1</c> <c>2</c> <c>4</c> <c>8</c> <c>1400 Hz</c> <c>1600 Hz</c>
+ <c>8</c> <c>2</c> <c>4</c> <c>8</c> <c>16</c> <c>1600 Hz</c> <c>2000 Hz</c>
+ <c>9</c> <c>2</c> <c>4</c> <c>8</c> <c>16</c> <c>2000 Hz</c> <c>2400 Hz</c>
+<c>10</c> <c>2</c> <c>4</c> <c>8</c> <c>16</c> <c>2400 Hz</c> <c>2800 Hz</c>
+<c>11</c> <c>2</c> <c>4</c> <c>8</c> <c>16</c> <c>2800 Hz</c> <c>3200 Hz</c>
+<c>12</c> <c>4</c> <c>8</c> <c>16</c> <c>32</c> <c>3200 Hz</c> <c>4000 Hz</c>
+<c>13</c> <c>4</c> <c>8</c> <c>16</c> <c>32</c> <c>4000 Hz</c> <c>4800 Hz</c>
+<c>14</c> <c>4</c> <c>8</c> <c>16</c> <c>32</c> <c>4800 Hz</c> <c>5600 Hz</c>
+<c>15</c> <c>6</c> <c>12</c> <c>24</c> <c>48</c> <c>5600 Hz</c> <c>6800 Hz</c>
+<c>16</c> <c>6</c> <c>12</c> <c>24</c> <c>48</c> <c>6800 Hz</c> <c>8000 Hz</c>
+<c>17</c> <c>8</c> <c>16</c> <c>32</c> <c>64</c> <c>8000 Hz</c> <c>9600 Hz</c>
+<c>18</c> <c>12</c> <c>24</c> <c>48</c> <c>96</c> <c>9600 Hz</c> <c>12000 Hz</c>
+<c>19</c> <c>18</c> <c>36</c> <c>72</c> <c>144</c> <c>12000 Hz</c> <c>15600 Hz</c>
+<c>20</c> <c>22</c> <c>44</c> <c>88</c> <c>176</c> <c>15600 Hz</c> <c>20000 Hz</c>
+</texttable>
+
 <t>
 Transients are notoriously difficult for transform codecs to code.
 CELT uses two different strategies for them:
@@ -5035,11 +5068,13 @@ free to implement the procedure in any way which produces identical results.</t>
 <t>The per-band gain-shape structure of the CELT layer ensures that using
 the same number of bits for the spectral shape of a band in every frame will
 result in a roughly constant signal-to-noise ratio in that band.
- This results in a coding noise that has the same spectral envelope as the signal,
- as is expected when using a standard psychoacoustic model. This provides a fairly
- consistent perceptual performance <xref target='Valin2010'/>.
-This structure means that the ideal allocation is more consistent from frame
-to frame than it is for other codecs without an equivalent structure.</t>
+This results in coding noise that has the same spectral envelope as the signal.
+The masking curve produced by a standard psychoacoustic model also closely
+ follows the spectral envelope of the signal.
+This structure means that the ideal allocation is more consistent from frame to
+ frame than it is for other codecs without an equivalent structure, and that a
+ fixed allocation provides fairly consistent perceptual
+ performance <xref target='Valin2010'/>.</t>
 
 <t>Many codecs transmit significant amounts of side information to control the
 bit allocation within a frame.