Commit a3bb5412 authored by Timothy B. Terriberry

Address remaining document shepherd review comments.

Also remove most <preamble>/<postamble> usage for expository text,
 as most places center the result, which looks ugly (only local
 xml2rfc HTML output does not center: tools.ietf.org HTML output
 still does, as does the .txt version).
parent 53b4e5bd
@@ -71,14 +71,6 @@ This document defines the Ogg encapsulation for the Opus interactive speech and
 audio codec.
 This allows data encoded in the Opus format to be stored in an Ogg logical
 bitstream.
-Ogg encapsulation provides Opus with a long-term storage format supporting
-all of the essential features, including metadata, fast and accurate seeking,
-corruption detection, recapture after errors, low overhead, and the ability to
-multiplex Opus with other codecs (including video) with minimal buffering.
-It also provides a live streamable format, capable of delivery over a reliable
-stream-oriented transport, without requiring all the data, or even the total
-length of the data, up-front, in a form that is identical to the on-disk
-storage format.
 </t>
 </abstract>
 </front>
@@ -91,6 +83,14 @@ The IETF Opus codec is a low-latency audio codec optimized for both voice and
 See <xref target="RFC6716"/> for technical details.
 This document defines the encapsulation of Opus in a continuous, logical Ogg
 bitstream&nbsp;<xref target="RFC3533"/>.
+Ogg encapsulation provides Opus with a long-term storage format supporting
+all of the essential features, including metadata, fast and accurate seeking,
+corruption detection, recapture after errors, low overhead, and the ability to
+multiplex Opus with other codecs (including video) with minimal buffering.
+It also provides a live streamable format, capable of delivery over a reliable
+stream-oriented transport, without requiring all the data, or even the total
+length of the data, up-front, in a form that is identical to the on-disk
+storage format.
 </t>
 <t>
 Ogg bitstreams are made up of a series of 'pages', each of which contains data
@@ -144,8 +144,6 @@ An Ogg Opus stream is organized as follows.
 </t>
 <t>
 There are two mandatory header packets.
-</t>
-<t>
 The first packet in the logical Ogg bitstream MUST contain the identification
 (ID) header, which uniquely identifies a stream as Opus audio.
 The format of this header is defined in <xref target="id_header"/>.
@@ -173,8 +171,8 @@ The value N is specified in the ID header (see
 logical Ogg bitstream.
 </t>
 <t>
-The first N-1 Opus packets, if any, are packed one after another into the Ogg
-packet, using the self-delimiting framing from Appendix&nbsp;B of
+The first (N&nbsp;-&nbsp;1) Opus packets, if any, are packed one after another
+into the Ogg packet, using the self-delimiting framing from Appendix&nbsp;B of
 <xref target="RFC6716"/>.
 The remaining Opus packet is packed at the end of the Ogg packet using the
 regular, undelimited framing from Section&nbsp;3 of <xref target="RFC6716"/>.
@@ -224,8 +222,8 @@ That is, the first page in the logical stream, and the last header
 The granule position of an audio data page encodes the total number of PCM
 samples in the stream up to and including the last fully-decodable sample from
 the last packet completed on that page.
-That granule position MAY be larger than zero as described in
-<xref target="start_granpos_restrictions"/>.
+The granule position of the first audio data page MAY be larger than zero as
+described in <xref target="start_granpos_restrictions"/>.
 </t>
 <t>
@@ -273,6 +271,11 @@ For this to work, there cannot be any gaps.
 In order to support capturing a real-time stream that has lost or not
 transmitted packets, a muxer SHOULD emit packets that explicitly request the
 use of Packet Loss Concealment (PLC) in place of the missing packets.
+Implementations that fail to do so still MUST NOT increment the granule
+position for a page by anything other than the number of samples contained in
+packets that actually complete on that page.
+</t>
+<t>
 Only gaps that are a multiple of 2.5&nbsp;ms are repairable, as these are the
 only durations that can be created by packet loss or discontinuous
 transmission.
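The two rules in the hunk above reduce to integer checks on sample counts. This minimal C sketch measures gaps in 48 kHz samples, the rate Ogg Opus granule positions count in, so 2.5 ms corresponds to 120 samples. The helper names `gap_is_repairable` and `next_granule_position` are hypothetical and do not come from any Opus or Ogg library.

```c
#include <stdbool.h>
#include <stdint.h>

/* 2.5 ms at 48 kHz, the rate Ogg Opus granule positions count in. */
#define SAMPLES_PER_2_5_MS 120

/* A gap (in 48 kHz samples) can be filled exactly by PLC packets only if it
 * is a positive whole number of 2.5 ms units. */
static bool gap_is_repairable(int64_t gap_samples) {
    return gap_samples > 0 && gap_samples % SAMPLES_PER_2_5_MS == 0;
}

/* Hypothetical helper: advance a page's granule position by the samples of
 * the packets that actually complete on that page, and by nothing else. */
static int64_t next_granule_position(int64_t prev_granpos,
                                     const int32_t *completed_packet_samples,
                                     int num_completed) {
    int64_t granpos = prev_granpos;
    for (int i = 0; i < num_completed; i++) {
        granpos += completed_packet_samples[i];
    }
    return granpos;
}
```

A muxer that skips PLC packets would therefore still add only, for example, 960 + 480 samples for a page completing a 20 ms packet and a 10 ms packet. It would not add extra samples for the missing audio.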
@@ -406,32 +409,30 @@ In this case, a value of at least 3840&nbsp;samples (80&nbsp;ms) provides
 <section anchor="pcm_sample_position" title="PCM Sample Position">
 <t>
-<figure align="center">
-<preamble>
 The PCM sample position is determined from the granule position using the
 formula
-</preamble>
+</t>
+<figure align="center">
 <artwork align="center"><![CDATA[
 'PCM sample position' = 'granule position' - 'pre-skip' .
 ]]></artwork>
 </figure>
-</t>
 <t>
 For example, if the granule position of the first audio data page is 59,971,
 and the pre-skip is 11,971, then the PCM sample position of the last decoded
 sample from that page is 48,000.
-<figure align="center">
-<preamble>
+</t>
+<t>
 This can be converted into a playback time using the formula
-</preamble>
+</t>
+<figure align="center">
 <artwork align="center"><![CDATA[
                   'PCM sample position'
 'playback time' = --------------------- .
                          48000.0
 ]]></artwork>
 </figure>
-</t>
 <t>
 The initial PCM sample position before any samples are played is normally '0'.
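The two formulas in the PCM Sample Position hunk can be checked against its worked example, which uses a granule position of 59,971 and a pre-skip of 11,971. This is a minimal C sketch with hypothetical function names. It models pre-skip as the ID header's unsigned 16-bit field.

```c
#include <stdint.h>

/* 'PCM sample position' = 'granule position' - 'pre-skip' */
static int64_t pcm_sample_position(int64_t granule_position, uint16_t pre_skip) {
    return granule_position - (int64_t)pre_skip;
}

/* 'playback time' = 'PCM sample position' / 48000.0, in seconds */
static double playback_time_seconds(int64_t pcm_position) {
    return (double)pcm_position / 48000.0;
}
```

With the example values, the PCM sample position is 48,000, which corresponds to a playback time of exactly 1.0 s.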
@@ -691,17 +692,14 @@ This is a gain to be applied when decoding.
 It is 20*log10 of the factor by which to scale the decoder output to achieve
 the desired playback volume, stored in a 16-bit, signed, two's complement
 fixed-point value with 8 fractional bits (i.e., Q7.8).
-<figure align="center">
-<preamble>
+<vspace blankLines="1"/>
 To apply the gain, an implementation could use
-</preamble>
+<figure align="center">
 <artwork align="center"><![CDATA[
 sample *= pow(10, output_gain/(20.0*256)) ,
 ]]></artwork>
-<postamble>
-where output_gain is the raw 16-bit value from the header.
-</postamble>
 </figure>
+where output_gain is the raw 16-bit value from the header.
 <vspace blankLines="1"/>
 Virtually all players and media frameworks SHOULD apply it by default.
 If a player chooses to apply any volume adjustment or gain modification, such
@@ -751,17 +749,16 @@ Future versions of this specification, even backwards-compatible versions,
 might include additional fields in the ID header.
 If an ID header has a compatible major version, but a larger minor version,
 an implementation MUST NOT reject it for containing additional data not
-specified here.
-However, implementations MAY reject streams in which the ID header does not
+specified here, unless it contains so much additional data that it does not
 complete on the first page.
 </t>
 <section anchor="channel_mapping" title="Channel Mapping">
 <t>
 An Ogg Opus stream allows mapping one number of Opus streams (N) to a possibly
-larger number of decoded channels (M+N) to yet another number of output
-channels (C), which might be larger or smaller than the number of decoded
-channels.
+larger number of decoded channels (M&nbsp;+&nbsp;N) to yet another number of
+output channels (C), which might be larger or smaller than the number of
+decoded channels.
 The order and meaning of these channels are defined by a channel mapping,
 which consists of the 'channel mapping family' octet and, for channel mapping
 families other than family&nbsp;0, a channel mapping table, as illustrated in
@@ -825,7 +822,8 @@ For channel mapping family&nbsp;0, this value defaults to (C&nbsp;-&nbsp;1)
 This contains one octet per output channel, indicating which decoded channel
 is to be used for each one.
 Let 'index' be the value of this octet for a particular output channel.
-This value MUST either be smaller than (M+N), or be the special value 255.
+This value MUST either be smaller than (M&nbsp;+&nbsp;N), or be the special
+value 255.
 If 'index' is less than 2*M, the output MUST be taken from decoding stream
 ('index'/2) as stereo and selecting the left channel if 'index' is even, and
 the right channel if 'index' is odd.
@@ -834,7 +832,7 @@ If 'index' is 2*M or larger, but less than 255, the output MUST be taken from
 If 'index' is 255, the corresponding output channel MUST contain pure silence.
 <vspace blankLines="1"/>
 The number of output channels, C, is not constrained to match the number of
-decoded channels (M+N).
+decoded channels (M&nbsp;+&nbsp;N).
 A single index value MAY appear multiple times, i.e., the same decoded channel
 might be mapped to multiple output channels.
 Some decoded channels might not be assigned to any output channel, as well.
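The index rules spread across the two hunks above can be collected into one lookup. In this hypothetical C sketch, N is `n_streams` and M is `n_coupled`. The `channel_source` type and `resolve_mapping` are invented for illustration. The mono branch takes stream ('index' - M), which is how the specification completes the sentence cut off in the hunk header.

```c
/* Hypothetical description of where one output channel comes from. */
typedef enum {
    SRC_SILENCE, SRC_STEREO_LEFT, SRC_STEREO_RIGHT, SRC_MONO, SRC_INVALID
} source_kind;

typedef struct {
    source_kind kind;
    int stream; /* decoder stream index; -1 for silence or invalid */
} channel_source;

/* index: one channel mapping table octet; n_streams = N; n_coupled = M. */
static channel_source resolve_mapping(int index, int n_streams, int n_coupled) {
    channel_source src = { SRC_INVALID, -1 };
    if (index == 255) {
        src.kind = SRC_SILENCE;              /* output is pure silence */
    } else if (index >= n_coupled + n_streams) {
        src.kind = SRC_INVALID;              /* must be smaller than (M + N) */
    } else if (index < 2 * n_coupled) {
        src.stream = index / 2;              /* coupled stream, decoded as stereo */
        src.kind = (index % 2 == 0) ? SRC_STEREO_LEFT : SRC_STEREO_RIGHT;
    } else {
        src.stream = index - n_coupled;      /* uncoupled stream, decoded as mono */
        src.kind = SRC_MONO;
    }
    return src;
}
```

Take N = 3 and M = 2, which gives five decoded channels. Index 3 selects the right channel of stream 1, index 4 selects mono stream 2, and index 5 is invalid.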
@@ -973,7 +971,7 @@ R output = ( 0.414214 * center + 0.585786 * right )
 ]]></artwork>
 <postamble>
 Exact coefficient values are 1 and 1/sqrt(2), multiplied by
-1/(1 + 1/sqrt(2)) for normalization.
+1/(1&nbsp;+&nbsp;1/sqrt(2)) for normalization.
 </postamble>
 </figure>
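The rounded coefficients quoted in the hunk header follow directly from the "Exact coefficient values" sentence. The C sketch below recomputes them. The function names are hypothetical, and the left output is assumed to mirror the right one shown in the header.

```c
#include <math.h>

/* Exact weights 1 (side) and 1/sqrt(2) (center), each multiplied by
 * 1/(1 + 1/sqrt(2)) so that they sum to 1. */
static void center_fold_weights(double *side_weight, double *center_weight) {
    const double inv_sqrt2 = 1.0 / sqrt(2.0);
    const double norm = 1.0 / (1.0 + inv_sqrt2);
    *side_weight = 1.0 * norm;         /* approx. 0.585786 */
    *center_weight = inv_sqrt2 * norm; /* approx. 0.414214 */
}

/* Fold a left/center/right sample triple down to stereo. */
static void downmix_lcr(double left, double center, double right,
                        double *out_left, double *out_right) {
    double side_w, center_w;
    center_fold_weights(&side_w, &center_w);
    *out_left = side_w * left + center_w * center;
    *out_right = center_w * center + side_w * right;
}
```

The two weights sum to 1, which is the point of the normalization. A signal present equally in all three input channels comes out at unit gain.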
@@ -1212,35 +1210,33 @@ The user comment strings follow the NAME=value format described by
 Two new comment tags are introduced here:
 </t>
+<t>First, an optional gain for track nomalization:</t>
 <figure align="center">
-<preamble>An optional gain for track nomalization</preamble>
 <artwork align="left"><![CDATA[
 R128_TRACK_GAIN=-573
 ]]></artwork>
-<postamble>
-representing the volume shift needed to normalize the track's volume
+</figure>
+<t>
+representing the volume shift needed to normalize the track's volume
 during isolated playback, in random shuffle, and so on.
 The gain is a Q7.8 fixed point number in dB, as in the ID header's 'output
 gain' field.
-</postamble>
-</figure>
-<t>
 This tag is similar to the REPLAYGAIN_TRACK_GAIN tag in
 Vorbis&nbsp;<xref target="replay-gain"/>, except that the normal volume
 reference is the <xref target="EBU-R128"/> standard.
 </t>
+<t>Second, an optional gain for album nomalization:</t>
 <figure align="center">
-<preamble>An optional gain for album nomalization</preamble>
 <artwork align="left"><![CDATA[
 R128_ALBUM_GAIN=111
 ]]></artwork>
-<postamble>
-representing the volume shift needed to normalize the overall volume when
+</figure>
+<t>
+representing the volume shift needed to normalize the overall volume when
 played as part of a particular collection of tracks.
 The gain is also a Q7.8 fixed point number in dB, as in the ID header's
 'output gain' field.
-</postamble>
-</figure>
+</t>
 <t>
 An Ogg Opus stream MUST NOT have more than one of each tag, and if present
 their values MUST be an integer from -32768 to 32767, inclusive,
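Reading an R128 tag amounts to three steps: parse a decimal integer, range-check it, and divide by 256. The C sketch below is illustrative only and uses hypothetical function names. It accepts exactly an optional sign followed by decimal digits. That is an assumption about the ASCII form, because the sentence describing the form is cut off at the end of this hunk.

```c
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Parse the value of an R128_TRACK_GAIN or R128_ALBUM_GAIN tag into its raw
 * Q7.8 integer, enforcing the -32768..32767 range. Returns false on error. */
static bool parse_r128_gain(const char *value, int *q78_out) {
    const char *p = value;
    if (*p == '+' || *p == '-') p++;
    if (!isdigit((unsigned char)*p)) return false; /* empty, space, bad sign */
    errno = 0;
    char *end = NULL;
    long v = strtol(value, &end, 10);
    if (errno != 0 || *end != '\0') return false;  /* overflow or trailing junk */
    if (v < -32768 || v > 32767) return false;
    *q78_out = (int)v;
    return true;
}

/* Q7.8 fixed point: gain in dB = raw value / 256. */
static double r128_gain_db(int q78) {
    return q78 / 256.0;
}
```

With the example tags, R128_TRACK_GAIN=-573 is -2.23828125 dB and R128_ALBUM_GAIN=111 is 0.43359375 dB.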
@@ -1339,11 +1335,11 @@ This gives a size of 61,310&nbsp;octets, which is rounded up to a multiple of
 When encoding Opus streams, Ogg muxers SHOULD take into account the
 algorithmic delay of the Opus encoder.
 </t>
-<figure align="center">
-<preamble>
+<t>
 In encoders derived from the reference implementation, the number of
 samples can be queried with:
-</preamble>
+</t>
+<figure align="center">
 <artwork align="center"><![CDATA[
 opus_encoder_ctl(encoder_state, OPUS_GET_LOOKAHEAD(&delay_samples));
 ]]></artwork>
@@ -1373,12 +1369,12 @@ When extending the end of the signal, order-N (typically with N ranging from 8
 The last N samples are used as memory to an infinite impulse response (IIR)
 filter.
 </t>
-<figure align="center">
-<preamble>
+<t>
 The filter is then applied on a zero input to extrapolate the end of the signal.
 Let a(k) be the kth LPC coefficient and x(n) be the nth sample of the signal,
 each new sample past the end of the signal is computed as:
-</preamble>
+</t>
+<figure align="center">
 <artwork align="center"><![CDATA[
                            N
                           ---
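The extrapolation step can be sketched in C as shown below. The artwork in this hunk is cut off after its first two lines, so the sketch assumes the common sign convention x(n) = -sum_{k=1..N} a(k)*x(n-k). Estimating the LPC coefficients is outside the scope of this sketch. `MAX_LPC_ORDER` is a hypothetical limit chosen for illustration.

```c
/* Hypothetical upper bound on the filter order for this sketch. */
#define MAX_LPC_ORDER 32

/* Extend a signal past its end with an order-N all-pole (IIR) filter driven by
 * zero input, assuming x(n) = -sum_{k=1..N} a(k) * x(n-k).
 *   history: the last `order` samples, oldest first (history[order-1] newest)
 *   a:       a[0..order-1] holds a(1)..a(N)
 *   out:     receives `count` extrapolated samples
 * Returns 0 on success, -1 if `order` is out of range. */
static int lpc_extrapolate(const float *history, const float *a, int order,
                           float *out, int count) {
    if (order < 1 || order > MAX_LPC_ORDER) return -1;
    float mem[MAX_LPC_ORDER];
    for (int i = 0; i < order; i++) mem[i] = history[i];
    for (int n = 0; n < count; n++) {
        float acc = 0.0f;
        for (int k = 1; k <= order; k++) {
            acc += a[k - 1] * mem[order - k]; /* a(k) * x(n-k) */
        }
        const float x = -acc;                 /* zero input, so only memory contributes */
        for (int i = 0; i + 1 < order; i++) mem[i] = mem[i + 1];
        mem[order - 1] = x;                   /* newest sample becomes x(n-1) */
        out[n] = x;
    }
    return 0;
}
```

With a(1) = -2 and a(2) = 1, the filter reduces to x(n) = 2x(n-1) - x(n-2), a straight-line continuation that is easy to check by hand.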
@@ -1422,19 +1418,19 @@ De-emphasis is allowed.</t>
 the encoder.</t>
 </list>
 </t>
-<figure align="center">
-<preamble>
+<t>
 In encoders derived from the reference implementation, inter-frame prediction
 can be turned off by calling:
-</preamble>
+</t>
+<figure align="center">
 <artwork align="center"><![CDATA[
 opus_encoder_ctl(encoder_state, OPUS_SET_PREDICTION_DISABLED(1));
 ]]></artwork>
-<postamble>
+</figure>
+<t>
 For best results, this implementation requires that prediction be explicitly
 enabled again before resuming normal encoding, even after a reset.
-</postamble>
-</figure>
+</t>
 </section>
@@ -1485,19 +1481,19 @@ An "Ogg Opus file" consists of one or more sequentially multiplexed segments,
 The RECOMMENDED mime-type for Ogg Opus files is "audio/ogg".
 </t>
-<figure>
-<preamble>
+<t>
 If more specificity is desired, one MAY indicate the presence of Opus streams
 using the codecs parameter defined in <xref target="RFC6381"/> and
 <xref target="RFC5334"/>, e.g.,
-</preamble>
+</t>
+<figure>
 <artwork align="center"><![CDATA[
 audio/ogg; codecs=opus
 ]]></artwork>
-<postamble>
-for an Ogg Opus file.
-</postamble>
 </figure>
+<t>
+for an Ogg Opus file.
+</t>
 <t>
 The RECOMMENDED filename extension for Ogg Opus files is '.opus'.