% Ralph Giles committed Mar 06, 2009
% -*- mode: latex; TeX-master: "Vorbis_I_spec"; -*-
%!TEX root = Vorbis_I_spec.tex

\section{Embedding Vorbis into an Ogg stream}
\label{vorbis:over:ogg}

\subsection{Overview}

This document describes using Ogg logical and physical transport
streams to encapsulate Vorbis compressed audio packet data into file
form.

The \xref{vorbis:spec:intro} provides an overview of the construction
of Vorbis audio packets.

The \href{oggstream.html}{Ogg bitstream overview} and
\href{framing.html}{Ogg logical bitstream and framing spec} provide
detailed descriptions of Ogg transport streams. This specification
document assumes a working knowledge of the concepts covered in these
named background documents. Please read them first.

\subsubsection{Restrictions}

The Ogg/Vorbis I specification currently dictates that Ogg/Vorbis
streams use Ogg transport streams in degenerate, unmultiplexed
form only. That is:

\begin{itemize}
 \item A meta-headerless Ogg file encapsulates the Vorbis I packets.
 \item The Ogg stream may be chained, i.e., contain multiple, contiguous logical streams (links).
 \item The Ogg stream must be unmultiplexed (only one stream, a Vorbis audio stream, per link).
\end{itemize}

This does not mean that Vorbis cannot currently be multiplexed with
other media types into a multi-stream Ogg file.
At the time this document was written, Ogg was becoming a popular
container for low-bitrate movies consisting of DivX video and Vorbis
audio. However, a 'Vorbis I audio file' is taken to imply Vorbis audio
existing alone within a degenerate Ogg stream. A compliant 'Vorbis
audio player' is not required to implement Ogg support beyond the
specific support of Vorbis within a degenerate Ogg stream (naturally,
application authors are encouraged to support full multiplexed Ogg
handling).

\subsubsection{MIME type}

The MIME type of Ogg files depends on the context. Specifically, complex
multimedia and applications should use \literal{application/ogg},
while visual media should use \literal{video/ogg}, and audio
\literal{audio/ogg}. Vorbis data encapsulated in Ogg may appear
in any of those types. RTP encapsulated Vorbis should use
\literal{audio/vorbis} + \literal{audio/vorbis-config}.

\subsection{Encapsulation}

Ogg encapsulation of a Vorbis packet stream is straightforward.

\begin{itemize}

\item The first Vorbis packet (the identification header), which
uniquely identifies a stream as Vorbis audio, is placed alone in the
first page of the logical Ogg stream. This results in a first Ogg
page of exactly 58 bytes at the very beginning of the logical stream.

\item This first page is marked 'beginning of stream' in the page flags.

\item The second and third Vorbis packets (comment and setup
headers) may span one or more pages beginning on the second page of
the logical stream. However many pages they span, the third header
packet finishes the page on which it ends. The next (first audio) packet
must begin on a fresh page.

\item The granule position of these first pages containing only headers is zero.

\item The first audio packet of the logical stream begins a fresh Ogg page.

\item Packets are placed into Ogg pages in order until the end of stream.

\item The last page is marked 'end of stream' in the page flags.

\item Vorbis packets may span page boundaries.

\item The granule position of pages containing Vorbis audio is in
units of PCM audio samples (per channel; a stereo stream's granule
position does not increment at twice the speed of a mono stream).

\item The granule position of a page represents the end PCM sample
position of the last packet \emph{completed} on that
page. The 'last PCM sample' is the last complete sample returned by
decode, not an internal sample awaiting lapping with a
subsequent block. A page that is entirely spanned by a single
packet (that completes on a subsequent page) has no granule
position, and the granule position is set to '-1'.

Note that the last decoded (fully lapped) PCM sample from a packet
is not necessarily the middle sample from that block. If, e.g., the
current Vorbis packet encodes a "long block" and the next Vorbis
packet encodes a "short block", the last decodable sample from the
current packet will be at position (3*long\_block\_length/4) -
(short\_block\_length/4).

\item The granule (PCM) position of the first page need not indicate
that the stream started at position zero. Although the granule
position belongs to the last completed packet on the page and a
valid granule position must be positive, by inference it may
indicate that the PCM position of the beginning of audio is
positive or negative.

\begin{itemize}
 \item A positive starting value simply indicates that this stream
 begins at some positive time offset, potentially within a larger
 program. This is a common case when connecting to the middle
 of a broadcast stream.

 \item A negative value indicates that output samples preceding time
 zero should be discarded during decoding; this technique is used to
 allow sample-granularity editing of the stream start time of
 already-encoded Vorbis streams. The number of samples to be
 discarded must not exceed the overlap-add span of the first two
 audio packets.
\end{itemize}

In both of these cases in which the initial audio PCM starting
offset is nonzero, the second finished audio packet must flush the
page on which it appears and the third packet must begin a fresh page.
This allows the decoder to always be able to perform PCM position
adjustments before needing to return any PCM data from synthesis,
resulting in correct positioning information without any additional
seeking logic.

\begin{note}
Failure to do so should, at worst, cause a decoder implementation to
return incorrect positioning information for seeking operations at the
very beginning of the stream.
\end{note}

\item A granule position on the final page in a stream that
indicates less audio data than the final packet would normally
return is used to end the stream on other than even frame
boundaries. The difference between the actual available data
returned and the declared amount indicates how many trailing
samples to discard from the decoding process.

\end{itemize}
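
The granule position arithmetic in the list above can be illustrated
with a short Python sketch. It is not taken from any reference
decoder. All function names are hypothetical, the block sizes used in
the usage note (256 and 2048) are only example values, and the rule
that an audio packet returns one quarter of the previous block size
plus one quarter of the current block size (with the very first audio
packet returning nothing) is an assumption carried over from the
decode description elsewhere in the Vorbis I specification rather than
from this section.

```python
# Illustrative sketch only: all names are hypothetical and the
# per-packet sample count is an assumption taken from the Vorbis I
# decode description, not from this section.


def samples_returned(prev_blocksize, cur_blocksize):
    """PCM samples (per channel) completed by decoding one audio packet.

    prev_blocksize is None for the first audio packet, which only
    primes the overlap-add and returns no samples.
    """
    if prev_blocksize is None:
        return 0
    return prev_blocksize // 4 + cur_blocksize // 4


def last_decodable_sample(long_blocksize, short_blocksize):
    """Position of the last fully lapped sample in a long block that is
    followed by a short block, using the formula quoted in the text."""
    return (3 * long_blocksize) // 4 - short_blocksize // 4


def stream_start_position(first_page_granule, first_blocksize,
                          second_blocksize):
    """Infer the PCM position at which audio begins.

    Assumes the first audio page completes exactly the first two audio
    packets, as required when the starting offset is nonzero. A negative
    result is the number of leading samples to discard (negated); a
    positive result is the stream's starting offset.
    """
    decoded = (samples_returned(None, first_blocksize)
               + samples_returned(first_blocksize, second_blocksize))
    return first_page_granule - decoded


def trailing_samples_to_discard(position_before_page, decoded_on_page,
                                final_granule):
    """Samples to drop at the end of the stream when the final page
    declares less audio than its packets actually decode to."""
    available = position_before_page + decoded_on_page
    return max(0, available - final_granule)
```

With example block sizes of 256 and 2048, a first audio page whose
granule position is 476 while its two packets decode to 576 samples
gives \literal{stream\_start\_position(476, 256, 2048) == -100}, so
under these assumptions a decoder would discard the first 100 output
samples. A real decoder would also have to handle pages whose granule
position is '-1' and chained links, which this sketch ignores.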