Commit 7e111cbe authored by Ralph Giles

Remove the original html documentation. The docbook version is now on
par or better. Don't forget to make the docs before you roll a
distribution.

svn path=/trunk/vorbis/; revision=4074
parent 7bebb37d
......@@ -42,14 +42,6 @@ doc_DATA = components.png \
vorbis-clip.txt \
vorbis-errors.txt \
vorbis-fidelity.html \
vorbis-ogg.html \
vorbis-spec-bitpack.html \
vorbis-spec-codebook.html \
vorbis-spec-floor0.html \
vorbis-spec-floor1.html \
vorbis-spec-intro.html \
vorbis-spec-ref.html \
vorbis-spec-res.html \
vorbis.html \
vorbisword2.png \
wait.png \
......
......@@ -16,17 +16,13 @@ Ogg Vorbis Documentation
<li><a href="stereo.html">Vorbis channel coupling and stereo-specific application</a>
</ul>
<h2>Ogg Vorbis I specification documents</h2>
<h2>Ogg Vorbis I specification</h2>
<ul>
<li><a href="vorbis-spec-intro.html">Vorbis specification introduction and description</a>
<li><a href="vorbis-spec-bitpack.html">Vorbis bitpacking convention</a>
<li><a href="vorbis-spec-codebook.html">Vorbis probability model and codebooks</a>
<li><a href="vorbis-spec-ref.html">Vorbis codec setup and packet decode</a>
<li>Vorbis I specification
[<a href="Vorbis_I_spec.html">html</a>]
[<a href="Vorbis_I_spec.pdf">pdf</a>]
<li><a href="v-comment.html">Vorbis comment header specification</a>
<li><a href="vorbis-spec-floor0.html">Vorbis floor type 0 setup and decode</a>
<li><a href="vorbis-spec-floor1.html">Vorbis floor type 1 setup and decode</a>
<li><a href="vorbis-spec-res.html">Vorbis residue types 0, 1 and 2 setup and decode</a>
<li><a href="vorbis-ogg.html">Embedding Vorbis encoded audio in an Ogg bitstream</a>
<li><a href="draft-moffitt-vorbis-rtp-00.txt">Embedding Vorbis encoded audio in an RTP payload format</a>
</ul>
......
<HTML><HEAD><TITLE>xiph.org: Ogg Vorbis documentation</TITLE>
<BODY bgcolor="#ffffff" text="#202020" link="#006666" vlink="#000000">
<nobr><img src="white-ogg.png"><img src="vorbisword2.png"></nobr><p>
<h1><font color=#000070>
Ogg Vorbis I format specification: embedding Vorbis into an Ogg stream
</font></h1>
<em>Last update to this document: July 14, 2002</em><br>
<h1>Overview</h1>
This document describes using Ogg logical and physical transport
streams to encapsulate Vorbis compressed audio packet data into file
form.<p>
<a href="vorbis-spec-intro.html">Ogg Vorbis I format specification:
high-level description</a> provides an overview of the construction
of Vorbis audio packets.<p> The <a href="oggstream.html">Ogg
bitstream overview</a> and <a href="framing.html">Ogg logical
bitstream and framing spec</a> provide detailed descriptions of Ogg
transport streams. This specification document assumes a working
knowledge of the concepts covered in these named background
documents. Please read them first.<p>
<h2>Restrictions</h2>
The Ogg/Vorbis I specification currently dictates that Ogg/Vorbis
streams use Ogg transport streams in degenerate, unmultiplexed
form only. That is:
<ul>
<li>A meta-headerless Ogg file encapsulates the Vorbis I packets
<li>The Ogg stream may be chained, i.e. contain multiple, contiguous logical streams (links).
<li>The Ogg stream must be unmultiplexed (only one stream, a Vorbis audio stream, per link)
</ul>
This is not to say that it is not currently possible to multiplex
Vorbis with other media types into a multi-stream Ogg file. At the
time this document was written, Ogg was becoming a popular container
for low-bitrate movies consisting of DiVX video and Vorbis audio.
However, a 'Vorbis I audio file' is taken to imply Vorbis audio
existing alone within a degenerate Ogg stream. A compliant 'Vorbis
audio player' is not required to implement Ogg support beyond the
specific support of Vorbis within a degenerate Ogg stream (naturally,
application authors are encouraged to support full multiplexed Ogg
handling).
<p>
<h2>MIME type</h2>
The correct MIME type of any Ogg file is <tt>application/x-ogg</tt>.
However, if a file is a Vorbis I audio file (which implies a
degenerate Ogg stream including only unmultiplexed Vorbis audio), the
MIME type <tt>audio/x-vorbis</tt> is also allowed.
<h1>Encapsulation</h1>
Ogg encapsulation of a Vorbis packet stream is straightforward.<p>
<ul>
<li>The first Vorbis packet [the identification header], which
uniquely identifies a stream as Vorbis audio, is placed alone in the
first page of the logical Ogg stream. This results in a first Ogg
page of exactly 58 bytes at the very beginning of the logical stream.<p>
<li>This first page is marked 'beginning of stream' in the page flags.<p>
<li>The second and third Vorbis packets [comment and setup
headers] may span one or more pages beginning on the second page of
the logical stream. However many pages they span, the third header
packet finishes the page on which it ends. The next [first audio] packet
must begin on a fresh page.<p>
<li>The granule position of these first pages containing only headers is
zero.<p>
<li>The first audio packet of the logical stream begins a fresh Ogg page.<p>
<li>Packets are placed into Ogg pages in order until the end of stream.<p>
<li>The last page is marked 'end of stream' in the page flags.<p>
<li>Vorbis packets may span page boundaries. <p>
<li>The granule position of pages containing Vorbis audio is in units
of PCM audio samples (per channel; a stereo stream's granule position
does not increment at twice the speed of a mono stream).<p>
<li>The granule position of a page represents the end PCM sample
position of the last packet <em>completed</em> on that page. A page
that is entirely spanned by a single packet (that completes on a
subsequent page) has no granule position, and the granule position is
set to '-1'.<p>
<li>The granule (PCM) position of the first page need not indicate
that the stream started at position zero. Although the granule
position belongs to the last completed packet on the page and a
valid granule position must be positive, by
inference it may indicate that the PCM position of the beginning
of audio is positive or negative.<p>
<ul>
<li>A positive starting value simply indicates that this stream begins at
some positive time offset, potentially within a larger
program. This is a common case when connecting to the middle
of a broadcast stream.<p>
<li>A negative value indicates that
output samples preceding time zero should be discarded during
decoding; this technique is used to allow sample-granularity
editing of the stream start time of already-encoded Vorbis
streams. The number of samples to be discarded must not exceed
the overlap-add span of the first two audio packets.<p>
</ul>
In both of these cases in which the initial audio PCM starting
offset is nonzero, the second finished audio packet must flush the
page on which it appears and the third packet begin a fresh page.
This allows the decoder to always be able to perform PCM position
adjustments before needing to return any PCM data from synthesis,
resulting in correct positioning information without any additional
seeking logic.<p>
(Note however that failure to do so should, at worst, cause a
decoder implementation to return incorrect positioning information
for seeking operations at the very beginning of the stream.)<p>
<li> A granule position on the final page in a stream that indicates
less audio data than the final packet would normally return is used to
end the stream on other than even frame boundaries. The difference
between the actual available data returned and the declared amount
indicates how many trailing samples to discard from the decoding
process.<p>
</ul>
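<p>
As a rough illustration of the first-page rules above, the following Python sketch unpacks the fixed part of an Ogg page header and checks that a page has the shape expected of a Vorbis first page. It assumes the page layout described in the Ogg framing spec referenced earlier (a 27-byte fixed header followed by the segment table), under which the 58 bytes break down as 27 header bytes, one lacing value and a 30-byte identification packet. The function and constant names are hypothetical, and the page checksum is not verified.

```python
import struct

# Fixed part of an Ogg page header (27 bytes, little-endian, no padding):
# capture pattern, version, flags, granule position (signed 64-bit),
# serial number, page sequence number, checksum, number of segments.
PAGE_HEADER = struct.Struct("<4sBBqIIIB")

FLAG_CONTINUED = 0x01  # first packet on the page continues from the previous page
FLAG_BOS = 0x02        # 'beginning of stream'
FLAG_EOS = 0x04        # 'end of stream'


def parse_page_header(data, offset=0):
    """Return a dict describing the Ogg page that starts at `offset`."""
    capture, version, flags, granule, serial, seq, crc, nsegs = \
        PAGE_HEADER.unpack_from(data, offset)
    if capture != b"OggS":
        raise ValueError("not an Ogg page")
    table_start = offset + PAGE_HEADER.size
    lacing = data[table_start:table_start + nsegs]
    if len(lacing) != nsegs:
        raise ValueError("truncated segment table")
    return {
        "bos": bool(flags & FLAG_BOS),
        "eos": bool(flags & FLAG_EOS),
        "granule": granule,  # -1 means no packet completes on this page
        "page_size": PAGE_HEADER.size + nsegs + sum(lacing),
    }


def looks_like_vorbis_first_page(data):
    """Check the flags, granule position and size of a candidate first page."""
    page = parse_page_header(data)
    return page["bos"] and page["granule"] == 0 and page["page_size"] == 58
```

Reading the granule position as a signed 64-bit value lets the '-1' marker for a page with no completed packet come out directly as -1.<p>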
<hr>
<a href="http://www.xiph.org/">
<img src="white-xifish.png" align=left border=0>
</a>
<font size=-2 color=#505050>
Ogg is a <a href="http://www.xiph.org">Xiph.org Foundation</a> effort
to protect essential tenets of Internet multimedia from corporate
hostage-taking; Open Source is the net's greatest tool to keep
everyone honest. See <a href="http://www.xiph.org/about.html">About
the Xiph.org Foundation</a> for details.
<p>
Ogg Vorbis is the first Ogg audio CODEC. Anyone may freely use and
distribute the Ogg and Vorbis specification, whether in a private,
public or corporate capacity. However, the Xiph.org Foundation and
the Ogg project (xiph.org) reserve the right to set the Ogg Vorbis
specification and certify specification compliance.<p>
Xiph.org's Vorbis software CODEC implementation is distributed under a
BSD-like license. This does not restrict third parties from
distributing independent implementations of Vorbis software under
other licenses.<p>
Ogg, Vorbis, Xiph.org Foundation and their logos are trademarks (tm)
of the <a href="http://www.xiph.org/">Xiph.org Foundation</a>. These
pages are copyright (C) 1994-2002 Xiph.org Foundation. All rights
reserved.<p>
</body>
<HTML><HEAD><TITLE>xiph.org: Ogg Vorbis documentation</TITLE>
<BODY bgcolor="#ffffff" text="#202020" link="#006666" vlink="#000000">
<nobr><img src="white-ogg.png"><img src="vorbisword2.png"></nobr><p>
<h1><font color=#000070>
Ogg Vorbis I format specification: bitpacking convention
</font></h1>
<em>Last update to this document: July 14, 2002</em><br>
<h1>Overview</h1>
The Vorbis codec uses relatively unstructured raw packets containing
arbitrary-width binary integer fields. Logically, these packets are a
bitstream in which bits are coded one-by-one by the encoder and then
read one-by-one in the same monotonically increasing order by the
decoder. Most current binary storage arrangements group bits into a
native word size of eight bits (octets), sixteen bits, thirty-two bits
or, less commonly, other fixed word sizes. The Vorbis bitpacking
convention specifies the correct mapping of the logical packet
bitstream into an actual representation in fixed-width words.
<h2>octets, bytes and words</h2>
In most contemporary architectures, a 'byte' is synonymous with an
'octet', that is, eight bits. This has not always been the case;
seven, ten, eleven and sixteen bit 'bytes' have been used. For
purposes of the bitpacking convention, a byte implies the native,
smallest integer storage representation offered by a platform. On
modern platforms, this is generally assumed to be eight bits (not
necessarily because of the processor but because of the
filesystem/memory architecture. Modern filesystems invariably offer
bytes as the fundamental atom of storage). A 'word' is an integer
size that is a grouped multiple of this smallest size.<p>
The most ubiquitous architectures today consider a 'byte' to be an
octet (eight bits) and a word to be a group of two, four or eight
bytes (16, 32 or 64 bits). Note however that the Vorbis bitpacking
convention is still well defined for any native byte size; Vorbis uses
the native bit-width of a given storage system. This document assumes
that a byte is one octet for purposes of example.<p>
<h2>bit order</h2>
A byte has a well-defined 'least significant' bit [LSb], which is the
only bit set when the byte is storing the two's complement integer
value +1. A byte's 'most significant' bit [MSb] is at the opposite
end of the byte. Bits in a byte are numbered from zero at the LSb to
<i>n</i> (<i>n</i>=7 in an octet) for the MSb.<p>
<h2>byte order</h2>
Words are native groupings of multiple bytes. Several byte orderings
are possible in a word; the common ones are 3-2-1-0 ('big endian' or
'most significant byte first' in which the highest-valued byte comes
first), 0-1-2-3 ('little endian' or 'least significant byte first' in
which the lowest value byte comes first) and less commonly 3-1-2-0 and
0-2-1-3 ('mixed endian').<p>
The Vorbis bitpacking convention specifies storage and bitstream
manipulation at the byte, not word, level, thus host word ordering is
of concern only during optimization when writing high performance
code that operates on a word of storage at a time rather than by byte.
Logically, bytes are always coded and decoded in order from byte zero
through byte <em>n</em>.<p>
<h2>coding bits into byte sequences</h2>
The Vorbis codec has need to code arbitrary bit-width integers, from
zero to 32 bits wide, into packets. These integer fields are not
aligned to the boundaries of the byte representation; the next field
is written at the bit position where the previous field ends.<p>
The encoder logically packs integers by writing the LSb of a binary
integer to the logical bitstream first, followed by next least
significant bit, etc, until the requested number of bits have been
coded. When packing the bits into bytes, the encoder begins by
placing the LSb of the integer to be written into the least
significant unused bit position of the destination byte, followed by
the next-least significant bit of the source integer and so on up to
the requested number of bits. When all bits of the destination byte
have been filled, encoding continues by zeroing all bits of the next
byte and writing the next bit into the bit position 0 of that byte.
Decoding follows the same process as encoding, but by reading bits
from the byte stream and reassembling them into integers.<p>
<h2>signedness</h2>
The signedness of a specific number resulting from decode is to be
interpreted by the decoder given decode context. That is, the three
bit binary pattern 'b111' can be taken to represent either 'seven' as
an unsigned integer, or '-1' as a signed, two's complement integer.
The encoder and decoder are responsible for knowing if fields are to
be treated as signed or unsigned.
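<p>
A minimal Python sketch of this reinterpretation, assuming the field has already been read as an unsigned value of known width. The helper name <tt>as_signed</tt> is hypothetical and not part of any Vorbis API.

```python
def as_signed(value, width):
    """Reinterpret an unsigned `width`-bit field as a two's complement integer."""
    if width == 0:
        return 0
    sign_bit = 1 << (width - 1)
    # Flipping the sign bit and subtracting its weight maps the upper half
    # of the unsigned range onto the negative numbers.
    return (value ^ sign_bit) - sign_bit


print(as_signed(0b111, 3))  # the three-bit pattern b111 read as signed
print(0b111)                # the same pattern read as unsigned
```

The bits on the wire are identical in both cases; only the decoder's interpretation differs.<p>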
<h2>coding example</h2>
Code the 4 bit integer value '12' [b1100] into an empty bytestream.
Bytestream result:
<pre>
|
V
7 6 5 4 3 2 1 0
byte 0 [0 0 0 0 1 1 0 0] <-
byte 1 [ ]
byte 2 [ ]
byte 3 [ ]
...
byte n [ ] bytestream length == 1 byte
</pre>
Continue by coding the 3 bit integer value '-1' [b111]:
<pre>
|
V
7 6 5 4 3 2 1 0
byte 0 [0 1 1 1 1 1 0 0] <-
byte 1 [ ]
byte 2 [ ]
byte 3 [ ]
...
byte n [ ] bytestream length == 1 byte
</pre>
Continue by coding the 7 bit integer value '17' [b0010001]:
<pre>
|
V
7 6 5 4 3 2 1 0
byte 0 [1 1 1 1 1 1 0 0]
byte 1 [0 0 0 0 1 0 0 0] <-
byte 2 [ ]
byte 3 [ ]
...
byte n [ ] bytestream length == 2 bytes
bit cursor == 6
</pre>
Continue by coding the 13 bit integer value '6969' [b110 11001110 01]:
<pre>
|
V
7 6 5 4 3 2 1 0
byte 0 [1 1 1 1 1 1 0 0]
byte 1 [0 1 0 0 1 0 0 0]
byte 2 [1 1 0 0 1 1 1 0]
byte 3 [0 0 0 0 0 1 1 0] <-
...
byte n [ ] bytestream length == 4 bytes
</pre>
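<p>
The coding example above can be reproduced with a short Python sketch. It assumes eight-bit bytes, as the example does; <tt>pack_fields</tt> is a hypothetical name, and the bit-at-a-time loop favors clarity over speed.

```python
def pack_fields(fields):
    """Pack (value, width) pairs LSb-first into bytes.

    Unused bits in the final byte are left as zero.
    """
    out = bytearray()
    bit_cursor = 0  # next unused bit position in the last byte (0..7)
    for value, width in fields:
        for i in range(width):
            if bit_cursor == 0:
                out.append(0)            # start a fresh, zeroed byte
            bit = (value >> i) & 1       # source bits are taken LSb first
            out[-1] |= bit << bit_cursor
            bit_cursor = (bit_cursor + 1) & 7
    return bytes(out)


packet = pack_fields([(12, 4), (-1, 3), (17, 7), (6969, 13)])
print(packet.hex())  # prints fc48ce06
```

The four bytes 0xfc, 0x48, 0xce and 0x06 correspond to byte 0 through byte 3 of the final diagram.<p>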
<h2>decoding example</h2>
Reading from the beginning of the bytestream encoded in the above example:
<pre>
|
V
7 6 5 4 3 2 1 0
byte 0 [1 1 1 1 1 1 0 0] <-
byte 1 [0 1 0 0 1 0 0 0]
byte 2 [1 1 0 0 1 1 1 0]
byte 3 [0 0 0 0 0 1 1 0] bytestream length == 4 bytes
</pre>
We read two, two-bit integer fields, resulting in the returned numbers
'b00' and 'b11'. Two things are worth noting here:
<ul>
<li>Although these four bits were originally written as a single four-bit
integer, reading some other combination of bit-widths from the
bitstream is well defined. There are no artificial alignment
boundaries maintained in the bitstream.
<li>The second value is the
two-bit-wide integer 'b11'. This value may be interpreted either as
the unsigned value '3', or the signed value '-1'. Signedness is
dependent on decode context.
</ul>
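<p>
The decoding example can be sketched in the same way. The stateless helper below (<tt>read_field</tt>, a hypothetical name) assumes eight-bit bytes and does no end-of-packet handling, so it is only suitable for reads known to fit in the buffer.

```python
def read_field(data, bit_pos, width):
    """Read an unsigned `width`-bit field starting at absolute bit `bit_pos`.

    Returns (value, new_bit_pos). Bits are consumed LSb first within each byte.
    """
    value = 0
    for i in range(width):
        byte = data[(bit_pos + i) // 8]
        bit = (byte >> ((bit_pos + i) % 8)) & 1
        value |= bit << i
    return value, bit_pos + width


packet = bytes([0xFC, 0x48, 0xCE, 0x06])
first, pos = read_field(packet, 0, 2)     # b00
second, pos = read_field(packet, pos, 2)  # b11
print(first, second)  # prints 0 3
```

Reading the same packet with the original widths (4, 3, 7 and 13 bits) returns 12, 7, 17 and 6969; the value 7 is the unsigned reading of the three-bit field that was written as '-1'.<p>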
<h2>end-of-packet alignment</h2>
The typical use of bitpacking is to produce many independent
byte-aligned packets which are embedded into a larger byte-aligned
container structure, such as an Ogg transport bitstream. Externally,
each bytestream (encoded bitstream) must begin and end on a byte
boundary. Often, the encoded bitstream is not an integer number of
bytes, and so there is unused (uncoded) space in the last byte of a
packet.<p>
Unused space in the last byte of a bytestream is always zeroed during
the coding process. Thus, should this unused space be read, it will
return binary zeroes.<p>
Attempting to read past the end of an encoded packet results in an
'end-of-packet' condition. End-of-packet is not to be considered an
error; it is merely a state indicating that there is insufficient
remaining data to fulfill the desired read size. Vorbis uses truncated
packets as a normal mode of operation, and as such, decoders must
handle reading past the end of a packet as a typical mode of
operation. Any further read operations after an 'end-of-packet'
condition shall also return 'end-of-packet'.<p>
<h2>reading zero bits</h2>
Reading a zero-bit-wide integer returns the value '0' and does not
increment the stream cursor. Reading to the end of the packet (but
not past, such that an 'end-of-packet' condition has not triggered)
and then reading a zero-bit-wide integer shall succeed, returning 0, and
not trigger an end-of-packet condition. Reading a zero-bit-wide
integer after a previous read has set 'end-of-packet' shall also fail
with 'end-of-packet'.<p>
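The end-of-packet and zero-bit rules can be captured in a small reader. This is a sketch under stated assumptions: eight-bit bytes, a hypothetical class name, and <tt>None</tt> used as the 'end-of-packet' indication, which is an illustrative choice rather than anything the specification prescribes.

```python
END_OF_PACKET = None  # hypothetical sentinel for the 'end-of-packet' condition


class PacketReader:
    """LSb-first bit reader with sticky end-of-packet state."""

    def __init__(self, data):
        self.data = data
        self.bit_pos = 0
        self.eop = False

    def read(self, width):
        # Once end-of-packet has been set, every later read fails,
        # including zero-bit reads.
        if self.eop:
            return END_OF_PACKET
        # A read that would run past the last byte sets end-of-packet.
        if self.bit_pos + width > 8 * len(self.data):
            self.eop = True
            return END_OF_PACKET
        value = 0
        for i in range(width):
            p = self.bit_pos + i
            value |= ((self.data[p >> 3] >> (p & 7)) & 1) << i
        self.bit_pos += width  # a zero-bit read leaves the cursor unchanged
        return value
```

With a one-byte packet holding the four-bit value 12, reading four bits returns 12, reading the four zeroed padding bits returns 0, a zero-bit read at the very end still returns 0, and the next one-bit read sets end-of-packet for all subsequent reads.<p>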
</body>
<HTML><HEAD><TITLE>xiph.org: Ogg Vorbis documentation</TITLE>
<BODY bgcolor="#ffffff" text="#202020" link="#006666" vlink="#000000">
<nobr><img src="white-ogg.png"><img src="vorbisword2.png"></nobr><p>
<h1><font color=#000070>
Ogg Vorbis I format specification: probability model and codebooks
</font></h1>
<em>Last update to this document: August 8, 2002</em><br>
<h1>Overview</h1>
Unlike practically every other mainstream audio codec, Vorbis has no
statically configured probability model, instead packing all entropy
decoding configuration, VQ and Huffman, into the bitstream itself in
the third header, the codec setup header. This packed configuration
consists of multiple 'codebooks', each containing a specific
Huffman-equivalent representation for decoding compressed codewords as
well as an optional lookup table of output vector values to which a
decoded Huffman value is applied as an offset, generating the final
decoded output corresponding to a given compressed codeword.
<h2>bitwise operation</h2>
The codebook mechanism is built on top of the <a
href="vorbis-spec-bitpack.html">Vorbis bitpacker</a>; both the
codebooks themselves and the codewords they decode are unrolled from a
packet as a series of arbitrary-width values read from the stream
according to the <a href="vorbis-spec-bitpack.html">Vorbis bitpacking
convention</a>.
<h1>Packed Codebook Format</h1>
For purposes of the below examples, we assume that the storage
system's native byte width is eight bits. This is not universally
true; see <a href="vorbis-spec-bitpack.html">the Vorbis bitpacking
convention</a> document for discussion relating to non-eight-bit
bytes.<p>
<h2>codebook decode</h2>
A codebook begins with a 24 bit sync pattern, 0x564342:
<pre>
byte 0: [ 0 1 0 0 0 0 1 0 ] (0x42)
byte 1: [ 0 1 0 0 0 0 1 1 ] (0x43)
byte 2: [ 0 1 0 1 0 1 1 0 ] (0x56)
</pre>
16 bit <tt>[codebook_dimensions]</tt> and 24 bit <tt>[codebook_entries]</tt> fields:
<pre>
byte 3: [ X X X X X X X X ]
byte 4: [ X X X X X X X X ] [codebook_dimensions] (16 bit unsigned)
byte 5: [ X X X X X X X X ]
byte 6: [ X X X X X X X X ]
byte 7: [ X X X X X X X X ] [codebook_entries] (24 bit unsigned)
</pre>
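<p>
Because bits are packed LSb first, a multi-byte field that happens to start on a byte boundary reads the same as a little-endian integer. The Python sketch below uses that to check the sync pattern and pull out the two size fields. It assumes, as the diagrams do, that the codebook starts at byte 0 of the buffer; a codebook embedded in a real setup header is not guaranteed to be byte aligned, so a general decoder would read these fields through the bitpacker instead. The function name is hypothetical.

```python
SYNC_PATTERN = 0x564342  # bytes 0x42, 0x43, 0x56 in stream order


def parse_codebook_header(data):
    """Return (codebook_dimensions, codebook_entries) from a byte-aligned codebook."""
    if len(data) < 8:
        raise ValueError("codebook header needs at least 8 bytes")
    sync = int.from_bytes(data[0:3], "little")
    if sync != SYNC_PATTERN:
        raise ValueError("bad codebook sync pattern")
    dimensions = int.from_bytes(data[3:5], "little")  # 16 bit unsigned
    entries = int.from_bytes(data[5:8], "little")     # 24 bit unsigned
    return dimensions, entries


header = bytes([0x42, 0x43, 0x56]) + (2).to_bytes(2, "little") + (81).to_bytes(3, "little")
print(parse_codebook_header(header))  # prints (2, 81)
```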
Next is the <tt>[ordered]</tt> bit flag:
<pre>
byte 8: [ X ] [ordered] (1 bit)
</pre>
Each entry, numbering a
total of <tt>[codebook_entries]</tt>, is assigned a codeword length.
We now read the list of codeword lengths and store these lengths in
the array <tt>[codebook_codeword_lengths]</tt>. Decode of lengths is
according to whether the <tt>[ordered]</tt> flag is set or unset.
<ul>
<li>If the <tt>[ordered]</tt> flag is unset, the codeword list is not
length ordered and the decoder needs to read each codeword length
one-by-one.<p> The decoder first reads one additional bit flag, the
<tt>[sparse]</tt> flag. This flag determines whether or not the
codebook contains unused entries that are not to be included in the
codeword decode tree:<p>
<pre>
byte 8: [ X 1 ] [sparse] flag (1 bit)
</pre>
The decoder now performs for each of the <tt>[codebook_entries]</tt> code book entries:
<pre>
1) if([sparse] is set){
2) [flag] = read one bit;
3) if([flag] is set){
4) [length] = read a five bit unsigned integer;
5) codeword length for this entry is [length]+1;
} else {
6) this entry is unused. mark it as such.
}
} else the sparse flag is not set {
7) [length] = read a five bit unsigned integer;
8) the codeword length for this entry is [length]+1;
}
</pre>
<li>If the <tt>[ordered]</tt> flag is set, the codeword list for this
codebook is encoded in ascending length order. Rather than reading
a length for every codeword, the decoder reads the number of
codewords per length. That is, beginning at entry zero:
<pre>
1) [current_entry] = 0;
2) [current_length] = read a five bit unsigned integer and add 1;
3) [number] = read <a
href="helper.html#ilog">ilog</a>([codebook_entries] - [current_entry]) bits as an unsigned integer
4) set the entries [current_entry] through [current_entry]+[number]-1, inclusive,
of the [codebook_codeword_lengths] array to [current_length]
5) set [current_entry] to [number] + [current_entry]
6) increment [current_length] by 1
7) if [current_entry] is greater than [codebook_entries] ERROR CONDITION; the decoder will
not be able to read this stream.
8) if [current_entry] is less than [codebook_entries], repeat process starting at 3)
9) done.
</pre>
</ul>
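<p>
Both branches of the codeword length decode can be sketched together in Python. The sketch assumes eight-bit bytes, reads the <tt>[ordered]</tt> flag itself, and takes <tt>ilog(x)</tt> to be the number of bits needed to represent <tt>x</tt> (zero for non-positive values), which is how the helper is understood here. Class and function names are hypothetical, and unused entries are marked with <tt>None</tt> as an illustrative choice.

```python
class BitReader:
    """LSb-first bit reader over a bytes object."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # absolute bit position

    def read(self, width):
        if self.pos + width > 8 * len(self.data):
            raise EOFError("end-of-packet")
        value = 0
        for i in range(width):
            p = self.pos + i
            value |= ((self.data[p >> 3] >> (p & 7)) & 1) << i
        self.pos += width
        return value


def ilog(x):
    """Number of bits needed to represent x; 0 for x <= 0 (assumed definition)."""
    return x.bit_length() if x > 0 else 0


def read_codeword_lengths(reader, entries):
    """Decode [codebook_codeword_lengths]; None marks an unused entry."""
    lengths = [None] * entries
    ordered = reader.read(1)
    if not ordered:
        sparse = reader.read(1)
        for i in range(entries):
            if sparse and not reader.read(1):
                continue  # entry is unused
            lengths[i] = reader.read(5) + 1
    else:
        current_entry = 0
        current_length = reader.read(5) + 1
        while current_entry < entries:
            number = reader.read(ilog(entries - current_entry))
            if current_entry + number > entries:
                raise ValueError("codeword count exceeds [codebook_entries]")
            for i in range(current_entry, current_entry + number):
                lengths[i] = current_length
            current_entry += number
            current_length += 1
    return lengths
```

The overflow check is performed before the array is written rather than after, which rejects the same streams as step 7 while avoiding an out-of-range write.<p>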
After all codeword lengths have been decoded, the decoder reads the
vector lookup table. Vorbis I supports three lookup types:
<ol><li>No lookup
<li>Implicitly populated value mapping (lattice VQ)
<li>Explicitly populated value mapping (tessellated or 'foam' VQ)
</ol>
The lookup table type is read as a four bit unsigned integer:
<pre>
1) [codebook_lookup_type] = read four bits as an unsigned integer
</pre>
Codebook decode proceeds according to <tt>[codebook_lookup_type]</tt>:
<ul>
<li> Lookup type zero indicates no lookup to be read. Proceed past
lookup decode.
<li> Lookup types one and two are similar, differing only in the
number of lookup values to be read. Lookup type one reads a list of
values that are permuted in a set pattern to build a list of vectors,
each vector of order <tt>[codebook_dimensions]</tt> scalars. Lookup
type two builds the same vector list, but reads each scalar for each
vector explicitly, rather than building vectors from a smaller list of
possible scalar values. Lookup decode proceeds as follows:
<pre>
1) [codebook_minimum_value] = <a href="helper.html#float32_unpack">float32_unpack</a>( read 32 bits as an unsigned integer)
2) [codebook_delta_value] = <a href="helper.html#float32_unpack">float32_unpack</a>( read 32 bits as an unsigned integer)
3) [codebook_value_bits] = read 4 bits as an unsigned integer and add 1
4) [codebook_sequence_p] = read 1 bit as a boolean flag
if ( [codebook_lookup_type] is 1 ) {
5) [codebook_lookup_values] = <a href="helper.html#lookup1_values">lookup1_values</a>( <tt>[codebook_entries]</tt>, <tt>[codebook_dimensions]</tt> )
} else {
6) [codebook_lookup_values] = <tt>[codebook_entries]</tt> * <tt>[codebook_dimensions]</tt>