Commit 0dc18350 authored by Ralph Giles's avatar Ralph Giles

Docbook to latex conversion of the spec by Max Horn.

We now build the specification document from latex source instead of the 
older docbook method since many developers have had trouble getting the 
docbook tools to work. It's also faster.

For now, the built documentation is still kept in svn and rebuilt only 
if --enable-docs is passed to configure. We may relax this to rebuilding
only if the tools are available once we're more confident it will work
with common TeX installations.

svn path=/trunk/vorbis/; revision=15748
% -*- mode: latex; TeX-master: "Vorbis_I_spec"; -*-
%!TEX root = Vorbis_I_spec.tex
% $Id$
\section{Bitpacking Convention} \label{vorbis:spec:bitpacking}
The Vorbis codec uses relatively unstructured raw packets containing
arbitrary-width binary integer fields. Logically, these packets are a
bitstream in which bits are coded one-by-one by the encoder and then
read one-by-one in the same monotonically increasing order by the
decoder. Most current binary storage arrangements group bits into a
native word size of eight bits (octets), sixteen bits, thirty-two bits
or, less commonly, other fixed word sizes. The Vorbis bitpacking
convention specifies the correct mapping of the logical packet
bitstream into an actual representation in fixed-width words.
\subsubsection{octets, bytes and words}
In most contemporary architectures, a 'byte' is synonymous with an
'octet', that is, eight bits. This has not always been the case;
seven, ten, eleven and sixteen bit 'bytes' have been used. For
purposes of the bitpacking convention, a byte implies the native,
smallest integer storage representation offered by a platform. On
modern platforms, this is generally assumed to be eight bits (not
necessarily because of the processor but because of the
filesystem/memory architecture. Modern filesystems invariably offer
bytes as the fundamental atom of storage). A 'word' is an integer
size that is a grouped multiple of this smallest size.
The most ubiquitous architectures today consider a 'byte' to be an
octet (eight bits) and a word to be a group of two, four or eight
bytes (16, 32 or 64 bits). Note however that the Vorbis bitpacking
convention is still well defined for any native byte size; Vorbis uses
the native bit-width of a given storage system. This document assumes
that a byte is one octet for purposes of example.
\subsubsection{bit order}
A byte has a well-defined 'least significant' bit (LSb), which is the
only bit set when the byte is storing the two's complement integer
value +1. A byte's 'most significant' bit (MSb) is at the opposite
end of the byte. Bits in a byte are numbered from zero at the LSb to
$n$ ($n=7$ in an octet) for the MSb.
\subsubsection{byte order}
Words are native groupings of multiple bytes. Several byte orderings
are possible in a word; the common ones are 3-2-1-0 ('big endian' or
'most significant byte first' in which the highest-valued byte comes
first), 0-1-2-3 ('little endian' or 'least significant byte first' in
which the lowest value byte comes first) and less commonly 3-1-2-0 and
0-2-1-3 ('mixed endian').
The Vorbis bitpacking convention specifies storage and bitstream
manipulation at the byte, not word, level, thus host word ordering is
of a concern only during optimization when writing high performance
code that operates on a word of storage at a time rather than by byte.
Logically, bytes are always coded and decoded in order from byte zero
through byte $n$.
\subsubsection{coding bits into byte sequences}
The Vorbis codec has need to code arbitrary bit-width integers, from
zero to 32 bits wide, into packets. These integer fields are not
aligned to the boundaries of the byte representation; the next field
is written at the bit position at which the previous field ends.
The encoder logically packs integers by writing the LSb of a binary
integer to the logical bitstream first, followed by next least
significant bit, etc, until the requested number of bits have been
coded. When packing the bits into bytes, the encoder begins by
placing the LSb of the integer to be written into the least
significant unused bit position of the destination byte, followed by
the next-least significant bit of the source integer and so on up to
the requested number of bits. When all bits of the destination byte
have been filled, encoding continues by zeroing all bits of the next
byte and writing the next bit into the bit position 0 of that byte.
Decoding follows the same process as encoding, but by reading bits
from the byte stream and reassembling them into integers.
The signedness of a specific number resulting from decode is to be
interpreted by the decoder given decode context. That is, the three
bit binary pattern 'b111' can be taken to represent either 'seven' as
an unsigned integer, or '-1' as a signed, two's complement integer.
The encoder and decoder are responsible for knowing if fields are to
be treated as signed or unsigned.
\subsubsection{coding example}
Code the 4 bit integer value '12' [b1100] into an empty bytestream.
Bytestream result:
\begin{Verbatim}[commandchars=\\\{\}]
         7 6 5 4 3 2 1 0
byte 0  [0 0 0 0 1 1 0 0]  <-
byte 1  [               ]
byte 2  [               ]
byte 3  [               ]
byte n  [               ]  bytestream length == 1 byte
\end{Verbatim}
Continue by coding the 3 bit integer value '-1' [b111]:
\begin{Verbatim}[commandchars=\\\{\}]
         7 6 5 4 3 2 1 0
byte 0  [0 1 1 1 1 1 0 0]  <-
byte 1  [               ]
byte 2  [               ]
byte 3  [               ]
byte n  [               ]  bytestream length == 1 byte
\end{Verbatim}
Continue by coding the 7 bit integer value '17' [b0010001]:
\begin{Verbatim}[commandchars=\\\{\}]
         7 6 5 4 3 2 1 0
byte 0  [1 1 1 1 1 1 0 0]
byte 1  [0 0 0 0 1 0 0 0]  <-
byte 2  [               ]
byte 3  [               ]
byte n  [               ]  bytestream length == 2 bytes
                           bit cursor == 6
\end{Verbatim}
Continue by coding the 13 bit integer value '6969' [b1101100111001]:
\begin{Verbatim}[commandchars=\\\{\}]
         7 6 5 4 3 2 1 0
byte 0  [1 1 1 1 1 1 0 0]
byte 1  [0 1 0 0 1 0 0 0]
byte 2  [1 1 0 0 1 1 1 0]
byte 3  [0 0 0 0 0 1 1 0]  <-
byte n  [               ]  bytestream length == 4 bytes
\end{Verbatim}
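The worked example above can be reproduced with a short illustrative
sketch (Python; the \texttt{pack} helper is invented for this example and is
not part of the specification or the reference implementation):

```python
def pack(fields):
    """Pack (value, width) fields LSb-first into a list of bytes."""
    out = [0]      # bytestream so far; the last byte is the one being filled
    cursor = 0     # next free bit position in the last byte
    for value, width in fields:
        for bit in range(width):
            if cursor == 8:        # destination byte full: zero the next byte
                out.append(0)
                cursor = 0
            out[-1] |= ((value >> bit) & 1) << cursor
            cursor += 1
    return out

# The example stream: 12 (4 bits), -1 (3 bits), 17 (7 bits), 6969 (13 bits).
# Negative values are masked to their field width (two's complement) first.
fields = [(12, 4), (-1 & 0b111, 3), (17, 7), (6969, 13)]
print([hex(b) for b in pack(fields)])  # → ['0xfc', '0x48', '0xce', '0x06']
```

Reading each printed byte MSb-to-LSb reproduces the bit diagrams above.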
\subsubsection{decoding example}
Reading from the beginning of the bytestream encoded in the above example:
\begin{Verbatim}[commandchars=\\\{\}]
         7 6 5 4 3 2 1 0
byte 0  [1 1 1 1 1 1 0 0]  <-
byte 1  [0 1 0 0 1 0 0 0]
byte 2  [1 1 0 0 1 1 1 0]
byte 3  [0 0 0 0 0 1 1 0]  bytestream length == 4 bytes
\end{Verbatim}
We read two, two-bit integer fields, resulting in the returned numbers
'b00' and 'b11'. Two things are worth noting here:
\begin{itemize}
\item Although these four bits were originally written as a single
four-bit integer, reading some other combination of bit-widths from the
bitstream is well defined. There are no artificial alignment
boundaries maintained in the bitstream.
\item The second value is the
two-bit-wide integer 'b11'. This value may be interpreted either as
the unsigned value '3', or the signed value '-1'. Signedness is
dependent on decode context.
\end{itemize}
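The read side of the convention can be sketched the same way (illustrative
Python, not the reference implementation; \texttt{make\_reader} is an
invented name):

```python
def make_reader(data):
    """Return read(width), which unpacks LSb-first fields from bytes `data`."""
    pos = 0  # absolute bit cursor into the bytestream
    def read(width):
        nonlocal pos
        value = 0
        for i in range(width):
            byte, bit = divmod(pos, 8)
            value |= ((data[byte] >> bit) & 1) << i
            pos += 1
        return value
    return read

# Re-read the bytestream encoded in the coding example above.
read = make_reader([0xFC, 0x48, 0xCE, 0x06])
print(read(2), read(2))   # two two-bit fields: prints "0 3"
```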
\subsubsection{end-of-packet alignment}
The typical use of bitpacking is to produce many independent
byte-aligned packets which are embedded into a larger byte-aligned
container structure, such as an Ogg transport bitstream. Externally,
each bytestream (encoded bitstream) must begin and end on a byte
boundary. Often, the encoded bitstream is not an integer number of
bytes, and so there is unused (uncoded) space in the last byte of a
packet.

Unused space in the last byte of a bytestream is always zeroed during
the coding process. Thus, should this unused space be read, it will
return binary zeroes.
Attempting to read past the end of an encoded packet results in an
'end-of-packet' condition. End-of-packet is not to be considered an
error; it is merely a state indicating that there is insufficient
remaining data to fulfill the desired read size. Vorbis uses truncated
packets as a normal mode of operation, and as such, decoders must
handle reading past the end of a packet as a typical mode of
operation. Any further read operations after an 'end-of-packet'
condition shall also return 'end-of-packet'.
\subsubsection{reading zero bits}
Reading a zero-bit-wide integer returns the value '0' and does not
increment the stream cursor. Reading to the end of the packet (but
not past, such that an 'end-of-packet' condition has not triggered)
and then reading a zero bit integer shall succeed, returning 0, and
not trigger an end-of-packet condition. However, reading a zero-bit-wide
integer after a previous read has set the 'end-of-packet' condition shall
also fail with 'end-of-packet'.
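These end-of-packet rules can be summarized in a small illustrative sketch
(Python; the class and exception names are invented, and a simple bit cursor
over a byte list stands in for a real packet reader):

```python
class EndOfPacket(Exception):
    """Raised on reads past the end of a packet; not a stream error."""

class PacketReader:
    """Bit reader with Vorbis end-of-packet semantics (sketch only)."""
    def __init__(self, data):
        self.data = data
        self.pos = 0          # absolute bit cursor
        self.eop = False
    def read(self, width):
        # End-of-packet is sticky: once set, every later read also fails,
        # including zero-bit-wide reads.
        if self.eop or self.pos + width > 8 * len(self.data):
            self.eop = True
            raise EndOfPacket
        value = 0
        for i in range(width):
            byte, bit = divmod(self.pos, 8)
            value |= ((self.data[byte] >> bit) & 1) << i
            self.pos += 1
        return value
```

A zero-bit read exactly at the end of the packet succeeds (returns 0),
because it requires no remaining data; only reads that need bits beyond the
packet's last byte trigger the condition.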
\section{Probability Model and Codebooks} \label{vorbis:spec:codebook}
Unlike practically every other mainstream audio codec, Vorbis has no
statically configured probability model, instead packing all entropy
decoding configuration, VQ and Huffman, into the bitstream itself in
the third header, the codec setup header. This packed configuration
consists of multiple 'codebooks', each containing a specific
Huffman-equivalent representation for decoding compressed codewords as
well as an optional lookup table of output vector values to which a
decoded Huffman value is applied as an offset, generating the final
decoded output corresponding to a given compressed codeword.
\subsubsection{Bitwise operation}
The codebook mechanism is built on top of the vorbis bitpacker. Both
the codebooks themselves and the codewords they decode are unrolled
from a packet as a series of arbitrary-width values read from the
stream according to \xref{vorbis:spec:bitpacking}.
\subsection{Packed codebook format}
For purposes of the examples below, we assume that the storage
system's native byte width is eight bits. This is not universally
true; see \xref{vorbis:spec:bitpacking} for discussion
relating to non-eight-bit bytes.
\subsubsection{codebook decode}
A codebook begins with a 24 bit sync pattern, 0x564342:
\begin{Verbatim}[commandchars=\\\{\}]
byte 0: [ 0 1 0 0 0 0 1 0 ]  (0x42)
byte 1: [ 0 1 0 0 0 0 1 1 ]  (0x43)
byte 2: [ 0 1 0 1 0 1 1 0 ]  (0x56)
\end{Verbatim}
16 bit \varname{[codebook_dimensions]} and 24 bit \varname{[codebook_entries]} fields:
\begin{Verbatim}[commandchars=\\\{\}]
byte 3: [ X X X X X X X X ]
byte 4: [ X X X X X X X X ]  [codebook_dimensions] (16 bit unsigned)

byte 5: [ X X X X X X X X ]
byte 6: [ X X X X X X X X ]
byte 7: [ X X X X X X X X ]  [codebook_entries] (24 bit unsigned)
\end{Verbatim}
Next is the \varname{[ordered]} bit flag:
\begin{Verbatim}[commandchars=\\\{\}]
byte 8: [               X ]  [ordered] (1 bit)
\end{Verbatim}
Each entry, numbering a
total of \varname{[codebook_entries]}, is assigned a codeword length.
We now read the list of codeword lengths and store these lengths in
the array \varname{[codebook_codeword_lengths]}. Decode of lengths is
according to whether the \varname{[ordered]} flag is set or unset.
If the \varname{[ordered]} flag is unset, the codeword list is not
length ordered and the decoder needs to read each codeword length
one-by-one.
The decoder first reads one additional bit flag, the
\varname{[sparse]} flag. This flag determines whether or not the
codebook contains unused entries that are not to be included in the
codeword decode tree:
\begin{Verbatim}[commandchars=\\\{\}]
byte 8: [             X 1 ]  [sparse] flag (1 bit)
\end{Verbatim}
The decoder now performs for each of the \varname{[codebook_entries]}
codebook entries:
\begin{Verbatim}[commandchars=\\\{\}]
  1) if([sparse] is set) \{

         2) [flag] = read one bit;
         3) if([flag] is set) \{

                 4) [length] = read a five bit unsigned integer;
                 5) codeword length for this entry is [length]+1;

            \} else \{

                 6) this entry is unused.  mark it as such.

            \}

     \} else the sparse flag is not set \{

         7) [length] = read a five bit unsigned integer;
         8) the codeword length for this entry is [length]+1;

     \}
\end{Verbatim}
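The unordered (sparse-capable) length decode above can be sketched as
follows (illustrative Python, not the reference implementation; \texttt{read(n)}
is assumed to return the next $n$ bits as an unsigned integer):

```python
UNUSED = None  # marker for entries that receive no codeword

def read_unordered_lengths(read, entries):
    """Decode codeword lengths when [ordered] is unset (sketch).

    `read(n)` is assumed to return the next n bits as an unsigned integer.
    """
    sparse = read(1)
    lengths = []
    for _ in range(entries):
        if sparse and not read(1):
            lengths.append(UNUSED)       # entry excluded from the tree
        else:
            lengths.append(read(5) + 1)  # steps 4/5 and 7/8 above
    return lengths
```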
If the \varname{[ordered]} flag is set, the codeword list for this
codebook is encoded in ascending length order. Rather than reading
a length for every codeword, the encoder reads the number of
codewords per length. That is, beginning at entry zero:
\begin{Verbatim}[commandchars=\\\{\}]
  1) [current_entry] = 0;
  2) [current_length] = read a five bit unsigned integer and add 1;
  3) [number] = read \link{vorbis:spec:ilog}{ilog}([codebook_entries] - [current_entry]) bits as an unsigned integer
  4) set the entries [current_entry] through [current_entry]+[number]-1, inclusive,
     of the [codebook_codeword_lengths] array to [current_length]
  5) set [current_entry] to [number] + [current_entry]
  6) increment [current_length] by 1
  7) if [current_entry] is greater than [codebook_entries] ERROR CONDITION;
     the decoder will not be able to read this stream.
  8) if [current_entry] is less than [codebook_entries], repeat process starting at 3)
  9) done.
\end{Verbatim}
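The ordered-length loop above can be sketched directly (illustrative
Python, not the reference implementation; \texttt{read(n)} is assumed to
return the next $n$ bits as an unsigned integer):

```python
def ilog(x):
    """Vorbis ilog: position of the highest set bit of x; ilog(0) = 0."""
    return x.bit_length() if x > 0 else 0

def read_ordered_lengths(read, entries):
    """Decode codeword lengths when [ordered] is set (sketch)."""
    lengths = [0] * entries
    current_entry = 0
    current_length = read(5) + 1
    while current_entry < entries:
        number = read(ilog(entries - current_entry))
        if current_entry + number > entries:
            raise ValueError("length list overruns the entry count")
        for i in range(current_entry, current_entry + number):
            lengths[i] = current_length
        current_entry += number
        current_length += 1
    return lengths
```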
After all codeword lengths have been decoded, the decoder reads the
vector lookup table. Vorbis I supports three lookup types:
\begin{enumerate}
\item No lookup
\item Implicitly populated value mapping (lattice VQ)
\item Explicitly populated value mapping (tessellated or 'foam' VQ)
\end{enumerate}
The lookup table type is read as a four bit unsigned integer:
\begin{Verbatim}[commandchars=\\\{\}]
  1) [codebook_lookup_type] = read four bits as an unsigned integer
\end{Verbatim}
Codebook decode proceeds according to \varname{[codebook_lookup_type]}:
Lookup type zero indicates no lookup to be read. Proceed past
lookup decode.
Lookup types one and two are similar, differing only in the
number of lookup values to be read. Lookup type one reads a list of
values that are permuted in a set pattern to build a list of vectors,
each vector of order \varname{[codebook_dimensions]} scalars. Lookup
type two builds the same vector list, but reads each scalar for each
vector explicitly, rather than building vectors from a smaller list of
possible scalar values. Lookup decode proceeds as follows:
\begin{Verbatim}[commandchars=\\\{\}]
  1) [codebook_minimum_value] = \link{vorbis:spec:float32:unpack}{float32_unpack}( read 32 bits as an unsigned integer)
  2) [codebook_delta_value] = \link{vorbis:spec:float32:unpack}{float32_unpack}( read 32 bits as an unsigned integer)
  3) [codebook_value_bits] = read 4 bits as an unsigned integer and add 1
  4) [codebook_sequence_p] = read 1 bit as a boolean flag

  if ( [codebook_lookup_type] is 1 ) \{

     5) [codebook_lookup_values] = \link{vorbis:spec:lookup1:values}{lookup1_values}(\varname{[codebook_entries]}, \varname{[codebook_dimensions]} )

  \} else \{

     6) [codebook_lookup_values] = \varname{[codebook_entries]} * \varname{[codebook_dimensions]}

  \}

  7) read a total of [codebook_lookup_values] unsigned integers of [codebook_value_bits] each;
     store these in order in the array [codebook_multiplicands]
\end{Verbatim}
A \varname{[codebook_lookup_type]} of greater than two is reserved
and indicates a stream that is not decodable by the specification in this
document.
An 'end of packet' during any read operation in the above steps is
considered an error condition rendering the stream undecodable.
\paragraph{Huffman decision tree representation}
The \varname{[codebook_codeword_lengths]} array and
\varname{[codebook_entries]} value uniquely define the Huffman decision
tree used for entropy decoding.
Briefly, each used codebook entry (recall that length-unordered
codebooks support unused codeword entries) is assigned, in order, the
lowest valued unused binary Huffman codeword possible. Assume the
following codeword length list:
\begin{Verbatim}[commandchars=\\\{\}]
entry 0: length 2
entry 1: length 4
entry 2: length 4
entry 3: length 4
entry 4: length 4
entry 5: length 2
entry 6: length 3
entry 7: length 3
\end{Verbatim}
Assigning codewords in order (lowest possible value of the appropriate
length to highest) results in the following codeword list:
\begin{Verbatim}[commandchars=\\\{\}]
entry 0: length 2 codeword 00
entry 1: length 4 codeword 0100
entry 2: length 4 codeword 0101
entry 3: length 4 codeword 0110
entry 4: length 4 codeword 0111
entry 5: length 2 codeword 10
entry 6: length 3 codeword 110
entry 7: length 3 codeword 111
\end{Verbatim}
Unlike most binary numerical values in this document, we
intend the above codewords to be read and used bit by bit from left to
right, thus the codeword '001' is the bit string 'zero, zero, one'.
When determining 'lowest possible value' in the assignment definition
above, the leftmost bit is the MSb.
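The assignment rule can be checked with a brute-force sketch (illustrative
Python, adequate only for small examples; the function name is invented):

```python
def assign_codewords(lengths):
    """Assign each used entry, in order, the lowest unused codeword of its
    length (brute-force sketch of the rule described above)."""
    used = []
    words = []
    for length in lengths:
        if length is None:            # unused entry: no codeword assigned
            words.append(None)
            continue
        for candidate in range(1 << length):
            word = format(candidate, "0{}b".format(length))
            # A candidate is free if no assigned codeword is a prefix of it
            # and it is not a prefix of any assigned codeword.
            if all(not word.startswith(w) and not w.startswith(word)
                   for w in used):
                used.append(word)
                words.append(word)
                break
        else:
            raise ValueError("overspecified length list")
    return words

print(assign_codewords([2, 4, 4, 4, 4, 2, 3, 3]))
# → ['00', '0100', '0101', '0110', '0111', '10', '110', '111']
```

This reproduces the codeword list of the example codebook above.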
It is clear that the codeword length list represents a Huffman
decision tree with the entry numbers equivalent to the leaves numbered
left-to-right:
\captionof{figure}{huffman tree illustration}
As we assign codewords in order, we see that each choice constructs a
new leaf in the leftmost possible position.
Note that it's possible to underspecify or overspecify a Huffman tree
via the length list. In the above example, if codeword seven were
eliminated, it's clear that the tree is unfinished:
\captionof{figure}{underspecified huffman tree illustration}
Similarly, in the original codebook, it's clear that the tree is fully
populated and a ninth codeword is impossible. Both underspecified and
overspecified trees are an error condition rendering the stream
undecodable.
Codebook entries marked 'unused' are simply skipped in the assigning
process. They have no codeword and do not appear in the decision
tree, thus it's impossible for any bit pattern read from the stream to
decode to that entry number.
\paragraph{VQ lookup table vector representation}
Unpacking the VQ lookup table vectors relies on the following values:
\begin{itemize}
\item the \varname{[codebook_multiplicands]} array
\item \varname{[codebook_minimum_value]}
\item \varname{[codebook_delta_value]}
\item \varname{[codebook_sequence_p]}
\item \varname{[codebook_lookup_type]}
\item \varname{[codebook_entries]}
\item \varname{[codebook_dimensions]}
\item \varname{[codebook_lookup_values]}
\end{itemize}
Decoding (unpacking) a specific vector in the vector lookup table
proceeds according to \varname{[codebook_lookup_type]}. The unpacked
vector values are what a codebook would return during audio packet
decode in a VQ context.
\paragraph{Vector value decode: Lookup type 1}
Lookup type one specifies a lattice VQ lookup table built
algorithmically from a list of scalar values. Calculate (unpack) the
final values of a codebook entry vector from the entries in
\varname{[codebook_multiplicands]} as follows (\varname{[value_vector]}
is the output vector representing the vector of values for entry number
\varname{[lookup_offset]} in this codebook):
\begin{Verbatim}[commandchars=\\\{\}]
  1) [last] = 0;
  2) [index_divisor] = 1;
  3) iterate [i] over the range 0 ... [codebook_dimensions]-1 (once for each
     scalar value in the value vector) \{

       4) [multiplicand_offset] = ( [lookup_offset] divided by [index_divisor]
          using integer division ) integer modulo [codebook_lookup_values]

       5) vector [value_vector] element [i] =
            ( [codebook_multiplicands] array element number [multiplicand_offset] ) *
            [codebook_delta_value] + [codebook_minimum_value] + [last];

       6) if ( [codebook_sequence_p] is set ) then set [last] = vector [value_vector] element [i]

       7) [index_divisor] = [index_divisor] * [codebook_lookup_values]

     \}

  8) vector calculation completed.
\end{Verbatim}
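The steps above map directly to a short sketch (illustrative Python, not the
reference implementation; the function name and parameter order are invented):

```python
def unpack_type1(multiplicands, dims, lookup_values,
                 minimum, delta, sequence_p, lookup_offset):
    """VQ lookup type 1 vector unpack: permute a small scalar list into a
    vector of `dims` values (sketch of the steps above)."""
    last = 0
    index_divisor = 1
    vector = []
    for _ in range(dims):
        offset = (lookup_offset // index_divisor) % lookup_values
        value = multiplicands[offset] * delta + minimum + last
        if sequence_p:
            last = value          # each scalar accumulates onto the previous
        vector.append(value)
        index_divisor *= lookup_values
    return vector
```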
\paragraph{Vector value decode: Lookup type 2}
Lookup type two specifies a VQ lookup table in which each scalar in
each vector is explicitly set by the \varname{[codebook_multiplicands]}
array in a one-to-one mapping. Calculate (unpack) the
final values of a codebook entry vector from the entries in
\varname{[codebook_multiplicands]} as follows (\varname{[value_vector]}
is the output vector representing the vector of values for entry number
\varname{[lookup_offset]} in this codebook):
\begin{Verbatim}[commandchars=\\\{\}]
  1) [last] = 0;
  2) [multiplicand_offset] = [lookup_offset] * [codebook_dimensions]
  3) iterate [i] over the range 0 ... [codebook_dimensions]-1 (once for each
     scalar value in the value vector) \{

       4) vector [value_vector] element [i] =
            ( [codebook_multiplicands] array element number [multiplicand_offset] ) *
            [codebook_delta_value] + [codebook_minimum_value] + [last];

       5) if ( [codebook_sequence_p] is set ) then set [last] = vector [value_vector] element [i]

       6) increment [multiplicand_offset]

     \}

  7) vector calculation completed.
\end{Verbatim}
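The type 2 unpack differs from type 1 only in how the multiplicand index is
derived, as this sketch shows (illustrative Python, not the reference
implementation; the function name is invented):

```python
def unpack_type2(multiplicands, dims, minimum, delta,
                 sequence_p, lookup_offset):
    """VQ lookup type 2 vector unpack: read `dims` consecutive scalars
    starting at lookup_offset * dims (sketch of the steps above)."""
    last = 0
    offset = lookup_offset * dims    # one-to-one mapping: no permutation
    vector = []
    for _ in range(dims):
        value = multiplicands[offset] * delta + minimum + last
        if sequence_p:
            last = value
        vector.append(value)
        offset += 1
    return vector
```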
\subsection{Use of the codebook abstraction}
The decoder uses the codebook abstraction much as it does the
bit-unpacking convention; a specific codebook reads a
codeword from the bitstream, decoding it into an entry number, and then
returns that entry number to the decoder (when used in a scalar
entropy coding context), or uses that entry number as an offset into
the VQ lookup table, returning a vector of values (when used in a context
desiring a VQ value). Scalar or VQ context is always explicit; any call
to the codebook mechanism requests either a scalar entry number or a
lookup vector.
Note that VQ lookup type zero indicates that there is no lookup table;
requesting decode using a codebook of lookup type 0 in any context
expecting a vector return value (even in the case of a vector of
dimension one) is forbidden. If decoder setup or decode requests such
an action, that is an error condition rendering the packet
undecodable.
Using a codebook to read from the packet bitstream consists first of
reading and decoding the next codeword in the bitstream. The decoder
reads bits until the accumulated bits match a codeword in the
codebook. This process can be thought of as logically walking the
Huffman decode tree by reading one bit at a time from the bitstream,
and using the bit as a decision boolean to take the 0 branch (left in
the above examples) or the 1 branch (right in the above examples).
Walking the tree finishes when the decode process hits a leaf in the
decision tree; the result is the entry number corresponding to that
leaf. Reading past the end of a packet propagates the 'end-of-packet'
condition to the decoder.
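The bit-by-bit tree walk can be sketched as follows (illustrative Python;
\texttt{read\_bit} and \texttt{decode\_entry} are invented names, and the
codeword table is the example codebook from earlier in this section):

```python
def decode_entry(read_bit, codewords):
    """Accumulate bits, MSb of the codeword first, until the accumulated
    string matches a codeword; return that entry number (sketch).

    `read_bit()` returns the next bit (0 or 1) from the packet;
    `codewords` maps codeword bit strings to entry numbers."""
    accum = ""
    while accum not in codewords:   # leaf not yet reached: keep descending
        accum += str(read_bit())
    return codewords[accum]

# The example codebook's codeword assignment, as listed above.
codewords = {"00": 0, "0100": 1, "0101": 2, "0110": 3,
             "0111": 4, "10": 5, "110": 6, "111": 7}
bits = iter([1, 1, 0])
print(decode_entry(lambda: next(bits), codewords))   # '110' → entry 6
```

A real decoder must additionally raise the 'end-of-packet' condition if the
bit source is exhausted mid-codeword; that handling is omitted here for
brevity.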