Commit
2b5dc862
authored
14 years ago
by
Jean-Marc Valin
Adding range coding information
parent
41ec4b28
doc/draft-ietf-codec-opus.xml
+323
-1
323 additions, 1 deletion
...
...
@@ -2,7 +2,7 @@
<!DOCTYPE rfc SYSTEM 'rfc2629.dtd'>
<?rfc toc="yes" symrefs="yes" ?>
-<rfc ipr="trust200902" category="std" docName="draft-ietf-codec-opus-01">
+<rfc ipr="trust200902" category="std" docName="draft-ietf-codec-opus-02">
<front>
<title abbrev="Interactive Audio Codec">Definition of the Opus Audio Codec</title>
...
...
@@ -275,6 +275,309 @@ Three frames of different <spanx style="emph">durations</spanx>:
</section>
</section>
<section title="Codec Encoder">
<t>
Opus encoder block diagram.
</t>
<section anchor="range-encoder" title="Range Coder">
<t>
Opus uses an entropy coder based upon <xref target="range-coding"></xref>,
which is itself a rediscovery of the FIFO arithmetic code introduced by
<xref target="coding-thesis"></xref>.
It is very similar to arithmetic encoding, except that encoding is done with
digits in any base instead of with bits,
so it is faster when using larger bases (i.e., an octet). All of the
calculations in the range coder must use bit-exact integer arithmetic.
</t>
<t>
The range coder also acts as the bit-packer for Opus. It is
used in three different ways, to encode:
<list style="symbols">
<t>
entropy-coded symbols with a fixed probability model, using ec_encode() (rangeenc.c),
</t>
<t>
integers from 0 to 2^M-1, using ec_enc_uint() or ec_enc_bits() (entenc.c), and
</t>
<t>
integers from 0 to N-1 (where N is not a power of two), using ec_enc_uint() (entenc.c).
</t>
</list>
</t>
<t>
The range encoder maintains an internal state vector composed of the
four-tuple (low,rng,rem,ext), representing the low end of the current
range, the size of the current range, a single buffered output octet,
and a count of additional carry-propagating output octets. Both rng
and low are 32-bit unsigned integer values, rem is an octet value or
the special value -1, and ext is an integer with at least 16 bits.
This state vector is initialized at the start of each frame to
the value (0,2^31,-1,0).
</t>
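<t>
As a minimal sketch of the state vector described above, the four-tuple can be held in a small structure and reset per frame. The names enc_state and enc_state_init are hypothetical and are not taken from rangeenc.c; the field widths follow the text, with ext widened to a plain unsigned for simplicity.
</t>

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the (low,rng,rem,ext) four-tuple; the type and
   function names are illustrative and do not come from rangeenc.c. */
typedef struct {
    uint32_t low; /* low end of the current range */
    uint32_t rng; /* size of the current range */
    int      rem; /* buffered output octet, or -1 if none is buffered yet */
    unsigned ext; /* count of pending carry-propagating (0xFF) octets */
} enc_state;

/* Reset the state to (0, 2^31, -1, 0) at the start of a frame. */
static void enc_state_init(enc_state *s)
{
    assert(s != 0);
    s->low = 0;
    s->rng = UINT32_C(1) << 31;
    s->rem = -1;
    s->ext = 0;
}
```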
<t>
Each symbol is drawn from a finite alphabet and coded in a separate
context which describes the size of the alphabet and the relative
frequency of each symbol in that alphabet. Opus only uses static
contexts; they are not adapted to the statistics of the data that is
coded.
</t>
<section anchor="encoding-symbols" title="Encoding Symbols">
<t>
The main encoding function is ec_encode() (rangeenc.c),
which takes as an argument a three-tuple (fl,fh,ft)
describing the range of the symbol to be encoded in the current
context, with 0 &lt;= fl &lt; fh &lt;= ft &lt;= 65535. The values of this tuple
are derived from the probability model for the symbol. Let f(i) be
the frequency of the ith symbol in the current context. Then the
three-tuple corresponding to the kth symbol is given by
<![CDATA[
fl=sum(f(i),i<k), fh=fl+f(k), and ft=sum(f(i)).
]]>
</t>
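<t>
The derivation of the three-tuple from a frequency table can be sketched as follows. The helper symbol_tuple and the freq_tuple type are hypothetical names introduced only for illustration; the sketch assumes the frequencies are stored in a plain array.
</t>

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative helper (not part of the reference code): derive (fl,fh,ft)
   for symbol k from a table f[0..n-1] of per-symbol frequencies. */
typedef struct { unsigned fl, fh, ft; } freq_tuple;

static freq_tuple symbol_tuple(const unsigned *f, size_t n, size_t k)
{
    freq_tuple t = {0, 0, 0};
    size_t i;
    assert(f != NULL && k < n);
    for (i = 0; i < n; i++) {
        if (i < k) t.fl += f[i];   /* fl = sum of f(i) for i < k */
        t.ft += f[i];              /* ft = sum of all f(i) */
    }
    t.fh = t.fl + f[k];            /* fh = fl + f(k) */
    assert(t.ft <= 65535);         /* upper limit on ft given in the text */
    return t;
}
```

<t>
For example, with frequencies (3,1,4) the symbol with index 1 is described by the tuple (3,4,8).
</t>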
<t>
ec_encode() updates the state of the encoder as follows. If fl is
greater than zero, then low = low + rng - (rng/ft)*(ft-fl) and
rng = (rng/ft)*(fh-fl). Otherwise, low is unchanged and
rng = rng - (rng/ft)*(ft-fh). Every right-hand side uses the value of rng
from before the update, and the divisions are integer divisions with any
remainder discarded. After this update, the range is normalized.
</t>
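<t>
A minimal sketch of the range update alone, with normalization deliberately left out, might look like this. The function name range_update is hypothetical. For the case where fl is zero, the sketch subtracts (rng/ft)*(ft-fh), which is the same quantity the decoder removes from rng in the Decoding Symbols section, so that encoder and decoder ranges stay in step.
</t>

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative range update only; normalization is left out on purpose.
   The name range_update is hypothetical. */
static void range_update(uint32_t *low, uint32_t *rng,
                         unsigned fl, unsigned fh, unsigned ft)
{
    uint32_t r;
    assert(fl < fh && fh <= ft && ft <= 65535);
    r = *rng / ft;                     /* truncating integer division */
    if (fl > 0) {
        *low += *rng - r * (ft - fl);  /* uses rng before it is changed */
        *rng  = r * (fh - fl);
    } else {
        *rng -= r * (ft - fh);         /* fl == 0: low is unchanged */
    }
}
```

<t>
Starting from low = 0 and rng = 2^31 with ft = 8, coding the tuple (3,4,8) gives low = 0x30000000 and rng = 0x10000000.
</t>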
<t>
To normalize the range, the following process is repeated until
rng > 2^23. First, the top 9 bits of low, (low>>23), are placed into
a carry buffer. Then, low is set to
<![CDATA[((low << 8) & 0x7FFFFFFF)]]> and rng
is set to <![CDATA[(rng << 8)]]>. This process is carried out by
ec_enc_normalize() (rangeenc.c).
</t>
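<t>
The normalization loop can be sketched as below. As a simplifying assumption, the 9-bit values are collected in a caller-supplied array that stands in for the carry buffer; enc_normalize_sketch is a hypothetical name, not the reference function.
</t>

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative normalization loop. Each 9-bit value bound for the carry
   buffer is stored in out[] (a stand-in for that buffer); the return value
   is the number of values produced. */
static int enc_normalize_sketch(uint32_t *low, uint32_t *rng,
                                unsigned *out, int max_out)
{
    int n = 0;
    assert(*rng > 0);                       /* otherwise the loop never ends */
    while (*rng <= (UINT32_C(1) << 23)) {
        assert(n < max_out);
        out[n++] = (unsigned)(*low >> 23);  /* top 9 bits of low */
        *low = (*low << 8) & UINT32_C(0x7FFFFFFF);
        *rng <<= 8;
    }
    return n;
}
```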
<t>
The 9 bits produced in each iteration of the normalization loop
consist of 8 data bits and a carry flag. The final value of the
output bits is not determined until carry propagation is accounted
for. Therefore the reference implementation buffers a single
(non-propagating) output octet and keeps a count of additional
propagating (0xFF) output octets. An implementation MAY choose to use
any mathematically equivalent scheme to perform carry propagation.
</t>
<t>
The function ec_enc_carry_out() (rangeenc.c) performs
this buffering. It takes a 9-bit input value, c, from the normalization,
consisting of an 8-bit output and a carry bit. If c is 0xFF, then ext is incremented
and no octets are output. Otherwise, if rem is not the special value
-1, then the octet (rem+(c>>8)) is output. Then ext octets are output
with the value 0 if the carry bit is set, or 0xFF if it is not, and
rem is set to the lower 8 bits of c. After this, ext is set to zero.
</t>
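<t>
The buffering rules above can be sketched as follows. The carry_buf structure, put_octet, and carry_out_sketch are hypothetical names; the sketch assumes the output goes to a fixed-size array supplied by the caller.
</t>

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative carry buffer; the names are hypothetical, not from rangeenc.c. */
typedef struct {
    int            rem;  /* buffered octet, or -1 if none */
    unsigned       ext;  /* pending octets that are 0xFF unless a carry arrives */
    unsigned char *buf;  /* output octets */
    size_t         len;  /* octets written so far */
    size_t         cap;  /* capacity of buf */
} carry_buf;

static void put_octet(carry_buf *cb, unsigned v)
{
    assert(cb->len < cb->cap);
    cb->buf[cb->len++] = (unsigned char)(v & 0xFF);
}

/* c holds 8 data bits plus a carry flag in bit 8. */
static void carry_out_sketch(carry_buf *cb, unsigned c)
{
    unsigned carry;
    assert(c <= 0x1FF);
    if (c == 0xFF) {            /* value could still change if a carry follows */
        cb->ext++;
        return;
    }
    carry = c >> 8;
    if (cb->rem >= 0) put_octet(cb, (unsigned)cb->rem + carry);
    for (; cb->ext > 0; cb->ext--)
        put_octet(cb, carry ? 0x00 : 0xFF);
    cb->rem = (int)(c & 0xFF);
}
```

<t>
Feeding the values 0x12, 0xFF, and 0x134 in that order outputs the octets 0x13 and 0x00 and leaves 0x34 buffered in rem.
</t>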
<t>
In the reference implementation, a special version of ec_encode()
called ec_encode_bin() (rangeenc.c) is defined to
take a two-tuple (fl,ftb), where
<![CDATA[0 <= fl < 2^ftb and ftb < 16. It is
mathematically equivalent to calling ec_encode() with the three-tuple
(fl,fl+1,1<<ftb)]]>
, but avoids using division.
</t>
</section>
<section anchor="encoding-ints" title="Encoding Uniformly Distributed Integers">
<t>
The functions ec_enc_uint() and ec_enc_bits() are based on ec_encode() and
encode one of N equiprobable symbols, each with a frequency of 1,
where N may be as large as 2^32-1. Because ec_encode() is limited to
a total frequency of 2^16-1, this is done by encoding a series of
symbols in smaller contexts.
</t>
<t>
ec_enc_bits() (entenc.c) is defined, like
ec_encode_bin(), to take a two-tuple (fl,ftb), with
<![CDATA[0 <= fl < 2^ftb
and ftb < 32. While ftb is greater than 8, it encodes bits (ftb-8) to
(ftb-1) of fl, i.e., ((fl >> (ftb-8)) & 0xFF), using ec_encode_bin() and
subtracts 8 from ftb. Then, it encodes the remaining bits of fl, i.e.,
(fl & ((1 << ftb)-1))]]>, again using ec_encode_bin().
</t>
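<t>
The way ec_enc_bits() slices a value can be sketched without the entropy coder itself. In this sketch, split_bits and bit_chunk are hypothetical names, and each chunk that would be passed to ec_encode_bin() is simply recorded in an array.
</t>

```c
#include <assert.h>
#include <stdint.h>

/* One slice of fl together with the number of bits it carries. */
typedef struct { unsigned val; int bits; } bit_chunk;

/* Illustrative only: split fl into slices of at most 8 bits, most
   significant first, in the order ec_enc_bits() is described as coding
   them. Returns the number of slices written to out[]. */
static int split_bits(uint32_t fl, int ftb, bit_chunk *out, int max_out)
{
    int n = 0;
    assert(ftb >= 0 && ftb < 32 && (fl >> ftb) == 0);
    while (ftb > 8) {
        assert(n < max_out);
        out[n].val = (unsigned)((fl >> (ftb - 8)) & 0xFF);
        out[n].bits = 8;
        n++;
        ftb -= 8;
    }
    assert(n < max_out);
    out[n].val = (unsigned)(fl & ((UINT32_C(1) << ftb) - 1));
    out[n].bits = ftb;
    return n + 1;
}
```

<t>
For fl = 0xABCDE and ftb = 20, the slices are 0xAB (8 bits), 0xCD (8 bits), and 0xE (4 bits).
</t>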
<t>
ec_enc_uint() (entenc.c) takes a two-tuple (fl,ft),
where ft is not necessarily a power of two. Let ftb be the location
of the highest 1 bit in the two's-complement representation of
(ft-1), or -1 if no bits are set. If ftb>8, then the top 8 bits of fl
are encoded using ec_encode() with the three-tuple
(fl>>(ftb-8),(fl>>(ftb-8))+1,((ft-1)>>(ftb-8))+1), and the remaining bits
are encoded with ec_enc_bits() using the two-tuple
<![CDATA[(fl&((1<<(ftb-8))-1),ftb-8). Otherwise, fl is encoded with ec_encode()
directly using the three-tuple (fl,fl+1,ft)]]>.
</t>
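<t>
The split performed by ec_enc_uint() can be sketched as below. This sketch makes one assumption that the text does not state outright: ftb is taken to be the number of bits needed to store (ft-1), since that is the reading under which a shift by (ftb-8) leaves exactly the top 8 bits. The names bit_length, uint_split, and split_uint are hypothetical.
</t>

```c
#include <assert.h>
#include <stdint.h>

/* Number of bits needed to store v (0 for v == 0). ASSUMPTION: this is the
   meaning of ftb used in this sketch. */
static int bit_length(uint32_t v)
{
    int n = 0;
    while (v) { n++; v >>= 1; }
    return n;
}

typedef struct {
    unsigned fl, fh, ft; /* three-tuple that would go to ec_encode() */
    uint32_t lo;         /* low bits that would go to ec_enc_bits() */
    int      lo_bits;    /* number of low bits, 0 if there are none */
} uint_split;

/* Illustrative only: compute the pieces without doing any entropy coding. */
static uint_split split_uint(uint32_t fl, uint32_t ft)
{
    uint_split s;
    int ftb;
    assert(ft > 0 && fl < ft);
    ftb = bit_length(ft - 1);
    if (ftb > 8) {
        int shift = ftb - 8;
        s.fl = (unsigned)(fl >> shift);
        s.fh = s.fl + 1;
        s.ft = (unsigned)(((ft - 1) >> shift) + 1);
        s.lo = fl & ((UINT32_C(1) << shift) - 1);
        s.lo_bits = shift;
    } else {
        s.fl = (unsigned)fl;
        s.fh = s.fl + 1;
        s.ft = (unsigned)ft;
        s.lo = 0;
        s.lo_bits = 0;
    }
    return s;
}
```

<t>
Under that assumption, fl = 777 with ft = 1000 yields the three-tuple (194,195,250) followed by 2 low bits with the value 1.
</t>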
</section>
<section anchor="encoder-finalizing" title="Finalizing the Stream">
<t>
After all symbols are encoded, the stream must be finalized by
outputting a value inside the current range. Let end be the integer
in the interval [low,low+rng) with the largest number of trailing
zero bits. Then, while end is not zero, the top 9 bits of end, i.e.,
<![CDATA[(end >> 23)]]>, are sent to the carry buffer, and end is replaced by
<![CDATA[((end << 8) & 0x7FFFFFFF)]]>. Next, if the value in the carry buffer, rem, is
neither zero nor the special value -1, or the carry count, ext, is
greater than zero, then 9 zero bits are sent to the carry buffer.
After the carry buffer is finished outputting octets, the rest of the
output buffer is padded with zero octets. Finally, rem is set to the
special value -1. This process is implemented by ec_enc_done()
(rangeenc.c).
</t>
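<t>
One straightforward way to find the value end is to try successively smaller powers of two and round low up to each. The sketch below does exactly that; it is not necessarily how ec_enc_done() computes it, pick_end is a hypothetical name, and the bounds on low and rng are assumptions based on the normalized state.
</t>

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative search for the value in [low, low+rng) with the most
   trailing zero bits. ASSUMPTION: low is below 2^31 and rng is at most
   2^31, as holds for a normalized encoder state. */
static uint32_t pick_end(uint32_t low, uint32_t rng)
{
    uint64_t lo = low;
    uint64_t hi = (uint64_t)low + rng;    /* exclusive upper bound */
    int k;
    assert(rng > 0);
    assert(low <= UINT32_C(0x7FFFFFFF) && rng <= UINT32_C(0x80000000));
    for (k = 31; k >= 0; k--) {
        uint64_t step = UINT64_C(1) << k;
        /* round low up to the next multiple of 2^k */
        uint64_t cand = (lo + step - 1) & ~(step - 1);
        if (cand < hi) return (uint32_t)cand;
    }
    return low; /* not reached: k == 0 always succeeds when rng > 0 */
}
```

<t>
For the interval [5,9) this returns 8, and for low = 0x12345678 with rng = 0x100 it returns 0x12345700.
</t>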
</section>
<section anchor="encoder-tell" title="Current Bit Usage">
<t>
The bit allocation routines in Opus need to be able to determine a
conservative upper bound on the number of bits that have been used
to encode the current frame thus far. This drives allocation
decisions and ensures that the range coder will not overflow the
output buffer. The bound is computed in the reference implementation to
fractional bit precision by the function ec_enc_tell()
(rangeenc.c).
Like all operations in the range encoder, it must
be implemented in a bit-exact manner.
</section>
</section>
<section title="SILK Encoder">
<t>
Copy from SILK draft.
</t>
</section>
<section title="CELT Encoder">
<t>
Copy from CELT draft.
</t>
</section>
</section>
<section title="Codec Decoder">
<t>
Opus decoder block diagram.
</t>
<section anchor="range-decoder" title="Range Decoder">
<t>
The range decoder extracts the symbols and integers encoded using the range encoder in
<xref target="range-encoder"></xref>. The range decoder maintains an internal
state vector composed of the two-tuple (dif,rng), representing the
difference between the high end of the current range and the actual
coded value, and the size of the current range, respectively. Both
dif and rng are 32-bit unsigned integer values. rng is initialized to
2^7. dif is initialized to rng minus the top 7 bits of the first
input octet. Then the range is immediately normalized, using the
procedure described in the following section.
</t>
<section anchor="decoding-symbols" title="Decoding Symbols">
<t>
Decoding symbols is a two-step process. The first step determines
a value fs that lies within the range of some symbol in the current
context. The second step updates the range decoder state with the
three-tuple (fl,fh,ft) corresponding to that symbol, as defined in
<xref target="encoding-symbols"></xref>.
</t>
<t>
The first step is implemented by ec_decode() (rangedec.c),
and computes fs = ft-min((dif-1)/(rng/ft)+1,ft), where ft is
the sum of the frequency counts in the current context, as described
in <xref target="encoding-symbols"></xref>. The divisions here are integer
divisions, with any remainder discarded.
</t>
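<t>
The computation of fs can be sketched directly from the formula above. The function name decode_fs is hypothetical, and the preconditions in the assertion are assumptions added so that the divisions are well defined.
</t>

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative computation of fs = ft - min((dif-1)/(rng/ft) + 1, ft). */
static unsigned decode_fs(uint32_t dif, uint32_t rng, unsigned ft)
{
    uint32_t r, s;
    /* ASSUMPTION: these preconditions keep rng/ft non-zero and dif-1 valid. */
    assert(ft > 0 && ft <= 65535 && rng >= ft && dif > 0);
    r = rng / ft;              /* truncating integer division */
    s = (dif - 1) / r + 1;
    return ft - (unsigned)(s < ft ? s : ft);
}
```

<t>
With rng = 2^31 and ft = 8, a value of dif = 0x50000000 gives fs = 3, which falls in the range of a symbol with the tuple (3,4,8).
</t>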
<t>
In the reference implementation, a special version of ec_decode()
called ec_decode_bin() (rangedec.c) is defined using
the parameter ftb instead of ft. It is mathematically equivalent to
calling ec_decode() with ft = (1&lt;&lt;ftb), but avoids one of the
divisions.
</t>
<t>
The decoder then identifies the symbol in the current context
corresponding to fs; i.e., the one whose three-tuple (fl,fh,ft)
satisfies fl &lt;= fs &lt; fh. This tuple is used to update the decoder
state according to dif = dif - (rng/ft)*(ft-fh), and if fl is greater
than zero, rng = (rng/ft)*(fh-fl), or otherwise rng = rng - (rng/ft)*(ft-fh).
Both assignments use the value of rng from before the update.
After this update, the range is normalized.
</t>
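<t>
The symbol search and the state update can be sketched as follows, again leaving normalization out. The names find_symbol and dec_update are hypothetical, and the linear search over a frequency array is an assumption made for brevity.
</t>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative linear search for the symbol k with fl <= fs < fh, where
   f[0..n-1] holds the per-symbol frequencies. Returns k and stores fl, fh. */
static size_t find_symbol(const unsigned *f, size_t n, unsigned fs,
                          unsigned *fl, unsigned *fh)
{
    unsigned acc = 0;
    size_t k;
    for (k = 0; k < n; k++) {
        if (fs < acc + f[k]) {
            *fl = acc;
            *fh = acc + f[k];
            return k;
        }
        acc += f[k];
    }
    assert(k < n);  /* fires here: fs was not below the total frequency */
    return n;
}

/* Illustrative decoder state update; normalization is not included. */
static void dec_update(uint32_t *dif, uint32_t *rng,
                       unsigned fl, unsigned fh, unsigned ft)
{
    uint32_t r = *rng / ft;          /* uses rng before it is changed */
    uint32_t s = r * (ft - fh);
    *dif -= s;
    *rng = fl > 0 ? r * (fh - fl) : *rng - s;
}
```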
<t>
To normalize the range, the following process is repeated until
rng > 2^23. First, rng is set to (rng&lt;&lt;8)&amp;0xFFFFFFFF. Then the next
8 bits of input are read into sym, using the remaining bit from the
previous input octet as the high bit of sym, and the top 7 bits of the
next octet for the remaining bits of sym. If no more input octets
remain, zero bits are used instead. Then, dif is set to
((dif&lt;&lt;8)-sym)&amp;0xFFFFFFFF (i.e., using wrap-around if the subtraction
overflows a 32-bit register). Finally, if dif is larger than 2^31,
dif is then set to dif - 2^31. This process is carried out by
ec_dec_normalize() (rangedec.c).
</t>
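<t>
Decoder initialization and normalization can be sketched together as below. The dec_state structure and the function names are hypothetical; the sketch assumes the frame is available as an in-memory array and keeps the most recently read octet in rem so that its low bit can be reused.
</t>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative decoder state; the names are hypothetical. */
typedef struct {
    uint32_t dif, rng;
    unsigned rem;              /* most recently read input octet */
    const unsigned char *buf;  /* input frame */
    size_t len, pos;
} dec_state;

/* Read the next octet, or zero once the input is exhausted. */
static unsigned next_octet(dec_state *d)
{
    return d->pos < d->len ? d->buf[d->pos++] : 0;
}

static void dec_normalize_sketch(dec_state *d)
{
    assert(d->rng > 0);
    while (d->rng <= (UINT32_C(1) << 23)) {
        unsigned sym;
        d->rng <<= 8;
        sym = (d->rem << 7) & 0xFF;    /* leftover low bit becomes the high bit */
        d->rem = next_octet(d);
        sym |= d->rem >> 1;            /* top 7 bits of the next octet */
        d->dif = (d->dif << 8) - sym;  /* uint32_t wraps modulo 2^32 */
        if (d->dif > (UINT32_C(1) << 31)) d->dif -= UINT32_C(1) << 31;
    }
}

static void dec_init_sketch(dec_state *d, const unsigned char *buf, size_t len)
{
    d->buf = buf;
    d->len = len;
    d->pos = 0;
    d->rem = next_octet(d);
    d->rng = UINT32_C(1) << 7;
    d->dif = d->rng - (d->rem >> 1);   /* rng minus top 7 bits of first octet */
    dec_normalize_sketch(d);
}
```

<t>
After initialization, rng equals 2^31, matching the initial range size of the encoder.
</t>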
</section>
<section anchor="decoding-ints" title="Decoding Uniformly Distributed Integers">
<t>
The functions ec_dec_uint() and ec_dec_bits() are based on ec_decode() and
decode one of N equiprobable symbols, each with a frequency of 1,
where N may be as large as 2^32-1. Because ec_decode() is limited to
a total frequency of 2^16-1, this is done by decoding a series of
symbols in smaller contexts.
</t>
<t>
ec_dec_bits() (entdec.c) is defined, like
ec_decode_bin(), to take a single parameter ftb, with ftb &lt; 32,
and produces an ftb-bit decoded integer value, t,
initialized to zero. While ftb is greater than 8, it decodes the next
8 most significant bits of the integer, s = ec_decode_bin(8), updates
the decoder state with the three-tuple (s,s+1,256), adds those bits to
the current value of t, t = (t&lt;&lt;8) | s, and subtracts 8 from ftb. Then
it decodes the remaining bits of the integer, s = ec_decode_bin(ftb),
updates the decoder state with the three-tuple (s,s+1,1&lt;&lt;ftb), and adds
those bits to the final value of t, t = (t&lt;&lt;ftb) | s.
</t>
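<t>
The reassembly performed by ec_dec_bits() can be sketched without a real range decoder. In this sketch, next_chunk is a hypothetical stand-in for the pair of steps "call ec_decode_bin() and update the decoder state"; it simply pops pre-decoded values from an array. The names chunk_source and dec_bits_sketch are likewise illustrative.
</t>

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical source of already-decoded slices, used in place of a real
   range decoder. */
typedef struct { const unsigned *vals; int n, pos; } chunk_source;

/* Stand-in for ec_decode_bin(bits) followed by the decoder state update. */
static unsigned next_chunk(chunk_source *src, int bits)
{
    unsigned s;
    assert(src->pos < src->n);
    s = src->vals[src->pos++];
    assert(bits >= 0 && bits <= 8 && (s >> bits) == 0);
    return s;
}

/* Illustrative reassembly of an ftb-bit integer, most significant bits first. */
static uint32_t dec_bits_sketch(chunk_source *src, int ftb)
{
    uint32_t t = 0;
    assert(ftb >= 0 && ftb < 32);
    while (ftb > 8) {
        t = (t << 8) | next_chunk(src, 8);
        ftb -= 8;
    }
    return (t << ftb) | next_chunk(src, ftb);
}
```

<t>
Given the slices 0xAB, 0xCD, and 0xE with ftb = 20, the sketch returns 0xABCDE.
</t>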
<t>
ec_dec_uint() (entdec.c) takes a single parameter,
ft, which is not necessarily a power of two, and returns an integer,
t, with a value between 0 and ft-1, inclusive, which is initialized to zero. Let
ftb be the location of the highest 1 bit in the two's-complement
representation of (ft-1), or -1 if no bits are set. If ftb>8, then
the top 8 bits of t are decoded using t = ec_decode(((ft-1)>>(ftb-8))+1),
the decoder state is updated with the three-tuple
(t,t+1,((ft-1)>>(ftb-8))+1), and the remaining bits are decoded with
t = (t&lt;&lt;(ftb-8)) | ec_dec_bits(ftb-8). If, at this point, t >= ft, then
the current frame is corrupt, and decoding should stop. If the
original value of ftb was not greater than 8, then t is decoded with
t = ec_decode(ft), and the decoder state is updated with the
three-tuple (t,t+1,ft).
</t>
</section>
<section anchor="decoder-tell" title="Current Bit Usage">
<t>
The bit allocation routines in CELT need to be able to determine a
conservative upper bound on the number of bits that have been
decoded from the current frame thus far. This drives allocation
decisions which must match those made in the encoder. The bound is
computed in the reference implementation to fractional bit precision
by the function ec_dec_tell() (rangedec.c). Like all
operations in the range decoder, it must be implemented in a
bit-exact manner, and must produce exactly the same value returned by
ec_enc_tell() after encoding the same symbols.
</t>
</section>
</section>
<section title="SILK Decoder">
<t>
Copy from SILK draft.
</t>
</section>
<section title="CELT Decoder">
<t>
Copy from CELT draft.
</t>
</section>
</section>
<section anchor="security" title="Security Considerations">
...
...
@@ -383,6 +686,25 @@ Christopher Montgomery, Karsten Vandborg Soerensen, and Timothy Terriberry.
<format type='TXT' octets='110393' target='ftp://ftp.isi.edu/in-notes/rfc3552.txt' />
</reference>
<reference anchor="range-coding">
<front>
<title>Range encoding: An algorithm for removing redundancy from a digitised message</title>
<author initials="G.N.N." surname="Martin" fullname=""><organization/></author>
<date year="1979" />
</front>
<seriesInfo name="Proc. Institution of Electronic and Radio Engineers International Conference on Video and Data Recording" value="" />
</reference>
<reference anchor="coding-thesis">
<front>
<title>Source coding algorithms for fast data compression</title>
<author initials="R." surname="Pasco" fullname=""><organization/></author>
<date month="May" year="1976" />
</front>
<seriesInfo name="Ph.D. thesis" value="Dept. of Electrical Engineering, Stanford University" />
</reference>
</references>
...
...