#LyX 1.6.0rc2 created this file. For more info see http://www.lyx.org/
\lyxformat 340
\begin_document
\begin_header
\textclass scrbook
\language english
\inputencoding auto
\font_roman times
\font_sans helvet
\font_typewriter courier
\font_default_family default
\font_sc false
\font_osf false
\font_sf_scale 100
\font_tt_scale 100

\graphics default
\paperfontsize 10
\spacing single
\use_hyperref false
\papersize letterpaper
\use_geometry true
\use_amsmath 2
\use_esint 0
\cite_engine basic
\use_bibtopic false
\paperorientation portrait
\leftmargin 2cm
\topmargin 2cm
\rightmargin 2cm
\bottommargin 2cm
\secnumdepth 3
\tocdepth 3
\paragraph_separation indent
\defskip medskip
\quotes_language english
\papercolumns 1
\papersides 1
\paperpagestyle headings
\listings_params "basicstyle={\ttfamily},breaklines=true,language=C,xleftmargin=0mm"
\tracking_changes false
\output_changes false
\author "" 
\author "" 
\end_header

\begin_body

\begin_layout Title
The Speex Manual
\begin_inset Newline newline
\end_inset

Version 1.2
\end_layout

\begin_layout Author
Jean-Marc Valin
\end_layout

\begin_layout Standard
\begin_inset Newpage newpage
\end_inset


\end_layout

\begin_layout Standard
Copyright 
\begin_inset ERT
status collapsed

\begin_layout Plain Layout


\backslash
copyright
\end_layout

\end_inset

 2002-2008 Jean-Marc Valin/Xiph.org Foundation
\end_layout

\begin_layout Standard
Permission is granted to copy, distribute and/or modify this document under
 the terms of the GNU Free Documentation License, Version 1.1 or any later
 version published by the Free Software Foundation; with no Invariant Sections,
 with no Front-Cover Texts, and with no Back-Cover Texts.
 A copy of the license is included in the section entitled "GNU Free Documentati
on License".
 
\end_layout

\begin_layout Standard
\begin_inset Newpage newpage
\end_inset


\begin_inset CommandInset toc
LatexCommand tableofcontents

\end_inset


\begin_inset Newpage newpage
\end_inset


\end_layout

\begin_layout Standard
\begin_inset FloatList table

\end_inset


\begin_inset Newpage newpage
\end_inset


\end_layout

\begin_layout Chapter
Introduction to Speex
\end_layout

\begin_layout Standard
The Speex codec (
\family typewriter
http://www.speex.org/
\family default
) exists because there is a need for a speech codec that is open-source
 and free from software patent royalties.
 These are essential conditions for being usable in any open-source software.
 In essence, Speex is to speech what Vorbis is to audio/music.
 Unlike many other speech codecs, Speex is not designed for mobile phones
 but rather for packet networks and voice over IP (VoIP) applications.
 File-based compression is of course also supported.
 
\end_layout

\begin_layout Standard
The Speex codec is designed to be very flexible and to support a wide range
 of speech qualities and bit-rates.
 Support for very good quality speech also means that Speex can encode wideband
 speech (16 kHz sampling rate) in addition to narrowband speech (telephone
 quality, 8 kHz sampling rate).
\end_layout

\begin_layout Standard
Designing for VoIP instead of mobile phones means that Speex is robust to
 lost packets, but not to corrupted ones.
 This is based on the assumption that in VoIP, packets either arrive unaltered
 or don't arrive at all.
 Because Speex is targeted at a wide range of devices, it has modest (adjustable
) complexity and a small memory footprint.
\end_layout

\begin_layout Standard
All the design goals led to the choice of CELP
\begin_inset Index
status collapsed

\begin_layout Plain Layout
CELP
\end_layout

\end_inset

 as the encoding technique.
 One of the main reasons is that CELP has long proved that it could work
 reliably and scale well to both low bit-rates (e.g.
 DoD CELP @ 4.8 kbps) and high bit-rates (e.g.
 G.728 @ 16 kbps).
 
\end_layout

\begin_layout Section
Getting help
\begin_inset CommandInset label
LatexCommand label
name "sec:Getting-help"

\end_inset


\end_layout

\begin_layout Standard
As with many open-source projects, there are many ways to get help with Speex.
 These include:
\end_layout

\begin_layout Itemize
This manual
\end_layout

\begin_layout Itemize
Other documentation on the Speex website (http://www.speex.org/)
\end_layout

\begin_layout Itemize
Mailing list: Discuss any Speex-related topic on speex-dev@xiph.org (not
 just for developers)
\end_layout

\begin_layout Itemize
IRC: The main channel is #speex on irc.freenode.net.
 Note that because of time-zone differences, it may take a while to get a response,
 so please be patient.
\end_layout

\begin_layout Itemize
Email the author privately at jean-marc.valin@usherbrooke.ca 
\series bold
only
\series default
 for private/delicate topics you do not wish to discuss publicly.
\end_layout

\begin_layout Standard
Before asking for help (mailing list or IRC), 
\series bold
it is important to first read this manual
\series default
 (OK, so if you made it here it's already a good sign).
 It is generally considered rude to ask on a mailing list about topics that
 are clearly detailed in the documentation.
 On the other hand, it's perfectly OK (and encouraged) to ask for clarifications
 about something covered in the manual.
 This manual does not (yet) cover everything about Speex, so everyone is
 encouraged to ask questions, send comments, feature requests, or just let
 us know how Speex is being used.
 
\end_layout

\begin_layout Standard
Here are some additional guidelines related to the mailing list.
 Before reporting bugs in Speex to the list, it is strongly recommended
 (if possible) to first test whether these bugs can be reproduced using
 the speexenc and speexdec (see Section 
\begin_inset CommandInset ref
LatexCommand ref
reference "sec:Command-line-encoder/decoder"

\end_inset

) command-line utilities.
 Bugs reported based on 3rd party code are both harder to find and far too
 often caused by errors that have nothing to do with Speex.
 
\end_layout

\begin_layout Section
About this document
\end_layout

\begin_layout Standard
This document is organized as follows.
 Section 
\begin_inset CommandInset ref
LatexCommand ref
reference "sec:Feature-description"

\end_inset

 describes the different Speex features and defines many basic terms that
 are used throughout this manual.
 Section 
\begin_inset CommandInset ref
LatexCommand ref
reference "sec:Command-line-encoder/decoder"

\end_inset

 documents the standard command-line tools provided in the Speex distribution.
 Section 
\begin_inset CommandInset ref
LatexCommand ref
reference "sec:Programming-with-Speex"

\end_inset

 includes detailed instructions about programming using the libspeex
\begin_inset Index
status collapsed

\begin_layout Plain Layout
libspeex
\end_layout

\end_inset

 API.
 Section 
\begin_inset CommandInset ref
LatexCommand ref
reference "sec:Formats-and-standards"

\end_inset

 has some information related to Speex and standards.
 
\end_layout

\begin_layout Standard
The last three sections describe the algorithms used in Speex.
 These sections require signal processing knowledge, but are not required
 for merely using Speex.
 They are intended for people who want to understand how Speex really works
 and/or want to do research based on Speex.
 Section 
\begin_inset CommandInset ref
LatexCommand ref
reference "sec:Introduction-to-CELP"

\end_inset

 explains the general idea behind CELP, while sections 
\begin_inset CommandInset ref
LatexCommand ref
reference "sec:Speex-narrowband-mode"

\end_inset

 and 
\begin_inset CommandInset ref
LatexCommand ref
reference "sec:Speex-wideband-mode"

\end_inset

 are specific to Speex.
\end_layout

\begin_layout Standard
\begin_inset Newpage newpage
\end_inset


\end_layout

\begin_layout Chapter
Codec description
\begin_inset CommandInset label
LatexCommand label
name "sec:Feature-description"

\end_inset


\end_layout

\begin_layout Standard
This section describes Speex and its features in more detail.
\end_layout

\begin_layout Section
Concepts
\end_layout

\begin_layout Standard
Before introducing all the Speex features, here are some concepts in speech
 coding that will help in understanding the rest of the manual.
 Although some are general concepts in speech/audio processing, others are
 specific to Speex.
\end_layout

\begin_layout Subsection*
Sampling rate
\begin_inset Index
status collapsed

\begin_layout Plain Layout
sampling rate
\end_layout

\end_inset


\end_layout

\begin_layout Standard
The sampling rate expressed in Hertz (Hz) is the number of samples taken
 from a signal per second.
 For a sampling rate of 
\begin_inset Formula $F_{s}$
\end_inset

 kHz, the highest frequency that can be represented is equal to 
\begin_inset Formula $F_{s}/2$
\end_inset

 kHz (
\begin_inset Formula $F_{s}/2$
\end_inset

 is known as the Nyquist frequency).
 This is a fundamental property in signal processing and is described by
 the sampling theorem.
 Speex is mainly designed for three different sampling rates: 8 kHz, 16
 kHz, and 32 kHz.
 These are respectively referred to as narrowband
\begin_inset Index
status collapsed

\begin_layout Plain Layout
narrowband
\end_layout

\end_inset

, wideband
\begin_inset Index
status collapsed

\begin_layout Plain Layout
wideband
\end_layout

\end_inset

 and ultra-wideband
\begin_inset Index
status collapsed

\begin_layout Plain Layout
ultra-wideband
\end_layout

\end_inset

.
 
\end_layout
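\begin_layout Standard
As a quick illustration of the sampling theorem stated above, the following C sketch computes the Nyquist frequency for the three sampling rates Speex is designed for. The helper name nyquist_hz is purely illustrative and is not part of libspeex.
\end_layout

```c
#include <assert.h>

/* Illustrative helper (not part of libspeex): highest frequency, in Hz,
 * that can be represented at a given sampling rate, i.e. Fs/2. */
static int nyquist_hz(int sampling_rate_hz)
{
    assert(sampling_rate_hz > 0);
    return sampling_rate_hz / 2;
}
```

\begin_layout Standard
With this helper, narrowband (8000 Hz) gives 4000 Hz, wideband (16000 Hz) gives 8000 Hz, and ultra-wideband (32000 Hz) gives 16000 Hz.
\end_layout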

\begin_layout Subsection*
Bit-rate
\end_layout

\begin_layout Standard
When encoding a speech signal, the bit-rate is defined as the number of
 bits per unit of time required to encode the speech.
 It is measured in 
\emph on
bits per second
\emph default
 (bps), or generally 
\emph on
kilobits per second
\emph default
.
 It is important to make the distinction between 
\emph on
kilo
\series bold
bits
\series default
\emph default
 
\emph on
per second
\emph default
 (k
\series bold
b
\series default
ps) and 
\emph on
kilo
\series bold
bytes
\series default
\emph default
 
\emph on
per second
\emph default
 (k
\series bold
B
\series default
ps).
\end_layout
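\begin_layout Standard
The factor of eight between kilobits and kilobytes is easy to get wrong when estimating file sizes or bandwidth. The sketch below makes the conversion explicit. The function names are hypothetical and the 8 kbps figure used in the note that follows is only an example value, not a Speex mode.
\end_layout

```c
#include <assert.h>

/* Illustrative helpers (not part of libspeex). */

/* Convert a bit-rate in bits per second to bytes per second. */
static double bps_to_bytes_per_second(double bits_per_second)
{
    return bits_per_second / 8.0;
}

/* Size in bytes of `seconds` of speech encoded at a given bit-rate. */
static double encoded_size_bytes(double bits_per_second, double seconds)
{
    assert(bits_per_second >= 0.0 && seconds >= 0.0);
    return bps_to_bytes_per_second(bits_per_second) * seconds;
}
```

\begin_layout Standard
For instance, a stream at 8 kbps (8000 bps) corresponds to 1 kBps, so one minute of speech occupies 60000 bytes before any container overhead.
\end_layout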

\begin_layout Subsection*
Quality
\begin_inset Index
status collapsed

\begin_layout Plain Layout
quality
\end_layout

\end_inset

 (variable)
\end_layout

\begin_layout Standard
Speex is a lossy codec, which means that it achieves compression at the
 expense of fidelity of the input speech signal.
 Unlike some other speech codecs, it is possible to control the trade-off
 made between quality and bit-rate.
 The Speex encoding process is controlled most of the time by a quality
 parameter that ranges from 0 to 10.
 In constant bit-rate
\begin_inset Index
status collapsed

\begin_layout Plain Layout
constant bit-rate
\end_layout

\end_inset

 (CBR) operation, the quality parameter is an integer, while for variable
 bit-rate (VBR), the parameter is a float.
 
\end_layout

\begin_layout Subsection*
Complexity
\begin_inset Index
status collapsed

\begin_layout Plain Layout
complexity
\end_layout

\end_inset

 (variable)
\end_layout

\begin_layout Standard
With Speex, it is possible to vary the complexity allowed for the encoder.
 This is done by controlling how the search is performed with an integer
 ranging from 1 to 10 in a way that's similar to the -1 to -9 options to
 
\emph on
gzip
\emph default
 and 
\emph on
bzip2
\emph default
 compression utilities.
 For normal use, the noise level at complexity 1 is between 1 and 2 dB higher
 than at complexity 10, but the CPU requirements for complexity 10 are about
 5 times higher than for complexity 1.
 In practice, the best trade-off is between complexity 2 and 4, though higher
 settings are often useful when encoding non-speech sounds like DTMF
\begin_inset Index
status collapsed

\begin_layout Plain Layout
DTMF
\end_layout

\end_inset

 tones.
\end_layout

\begin_layout Subsection*
Variable Bit-Rate
\begin_inset Index
status collapsed

\begin_layout Plain Layout
variable bit-rate
\end_layout

\end_inset

 (VBR)
\end_layout

\begin_layout Standard
Variable bit-rate (VBR) allows a codec to change its bit-rate dynamically
 to adapt to the 
\begin_inset Quotes eld
\end_inset

difficulty
\begin_inset Quotes erd
\end_inset

 of the audio being encoded.
 In the example of Speex, sounds like vowels and high-energy transients
 require a higher bit-rate to achieve good quality, while fricatives (e.g.
 s and f sounds) can be coded adequately with fewer bits.
 For this reason, VBR can achieve a lower bit-rate for the same quality, or
 a better quality for a certain bit-rate.
 Despite its advantages, VBR has two main drawbacks: first, by only specifying
 quality, there's no guarantee about the final average bit-rate.
 Second, for some real-time applications like voice over IP (VoIP), what
 counts is the maximum bit-rate, which must be low enough for the communication
 channel.
\end_layout

\begin_layout Subsection*
Average Bit-Rate
\begin_inset Index
status collapsed

\begin_layout Plain Layout
average bit-rate
\end_layout

\end_inset

 (ABR)
\end_layout

\begin_layout Standard
Average bit-rate solves one of the problems of VBR, as it dynamically adjusts
 VBR quality in order to meet a specific target bit-rate.
 Because the quality/bit-rate is adjusted in real-time (open-loop), the
 global quality will be slightly lower than that obtained by encoding in
 VBR with exactly the right quality setting to meet the target average bit-rate.
\end_layout
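\begin_layout Standard
To make the idea of adjusting VBR quality toward a target bit-rate concrete, here is a deliberately simple sketch. It is not the algorithm used inside Speex: the structure, the field names, and the fixed step size are all assumptions made for illustration. Only the 0 to 10 quality range and the use of a floating-point quality for VBR come from the text above.
\end_layout

```c
#include <assert.h>

/* Hypothetical controller state; none of these names come from libspeex. */
typedef struct {
    float target_bps;   /* desired average bit-rate */
    float quality;      /* current VBR quality, kept within [0, 10] */
    float step;         /* how far quality moves per adjustment */
} AbrSketch;

/* Nudge the VBR quality after observing the running average bit-rate,
 * and return the new quality. */
static float abr_update(AbrSketch *s, float measured_avg_bps)
{
    assert(s != 0);
    if (measured_avg_bps > s->target_bps)
        s->quality -= s->step;   /* spending too many bits: lower quality */
    else if (measured_avg_bps < s->target_bps)
        s->quality += s->step;   /* bits to spare: raise quality */

    /* Keep the quality inside the documented 0..10 range. */
    if (s->quality < 0.0f)
        s->quality = 0.0f;
    if (s->quality > 10.0f)
        s->quality = 10.0f;
    return s->quality;
}
```

\begin_layout Standard
Because the quality is corrected only after the bit-rate has drifted, a scheme like this cannot match a VBR encode that used the ideal fixed quality from the start, which is consistent with the slight quality loss described above.
\end_layout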

\begin_layout Subsection*
Voice Activity Detection
\begin_inset Index
status collapsed

\begin_layout Plain Layout
voice activity detection
\end_layout

\end_inset

 (VAD)
\end_layout

\begin_layout Standard
When enabled, voice activity detection detects whether the audio being encoded
 is speech or silence/background noise.
 VAD is always implicitly activated when encoding in VBR, so the option
 is only useful in non-VBR operation.
 In this case, Speex detects non-speech periods and encodes them with just
 enough bits to reproduce the background noise.
 This is called 
\begin_inset Quotes eld
\end_inset

comfort noise generation
\begin_inset Quotes erd
\end_inset

 (CNG).
\end_layout

\begin_layout Subsection*
Discontinuous Transmission
\begin_inset Index
status collapsed

\begin_layout Plain Layout
discontinuous transmission
\end_layout

\end_inset

 (DTX)
\end_layout

\begin_layout Standard
Discontinuous transmission is an addition to VAD/VBR operation that makes
 it possible to stop transmitting completely when the background noise is
 stationary.
 In file-based operation, since we cannot just stop writing to the file,
 only 5 bits are used for such frames (corresponding to 250 bps).
\end_layout
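\begin_layout Standard
The 250 bps figure follows directly from the frame rate: 5 bits per frame at 50 frames per second, that is, one frame every 20 ms. The sketch below spells out that arithmetic. The helper name is hypothetical and not part of libspeex, and the 50 frames per second value is the one implied by the two numbers quoted above.
\end_layout

```c
#include <assert.h>

/* Illustrative helper (not part of libspeex): bit-rate in bits per second
 * for a stream made of fixed-size frames. */
static int frame_bitrate_bps(int bits_per_frame, int frames_per_second)
{
    assert(bits_per_frame >= 0 && frames_per_second > 0);
    return bits_per_frame * frames_per_second;
}
```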

\begin_layout Subsection*
Perceptual enhancement
\begin_inset Index
status collapsed

\begin_layout Plain Layout
perceptual enhancement
\end_layout

\end_inset


\end_layout

\begin_layout Standard
Perceptual enhancement is a part of the decoder which, when turned on, attempts
 to reduce the perception of the noise/distortion produced by the encoding/decod
ing process.
 In most cases, perceptual enhancement brings the sound further from the
 original 
\emph on
objectively
\emph default
 (e.g.
 considering only SNR), but in the end it still 
\emph on
sounds
\emph default
 better (subjective improvement).
\end_layout

\begin_layout Subsection*
Latency and algorithmic delay
\begin_inset Index
status collapsed

\begin_layout Plain Layout
algorithmic delay
\end_layout

\end_inset


\end_layout

\begin_layout Standard
Every speech codec introduces a delay in the transmission.
 For Speex, this delay is equal to the frame size, plus some amount of 
\begin_inset Quotes eld
\end_inset

look-ahead
\begin_inset Quotes erd
\end_inset

 required to process each frame.
 In narrowband operation (8 kHz), the delay is 30 ms, while for wideband
 (16 kHz), the delay is 34 ms.
 These values don't account for the CPU time it takes to encode or decode
 the frames.
\end_layout
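\begin_layout Standard
The delay figures above can be decomposed as frame size plus look-ahead. Assuming 20 ms frames (the frame duration implied by the 5 bits and 250 bps figures given for DTX), the quoted totals correspond to a look-ahead of 10 ms in narrowband and 14 ms in wideband. These look-ahead values are derived here from the totals rather than quoted from the source, and the helper name is hypothetical.
\end_layout

```c
#include <assert.h>

/* Illustrative helper (not part of libspeex): algorithmic delay in
 * milliseconds, excluding any CPU time spent encoding or decoding. */
static int algorithmic_delay_ms(int frame_ms, int lookahead_ms)
{
    assert(frame_ms > 0 && lookahead_ms >= 0);
    return frame_ms + lookahead_ms;
}
```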

\begin_layout Section
Codec
\end_layout

\begin_layout Standard
The main characteristics of Speex can be summarized as follows:
\end_layout

\begin_layout Itemize
Free software/open-source
\begin_inset Index
status collapsed

\begin_layout Plain Layout
open-source
\end_layout

\end_inset

, patent
\begin_inset Index
status collapsed

\begin_layout Plain Layout
patent
\end_layout

\end_inset

 and royalty-free
\end_layout

\begin_layout Itemize
Integration of narrowband
\begin_inset Index
status collapsed

\begin_layout Plain Layout
narrowband
\end_layout

\end_inset

 and wideband
\begin_inset Index
status collapsed

\begin_layout Plain Layout
wideband
\end_layout

\end_inset

 using an embedded bit-stream
\end_layout

\begin_layout Itemize
Wide range of bit-rates available (from 2.15 kbps to 44 kbps)
\end_layout

\begin_layout Itemize
Dynamic bit-rate switching (AMR) and Variable Bit-Rate
\begin_inset Index
status collapsed

\begin_layout Plain Layout
variable bit-rate
\end_layout

\end_inset

 (VBR) operation
\end_layout

\begin_layout Itemize
Voice Activity Detection
\begin_inset Index
status collapsed

\begin_layout Plain Layout
voice activity detection
\end_layout

\end_inset

 (VAD, integrated with VBR) and discontinuous transmission (DTX)
\end_layout

\begin_layout Itemize
Variable complexity
\begin_inset Index
status collapsed

\begin_layout Plain Layout
complexity
\end_layout

\end_inset


\end_layout

\begin_layout Itemize
Embedded wideband structure (scalable sampling rate)
\end_layout

\begin_layout Itemize
Ultra-wideband sampling rate at 32 kHz
\end_layout

\begin_layout Itemize
Intensity stereo encoding option
\end_layout

\begin_layout Itemize
Fixed-point implementation
\end_layout

\begin_layout Section
Preprocessor
\end_layout

\begin_layout Standard
This part refers to the preprocessor module introduced in the 1.1.x branch.
 The preprocessor is designed to be used on the audio 
\emph on
before
\emph default
 running the encoder.
 The preprocessor provides three main functionalities:
\end_layout

\begin_layout Itemize
noise suppression
\end_layout

\begin_layout Itemize
automatic gain control (AGC)
\end_layout

\begin_layout Itemize
voice activity detection (VAD)
\end_layout

\begin_layout Standard
The denoiser can be used to reduce the amount of background noise present
 in the input signal.
 This provides higher quality speech whether or not the denoised signal
 is encoded with Speex (or at all).
 However, when using the denoised signal with the codec, there is an additional
 benefit.
 Speech codecs in general (Speex included) tend to perform poorly on noisy
 input, which tends to amplify the noise.
 The denoiser greatly reduces this effect.
\end_layout

\begin_layout Standard
Automatic gain control (AGC) is a feature that deals with the fact that
 the recording volume may vary by a large amount between different setups.
 The AGC provides a way to adjust a signal to a reference volume.
 This is useful for voice over IP because it removes the need for manual
 adjustment of the microphone gain.
 A secondary advantage is that by setting the microphone gain to a conservative
 (low) level, it is easier to avoid clipping.
\end_layout
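\begin_layout Standard
The idea of adjusting a signal to a reference volume can be illustrated with a one-shot peak normalisation over a buffer of 16-bit samples. This is only a sketch under simplifying assumptions and is not the AGC algorithm implemented in the Speex preprocessor: a real AGC adapts its gain gradually over time, whereas the hypothetical normalize_to_peak function below applies a single gain to the whole buffer.
\end_layout

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical one-shot gain normalisation; NOT the Speex AGC algorithm.
 * Scales `n` 16-bit samples in place so that the largest absolute sample
 * equals `reference_peak`, and returns the gain that was applied. */
static float normalize_to_peak(short *samples, size_t n, int reference_peak)
{
    int peak = 0;
    float gain;
    size_t i;

    assert(samples != NULL && reference_peak > 0 && reference_peak <= 32767);

    /* Find the largest absolute sample value in the buffer. */
    for (i = 0; i < n; i++) {
        int a = abs((int)samples[i]);
        if (a > peak)
            peak = a;
    }
    if (peak == 0)
        return 1.0f;              /* pure silence: nothing to scale */

    gain = (float)reference_peak / (float)peak;
    for (i = 0; i < n; i++) {
        float v = gain * (float)samples[i];
        /* Guard against rounding pushing a value out of the 16-bit range. */
        if (v > 32767.0f)
            v = 32767.0f;
        if (v < -32768.0f)
            v = -32768.0f;
        samples[i] = (short)v;
    }
    return gain;
}
```

\begin_layout Standard
Recording at a conservative microphone level and raising the gain afterwards, as in this sketch, is what makes it easier to avoid clipping at capture time.
\end_layout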

\begin_layout Standard
The voice activity detector (VAD) provided by the preprocessor is more advanced
 than the one directly provided in the codec.
 
\end_layout

\begin_layout Section
Adaptive Jitter Buffer
\end_layout