#LyX 1.6.0rc2 created this file. For more info see http://www.lyx.org/
\lyxformat 340
\begin_document
\begin_header
\textclass scrbook
\language english
\inputencoding auto
\font_roman times
\font_sans helvet
\font_typewriter courier
\font_default_family default
\font_sc false
\font_osf false
\font_sf_scale 100
\font_tt_scale 100

\graphics default
\paperfontsize 10
\spacing single
\use_hyperref false
\papersize letterpaper
\use_geometry true
\use_amsmath 2
\use_esint 0
\cite_engine basic
\use_bibtopic false
\paperorientation portrait
\leftmargin 2cm
\topmargin 2cm
\rightmargin 2cm
\bottommargin 2cm
\secnumdepth 3
\tocdepth 3
\paragraph_separation indent
\defskip medskip
\quotes_language english
\papercolumns 1
\papersides 1
\paperpagestyle headings
\listings_params "basicstyle={\ttfamily},breaklines=true,language=C,xleftmargin=0mm"
\tracking_changes false
\output_changes false
\author "" 
\author "" 
\end_header

\begin_body

\begin_layout Title
The Speex Manual
\begin_inset Newline newline
\end_inset

Version 1.2
\end_layout

\begin_layout Author
Jean-Marc Valin
\end_layout

\begin_layout Standard
\begin_inset Newpage newpage
\end_inset


\end_layout

\begin_layout Standard
Copyright 
\begin_inset ERT
status collapsed

\begin_layout Plain Layout


\backslash
copyright
\end_layout

\end_inset

 2002-2008 Jean-Marc Valin/Xiph.org Foundation
\end_layout

\begin_layout Standard
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover Texts, and with no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License".
\end_layout

\begin_layout Standard
\begin_inset Newpage newpage
\end_inset


\begin_inset CommandInset toc
LatexCommand tableofcontents

\end_inset


\begin_inset Newpage newpage
\end_inset


\end_layout

\begin_layout Standard
\begin_inset FloatList table

\end_inset


\begin_inset Newpage newpage
\end_inset


\end_layout

\begin_layout Chapter
Introduction to Speex
\end_layout

\begin_layout Standard
The Speex codec (
\family typewriter
http://www.speex.org/
\family default
) exists because there is a need for a speech codec that is open-source and free from software patent royalties. These are essential conditions for use in any open-source software. In essence, Speex is to speech what Vorbis is to audio/music. Unlike many other speech codecs, Speex is not designed for mobile phones but rather for packet networks and voice over IP (VoIP) applications.
 File-based compression is of course also supported.
\end_layout

\begin_layout Standard
The Speex codec is designed to be very flexible and to support a wide range of speech qualities and bit-rates. Support for very good quality speech also means that Speex can encode wideband speech (16 kHz sampling rate) in addition to narrowband speech (telephone quality, 8 kHz sampling rate).
\end_layout

\begin_layout Standard
Designing for VoIP instead of mobile phones means that Speex is robust to lost packets, but not to corrupted ones. This is based on the assumption that in VoIP, packets either arrive unaltered or don't arrive at all. Because Speex is targeted at a wide range of devices, it has modest (adjustable) complexity and a small memory footprint.
\end_layout

\begin_layout Standard
All the design goals led to the choice of CELP
\begin_inset Index
status collapsed

\begin_layout Plain Layout
CELP
\end_layout

\end_inset

 as the encoding technique. One of the main reasons is that CELP has long been proven to work reliably and to scale well to both low bit-rates (e.g. DoD CELP at 4.8 kbps) and high bit-rates (e.g. G.728 at 16 kbps).
\end_layout

\begin_layout Section
Getting help
\begin_inset CommandInset label
LatexCommand label
name "sec:Getting-help"

\end_inset


\end_layout

\begin_layout Standard
As with many open-source projects, there are many ways to get help with Speex. These include:
\end_layout

\begin_layout Itemize
This manual
\end_layout

\begin_layout Itemize
Other documentation on the Speex website (http://www.speex.org/)
\end_layout

\begin_layout Itemize
Mailing list: Discuss any Speex-related topic on speex-dev@xiph.org (not just for developers)
\end_layout

\begin_layout Itemize
IRC: The main channel is #speex on irc.freenode.net. Note that due to time differences, it may take a while to get a reply, so please be patient.
\end_layout

\begin_layout Itemize
Email the author privately at jean-marc.valin@usherbrooke.ca 
\series bold
only
\series default
 for private/delicate topics you do not wish to discuss publicly.
\end_layout

\begin_layout Standard
Before asking for help (mailing list or IRC), 
\series bold
it is important to first read this manual
\series default
 (OK, so if you made it here it's already a good sign). It is generally considered rude to ask on a mailing list about topics that are clearly detailed in the documentation. On the other hand, it's perfectly OK (and encouraged) to ask for clarifications about something covered in the manual.
 This manual does not (yet) cover everything about Speex, so everyone is encouraged to ask questions, send comments and feature requests, or just let us know how Speex is being used.
\end_layout

\begin_layout Standard
Here are some additional guidelines related to the mailing list. Before reporting bugs in Speex to the list, it is strongly recommended (if possible) to first test whether these bugs can be reproduced using the speexenc and speexdec (see Section 
\begin_inset CommandInset ref
LatexCommand ref
reference "sec:Command-line-encoder/decoder"

\end_inset

) command-line utilities. Bugs reported based on third-party code are both harder to find and far too often caused by errors that have nothing to do with Speex.
\end_layout

\begin_layout Section
About this document
\end_layout

\begin_layout Standard
This document is organized as follows. Section 
\begin_inset CommandInset ref
LatexCommand ref
reference "sec:Feature-description"

\end_inset

 describes the different Speex features and defines many basic terms that are used throughout this manual. Section 
\begin_inset CommandInset ref
LatexCommand ref
reference "sec:Command-line-encoder/decoder"

\end_inset

 documents the standard command-line tools provided in the Speex distribution.
 Section 
\begin_inset CommandInset ref
LatexCommand ref
reference "sec:Programming-with-Speex"

\end_inset

 includes detailed instructions about programming with the libspeex
\begin_inset Index
status collapsed

\begin_layout Plain Layout
libspeex
\end_layout

\end_inset

 API. Section 
\begin_inset CommandInset ref
LatexCommand ref
reference "sec:Formats-and-standards"

\end_inset

 has some information related to Speex and standards.
\end_layout

\begin_layout Standard
The last three sections describe the algorithms used in Speex. These sections require some signal processing knowledge, but are not needed for merely using Speex. They are intended for people who want to understand how Speex really works and/or want to do research based on Speex.
 Section 
\begin_inset CommandInset ref
LatexCommand ref
reference "sec:Introduction-to-CELP"

\end_inset

 explains the general idea behind CELP, while sections 
\begin_inset CommandInset ref
LatexCommand ref
reference "sec:Speex-narrowband-mode"

\end_inset

 and 
\begin_inset CommandInset ref
LatexCommand ref
reference "sec:Speex-wideband-mode"

\end_inset

 are specific to Speex.
\end_layout

\begin_layout Standard
\begin_inset Newpage newpage
\end_inset


\end_layout

\begin_layout Chapter
Codec description
\begin_inset CommandInset label
LatexCommand label
name "sec:Feature-description"

\end_inset


\end_layout

\begin_layout Standard
This section describes Speex and its features in more detail.
\end_layout

\begin_layout Section
Concepts
\end_layout

\begin_layout Standard
Before introducing all the Speex features, here are some concepts in speech coding that help in understanding the rest of the manual. Although some are general concepts in speech/audio processing, others are specific to Speex.
\end_layout

\begin_layout Subsection*
Sampling rate
\begin_inset Index
status collapsed

\begin_layout Plain Layout
sampling rate
\end_layout

\end_inset


\end_layout

\begin_layout Standard
The sampling rate, expressed in Hertz (Hz), is the number of samples taken from a signal per second. For a sampling rate of 
\begin_inset Formula $F_{s}$
\end_inset

 kHz, the highest frequency that can be represented is equal to 
\begin_inset Formula $F_{s}/2$
\end_inset

 kHz (
\begin_inset Formula $F_{s}/2$
\end_inset

 is known as the Nyquist frequency). This is a fundamental property in signal processing and is described by the sampling theorem. Speex is mainly designed for three different sampling rates: 8 kHz, 16 kHz, and 32 kHz.
 These are respectively referred to as narrowband
\begin_inset Index
status collapsed

\begin_layout Plain Layout
narrowband
\end_layout

\end_inset

, wideband
\begin_inset Index
status collapsed

\begin_layout Plain Layout
wideband
\end_layout

\end_inset

 and ultra-wideband
\begin_inset Index
status collapsed

\begin_layout Plain Layout
ultra-wideband
\end_layout

\end_inset

.
\end_layout

\begin_layout Subsection*
Bit-rate
\end_layout

\begin_layout Standard
When encoding a speech signal, the bit-rate is defined as the number of bits per unit of time required to encode the speech. It is measured in 
\emph on
bits per second
\emph default
 (bps), or more commonly in 
\emph on
kilobits per second
\emph default
.
 It is important to make the distinction between 
\emph on
kilo
\series bold
bits
\series default
 per second
\emph default
 (k
\series bold
b
\series default
ps) and 
\emph on
kilo
\series bold
bytes
\series default
 per second
\emph default
 (k
\series bold
B
\series default
ps).
\end_layout

\begin_layout Subsection*
Quality
\begin_inset Index
status collapsed

\begin_layout Plain Layout
quality
\end_layout

\end_inset

 (variable)
\end_layout

\begin_layout Standard
Speex is a lossy codec, which means that it achieves compression at the expense of fidelity of the input speech signal. Unlike some other speech codecs, it is possible to control the trade-off made between quality and bit-rate. The Speex encoding process is usually controlled by a quality parameter that ranges from 0 to 10.
 In constant bit-rate
\begin_inset Index
status collapsed

\begin_layout Plain Layout
constant bit-rate
\end_layout

\end_inset

 (CBR) operation, the quality parameter is an integer, while for variable bit-rate (VBR), the parameter is a float.
\end_layout

\begin_layout Subsection*
Complexity
\begin_inset Index
status collapsed

\begin_layout Plain Layout
complexity
\end_layout

\end_inset

 (variable)
\end_layout

\begin_layout Standard
With Speex, it is possible to vary the complexity allowed for the encoder. This is done by controlling how the search is performed with an integer ranging from 1 to 10, in a way that is similar to the -1 to -9 options of the 
\emph on
gzip
\emph default
 and 
\emph on
bzip2
\emph default
 compression utilities. For normal use, the noise level at complexity 1 is between 1 and 2 dB higher than at complexity 10, but the CPU requirements for complexity 10 are about 5 times higher than for complexity 1.
 In practice, the best trade-off is between complexity 2 and 4, though higher settings are often useful when encoding non-speech sounds like DTMF
\begin_inset Index
status collapsed

\begin_layout Plain Layout
DTMF
\end_layout

\end_inset

 tones.
\end_layout

\begin_layout Subsection*
Variable Bit-Rate
\begin_inset Index
status collapsed

\begin_layout Plain Layout
variable bit-rate
\end_layout

\end_inset

 (VBR)
\end_layout

\begin_layout Standard
Variable bit-rate (VBR) allows a codec to change its bit-rate dynamically to adapt to the 
\begin_inset Quotes eld
\end_inset

difficulty
\begin_inset Quotes erd
\end_inset

 of the audio being encoded. In the case of Speex, sounds like vowels and high-energy transients require a higher bit-rate to achieve good quality, while fricatives (e.g. s and f sounds) can be coded adequately with fewer bits. For this reason, VBR can achieve a lower bit-rate for the same quality, or a better quality for a given bit-rate.
 Despite its advantages, VBR has two main drawbacks. First, by only specifying quality, there is no guarantee about the final average bit-rate. Second, for some real-time applications like voice over IP (VoIP), what counts is the maximum bit-rate, which must be low enough for the communication channel.
\end_layout

\begin_layout Subsection*
Average Bit-Rate
\begin_inset Index
status collapsed

\begin_layout Plain Layout
average bit-rate
\end_layout

\end_inset

 (ABR)
\end_layout

\begin_layout Standard
Average bit-rate solves one of the problems of VBR, as it dynamically adjusts VBR quality in order to meet a specific target bit-rate. Because the quality/bit-rate is adjusted in real-time (open-loop), the global quality will be slightly lower than that obtained by encoding in VBR with exactly the right quality setting to meet the target average bit-rate.
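Here is a minimal C sketch of ABR that makes the idea concrete. It nudges a floating-point VBR quality toward a target bit-rate, one frame at a time. This is only an illustration under stated assumptions, not the algorithm used in libspeex. The AbrState type, abr_update(), simulate_abr(), the smoothing and gain constants, and the toy rate model in toy_frame_bits() are all hypothetical. The controller only reacts to bits already spent and never re-encodes a frame, so it approaches the target rather than hitting the ideal fixed quality. This is the trade-off described above.

```c
#include <assert.h>

/* Hypothetical ABR controller state (illustrative only, not libspeex internals). */
typedef struct {
    float quality;    /* current VBR quality, kept within [0, 10] */
    float target_bps; /* desired average bit-rate, in bits per second */
    float avg_bps;    /* smoothed estimate of the bit-rate actually produced */
} AbrState;

static float clamp_quality(float q)
{
    if (q < 0.0f) return 0.0f;
    if (q > 10.0f) return 10.0f;
    return q;
}

/* Called after each encoded frame: smooth the observed bit-rate, then move
 * the quality in the direction that reduces the error. No frame is ever
 * re-encoded, so the controller only reacts to bits already spent. */
static void abr_update(AbrState *s, int frame_bits, float frame_seconds)
{
    const float alpha = 0.1f;   /* smoothing factor (assumed) */
    const float gain = 0.0002f; /* quality units per bps of error (assumed) */
    assert(frame_seconds > 0.0f);
    float inst_bps = (float)frame_bits / frame_seconds;
    s->avg_bps += alpha * (inst_bps - s->avg_bps);
    s->quality = clamp_quality(s->quality + gain * (s->target_bps - s->avg_bps));
}

/* Toy rate model standing in for a VBR encoder (NOT the behavior of Speex):
 * a 20 ms frame costs 80 + 40 * quality bits, i.e. 4 kbps at quality 0
 * and 24 kbps at quality 10. */
static int toy_frame_bits(float quality)
{
    return (int)(80.0f + 40.0f * quality + 0.5f);
}

/* Runs the loop for 'frames' frames and returns the average bit-rate over
 * the second half, once the controller has had time to settle. */
double simulate_abr(float target_bps, float initial_quality, int frames)
{
    const float frame_seconds = 0.02f;
    AbrState s = { initial_quality, target_bps, target_bps };
    long settled_bits = 0;
    int settled_frames = 0;
    assert(frames >= 2);
    for (int i = 0; i < frames; i++) {
        int bits = toy_frame_bits(s.quality);
        abr_update(&s, bits, frame_seconds);
        if (i >= frames / 2) {
            settled_bits += bits;
            settled_frames++;
        }
    }
    return (double)settled_bits / (settled_frames * (double)frame_seconds);
}
```

With a 15 kbps target and a starting quality of 8, the toy loop settles near quality 5.5. At that quality the toy model produces exactly 15 kbps. A real encoder's bit-rate depends on the signal as well as the quality setting, so in practice the quality would keep moving.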
\end_layout

\begin_layout Subsection*
Voice Activity Detection
\begin_inset Index
status collapsed

\begin_layout Plain Layout
voice activity detection
\end_layout

\end_inset

 (VAD)
\end_layout

\begin_layout Standard
When enabled, voice activity detection determines whether the audio being encoded is speech or silence/background noise. VAD is always implicitly activated when encoding in VBR, so the option is only useful in non-VBR operation. In this case, Speex detects non-speech periods and encodes them with just enough bits to reproduce the background noise. This is called 
\begin_inset Quotes eld
\end_inset

comfort noise generation
\begin_inset Quotes erd
\end_inset

 (CNG).
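A very simple energy detector can illustrate the speech/non-speech decision. The sketch below is a generic illustration, not the detector used in Speex. SimpleVad, vad_init(), vad_frame(), the 0.95/0.05 noise-floor smoothing and the power-ratio margin are all assumptions chosen for clarity. A frame is labeled as speech when its mean power exceeds the tracked background-noise power by a fixed ratio. Only frames labeled as non-speech update the noise estimate.

```c
#include <assert.h>

/* Hypothetical energy-based detector (illustrative; not the Speex VAD). */
typedef struct {
    double noise_power; /* running estimate of background-noise power */
    double margin;      /* power ratio above the noise floor that counts as speech */
} SimpleVad;

void vad_init(SimpleVad *v, double initial_noise_power, double margin)
{
    assert(initial_noise_power > 0.0 && margin > 1.0);
    v->noise_power = initial_noise_power;
    v->margin = margin;
}

/* Mean power of a frame of 16-bit samples, normalized so full scale is 1.0. */
static double frame_power(const short *x, int n)
{
    double acc = 0.0;
    for (int i = 0; i < n; i++) {
        double s = x[i] / 32768.0;
        acc += s * s;
    }
    return acc / n;
}

/* Returns 1 for speech, 0 for silence/background noise. Only frames judged
 * to be non-speech update the noise estimate, so a long vowel is not
 * absorbed into the noise floor. */
int vad_frame(SimpleVad *v, const short *x, int n)
{
    assert(n > 0);
    double p = frame_power(x, n);
    int speech = p > v->noise_power * v->margin;
    if (!speech) {
        v->noise_power = 0.95 * v->noise_power + 0.05 * p;
        if (v->noise_power < 1e-10) /* keep the floor positive on digital silence */
            v->noise_power = 1e-10;
    }
    return speech;
}
```

In a codec, frames flagged as non-speech are the ones that could be sent as a minimal comfort-noise description. A fixed power ratio is fragile. For example, if the background noise suddenly rises above the margin, every frame is labeled as speech and the floor never catches up. This is one reason practical detectors use more features than energy alone.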
\end_layout

\begin_layout Subsection*
Discontinuous Transmission
\begin_inset Index
status collapsed

\begin_layout Plain Layout
discontinuous transmission
\end_layout

\end_inset

 (DTX)
\end_layout

\begin_layout Standard
Discontinuous transmission is an addition to VAD/VBR operation that allows the encoder to stop transmitting completely when the background noise is stationary. In file-based operation, since we cannot just stop writing to the file, only 5 bits are used for such frames (corresponding to 250 bps).
\end_layout

\begin_layout Subsection*
Perceptual enhancement
\begin_inset Index
status collapsed

\begin_layout Plain Layout
perceptual enhancement
\end_layout

\end_inset


\end_layout

\begin_layout Standard
Perceptual enhancement is a part of the decoder which, when turned on, attempts to reduce the perception of the noise/distortion produced by the encoding/decoding process.
 In most cases, perceptual enhancement brings the sound further from the original 
\emph on
objectively
\emph default
 (e.g. considering only SNR), but in the end it still 
\emph on
sounds
\emph default
 better (subjective improvement).
\end_layout

\begin_layout Subsection*
Latency and algorithmic delay
\begin_inset Index
status collapsed

\begin_layout Plain Layout
algorithmic delay
\end_layout

\end_inset


\end_layout

\begin_layout Standard
Every speech codec introduces a delay in the transmission. For Speex, this delay is equal to the frame size, plus some amount of 
\begin_inset Quotes eld
\end_inset

look-ahead
\begin_inset Quotes erd
\end_inset

 required to process each frame. In narrowband operation (8 kHz), the delay is 30 ms, while for wideband (16 kHz), the delay is 34 ms. These values don't account for the CPU time it takes to encode or decode the frames.
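The delay figures above follow from simple frame arithmetic, and so does the 250 bps DTX figure given earlier. The sketch below assumes a 20 ms frame. That value is not stated in this section, but it is consistent with the DTX figure: 5 bits per frame at 250 bps means 50 frames per second. Applying the model above (delay = frame size + look-ahead), it also derives the implied look-ahead. The helper names and the FRAME_MS constant are hypothetical.

```c
#include <assert.h>

#define FRAME_MS 20 /* assumed frame length; not stated in this section */

/* Number of samples covered by a duration in milliseconds. */
int ms_to_samples(int ms, int sample_rate_hz)
{
    assert(ms >= 0 && sample_rate_hz > 0);
    return ms * sample_rate_hz / 1000;
}

/* Look-ahead implied by a total algorithmic delay, using the model
 * delay = frame size + look-ahead. */
int lookahead_ms(int total_delay_ms, int frame_ms)
{
    assert(frame_ms > 0 && total_delay_ms >= frame_ms);
    return total_delay_ms - frame_ms;
}

/* Bit-rate, in bps, produced by sending a fixed number of bits per frame. */
int bits_per_frame_to_bps(int bits, int frame_ms)
{
    assert(bits >= 0 && frame_ms > 0);
    return bits * 1000 / frame_ms;
}
```

Under these assumptions, the narrowband delay corresponds to 240 samples at 8 kHz and the wideband delay to 544 samples at 16 kHz. The implied look-ahead is 10 ms and 14 ms respectively.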
\end_layout

\begin_layout Section
Codec
\end_layout

\begin_layout Standard
The main characteristics of Speex can be summarized as follows:
\end_layout

\begin_layout Itemize
Free software/open-source
\begin_inset Index
status collapsed

\begin_layout Plain Layout
open-source
\end_layout

\end_inset

, patent
\begin_inset Index
status collapsed

\begin_layout Plain Layout
patent
\end_layout

\end_inset

 and royalty-free
\end_layout

\begin_layout Itemize
Integration of narrowband
\begin_inset Index
status collapsed

\begin_layout Plain Layout
narrowband
\end_layout

\end_inset

 and wideband
\begin_inset Index
status collapsed

\begin_layout Plain Layout
wideband
\end_layout

\end_inset

 using an embedded bit-stream
\end_layout

\begin_layout Itemize
Wide range of bit-rates available (from 2.15 kbps to 44 kbps)
\end_layout

\begin_layout Itemize
Dynamic bit-rate switching (AMR) and Variable Bit-Rate
\begin_inset Index
status collapsed

\begin_layout Plain Layout
variable bit-rate
\end_layout

\end_inset

 (VBR) operation
\end_layout

\begin_layout Itemize
Voice Activity Detection
\begin_inset Index
status collapsed

\begin_layout Plain Layout
voice activity detection
\end_layout

\end_inset

 (VAD, integrated with VBR) and discontinuous transmission (DTX)
\end_layout

\begin_layout Itemize
Variable complexity
\begin_inset Index
status collapsed

\begin_layout Plain Layout
complexity
\end_layout

\end_inset


\end_layout

\begin_layout Itemize
Embedded wideband structure (scalable sampling rate)
\end_layout

\begin_layout Itemize
Ultra-wideband sampling rate at 32 kHz
\end_layout

\begin_layout Itemize
Intensity stereo encoding option
\end_layout

\begin_layout Itemize
Fixed-point implementation
\end_layout

\begin_layout Section
Preprocessor
\end_layout

\begin_layout Standard
This part refers to the preprocessor module introduced in the 1.1.x branch. The preprocessor is designed to be used on the audio 
\emph on
before
\emph default
 running the encoder.
 The preprocessor provides three main functionalities:
\end_layout

\begin_layout Itemize
noise suppression
\end_layout

\begin_layout Itemize
automatic gain control (AGC)
\end_layout

\begin_layout Itemize
voice activity detection (VAD)
\end_layout

\begin_layout Standard
The denoiser can be used to reduce the amount of background noise present in the input signal. This provides higher quality speech whether or not the denoised signal is encoded with Speex (or at all). However, when using the denoised signal with the codec, there is an additional benefit. Speech codecs in general (Speex included) tend to perform poorly on noisy input and often amplify the noise. The denoiser greatly reduces this effect.
\end_layout

\begin_layout Standard
Automatic gain control (AGC) is a feature that deals with the fact that the recording volume may vary by a large amount between different setups. The AGC provides a way to adjust a signal to a reference volume. This is useful for voice over IP because it removes the need for manual adjustment of the microphone gain. A secondary advantage is that by setting the microphone gain to a conservative (low) level, it is easier to avoid clipping.
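Here is a rough C sketch of what an AGC does; it is not the AGC in the Speex preprocessor. It measures the mean absolute amplitude of each 16-bit frame, computes the gain needed to reach a reference level, and moves the current gain part of the way toward it. SimpleAgc, agc_process(), and all constants are hypothetical. The max_gain cap keeps near-silence from being boosted without limit. The output is saturated to the 16-bit range, so any remaining overshoot clips instead of overflowing.

```c
#include <assert.h>

/* Hypothetical AGC state (illustrative; not the AGC in the Speex preprocessor). */
typedef struct {
    double target_level; /* desired mean absolute amplitude, in sample units */
    double gain;         /* gain currently applied to the signal */
    double max_gain;     /* cap so that near-silence is not boosted without limit */
    double smoothing;    /* fraction of the way moved toward the new gain each frame, 0..1 */
} SimpleAgc;

/* Converts to 16 bits, clipping values outside the representable range. */
static short saturate16(double v)
{
    if (v > 32767.0) return 32767;
    if (v < -32768.0) return -32768;
    return (short)v;
}

/* Processes one frame of 16-bit samples in place. */
void agc_process(SimpleAgc *a, short *x, int n)
{
    assert(n > 0 && a->smoothing >= 0.0 && a->smoothing <= 1.0);
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += x[i] < 0 ? -(double)x[i] : (double)x[i];
    double level = sum / n;
    if (level > 1.0) { /* skip the gain update on (near-)digital silence */
        double wanted = a->target_level / level;
        if (wanted > a->max_gain)
            wanted = a->max_gain;
        a->gain += a->smoothing * (wanted - a->gain);
    }
    for (int i = 0; i < n; i++)
        x[i] = saturate16(x[i] * a->gain);
}
```

Smoothing the gain across frames avoids abrupt level jumps. A practical AGC would typically also check for speech before adapting, so that background noise is not raised to the speech level.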
\end_layout

\begin_layout Standard
The voice activity detector (VAD) provided by the preprocessor is more advanced than the one directly provided in the codec.
\end_layout

\begin_layout Section
Adaptive Jitter Buffer
\end_layout