From 998e9e00fd01e0eff8cca0e752c48f2286804c13 Mon Sep 17 00:00:00 2001
From: Ralph Giles <giles@mozilla.com>
Date: Tue, 14 Jan 2014 15:40:16 -0800
Subject: [PATCH] Add Tim's gap repair text.

From http://www.ietf.org/mail-archive/web/codec/current/msg03030.html
---
 doc/draft-ietf-codec-oggopus.xml | 78 ++++++++++++++++++++++++++++++--
 1 file changed, 73 insertions(+), 5 deletions(-)

diff --git a/doc/draft-ietf-codec-oggopus.xml b/doc/draft-ietf-codec-oggopus.xml
index 6131e69ed..d7cca9f3c 100644
--- a/doc/draft-ietf-codec-oggopus.xml
+++ b/doc/draft-ietf-codec-oggopus.xml
@@ -245,13 +245,81 @@ All other pages with completed packets after the first MUST have a granule
 This guarantees that a demuxer can assign individual packets the same granule
  position when working forwards as when working backwards.
 For this to work, there cannot be any gaps.
-In order to support capturing a stream that uses discontinuous transmission
- (DTX), an encoder SHOULD emit packets that explicitly request the use of
- Packet Loss Concealment (PLC) (i.e., with a frame length of 0, as defined in
- Section 3.2.1 of <xref target="RFC6716"/>) in place of the packets that were
- not transmitted.
 </t>
 
+<section anchor="gap-repair" title="Repairing Gaps in Real-time Streams">
+<t>
+In order to support capturing a real-time stream that has lost packets, or that
+ uses discontinuous transmission (DTX), a muxer SHOULD emit packets that
+ explicitly request the use of Packet Loss Concealment (PLC) in place of the
+ packets that were not transmitted.
+Only gaps that are a multiple of 2.5 ms are repairable, as these are the
+ only durations that can be created by packet loss or DTX.
+Muxers need not handle other gap sizes.
+Creating the necessary packets involves synthesizing a TOC byte (defined in
+ Section 3.1 of <xref target="RFC6716"/>)---and whatever additional
+ internal framing is needed---to indicate the packet duration for each stream.
+The actual length of each missing Opus frame inside the packet is zero bytes,
+ as defined in Section 3.2.1 of <xref target="RFC6716"/>.
+</t>
+
+<t>
+<xref target="RFC6716"/> does not impose any requirements on the PLC, but this
+ section outlines choices that are expected to have a positive influence on
+ most PLC implementations, including the reference implementation.
+When possible, creating the TOC byte using the same mode, audio bandwidth,
+ channel count, and frame size as the previous packet (if any) covers all
+ losses that do not include a configuration switch, as defined in
+ Section 4.5 of <xref target="RFC6716"/>.
+This is the simplest and usually the most well-tested case for the PLC to
+ handle.
+If there is no previous packet, reasonable decoders will not emit anything
+ other than silence regardless of the mode.
+Using the CELT-only mode for this case (with any audio bandwidth) allows
+ maximum flexibility, since a single packet can represent any duration up to
+ 120 ms that is a multiple of 2.5 ms using at most two bytes.
+</t>
+
+<t>
+When a previous packet is available, keeping the audio bandwidth and channel
+ count the same allows the PLC to provide maximum continuity in the concealment
+ data it generates.
+However, if the size of the gap is not a multiple of the most recent frame
+ size, then the frame size will have to change for at least some frames.
+Delaying such changes as long as possible simplifies things for PLC
+ implementations.
+A 95 ms gap could be encoded as nineteen 5 ms frames in two bytes
+ with a single CBR code 3 packet.
+If the previous frame size was 20 ms, using four 20 ms frames,
+ followed by three 5 ms frames requires 4 bytes (plus an extra byte
+ of Ogg lacing overhead), but allows the PLC to use its well-tested steady
+ state behavior for as long as possible.
+The total bitrate of the latter approach, including Ogg overhead, is about
+ 0.4 kbps, so the impact on file size is minimal.
+</t>
+
+<t>
+Changing modes is discouraged, since this causes some decoder implementations
+ to reset their PLC state.
+However, SILK and Hybrid modes cannot fill gaps that are not a multiple of
+ 10 ms.
+If switching to CELT mode is needed to match the gap size, doing so at the end
+ of the gap allows the PLC to function for as long as possible.
+Since CELT does not support medium-band audio, using wideband when switching
+ from medium-band SILK ensures that any PLC implementation that does try to
+ migrate state between the modes will not be forced to artificially reduce the
+ bandwidth.
+</t>
+
+<t>
+The synthetic TOC byte MAY use any of codes 0, 1, 2, or 3 to pack the
+ frame(s) into a packet.
+If the TOC configuration matches, the muxer MAY combine the empty frames with
+ previous or subsequent non-zero-length frames (using code 2 or
+ VBR code 3).
+</t>
+</section>
+
 <section anchor="preskip" title="Pre-skip">
 <t>
 There is some amount of latency introduced during the decoding process, to
-- 
GitLab
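
Reviewer note (not part of the patch): the gap-splitting strategy described in the added text can be sketched roughly as below. This is an illustrative sketch only, not code from the draft or from libopus. The helper names (`toc_byte`, `plc_packet`, `fill_gap`) are hypothetical, durations are counted in 2.5 ms ticks, and it assumes CELT-only fullband mono (configurations 28 to 31 in RFC 6716, Section 3.1) with a previous frame size of 2.5, 5, 10, or 20 ms. It keeps the previous frame size for as long as possible and changes frame size only at the end of the gap.

```python
# Illustrative sketch; all names here are hypothetical, not from the draft.
CELT_FRAME_TICKS = (1, 2, 4, 8)   # 2.5, 5, 10, 20 ms in 2.5 ms ticks
MAX_PACKET_TICKS = 48             # a packet holds at most 120 ms of audio


def toc_byte(frame_ticks, code, config_base=28, stereo=False):
    """Build a TOC byte: config (5 bits) | stereo (1 bit) | code (2 bits).

    config_base=28 is assumed here to select CELT-only fullband.
    """
    config = config_base + CELT_FRAME_TICKS.index(frame_ticks)
    return (config << 3) | (int(stereo) << 2) | code


def plc_packet(frame_ticks, count, **kw):
    """One packet of `count` zero-length frames (a request for PLC)."""
    if count == 1:
        return bytes([toc_byte(frame_ticks, 0, **kw)])  # code 0: one frame
    # Code 3, CBR (v=0), no padding (p=0): second byte is the frame count.
    return bytes([toc_byte(frame_ticks, 3, **kw), count])


def fill_gap(gap_ticks, prev_ticks, **kw):
    """Split a gap into packets, changing frame size only at the end."""
    packets = []
    full, rem = divmod(gap_ticks, prev_ticks)
    per_packet = MAX_PACKET_TICKS // prev_ticks
    while full > 0:                       # frames at the previous size first
        n = min(full, per_packet)
        packets.append(plc_packet(prev_ticks, n, **kw))
        full -= n
    if rem:                               # leftover: largest size dividing it
        t = max(t for t in CELT_FRAME_TICKS if rem % t == 0)
        packets.append(plc_packet(t, rem // t, **kw))
    return packets


# 95 ms gap = 38 ticks.
print([p.hex() for p in fill_gap(38, 2)])  # previous frame size 5 ms
print([p.hex() for p in fill_gap(38, 8)])  # previous frame size 20 ms
```

With a 5 ms previous frame size, the 95 ms gap fits in one two-byte code 3 packet. With a 20 ms previous frame size, the sketch yields two two-byte packets (four 20 ms frames, then three 5 ms frames), matching the 4-byte figure in the added text. Ogg lacing and multistream framing are deliberately left out.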