flac has been tuned so that the default options yield a good speed vs. compression tradeoff for many kinds of input. However, if you are looking to maximize the compression rate or speed, or want to use the full power of FLAC's metadata system, this section is for you. If not, just skip to the next section.
The basic structure of a FLAC stream is:
- The four byte string "fLaC"
- The STREAMINFO metadata block
- Zero or more other metadata blocks
- One or more audio frames
The first four bytes are to identify the FLAC stream. The metadata that follows contains all the information about the stream except for the audio data itself. After the metadata comes the encoded audio data.
FLAC defines several types of metadata blocks (see the format page for the complete list). Metadata blocks can be any length and new ones can be defined. A decoder is allowed to skip any metadata types it does not understand. Only one is mandatory: the STREAMINFO block. This block has information like the sample rate, number of channels, etc., and data that can help the decoder manage its buffers, like the minimum and maximum data rate and minimum and maximum block size. Also included in the STREAMINFO block is the MD5 signature of the unencoded audio data. This is useful for checking an entire stream for transmission errors.
Other blocks allow for padding, seek tables, tags, cuesheets, and application-specific data. You can see flac options below for adding PADDING blocks or specifying seek points. FLAC does not require seek points for seeking but they can speed up seeks, or be used for cueing in editing applications.
Also, if you have a need of a custom metadata block, you can define your own and request an ID here. Then you can reserve a PADDING block of the correct size when encoding, and overwrite the padding block with your APPLICATION block after encoding. The resulting stream will be FLAC compatible; decoders that are aware of your metadata can use it and the rest will safely ignore it.
After the metadata comes the encoded audio data. Audio data and metadata are not interleaved. Like most audio codecs, FLAC splits the unencoded audio data into blocks, and encodes each block separately. The encoded block is packed into a frame and appended to the stream. The reference encoder uses a single block size for the whole stream but the FLAC format does not require it.
The block size is an important parameter to encoding. If it is too small, the frame overhead will lower the compression. If it is too large, the modeling stage of the compressor will not be able to generate an efficient model. Understanding FLAC's modeling will help you to improve compression for some kinds of input by varying the block size. In the most general case, using linear prediction on 44.1kHz audio, the optimal block size will be between 2-6 ksamples. flac defaults to a block size of 4608 in this case. Using the fast fixed predictors, a smaller block size is usually preferable because of the smaller frame header.
In the case of stereo input, once the data is blocked it is optionally passed through an inter-channel decorrelation stage. The left and right channels are converted to center and side channels through the following transformation: mid = (left + right) / 2, side = left - right. This is a lossless process, unlike joint stereo. For normal CD audio this can result in significant extra compression. flac has two options for this: -m always compresses both the left-right and mid-side versions of the block and takes the smallest frame, and -M, which adaptively switches between left-right and mid-side.
In the next stage, the encoder tries to approximate the signal with a function in such a way that when the approximation is subracted, the result (called the residual, residue, or error) requires fewer bits-per-sample to encode. The function's parameters also have to be transmitted so they should not be so complex as to eat up the savings. FLAC has two methods of forming approximations: 1) fitting a simple polynomial to the signal; and 2) general linear predictive coding (LPC). I will not go into the details here, only some generalities that involve the encoding options.
First, fixed polynomial prediction (specified with -l 0) is much faster, but less accurate than LPC. The higher the maximum LPC order, the slower, but more accurate, the model will be. However, there are diminishing returns with increasing orders. Also, at some point (usually around order 9) the part of the encoder that guesses what is the best order to use will start to get it wrong and the compression will actually decrease slightly; at that point you will have to you will have to use the exhaustive search option -e to overcome this, which is significantly slower.
Second, the parameters for the fixed predictors can be transmitted in 3 bits whereas the parameters for the LPC model depend on the bits-per-sample and LPC order. This means the frame header length varies depending on the method and order you choose and can affect the optimal block size.
Once the model is generated, the encoder subracts the approximation from the original signal to get the residual (error) signal. The error signal is then losslessly coded. To do this, FLAC takes advantage of the fact that the error signal generally has a Laplacian (two-sided geometric) distribution, and that there are a set of special Huffman codes called Rice codes that can be used to efficiently encode these kind of signals quickly and without needing a dictionary.
Rice coding involves finding a single parameter that matches a signal's distribution, then using that parameter to generate the codes. As the distribution changes, the optimal parameter changes, so FLAC supports a method that allows the parameter to change as needed. The residual can be broken into several contexts or partitions, each with it's own Rice parameter. flac allows you to specify how the partitioning is done with the -r option. The residual can be broken into 2^n partitions, by using the option -r n,n. The parameter n is called the partition order. Furthermore, the encoder can be made to search through m to n partition orders, taking the best one, by specifying -r m,n. Generally, the choice of n does not affect encoding speed but m,n does. The larger the difference between m and n, the more time it will take the encoder to search for the best order. The block size will also affect the optimal order.
An audio frame is preceded by a frame header and trailed by a frame footer. The header starts with a sync code, and contains the minimum information necessary for a decoder to play the stream, like sample rate, bits per sample, etc. It also contains the block or sample number and an 8-bit CRC of the frame header. The sync code, frame header CRC, and block/sample number allow resynchronization and seeking even in the absence of seek points. The frame footer contains a 16-bit CRC of the entire encoded frame for error detection. If the reference decoder detects a CRC error it will generate a silent block.
In order to support come common types of metadata, the reference decoder knows how to skip ID3V1 and ID3V2 tags so it is safe to tag FLAC files in this way. ID3V2 tags must come at the beginning of the file (before the "fLaC" marker) and ID3V1 tags must come at the end of the file.
flac has a verify option -V that verifies the output while encoding. With this option, a decoder is run in parallel to the encoder and its output is compared against the original input. If a difference is found flac will stop with an error.