Cyborg Music: LAME - Intro to Encoding

Constant Bit Rate (CBR) :::

CBR encoding is the basic encoding mode of MP3: the bitrate is kept constant across the entire file, which means the same number of bits is allocated to encode each second of audio. Internally, frames of audio data occur at regular, predictable intervals, giving a predictable file size for a given duration. CBR is therefore the "opposite" of VBR.
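The predictable size follows directly from the frame layout. Below is a minimal sketch. It assumes MPEG-1 Layer III (1152 samples per frame) and ignores ID3 tags and any VBR/Info header frame. The function names are hypothetical and are not part of LAME:

```python
# Minimal sketch: CBR frame and file sizes for MPEG-1 Layer III.
SAMPLES_PER_FRAME = 1152  # MPEG-1 Layer III

def frame_length_bytes(bitrate_bps: int, sample_rate: int, padding: bool = False) -> int:
    """Bytes in one frame, header included (1152 samples / 8 bits per byte = 144)."""
    return (SAMPLES_PER_FRAME // 8) * bitrate_bps // sample_rate + (1 if padding else 0)

def frame_duration_ms(sample_rate: int) -> float:
    """Every frame holds the same number of samples, so frames arrive at a fixed rate."""
    return 1000 * SAMPLES_PER_FRAME / sample_rate

def cbr_audio_bytes(bitrate_bps: int, seconds: float) -> float:
    """Approximate size of the audio frames only (tags are not counted)."""
    return bitrate_bps * seconds / 8

print(frame_length_bytes(128_000, 44_100))        # 417
print(frame_length_bytes(128_000, 44_100, True))  # 418
print(round(frame_duration_ms(44_100), 2))        # 26.12
print(cbr_audio_bytes(128_000, 60))               # 960000.0
```

At 44.1 kHz, some frames carry one padding byte. This keeps the average frame length (about 417.96 bytes at 128 kbps) matched to the nominal bitrate.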

That said, in some formats the number of bits that carry actual audio information can vary somewhat from frame to frame. In MP3 this takes the form of the bit reservoir. In a CBR MP3, even though the frames are of a fixed size, the audio data is not necessarily distributed evenly among them. The audio for one frame might need fewer bits than the frame provides, so the spare bits go into a reservoir that later frames can draw on. Thus the effective bitrate may vary somewhat in a CBR MP3, even though the number of frames for a given duration of audio is fixed. For example, in a 256 kbps file a single frame can use up to the equivalent of 320 kbps, but only if the preceding frame(s) left those bits unused; in VBR there is no such restriction. Consequently, the variability across a CBR MP3 is not as great as that afforded by VBR, but it is not insignificant: a CBR encoder that does not use the reservoir efficiently will likely produce a lower-quality file than one that does.
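Here is a toy model of the reservoir described above. It is a simplification and not LAME's actual bit allocation. Each frame is treated as a fixed byte budget, with header and side information ignored. `demands` is a hypothetical list of the bytes each frame would like to use. Unused space carries forward up to 511 bytes, the limit implied by MPEG-1's 9-bit `main_data_begin` back-pointer:

```python
MAX_RESERVOIR = 511  # MPEG-1: main_data_begin is 9 bits, so data can start at most 511 bytes back

def simulate_reservoir(demands, frame_bytes, max_reservoir=MAX_RESERVOIR):
    """Return the bytes actually used by each frame of a CBR stream (toy model)."""
    reservoir = 0
    used = []
    for want in demands:
        available = frame_bytes + reservoir  # this frame's own space plus earlier leftovers
        got = min(want, available)           # a demanding frame gets at most what is available
        used.append(got)
        # leftover space carries forward, but only up to the reservoir limit
        reservoir = min(available - got, max_reservoir)
    return used

# 835 bytes is one 256 kbps frame at 44.1 kHz; 1044 bytes is roughly one 320 kbps frame.
print(simulate_reservoir([600, 700, 1044, 835], frame_bytes=835))  # [600, 700, 1044, 835]
```

The third frame gets its 1044 bytes only because the first two frames left 370 bytes unused. The same demand in the very first frame would be capped at 835.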

ABR is a more flexible alternative when file size matters. It still leaves the encoder some freedom in choosing frame sizes.

Who should use CBR
CBR is useful for people who are concerned about maintaining maximum compatibility, especially with certain streaming applications and some hardware-based decoders that don't reliably support VBR.

CBR is also useful for people who desire the ability to obtain accurate estimates of the bitrate or approximate duration of a file's decoded audio without scanning and partially decoding the entire file.
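Because every frame has the same bitrate, reading a single frame header is enough to estimate the duration. Below is a minimal sketch. It assumes an MPEG-1 Layer III stream whose first four bytes are a frame header, and an `audio_bytes` value that already excludes ID3 tags. The helper names are hypothetical:

```python
# MPEG-1 Layer III tables (bitrate index 0 = "free format", 15 = invalid; sample-rate index 3 = reserved)
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES = [44100, 48000, 32000, None]

def parse_header(header: bytes):
    """Return (bitrate in bit/s, sample rate in Hz) from a 4-byte frame header."""
    h = int.from_bytes(header[:4], "big")
    if ((h >> 21) & 0x7FF) != 0x7FF:
        raise ValueError("no frame sync")
    if ((h >> 19) & 0b11) != 0b11 or ((h >> 17) & 0b11) != 0b01:
        raise ValueError("not MPEG-1 Layer III")
    kbps = BITRATES_KBPS[(h >> 12) & 0xF]
    rate = SAMPLE_RATES[(h >> 10) & 0b11]
    if kbps is None or rate is None:
        raise ValueError("free-format or reserved value")
    return kbps * 1000, rate

def estimate_cbr_duration(audio_bytes: int, first_header: bytes) -> float:
    """Seconds of audio, valid only if every frame shares the first frame's bitrate."""
    bitrate, _ = parse_header(first_header)
    return audio_bytes * 8 / bitrate

print(estimate_cbr_duration(4_000_000, bytes.fromhex("FFFB9064")))  # 250.0 (128 kbps, 44.1 kHz)
```

For a VBR file the same shortcut gives a wrong answer, which is why players either scan the frames or rely on an extra header that stores the frame count.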

Average Bit Rate (ABR) :::

ABR is a mix between CBR and VBR.

Like CBR, the files will have the (approximate) bitrate specified on the command line, and the encoder uses the CBR algorithm to compute the number of bits needed to encode each frame.

Like VBR, the files will use different frame bitrates. Instead of relying on the bit reservoir as CBR does, each frame simply uses the smallest bitrate that can encode it.
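Using "the smallest bitrate that can encode it" amounts to choosing the first frame size, from the fixed list of Layer III bitrates, that holds the frame's data. The sketch below makes several assumptions: MPEG-1 at 44.1 kHz, no padding, and a hypothetical `needed_bytes` that already includes the header and side information. Clamping at 320 kbps is a simplification; a real encoder would have to quantize more coarsely instead:

```python
LAYER3_BITRATES_KBPS = [32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320]

def frame_bytes(kbps: int, sample_rate: int = 44_100) -> int:
    """Size of an unpadded MPEG-1 Layer III frame at this bitrate."""
    return 144 * kbps * 1000 // sample_rate

def smallest_bitrate_for(needed_bytes: int, sample_rate: int = 44_100) -> int:
    """Lowest standard bitrate whose frame can hold needed_bytes."""
    for kbps in LAYER3_BITRATES_KBPS:
        if frame_bytes(kbps, sample_rate) >= needed_bytes:
            return kbps
    return LAYER3_BITRATES_KBPS[-1]  # nothing fits: clamp to the maximum

print([smallest_bitrate_for(n) for n in (100, 400, 900)])  # [32, 128, 320]
```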

The difference between ABR and true VBR is in how the desired number of bits is chosen. The true VBR mode determines the number of bits based on the quantization noise. VBR figures out how many bits are needed so that the quantization noise is less than the allowed masking.

ABR mode uses the CBR formula to determine the desired number of bits. This formula is based on the perceptual entropy, which is a rough measure of how difficult the frame is to encode.

Most frames in a VBR MP3 produced with an ABR method are normally at or near the target bitrate chosen by the user, but each frame can still vary within the normal range of 8 to 320 kbps. The ABR encoder will typically limit the range of bitrates it can choose from, or will greatly favor certain bitrates, in an effort to ensure that the average comes out near the target.

Who should use ABR
ABR encoding is desirable for users who want the general benefits of VBR (an optimum bitrate from frame to frame) but with a relatively predictable file size like they would get with constant bitrate (CBR), and a greater preference for bitrates that are near a desired target. Inevitably, some frames will be encoded with more bits than necessary, but the result will always be equal to or better than that of CBR for the target bitrate.

Variable Bit Rate (VBR) :::

In Variable Bitrate (VBR) coding, the user chooses a desired quality level instead of a bitrate. A correct implementation should be able to maintain the same quality perception, changing the bitrate to a higher or lower one whenever the audio file is more or less complex. With MP3, this is not always possible.

VBR encoding is the logical way of encoding data. General-purpose data compressors (such as .zip and .rar) are variable-rate, and so are lossless codecs. Given a quality value, the encoder decides for each frame which bitrate is most appropriate to maintain that quality.

The main advantage of using VBR is that the encoder will use the smallest number of bytes needed to achieve the requested quality. The drawback is that the file size is quite unpredictable. The average bitrate can differ by more than 50 kbps from one file to another, and sizes can nearly double depending on the genre and on quasi-mono content.

Mid/Side Stereo :::

For years, what is called joint stereo has been misunderstood.
Joint stereo in MP3 is a mechanism to selectively choose between three modes of storing stereo information. These three modes are Simple Stereo, Mid-Side Stereo, and Intensity-Stereo.

In Simple Stereo, the encoder analyzes the left and the right channels independently and stores the information as-is, without further checking the similarities in the signal.¹

In Mid-Side Stereo, the encoder analyzes the left, right², mid (l+r) and side (l-r) channels. It then gives more bits to the mid channel than to the side channel (as the side channel is usually less complex), and stores only the mid and side channels in the resulting MP3.
This way, the mid channel can be encoded as if the frame were bigger, and thus with higher quality at the same bitrate.
Note: Mid/side in MP3 is switched frame-by-frame. In AAC, it can be switched band by band.
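Here is a small numeric sketch of why mid/side pays off for highly correlated channels. It uses the simple scaling mid = (l+r)/2 and side = (l-r)/2, so decoding is just l = mid + side and r = mid - side. The MP3 standard scales by 1/√2 instead. The test signal is a made-up example:

```python
import math

def to_mid_side(left, right):
    """Encode: mid carries what both channels share, side carries their difference."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side

def from_mid_side(mid, side):
    """Decode: the transform is lossless before quantization."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right

def energy(x):
    return sum(v * v for v in x)

# Made-up, strongly correlated input: right is a slightly quieter copy of left.
left = [math.sin(2 * math.pi * 440 * n / 44_100) for n in range(1024)]
right = [0.9 * v for v in left]

mid, side = to_mid_side(left, right)
print(energy(side) / energy(mid))  # about 0.0028: almost nothing is left in the side channel
```

With so little energy in the side channel, nearly all of the frame's bits can go to the mid channel.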

Intensity-Stereo (not supported in LAME) uses a technique known as joint frequency encoding, which is based on the principle of sound localization.
Human hearing is predominantly less acute at perceiving the direction of certain audio frequencies. By exploiting this 'limitation', intensity stereo coding can reduce the data rate of an audio stream with little or no perceived change in apparent quality.
It works by merging the upper spectrum into just one channel (thus reducing overall differences between channels) and transmitting a little side information about how to pan certain frequency regions.
This type of coding does not perfectly reconstruct the original audio because of the loss of information and can cause unwanted artifacts. However, for very low bitrates this tool usually provides a gain of perceived quality.³
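The following toy illustration applies the intensity-stereo idea to a single high-frequency band. The encoder keeps only the summed signal plus one pan value, and the decoder splits the sum back using that value. This is not how the MP3 bitstream encodes intensity positions. The functions are hypothetical and only show why a pure pan is recovered exactly while genuinely different channels are not:

```python
import math

def energy(x):
    return sum(v * v for v in x)

def intensity_encode(left_band, right_band):
    """Keep one summed channel plus a single amplitude-based pan value for the band."""
    mono = [l + r for l, r in zip(left_band, right_band)]
    al = math.sqrt(energy(left_band))
    ar = math.sqrt(energy(right_band))
    pan = al / (al + ar) if al + ar > 0 else 0.5
    return mono, pan

def intensity_decode(mono, pan):
    """Split the summed band back into two channels using the pan value."""
    return [m * pan for m in mono], [m * (1 - pan) for m in mono]

# Same waveform, panned toward the left: recovered up to rounding.
print(intensity_decode(*intensity_encode([1.0, -2.0, 0.5], [0.5, -1.0, 0.25])))
# Different content in each channel: the difference between them is lost.
print(intensity_decode(*intensity_encode([1.0, 0.0], [0.0, 1.0])))  # ([0.5, 0.5], [0.5, 0.5])
```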

The LAME mid/side switching criterion and mid/side masking thresholds are taken from Johnston and Ferreira, "Sum-Difference Stereo Transform Coding," Proc. IEEE ICASSP (1992), pp. 569-571.

The MPEG AAC standard claims to use mid/side encoding based on this paper.

1. This is not the same as dual mono. Dual mono should be used when the left and right channels of the input file contain two different streams, of which the listener should choose one (as with two different languages).
2. If one channel has much less noise masking in a certain band than the other, the noise spread by mid/side stereo may no longer be masked for that channel. If both channels have the same masking, then the noise spread between both channels will be equally well masked.
To prevent this from happening, there is an analysis done on the left and right channel to determine the noise masking thresholds and properly mask the noise.
3. Quoted from Wikipedia, Joint_stereo.

Information extracted from LAME 3.99 Help

LAME Official Home Site ::: http://lame.sourceforge.net/


Friday, April 13, 2012

