Ogg Vorbis Psycho Acoustic Models Explained
The contents of this page come from Beni.

Posted 24oct2001

Your guide (Ripping, Page 6 of 7) says:

While MP3's version of VBR uses psycho-acoustic models, Ogg is based solely 
on noise, masking, coupling and ATH thresholds. I have no idea of what that 
means, but I read that's how I heard it works. =)

These *are* aspects of a psycho-acoustic model. It's like saying, "Most
people eat soup, but he eats boiled water with vegetables and other
additions." I'm no specialist, but I've got a good idea of how they work.
BTW, I'm also a big fan of Vorbis, so you'll see my bias ;).

                                                ***

ATH - Absolute Threshold of Hearing - the ear can't physically hear a 
sound quieter than this, no matter what happens on other frequencies.  

Vorbis applies it more carefully than other codecs. Most codecs assume the
playback volume is fixed, whereas Vorbis assumes that the volume can be
adjusted (which it can). Specifically, it assumes that you set the volume so
that the strongest frequency present won't kill your ears 8-[.
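
For a rough feel of what an ATH curve looks like, here is a minimal Python
sketch. It uses a widely quoted closed-form approximation of the threshold in
quiet (usually attributed to Terhardt); that choice is my assumption for
illustration and is not the table Vorbis actually uses. The function names and
the 100 dB figure in `relative_ath_db` are hypothetical, made up only to
illustrate the "volume is adjustable" idea.

```python
import math

def ath_db_spl(freq_hz):
    """Approximate threshold in quiet, in dB SPL (Terhardt-style formula).

    Illustration only: NOT the curve any particular codec ships with.
    """
    k = freq_hz / 1000.0  # frequency in kHz
    return (3.64 * k ** -0.8
            - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2)
            + 1e-3 * k ** 4)

def relative_ath_db(freq_hz, peak_db, headroom_db=100.0):
    """Hypothetical 'adjustable volume' variant.

    Assumes the listener sets the volume so that the loudest component
    (peak_db, in the file's own dB scale) plays at headroom_db dB SPL,
    then maps the absolute curve back into the file's scale.
    """
    return ath_db_spl(freq_hz) - headroom_db + peak_db

# The curve is high at low frequencies, dips around 3-4 kHz, then rises.
for f in (100, 1000, 3300, 16000):
    print(f, round(ath_db_spl(f), 1))
```

With a fixed-volume assumption a codec would compare levels against
`ath_db_spl` directly; the relative variant slides the whole curve up or down
depending on the loudest thing in the signal.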


Tone Masking - this is just when louder frequencies mask out adjacent
quieter ones. The ear and brain cannot hear a quiet sound at one frequency
if there is a loud sound at a nearby one. The further apart two frequencies
are, the louder one must be in order to mask the other; frequencies that are
close together mask each other more easily.
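
To make that concrete, here is a toy Python sketch of a masking threshold
that drops off with frequency distance. The Bark-scale conversion is a
standard approximation; the triangular shape, the 10 dB offset and the two
slopes are illustrative assumptions of mine, not values taken from Vorbis or
any other codec.

```python
import math

def hz_to_bark(freq_hz):
    """Zwicker-style approximation of the Bark (critical-band) scale."""
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))

def masked_threshold_db(masker_hz, masker_db, probe_hz,
                        offset_db=10.0, lower_slope=27.0, upper_slope=12.0):
    """Level below which a probe tone is assumed hidden by the masker.

    Toy triangular spreading function: the threshold starts offset_db
    below the masker and falls off linearly with distance in Bark.
    All three numeric defaults are illustrative assumptions.
    """
    dz = hz_to_bark(probe_hz) - hz_to_bark(masker_hz)
    # Masking reaches further towards higher frequencies than lower ones.
    slope = upper_slope if dz >= 0 else lower_slope
    return masker_db - offset_db - slope * abs(dz)

def is_masked(masker_hz, masker_db, probe_hz, probe_db):
    return probe_db < masked_threshold_db(masker_hz, masker_db, probe_hz)

# A 70 dB tone at 1 kHz against 40 dB probes nearby and far away.
print(is_masked(1000, 70, 1100, 40))  # True: close neighbour is hidden
print(is_masked(1000, 70, 4000, 40))  # False: too far away to be masked
```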

Things aren't always that simple though. Think about what happens when you
hit a cymbal. A cymbal doesn't really play a tone, it generates broadband
noise. Noise masking determines how much extra noise the encoder can add
to that noise without it being audible.

This is generally trickier than tone masking, and it's one of the things
that improved a lot in Vorbis RC3 (though RC2 already did it much better
than LAME, for example; LAME developers are now trying to imitate the
Vorbis method).

ATH and masking are used by all perceptual codecs (including MP3) to model
what a human ear will and won't hear. They are not specific to VBR. ABR
approximates what's left after psycho-acoustics as well as possible within
the given bit-rate limit. VBR approximates the same thing using as many bits
as it takes.

[The gory details: in order to make the bitrate lower than full-quality VBR,
one makes ATH, masking, etc. more aggressive than necessary. This also
throws out some things that could still be heard, but not very well.]
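
A toy Python sketch of that difference, under loudly stated assumptions: each
band is reduced to a (signal level, masking threshold) pair in dB, every bit
is taken to buy about 6 dB of signal-to-noise ratio, and "more aggressive"
means raising all thresholds by the same amount. The band numbers and
function names are made up; real encoders are far more subtle.

```python
DB_PER_BIT = 6.02  # rule of thumb: each extra bit buys about 6 dB of SNR

def bits_for_band(level_db, threshold_db):
    """Bits needed to push quantization noise under the threshold."""
    return max(0.0, (level_db - threshold_db) / DB_PER_BIT)

def total_bits(bands, extra_db=0.0):
    """bands is a list of (signal level, masking threshold) pairs in dB."""
    return sum(bits_for_band(level, thr + extra_db) for level, thr in bands)

def fit_to_budget(bands, budget_bits, step_db=0.5):
    """Raise every threshold in step_db increments until the frame fits."""
    if budget_bits < 0:
        raise ValueError("budget_bits must not be negative")
    extra_db = 0.0
    while total_bits(bands, extra_db) > budget_bits:
        extra_db += step_db
    return extra_db

bands = [(60.0, 30.0), (50.0, 38.0), (40.0, 45.0)]  # made-up numbers
print(total_bits(bands))          # "VBR": spend whatever it takes
print(fit_to_budget(bands, 5.0))  # "ABR": extra dB of aggressiveness needed
```

The third band is already below its threshold and costs nothing in either
mode; the budget is met purely by letting more noise through in the other two.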


Coupling - a more generic term for concepts like mid/side stereo, intensity
stereo, etc., in MP3 and other codecs. The idea is that sound often consists
of several channels with redundancy between them. This redundancy can be
exploited to lower the bit-rate if the channels are encoded in some joint
representation.

The simplest example is to encode the average of and the difference between
the channels (for a stereo sound) - this is called mid/side representation,
and it requires far fewer bits for sections that are close to mono.
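
The mid/side idea fits in a few lines of Python. This is a minimal sketch
with made-up sample values and hypothetical function names, not code from any
encoder:

```python
def to_mid_side(left, right):
    """Turn left/right sample lists into (mid, side) lists."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side

def to_left_right(mid, side):
    """Invert to_mid_side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right

# Nearly mono material: the two channels differ only slightly.
left = [100, 220, -140, 60]
right = [98, 222, -140, 58]
mid, side = to_mid_side(left, right)
print(mid)   # [99.0, 221.0, -140.0, 59.0]
print(side)  # [1.0, -1.0, 0.0, 1.0]
```

The side values are tiny compared to the originals, so they need far fewer
bits, and `to_left_right(mid, side)` gives back the original channels.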

Psycho-acoustics appear here too: the ability of the ear and brain to
tell apart the sound reaching each ear is limited. A lossy format
can cheat to some degree (e.g. encode the difference at lower quality).

Compare with color television (analog). It transmits a full-resolution
intensity image (brightness), yet transmits coloring information at only 
half-resolution, because the eye perceives color with less resolution
and the brain cares less for color changes.  That's a lossy coupling 
method based on a psycho-visual model ;-).

Vorbis supports files with up to 255 channels. That's why the term stereo 
is avoided. Also mid/side seems to be patented. For now, the encoder knows 
to use coupling for 2-channel files only, but eventually it will scale.  

Vorbis uses a flexible format for coupling that allows it to cheat in different 
ways at different frequencies, or not to cheat at all (lossless stereo imaging 
in lossy audio - still gains on the redundancy!). 

It's elegantly based on Vorbis' compression technique. Vector quantization 
means that it encodes *joint* approximations to groups of numbers. 

If you group together numbers describing different channels, your channels 
become automatically coupled (normally a group would be picked from data 
describing a single channel, so channels would be approximated independently). 
Details are explained here:
http://www.xiph.org/ogg/vorbis/doc/stereo.html
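
As a rough illustration of "grouping numbers from different channels", here
is a small Python sketch of nearest-codeword vector quantization on
(left, right) pairs. The codebook entries and sample values are hypothetical,
and real Vorbis codebooks and vector layouts are much more elaborate; the
only point is that one index ends up describing both channels at once.

```python
def nearest(codebook, vector):
    """Index of the codebook entry closest to vector (squared distance)."""
    def dist2(entry):
        return sum((a - b) ** 2 for a, b in zip(entry, vector))
    return min(range(len(codebook)), key=lambda i: dist2(codebook[i]))

# Hypothetical 4-entry codebook of (left, right) pairs.
codebook = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.8), (-1.0, -1.0)]

left = [0.9, 0.1, -1.1]
right = [0.7, 0.0, -0.9]

# Coupled: each index is a joint approximation of a left AND a right value.
indices = [nearest(codebook, pair) for pair in zip(left, right)]
decoded = [codebook[i] for i in indices]
print(indices)  # [2, 0, 3]
print(decoded)
```

Had the vectors been built from consecutive values of a single channel
instead, the same code would approximate each channel independently.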


Noise - I'm not sure what you are referring to in this case. How is Vorbis
based on noise? Where did you see it? Formats like PlusV and MP3Pro's SBR
invent noise quite shamelessly, in a way that you could believe to be the
original. ;)

Perhaps this is what you mean:

The process of Vector Quantization introduces some "[vector] quantization
noise" - the difference between the original group of numbers and its
approximation (only a limited number of approximations can be chosen from).

Don't worry: all codecs suffer from quantization problems. VQ's whole
purpose is to suffer less from them by exploiting correlations among a group
of numbers - the joint noise can be lower than the noise introduced by
approximating each number independently.
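
A deliberately extreme Python example shows the effect. The data, the levels
and both codebooks are made up for illustration: the pairs are perfectly
correlated, and both schemes get the same budget of 2 bits per pair.

```python
def mean_sq_error(points, codebook):
    """Average squared distance from each point to its nearest codeword."""
    total = 0.0
    for p in points:
        total += min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in codebook)
    return total / len(points)

# Perfectly correlated pairs (t, t), e.g. the same value in two channels.
points = [(t, t) for t in range(8)]

# 2 bits spent independently: 1 bit per number, levels 1.5 and 5.5.
levels = (1.5, 5.5)
independent = [(x, y) for x in levels for y in levels]

# 2 bits spent jointly: 4 codewords placed where the data actually is.
joint = [(0.5, 0.5), (2.5, 2.5), (4.5, 4.5), (6.5, 6.5)]

print(mean_sq_error(points, independent))  # 2.5
print(mean_sq_error(points, joint))        # 0.5
```

The independent scheme wastes two of its four combinations on pairs like
(1.5, 5.5) that never occur, while the joint codebook spends all of them along
the diagonal. Real signals are less tidy, so the gain is smaller.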

See http://data-compression.com/vq.html for an explanation (the math can
be skipped; read paragraphs I, II, and the beginnings of III and V. IV is
easy too. Be sure to click the animation at VI!). BTW, VQF also uses VQ.

Anyway, this noise is introduced *after* all psycho-acoustic modeling and
simplification of the signal. It's one of the very last steps. The trick is to 
ensure that it's unnoticeable. 

A good codec will predict how much of this noise can be tolerated at a given 
point of the signal and decide to allocate additional bits for a more precise 
approximation if necessary. 

I don't know how Vorbis handles it, but with most codecs this is only a minor
problem compared to getting a good psycho-acoustic model. Basically you allow
just enough possible approximation values to make the noise unnoticeable, and
live with the number of bits that this takes.
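
For plain (scalar, uniform) quantization this can be sketched in Python as
follows. It relies on the textbook approximation that a uniform quantizer with
step size `step` adds noise of power `step**2 / 12`; the function names and
the example numbers are hypothetical, and this is not how Vorbis itself
allocates bits.

```python
def quantizer_noise_power(value_range, bits):
    """Noise power of a uniform quantizer: step**2 / 12."""
    step = value_range / (2 ** bits)
    return step * step / 12.0

def bits_needed(value_range, allowed_noise_power, max_bits=32):
    """Smallest bit count whose quantization noise stays within the limit."""
    for bits in range(max_bits + 1):
        if quantizer_noise_power(value_range, bits) <= allowed_noise_power:
            return bits
    raise ValueError("allowed noise is too small for max_bits")

# Values spanning a range of 2.0 (say -1.0 to 1.0).
print(bits_needed(2.0, 1e-4))  # 6: little noise tolerated, more bits
print(bits_needed(2.0, 1e-2))  # 3: heavy masking, fewer bits suffice
```

The allowed noise power is where the psycho-acoustic model comes in: the more
masking there is at that point of the signal, the larger it can be.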

I hope this was helpful. It helped me organize these ideas in my head. For
more details you can ask on the Vorbis mailing-list <vorbis@xiph.org> or
contact Monty <http://www.vorbis.com/contact.psp>. He's the main Vorbis
developer.

Beni Cherniavsky <cben@tx.technion.ac.il>
