Audio Compression

Tim Reid


Introduction

A series of trigonometric functions can be used to interpolate a set of data points. Interpoltation can be used as as an easy way to implement a least squares approximation using the same trigonometric functions. The coefficients to the functions in the series can be found using discrete Fourier transforms. For a data vector of length \(n\), the Fourier transforms give \(n\) coefficients for a series of \(n\) terms that interpolates the data. To get a least squares approximation around \(m \leq n\) data points, the first \(m\) terms in the series are taken. In the case of audio compression the sound vector is appoximated using a series of only cosine functions. This is because the cosine Fourier transforms are real valued when a real valued vector is used as opposed to the regular Fourier transforms which are complex valued.

The type of cosine transform used in audio compression is a modified version of the version 4 discrete cosine transform. The transform matrix for this modified discrete cosine transform (MDCT) is given by: \[M_{i,j} = \sqrt{\frac{2}{n}} \cos \left( \frac{(i+1/2)(j+n/2+1/2)\pi }{n} \right). \] The transform matrix is a \(n \times 2n \) sized matrix. Since it is rectangular, it cannot be inverted like a regular cosine trasfrorm matrix, but instead it uses its transpose and overlaps the data intervals so that the values in the transform can be averaged to increase accuracy. In audio compression the lenght of the sampling window is 64, and every 32 sound values is overlapped into two MDCT matrices.

I will use the MDCT method to explore audio compression on different sound signals. All parts use the same function which is here



Use of MDCT Audio Compression on Pure Tones

When MDCT audio comprssion is used on pure tones whose frequencies are multiples of 64 Hz, the compressed then uncompressed sound has very little distortions in it some of the time. When done on multiples of 64Hz, the distortions appear every two frequencies. The RMSE errors for some multiples of 64Hz are in a table below:

Frequency 64Hz 128Hz 192Hz 256Hz 320Hz 384Hz 448Hz 512Hz
Error .0073 0.0468 .0032 .0468 .0106 .0469 .0037 .0439

The RMSE does seem to follow the qualitative pattern of the distorted sound. The closer the frequency is to an even multiple of 64Hz, the worse the RMSE is. This may be because the frame size is 64. The RMSE plotted over a wide range of frequencies can be seen below. The error gets high near multiples of 64Hz.
Here is an example of a pure tone (320Hz) and its uncompressed version when the uncompressed version has little distortion. The original is on the left and the uncompressed is on the right.



Using a Windowing Function

A windowing function can be used to remove some of the distortion from the uncompressed audio. The windowing function is applied when vector \(h\) is multiplied by the length 64 intervals before the MDCT is applied and then undone by multiplying the vector \(h\) by the intervals after the MDCT is inverted. The \(h\) vector is \[h_j=\sqrt{2}\sin \left(\frac{(j+1/2)\pi }{2n} \right). \] Undoing the windowing function after the MDCT is inverted is valid because since \(h\) is multiplied every 64 entries of the sound vector, it will have the same effect on the sound vector in each of its four partitions that are needed to invert the MDCT.

Qualitatively the windowing function seems to invert which of the tones are distorted. This means it corrects distorted ones and distorts correct ones. This can be seen in the plots which are here. The effect it has on the correct sound from the above 320Hz pure tone:

The qualitative observation that the windowing function increases the error of odd multiples of 64Hz is supported by the data:
Frequency 64Hz 128Hz 192Hz 256Hz 320Hz 384Hz 448Hz 512Hz
Error .0188 0.0296 .0243 .0256 .0318 .0127 .0243 .0273

Since the windowing function decreases error near even multiples of 64Hz and increases the error near odd multiples of 64Hz, the errors are more evenly distributed:


Building Chords

A chord can be built by adding pure tones. Adding them when they are in a 2:1 ratio creates an octave, when they a 3:2 ratio creates a fifth, and a 5:4 ratio creates a third. The plots of a chord when a windowing function is and is not used are below:

The sound from this third in its original form, uncompressed form, and windowed uncompressed form respectively are below:

The RMSE for the chord depends on the amount of bits used in compression. The effect this has is in the table below. (All values are for the 384+256HZ third chord)

Bits 2 3 4 5 6 7 8
Error .0343 0.0138 .0069 .0029 .0017 \(8.42 \times 10^{-4}\) \(4.24 \times 10^{-4}\)

The errors should decrease when the bit amount is increased, and qualitatively the sound quality increased:



Importing Music

The sound vector from music can be imported to the program. For an example, the "Hallelujah Chorus" from Handel's Messiah was used. When a windowing function is both used the RMSE is .10757. This is probably because the signals are out of phase. The plots for this music are below:

Qualitatively the sound is worse in the windowed version and this can be heard the sound below, they are from left to right: original, uncompressed, and uncompressed and windowed.



Importance Sampling

Importance sampling is when the amount of bits varies depending on how large the MDCT transformed vector entries are. I believe a way to determine the bits is to use the function \[b=Ceiling \left(\frac{\ln (Max(|y|)+1.5)}{ \ln(2)} \right)+1. \] This way the there is enough bits to express the median value in the form of powers of 2. In my attempt at implementation of this on the "Hallelujah Chorus" the sound quality was similar to the original compression, indicating that this method needs refinement.

Input and Output

The function has been rewritten to have one version be the input and one be the output. THe input version collects the bits and outputs them in a matrix with each column being one length 32 frame. The output then continues from here to reconstruct those bits into a sound. The functions are here: Input and Output
Numerical Analysis II Homepage

Tim Reid Homepage