A series of trigonometric functions can be used to interpolate a set of data points. Interpolation also gives an easy way to implement a least squares approximation using the same trigonometric functions. The coefficients of the functions in the series can be found using discrete Fourier transforms. For a data vector of length \(n\), the Fourier transform gives \(n\) coefficients for a series of \(n\) terms that interpolates the data. To get a least squares approximation with \(m \leq n\) terms, the first \(m\) terms of the series are kept. In the case of audio compression, the sound vector is approximated using a series of only cosine functions. This is because the cosine transform of a real-valued vector is real valued, whereas the regular Fourier transform is in general complex valued.
The type of cosine transform used in audio compression is a modified version of the version 4 discrete cosine transform. The transform matrix for this modified discrete cosine transform (MDCT) is given by: \[M_{i,j} = \sqrt{\frac{2}{n}} \cos \left( \frac{(i+1/2)(j+n/2+1/2)\pi }{n} \right), \] with \(i = 0, \dots, n-1\) and \(j = 0, \dots, 2n-1\), so the transform matrix is an \(n \times 2n\) matrix. Since it is rectangular, it cannot be inverted like a regular cosine transform matrix. Instead, its transpose is applied and the data intervals are overlapped, so that the values from overlapping intervals can be averaged to reconstruct the signal. In audio compression the length of the sampling window is 64 (so \(n = 32\)), and every block of 32 sound values is covered by two overlapping MDCT windows.
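The overlap-and-average step can be sketched in a few lines. This is only an illustrative sketch in plain Python, not the function used for the experiments: the helper names (`mdct_matrix`, `forward`, `transpose_apply`) and the test signal are assumptions made for the example. It builds \(M\) from the formula above with \(n = 32\), transforms two overlapping length-64 windows, applies the transpose, and averages the two copies of the shared segment.

```python
import math

def mdct_matrix(n):
    """Build the n x 2n matrix M from the formula above (0-based i and j)."""
    scale = math.sqrt(2.0 / n)
    return [[scale * math.cos((i + 0.5) * (j + n / 2 + 0.5) * math.pi / n)
             for j in range(2 * n)]
            for i in range(n)]

def forward(M, block):
    """y = M z: 2n samples in, n coefficients out."""
    return [sum(m * z for m, z in zip(row, block)) for row in M]

def transpose_apply(M, coeffs):
    """w = M^T y: n coefficients in, 2n values out."""
    return [sum(M[i][j] * coeffs[i] for i in range(len(M)))
            for j in range(len(M[0]))]

n = 32
M = mdct_matrix(n)

# Hypothetical test signal: three consecutive segments of n samples each.
x = [math.cos(0.1 * t) + 0.5 * math.sin(0.37 * t) for t in range(3 * n)]

w1 = transpose_apply(M, forward(M, x[0:2 * n]))  # window over segments 1 and 2
w2 = transpose_apply(M, forward(M, x[n:3 * n]))  # window over segments 2 and 3

# Segment 2 appears in both windows; average the two copies.
middle = [(w1[n + t] + w2[t]) / 2 for t in range(n)]
max_err = max(abs(middle[t] - x[n + t]) for t in range(n))
print(max_err)  # rounding-level error when nothing is quantized
```

Neither `w1` nor `w2` alone reproduces the middle segment. Each one contains the segment plus or minus its own reversal, and the reversed parts cancel only when the two overlapping halves are averaged.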
I will use the MDCT method to explore audio compression on different sound signals. All parts use the same function, which is here.
When MDCT audio compression is applied to pure tones whose frequencies are multiples of 64 Hz, the compressed and then decompressed sound has very little distortion for some of the frequencies. The distortion appears at every second multiple: the even multiples of 64 Hz have noticeably larger error than the odd multiples. The RMSE values for some multiples of 64 Hz are in the table below:
Frequency | 64 Hz | 128 Hz | 192 Hz | 256 Hz | 320 Hz | 384 Hz | 448 Hz | 512 Hz |
Error | 0.0073 | 0.0468 | 0.0032 | 0.0468 | 0.0106 | 0.0469 | 0.0037 | 0.0439 |
A windowing function can be used to remove some of the distortion from the decompressed audio. The window is applied by multiplying each length-64 interval entrywise by a vector \(h\) before the MDCT is applied, and it is undone by multiplying each interval entrywise by \(h\) again after the MDCT is inverted. The \(h\) vector is \[h_j=\sqrt{2}\sin \left(\frac{(j+1/2)\pi }{2n} \right), \] for \(j = 0, \dots, 2n-1\). Undoing the window after the MDCT is inverted is valid because \(h\) multiplies every length-64 interval of the sound vector in the same way, so it has the same effect on each of the four partitions of the sound vector that are needed to invert the MDCT. In particular, \(h_j^2 + h_{j+n}^2 = 2\), so the two windowed copies of each sample still average to the original value.
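A small numerical check of the windowing step, again as a hedged sketch rather than the function used for the results: the names `round_trip`, `power_sums` and the test signal are assumptions made for the example. It builds \(h\) for \(n = 32\), confirms that \(h_j^2 + h_{j+n}^2 = 2\), and runs a windowed MDCT round trip on two overlapping intervals with no quantization.

```python
import math

n = 32  # half the window length; each interval holds 2n = 64 samples

# Window vector h from the formula above (0-based j).
h = [math.sqrt(2) * math.sin((j + 0.5) * math.pi / (2 * n)) for j in range(2 * n)]

# Each sample is touched by h_j in one interval and h_{j+n} in the other.
power_sums = [h[j] ** 2 + h[j + n] ** 2 for j in range(n)]

# MDCT matrix, n x 2n.
M = [[math.sqrt(2.0 / n) * math.cos((i + 0.5) * (j + n / 2 + 0.5) * math.pi / n)
      for j in range(2 * n)]
     for i in range(n)]

def round_trip(block):
    """Window, apply M, apply M^T, then window again."""
    windowed = [hj * z for hj, z in zip(h, block)]
    coeffs = [sum(m * z for m, z in zip(row, windowed)) for row in M]
    back = [sum(M[i][j] * coeffs[i] for i in range(n)) for j in range(2 * n)]
    return [hj * w for hj, w in zip(h, back)]

# Hypothetical test signal: three consecutive segments of n samples each.
x = [math.cos(0.1 * t) + 0.5 * math.sin(0.37 * t) for t in range(3 * n)]
w1 = round_trip(x[0:2 * n])
w2 = round_trip(x[n:3 * n])

# Average the two windowed copies of the middle segment.
middle = [(w1[n + t] + w2[t]) / 2 for t in range(n)]
max_err = max(abs(middle[t] - x[n + t]) for t in range(n))
print(max(abs(s - 2) for s in power_sums), max_err)
```

Without quantization the windowed round trip reproduces the middle segment up to rounding. Under this sketch's assumptions, any difference between the windowed and unwindowed results therefore comes from how the window reshapes the coefficients before they are quantized.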
Qualitatively, the windowing function seems to swap which of the tones are distorted: it corrects the distorted ones and distorts the correct ones. This can be seen in the plots, which are here. The effect it has on the correct sound from the above 320 Hz pure tone:
Frequency | 64 Hz | 128 Hz | 192 Hz | 256 Hz | 320 Hz | 384 Hz | 448 Hz | 512 Hz |
Error | 0.0188 | 0.0296 | 0.0243 | 0.0256 | 0.0318 | 0.0127 | 0.0243 | 0.0273 |
The RMSE for the chord depends on the number of bits used in compression. The effect is shown in the table below. (All values are for the 384 Hz + 256 Hz third chord.)
Bits | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
Error | 0.0343 | 0.0138 | 0.0069 | 0.0029 | 0.0017 | \(8.42 \times 10^{-4}\) | \(4.24 \times 10^{-4}\) |
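The dependence of the error on the bit count can be illustrated with a rough sketch. Everything specific here is an assumption and not taken from the function used for the table: the sampling rate of 8192 Hz, the chord scaled by 1/2, the uniform quantizer on \([-L, L]\) with \(L\) set to the largest coefficient magnitude, the absence of a window, and the name `compress_rmse`. The numbers it prints are therefore not expected to match the table, only to show the same trend.

```python
import math

N = 32  # half the window length; each block holds 2 * N = 64 samples

M = [[math.sqrt(2.0 / N) * math.cos((i + 0.5) * (j + N / 2 + 0.5) * math.pi / N)
      for j in range(2 * N)]
     for i in range(N)]

def forward(block):
    return [sum(m * z for m, z in zip(row, block)) for row in M]

def inverse(coeffs):
    return [sum(M[i][j] * coeffs[i] for i in range(N)) for j in range(2 * N)]

def compress_rmse(x, bits):
    """Quantize all MDCT coefficients to `bits` bits, decode, return the RMSE."""
    starts = range(0, len(x) - 2 * N + 1, N)  # blocks overlap by N samples
    coeffs = [forward(x[s:s + 2 * N]) for s in starts]
    # Assumption: uniform quantizer on [-L, L], L = largest coefficient magnitude.
    L = max(abs(c) for block in coeffs for c in block)
    q = 2 * L / (2 ** bits - 1)
    decoded = [inverse([q * round(c / q) for c in block]) for block in coeffs]
    # Segment k+1 is the average of the bottom half of block k
    # and the top half of block k+1.
    sq_err, count = 0.0, 0
    for k in range(len(decoded) - 1):
        for t in range(N):
            estimate = (decoded[k][N + t] + decoded[k + 1][t]) / 2
            sq_err += (estimate - x[(k + 1) * N + t]) ** 2
            count += 1
    return math.sqrt(sq_err / count)

fs = 8192  # assumed sampling rate in Hz
# Assumed test signal: 256 Hz + 384 Hz chord, scaled to stay within [-1, 1].
x = [0.5 * (math.cos(2 * math.pi * 256 * t / fs) +
            math.cos(2 * math.pi * 384 * t / fs))
     for t in range(2048)]

rmse_by_bits = {b: compress_rmse(x, b) for b in (2, 4, 8)}
for b, err in rmse_by_bits.items():
    print(b, err)
```

The first and last half-blocks are left out of the error because they are covered by only one window and cannot be averaged.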