Project 5: Discrete Cosine Transform & Audio Compression

Daniel Jacobson's MATH 447 Webpage

For the first problem, we applied the Modified Discrete Cosine Transform (MDCT) directly to the data and used it to compress audio signals with frequency \( 64f \) Hz, for small integers \( f \). The examples below use a sample rate of 8192 Hz (samples per second) and a window size of \( n=32 \) (the MDCT maps each block of \( 2n = 64 \) samples to 32 coefficients), with coefficients quantized to \( b=4 \) bits (i.e. rounded to the nearest of \( 2^4=16 \) possible values). The original and decoded signals were plotted and compared. Each audio recording contains three 1.5-second clips: the original signal, the decoded signal, and the difference (error) between the two.

Relevant files: prob1codec.m, question1.m, and prob1.txt.
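The encode/decode loop described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the code in prob1codec.m: it assumes an MDCT matrix scaled by \( \sqrt{2/n} \) with no window function, decodes by averaging the overlapping halves of consecutive inverse transforms, and uses a uniform quantizer whose clipping range \( L = 8 \) is a hypothetical choice made to cover the coefficient range of these test signals under this scaling. The names `mdct_matrix`, `quantize`, and `codec_rmse` are likewise hypothetical. Because the quantization range used in the original code is not given, the RMSE values this returns need not match the table.

```python
import math


def mdct_matrix(n):
    """Return the n x 2n MDCT matrix, scaled by sqrt(2/n) (an assumed normalization)."""
    scale = math.sqrt(2.0 / n)
    return [[scale * math.cos((i + 0.5) * (j + 0.5 + n / 2) * math.pi / n)
             for j in range(2 * n)]
            for i in range(n)]


def quantize(y, b, L):
    """Round y to the nearest of 2**b evenly spaced levels in [-L, L], clamping outliers."""
    levels = 2 ** b
    step = 2.0 * L / levels                          # width of each quantization interval
    k = math.floor(y / step)                         # index of the interval containing y
    k = max(-levels // 2, min(levels // 2 - 1, k))   # keep only the 2**b available intervals
    return (k + 0.5) * step                          # decode to the interval midpoint


def codec_rmse(f, n=32, b=4, L=8.0, fs=8192, seconds=1.5):
    """Encode and decode a 64*f Hz cosine; return the RMSE of the decoded samples."""
    length = int(fs * seconds)
    x = [math.cos(2 * math.pi * 64 * f * t / fs) for t in range(length)]
    M = mdct_matrix(n)
    prev_tail = None  # second half of the previous window's inverse MDCT
    errors = []
    for start in range(0, length - 2 * n + 1, n):   # 2n-sample windows, hop of n
        window = x[start:start + 2 * n]
        # Forward MDCT, then quantize every coefficient to b bits.
        y = [quantize(sum(m * s for m, s in zip(row, window)), b, L) for row in M]
        # Inverse MDCT with the transpose: w = M^T y.
        w = [sum(M[i][j] * y[i] for i in range(n)) for j in range(2 * n)]
        if prev_tail is not None:
            # Overlap-add: averaging the two aliased halves recovers the shared samples.
            for k in range(n):
                decoded = (prev_tail[k] + w[k]) / 2
                errors.append(decoded - x[start + k])
        prev_tail = w[n:]
    return math.sqrt(sum(e * e for e in errors) / len(errors))
```

For instance, `codec_rmse(1)` and `codec_rmse(2)` compare an odd and an even frequency at the default settings, while `codec_rmse(2, b=24)` should return an error near zero, because the overlap-add step reconstructs the interior samples exactly before quantization.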

Plots and audio recordings: \( f = 1 \) (64 Hz), \( f = 2 \) (128 Hz), \( f = 3 \) (192 Hz), \( f = 4 \) (256 Hz), \( f = 5 \) (320 Hz), \( f = 6 \) (384 Hz).

There is a clear difference in output between odd and even values of \(f\): for even values, the error is far more pronounced, and an audible buzzing can be heard over the decoded signal. The reason becomes apparent when comparing the first seven MDCT coefficients for each signal, along with the root mean squared error (RMSE):

\(f\)	\(y_0\)	\(y_1\)	\(y_2\)	\(y_3\)	\(y_4\)	\(y_5\)	\(y_6\)	RMSE
1	1.738	0	0	0	0	0	0	0.0021
2	0.649	-1.338	0.505	0.342	0.342	-0.252	-0.207	0.0095
3	0	-1.568	0	0	0	0	0	0.0017
4	0.087	1.042	1.102	0.483	-0.287	-0.236	0.180	0.0091
5	0	0	-1.892	0	0	0	0	0.0011
6	0.007	0.307	-0.797	1.299	-0.387	-0.302	0.198	0.0097
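The sparsity pattern in the table can be checked with a short Python sketch that correlates the first 64-sample window of each test signal with the basis functions \( \cos((i+0.5)(x+16.5)\pi/32) \). It assumes no window function and no normalization constant, so the magnitudes will differ from the table (whose scaling is not stated here); only which coefficients are zero is meaningful. The function names are hypothetical.

```python
import math

N = 32  # coefficients per window; each window holds 2N = 64 samples


def basis(i, x):
    """MDCT basis function cos((i + 0.5)(x + 16.5) * pi / 32) from the text."""
    return math.cos((i + 0.5) * (x + 0.5 + N / 2) * math.pi / N)


def first_window_coefficients(f, fs=8192):
    """Unnormalized MDCT coefficients y_0..y_31 of the first 64 samples of a 64*f Hz cosine."""
    window = [math.cos(2 * math.pi * 64 * f * x / fs) for x in range(2 * N)]
    return [sum(window[x] * basis(i, x) for x in range(2 * N)) for i in range(N)]


def nonzero_indices(coeffs, tol=1e-9):
    """Indices of the coefficients that are nonzero beyond floating-point noise."""
    return [i for i, c in enumerate(coeffs) if abs(c) > tol]


for f in range(1, 7):
    # Odd f should report the single index (f - 1) // 2; even f reports several indices.
    print(f, nonzero_indices(first_window_coefficients(f)))
```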

For odd values of \(f\), a single coefficient encodes the signal, so the RMSE is lower (about 0.001 to 0.002, versus roughly 0.009 for even \(f\)): any quantization error only changes the amplitude of the decoded signal. For even values of \(f\), the signal has to be approximated by a combination of several coefficients, each rounded independently, so their quantization errors accumulate and distort the shape of the waveform rather than just its amplitude.

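The mathematical explanation is fairly straightforward. The signal is computed as \( s(x) = \cos(2\pi \cdot 64f \cdot x / 8192) = \cos(\pi f x / 64) \), which has a period of \( 128/f \) samples, while the MDCT represents each window in terms of the basis functions \( \phi_i(x) = \cos((i+0.5)(x+16.5)\pi / 32) \), each with a period of \( 64/(i+0.5) \) samples, for \( i = 0, 1, \dots, 31 \). For the signal to line up with a single basis function, the two periods must be equal: \( 128/f = 64/(i+0.5) \), which simplifies to \( i = (f - 1)/2 \). This is an integer only when \( f \) is odd, which matches the table: \( f = 1, 3, 5 \) each produce a single nonzero coefficient, \( y_0 \), \( y_1 \), and \( y_2 \) respectively. For even \( f \), \( i \) falls halfway between two basis functions (e.g. \( i = 0.5 \) for \( f = 2 \)), so the signal's energy is spread over several coefficients.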
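The period argument can be made exact with a short calculation. As a sketch, assume no window function is applied and ignore the MDCT's normalization constant. Write one 64-sample window of the signal as \( \cos(\pi f x/64 + \theta) \) for \( x = 0, \dots, 63 \), where \( \theta \) depends on where the window starts, and let \( f = 2k+1 \). The product-to-sum identity gives

\[ y_i \propto \sum_{x=0}^{63} \cos\left(\frac{(2k+1)\pi x}{64} + \theta\right)\cos\left(\frac{(2i+1)\pi(x+16.5)}{64}\right) = \frac{1}{2}\sum_{x=0}^{63}\left[\cos\left(\frac{(k-i)\pi x}{32} + \alpha\right) + \cos\left(\frac{(k+i+1)\pi x}{32} + \beta\right)\right], \]

with \( \alpha = \theta - (2i+1)(16.5)\pi/64 \) and \( \beta = \theta + (2i+1)(16.5)\pi/64 \). For an integer \( m \) that is not a multiple of 64 (with \( \mathrm{i} = \sqrt{-1} \) below),

\[ \sum_{x=0}^{63} \cos\left(\frac{m\pi x}{32} + c\right) = \operatorname{Re}\left(e^{\mathrm{i}c}\sum_{x=0}^{63}\omega^x\right) = 0, \qquad \omega = e^{\mathrm{i}m\pi/32} \neq 1,\ \omega^{64} = 1. \]

When \( i \neq k \), both \( k - i \) and \( k + i + 1 \) are nonzero integers smaller than 64 in magnitude, so both sums vanish and \( y_i = 0 \). When \( i = k \), the first sum equals \( 64\cos\alpha \), leaving a single nonzero coefficient. For even \( f \), the frequency differences become odd multiples of \( \pi/64 \), so the sums no longer cancel in general, which matches the spread of coefficients in the table.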
