Daniel Jacobson's MATH 447 Webpage
Project 5, Problem 2: Discrete Cosine Transform & Audio Compression
For the second problem, we added a windowing function so that each window "fades in" to the next. By replacing each \(x_i\) with \(h_i x_i\), where \(h_i = \sqrt{2}\sin\left(\frac{(i-\frac{1}{2})\pi}{2n}\right)\), and then multiplying the decoded \(x_i\)s by the same value before averaging them, the MDCT is "weighted" to place more emphasis on accuracy near the center of each window (where \(h_i\) is largest), and the two overlapping outputs are weighted to favor the more accurate estimate. Running the same analysis as in Part 1 gives the following results:
Relevant files: prob2codec.m, question2.m, and prob2.txt.
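For reference, here is a minimal sketch of the windowed analysis/synthesis loop described above. It is not the code from prob2codec.m: it assumes the standard \(n \times 2n\) MDCT matrix with entries \(\sqrt{2/n}\cos\left(\frac{(2i+1)(2j+1+n)\pi}{4n}\right)\) (zero-based \(i, j\)) with inverse \(N = M^T\), omits the quantization step entirely, and uses illustrative variable names.

```matlab
% Sketch of windowed MDCT analysis/synthesis (quantization omitted).
% Assumes the standard n x 2n MDCT matrix; not the submitted prob2codec.m.
n = 32;                                   % half-window length
x = randn(16*n, 1);                       % example signal, a multiple of n samples
[i, j] = ndgrid(0:n-1, 0:2*n-1);
M = sqrt(2/n) * cos((2*i+1).*(2*j+1+n)*pi/(4*n));   % MDCT matrix
h = sqrt(2) * sin(((1:2*n)' - 0.5)*pi/(2*n));       % window h_i

y     = zeros(size(x));
nblks = length(x)/n - 1;                  % overlapping 2n-sample windows, hop n
for b = 1:nblks
    idx = (b-1)*n + (1:2*n)';
    c   = M * (h .* x(idx));              % window, then MDCT (quantize c here)
    z   = h .* (M' * c);                  % inverse MDCT, then window again
    y(idx) = y(idx) + 0.5 * z;            % average the two overlapping estimates
end
% Interior samples now match x up to round-off; the first and last
% half-windows have no overlapping partner and are not fully reconstructed.
disp(norm(y(n+1:end-n) - x(n+1:end-n)))
```

The design point is that the window is applied twice, once before the MDCT and once after the inverse, so each sample's two overlapping estimates carry weights \(h_i^2\) and \(h_{i+n}^2\), which sum to 2 and cancel the aliasing terms, as proved below.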
While all six graphs show some error, it is reasonably well distributed. There is no recurrent, rhythmic error like that caused by the original MDCT with even-numbered values of \(f\).
The perfect alignment of cosines that greatly reduced error for odd-numbered values of \(f\) is destroyed by the windowing function, slightly increasing the error for those values, but this effect is dwarfed by the massive reduction in error for the even-numbered values of \(f\). In essence, while error has not been reduced across the board, it has been redistributed to be less predictable and more random. The mathematical proof that the windowed transform still reconstructs the signal exactly is as follows.

Given data arrays
\[ Z_1 = \begin{bmatrix}x_1\\x_2\end{bmatrix}, \quad Z_2 = \begin{bmatrix}x_2\\x_3\end{bmatrix} \]
where \(x_1, x_2, x_3\) are each \(n\)-sized blocks of data, we can multiply by the MDCT matrix, \(M\), followed by its inverse, \(N\), to obtain the following result (adapted from Figure 11.38):
\[ N M Z_1 = \begin{bmatrix} x_1 - R x_1 \\ x_2 + R x_2 \end{bmatrix}, \quad N M Z_2 = \begin{bmatrix} x_2 - R x_2 \\ x_3 + R x_3 \end{bmatrix} \]
where \(R\) is the reversal permutation matrix (formed by reversing the columns of the identity matrix). To reconstruct the original \(x_2\), we simply add the second half of \(N M Z_1\) to the first half of \(N M Z_2\) and divide by 2, giving \(\frac{1}{2}[(x_2 - R x_2) + (x_2 + R x_2)] = x_2\).

With the windowing function, we will be multiplying both inputs by the \(2n \times 2n\) matrix
\[ W = \begin{bmatrix} h_1 & 0 & \cdots & 0 \\ 0 & h_2 & & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & h_{2n} \\ \end{bmatrix}, \text{ where } h_i = \sqrt{2} \sin\left(\frac{(i-0.5)\pi}{2n}\right) \]
Noticing that \(h_i\) has a period of \(4n\), and recalling that the sine function shifted to the left by a quarter period is the cosine function, we can split \(W\) into two \(n \times n\) diagonal blocks, \(W_1\) (the first \(n\) window values) and \(W_2\) (the last \(n\)), where
\[ W_1 = \begin{bmatrix} j_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & j_n \end{bmatrix}, \; j_i = \sqrt{2} \sin\left(\frac{(i-0.5)\pi}{2n}\right), \quad \text{and} \quad W_2 = \begin{bmatrix} k_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & k_n \end{bmatrix}, \; k_i = \sqrt{2} \cos\left(\frac{(i-0.5)\pi}{2n}\right) \]
so that the windowed input data becomes
\[ W Z_1 = \begin{bmatrix}W_1 x_1\\W_2 x_2\end{bmatrix}, \quad W Z_2 = \begin{bmatrix}W_1 x_2\\W_2 x_3\end{bmatrix} \]
Note that \(W_1\) is simply \(W_2\) reversed: that is, \(W_1 R = R W_2\). Additionally, note that \(W_1^2 + W_2^2 = 2 I_n\), since \(j_i^2 + k_i^2 = 2\sin^2\theta_i + 2\cos^2\theta_i = 2\) with \(\theta_i = \frac{(i-0.5)\pi}{2n}\). Performing and reversing the MDCT transform on the windowed data gives us
\[ N M W Z_1 = \begin{bmatrix} W_1 x_1 - W_2 R x_1 \\ W_2 x_2 + W_1 R x_2 \end{bmatrix}, \quad N M W Z_2 = \begin{bmatrix} W_1 x_2 - W_2 R x_2 \\ W_2 x_3 + W_1 R x_3 \end{bmatrix} \]
and so applying the window function a second time and averaging the second half of the first block with the first half of the second block gives us back our original \(x_2\):
\[ \frac{1}{2}\left[W_2(W_2 x_2 + W_1 R x_2) + W_1(W_1 x_2 - W_2 R x_2)\right] = \frac{1}{2}\left[(W_2^2 + W_1^2)x_2 + (W_1 W_2 R x_2 - W_1 W_2 R x_2)\right] = x_2 \]
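As a sanity check on the algebra above, the identities \(W_1 R = R W_2\) and \(W_1^2 + W_2^2 = 2I_n\), along with the exact recovery of \(x_2\), can be verified numerically. The snippet below is a sketch under the same assumption as before, that \(M\) is the standard MDCT matrix with entries \(\sqrt{2/n}\cos\left(\frac{(2i+1)(2j+1+n)\pi}{4n}\right)\) (zero-based indices) and \(N = M^T\); it is not taken from the submitted code.

```matlab
% Numerical check of the windowed-MDCT perfect-reconstruction argument.
n  = 8;
[i, j] = ndgrid(0:n-1, 0:2*n-1);
M  = sqrt(2/n) * cos((2*i+1).*(2*j+1+n)*pi/(4*n));  % assumed MDCT matrix
N  = M';                                  % inverse MDCT
R  = fliplr(eye(n));                      % reversal permutation matrix
h  = sqrt(2) * sin(((1:2*n)' - 0.5)*pi/(2*n));
W  = diag(h);                             % window for one 2n-sample block
W1 = diag(h(1:n));  W2 = diag(h(n+1:end));

% The two identities used in the proof (both norms should be ~0):
disp(norm(W1*R - R*W2))                   % W1 R = R W2
disp(norm(W1^2 + W2^2 - 2*eye(n)))        % W1^2 + W2^2 = 2 I

% Perfect reconstruction of the shared block x2:
x1 = randn(n,1); x2 = randn(n,1); x3 = randn(n,1);
u1 = N * (M * (W*[x1; x2]));              % encode/decode first window
u2 = N * (M * (W*[x2; x3]));              % encode/decode second window
x2hat = 0.5 * (W2*u1(n+1:end) + W1*u2(1:n));  % re-window, then average
disp(norm(x2hat - x2))                    % ~0: x2 recovered exactly
```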