In our demonstration, we investigate the practical use of the MDCT in audio compression. When audio is transported or stored, the space available for it is limited, so we need a technique that reduces the file size without a large loss in quality. For example, an audio engineer or music producer may create many audio pieces, each stored as a .wav file of about 40 MB, and wish to fit them all on a single CD, which typically holds only about 700 MB. This is precisely what the MDCT is used for: to compress the signal so that, once decoded, it is close to its original form.
We use a bare-bones audio compression algorithm here to investigate pure tones of \(64f\) Hz for small \(f\in \mathbb{Z}\), each represented as sampled data from a cosine function. We test both even and odd multiples of 64 to check our algorithm's accuracy. Comparing the sounds by ear, we have:
\(f\) Multiple | Original Sounds | Compressed Sounds |
---|---|---|
\(1\) | ||
\(2\) | ||
\(3\) | ||
\(4\) | ||
Then, comparing the actual signals from a quantitative point of view, our results are:
\(f\) Multiple | Signal Comparisons | Signal Error | RMSE |
---|---|---|---|
\(1\) | | | \(0.0021\) |
\(2\) | | | \(0.0095\) |
\(3\) | | | \(0.0017\) |
\(4\) | | | \(0.0091\) |
\(5\) | | | \(0.0011\) |
\(6\) | | | \(0.0097\) |
\(7\) | | | \(0.0016\) |
\(8\) | | | \(0.0095\) |
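The experiment above can be sketched in a few lines. This is a minimal illustration, not the code used for the tables: the block size `n = 32`, the sampling rate `fs = 8192`, the quantization range `amp`, and all function names are assumptions chosen for the example, so the printed RMSE values are not expected to match the table.

```python
import math

def mdct_matrix(n):
    """n x 2n MDCT matrix: M[i][j] = sqrt(2/n) * cos((i+1/2)(j+1/2+n/2) pi / n)."""
    s = math.sqrt(2.0 / n)
    return [[s * math.cos((i + 0.5) * (j + 0.5 + n / 2) * math.pi / n)
             for j in range(2 * n)] for i in range(n)]

def codec(x, n=32, bits=None, amp=5.0):
    """Encode overlapping length-2n blocks, optionally quantize, decode, and
    average the overlaps. The result approximates x[n : n * nblocks]."""
    M = mdct_matrix(n)
    # Assumed uniform quantizer step; bits=None skips quantization entirely.
    q = None if bits is None else 2 * amp / (2 ** bits - 1)
    nblocks = len(x) // n - 1
    out, prev_tail = [], None
    for k in range(nblocks):
        z = x[k * n: k * n + 2 * n]
        y = [sum(M[i][j] * z[j] for j in range(2 * n)) for i in range(n)]  # y = M z
        if q is not None:
            y = [round(v / q) * q for v in y]  # quantize, then dequantize
        w = [sum(M[i][j] * y[i] for i in range(n)) for j in range(2 * n)]  # w = M^T y
        if prev_tail is not None:
            out.extend((a + b) / 2 for a, b in zip(prev_tail, w[:n]))
        prev_tail = w[n:]
    return out

def rmse(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)) / len(a))

def tone(f, fs=8192, length=1024):
    """Samples of a pure cosine tone at 64*f Hz (fs and length are assumptions)."""
    return [math.cos(2 * math.pi * 64 * f * t / fs) for t in range(length)]

def tone_rmse(f, bits, n=32):
    x = tone(f)
    xhat = codec(x, n=n, bits=bits)
    return rmse(x[n: n + len(xhat)], xhat)

if __name__ == "__main__":
    for f in range(1, 9):
        print(f, tone_rmse(f, bits=4))
```

Under these assumed values, the MDCT basis frequencies are \((i+\frac{1}{2})\cdot\frac{8192}{64}\) Hz, that is 64, 192, 320, ... Hz, which are exactly the odd multiples of 64 Hz. This may be one reason odd and even multiples behave differently.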
In an attempt to improve the accuracy so that even multiples of pure tones are also compressed accurately, we scale our input data componentwise with a sampled windowing function \(h\), where \[h_j = \sqrt{2}\sin{\frac{(j-\frac{1}{2})\pi}{2n}}.\] This keeps our algorithm consistent because of the following observation.
Following the notation in the book, let the vector \(h= [h_1,h_2,h_3,h_4]\), where each \(h_i\) is a vector of length \(\frac{n}{2}\). Then each section of our \(Z_1\) vector can be written as \(x_ih_i\), where the multiplication is componentwise. Since \(Z_2\) overlaps \(Z_1\), we know that
\[Z_2 = \left[\begin{array}{c} x_3h_3\\ x_4h_4\\ x_5h_1\\ x_6h_2 \end{array} \right]\]
We now pay close attention to the last two sections of \(NMZ_1\) and the first two sections of \(NMZ_2\). Respectively, these are
\[(NMZ_1)_{3,4} = \left[\begin{array}{c} x_3h_3+R(x_4h_4)\\ R(x_3h_3)+x_4h_4\\ \end{array} \right], \quad (NMZ_2)_{1,2} = \left[\begin{array}{c} x_3h_3-R(x_4h_4)\\ -R(x_3h_3)+x_4h_4\\ \end{array} \right]\]
Adding the two, the \(R\) terms cancel, and we see that
\[ \frac{1}{2}\left((NMZ_1)_{3,4}+(NMZ_2)_{1,2} \right)= \left[ \begin{array}{c} x_3h_3\\ x_4h_4 \end{array} \right] \]
By setting \(x_ih_i=x_i'\), we can see that the equations in the book still hold; thus, the decoding of the signal is still consistent when we scale our input data.
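The overlap identity can also be checked numerically. The sketch below assumes the MDCT matrix has entries \(\sqrt{2/n}\cos\left((i+\frac{1}{2})(j+\frac{1}{2}+\frac{n}{2})\frac{\pi}{n}\right)\) with \(N = M^T\), and that the window repeats with period \(2n\) along the signal (so \(x_5\) meets \(h_1\), as in \(Z_2\) above). The function names, the block size, and the random test signal are hypothetical choices for illustration.

```python
import math
import random

def mdct_matrix(n):
    """n x 2n MDCT matrix: M[i][j] = sqrt(2/n) * cos((i+1/2)(j+1/2+n/2) pi / n)."""
    s = math.sqrt(2.0 / n)
    return [[s * math.cos((i + 0.5) * (j + 0.5 + n / 2) * math.pi / n)
             for j in range(2 * n)] for i in range(n)]

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def overlap_check(n=8, seed=0):
    """Return the largest deviation from the identities for (NMZ_1)_{3,4},
    (NMZ_2)_{1,2}, and their average, using a random signal x_1..x_6."""
    M = mdct_matrix(n)
    N = [list(col) for col in zip(*M)]  # N = M^T, size 2n x n
    # Window h_j = sqrt(2) sin((j - 1/2) pi / (2n)) for j = 1..2n (0-based here).
    h = [math.sqrt(2) * math.sin((j + 0.5) * math.pi / (2 * n)) for j in range(2 * n)]
    rng = random.Random(seed)
    x = [rng.uniform(-1, 1) for _ in range(3 * n)]
    xw = [x[j] * h[j % (2 * n)] for j in range(3 * n)]  # componentwise scaling
    w1 = matvec(N, matvec(M, xw[:2 * n]))       # N M Z_1
    w2 = matvec(N, matvec(M, xw[n:3 * n]))      # N M Z_2
    err = 0.0
    for i in range(n):
        mid = xw[n + i]          # entry i of [x_3 h_3; x_4 h_4]
        rev = xw[2 * n - 1 - i]  # entry i of [R(x_4 h_4); R(x_3 h_3)]
        err = max(err,
                  abs(w1[n + i] - (mid + rev)),
                  abs(w2[i] - (mid - rev)),
                  abs((w1[n + i] + w2[i]) / 2 - mid))
    return err

if __name__ == "__main__":
    print(overlap_check())  # should be at the level of floating-point rounding
```

Note that the average recovers the scaled sections \(x_ih_i\), not \(x_i\) itself, which is exactly what the substitution \(x_i' = x_ih_i\) expresses.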
We note that this code is simply a modification of the latter half of the previous code. Listening to the output with our windowing function applied (the original sounds stay the same), we have:
\(f\) Multiple | Original Sounds | Compressed Sounds |
---|---|---|
\(1\) | ||
\(2\) | ||
\(3\) | ||
\(4\) | ||
As in section I, our quantitative results are:
\(f\) Multiple | Signal Comparisons | Signal Error | RMSE |
---|---|---|---|
\(1\) | | | \(0.0045\) |
\(2\) | | | \(0.0033\) |
\(3\) | | | \(0.0043\) |
\(4\) | | | \(0.0011\) |
\(5\) | | | \(0.0043\) |
\(6\) | | | \(0.0014\) |
\(7\) | | | \(0.0048\) |
\(8\) | | | \(0.0026\) |
So although this scaling process made our odd multiples less accurate than with our original algorithm (the RMSE rose from about \(0.001\)-\(0.002\) to about \(0.004\)-\(0.005\)), it drastically improved our decoding accuracy for the even multiples (the RMSE fell from about \(0.009\)-\(0.010\) to about \(0.001\)-\(0.003\)). This is the other code used.
Here we investigate more complicated tones; in our case, we look at a certain type of chord. The chord is generated by adding pure tones. The chord analyzed here is a major third, so the two frequencies are in a 5:4 ratio. This chord sounds like:
Looking at how the number of bits used affects the accuracy, we find:
Bits | RMSE | Compressed Sound |
---|---|---|
\(2\) | \(0.0162\) | |
\(3\) | \(0.0082\) | |
\(4\) | \(0.0049\) | |
\(5\) | \(0.0021\) | |
\(6\) | \(0.0012\) | |
\(7\) | \(7.6247\times 10^{-4}\) | |
\(8\) | \(4.2358\times 10^{-4}\) | |
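In the table, the RMSE drops by roughly half with each added bit, which is the behavior one would expect from a uniform quantizer whose step size roughly halves per bit. The sketch below illustrates that trend on a 5:4 chord. It is only an illustration under stated assumptions: the step size \(q = 2\cdot\text{amp}/(2^{\text{bits}}-1)\), the frequencies (256 Hz and 320 Hz), the sampling rate, and the function names are all hypothetical, and for brevity the quantizer is applied directly to the samples rather than to MDCT coefficients, so the values will not match the table.

```python
import math

def quantize(v, bits, amp=1.0):
    """Round v to the nearest multiple of the step q = 2*amp / (2**bits - 1)."""
    q = 2 * amp / (2 ** bits - 1)
    return round(v / q) * q

def chord(f_low=256.0, fs=8192, length=2048):
    """Sum of two pure tones whose frequencies are in a 5:4 ratio."""
    return [math.cos(2 * math.pi * f_low * t / fs)
            + math.cos(2 * math.pi * 1.25 * f_low * t / fs)
            for t in range(length)]

def quantization_rmse(bits, amp=2.0):
    """RMSE between the chord samples and their quantized values."""
    x = chord()
    xq = [quantize(v, bits, amp) for v in x]
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, xq)) / len(x))

if __name__ == "__main__":
    for bits in range(2, 9):
        print(bits, quantization_rmse(bits))
```

Because rounding moves each value by at most half a step, the RMSE of this quantizer can never exceed \(q/2\), which bounds the error at each bit depth.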
Here, we try our techniques on an audio sample from the movie The Sound of Music. We test what happens when we change the number of bits used, as well as whether or not we use our window scaling. The original sample is:
Our results are below:
Bits | Windowed Sound | Windowed RMSE | Non-Windowed Sound | Non-Windowed RMSE |
---|---|---|---|---|
\(2\) | | \(0.0124\) | | \(0.0148\) |
\(3\) | | \(0.0066\) | | \(0.0090\) |
\(7\) | | \(6.0616\times 10^{-4}\) | | \(8.9001\times 10^{-4}\) |
\(8\) | | \(3.3541\times 10^{-4}\) | | \(4.5884\times 10^{-4}\) |