Project 5: Audio Compression
Garrett Lee
The goal of Project 5 was to learn how to use MATLAB to compress and uncompress audio or images. I chose the audio portion of the project because I have wondered before how compressing and uncompressing music can change its quality, and working with it firsthand, even at a very basic level, would give me the insight I wanted. Throughout this project I learned how to generate simple tones and chords in MATLAB, how to compress and uncompress the tones using a simple codec both with and without sampling, and, to a limited extent, the importance of sampling that puts more emphasis on the dominant parts of the audio.
Part 1
Part 1 was a simple application of audio compression. Using a simple codec, I compressed and uncompressed a MATLAB-generated tone, then compared the original tone to the altered one by computing their RMSE (root-mean-square error) for various frequencies. I found that the RMSE was lower for odd integer multiples of 64 than for even integer multiples. This can be verified using the Part 1 code.
Table of Results for Part 1
| Audio Function | n | f | RMSE |
|---|---|---|---|
| cos((1:2^(12))*2*pi*64*n/2^13) | 1 | 64 | 0.0021 |
| cos((1:2^(12))*2*pi*64*n/2^13) | 2 | 128 | 0.0095 |
| cos((1:2^(12))*2*pi*64*n/2^13) | 3 | 192 | 0.0017 |
| cos((1:2^(12))*2*pi*64*n/2^13) | 4 | 256 | 0.0091 |
| cos((1:2^(12))*2*pi*64*n/2^13) | 5 | 320 | 0.0011 |
| cos((1:2^(12))*2*pi*64*n/2^13) | 6 | 384 | 0.0097 |
| cos((1:2^(12))*2*pi*64*n/2^13) | 7 | 448 | 0.0016 |
| cos((1:2^(12))*2*pi*64*n/2^13) | 8 | 512 | 0.0095 |
| cos((1:2^(12))*2*pi*64*n/2^13) | 9 | 576 | 0.0011 |
| cos((1:2^(12))*2*pi*64*n/2^13) | 10 | 640 | 0.0089 |
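A minimal Python sketch of a simple MDCT codec of the kind used in Part 1 (not the report's MATLAB code). It assumes a window length n_win = 32, b = 4 bits, a quantization range [−L, L] with L = 5, and a sampling rate of 2^13 = 8192 Hz; all of these are assumptions, although n_win = 32 is consistent with the 16256 = 4 × 32 × 127 bits reported for Part 7. The printed RMSE values need not match the table, since the report does not state its exact parameters. Under these assumptions the MDCT basis functions oscillate at (i + 1/2)·Fs/(2·n_win) = 64(2i + 1) Hz, which are exactly the odd multiples of 64, so an odd-multiple tone falls on a single coefficient per block; that is one plausible explanation for the lower RMSE at odd n.

```python
import math

FS = 2 ** 13  # assumed sampling rate in Hz (the 2^13 in the test signal)

def mdct_matrix(n_win):
    """n_win x 2*n_win MDCT matrix scaled by sqrt(2/n_win); N = M^T serves as the inverse."""
    scale = math.sqrt(2.0 / n_win)
    return [[scale * math.cos((i + 0.5) * (j + 0.5 + n_win / 2) * math.pi / n_win)
             for j in range(2 * n_win)] for i in range(n_win)]

def simple_codec(x, n_win=32, b=4, L=5.0):
    """Compress and uncompress x; output sample k corresponds to input sample x[n_win + k]."""
    M = mdct_matrix(n_win)
    q = 2 * L / (2 ** b - 1)               # quantization step: b bits on [-L, L]
    out, prev = [], None
    for k in range(len(x) // n_win - 1):   # overlapping blocks of length 2*n_win
        x0 = x[k * n_win:k * n_win + 2 * n_win]
        y = [sum(M[i][j] * x0[j] for j in range(2 * n_win)) for i in range(n_win)]
        y = [round(v / q) * q for v in y]  # quantize, then dequantize
        rec = [sum(M[i][j] * y[i] for i in range(n_win)) for j in range(2 * n_win)]  # N = M^T
        if prev is not None:               # average the two overlapping halves
            out.extend((prev[n_win + j] + rec[j]) / 2 for j in range(n_win))
        prev = rec
    return out

def rmse(a, b):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)) / len(a))

# Same test signal as the table: cos((1:2^12)*2*pi*64*n/2^13) for n = 1..10.
for m in range(1, 11):
    x = [math.cos(t * 2 * math.pi * 64 * m / FS) for t in range(1, 2 ** 12 + 1)]
    out = simple_codec(x)
    print(m, 64 * m, f"{rmse(out, x[32:32 + len(out)]):.4f}")
```

The first and last half-blocks have no overlapping partner, so the output covers input samples n_win through len(x) − n_win − 1, and the RMSE is taken against that slice.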
Part 2
In Part 2 we introduced the windowing function h_j into our compression code, which we now called simplecodec2 and ran using project5part2. The windowing function is multiplied against the audio vector at each iteration so that each block goes smoothly to zero at its ends, and the same process is then repeated for the MDCT matrix. This ultimately reduced the RMSE for the even integer multiples of 64, where it had its greatest influence, as displayed below. The RMSE for the odd multiples, which already matched the original closely, actually increased.
Table of Results for Part 2
| Audio Function | n | f | RMSE |
|---|---|---|---|
| cos((1:2^(12))*2*pi*64*n/2^13) | 1 | 64 | 0.0045 |
| cos((1:2^(12))*2*pi*64*n/2^13) | 2 | 128 | 0.0033 |
| cos((1:2^(12))*2*pi*64*n/2^13) | 3 | 192 | 0.0043 |
| cos((1:2^(12))*2*pi*64*n/2^13) | 4 | 256 | 0.0011 |
| cos((1:2^(12))*2*pi*64*n/2^13) | 5 | 320 | 0.0043 |
| cos((1:2^(12))*2*pi*64*n/2^13) | 6 | 384 | 0.0014 |
| cos((1:2^(12))*2*pi*64*n/2^13) | 7 | 448 | 0.0048 |
| cos((1:2^(12))*2*pi*64*n/2^13) | 8 | 512 | 0.0026 |
| cos((1:2^(12))*2*pi*64*n/2^13) | 9 | 576 | 0.0043 |
| cos((1:2^(12))*2*pi*64*n/2^13) | 10 | 640 | 0.0022 |
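The windowed codec can be sketched the same way in Python. The exact formula for h_j is not given in the text, so the sine window below, h_j = √2·sin((j − 1/2)π/(2·n_win)) for j = 1, …, 2·n_win, is an assumption. It tapers each block toward zero at its ends and satisfies h_j² + h_{j+n_win}² = 2, which is what lets the averaged overlap-add still recover the signal. The window is applied to each block before the MDCT and again after the inverse MDCT; the parameters n_win = 32, b = 4, L = 5 and the 2^13 sampling rate are also assumptions.

```python
import math

def windowed_codec(x, n_win=32, b=4, L=5.0):
    """MDCT codec with a sine window applied before the MDCT and after its inverse.

    Output sample k corresponds to input sample x[n_win + k].
    """
    scale = math.sqrt(2.0 / n_win)
    M = [[scale * math.cos((i + 0.5) * (j + 0.5 + n_win / 2) * math.pi / n_win)
          for j in range(2 * n_win)] for i in range(n_win)]
    # Assumed window h_j = sqrt(2) sin((j - 1/2) pi / (2 n_win)), j = 1..2 n_win (0-based below).
    # It satisfies h_j^2 + h_{j+n_win}^2 = 2, so the averaged overlap-add still recovers x.
    h = [math.sqrt(2) * math.sin((j + 0.5) * math.pi / (2 * n_win)) for j in range(2 * n_win)]
    q = 2 * L / (2 ** b - 1)
    out, prev = [], None
    for k in range(len(x) // n_win - 1):
        x0 = [h[j] * x[k * n_win + j] for j in range(2 * n_win)]    # window the block
        y = [round(sum(M[i][j] * x0[j] for j in range(2 * n_win)) / q) * q
             for i in range(n_win)]                                  # MDCT, quantize, dequantize
        rec = [h[j] * sum(M[i][j] * y[i] for i in range(n_win))
               for j in range(2 * n_win)]                            # inverse MDCT, window again
        if prev is not None:
            out.extend((prev[n_win + j] + rec[j]) / 2 for j in range(n_win))
        prev = rec
    return out

def rmse(a, b):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)) / len(a))

for m in range(1, 11):
    x = [math.cos(t * 2 * math.pi * 64 * m / 2 ** 13) for t in range(1, 2 ** 12 + 1)]
    out = windowed_codec(x)
    print(m, 64 * m, f"{rmse(out, x[32:32 + len(out)]):.4f}")
```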
Part 3
In Part 3 we stepped away from working with audio to show that the windowing function works in general for some given equations in the book; this is handled in the attached image. It demonstrates that h*NMZ1 and h*NMZ2 still allow equation 11.39 from the text to hold.
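A quick numerical check of the same idea, as a hedged Python sketch. It assumes equation 11.39 is the overlap-add statement that, for n-vectors Z1, Z2, Z3 with u = h∘NM(h∘[Z1; Z2]) and v = h∘NM(h∘[Z2; Z3]) (∘ meaning elementwise multiplication and N = Mᵀ), the middle block is recovered as Z2 = (u_2 + v_1)/2, where u_2 is the second half of u and v_1 the first half of v. The sine window h, the size n = 8, and the random test vectors are illustrative assumptions.

```python
import math
import random

def mdct_matrix(n):
    """n x 2n MDCT matrix scaled by sqrt(2/n)."""
    scale = math.sqrt(2.0 / n)
    return [[scale * math.cos((i + 0.5) * (j + 0.5 + n / 2) * math.pi / n)
             for j in range(2 * n)] for i in range(n)]

def windowed_round_trip(z, M, h):
    """Return h * (N M (h * z)) elementwise, with N = M^T, for a length-2n vector z."""
    n = len(M)
    hz = [h[j] * z[j] for j in range(2 * n)]
    y = [sum(M[i][j] * hz[j] for j in range(2 * n)) for i in range(n)]
    return [h[j] * sum(M[i][j] * y[i] for i in range(n)) for j in range(2 * n)]

n = 8
M = mdct_matrix(n)
# Assumed sine window; h_j^2 + h_{j+n}^2 = 2 for every j.
h = [math.sqrt(2) * math.sin((j + 0.5) * math.pi / (2 * n)) for j in range(2 * n)]
rng = random.Random(0)
Z1, Z2, Z3 = ([rng.uniform(-1, 1) for _ in range(n)] for _ in range(3))
u = windowed_round_trip(Z1 + Z2, M, h)   # list concatenation forms [Z1; Z2]
v = windowed_round_trip(Z2 + Z3, M, h)
recovered = [(u[n + j] + v[j]) / 2 for j in range(n)]
max_err = max(abs(a - b) for a, b in zip(recovered, Z2))
print(f"max |(u2 + v1)/2 - Z2| = {max_err:.1e}")
```

The error printed is at the level of floating-point rounding, which is what the identity predicts.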
Part 4
In Part 4 we experimented with long tones by assembling chords and using different numbers of bits. A chord is formed by adding together waveforms of different frequencies. To demonstrate this, I created three simple chords using the ratios 2:1, which is an octave; 5:4, which is a major third; and an offbeat 6.5:8, which I called Schism in reference to the unique time signature of the song of the same name by the rock band Tool. The most important aspect of this part of the project was the variation of bits and its effect on RMSE.
[Figure: graphs for the 2:1, 5:4, and 6.5:8 chords at 5 bits and at 50 bits]
In conclusion, as the number of bits allocated to each point increases, the RMSE decreases, though going past 25 bits seems to be overkill in MATLAB.
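A hedged Python sketch of the Part 4 experiment. Each chord a:b is modeled as two equal-amplitude tones at 64a and 64b Hz; the 64 Hz base frequency and the factor 1/2 that keeps the sum in [−1, 1] are assumptions, as are the unwindowed MDCT codec with n_win = 32, L = 5, and a 2^13 Hz sampling rate. The RMSE is printed for 5, 25, and 50 bits. Each extra bit roughly halves the quantization step q = 2L/(2^b − 1), which is consistent with the RMSE falling as bits are added; at 50 bits q ≈ 8.9·10⁻¹⁵, approaching the size of double-precision rounding error.

```python
import math

FS = 2 ** 13   # assumed sampling rate (Hz)
BASE = 64      # hypothetical base frequency (Hz); the report does not state one

def chord(ratio, length=2 ** 12):
    """Two tones at BASE*r1 and BASE*r2 Hz for the ratio r1:r2, halved to stay in [-1, 1]."""
    r1, r2 = ratio
    return [0.5 * (math.cos(2 * math.pi * BASE * r1 * t / FS)
                   + math.cos(2 * math.pi * BASE * r2 * t / FS))
            for t in range(1, length + 1)]

def codec_rmse(x, b, n_win=32, L=5.0):
    """Compress/uncompress x with an unwindowed MDCT codec at b bits; return the RMSE."""
    scale = math.sqrt(2.0 / n_win)
    M = [[scale * math.cos((i + 0.5) * (j + 0.5 + n_win / 2) * math.pi / n_win)
          for j in range(2 * n_win)] for i in range(n_win)]
    q = 2 * L / (2 ** b - 1)
    out, prev = [], None
    for k in range(len(x) // n_win - 1):
        x0 = x[k * n_win:k * n_win + 2 * n_win]
        y = [round(sum(M[i][j] * x0[j] for j in range(2 * n_win)) / q) * q
             for i in range(n_win)]                    # MDCT, quantize, dequantize
        rec = [sum(M[i][j] * y[i] for i in range(n_win)) for j in range(2 * n_win)]
        if prev is not None:
            out.extend((prev[n_win + j] + rec[j]) / 2 for j in range(n_win))
        prev = rec
    ref = x[n_win:n_win + len(out)]                     # input samples the output covers
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(out, ref)) / len(out))

chords = {"2:1": (2, 1), "5:4": (5, 4), "6.5:8": (6.5, 8)}
for name, ratio in chords.items():
    x = chord(ratio)
    print(name, {b: f"{codec_rmse(x, b):.1e}" for b in (5, 25, 50)})
```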
Relevant Code:
Part 5
In Part 5 we worked with a complete audio sample. For this project I chose the instrumental piece "Lost... Broken Shards" from the Xenogears OST by Yasunori Mitsuda. The track was compressed and uncompressed both with and without the windowing function, and the results were compared. I could not hear a noticeable difference between the two tracks, but their RMSE values differed, which means that at least at certain positions one version holds more true to the original track. In particular, the lower RMSE was found with the windowing function included. To test the track properly we used a sampling frequency of 44100 Hz and allocated between 1 and 10 bits, though I only recorded the audio for the 10-bit case.
[Audio: the reconstructed track at 10 bits, without and with the windowing function]
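To compress a full track, the audio first has to become a vector of samples. As a sketch using only Python's standard library, the function below reads a WAV file with the wave module and averages the channels into one mono vector; the 16-bit PCM format and the channel averaging are assumptions, and the name read_wav_mono is hypothetical. The demo builds a tiny in-memory stereo file instead of loading the actual track.

```python
import io
import struct
import wave

def read_wav_mono(source):
    """Read a 16-bit PCM WAV (path or file object) into mono floats in [-1, 1).

    Returns (samples, sample_rate). Channels are averaged, which is an assumption.
    """
    with wave.open(source, "rb") as wf:
        channels = wf.getnchannels()
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    ints = struct.unpack(f"<{len(raw) // 2}h", raw)   # little-endian signed 16-bit
    samples = [sum(ints[i:i + channels]) / (channels * 32768.0)
               for i in range(0, len(ints), channels)]
    return samples, rate

# Demo: a three-frame stereo file written to memory, standing in for the real track.
buf = io.BytesIO()
with wave.open(buf, "wb") as out:
    out.setnchannels(2)
    out.setsampwidth(2)
    out.setframerate(44100)
    out.writeframes(struct.pack("<6h", 1000, -1000, 16384, 16384, -32768, 0))
buf.seek(0)
samples, fs = read_wav_mono(buf)
print(fs, samples)   # 44100 [0.0, 0.5, -0.5]
```

The returned samples and fs can then be passed to a codec block by block, exactly as with the generated tones.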
Relevant Code:
Part 6
For Part 6 I implemented conditional sampling, that is, a different number of bits for different values. Using a simple algorithm that compared each value of the function to its maximum value, I allocated between 1 and 4 bits to that value: points near 0 were assigned 1 bit, and points near the maximum were assigned the full 4 bits. The code could be generalized to any number of bits, but the way I think it would have to be done would require a lot of work, and there was not enough time to do so.
| f | RMSE |
|---|---|
| 64 | 0.0224 |
| 128 | 0.00329 |
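A minimal sketch of one way to write the allocation rule for any maximum number of bits b_max. Each value gets 1 + round((b_max − 1)·|v|/max|v|) bits and is quantized with its own step q_b = 2L/(2^b − 1). The linear rule, the names allocate_bits and quantize_adaptive, and L = 5 are hypothetical, not the report's algorithm.

```python
def allocate_bits(values, b_max=4):
    """Hypothetical rule: 1 bit near zero, b_max bits at the peak, linear in between."""
    peak = max(abs(v) for v in values)
    if peak == 0:
        return [1] * len(values)
    return [1 + round((b_max - 1) * abs(v) / peak) for v in values]

def quantize_adaptive(values, L=5.0, b_max=4):
    """Quantize each value with its own step q_b = 2L / (2^b - 1); returns (values, bits)."""
    bits = allocate_bits(values, b_max)
    quantized = []
    for v, b in zip(values, bits):
        q = 2 * L / (2 ** b - 1)
        quantized.append(round(v / q) * q)
    return quantized, bits

coeffs = [0.0, 0.1, -2.5, 5.0, 4.9]       # made-up example values
quantized, bits = quantize_adaptive(coeffs)
print(bits)        # [1, 1, 3, 4, 4]
print(sum(bits))   # 13 bits instead of 5 * 4 = 20
```

A decoder would also need the per-value bit counts (or enough information to recompute them) as side information; this sketch does not count that cost.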
Relevant Code:
Part 7
For Part 7 I broke the codec function down into two components. The first component, the coder, takes the initial wave and encodes it using a set number of bits: 16256 bits in total, using 4 bits per component. The second component is a decoder that reconstructs the original wave. I built it to the best of my ability, and when compared against the original simple codec it produced a similar tone.
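A hedged Python sketch of such a coder/decoder split. The coder turns each MDCT coefficient into a b-bit code (clipped to the signed b-bit range and stored as '0'/'1' characters for readability), and the decoder turns the bit string back into coefficients and applies the inverse MDCT with overlap-add. The names coder and decoder follow the report, but the window length n = 32, the range L = 8, and the string encoding are assumptions; with a 4096-sample tone they give 127 blocks of 32 coefficients, i.e. 4 × 32 × 127 = 16256 bits, the same total the report gives.

```python
import math

N_WIN, BITS, L_RANGE = 32, 4, 8.0   # hypothetical window length, bits per component, range

def mdct_matrix(n):
    """n x 2n MDCT matrix scaled by sqrt(2/n); its transpose is used for the inverse."""
    scale = math.sqrt(2.0 / n)
    return [[scale * math.cos((i + 0.5) * (j + 0.5 + n / 2) * math.pi / n)
             for j in range(2 * n)] for i in range(n)]

def coder(x, n=N_WIN, b=BITS, L=L_RANGE):
    """Encode x as a string of '0'/'1' characters, b bits per MDCT coefficient."""
    M = mdct_matrix(n)
    q = 2 * L / (2 ** b - 1)
    lo, hi = -2 ** (b - 1), 2 ** (b - 1) - 1       # signed b-bit range
    nblocks = len(x) // n - 1
    chunks = []
    for k in range(nblocks):
        x0 = x[k * n:k * n + 2 * n]
        for i in range(n):
            code = round(sum(M[i][j] * x0[j] for j in range(2 * n)) / q)
            code = min(hi, max(lo, code))               # clip so it fits in b bits
            chunks.append(format(code - lo, f"0{b}b"))  # offset to 0..2^b - 1
    return "".join(chunks), nblocks

def decoder(bitstring, nblocks, n=N_WIN, b=BITS, L=L_RANGE):
    """Rebuild the signal from the bit string; output sample k matches x[n + k]."""
    M = mdct_matrix(n)
    q = 2 * L / (2 ** b - 1)
    lo = -2 ** (b - 1)
    codes = [int(bitstring[p:p + b], 2) + lo for p in range(0, len(bitstring), b)]
    out, prev = [], None
    for k in range(nblocks):
        y = [codes[k * n + i] * q for i in range(n)]   # dequantize
        rec = [sum(M[i][j] * y[i] for i in range(n)) for j in range(2 * n)]
        if prev is not None:                           # average overlapping halves
            out.extend((prev[n + j] + rec[j]) / 2 for j in range(n))
        prev = rec
    return out

x = [math.cos(t * 2 * math.pi * 64 / 2 ** 13) for t in range(1, 2 ** 12 + 1)]
bitstring, nblocks = coder(x)
print(len(bitstring))   # 16256
y = decoder(bitstring, nblocks)
```

Clipping trades a small error on unusually large coefficients for a fixed bit budget; a real codec would pack the bits into bytes rather than characters.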
Relevant Code: