For this project we are studying how to compress audio files to make them easier to store. Before compression, an audio file is stored as an array of samples, each recording the amplitude of the sound wave at one instant. This takes up a lot of space, because the signal has to be sampled many times per second to be accurate. There is also a lot of waste, since the recording process records everything, including frequencies that the human ear can't hear. To fix this problem, people have developed compression algorithms that reduce the amount of waste in raw audio files.

A very common way to compress many kinds of data is the Discrete Cosine Transform (DCT), which lets us build a function that reproduces the original data from the location of the entries in the representing matrix. Though the function is just as long as the original file, it also has the property that if you remove the last part of the function, what remains is still a close approximation of the original. This works for many different things, but we have a problem, as our sound data is one-dimensional. We can fix this by using what is called the Modified Discrete Cosine Transform (MDCT). It uses overlapping vectors to create a pseudo-matrix from which the transformation matrix is built. This is what I use for this project.
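To make the overlapping idea concrete, here is a minimal MATLAB sketch. It is not the project's simplecodec code: it assumes the standard $n \times 2n$ MDCT matrix with $\sqrt{2/n}$ scaling, and all variable names are illustrative. It transforms two windows that overlap by $n$ samples, inverts each, and averages the overlapping halves, which should recover the shared block when nothing is quantized.

```matlab
% Sketch: overlapping MDCT windows reconstruct the shared block (no quantization).
n = 4;                                   % block length; each window has 2n samples
[I, J] = ndgrid(0:n-1, 0:2*n-1);         % row and column indices of the matrix
M = sqrt(2/n) * cos((I + 1/2) .* (J + 1/2 + n/2) * pi / n);   % n-by-2n MDCT matrix
N = M';                                  % the inverse MDCT is the transpose

x = cos(0.3 * (1:3*n))';                 % three consecutive blocks of test data
w_prev = N * (M * x(1:2*n));             % window 1 covers blocks 1 and 2
w_cur  = N * (M * x(n+1:3*n));           % window 2 covers blocks 2 and 3

% Average the second half of window 1 with the first half of window 2.
rec = (w_prev(n+1:2*n) + w_cur(1:n)) / 2;
err = max(abs(rec - x(n+1:2*n)));        % expected to be at rounding-error level
fprintf('max reconstruction error: %g\n', err);
```

Each window on its own does not give back its input; only the average over the overlap does, which is why the first and last $n$ samples of the signal are lost.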
This is the basic code we are given to start.
This is the code for Parts 1 and 2 in full.
This is the code for Parts 3 and 5 in full.
This is the RMSE function I created to compute the RMSE needed for Parts 1, 2, 3, and 5.
This is the script that I used to run all the tests.
Spoiler alert: Part 4 does not need any coding, as it is just an on-paper proof.
I will show the parts of the code that I changed as I go, but if you want to just look at and run the functions for yourself, go right ahead.
I started from the basic code and modified it for the rest of the project.
Investigate the ability of MDCT to represent pure tones. Begin with $b = 4$ bits per window of size $n = 2^5$. Pick a tone of frequency between 100 Hz and 1000 Hz, and calculate the difference (as RMSE) between the original signal and the signal after encoding/decoding. You should cut the original signal to the same range for comparison with the output signal, since the latter lacks $n$ entries at the left and right ends. Plot a short section of the original and decoded signal.
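The comparison described in this step can be sketched as follows. This is a hypothetical helper, not the RMSE function linked above: the name rmse_trimmed and its argument order are assumptions. It skips the first $n$ samples of the original, which the decoded signal lacks, and compares only as many samples as the output contains.

```matlab
% Hypothetical helper: RMSE between the decoded signal and the matching
% slice of the original, skipping the first n samples the output lacks.
rmse_trimmed = @(out, x, n) ...
    sqrt(mean((out(:) - reshape(x(n+1:n+numel(out)), [], 1)).^2));

% Usage on synthetic data: a stand-in "decoded" signal that is off by 0.01
% everywhere, so the RMSE should come out as 0.01.
n  = 2^5;
Fs = 2^13;
x   = cos(2*pi*242*(1:Fs)/Fs);           % pure 242 Hz tone
out = x(n+1:end-n) + 0.01;               % stand-in for the codec output
e = rmse_trimmed(out, x, n);
fprintf('RMSE = %.4f\n', e);
```

Writing it as an anonymous function keeps the sketch in one script; in a real project a separate function file is the more usual choice.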
For this one I modified the code to add some basic plotting, shown below.
input_graph_filename and output_graph_filename were added to the function's inputs as the file names under which the images are stored.

    t = 1:n*5;
    subx = x(n*11+1:n*16);       % five blocks of the original signal
    subout = out(n*10+1:n*15);   % the same samples in the output, which starts n entries later
    figure(1)
    plot(t,subx,'b-')
    ylabel('frequency')
    xlabel('time')
    title('input sound graph')
    saveas(gcf, input_graph_filename)
    figure(2)
    plot(t,subout,'r-')
    ylabel('frequency')
    xlabel('time')
    title('output sound graph')
    saveas(gcf, output_graph_filename)

It makes the graphs and then saves them for viewing purposes here.
I also changed b and n as shown below. I also changed L, as it was leading to some unnecessary loss. (b becomes an input as of Part 2; you can see it as the fourth input in both functions.)

    n=2^5;
    b=4;
    L=1;

I also created an RMSE function to compute the RMSE for me; you can find it above with the rest of the full-length code.

    output = RMSE(out, x, n);

This is what I entered and the results I got. (Some of the inputs added to the function will be discussed later.)

    format long
    tone = 242;
    Fs = 2^(13);
    solo_tone = cos(((1:Fs)*2*pi*tone)/Fs);
    out_part_1 = simplecodec(solo_tone,Fs,1,4, 'part_1_input.wav', 'part_1_output.wav','part_1_input_output.png')

This is the graph for the input and output (blue is the input wave and red is the output wave).

Input Sound
Output Sound
You can really see and hear the loss caused by the compression. We will improve this in Part 3.
We can also compare this RMSE with the RMSEs found later to compare the loss. The RMSE is shown below.

    out_part_1 = 0.028628549642607

Build chords and evaluate the RMSE as in Step 1. Simple intervals can be constructed by a simple addition of multiple pure tones. Rational ratios of frequencies with low numerators and denominators are pleasing to the ear: a $2:1$ ratio of frequencies gives an octave, a $5:4$ ratio gives a third, a $3:2$ ratio gives a fifth, and so forth. How does the RMSE depend on the number of bits used in the coder?
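The construction described in this step can be sketched with a small helper. The name make_chord and its arguments are hypothetical, and the two-tone chords here use the standard interval ratios (2:1, 5:4, 3:2), which is not necessarily the set of tones used for the results in this report.

```matlab
% Hypothetical helper: sum of pure tones at the given frequency ratios.
% ratios(:) is a column and (1:Fs) a row, so their product is a grid with
% one row per tone; summing down the rows adds the tones together.
make_chord = @(tone, ratios, Fs) sum(cos(2*pi*tone*ratios(:)*(1:Fs)/Fs), 1);

Fs   = 2^13;                                     % one second of samples
tone = 242;                                      % base frequency in Hz
octave_chord = make_chord(tone, [1 2],   Fs);    % 2:1
third_chord  = make_chord(tone, [1 5/4], Fs);    % 5:4
fifth_chord  = make_chord(tone, [1 3/2], Fs);    % 3:2

% Check the helper against the direct sum of two cosines.
direct = cos(2*pi*tone*(1:Fs)/Fs) + cos(2*pi*tone*2*(1:Fs)/Fs);
chord_err = max(abs(octave_chord - direct));
fprintf('max difference from direct sum: %g\n', chord_err);
```

Passing the ratios as a list makes it easy to add further harmonics without writing out each cosine by hand.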
For this one I just added together several tones to make an octave, third, and fifth tone respectively. The code for these waves is below.

    % base tone plus the tone at twice its frequency (2:1)
    octive_tone = cos(((1:Fs)*2*pi*tone)/Fs) + cos(((1:Fs)*2*pi*tone*2)/Fs);
    % ratios 1, 1.5, 3 (3:2 is the interval usually called a fifth)
    third_tone = cos(((1:Fs)*2*pi*tone)/Fs) + cos(((1:Fs)*2*pi*tone*1.5)/Fs) + cos(((1:Fs)*2*pi*tone*3)/Fs);
    % ratios 1, 1.25, 2.5, 3.75, 5 (5:4 is the interval usually called a third)
    fifth_tone = cos(((1:Fs)*2*pi*tone)/Fs) + cos(((1:Fs)*2*pi*tone*1.25)/Fs) + cos(((1:Fs)*2*pi*tone*2.5)/Fs) + cos(((1:Fs)*2*pi*tone*3.75)/Fs) + cos(((1:Fs)*2*pi*tone*5)/Fs);

The picture of the two overlapping waves for the octave is below.

Input octave sound.
Output octave sound.
This is the RMSE for the octave input.

    out_octive = 0.023016451657445

Next come the input and output pictures for the third wave.

Input third chord sound
Output third chord sound
This is the RMSE for the third input.

    out_third = 0.020258851433030

Last comes the fifth chord graph.

Input fifth chord sound
Output fifth chord sound
This is the RMSE of the fifth chord input.

    out_fifth = 0.018916503315818

As you can see, the more complex the waves are, the better the compression works. This can be heard in the sound and is shown in the reduction of the RMSE as well.
The very last part of the question asks me to vary the number of bits used to compress each window of the coder. I changed the code so that the number of bits per window is an input, and set it to five, eight, and two to see what happened.
Five bit window octave graph.

Input five bit octave chord
Output five bit octave chord
Five bit window third graph.

Input five bit third chord
Output five bit third chord
Five bit window fifth graph.

Input five bit fifth chord
Output five bit fifth chord
The RMSE for each of these is below.

    out_octive_5_b = 0.013421858756676
    out_third_5_b = 0.012601301695160
    out_fifth_5_b = 0.011595492187486

I also calculated the difference between each five bit RMSE and the original four bit RMSE, with the answers shown below.

    %code
    out_part_2_5_4_b_octive_diff = out_octive_5_b - out_octive
    out_part_2_5_4_b_third_diff = out_third_5_b - out_third
    out_part_2_5_4_b_fifth_diff = out_fifth_5_b - out_fifth
    %output
    out_part_2_5_4_b_octive_diff = -0.009594592900769
    out_part_2_5_4_b_third_diff = -0.007657549737871
    out_part_2_5_4_b_fifth_diff = -0.007321011128332

As you can see, there was a big increase in quality from a one bit increase. If we increase the number of bits to eight, we should see a further decrease in RMSE.
Eight bit window octave graph.

Input eight bit octave chord
Output eight bit octave chord
Eight bit window third graph.

Input eight bit third chord
Output eight bit third chord
Eight bit window fifth graph.

Input eight bit fifth chord
Output eight bit fifth chord
The RMSE for each of these is below.

    out_octive_8_b = 0.001588577705241
    out_third_8_b = 0.001599914601029
    out_fifth_8_b = 0.001604060168633

Again, I calculated the difference between the eight bit RMSE and the original four bit RMSE.

    %code
    out_part_2_8_4_b_octive_diff = out_octive_8_b - out_octive
    out_part_2_8_4_b_third_diff = out_third_8_b - out_third
    out_part_2_8_4_b_fifth_diff = out_fifth_8_b - out_fifth
    %output
    out_part_2_8_4_b_octive_diff = -0.021427873952204
    out_part_2_8_4_b_third_diff = -0.018658936832001
    out_part_2_8_4_b_fifth_diff = -0.017312443147185

The RMSE of the eight bit window is far smaller than the RMSE given by the four or even five bit window shown before, by about an order of magnitude. Given this observation, we can assume that decreasing the number of bits will increase the RMSE and decrease the sound quality, but we need to check to be sure.
Two bit window octave graph.

Input two bit octave chord
Output two bit octave chord
Two bit window third graph.

Input two bit third chord
Output two bit third chord
Two bit window fifth graph.

Input two bit fifth chord
Output two bit fifth chord
The RMSE for each can be seen below.

    out_octive_2_b = 0.067092244350267
    out_third_2_b = 0.060674919507821
    out_fifth_2_b = 0.071753051203823

I also calculated the differences between the two bit RMSEs and the four bit RMSEs, again shown below.

    %code
    out_part_2_2_4_b_octive_diff = out_octive_2_b - out_octive
    out_part_2_2_4_b_third_diff = out_third_2_b - out_third
    out_part_2_2_4_b_fifth_diff = out_fifth_2_b - out_fifth
    %output
    out_part_2_2_4_b_octive_diff = 0.044075792692822
    out_part_2_2_4_b_third_diff = 0.040416068074791
    out_part_2_2_4_b_fifth_diff = 0.052836547888005

As you can see, our prediction was correct: the quality of the sound noticeably decreased when we decreased the number of bits per window. We can conclude that the more bits per window, the better the sound quality. We can also state that, since each window uses more memory when it stores more bits, the amount of compression decreases as the number of bits increases.
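The trend in these results can be illustrated in isolation with a uniform quantizer. This is a sketch under an assumption: it takes the step size to be $q = 2L/(2^b - 1)$, which may not be exactly how simplecodec turns b into a step size, since that line is not shown here. It quantizes evenly spaced values in $[-L, L]$ and reports the RMSE of the rounding error for the same bit counts used above.

```matlab
% Sketch: RMSE of uniform quantization as a function of the number of bits.
% ASSUMPTION: step size q = 2*L/(2^b - 1); the codec's own formula may differ.
L = 1;
y = linspace(-L, L, 10001)';             % stand-in transform coefficients
bits = [2 4 5 8];
err = zeros(size(bits));
for k = 1:numel(bits)
    q = 2*L / (2^bits(k) - 1);           % quantization step for this bit count
    yq = round(y / q) * q;               % quantize, then dequantize
    err(k) = sqrt(mean((y - yq).^2));    % RMSE of the rounding error
end
for k = 1:numel(bits)
    fprintf('b = %d: quantization RMSE = %.6f\n', bits(k), err(k));
end
```

Under this assumption each extra bit roughly halves the step size, which is consistent with the order-of-magnitude gap between the four bit and eight bit results.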
A “windowing function” is often used to reduce codec error, due to the fact that the function being represented is not periodic over the window, but is being represented by periodic functions. The windowing function scales the input signal smoothly to zero at each end of the window, partially mitigating this problem. A common choice is to replace $x_j$ with $x_j h_j$, where
$$h_j = \sqrt{2}\,\sin\frac{(j - \frac{1}{2})\pi}{2n}$$
for a length $2n$ window, where $j = 1, \dots, 2n$. To undo the windowing function, multiply the inverse MDCT output $w$ componentwise by the same $h_j$. This results in multiplying $w_2$ componentwise by the second half of the $h_j$, and $w_3$ by the first half, before combining into the decoded signal. Compare RMSE, plots, and audible sound as in Steps 1 and 2.
For this I created a new function called mysimplecodec, because I am uncreative with naming schemes.
It uses the previous function as a baseline but adds the creation of the windowing function vector, shown below.

    for j=1:2*n
        h_matrix(j) = (sqrt(2)*sin(((j-(1/2))*pi)/(2*n)));
    end

The code then uses this vector to smooth the sound, multiplying it by the original sound before the compression. This is reversed when the final compressed sound is created, by multiplying by the same vector again (see why this works in Part 4). Below is the compression part of the algorithm, with the changed parts surrounded by comments.

    for k=1:nw-1                          % loop over length 2n windows
        x0=x(1+(k-1)*n:2*n+(k-1)*n)';
        %CHANGED CODE BELOW
        x0 = (h_matrix' .* x0);           % apply the windowing function
        %CHANGED CODE ABOVE
        %test_length = length(x0);
        y0=M*x0;
        y1=round(y0/q);                   % transform components quantized
        % Storage/transmission of file occurs here
        y2=y1*q;                          % transform components dequantized
        w(:,k)=N*y2;                      % invert the MDCT
        last_h_matrix = h_matrix(n+1:2*n)';
        first_h_matrix = h_matrix(1:n)';
        if(k>1)
            w2=w(n+1:2*n,k-1);w3=w(1:n,k);
            %CHANGED CODE BELOW
            w2 = last_h_matrix .* w2;     % undo the windowing on each half
            w3 = first_h_matrix .* w3;
            %CHANGED CODE ABOVE
            out=[out;(w2+w3)/2];          % collect the reconstructed signal
        end                               % (of length 2n less than length of x)
    end

With this new function I then created new graphs for the four sounds I created before: the solo, octave, third, and fifth tones. All of the output is shown below.
Solo mysimplecodec sound graph

Input solo mysimplecodec sound file
Output solo mysimplecodec sound file
Octave mysimplecodec sound graph

Input octave mysimplecodec sound
Output octave mysimplecodec sound
Third mysimplecodec sound graph

Input third mysimplecodec sound
Output third mysimplecodec sound
Fifth mysimplecodec sound graph

Input fifth mysimplecodec sound
Output fifth mysimplecodec sound
This is the RMSE of all the outputs for this part.

    out_part_3_solo = 0.010451672559472
    out_part_3_octive = 0.012513506907362
    out_part_3_third = 0.013369124954884
    out_part_3_fifth = 0.015059057541787

Also, for comparison, here are the differences in RMSE from the earlier four bit results without windowing.

    %code
    out_part_3_solo_diff = out_part_3_solo - out_part_1
    out_part_3_octive_diff = out_part_3_octive - out_octive
    out_part_3_third_diff = out_part_3_third - out_third
    out_part_3_fifth_diff = out_part_3_fifth - out_fifth
    %output
    out_part_3_solo_diff = -0.018176877083135
    out_part_3_octive_diff = -0.010502944750083
    out_part_3_third_diff = -0.006889726478146
    out_part_3_fifth_diff = -0.003857445774032

As you can see, the windowing function really helps when you are trying to smooth out single sounds, but as the complexity increases, the benefit of the windowing function decreases.
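For readers who want to try the with/without windowing comparison without the full project files, here is a self-contained miniature codec. It is a sketch, not simplecodec or mysimplecodec: it assumes the standard MDCT matrix and a step size of $q = 2L/(2^b - 1)$, it skips the plotting and file output, and every variable name is illustrative.

```matlab
% Sketch: minimal MDCT codec on a pure tone, with and without windowing.
% ASSUMPTIONS: standard MDCT matrix, step size q = 2*L/(2^b - 1).
n = 32; b = 4; L = 1; Fs = 2^13; tone = 242;
x = cos(2*pi*tone*(1:Fs)/Fs);                     % test signal (row vector)

[I, J] = ndgrid(0:n-1, 0:2*n-1);
M = sqrt(2/n) * cos((I + 1/2) .* (J + 1/2 + n/2) * pi / n);  % MDCT
N = M';                                           % inverse MDCT
q = 2*L / (2^b - 1);                              % quantization step
h = sqrt(2) * sin(((1:2*n)' - 1/2) * pi / (2*n)); % windowing function

nw = floor(numel(x)/n) - 1;                       % number of length-2n windows
out_plain = zeros((nw-1)*n, 1);
out_win   = zeros((nw-1)*n, 1);
for k = 1:nw
    x0 = x(1+(k-1)*n : (k+1)*n)';                 % current window as a column
    w_plain = N * (round(M*x0/q) * q);            % quantize, dequantize, invert
    w_win   = N * (round(M*(h.*x0)/q) * q);       % same, on the windowed input
    if k > 1
        idx = (k-2)*n + (1:n);                    % where this block goes
        out_plain(idx) = (prev_plain(n+1:2*n) + w_plain(1:n)) / 2;
        out_win(idx) = (h(n+1:2*n).*prev_win(n+1:2*n) + h(1:n).*w_win(1:n)) / 2;
    end
    prev_plain = w_plain;                         % keep for the next overlap
    prev_win   = w_win;
end

ref = x(n+1 : n+numel(out_plain))';               % original, trimmed to match
rmse_plain = sqrt(mean((out_plain - ref).^2));
rmse_win   = sqrt(mean((out_win   - ref).^2));
fprintf('RMSE without windowing: %.6f\n', rmse_plain);
fprintf('RMSE with windowing:    %.6f\n', rmse_win);
```

Changing b, tone, or the test signal makes it possible to repeat the comparisons from Parts 1 to 3 on a small scale; the numbers need not match the ones reported here, since the real functions may differ in detail.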
Explain the method for undoing the windowing that is suggested in Step 3. In other words, show that if the two overlapping input windows are each multiplied componentwise by the entire windowing function $h_j$, and $w_2$ and $w_3$ in equation (11.38) are each multiplied componentwise by the matching half of $h_j$, then equation (11.39) still holds.
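For this part we are just looking into the math of the compression and explaining why it works. More specifically, we want to see why the windowing function does not add anything to the original sound when we multiply by it twice. To understand why this happens, we need to see what we do to each window.
To start, we take two windows of length $2n$ out of the sound file, where there is some overlap between the two. Writing $x_1$, $x_2$, $x_3$ for three consecutive blocks of $n$ samples, the windows are
$$Z_1 = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \quad\text{and}\quad Z_2 = \begin{bmatrix} x_2 \\ x_3 \end{bmatrix}.$$
To start off, we multiply each one of these componentwise by the windowing function. Each individual point of the windowing function is represented by $h_j$. I will write $h^{\text{first}} = (h_1, \dots, h_n)$ and $h^{\text{last}} = (h_{n+1}, \dots, h_{2n})$ for its two halves (first_h_matrix and last_h_matrix in the code), and $\circ$ for the componentwise product:
$$\begin{bmatrix} h^{\text{first}} \circ x_1 \\ h^{\text{last}} \circ x_2 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} h^{\text{first}} \circ x_2 \\ h^{\text{last}} \circ x_3 \end{bmatrix}.$$
Once this is done, we move on to the next part, where we continue the normal MDCT by applying the transform matrix $M$ and then its inverse $N$. Leaving quantization aside, $NM$ sends a window $\begin{bmatrix} u \\ v \end{bmatrix}$ to $\begin{bmatrix} u - Ru \\ v + Rv \end{bmatrix}$, where $R$ reverses the order of the entries of a block. The two halves that overlap on $x_2$ are therefore
$$w_2 = h^{\text{last}} \circ x_2 + R(h^{\text{last}} \circ x_2) \quad\text{and}\quad w_3 = h^{\text{first}} \circ x_2 - R(h^{\text{first}} \circ x_2).$$
When we get here, we multiply by the windowing function again:
$$h^{\text{last}} \circ w_2 \quad\text{and}\quad h^{\text{first}} \circ w_3.$$
Based on what the windowing function is supposed to do, if we add these two pieces together and divide by 2, we should get an estimate of the original block $x_2$.
How does this work? To understand this, we must expand the sum and see what the windowing function does to it. Since reversing a componentwise product reverses each factor, $R(a \circ b) = Ra \circ Rb$, we get
$$h^{\text{last}} \circ w_2 + h^{\text{first}} \circ w_3 = \left(h^{\text{first}} \circ h^{\text{first}} + h^{\text{last}} \circ h^{\text{last}}\right) \circ x_2 + \left(h^{\text{last}} \circ Rh^{\text{last}} - h^{\text{first}} \circ Rh^{\text{first}}\right) \circ Rx_2.$$
Looking at this, we see that we have to crack open a trig book again and remember how a sine function works. To start, the windowing function is symmetric about the middle of the window: $h_{2n+1-j} = h_j$. This means that if you run through all the values $h_j$ and subtract them from the values you get by running the function with the input $2n+1-j$, you get zero. In block form, $Rh^{\text{last}} = h^{\text{first}}$ and $Rh^{\text{first}} = h^{\text{last}}$, so the second bracket cancels:
$$h^{\text{last}} \circ h^{\text{first}} - h^{\text{first}} \circ h^{\text{last}} = 0.$$
To finish this, we need to go back to even more trig. What we must do next is write the larger-indexed half in terms of the smaller-indexed half (this will make sense in a second). The one major difference between them is where the index starts: $h^{\text{first}}$ runs through $j = 1, \dots, n$ and $h^{\text{last}}$ runs through $j = n+1, \dots, 2n$. So the only thing we need to do to turn the input of the smaller-indexed half into the input of the larger-indexed half is to add $n$ to $j$:
$$h_{j+n} = \sqrt{2}\,\sin\frac{(j + n - \frac{1}{2})\pi}{2n} = \sqrt{2}\,\sin\left(\frac{(j - \frac{1}{2})\pi}{2n} + \frac{\pi}{2}\right).$$
After some algebra, we can see that the input is just shifted by $\pi/2$. If you remember your trig, you know that $\sin$ shifted by $\pi/2$ is the same thing as $\cos$:
$$h_{j+n} = \sqrt{2}\,\cos\frac{(j - \frac{1}{2})\pi}{2n}.$$
With this information, we can put this back into the first bracket and finish up:
$$h_j^2 + h_{j+n}^2 = 2\sin^2\frac{(j - \frac{1}{2})\pi}{2n} + 2\cos^2\frac{(j - \frac{1}{2})\pi}{2n}.$$
If you go back to the trig identity $\sin^2\theta + \cos^2\theta = 1$, you can get rid of the two trig functions and take out their common factor, $2$. This leaves us with just what we wanted:
$$\frac{h^{\text{last}} \circ w_2 + h^{\text{first}} \circ w_3}{2} = x_2.$$
With this, we prove that the windowing function does not change the output of the original function.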
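The argument in this part can also be checked numerically. The sketch below assumes the windowing function $h_j = \sqrt{2}\sin\left((j - \frac{1}{2})\pi/(2n)\right)$ from Step 3 and the standard MDCT matrix; the variable names are illustrative. It tests the symmetry of the window, the fact that the squares of its two halves add up to 2, and the reconstruction of the shared block when the window is applied before the transform and again after the inverse, with no quantization.

```matlab
% Sketch: numerical check that applying the window twice is undone by the overlap.
n = 8;
j = (1:2*n)';
h = sqrt(2) * sin((j - 1/2) * pi / (2*n));   % windowing function from Step 3
h_first = h(1:n);
h_last  = h(n+1:2*n);

sym_err = max(abs(h_last - flipud(h_first)));        % symmetry: h(2n+1-j) = h(j)
sum_err = max(abs(h_first.^2 + h_last.^2 - 2));      % squares of the halves sum to 2

% Standard MDCT matrix and its inverse (an assumption about the codec's M and N).
[I, J] = ndgrid(0:n-1, 0:2*n-1);
M = sqrt(2/n) * cos((I + 1/2) .* (J + 1/2 + n/2) * pi / n);
N = M';

x = cos(0.7 * (1:3*n))' + 0.2 * (1:3*n)' / n;        % arbitrary test signal
w_prev = N * (M * (h .* x(1:2*n)));                  % windowed window 1
w_cur  = N * (M * (h .* x(n+1:3*n)));                % windowed window 2
w2 = h_last  .* w_prev(n+1:2*n);                     % undo on second half
w3 = h_first .* w_cur(1:n);                          % undo on first half
rec_err = max(abs((w2 + w3)/2 - x(n+1:2*n)));        % compare with shared block

fprintf('symmetry error %g, sum-of-squares error %g, reconstruction error %g\n', ...
        sym_err, sum_err, rec_err);
```

All three numbers should sit at rounding-error level, which is the numerical counterpart of the cancellation and the $\sin^2 + \cos^2$ step in the proof.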
Import a .wav file with the MATLAB audioread command, or download an audio file of your choice. (Alternatively, load handel can be used. If you download a stereo file, you will need to work with each channel separately.) Reproduce the file (or a segment of it) using various values of b and with and without windowing. Compute RMSE for your choices of parameters and exhibit the results using the sound command.
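The loading step described above can be sketched as follows. To stay self-contained, this example first writes a short synthetic stereo file to a temporary location with audiowrite and then reads it back with audioread; the file and the variable names are placeholders, not the clips used in this report. Each channel is pulled out as its own row vector so it can be passed to a codec separately.

```matlab
% Sketch: read a stereo .wav file and split it into separate channels.
% The file is synthetic and temporary; replace fname with a real recording.
Fs = 8192;
t = (0:Fs-1)' / Fs;                                   % one second of time stamps
stereo = 0.5 * [cos(2*pi*242*t), cos(2*pi*484*t)];    % left and right columns
fname = [tempname() '.wav'];
audiowrite(fname, stereo, Fs);                        % create the test file

[x, Fs_in] = audioread(fname);                        % samples in columns, one per channel
left  = x(:, 1)';                                     % row vector for channel 1
right = x(:, 2)';                                     % row vector for channel 2
delete(fname);                                        % clean up the temporary file

fprintf('read %d samples per channel at %d Hz\n', size(x, 1), Fs_in);
```

Taking Fs from audioread rather than hard-coding it matters because the sample rate differs from file to file.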
For this one, I took two sound bites from one of my favorite movies, Monty Python and the Holy Grail, and loaded them into MATLAB to see what would happen. I also varied the bits per window over four, five, eight, and two, and used both functions made in this project to compare the compression.
Below are the sound files before compression. Above each sound file is its name, so that you know which graph goes with which sound file.
Newt.
runawayrun
I also added a waittime input so that playback of the input and output does not overlap. In addition, I added an Fs (sample rate) input, as the sample rate varies between sound files and needs to be set per file.

    function output=mysimplecodec(x,Fs,waittime,b, original_file_name, new_file_name, input_output_graph_filename)
    ...
    %Fs=2^(13); % Fs=sampling rate
    ...
    pause(waittime)

First, let's look at both sound files compressed with four bits per window, with and without windowing.
Four bit without windowing Newt sound graph

Four bit without windowing Newt sound input
Four bit without windowing Newt sound output
Four bit with windowing Newt sound graph

Four bit with windowing Newt sound input
Four bit with windowing Newt sound output
Four bit without windowing runawayrun sound graph

Four bit without windowing runawayrun sound input
Four bit without windowing runawayrun sound output
Four bit with windowing runawayrun sound graph

Four bit with windowing runawayrun sound input
Four bit with windowing runawayrun sound output
This is the RMSE of each function.

    part_5_simple_Newt = 0.012570623823620
    part_5_simple_run = 0.019979576146797
    part_5_mysimple_Newt = 0.011238281579624
    part_5_mysimple_run = 0.017924222667254

This is the difference between the RMSE of the two different functions.

    %code
    part_5_Newt_diff = part_5_mysimple_Newt - part_5_simple_Newt
    part_5_run_diff = part_5_mysimple_run - part_5_simple_run
    %output
    part_5_Newt_diff = -0.001332342243995
    part_5_run_diff = -0.002055353479543

We can see and hear that the compression of these two sound files is not bad. You can tell very clearly that they are compressed, but the audio is from a movie made in 1975, so having the audio be this clear is impressive. We can also see that the RMSE of the windowed function is less than that of the non-windowed function, which is to be expected. The audio of the two functions is clearly different, and the difference is most easily heard during the pauses in the audio.
Five bit without windowing Newt sound graph

Five bit without windowing Newt sound input
Five bit without windowing Newt sound output
Five bit with windowing Newt sound graph

Five bit with windowing Newt sound input
Five bit with windowing Newt sound output
Five bit without windowing runawayrun sound graph

Five bit without windowing runawayrun sound input
Five bit without windowing runawayrun sound output
Five bit with windowing runawayrun sound graph

Five bit with windowing runawayrun sound input
Five bit with windowing runawayrun sound output
This is the RMSE of each function.

    part_5_5_b_simple_Newt = 0.007972959068517
    part_5_5_b_simple_run = 0.011557042409937
    part_5_5_b_mysimple_Newt = 0.006890323260522
    part_5_5_b_mysimple_run = 0.009674409889937

This is the difference between the RMSE of the two different functions, plus the difference between the four bit and five bit results.

    %code
    part_5_5_b_Newt_simple_diff = part_5_simple_Newt - part_5_5_b_simple_Newt
    part_5_5_b_Newt_mysimple_diff = part_5_mysimple_Newt - part_5_5_b_mysimple_Newt
    part_5_5_b_run_simple_diff = part_5_simple_run - part_5_5_b_simple_run
    part_5_5_b_run_mysimple_diff = part_5_mysimple_run - part_5_5_b_mysimple_run
    part_5_5_b_Newt_diff = part_5_5_b_simple_Newt - part_5_5_b_mysimple_Newt
    part_5_5_b_run_diff = part_5_5_b_simple_run - part_5_5_b_mysimple_run
    %output
    part_5_5_b_Newt_simple_diff = 0.004597664755103
    part_5_5_b_Newt_mysimple_diff = 0.004347958319102
    part_5_5_b_run_simple_diff = 0.008422533736860
    part_5_5_b_run_mysimple_diff = 0.008249812777317
    part_5_5_b_Newt_diff = 0.001082635807995
    part_5_5_b_run_diff = 0.001882632520000

As expected, the sound quality is notably better with five bits per window instead of four. The size of the improvement is smaller than for the earlier change in bits. Also, the windowing function did not seem to make much of an impact compared to the previous test.
Eight bit without windowing Newt sound graph

Eight bit without windowing Newt sound input
Eight bit without windowing Newt sound output
Eight bit with windowing Newt sound graph

Eight bit with windowing Newt sound input
Eight bit with windowing Newt sound output
Eight bit without windowing runawayrun sound graph

Eight bit without windowing runawayrun sound input
Eight bit without windowing runawayrun sound output
Eight bit with windowing runawayrun sound graph

Eight bit with windowing runawayrun sound input
Eight bit with windowing runawayrun sound output
This is the RMSE of each function.

    part_5_8_b_simple_Newt = 0.001537065085831
    part_5_8_b_simple_run = 0.001595168490493
    part_5_8_b_mysimple_Newt = 0.001491193856767
    part_5_8_b_mysimple_run = 0.001452684156640

This is the difference between the RMSE of the two different functions, plus the difference between the four bit and eight bit results.

    %code
    part_5_8_b_Newt_simple_diff = part_5_simple_Newt - part_5_8_b_simple_Newt
    part_5_8_b_Newt_mysimple_diff = part_5_mysimple_Newt - part_5_8_b_mysimple_Newt
    part_5_8_b_run_simple_diff = part_5_simple_run - part_5_8_b_simple_run
    part_5_8_b_run_mysimple_diff = part_5_mysimple_run - part_5_8_b_mysimple_run
    part_5_8_b_Newt_diff = part_5_8_b_simple_Newt - part_5_8_b_mysimple_Newt
    part_5_8_b_run_diff = part_5_8_b_simple_run - part_5_8_b_mysimple_run
    %output
    part_5_8_b_Newt_simple_diff = 0.011033558737789
    part_5_8_b_Newt_mysimple_diff = 0.009747087722857
    part_5_8_b_run_simple_diff = 0.018384407656304
    part_5_8_b_run_mysimple_diff = 0.016471538510613
    part_5_8_b_Newt_diff = 4.587122906396751e-05
    part_5_8_b_run_diff = 1.424843338528824e-04

As expected, the eight bit window had a really low RMSE compared to the four bit window, again by about an order of magnitude. The difference between the windowed and non-windowed functions is not as large, because there is not as much error to get rid of to start with. Other than that, it was as expected.
Two bit without windowing Newt sound graph

Two bit without windowing Newt sound input
Two bit without windowing Newt sound output
Two bit with windowing Newt sound graph

Two bit with windowing Newt sound input
Two bit with windowing Newt sound output
Two bit without windowing runawayrun sound graph

Two bit without windowing runawayrun sound input
Two bit without windowing runawayrun sound output
Two bit with windowing runawayrun sound graph

Two bit with windowing runawayrun sound input
Two bit with windowing runawayrun sound output
This is the RMSE of each function.

    part_5_2_b_simple_Newt = 0.031724613944732
    part_5_2_b_simple_run = 0.058912805988006
    part_5_2_b_mysimple_Newt = 0.031340472077966
    part_5_2_b_mysimple_run = 0.058427901101464

This is the difference between the RMSE of the two different functions, plus the difference between the four bit and two bit results.

    %code
    part_5_2_b_Newt_simple_diff = part_5_simple_Newt - part_5_2_b_simple_Newt
    part_5_2_b_Newt_mysimple_diff = part_5_mysimple_Newt - part_5_2_b_mysimple_Newt
    part_5_2_b_run_simple_diff = part_5_simple_run - part_5_2_b_simple_run
    part_5_2_b_run_mysimple_diff = part_5_mysimple_run - part_5_2_b_mysimple_run
    part_5_2_b_Newt_diff = part_5_2_b_simple_Newt - part_5_2_b_mysimple_Newt
    part_5_2_b_run_diff = part_5_2_b_simple_run - part_5_2_b_mysimple_run
    %output
    part_5_2_b_Newt_simple_diff = -0.019153990121112
    part_5_2_b_Newt_mysimple_diff = -0.020102190498342
    part_5_2_b_run_simple_diff = -0.038933229841209
    part_5_2_b_run_mysimple_diff = -0.040503678434210
    part_5_2_b_Newt_diff = 3.841418667655458e-04
    part_5_2_b_run_diff = 4.849048865422714e-04

For this one, there was a massive loss in quality, to the point that the sound is barely recognizable as the original. The RMSEs are also massive, but the windowing function does not seem to do much at all for the RMSE. This could be because, with the quality this low, there was not much to fix. I would have to look into it more to get a certain answer.
This part was more of a proof that this codec can be used to compress ordinary sound files rather than just pure tones made in MATLAB. I would say this was a success. The next thing I would do is take a collection of audio files and see which number of bits per window gives the least quality loss for the most size reduction.