Project 5

By Mark Schools

Preface

For this project we are studying how to compress audio files to make them easier to store. Before compression, an audio file is stored as an array of samples, each recording the amplitude of the sound wave at one instant. This takes up a lot of space, because the signal has to be sampled many times per second to be accurate. There is also a lot of waste, as the recording process captures everything, including frequencies that the human ear can't hear. To fix this problem, people have been working on compression algorithms that reduce the amount of waste in raw audio files. A very common way to compress many kinds of data is the Discrete Cosine Transform (DCT), which lets us create a function that reproduces the original data from the coefficients stored in the representing matrix. Though the function is just as long as the original file, it also has the property that if you remove the last part of the function, the reconstruction still stays close to the original. This works for many different things, but we have a problem, as our sound matrix is one dimensional. We can fix this by using what is called the Modified Discrete Cosine Transform (MDCT). This uses overlapping vectors to create a pseudo-matrix from which the transformation matrix is built. This is what I use for this project.

This is the basic code we are given to start.

This is the code for parts 1 and 2 in full

This is the code for parts 3 and 5 in full

This is the RMSE function I created to compute the RMSE needed for parts 1, 2, 3, and 5

This is the script that I used to run all the tests

Spoiler alert: Part 4 does not need any coding, as it is just some on-paper proving.

I will show the parts of the code that I changed as I go, but if you want to just look at and run the functions for yourself, go right ahead.

I proceeded to modify it for the rest of the project.

Part 1

Question

Investigate the ability of MDCT to represent pure tones. Begin with $b = 4$ bits per window of size $n = 32$ . Pick a tone of frequency between 100 Hz and 1000 Hz, and calculate the difference (as RMSE) between the original signal and the signal after encoding/decoding. You should cut the original signal to $xshort = x(n+1: end-n);$ for comparison with the output signal, since the latter lacks n entries at the left and right ends. Plot a short section of the original and decoded signal.

For this one I modified the code to add some basic plotting, shown below.

input_graph_filename and output_graph_filename were added to the function inputs as the file names under which the images are stored.


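t = 1:n*5;                        % five windows' worth of sample indices
subx = x(n*11+1:n*16);            % short section of the original signal
subout = out(n*10+1:n*15);        % same section of the decoded signal (out starts n samples after x)
figure(1)
plot(t,subx,'b-')
ylabel('amplitude')               % the plotted samples are amplitudes, not frequencies
xlabel('sample index')
title('input sound graph')
saveas(gcf, input_graph_filename)
figure(2)
plot(t,subout,'r-')
ylabel('amplitude')
xlabel('sample index')
title('output sound graph')
saveas(gcf, output_graph_filename)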

It makes the graphs and then saves them for viewing purposes here.

I also changed b and n as shown below. I also changed L, as it was leading to some unnecessary loss. (b becomes an input as of part 2; you can see it as the fourth input in both functions.)


n=2^5;
b=4;
L=1;

I also created an RMSE function to compute the RMSE for me; you can find it above with the rest of the full-length code.


output = RMSE(out, x, n);
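
The RMSE function itself is only linked, not listed, so here is a minimal sketch of the calculation it presumably performs, assuming it trims n samples from each end of the original as the question describes. The name `rmse_sketch` and the toy signals are hypothetical and are not taken from the project code.

```matlab
% Hypothetical helper (not the project's RMSE.m): root-mean-square error
% between the decoded signal and the original with n samples trimmed
% from each end, so that both vectors have the same length.
rmse_sketch = @(out, x, n) sqrt(mean((reshape(x(n+1:end-n), [], 1) - out(:)).^2));

n = 4;
x = (1:20)';                  % toy "original" signal
out = x(n+1:end-n) + 0.5;     % toy "decoded" signal, off by 0.5 at every sample
err = rmse_sketch(out, x, n); % a constant offset of 0.5 gives an RMSE of 0.5
```

Reshaping both inputs to columns means the sketch accepts row or column vectors, which matters because the tones are built as rows while the codec collects its output as a column.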

This is what I inputted and the results I got. (some of the inputs added to the function will be talked about later)


format long
tone = 242;
Fs = 2^(13);
solo_tone = cos(((1:Fs)*2*pi*tone)/Fs);
out_part_1 = simplecodec(solo_tone,Fs,1,4, 'part_1_input.wav', 'part_1_output.wav','part_1_input_output.png')

This is the graph for the input and output (blue is the input wave and red is the output wave respectively)

Part 1 input sound graph

Input Sound

Output Sound

You can really see and hear the loss that happened because of the compression. We will improve this in part 3.

We can also compare this RMSE, shown below, to the RMSE values found later to compare the loss.


out_part_1 =
   0.028628549642607

Part 2

Question

Build chords and evaluate the RMSE as in Step 1. Simple intervals can be constructed by a simple addition of multiple pure tones. Rational ratios of frequencies with low numerators and denominators are pleasing to the ear: A $2 : 1$ ratio of frequencies gives an octave, $1.5 : 1$ ratio gives a third, a $1.25 : 1$ gives a fifth, and so forth. How does the RMSE depend on the number of bits used in the coder?

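For this one I just added together a bunch of different tones to make an octave, third, and fifth chord respectively. (The names follow the ratios given in the question; in standard musical terms, a $1.5 : 1$ ratio is a perfect fifth and a $1.25 : 1$ ratio is a major third.) The code for these waves is below.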


octive_tone = cos(((1:Fs)*2*pi*tone)/Fs) + cos(((1:Fs)*2*pi*tone*2)/Fs);
third_tone = cos(((1:Fs)*2*pi*tone)/Fs) + cos(((1:Fs)*2*pi*tone*1.5)/Fs) + cos(((1:Fs)*2*pi*tone*3)/Fs);
fifth_tone = cos(((1:Fs)*2*pi*tone)/Fs) + cos(((1:Fs)*2*pi*tone*1.25)/Fs) + cos(((1:Fs)*2*pi*tone*2.5)/Fs) + cos(((1:Fs)*2*pi*tone*3.75)/Fs) + cos(((1:Fs)*2*pi*tone*5)/Fs);
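
The same sums can be written more compactly by listing the frequency ratios of each chord. The sketch below is only an illustration of that idea; `make_chord` and `octave_sketch` are hypothetical names, and the result should match `octive_tone` above up to rounding.

```matlab
Fs = 2^13;                 % sampling rate used in part 1
tone = 242;                % base frequency in Hz used in part 1
t = (1:Fs)/Fs;             % one second of sample times (row vector)

% Hypothetical helper: add one cosine per frequency ratio.
% ratios(:) is a column, so ratios(:)*t has one row per tone.
make_chord = @(ratios) sum(cos(2*pi*tone*ratios(:)*t), 1);

octave_sketch = make_chord([1 2]);            % base tone plus its octave
third_sketch  = make_chord([1 1.5 3]);        % same ratios as third_tone
fifth_sketch  = make_chord([1 1.25 2.5 3.75 5]);
```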

The picture of the two overlapping waves for the octave is below.

Input octave sound.

Output octave sound.

This is the respective RMSE for this octave input


out_octive =
   0.023016451657445

Next comes the third chord input and output picture.

Input third chord sound

Output third chord sound

This is the RMSE for the third chord input


out_third =
   0.020258851433030

Last comes the fifth chord graph

Input fifth chord sound

Output fifth chord sound

This is the RMSE of the fifth chord input


out_fifth =
   0.018916503315818

As you can see, the more complex the wave is, the lower the RMSE: it falls from about 0.0286 for the single tone to about 0.0189 for the fifth chord. This can also be heard in the sound.

The very last part of the question asks me to vary the number of bits used to encode each window. I changed the code so that the number of bits was variable and tried five, eight, and two bits to see what happened.

Five bit window octave graph.

Input five bit octave chord

Output five bit octave chord

Five bit window third graph.

Input five bit third chord

Output five bit third chord

Five bit window fifth graph.

Input five bit fifth chord

Output five bit fifth chord

The RMSE for each of these is below


out_octive_5_b =
   0.013421858756676
out_third_5_b =
   0.012601301695160
out_fifth_5_b =
   0.011595492187486

I also calculated the difference between each of the five bit window RMSEs and the original four bit window RMSEs, with the answers shown below.


%code
out_part_2_5_4_b_octive_diff = out_octive_5_b - out_octive
out_part_2_5_4_b_third_diff = out_third_5_b - out_third
out_part_2_5_4_b_fifth_diff = out_fifth_5_b - out_fifth
%output
out_part_2_5_4_b_octive_diff =
  -0.009594592900769
out_part_2_5_4_b_third_diff =
  -0.007657549737871
out_part_2_5_4_b_fifth_diff =
  -0.007321011128332

As you can see, there was a big increase in quality from a one-bit increase. If we increase the number of bits to eight, we should see a further decrease in RMSE.

Eight bit window octave graph.

Input eight bit octave chord

Output eight bit octave chord

Eight bit window third graph.

Input eight bit third chord

Output eight bit third chord

Eight bit window fifth graph.

Input eight bit fifth chord

Output eight bit fifth chord

The RMSE for each of these is below


out_octive_8_b =
   0.001588577705241
out_third_8_b =
   0.001599914601029
out_fifth_8_b =
   0.001604060168633

Again, I calculated the difference between the eight bit window and the original four bit window.


%code
out_part_2_8_4_b_octive_diff = out_octive_8_b - out_octive
out_part_2_8_4_b_third_diff = out_third_8_b - out_third
out_part_2_8_4_b_fifth_diff = out_fifth_8_b - out_fifth
%output
out_part_2_8_4_b_octive_diff =
  -0.021427873952204
out_part_2_8_4_b_third_diff =
  -0.018658936832001
out_part_2_8_4_b_fifth_diff =
  -0.017312443147185

The RMSE of the eight bit window is far smaller than the RMSE given by the four or even five bit window shown before, by about an order of magnitude. Given this observation, we can assume that decreasing the number of bits will increase the RMSE and decrease the sound quality, but we need to check to be sure.

Two bit window octave graph.

Input two bit octave chord

Output two bit octave chord

Two bit window third graph.

Input two bit third chord

Output two bit third chord

Two bit window fifth graph.

Input two bit fifth chord

Output two bit fifth chord

The RMSE for each can be seen below.


out_octive_2_b =
   0.067092244350267
out_third_2_b =
   0.060674919507821
out_fifth_2_b =
   0.071753051203823

I also calculated the differences between the two bit window RMSEs and the four bit window RMSEs, again shown below.


%code
out_part_2_2_4_b_octive_diff = out_octive_2_b - out_octive
out_part_2_2_4_b_third_diff = out_third_2_b - out_third
out_part_2_2_4_b_fifth_diff = out_fifth_2_b - out_fifth
%output
out_part_2_2_4_b_octive_diff =
   0.044075792692822
out_part_2_2_4_b_third_diff =
   0.040416068074791
out_part_2_2_4_b_fifth_diff =
   0.052836547888005

As you can see, our prediction was correct, and the quality of the sound noticeably decreased when we decreased the number of bits per window. We can conclude that the more bits per window, the lower the error of the compressed sound. We can also state that, since each window uses more memory when it stores more bits, the amount of compression decreases as the number of bits increases.
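
One way to see why the RMSE changes so quickly with b is to look at the quantization step on its own. The sketch below assumes the codec quantizes values in $[-L, L]$ with step size $q = 2L/(2^b - 1)$ and uses the same round/rescale pair as the `y1`/`y2` lines of the codec; that formula is an assumption about the starter code, and the evenly spaced `y0` values are stand-ins for real transform coefficients.

```matlab
L = 1;                          % quantization range [-L, L], as set in part 1
b = [2 4 5 8];                  % bit counts tried in this part
q = 2*L ./ (2.^b - 1);          % ASSUMED step size for b bits on [-L, L]
y0 = linspace(-L, L, 1001)';    % stand-in transform coefficients (illustrative)
err = zeros(size(b));
for i = 1:numel(b)
    y2 = round(y0/q(i))*q(i);           % quantize, then dequantize
    err(i) = sqrt(mean((y0 - y2).^2));  % RMSE caused by rounding alone
end
disp([b; q; err])               % rows: bits, step size, rounding RMSE
```

Each extra bit roughly halves the step size, so under this assumption the rounding error shrinks by about a factor of two per bit, which is in line with the drop from four to five bits reported above.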

Part 3

Question

A “windowing function” is often used to reduce codec error, due to the fact that the function being represented is not periodic over the window, but is being represented by periodic functions. The windowing function scales the input signal $x$ smoothly to zero at each end of the window, partially mitigating this problem. A common choice is to replace $x_j$ with $x_jh_j$ , where

$h_j = \sqrt{2} \sin\left(\frac{(j - \frac{1}{2})\pi}{2n}\right)$

for a length 2n window, where $j = 1,...,2n$ . To undo the windowing function, multiply the inverse MDCT output $w$ componentwise by the same $h_j$ . This results in multiplying $w_2$ componentwise by the second half of the $h_j$ , $j = n + 1,...,2n,$ and $w_3$ by the first half $h_j$ , $j = 1,...,n$ before combining into the decoded signal. Compare RMSE, plots, and audible sound as in Steps 1 and 2.

For this I created a new function that is called mysimplecodec because I am uncreative with naming schemes.

It uses the previous function as a baseline but adds the creation of the windowing function matrix, shown below.


j = 1:2*n;                                   % sample indices within a length-2n window
h_matrix = sqrt(2)*sin((j-1/2)*pi/(2*n));    % windowing function h_j as a row vector

The matrix above is then used in the code to smooth the sound by multiplying each window of the original sound by it before compression. This is reversed when the final compressed sound is created by multiplying by the same matrix again (see why this works in part 4). Below is the compression part of the algorithm, with the changed parts surrounded by comments.


for k=1:nw-1                          % loop over length 2n windows
  x0=x(1+(k-1)*n:2*n+(k-1)*n)';
  %CHANGED CODE BELOW
  x0 = (h_matrix' .* x0);
  %CHANGED CODE ABOVE
  %test_length = length(x0);
  y0=M*x0;
  y1=round(y0/q);                     % transform components quantized
% Storage/transmission of file occurs here  
  y2=y1*q;                            % transform components dequantized
  w(:,k)=N*y2;                        % invert the MDCT
  last_h_matrix = h_matrix(n+1:2*n)';
  first_h_matrix = h_matrix(1:n)';
  if(k>1)
      w2=w(n+1:2*n,k-1);w3=w(1:n,k);
      %CHANGED CODE BELOW
      w2 = last_h_matrix .* w2;
      w3 = first_h_matrix .* w3;
      %CHANGED CODE ABOVE
      out=[out;(w2+w3)/2];          % collect the reconstructed signal
  end                                 % (of length 2n less than length of x)
end

With this new function I then created new graphs for the four different sounds I created before: the solo, octave, third, and fifth tones respectively. All of the output is shown below.

Solo mysimplecodec sound graph

Input solo mysimplecodec sound file

Output solo mysimplecodec sound file

Octave mysimplecodec sound graph

Input octave mysimplecodec sound

Output octave mysimplecodec sound

Third mysimplecodec sound graph

Input third mysimplecodec sound

Output third mysimplecodec sound

Fifth mysimplecodec sound graph

Input fifth mysimplecodec sound

Output fifth mysimplecodec sound

This is the RMSE of all the outputs for this part.


out_part_3_solo =
   0.010451672559472
out_part_3_octive =
   0.012513506907362
out_part_3_third =
   0.013369124954884
out_part_3_fifth =
   0.015059057541787

Also, for comparison purposes, here are the differences in RMSE from the earlier four bit windows without the windowing function.


%code
out_part_3_solo_diff = out_part_3_solo - out_part_1
out_part_3_octive_diff = out_part_3_octive - out_octive
out_part_3_third_diff = out_part_3_third - out_third
out_part_3_fifth_diff = out_part_3_fifth - out_fifth
%output
out_part_3_solo_diff =
  -0.018176877083135
out_part_3_octive_diff =
  -0.010502944750083
out_part_3_third_diff =
  -0.006889726478146
out_part_3_fifth_diff =
  -0.003857445774032

As you can see, the windowing function really helps when you are trying to smooth out a single tone, but as the complexity increases, the benefit of the windowing function decreases.
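
To put that trend in relative terms, the short calculation below turns the four bit RMSE values reported in parts 1 to 3 into percentage reductions. The variable names are illustrative; the numbers are copied from the outputs above.

```matlab
% Four bit RMSE without windowing (parts 1 and 2): solo, octave, third, fifth
before = [0.028628549642607 0.023016451657445 0.020258851433030 0.018916503315818];
% Four bit RMSE with windowing (part 3), in the same order
after  = [0.010451672559472 0.012513506907362 0.013369124954884 0.015059057541787];

pct_reduction = 100 * (before - after) ./ before;   % relative improvement in percent
disp(pct_reduction)
```

The reduction is largest for the single tone and shrinks steadily as more tones are added, which is the same pattern seen in the raw differences.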

Part 4

Question

Explain the method for undoing the windowing that is suggested in Step 3. In other words, assume that if $Z_1$ and $Z_2$ are each multiplied componentwise by the entire windowing function $h$ , and $NMZ_1$ and $NMZ_2$ in equation (11.38) are each multiplied componentwise by $h$ , that equation (11.39) still holds.

For this we are just looking into the math of the compression and explaining why it works, and more specifically why the windowing function does not add anything to the original sound when we multiply by it twice. To understand why this happens, we need to see what we do to each window.

To start, we take two windows out of the sound file, with some overlap between the two.

$Z_1 = \begin{bmatrix} x_1\\ x_2 \\ x_3\\ x_4 \end{bmatrix}$ and $Z_2 = \begin{bmatrix} x_3\\ x_4 \\ x_5\\ x_6 \end{bmatrix}$

To start off, we multiply each one of these by the windowing matrix. Each block of the windowing matrix is represented by $h_i$ , where $i$ is the position of the block within the window.

$Z_1 = \begin{bmatrix} x_1h_1\\ x_2h_2 \\ x_3h_3\\ x_4h_4 \end{bmatrix}$ and $Z_2 = \begin{bmatrix} x_3h_1\\ x_4h_2 \\ x_5h_3\\ x_6h_4 \end{bmatrix}$

Once this is done, we move on to the next part, where we continue the normal MDCT: applying the transform and then its inverse brings in the reversal matrix, represented by $R$ , which flips the order of the entries in a block.

$Z_1 = \begin{bmatrix} x_1h_1 - R(x_2h_2)\\ -R(x_1h_1)+x_2h_2 \\ x_3h_3+R(x_4h_4)\\ R(x_3h_3)+x_4h_4 \end{bmatrix}$ and $Z_2 = \begin{bmatrix} x_3h_1 - R(x_4h_2)\\ -R(x_3h_1)+x_4h_2 \\ x_5h_3+R(x_6h_4)\\ R(x_5h_3)+x_6h_4 \end{bmatrix}$

When we get here, we multiply by the windowing function again.

$Z_1 = \begin{bmatrix} x_1h_1^2 - h_1(R(x_2h_2))\\ -h_2(R(x_1h_1))+x_2h_2^2 \\ x_3h_3^2+h_3(R(x_4h_4))\\ h_4(R(x_3h_3))+x_4h_4^2 \end{bmatrix}$ and $Z_2 = \begin{bmatrix} x_3h_1^2 - h_1(R(x_4h_2))\\ -h_2(R(x_3h_1))+x_4h_2^2 \\ x_5h_3^2+h_3(R(x_6h_4))\\ h_4(R(x_5h_3))+x_6h_4^2 \end{bmatrix}$

Based on what the windowing function is supposed to do, if we add together the two overlapping pieces and halve the sum, we should get back the original signal.

$\frac{1}{2}(x_3h_3^2 + h_3(R(x_4h_4)) + x_3h_1^2 - h_1(R(x_4h_2))) = x_3$

$\frac{1}{2}(h_4(R(x_3h_3))+x_4h_4^2-h_2(R(x_3h_1))+x_4h_2^2) = x_4$

How does this work? To understand this, we must go back and see what the windowing function is and then reduce it.

$h_j = \sqrt{2} \sin\left(\frac{(j - \frac{1}{2})\pi}{2n}\right)$

Looking at this, we see that we have to crack open a trig book again and remember how a sin function works. To start, the function is symmetric about the middle of the window, so $h_j = h_{2n+1-j}$ . This means that reversing the last block of $h$ gives the first block, and reversing the second block gives the third: $R(h_4) = h_1$ and $R(h_2) = h_3$ . The two cross terms in each sum are therefore equal and opposite, for example $h_3(R(x_4h_4)) = h_3h_1R(x_4) = h_1(R(x_4h_2))$ . Knowing this, we can cancel out some of our equation.

$\frac{1}{2}(x_3h_3^2 +\enclose{horizontalstrike}{ h_3(R(x_4h_4))} + x_3h_1^2 - \enclose{horizontalstrike}{h_1(R(x_4h_2))}) = x_3$

$\frac{1}{2}(\enclose{horizontalstrike}{h_4(R(x_3h_3))}+x_4h_4^2-\enclose{horizontalstrike}{h_2(R(x_3h_1))}+x_4h_2^2) = x_4$

$\frac{1}{2}x_3(h_3^2+h_1^2) = x_3$

$\frac{1}{2}x_4(h_4^2+h_2^2) = x_4$

To finish this, we need to go back to even more trig. What we must do next is write the larger-indexed $h$ in terms of the smaller-indexed $h$ (this will make sense in a second). The one major difference between the blocks of $h$ is where each one starts. For $h$ we iterate through $[1,2n]$ , so $h_1$ goes through $[1,\frac{n}{2}]$ , $h_2$ goes through $[\frac{n}{2}+1,n]$ , $h_3$ goes through $[n+1,\frac{3n}{2}]$ , and lastly $h_4$ goes through $[\frac{3n}{2}+1,2n]$ . We can then conclude that the only thing we need to do to turn the input of the smaller-indexed block into the input of the larger-indexed block is to add $n$ to it.

$h_{j+n} = \sqrt{2} \sin\left(\frac{(j + n - \frac{1}{2})\pi}{2n}\right)$

After some algebra, we can see that the input is just shifted by $\frac{\pi}{2}$ . If you remember your trig, you know that $\sin$ shifted by $\frac{\pi}{2}$ is the same thing as $\cos$ .

$h_{j+n} = \sqrt{2} \sin\left(\frac{\pi}{2}+\frac{(j - \frac{1}{2})\pi}{2n}\right) = \sqrt{2} \cos\left(\frac{(j - \frac{1}{2})\pi}{2n}\right)$

With this information, we can put this back into the original equation and finish up.

$\frac{1}{2}x_3\left(\left(\sqrt{2} \sin\left(\frac{(j - \frac{1}{2})\pi}{2n}\right)\right)^2 + \left(\sqrt{2} \cos\left(\frac{(j - \frac{1}{2})\pi}{2n}\right)\right)^2\right)$

If you go back to the trig identities, you will remember that $\cos(x)^2 + \sin(x)^2 = 1$ . With this we can get rid of the two functions and take out their common factor, $2$ . This leaves us with just what we wanted, $x_3$ and $x_4$ .

$\frac{1}{2}x_3 \cdot 2\,\enclose{horizontalstrike}{\left(\sin^2\left(\frac{(j - \frac{1}{2})\pi}{2n}\right) + \cos^2\left(\frac{(j - \frac{1}{2})\pi}{2n}\right)\right)}$

$\frac{1}{2}x_4 \cdot 2\,\enclose{horizontalstrike}{\left(\sin^2\left(\frac{(j - \frac{1}{2})\pi}{2n}\right) + \cos^2\left(\frac{(j - \frac{1}{2})\pi}{2n}\right)\right)}$

$x_3=x_3$

$x_4 = x_4$

With this, we have shown that applying the smoothing function twice does not change the reconstructed output.
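
The argument can also be checked numerically. The sketch below windows two overlapping blocks, applies the MDCT and its inverse with no quantization, windows again, and averages the overlapping halves. The form of `M` is the standard MDCT matrix and is assumed to match the `M` and `N` of the codec; the test signal and the small `n` are arbitrary choices for illustration.

```matlab
n = 8;                                        % half-window length (illustrative)
[jj, ii] = meshgrid(1:2*n, 1:n);              % column and row indices of M
M = sqrt(2/n) * cos((ii - 1/2) .* (jj - 1/2 + n/2) * pi / n);  % ASSUMED MDCT matrix
N = M';                                       % inverse MDCT
j = (1:2*n)';
h = sqrt(2) * sin((j - 1/2) * pi / (2*n));    % windowing function from part 3

x  = cos(0.3*(1:3*n)') + 0.01*(1:3*n)'.^2;    % arbitrary test signal
Z1 = x(1:2*n);                                % first window
Z2 = x(n+1:3*n);                              % second window, overlapping by n

w1 = h .* (N * (M * (h .* Z1)));              % window, MDCT, inverse, window
w2 = h .* (N * (M * (h .* Z2)));
rec = (w1(n+1:2*n) + w2(1:n)) / 2;            % average the overlapping halves

err = max(abs(rec - x(n+1:2*n)));             % should be at rounding level
sumsq = h(1:n).^2 + h(n+1:2*n).^2;            % the identity used in the proof
```

Here `err` stays at floating-point rounding level and every entry of `sumsq` equals 2, which are the two facts the derivation relies on.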

Part 5

Question

Import a .wav file with the MATLAB audioread command, or download an audio file of your choice. (Alternatively, load handel can be used. If you download a stereo file, you will need to work with each channel separately.) Reproduce the file (or a segment of it) using various values of b and with and without windowing. Compute RMSE for your choices of parameters and exhibit the results using the sound command.

For this one, I took two sound bites from one of my favorite movies, Monty Python and the Holy Grail, and loaded them into MATLAB to see what happens. I also changed the bit window to four, five, eight, and two, and used both of the functions made in this project to compare the file compression.

Below are the sound files before I started compressing them. Above each sound file is its name, so that you know which graph goes with which sound file.

Newt.

runawayrun

I also added a waittime input so that the playback of the input and the output do not overlap each other. In addition, I added an Fs (sample rate) input, as the sample rate varies between sound files and needs to be set per file.


function output=mysimplecodec(x,Fs,waittime,b, original_file_name, new_file_name, input_output_graph_filename)
...
%Fs=2^(13);                            % Fs=sampling rate
...
pause(waittime)

First, let's look at both of the sound files compressed with just a four bit window, with and without windowing.

Four bit without windowing Newt sound graph

Four bit without windowing Newt sound input

Four bit without windowing Newt sound output

Four bit with windowing Newt sound graph

Four bit with windowing Newt sound input

Four bit with windowing Newt sound output

Four bit without windowing runawayrun sound graph

Four bit without windowing runawayrun sound input

Four bit without windowing runawayrun sound output

Four bit with windowing runawayrun sound graph

Four bit with windowing runawayrun sound input

Four bit with windowing runawayrun sound output

This is the RMSE of each function


part_5_simple_Newt =
   0.012570623823620
part_5_simple_run =
   0.019979576146797
   
part_5_mysimple_Newt =
   0.011238281579624
part_5_mysimple_run =
   0.017924222667254

This is the difference between the RMSE of the two different functions


%code
part_5_Newt_diff = part_5_mysimple_Newt - part_5_simple_Newt
part_5_run_diff = part_5_mysimple_run - part_5_simple_run
%output
part_5_Newt_diff =
  -0.001332342243995
part_5_run_diff =
  -0.002055353479543

We can see and hear that the compression of these two sound files is not bad. You can tell very clearly that they are compressed, but the audio is from a movie made in 1975, so having the audio be this clear is impressive. We can also see that the RMSE of the windowed function is less than that of the non-windowed function, which is to be expected. The audio of the two functions is clearly different, and the difference is most easily heard during the pauses in the audio.

Five bit without windowing Newt sound graph

Five bit without windowing Newt sound input

Five bit without windowing Newt sound output

Five bit with windowing Newt sound graph

Five bit with windowing Newt sound input

Five bit with windowing Newt sound output

Five bit without windowing runawayrun sound graph

Five bit without windowing runawayrun sound input

Five bit without windowing runawayrun sound output

Five bit with windowing runawayrun sound graph

Five bit with windowing runawayrun sound input

Five bit with windowing runawayrun sound output

This is the RMSE of each function


part_5_5_b_simple_Newt =
   0.007972959068517
part_5_5_b_simple_run =
   0.011557042409937
part_5_5_b_mysimple_Newt =
   0.006890323260522
part_5_5_b_mysimple_run =
   0.009674409889937

This is the difference between the RMSE of the two different functions plus the difference between the four bit and five bit window functions.


%code
part_5_5_b_Newt_simple_diff = part_5_simple_Newt - part_5_5_b_simple_Newt
part_5_5_b_Newt_mysimple_diff = part_5_mysimple_Newt - part_5_5_b_mysimple_Newt
part_5_5_b_run_simple_diff = part_5_simple_run - part_5_5_b_simple_run
part_5_5_b_run_mysimple_diff = part_5_mysimple_run - part_5_5_b_mysimple_run
part_5_5_b_Newt_diff = part_5_5_b_simple_Newt - part_5_5_b_mysimple_Newt
part_5_5_b_run_diff = part_5_5_b_simple_run - part_5_5_b_mysimple_run
%output
part_5_5_b_Newt_simple_diff =
   0.004597664755103
part_5_5_b_Newt_mysimple_diff =
   0.004347958319102
part_5_5_b_run_simple_diff =
   0.008422533736860
part_5_5_b_run_mysimple_diff =
   0.008249812777317
part_5_5_b_Newt_diff =
   0.001082635807995
part_5_5_b_run_diff =
   0.001882632520000

As expected, the sound quality is notably better with five bit windows than with four bit windows. The size of the difference is not as large as for the previous window change. Also, the windowing function made slightly less of an impact than in the previous test.

Eight bit without windowing Newt sound graph

Eight bit without windowing Newt sound input

Eight bit without windowing Newt sound output

Eight bit with windowing Newt sound graph

Eight bit with windowing Newt sound input

Eight bit with windowing Newt sound output

Eight bit without windowing runawayrun sound graph

Eight bit without windowing runawayrun sound input

Eight bit without windowing runawayrun sound output

Eight bit with windowing runawayrun sound graph

Eight bit with windowing runawayrun sound input

Eight bit with windowing runawayrun sound output

This is the RMSE of each function


part_5_8_b_simple_Newt =
   0.001537065085831
part_5_8_b_simple_run =
   0.001595168490493
part_5_8_b_mysimple_Newt =
   0.001491193856767
part_5_8_b_mysimple_run =
   0.001452684156640

This is the difference between the RMSE of the two different functions plus the difference between the four bit and eight bit window functions.


%code
part_5_8_b_Newt_simple_diff = part_5_simple_Newt - part_5_8_b_simple_Newt
part_5_8_b_Newt_mysimple_diff = part_5_mysimple_Newt - part_5_8_b_mysimple_Newt
part_5_8_b_run_simple_diff = part_5_simple_run - part_5_8_b_simple_run
part_5_8_b_run_mysimple_diff = part_5_mysimple_run - part_5_8_b_mysimple_run
part_5_8_b_Newt_diff = part_5_8_b_simple_Newt - part_5_8_b_mysimple_Newt
part_5_8_b_run_diff = part_5_8_b_simple_run - part_5_8_b_mysimple_run
%output
part_5_8_b_Newt_simple_diff =
   0.011033558737789
part_5_8_b_Newt_mysimple_diff =
   0.009747087722857
part_5_8_b_run_simple_diff =
   0.018384407656304
part_5_8_b_run_mysimple_diff =
   0.016471538510613
part_5_8_b_Newt_diff =
     4.587122906396751e-05
part_5_8_b_run_diff =
     1.424843338528824e-04

As expected, the eight bit window had a really low RMSE compared to the four bit window, again by about an order of magnitude. The difference between the windowed and non-windowed functions is not as large because there is not as much error to get rid of to start with. Other than that, it was as expected.

Two bit without windowing Newt sound graph

Two bit without windowing Newt sound input

Two bit without windowing Newt sound output

Two bit with windowing Newt sound graph

Two bit with windowing Newt sound input

Two bit with windowing Newt sound output

Two bit without windowing runawayrun sound graph

Two bit without windowing runawayrun sound input

Two bit without windowing runawayrun sound output

Two bit with windowing runawayrun sound graph

Two bit with windowing runawayrun sound input

Two bit with windowing runawayrun sound output

This is the RMSE of each function


part_5_2_b_simple_Newt =
   0.031724613944732
part_5_2_b_simple_run =
   0.058912805988006
part_5_2_b_mysimple_Newt =
   0.031340472077966
part_5_2_b_mysimple_run =
   0.058427901101464

This is the difference between the RMSE of the two different functions plus the difference between the four bit and two bit window functions.


%code
part_5_2_b_Newt_simple_diff = part_5_simple_Newt - part_5_2_b_simple_Newt
part_5_2_b_Newt_mysimple_diff = part_5_mysimple_Newt - part_5_2_b_mysimple_Newt
part_5_2_b_run_simple_diff = part_5_simple_run - part_5_2_b_simple_run
part_5_2_b_run_mysimple_diff = part_5_mysimple_run - part_5_2_b_mysimple_run
part_5_2_b_Newt_diff = part_5_2_b_simple_Newt - part_5_2_b_mysimple_Newt
part_5_2_b_run_diff = part_5_2_b_simple_run - part_5_2_b_mysimple_run
%output
part_5_2_b_Newt_simple_diff =
  -0.019153990121112
part_5_2_b_Newt_mysimple_diff =
  -0.020102190498342
part_5_2_b_run_simple_diff =
  -0.038933229841209
part_5_2_b_run_mysimple_diff =
  -0.040503678434210
part_5_2_b_Newt_diff =
     3.841418667655458e-04
part_5_2_b_run_diff =
     4.849048865422714e-04

For this one, there was a massive loss in quality, to the point that the sound is barely recognizable as the original. The RMSE values are also massive, but the windowing function does not seem to do much at all for them. This could be because, with the quality this low, there was not much the windowing could fix; I would have to look into it more to get a certain answer.

This part was more of a proof that this codec can be used to compress normal sound files rather than just pure tones made in MATLAB. I would have to say this was a success. The next thing I would do is take a bunch of audio files and see which number of bits per window gives the least quality loss for the most size reduction.