Jesse Bollinger
UTA of CS 262-001 (Spring 2009)
email: jbollin1@gmu.edu
AIM screen-name: Jesse0x42
office hours and room: Friday in S&T II room 365
9:30 am - 10:20 am and 11:30 am - 12:20 pm
document first created: 2009-01-31 (year-month-day)
latest revision: 2009-04-01 (year-month-day)
- (2009-02-20)
"Death by scanf" section modified on 2009-02-20 to include information previously missing; re-read the part on using scanf("%*s") for clearing stdin if you read this section before 2009-02-20
- (2009-02-27)
three exercises posted
- (2009-04-01)
variable initialization section filled
one very "easy" exercise added to Additional Exercises (the answer is not 42, sorry)
some assembly language content added
OMGdz ZOMBIES!!
Hello and welcome to my GMU website, my name is Jesse Bollinger. I am
currently a sophomore at GMU, majoring in mathematics. On a personal
note, I am leaning
towards becoming either a video game developer or a teacher some day,
or maybe even something else entirely or a combination of various
things (I haven't really decided).
By the way, all students and faculty of the university have space
allocated for them to create a website (20 MB to be precise). If you
wish to use your allocated space, follow the directions and info at
http://webdev.gmu.edu/Main_Page as necessary. Unfortunately, those
instructions are missing a step. If you want
other people to be able to view your website then you must add execute
permission to your home folder on the mason cluster. To do this, once
in your home directory (where you start out), enter the command "cd .."
(which means go to the parent directory). Then enter the command "chmod a+x
your_user_name", replacing your_user_name with the appropriate text
(presumably the same as your email address without the "@gmu.edu"
part). Then enter the command "cd ~" to return to your home directory.
You should also ensure that your public_html directory and its contents
have their permissions set with "chmod 755 file_name".
Anyway, welcome to CS 262 section 001. All students are welcome to
contact me via email or AIM at any time. I also plan to hold office
hours before and after class on Friday (1 hour before, and 1 hour
after). I will also probably attend the lecture.
I am always glad to assist you in learning low-level programming and
the C language. I also know some C++ (among other languages), so you can
ask me about that too if you want.
Why learn C? Well, consider this: nearly all modern commercial video
games were coded in either C or C++, and video
games are among the most resource and processor intensive
kinds of computer program in the world. There's a reason why C
and C++ were the chosen languages. C is one of the oldest and most
powerful programming languages still in wide use. Other languages are convenient for
less intense applications, but when you need to get down and dirty C or
C++ is usually better. If you were to only learn one heavy-duty
industry programming language, then C or C++ would be the wisest
choice in my opinion. In fact, the compilers and interpreters for many
modern alternative languages were themselves originally written in C.
To assist you in your studies to learn the C language I will do a few
things. First, I will be available for direct assistance via email, AIM,
and office hours. Second, I will post various additional exercises,
practice, advice, and explanations as we go along. I will also mention
topics that may interest you beyond the scope of this semester.
To these ends I will place these components below, under the section
names "Additional Discussion", "Additional Exercises", and "Above
and Beyond". You should be aware that the content in "Above and Beyond" is
not required material, but is included for your enlightenment so
to speak (i.e. the development of superior C skills) and exposure to
things beyond the scope of the course. Please feel
welcome to ask me for clarifications, to give comments, and to
point out any errors or shortcomings in the texts (I am not by any
means a "C expert"; C is an intricate and detailed language). Error
correction is a great way to learn, by the way.
Additional Discussion
Death by scanf:
You are perhaps aware that Dr. Nordstrom mentioned in class that scanf
is dangerous and should generally not be used in released programs, but
that we will be using it in class anyway and that it is OK in personal
programs. Why is scanf dangerous? Well, two common pitfalls with scanf
are that it can in certain circumstances cause unexpected or infinite
loop behavior when it fails to find the type of data it wants, and that
it can write past the end of the memory location it has been instructed
to store the read data in.
When writing console (aka command-line) programs, a new C programmer
may be surprised when a loop they have written to ask for input, such as
some sort of console command system for a program (a very common task),
causes the program
to suddenly spiral out of control when invalid input is given to scanf,
perhaps causing an infinite loop or making the program crash, or,
perhaps even worse, running amok on its own or others' data. The programmer
then asks, "What's going on? There's nothing here that should be
causing an infinite loop!" How could merely having a scanf statement
in a loop cause a loss of control? The reason is simple: when scanf
finds that the input does not match the conversion you have asked it
to look for (such as "%d" or "%f"), it stops at the first character
that doesn't fit and leaves that character, and everything after it,
sitting in the input stream. When it does successfully find the
specified data, it consumes that data and the stream moves on past it
(typically only the newline from pressing enter is left over, and "%d"
skips leading whitespace like that on the next call). The consequence of
this is that if erroneous input is given to scanf then the next time
scanf reads data it will first see the data you already gave it the
previous time, and only after that any new data placed on the input
stream (which could be nothing). So when one is reading with "%d", for
example, and one inputs "hello" instead of a number, the
next call to scanf starts reading at the 'h' of "hello" again. Since
"hello" followed by any other
string (including the empty string "") never begins with a valid number
according to the "%d" format, scanf will fail this time as well.
Therefore, by induction, it will also fail the next time, and the time
after that, and the time after that, and so on. The user is no
longer able to enter input, because scanf still has unread input
waiting from when it previously failed. Thus we have an
infinite loop.
Try this yourself using this small program I have written for
demonstrating the effect: death_by_scanf.c.
You can type the number zero as
input to scanf to leave the looping sequence manually (i.e., in this
case, to quit the program early). The program is set to loop no more
than 12 times, for safety, so don't worry.
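If you'd rather see the effect without the keyboard involved, here is a minimal sketch (it is not the death_by_scanf.c program linked above). The function name count_failed_reads and its parameters are made up for illustration, and it uses fscanf on an arbitrary FILE* instead of scanf on stdin so the input can come from anywhere; fscanf matches input the same way scanf does.

```c
#include <stdio.h>

/* Hypothetical helper: calls fscanf("%d") up to max_tries times on `in`
 * WITHOUT clearing the stream in between, and returns how many of those
 * calls failed to produce a number. */
int count_failed_reads(FILE *in, int max_tries)
{
    int value = 0;
    int failures = 0;
    int i;

    for (i = 0; i < max_tries; i++) {
        int matched = fscanf(in, "%d", &value);
        if (matched == 1) {
            continue;   /* a number was read and consumed */
        }
        failures++;     /* 0: text did not match; EOF: nothing left to read */
    }
    return failures;
}
```

If the stream contains the line "hello", every one of the max_tries calls fails, because the 'h' is still the next unread character each time. With stdin as the stream, that is the runaway loop described above.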
How do we fix this problem? We must clear the input stream after a
failed read and before calling scanf again, if we are using scanf
conversions that have the possibility
of failing (i.e., "%d" etc). The input stream is known more
specifically to the C language as the Standard Input Stream. Within C
source code, if you have included a header that declares it
(such as <stdio.h>), then you can refer to and manipulate
the standard input stream directly in your code by the name "stdin".
Two methods of clearing stdin (the standard input stream) that you will
commonly see are as
follows: Call fflush with an argument of stdin, as in fflush( stdin ). Or, alternatively,
when you know that there's left-over input in stdin you can
call scanf with a format of "%*s", as in scanf( "%*s" ). The scanf conversion
"%*s" means: skip any leading whitespace, read characters up to the next
whitespace (one "word"), but don't assign the read
data to any storage location (i.e. eat the next word of input). Note
that it only eats one word, so if the bad input contained spaces (for
example "hello world") it takes more than one call to get rid of all of
it; the format "%*[^\n]" eats everything up to the end of the line instead.
However, remember that the scanf method will wait for some input to be
given and enter pressed if there isn't already input in stdin,
therefore using it as a clearing mechanism without causing it to wait
for enter requires that you know that there's input already in stdin.
One way to know is to check whether the scanf read count (the
value the function returns) was zero previously, implying that it
finished waiting for enter but didn't find the desired input and thus
left the erroneous input in stdin. As for fflush( stdin ), the C
standard only defines fflush for output streams; calling it on an input
stream is undefined behavior. Some implementations happen to discard the
pending input, which is presumably why it has never failed to work for
me, but you cannot rely on it with every compiler or system. If it does
fail, use the scanf method, or read and discard characters one at a time
with getchar until you reach the newline. There are very likely
other ways to clear streams as well (I might look into it more if I get
around to it, but in the mean time feel free to research other ways of
doing it yourself if you wish).
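One portable way to clear out a bad line, which relies only on standard <stdio.h> calls, is to read and discard characters until the newline. Here's a minimal sketch; the names discard_line and read_int_retry are hypothetical, and the functions take a FILE* (you would pass stdin) so they can be tried on any stream.

```c
#include <stdio.h>

/* Reads and throws away the rest of the current line, including the
 * newline. Stops early if the input ends. */
void discard_line(FILE *in)
{
    int c;
    while ((c = getc(in)) != '\n' && c != EOF) {
        /* nothing to do: the character is simply dropped */
    }
}

/* Hypothetical helper: keeps trying to read one int from `in`.
 * Returns 1 and stores the number in *out on success, or 0 if the
 * input ends before a number is found. */
int read_int_retry(FILE *in, int *out)
{
    for (;;) {
        int matched = fscanf(in, "%d", out);
        if (matched == 1) {
            discard_line(in);   /* drop anything typed after the number */
            return 1;
        }
        if (matched == EOF) {
            return 0;           /* no more input at all */
        }
        discard_line(in);       /* matched == 0: skip the bad line */
    }
}
```

A call like read_int_retry(stdin, &n) would replace a bare scanf("%d", &n) in an input loop. Checking for EOF separately matters: without it, a closed input stream would put the loop right back into spinning forever.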
There's a reason why scanf leaves the offending input in the stream
when it fails to find matching data. It was designed that way
so that the erroneous input data would remain available, so
that you could save or use it in some way, then remove or otherwise
appropriately alter
it, and then subsequently make the proper adjustments in the flow of
the program. It's a way of preventing information loss, to put it
simply. There are cases when this behavior is useful, but it can cause
a lot of trouble if the programmer isn't aware that it's going on
behind the scenes.
As for scanf's ability to go out of bounds and overwrite data that
doesn't actually belong to the location it's been told to store data
to, it may be best to discuss that after we start doing arrays and
pointers.
Review of Control Structure Independence and Dependence:
It is highly likely that you already know this, but it's important
enough that reviewing it for clarity would be wise.
A group of conditional statements can be either independent or
dependent. It is a common error of new programmers to not correctly
understand exactly how this independence and dependence works.
For example, suppose a programmer is writing a program that takes
a student's relative score for a class (their percentage grade) and
determines what letter grade that score falls under. In order to make
the determination one can use a series of condition checks. The
implementation could use either an independent or a dependent structure
(or a mix of both) to accomplish this. However, the construct that uses
independent structure would in this case require roughly twice as many
comparisons in each condition and also would perform all condition
checks regardless of whether it already found ("knows") the correct
answer. Here are two C programs that show the two ways: independent_control.c and dependent_control.c. Note that for
these programs we are assuming that grades above 100% and below 0% are
valid grades, and are A and F respectively. When you run them, give the
program an integer value for the percent grade. I have neglected scanf
safety to make them more concise and easier to read.
Try both of the programs. For example, give them the number 72 (which
should come out as a letter grade of 'C') and compare the number of if
statements entered and comparisons performed between the two different
methods of implementation (the two different programs). Notice that the
independent construct always unnecessarily performs all comparisons and
also does more of them than the dependent method. The moral of the
story is that not understanding the logical independence/dependence
nature of what you are trying to program has consequences. In this case
the consequences are fairly light, but imagine what might happen in a
much larger or more complex construct. Incorrect dependency relations
in code can cause some very subtle and difficult (and sometimes
catastrophic) bugs in certain circumstances, for example when
executing one conditional block changes the state that a later
condition tests, or when the conditions are otherwise not logically
disjoint.
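To make the comparison concrete, here's a minimal sketch of both structures. It is not the code from independent_control.c or dependent_control.c; the function names, the helper functions, and the 90/80/70/60 cutoffs are assumptions chosen for illustration (they do put 72 at 'C', as in the example above). The helpers count each comparison that is actually evaluated.

```c
/* Comparison helpers that bump a counter each time they are evaluated. */
static int at_least(int value, int bound, int *comparisons)
{
    (*comparisons)++;
    return value >= bound;
}

static int below(int value, int bound, int *comparisons)
{
    (*comparisons)++;
    return value < bound;
}

/* Independent: five separate ifs, and every one of them is always tested. */
char grade_independent(int pct, int *comparisons)
{
    char grade = '?';
    *comparisons = 0;
    if (at_least(pct, 90, comparisons)) grade = 'A';
    if (at_least(pct, 80, comparisons) && below(pct, 90, comparisons)) grade = 'B';
    if (at_least(pct, 70, comparisons) && below(pct, 80, comparisons)) grade = 'C';
    if (at_least(pct, 60, comparisons) && below(pct, 70, comparisons)) grade = 'D';
    if (below(pct, 60, comparisons)) grade = 'F';
    return grade;
}

/* Dependent: each else-if may assume every earlier condition was false,
 * so only the lower bound needs checking and the chain stops at a match. */
char grade_dependent(int pct, int *comparisons)
{
    *comparisons = 0;
    if (at_least(pct, 90, comparisons)) return 'A';
    else if (at_least(pct, 80, comparisons)) return 'B';
    else if (at_least(pct, 70, comparisons)) return 'C';
    else if (at_least(pct, 60, comparisons)) return 'D';
    else return 'F';
}
```

For an input of 72, this independent version evaluates 7 comparisons and the dependent version evaluates 3. The independent count would be higher still if && did not stop early when its left side is false.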
The key to using dependent
structure most effectively is to understand that in an if else-if else
structure, once the previous if or else-if in the chain has been
evaluated false and control has passed to the next condition block, we
know for sure that the logical negation of the previous condition can
(and probably should) be assumed as a given. To visualize how the
dependent control structure works in our case of the letter grade
determination, imagine a ruler-like bar representing the range in which
a numeric grade could fall. Now, imagine that as each comparison is
made moving down the control structure, each area of the range that
was just checked and found false for our value is greyed out. Notice
that as the comparisons proceed it is like the previous failed cases
have been "chopped off" from the possible state of the program, so we
don't have to worry about them. For an analogy, it is akin to how one
might cut a vegetable (a cucumber or carrot for example) into slices
like a chef. As you slice each new interval of the vegetable off,
you no longer concern yourself with the previously sliced chunks as you
cut the next, and you only continue cutting slices until you have what
you need for what you are doing.
Although in the case of the letter grade programs discussed above the
dependent control structure is better, that is not always the case. The choice
of independent or dependent constructs in a program is almost entirely
dependent on the context in which those constructs occur. My advice is
that you should program in such a way as to describe the fundamental
nature of what you are trying to program. It is generally unwise to use
only one way or the other. Aim for your program logic to be a natural
logical representation of what you are describing (when possible).
Natural logic has a tendency to be more robust and powerful than
bizarre or eccentric logic. Do not over-optimize your code before you
know what you are doing. Think about what you are doing before you do
it, and perhaps also visualize it if possible, and you will find that
you generally produce better code than otherwise.
As an additional matter of interest, you may have noticed in the
source code for dependent_control.c that within the if and else-if
blocks I counted the number of if statements and comparisons by
simply assigning the associated variables their appropriate count
values. This was not just me being lazy or redundant or non-dynamic in
my code; there is no place in the if else-if else construct of the C
language to put an ordinary increment statement that would count them
correctly (short of folding the increment into the condition expression
itself). Think about the way if else-if else control flows. If I had
tried to do an actual increment of the variables inside the if or
else-if blocks what would have happened? Think about it, there's
something subtle going on. In fact, what I'm referring to here is the
fact that C (and many other high level languages) does not allow you to
place intermediate unconditional statements
between the dependent conditional blocks. This unused space can,
however, be accessed and used manually with the help of assembly
language (and also can be accessed to varying degrees in certain other
languages). If I had somehow accessed this intermediate space in my
program then I would have been able to do a true counting increment for
the if statement and comparison count variables, but alas.
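To see the subtlety for yourself, here's a minimal sketch (again not the code from dependent_control.c; the function names and the 90/80/70/60 cutoffs are assumptions for illustration). The first function puts the increment inside the blocks. The second shows one workaround that standard C does offer without any statement between the blocks: the comma operator, which lets the increment ride along inside the condition itself.

```c
/* Increments placed inside the blocks. Only the one block that is
 * entered ever runs, so the result is always 1. */
int count_inside_blocks(int pct, char *grade)
{
    int tested = 0;
    if (pct >= 90)      { tested++; *grade = 'A'; }
    else if (pct >= 80) { tested++; *grade = 'B'; }
    else if (pct >= 70) { tested++; *grade = 'C'; }
    else if (pct >= 60) { tested++; *grade = 'D'; }
    else                { tested++; *grade = 'F'; }
    return tested;
}

/* Increments folded into the conditions with the comma operator.
 * "++tested, pct >= 90" increments first and then yields the result
 * of the comparison, so every condition that is reached is counted. */
int count_in_conditions(int pct, char *grade)
{
    int tested = 0;
    if (++tested, pct >= 90)      { *grade = 'A'; }
    else if (++tested, pct >= 80) { *grade = 'B'; }
    else if (++tested, pct >= 70) { *grade = 'C'; }
    else if (++tested, pct >= 60) { *grade = 'D'; }
    else                          { *grade = 'F'; }
    return tested;
}
```

For a grade of 72 the first function returns 1 and the second returns 3. The comma trick works, but it is easy to misread, which is a fair argument for the plain assignments used in the original program.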
I plan to show you how to utilize the intermediate code space between
dependent
conditional blocks in the section about assembly language that I've
placed on this website, once I get around to it.
Initialization of Variables or the Lack Thereof:
When a variable is declared, its initial value depends on
the scope in which it is defined.
If a variable is declared in a global context (outside any function)
and you give it no initial value, it will begin with a
value of zero. On the other hand, if it is an ordinary local variable it will
contain whatever garbage happened to be lying around in the
memory location it was given by the system or compiler (officially its
value is indeterminate, and reading it before assigning to it is an
error). C doesn't
set it to zero for you because it might be inefficient to do so if the
variable value is going to immediately change to something else. If you
initialize it yourself anyway, modern
optimizers can generally remove such extraneous assignments. Even if one
isn't optimized out, unless it is in a loop construct or in an
extremely frequently
used part of a program it will
probably not affect performance very much.
Accidentally using a garbage value from an uninitialized variable can
have unintended and unexpected consequences. You should stay alert
for uninitialized variables when programming in C or C++.
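Here's a minimal sketch of the cases; all of the names in it are made up for illustration. It also shows one case not mentioned above: a local variable declared static behaves like a global for this purpose and starts at zero.

```c
int global_total;   /* declared outside any function: starts at 0 */

/* Hypothetical helper: returns 1, 2, 3, ... on successive calls. */
int next_ticket(void)
{
    static int issued;  /* static local: also starts at 0, and keeps
                           its value between calls */
    issued++;
    return issued;
}

/* An ordinary local variable has no defined starting value, so it is
 * given one explicitly before it is used. */
int sum_to(int n)
{
    int total = 0;      /* without "= 0" the loop would add onto garbage */
    int i;
    for (i = 1; i <= n; i++) {
        total += i;
    }
    return total;
}
```

Note that i is also an ordinary local without an initializer, but that is fine here because the for loop assigns to it before anything reads it.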
Additional Exercises
swap function:
To test your understanding of pointers, write a small function to swap
the values of two integer variables.
random password:
Ever have trouble thinking of a good password? Think your password
could be guessed eventually if someone tried? Why not make the computer
generate an unbiased random password for you?
Write a function named random_password
that takes a pointer to char for where the password text will be stored
(i.e. in a char array), an int for the size of the char array where the
password will be stored, and an int for what length password the user
wants. The function should also return an int indicating whether the
function succeeded or failed. Use the number 1 to indicate success and 0
to indicate failure (or whatever other error-code scheme you want).
Don't forget to null terminate the password string being stored,
otherwise it may not print correctly if you try to interpret it as a
string (append '\0' to the end of the password character array). If the
size of the array and the desired length of the password conflict then
the function should return failure. Other invalid arguments such as
negative numbers for size should also return failure. The function
should not cause segmentation faults or errors in any case.
Make the password consist of random characters from the alphabetic and
numeric characters of the ASCII code (i.e. 0-9, A-Z, and a-z).
You can use the rand function from <stdlib.h> to generate the
random numbers you need, or you can find another more advanced library
if you wish. You might find the following web page useful: http://www.cplusplus.com/reference/clibrary/cstdlib/
(although you should be aware that some of the content refers to C++,
which is not the same).
If you're clever you should be able to restrict your generated
characters to appropriate ASCII ranges without even having to write the
characters in their numeric forms. Hint: a literal character (a single
character enclosed by single quotes) is
already a number in the compiler's eyes.
lowercase and uppercase:
Write one or more functions that can be used to alter all the
alphabetic characters in a string so that they are either all lowercase
or all uppercase. Non-alphabetic characters should be left as is.
Feel free to think of other transformations you might apply to strings,
and code functions for them if you wish.
the meaning of life:
Write a simple C function that returns the answer to what the meaning
of life is. The answer must include, "Why are we here?", "How can I
enslave Santa Claus?", and answers to every intricacy of everything
else too of course. I demand the answer by Thursday. No excuses.
Above and Beyond:
Assembly Language:
Assembly language is often described as a human readable version of
machine code. Assembly language has the characteristic that most of its
commands are (usually) one to one with the actual ISA of the machine
the assembly code is being written for (in our case this is probably
x86). The ISA of a computer is the actual set of commands made
available by the CPU. ISA stands for "Instruction Set Architecture". You
probably already know, but just to review, CPU stands for "Central
Processing Unit". The ISA is electronically engineered directly into
the CPU. The CPU is the brains of the computer; without it the computer
could not make decisions or direct processes.
Because assembly is closely tied to the ISA of a particular machine, it
is generally not particularly portable. Keep this in mind when deciding
whether or not you want to use assembly. If you want your program to
work on a large variety of systems then you should probably avoid
assembly unless it becomes absolutely necessary.
There are two currently dominant syntaxes for assembly language. One is
AT&T syntax, which is the one that GCC uses. The other is Intel
syntax, which is used by Microsoft Visual Studio and by some others.
So, you probably want to see what assembly actually looks like now. The
short answer is "usually not very pretty". Har har har...
Anyway, the real answer is this: Simply compile any C program with GCC
using the command "gcc -S source_file_name.c", which will generate a
file named "source_file_name.s". Open this new ".s" file with a plain
text editor and what you see is the AT&T syntax GAS Assembly
Language version of the source code. This is what the compiler actually
converts your code into before it is assembled into machine code for the
architecture you're compiling on or for.
When I used the assembly source output option for the following C
source file assembly_test.c, the
compiler gave me back this file: assembly.s.
Take a peek and notice how even an extremely simple program such as
this assembly_test.c is actually made up of much finer grained
commands. In fact, even things you may think of as fundamental, such as
if statements, comparisons, and variable assignments are generally made
up of multiple assembly commands. See if you can see what's going on in
the assembly, try to match particular components with what you know is
happening in the originating C code.
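If you don't have the linked files handy, here's a small sketch of the kind of source you could feed to "gcc -S" yourself. It is not the assembly_test.c mentioned above; the file name tiny.c and the function bigger are made up for illustration.

```c
/* tiny.c (hypothetical file name): a few "fundamental" C operations
 * to look for in the generated assembly. */
int bigger(int a, int b)
{
    int result = a;     /* a variable assignment */
    if (b > a) {        /* a comparison followed by a conditional jump */
        result = b;     /* another assignment, only reached sometimes */
    }
    return result;
}
```

On x86 without optimization, the if typically shows up as a compare instruction followed by a conditional jump over the second assignment, though the exact output depends on your compiler version and flags.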
You can assemble and link an assembly source file with gcc by using a command
like the following: "gcc -o executable_name.exe assembly_source_code.s".
You can also do what's called inline assembly, although it tends to
vary from compiler to compiler. For DJGPP some info can be found on the
following website: http://www.delorie.com/djgpp/doc/brennan/brennan_att_inline_djgpp.html.
The website also shows the difference in syntax between AT&T and
Intel syntax for some commands.
If you are curious about Intel syntax assembly, then one option is you
can try experimenting with a free assembler called "flat assembler".
The main site is: http://flatassembler.net/.
The two syntaxes have different operand orders.
AT&T syntax usually goes like this:
command source, destination
Whereas Intel syntax usually goes like this:
command destination, source
For example, putting the constant 5 into the register eax is written "movl $5, %eax" in AT&T syntax and "mov eax, 5" in Intel syntax.
OpenGL:
As a hacker who broke into a traffic sign system relatively recently
once said:
////////////////////////
// ZOMBIES AHEAD!!!   //
// EXPECT DELAYS!     //
////////////////////////
Except there aren't really any zombies expected in this OpenGL
section... Ah if only there were... Alas...