Jesse Bollinger's GMU webpage

Jesse Bollinger
UTA of CS 262-001 (Spring 2009)

email:
jbollin1@gmu.edu

AIM screen-name:
Jesse0x42

office hours and room:
Friday in S&T II room 365
9:30 am - 10:20 am
and
11:30 am - 12:20 pm

document first created: 2009-01-31 (year-month-day)

latest revision: 2009-04-01 (year-month-day)

- (2009-02-20)
    "Death by scanf" section modified on 2009-02-20 to include information previously missing, re-read part on using scanf("%*s") for clearing stdin if you read this section before 2009-02-20

- (2009-02-27)
    three exercises posted

- (2009-04-01)
    variable initialization section filled
    one very "easy" exercise added to Additional Exercises (the answer is not 42, sorry)
    some assembly language content added
    OMGdz ZOMBIES!!

Hello and welcome to my GMU website! My name is Jesse Bollinger. I am currently a sophomore at GMU, majoring in mathematics. On a personal note, I am leaning towards becoming either a video game developer or a teacher someday, or maybe something else entirely, or a combination of various things (I haven't really decided).

By the way, all students and faculty of the university have space allocated for them to create a website (20 MB to be precise). If you wish to use your allocated space, go to the following link and follow the directions there: http://webdev.gmu.edu/Main_Page . Unfortunately, these instructions are missing a step: if you want other people to be able to view your website, you must add execute permission to your home folder on the mason cluster. To do this, starting in your home directory (where you start out when you log in), enter the command "cd .." (which means go to the parent directory). Then enter the command "chmod a+x your_user_name", replacing your_user_name with your actual user name (presumably the same as your email address without the "@gmu.edu" part). Then enter the command "cd ~" to return to your home directory. You should also make sure that your public_html directory and its contents are set to mode 755 with "chmod 755 file_name".

Anyway, welcome to CS 262 section 001. All students are welcome to contact me via email or AIM at any time. I also plan to hold office hours before and after class on Friday (1 hour before and 1 hour after). I will also probably attend the lecture. I am always glad to help you learn low-level programming and the C language. I also know some C++ (among other languages), so feel free to ask me about that too.

Why learn C? Well, consider this: nearly all modern commercial video games were coded in either C or C++, and video games are among the most resource- and processor-intensive programming applications in the world. There's a reason why C and C++ were the chosen languages. C is one of the oldest and most powerful programming languages still in wide use. Other languages are convenient for less demanding applications, but when you need to get down and dirty, C or C++ is usually better. If you were to learn only one heavy-duty industry programming language, then C or C++ would be the wisest choice in my opinion. In fact, the compilers and interpreters for many newer languages were originally written in C.

To help you learn the C language, I will do a few things. First, I will be available for direct assistance via email, AIM, and office hours. Second, I will post additional exercises, practice, advice, and explanations as we go along. I will also mention topics beyond the scope of this semester that may interest you. To these ends I have placed these components below, under the section names "Additional Discussion", "Additional Exercises", and "Above and Beyond". Be aware that the content in "Above and Beyond" is not required material; it is included for your enlightenment, so to speak (i.e. the development of superior C skills), and as exposure to things beyond the scope of the course. Please feel welcome to ask me for clarifications, to give comments, and to point out any errors or shortcomings in the text (I am not by any means a "C expert"; C is an intricate and detailed language). Error correction is a great way to learn, by the way.

Additional Discussion

Death by scanf:

You may be aware that Dr. Nordstrom mentioned in class that scanf is dangerous and should generally not be used in released programs, but that we will be using it in class anyway and that it is OK in personal programs. Why is scanf dangerous? Two common pitfalls are that it can, in certain circumstances, cause unexpected or infinite-loop behavior when it fails to find the type of data it wants, and that it can write past the end of the memory location it has been told to store the data it reads.

When writing console (aka command-line) programs, a new C programmer may be surprised when a loop they created to ask for input (for example, a console command system for a program, a very common task) suddenly spirals out of control when invalid input is given to scanf: the program may loop forever, crash, or, perhaps even worse, run amok on its own or others' data. The programmer then asks, "What's going on? There's nothing here that should be causing an infinite loop?!" How could merely having a scanf statement in a loop cause a loss of control?

The reason is simple: when scanf reaches a character that doesn't fit the conversion you asked for (such as "%d" or "%f"), it stops and leaves that character, and everything after it, unread in the input stream. When a conversion succeeds, the matched characters are consumed (although whatever follows them, such as the newline from pressing Enter, still stays behind). The consequence is that if erroneous input is given to scanf, the next time scanf reads it will first see the data you already gave it the previous time, followed by any new data placed on the input stream (which could be nothing). So when reading with "%d", for example, if the user types "hello" instead of a number, the next call to scanf starts reading at that same "hello". Since "hello" followed by any other string (including the empty string "") is never a valid number according to the "%d" format, scanf fails this time as well, without ever waiting for the user. By induction, it will also fail the next time, and the time after that, and so on. The user never gets a chance to type new input, because scanf always still has the old, unreadable input waiting for it.
Thus we have an infinite loop.
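The death_by_scanf.c program itself isn't reproduced on this page, so here is a small stand-in sketch of the same effect (count_failed_int_reads is a made-up name, not a library function). It uses fscanf on a FILE * rather than scanf, so it can be pointed at stdin or at a temporary file holding some bad input; fscanf( stdin, ... ) behaves the same as scanf( ... ).

```c
#include <stdio.h>

/* Call fscanf(in, "%d", ...) up to max_tries times WITHOUT ever clearing
   the stream, and return how many of those calls failed to match.
   Hypothetical demo helper, bounded so it can't really loop forever. */
int count_failed_int_reads(FILE *in, int max_tries)
{
    int failures = 0;
    for (int i = 0; i < max_tries; i++) {
        int value;
        int matched = fscanf(in, "%d", &value);
        if (matched == 1)
            break;      /* got a number: stop */
        if (matched == EOF)
            break;      /* no input left at all */
        failures++;     /* matched == 0: the bad text is still unread */
    }
    return failures;
}
```

Called as count_failed_int_reads( stdin, 12 ) and given "hello", it reports 12 failures right away, without pausing for more typing after the first line, because every call after the first trips over the same unread "hello".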

Try this yourself using this small program I have written for demonstrating the effect: death_by_scanf.c. You can type the number zero as input to scanf to leave the looping sequence manually (i.e., in this case, to quit the program early). The program is set to loop no more than 12 times, for safety, so don't worry.

How do we fix this problem? We must clear the input stream before calling scanf again whenever we use scanf conversions that can fail (i.e. "%d", etc.). The input stream is known more specifically to the C language as the standard input stream. If you have included a header that declares it (such as <stdio.h>), you can refer to and manipulate the standard input stream directly in your code by the name "stdin". Two commonly suggested methods of clearing stdin are as follows. The first is to call fflush with an argument of stdin, as in fflush( stdin ). However, the C standard defines fflush only for output streams; calling it on an input stream such as stdin is undefined behavior. It has never failed to work for me, and some systems do support it, but it is not portable, so don't count on it. The second, when you know there's leftover input in stdin, is to call scanf with a format of "%*s", as in scanf( "%*s" ). The "*" means "read this item but don't assign it anywhere", and "%s" skips any leading whitespace and then reads one run of non-whitespace characters. So scanf( "%*s" ) eats one word of input, not necessarily everything: if the bad input was "hello world", it removes "hello" and leaves " world" behind for the next read. Also remember that if stdin holds nothing but whitespace (or nothing at all), scanf( "%*s" ) will wait for the user to type something and press Enter, so using it as a clearing mechanism without causing a wait requires that you know there's input already in stdin. One way to know is to check whether the value scanf returned (the number of items it assigned) was zero on the previous call, implying that it found input but not the kind it wanted, and left that input in stdin. A third, portable method is to read and discard characters with getchar until you reach a newline (or EOF). There are very likely other ways to clear streams as well; feel free to research them yourself if you wish.
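Here is a minimal sketch of the read-and-discard method, written against a FILE * (pass stdin for keyboard input; fgetc( stdin ) is equivalent to getchar()). Both discard_line and read_int are hypothetical helper names, not standard library functions.

```c
#include <stdio.h>

/* Read and throw away characters from `in` up to and including the next
   newline. Returns how many characters were discarded (not counting the
   newline), or -1 if end-of-file was reached first. */
int discard_line(FILE *in)
{
    int c;
    int discarded = 0;
    while ((c = fgetc(in)) != '\n') {
        if (c == EOF)
            return -1;
        discarded++;
    }
    return discarded;
}

/* Read an int, clearing the rest of the line whenever the input doesn't
   match. Returns 1 on success, 0 if end-of-file arrives before a valid number. */
int read_int(FILE *in, int *out)
{
    for (;;) {
        int matched = fscanf(in, "%d", out);
        if (matched == 1)
            return 1;
        /* matched == 0 means bad input is sitting in the stream, so
           discarding the line cannot block waiting for new typing */
        if (matched == EOF || discard_line(in) == -1)
            return 0;
    }
}
```

Calling discard_line only after scanf reports a failure avoids the waiting problem described above, since at that point we know there is unread input in the stream.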

There's a reason why scanf leaves the unmatched input where it is when it fails to find matching data. It was designed that way so that the erroneous input would remain in the input stream, where you could save or use it in some way, then remove or otherwise appropriately handle it, and then make the proper adjustments in the flow of the program. It's a way of preventing information loss, to put it simply. There are cases when this behavior is useful, but it can cause a lot of trouble if the programmer isn't aware that it's going on behind the scenes.
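As an illustration of putting that leftover input to use, here is a hypothetical helper, read_int_or_word, that takes advantage of it: when "%d" fails, it reads the offending word with "%63s" so the program can tell the user exactly what it couldn't understand. The 64-character buffer size is an arbitrary choice for this sketch.

```c
#include <stdio.h>

/* Try to read an int. If the input isn't a number, pull the offending
   word out of the stream into `bad` (at most 63 characters plus '\0')
   so the caller can report it.
   Returns 1 = number read into *out, 0 = bad word saved, -1 = end of input. */
int read_int_or_word(FILE *in, int *out, char bad[64])
{
    int matched = fscanf(in, "%d", out);
    if (matched == 1)
        return 1;
    if (matched == EOF)
        return -1;
    /* The unmatched text is still unread, so we can capture it. */
    if (fscanf(in, "%63s", bad) == 1)
        return 0;
    return -1;
}
```

With the input "seven 7", the first call returns 0 with "seven" saved in bad, and the second call returns 1 with 7 stored in *out.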

As for scanf's ability to go out of bounds and overwrite data that doesn't belong to the location it's been told to store data to, it may be best to discuss that after we start doing arrays and pointers.
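Without getting into arrays and pointers in depth yet, here is a small preview of the danger and the usual guard: "%s" has no idea how big the destination array is, but a field width such as "%9s" caps how many characters scanf will store. The names read_name and NAME_SIZE are made up for this sketch, and the width has to be written into the format string by hand, one less than the array size, to leave room for the '\0'.

```c
#include <stdio.h>

#define NAME_SIZE 10 /* room for 9 characters plus the '\0' terminator */

/* Read one whitespace-delimited word into name[NAME_SIZE].
   A bare "%s" would keep writing past the end of `name` for a long word;
   the width 9 in "%9s" stops after 9 characters. Returns 1 on success. */
int read_name(FILE *in, char name[NAME_SIZE])
{
    return fscanf(in, "%9s", name) == 1;
}
```

Fed the word "Bartholomew", the first call stores only "Bartholom", and the remaining "ew" is left in the stream for the next read instead of spilling past the end of the array.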

Review of Control Structure Independence and Dependence:

It is highly likely that you already know this, but it's important enough that reviewing it for clarity would be wise.

A group of conditional statements can be either independent or dependent. A common error among new programmers is not understanding exactly how this independence and dependence works.

For example, suppose a programmer is writing a program that takes a student's relative score for a class (their percentage grade) and determines which letter grade that score falls under. To make the determination one can use a series of condition checks. The implementation could use an independent structure, a dependent structure, or a mix of both. However, in this case the independent structure requires roughly twice as many comparisons per condition and also performs every condition check regardless of whether it has already found ("knows") the correct answer. Here are two C programs that show the two ways: independent_control.c and dependent_control.c . Note that for this program we are assuming that grades above 100% and below 0% are valid grades, and are A and F respectively. When you run it, give the program an integer value for the percent grade. I have neglected scanf safety to make it more concise and easier to read.

Try both programs. For example, give them the number 72 (which should come out as a letter grade of 'C') and compare the number of if statements entered and comparisons performed by the two implementations. Notice that the independent construct always tests every condition, even after the answer is known, and performs more comparisons than the dependent one. The moral of the story is that not understanding the logical independence/dependence nature of what you are trying to program has consequences. In this case the consequences are fairly light, but imagine what might happen in a much larger or more complex construct. Incorrect dependency relations in code can cause some very subtle and difficult (and sometimes catastrophic) bugs, for example when executing one conditional block changes the state that a later condition checks, or when the conditions are otherwise not logically disjoint.
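The two linked programs aren't reproduced here, so the following is a hypothetical re-creation, assuming the usual cutoffs (A at 90 and up, B at 80, C at 70, D at 60, otherwise F). Instead of assigning fixed counts in each block, it counts comparisons through two small helper functions, ge and lt, that are called from inside each condition and bump a counter every time a comparison is actually evaluated. All names here are made up for the sketch.

```c
/* Each helper performs one comparison and counts it in *count. */
static int ge(int a, int b, int *count) { (*count)++; return a >= b; }
static int lt(int a, int b, int *count) { (*count)++; return a < b; }

/* Independent: every if is tested, even after the grade is known,
   and each middle range needs both a lower and an upper bound. */
char grade_independent(int score, int *count)
{
    char grade = '?';
    *count = 0;
    if (ge(score, 90, count))                         grade = 'A';
    if (ge(score, 80, count) && lt(score, 90, count)) grade = 'B';
    if (ge(score, 70, count) && lt(score, 80, count)) grade = 'C';
    if (ge(score, 60, count) && lt(score, 70, count)) grade = 'D';
    if (lt(score, 60, count))                         grade = 'F';
    return grade;
}

/* Dependent: reaching an else-if means every earlier test was false,
   so only the lower bound needs checking, and the chain stops early. */
char grade_dependent(int score, int *count)
{
    *count = 0;
    if (ge(score, 90, count))      return 'A';
    else if (ge(score, 80, count)) return 'B';
    else if (ge(score, 70, count)) return 'C';
    else if (ge(score, 60, count)) return 'D';
    else                           return 'F';
}
```

For a score of 72 both functions return 'C', but the independent version evaluates 7 comparisons and the dependent one only 3. The independent count is 7 rather than the full 8 because && skips its right-hand comparison whenever its left-hand one is false; all five if statements are still tested, though.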

The key to using dependent structure most effectively is to understand that in an if / else-if / else chain, once an earlier if or else-if condition has been evaluated as false and control has passed to the next condition, we know for sure that the logical negation of that earlier condition holds, and it can (and probably should) be assumed as a given. To visualize how the dependent control structure works in our letter-grade example, imagine a ruler-like bar representing the range in which a numeric grade could fall. Now imagine that as each comparison is made moving down the control structure, the part of the range that was just checked and found false for our value is greyed out. As the comparisons proceed, it is as if the failed cases have been "chopped off" from the possible states of the program, so we don't have to worry about them. As an analogy, it is akin to how a chef might cut a vegetable (a cucumber or carrot, for example) into slices. As you slice each new piece off, you no longer concern yourself with the previously sliced chunks while you cut the next, and you only keep cutting until you have what you need.

Although the dependent control structure is better for the letter-grade programs discussed above, that is not always true. The choice of independent or dependent constructs in a program is almost entirely dependent on the context in which those constructs occur. My advice is to program in such a way as to describe the fundamental nature of what you are trying to program. It is generally unwise to use only one way or the other. Aim for your program logic to be a natural logical representation of what you are describing (when possible). Natural logic tends to be more robust and powerful than bizarre or eccentric logic. Do not over-optimize your code before you know what you are doing. Think about what you are doing before you do it, perhaps visualize it if possible, and you will find that you generally produce better code.

As an additional matter of interest, you may have noticed in the source code for dependent_control.c that within the if and else-if blocks I counted the number of if statements and comparisons by arbitrarily assigning the appropriate count values to the associated variables. This was not just me being lazy, redundant, or non-dynamic in my code. Think about the way if / else-if / else control flows. If I had tried to do an actual increment of the variables inside the if or else-if blocks, what would have happened? Think about it; there's something subtle going on. Only the one block whose condition succeeds ever runs, and the if / else-if / else syntax of C (and of many other high-level languages) gives you no place to put an ordinary statement that runs between one condition failing and the next condition being tested. (The closest you can get is to fold the counting into the condition expressions themselves, for example with the comma operator or a helper function, or to break the chain into nested else { if ... } blocks.) This in-between space does exist in the compiled code, however, and it can be accessed and used manually with the help of assembly language (and to varying degrees in certain other languages). If I had accessed this intermediate space in my program, I would have been able to do a true counting increment for the if-statement and comparison count variables, but alas.

I plan to show you how to utilize the intermediate code space between dependent conditional blocks in the section about assembly language that I've placed on this website, once I get around to it.

Initialization of Variables or the Lack Thereof:

When a variable is declared, its initial value depends on where and how it is defined.

If a variable is declared at global (file) scope, or declared static, it begins with a value of zero. On the other hand, if it is an ordinary local variable, it contains whatever random garbage happened to be lying around in the memory location it was given by the system or compiler (formally, its value is indeterminate). C doesn't set it to zero because doing so might be inefficient if the variable's value is going to change immediately to something else. Modern optimizers can generally remove such extraneous assignments, and unless the extra assignment is in a loop or in an extremely frequently used part of a program, it probably won't affect performance much even if it isn't optimized out.

Accidentally using a garbage value from an uninitialized variable can have unintended and unexpected consequences. Stay alert to uninitialized variables when programming in C or C++.
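A small sketch of the three cases (the names global_counter, next_id, and sum_to are just for illustration):

```c
/* File-scope (global) variables have static storage duration, so they
   start at zero even without an initializer. */
int global_counter;

/* A local declared `static` also starts at zero, and it keeps its value
   between calls. */
int next_id(void)
{
    static int calls;
    return ++calls;
}

/* An ordinary (automatic) local starts out indeterminate, so give it a
   value before reading it. Without the "= 0" here, total would begin
   with whatever garbage was in its memory and the result would be
   unpredictable. */
int sum_to(int n)
{
    int total = 0;
    for (int i = 1; i <= n; i++)
        total += i;
    return total;
}
```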

Additional Exercises

swap function:

To test your understanding of pointers, write a small function to swap the values of two integer variables.

random password:

Ever have trouble thinking of a good password? Think your password could be guessed eventually if someone tried? Why not make the computer generate an unbiased random password for you?

Write a function named random_password that takes a pointer to char for where the password text will be stored (i.e. a char array), an int for the size of that char array, and an int for the length of password the user wants. The function should also return an int indicating whether it succeeded or failed. Use the number 1 to indicate success and 0 to indicate failure (or whatever other error-code scheme you want). Don't forget to null-terminate the password string being stored; otherwise it may not print correctly if you try to interpret it as a string (append '\0' to the end of the password characters). If the size of the array and the desired length of the password conflict (remember that the array needs room for the password plus the '\0'), the function should return failure. Other invalid arguments, such as negative numbers for the size, should also return failure. The function should not cause segmentation faults or other errors in any case.

Make the password consist of random characters from the alphabetic and numeric characters of the ASCII code (i.e. 0-9, A-Z, and a-z).

You can use the <stdlib.h>'s rand function to generate the random numbers you need, or you can find another more advanced library if you wish. You might find the following web page useful: http://www.cplusplus.com/reference/clibrary/cstdlib/ (although you should be aware that some of the content refers to C++, which is not the same).
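One detail worth knowing for the "unbiased" part of this exercise: rand() returns one of RAND_MAX + 1 values, so rand() % n slightly favors the smaller results unless n divides RAND_MAX + 1 evenly. A common fix is rejection sampling: throw away the few top values that would leave the last cycle incomplete and draw again. Here is a minimal sketch of that idea; random_below is a hypothetical helper name, and it assumes 0 < n <= RAND_MAX + 1.

```c
#include <stdlib.h>

/* Return a random int in [0, n) with no modulo bias.
   Assumes 0 < n <= RAND_MAX + 1. unsigned long is used so that
   RAND_MAX + 1 cannot overflow when RAND_MAX equals INT_MAX. */
int random_below(int n)
{
    unsigned long range = (unsigned long) RAND_MAX + 1;    /* how many values rand() can give */
    unsigned long limit = range - range % (unsigned long) n; /* largest multiple of n <= range */
    unsigned long r;
    do {
        r = (unsigned long) rand();
    } while (r >= limit);   /* reject the incomplete last cycle */
    return (int) (r % (unsigned long) n);
}
```

Seed the generator once at the start of main, for example with srand( (unsigned) time( NULL ) ) from <time.h>; otherwise rand() produces the same sequence on every run. For this exercise rand() is fine, but it is not designed for security-sensitive randomness.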

If you're clever, you should be able to restrict your generated characters to the appropriate ASCII ranges without even having to write the characters in their numeric forms. Hint: a literal character (a single character enclosed by single quotes) is already a number in the compiler's eyes.
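To illustrate the hint without giving away the whole exercise, here is a sketch of arithmetic on character literals (the function names are made up). The C standard guarantees that '0' through '9' are consecutive; the letter ranges below are consecutive in ASCII, which is the assumption here.

```c
/* Character literals are just small integers, so ranges can be
   measured and walked without memorizing any ASCII codes. */
int ascii_alnum_count(void)
{
    return ('9' - '0' + 1)    /* 10 digits */
         + ('Z' - 'A' + 1)    /* 26 uppercase letters (ASCII) */
         + ('z' - 'a' + 1);   /* 26 lowercase letters (ASCII) */
}

/* Turn a digit character such as '7' into the number 7. */
int digit_value(char c)
{
    return c - '0';
}

/* The n-th uppercase letter, counting from 0: 0 -> 'A', 2 -> 'C'. */
char nth_upper(int n)
{
    return (char) ('A' + n);
}
```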

lowercase and uppercase:

Write one or more functions that can be used to alter all the alphabetic characters in a string so that they are either all lowercase or all uppercase. Non-alphabetic characters should be left as is.

Feel free to think of other transformations you might apply to strings, and code functions for them if you wish.

the meaning of life:

Write a simple C function that returns the answer to what the meaning of life is. The answer must include "Why are we here?", "How can I enslave Santa Claus?", and answers to every intricacy of everything else too, of course. I demand the answer by Thursday. No excuses.

Above and Beyond:

Assembly Language:

Assembly language is often described as a human-readable version of machine code. Most of its commands (usually) correspond one to one with the instructions in the ISA of the machine the assembly code is being written for (in our case this is probably x86). The ISA of a computer is the actual set of instructions made available by the CPU; ISA stands for "Instruction Set Architecture". You probably already know, but just to review, CPU stands for "Central Processing Unit". The ISA is implemented directly in the CPU's electronics. The CPU is the brain of the computer; without it the computer could not make decisions or direct processes.

Because assembly is closely tied to the ISA of a particular machine, it is generally not particularly portable. Keep this in mind when deciding whether or not you want to use assembly. If you want your program to work on a large variety of systems then you should probably avoid assembly unless it becomes absolutely necessary.

There are two currently dominant syntaxes for x86 assembly language. One is AT&T syntax, which GCC uses by default (GCC can also emit Intel syntax with the -masm=intel option). The other is Intel syntax, which is used by Microsoft Visual Studio and by some others.

So, you probably want to see what assembly actually looks like now. The short answer is "usually not very pretty". Har har har...

Anyway, the real answer is this: simply compile any C program with GCC using the command "gcc -S source_file_name.c", which will generate a file named "source_file_name.s". Open this new ".s" file with a plain text editor, and what you see is the AT&T-syntax GAS (GNU Assembler) assembly language version of your source code. This is what the compiler actually converts your code into; the assembler then turns it into machine code for the architecture you're compiling on or for.

When I used the assembly source output option for the following C source file assembly_test.c, the compiler gave me back this file: assembly.s.

Take a peek and notice how even an extremely simple program such as assembly_test.c is actually made up of much finer-grained commands. In fact, even things you may think of as fundamental, such as if statements, comparisons, and variable assignments, are generally made up of multiple assembly commands. See if you can figure out what's going on in the assembly; try to match particular pieces with what you know is happening in the originating C code.

You can compile an assembly source file with gcc by using a command like the following: "gcc -o executable_name.exe assembly_source_code.s"

You can also do what's called inline assembly, although it tends to vary from compiler to compiler. For DJGPP some info can be found on the following website: http://www.delorie.com/djgpp/doc/brennan/brennan_att_inline_djgpp.html. The website also shows the difference in syntax between AT&T and Intel syntax for some commands.

If you are curious about Intel syntax assembly, one option is to try experimenting with a free assembler called "flat assembler". The main site is: http://flatassembler.net/.

The two syntaxes have different operand orders.

AT&T syntax usually goes like this:

command source, destination

Whereas Intel syntax usually goes like this:

command destination, source
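To see the AT&T operand order in real code, here is a minimal sketch using GCC-style extended inline assembly (a GCC/Clang extension, x86 only; copy_int is a made-up name). In Intel syntax the same kind of instruction would be written with the destination first, as in mov eax, ebx. The #if falls back to plain C on other compilers or CPUs so the file still builds there.

```c
/* Copy an int through an x86 register using GCC extended inline asm. */
int copy_int(int src)
{
    int dst;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__("movl %1, %0"   /* AT&T: source (%1) first, destination (%0) second */
            : "=r"(dst)     /* output operand %0: any general register */
            : "r"(src));    /* input operand %1: any general register */
#else
    dst = src;              /* fallback for non-x86 or non-GCC compilers */
#endif
    return dst;
}
```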

OpenGL:

As a hacker who broke into a traffic sign system relatively recently once said:

////////////////////////
// ZOMBIES AHEAD!!! //
//   EXPECT DELAYS! //
////////////////////////

Except there aren't really any zombies expected in this OpenGL section... Ah, if only there were... Alas...