Friday, January 15, 2010

Why I dislike the C Hello World program

The computer is a machine that takes input, processes it and produces some output. This is a great definition on its own, but not without flaws. Many have argued that some programs take no input from the user and still produce an output. One such example is the common Hello World program that one will encounter in all major programming language books. All that such a program does is print the words "Hello World" on the screen. This is what a C Hello World program looks like:
#include <stdio.h>
int main()
{
    printf("Hello World\n");
    return 0;
}

Look at this monster. It is a very complicated program to understand, unlike what a Hello World program ought to be. Worse still, many students face this monster on the very first day of their programming lessons. It took me many years to completely comprehend what is happening in that program, and once I did, I started hating it. Let us start from the very beginning.

A program written in the C or C++ language goes through a series of operations when we compile it. At the end of the compilation process, an executable file (an .exe file on Windows) is created which contains machine-level code that can be executed by the underlying microprocessor. To print "Hello World" on the screen, we need to convert this program into that machine-readable format. When it runs, the characters must eventually be rendered into the frame buffer of the screen to appear on the physical screen, a job the program hands over to the C library and the operating system.

Even before the compiler can convert this program into a machine-readable format, something else needs to be done. The #include is a preprocessor directive. A small program called the preprocessor acts on all the lines starting with a hash (#). The #include directive asks the preprocessor to copy the contents of another file into the text of this file. The included file may in turn include more files. A fully expanded version of the C program is much larger than the 6-line program here. The standard IO header file (stdio.h) contains the declarations that tell the compiler how functions such as printf, which displays characters on the output device, and scanf, which takes input from the user, must be called. The code of those functions lives in the C library and is attached later by the linker.

Then comes the magic function main. The main function is required in every C or C++ program. It is defined as the entry point of a program, but is it really so? If we do not specify the main function, the Visual C++ linker will give the following error: LINK : fatal error LNK1561: entry point must be defined

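Therefore, main must be defined. But that does not mean that it is the entry point, or the very first thing that is executed in a program. Look at this second program. It has to be compiled as C++, because C does not allow a global variable to be initialized with a function call.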


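#include <stdio.h>
#include <conio.h>   /* non-standard DOS/Windows header, provides getch() */

int func1();

/* Globals initialized by function calls: allowed in C++, not in C. */
int a = func1();
int b = func1() + 100;

/* void main() and calling main() by hand are accepted by Visual C++,
   but neither is standard. */
void main()
{
    printf("a=%d, b=%d\n", a, b);
    getch();
}

int func1()
{
    printf("Entered func1()\n");
    main();
    return 10;
}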


The main function is not the first function to be executed. In fact, func1 is executed first, because its return value is needed to initialize a global variable. Global variables are initialized before the entry point function is called. What this essentially means is that one may write an entire program without ever going inside the main function. Just create a global function and a global variable that takes the return value of this global function, and write the entire program inside this global function. After all the processing is done, just call exit() before returning. In such a program, the main function will exist only to please the linker. It will never be called!

Having creased the foreheads of many teachers with the difficulty of explaining this simple program, let us move to the printf function. What does this function do? It prints the characters inside the quotes on the screen. But as a side effect, it also returns the number of characters printed. Most real programs never use the return value from printf, but most interview questions do. What is even more intriguing about the printf function is that it is able to take any number of parameters. How do we write such a function? The printf function uses the variable-argument facility of the C language. While declaring a function, you may say that it accepts any number of parameters. In the definition of the function, one then acts on all these parameters. Within the quotes, one may specify certain special character sequences to indicate that the value of some variable is to be plugged in at that place. Typical usage is:
printf("a=%d, b=%d", a, b);

The number of % sequences inside the quotes specifies the number of values that must follow. If more values are passed than the format asks for, the extra ones are simply ignored. If fewer are passed, the behaviour is undefined, and in practice the missing values are often displayed as garbage. Such a function is declared by using an ellipsis (...) in the parameter list.

int printf(const char *_Format, ...);

It was not until my final year of engineering, when we wrote a logging function for our project, that I first wrote a function using the ellipsis. At that time, searching for just ... on Google yielded a blank page with no results and nothing written below the search bar. It has changed since then.

By introducing the printf function on the very first day of programming, we actually introduce a monster whose true value is never appreciated by most students.

Now let us look at the final statement. We are returning a zero. This is in line with the promise made in the definition of the main function that it will return an int. But who are we returning to? This brings us to the discussion of how a program is run. After an executable of some kind has been created, a program known as the loader kicks in to place it in memory and start it. Whether the program is run from a Unix shell or from a DOS prompt, it returns control to that parent when it finishes. The return value stays accessible to the parent until it executes another program.

Still, this Hello World program is most commonly taught as the first program to students. Most people think that it is as simple as it gets. Students are only satisfied if they see some output on the screen. I shall not argue, but I do expect teachers to one day go back to the Hello World program and appreciate the real complexities of these simple 6 lines of C code.
I hope my explanation proves helpful to someone.
