Sunday, January 17, 2010

How to find a mini-project

Every now and then, just as we near the end of a course we are entrusted with a mini project to be presented before the course finally comes to an end. Except to some entrenched souls who find this a relief from the monotone of assignments, most of us are immediately faced with the monstrous task of finding an appropriate project title and then see it to completion. A great many projects never get finished. A great many never really get started.

If we go back a few years and look into our mathematics text books, we will find that after every few chapters, a set of miscellaneous questions has been appended which requires us to apply the principles applied in the previous chapters. If your mathematics text books did not have this feature then probably those pages were glued together because some lawmakers felt that they were too much burden for the delicate minds. Jokes apart, these miscellaneous questions have a lot in common with the mini project that is expected of you today.

You have gone through a fair enough portion of the course and it is time to apply the knowledge that you have gained so far. I am not here to give you a list of projects to do. But I am going to provide you with ideas. I hope that you will think of similar projects on your own and not just copy my ideas.

A good feature of using TurboC in DOS is that you can create cool graphics by using the graphics.h header file. In actuality, a lot of work is being done for you by the header file, but we can remain blissfully ignorant about it. For someone who wants to know how the graphics header file accomplishes this feat, I have a project to offer. There is a great book by Peter Abel describing the DOS operating system interrupts and programming in the assembly language. With aid of this book, one may, inside TurboC create new functions to initialize the graphical screen, display pixels and change colors. Create a library of functions to draw simple shapes such as line, circle and rectangle using the most basic DOS interrupts and asm commands inside TurboC. Ask your friends to use your library instead of the graphics.h file. Tempt them with some additional features such as the ability to draw and fill tilted rectangles. This project will require knowledge of the algorithms to draw lines, circles etc. but they may be directly obtained from someone who has done the computer graphics course or from the internet. The intent of this project is to get the understanding that underlying every library is a set of basic constructs.

Someone has said that a good program is one which has a lot of inline documentation. Okay, there will be some argument about whether someone said it or nobody did, but now that I have, Google will surely find it. But how embarrassing comments would be to its author if he inadvertently slips in spelling mistakes? “But since the advent of the computers, who remembers spelling?” you want to ask. True. It is not necessary during writing your report because your word processor will do most of the work for you. But what about writing comments in code? Fortunately, editors such as Eclipse now do spell check in comments. What a God-send! But wait a minute. We don’t use Eclipse for writing our programs because our teachers feel that IDEs write too much code for you and leave less to learn. Too bad, except that this rationale of your teachers has created a great possibility for your project topic. Write a spell checker for comments in C or C++ programs. First of all, fish out a good enough dictionary from the web. An extremely complete dictionary is not required because most programmers do not use artistic language in their comments (if any one does, please remind them that the comments are for others to understand and it would serve its purpose better if written in plain English!) So just fish out a list of about a 100 thousand most commonly used words and store it in a file. Most such lists are sorted alphabetically. Now write a program to filter out comments from C/C++ programs and then check the spellings of all words in the comments. Simple, isn’t it? But there is a catch. Most programmers would freely make use of variables inside their comments to explain what they are doing. Since most variables are not proper English words, they would be flagged by your spell checker as incorrect and would irritate the user. So, your program must also store all declared variables and keywords and ignore them when applying the spell check. If this complexity was not too much for you, you will complete the program and observe something bitter. This program takes too long to execute even on small input texts. The reason is that you have stored the list of valid words on the disk and have to go through the entire list for finding each word. This computational effort can be reduced if you use something known as a hash-map. A simple way to do it would be to divide the list into 26 separate lists each for a symbol in the English alphabet and then search only the relevant list. Even inside a list one could use binary search to quickly converge on the correct word, or to report that there is a spelling mistake. The sharp minds amongst you will notice that dividing the list into 26 lists each for a symbol in the alphabet is not a very good idea because there are differing number of words starting from each letter. Some other, better hashing function can be devised, but I will leave you with that.

A common project done by many students is encryption of plain text. Substitution cipher is the most common method used. Professional cryptography requires that the algorithm of the cipher must be public, but these students writing the substitution cipher put too much trust in their formula of substitution and swear never to divulge it. Lo and behold! There are chits passing under your nose and you cannot understand a word of it. You cannot tolerate this and so you want to write a program that breaks this code. Remember the sentences passing under your nose are only small ones. Here is what you do. You first fetch a full word list from the web. Then you write a program that substitutes each letter in the message with another letter and check if the words appear in the word list. When a match is found, the program returns the association and a possible formula. Of course you know that if the messages become bigger or if the formula is changed, your program will take longer to run and figure out what the formula is. Also, if the formula changes per word / per letter, you are out of luck. Try running your program on more capable computers such as servers and you will see that the program returns results much faster than on a simple desktop. (If it does not happen so, do ask your system administrators to change the servers!) This kind of attack on ciphers is called brute force attack and can also be launched on standard encryption algorithms like AES. These algorithms use a key and a program that can guess the key can decipher the code. Keys are generally so long that it will take most computers a very long time to figure out the key and documents are encrypted with the appropriate length of key depending on their importance.

Your friend has keen interest in cryptography and one of the features in crypto that he is particularly intrigued by is the ability to storing and operating on big integers. These are numbers 128 or 256 bit long (or longer) and are to be stored accurately till the units place. Java provides an inbuilt package for operating on such large numbers, but you do not know if C allows it too. Instead of trying to find it out on the internet, your friend has asked you to take up the task and create a construct for the same in C / C++. Therefore you have taken it up to yourself to create a class called BigInt which will allow storage and operations on extremely large integers. You plan to store the integer in a linked list of integers and will support only the basic operations like addition, subtraction, multiplication and division. Your friend begged you, so you will also support modulo function. Enjoy! Is there some way of finding out if this big number is a prime?

A friend of yours is in aeronautical engineering and has called upon your computer expertise to solve a persistent problem. They frequently use integrals with limits and have to use calculators to solve them. But these problems are really only a part of a larger scheme of things and hence it would be best if they had a library that solved the integration problems. Think of some way to represent arbitrary integration problems for input and then solve those using numerical methods. You might consider keeping the log book as a look up table in the computer’s memory.

Confident that the computer can solve any problem, you have set out to see your future. For starters you plan to create a computer program that will predict the mass, position and velocity of all particles in a closed box at some future point in time given their mass, positions and velocity at some earlier time. Make all simplifying assumptions such as the walls of the box are perfectly rigid and the particles inside the box are under standard gravity or no gravity and are perfect spheres and also rigid. Create a program that will input an arbitrary number of particles, their positions and velocities in three dimensions and the time at which you seek to know their positions and velocities. Simulate all collisions with the walls and each other and predict the position at the required time. Be happy that you have predicted the future of these hypothetical particles after you complete the program, but bear in mind that Heisenberg’s uncertainty principle laughs at your simplifying assumptions.

You have learnt how a projectile travels and it can be easily demonstrated in a game. QBasic came with a game called gorilla. Two gorillas stood atop high rise buildings and threw banana skins at each other. It was a two player game and each player was expected to provide a velocity and angle to his gorilla for throwing the banana skin. There existed an element of wind and gravitation to make things more difficult. You may create some form of that game.

If you use Linux for writing C / C++ programs, sprint library may be used for graphics programs. I have only listed some projects that came to my mind of-hand. However, if you look around, or look in your own text books of other courses, you should be able to find dozens of interesting projects to do. If you belong to computer science, a project done for some other engineering field will be a humbling experience where you will understand that computers are really only aids for supporting the variety of branches of science and engineering. If you are from other fields, you will feel proud that you have made life easier for many of your friends by introducing a simple tool for everyone to use. Enjoy your project. Nothing prepares one for the real world as projects do. To see the projects that I undertook during my BE and MTech, please visit my webpage.

Friday, January 15, 2010

Why friends are trusted more than children

I hope it is not true in reality. But in the world of C++ programming, it is true. Friend classes may access private members of a class whereas, private members are out of scope even for child classes (or classes that inherit a base class.)

Consider the following program:
class Father
{
private:
 int annualIncome;
public:
 int noOfKeys;

 void subIncome(int n)
 {
  annualIncome -=n;
 }
 friend class Fathers_Friend;
 //friend class Son; //Flag Z: In some rare circumstances, the inheriting class can also be a friend class. 
 //In that case, public members are automatically inherited and private are accessible because they are friends.
};

class Fathers_Friend
{
private:
 Father myFriend;
 int annualTax;
public:
 int noOfShoes;
 void addToFathersIncome(int n)
 {
  myFriend.annualIncome = myFriend.annualIncome + n;
  //noOfKeys++;// Not allowed because a friend does not automatically inherit everything from another class. It has to explicitly use an object of the friend class.
 }
 void assocFriend(Father f)
 {
  myFriend = f;
 }
};

class Son : public Father
{
private:
 int noOfHairCream;
public:
 void subFromFathersIncome(int n)
 {
  //Father::annualIncome = Father::annualIncome - n; //error C2248: cannot access private member declared in class 'Father
  subIncome(20000); //Have to access inheritted function's private members through a public function.
  noOfKeys++; //Can directly access public members.
  //A Son can directly inherit a father's public property, but not his private property.
  //annualIncome = 10;//Flag Z: This is accessible if Son is also a friend of Father!
 }
 
};

void main()
{
 Father father;
 Son son;
 Fathers_Friend fathers_friend;
 fathers_friend.addToFathersIncome(10000);
 son.subFromFathersIncome(20000);
}

An object of the class Father inside the friend class Friend can be used to access a private member of the class Father. None of the private members of Father are accessible to Friend except through the object of Father class. On the other hand, Son can access all public members of Father since Son inherits from Father. However, Son cannot access private members of the Father class.

Why is this so? Why are classes inheriting another class have lesser privileges than friend classes? It is so because a class declares who all are its friends inside the class definition, whereas, any class is free to inherit from another class without requiring the parent’s consent. This means, the parent knows nothing about the children classes, but it knows about the friend classes and trusts them to not misbehave.

A special case may prevail when Son inherits Father as well as is declared as a friend of Father. This case can be seen in the above program by un-commenting the Flag Z labeled statements. In this case, the Son inherits public members of the Father class as well as the private members. Son does not need an explicit object of Father, because the inherited class always has an implicit copy of the base class.

Just as a side note, I am yet to see a good example of use of friend classes. I am not aware if there is something that cannot be done without having the friend class construct. Java does not having anything like a friend class.

Why I dislike the C Hello World program

The computer is a machine that takes input, processes it and produces some output. This is a great definition in its own, but not without flaws. Many have argued that there might be some programs which do not take input from the user and still produce an output. One such example is the common Hello World program that one will encounter in all major programming language books. All that such a program does is print the words “Hello World” on the screen. This is how a C Hello World program looks like.
#include < stdio.h >
int main()
{
printf("Hello World\n");
return 0;
}

Look at this monster. It is a very complicated program to understand, unlike what a Hello World must be. Worse still, many students face this monster on the very first day of their programming lessons. It took me many years to completely comprehend what is happening in that program and then started hating this program. Let us start from the very beginning.

A program written in the C or C++ language goes through a series of operations when we compile it. At the end of the compilation process, an exe file is created which contains machine level code that can be executed by the underlying microprocessor. To print “Hello World” on the screen, we need to convert this program in to the machine readable format. The microprocessor then must understand that the characters have to be rendered into the frame buffer of the screen to appear on the physical screen.

Even before the compiler can convert this program into a machine readable format, something else needs to be done. The #include is a preprocessor directive. A small program called the preprocessor acts on all the commands starting with a hash (#). The #include command asks the preprocessor to copy the contents of another file into the text of this file. The included file in turn may include more files within itself. A fully expanded version of the C program is much larger than the 6 line program here. The standard IO header file (stdio.h) contains information about how to display characters on the output device and how to take input from the user.
Then comes the magic function main. The main function is always required in any C or C++ program. It is defined as the entry point of a program, but is it really so? If we do not specify the main function, the Visual C++ linker will give the following error: LINK : fatal error LNK1561: entry point must be defined

Therefore, main must be defined. But that does not mean that it is the entry point, or the very first thing that is executed in a program. Look at this second program.


#include < stdio.h >
#include < conio.h >
int func1();

int a = func1();
int b=func1()+100;

void main()
{
printf("a=%d, b=%d\n", a, b);
getch();
}

int func1()
{
printf("Entered func1()\n");
main();
return 10;
}


The main is not the first function to be executed. In fact, another function is executed because its return value is to be applied to a global variable. All global variables are evaluated before the entry point function is called. What this essentially means is that one may write an entire program without ever going inside the main function. Just create a global function and a global variable that takes the return value of this global function and write the entire program inside this global function. After all the processing is done, just call exit() before returning. In such a program, the main function will exist only to please the linker. It will never be called!

Having creased the foreheads of many teachers with the difficulty of explaining this simple program, let us move to the printf function. What does this function do? It prints the characters inside the quotes on screen. But as a side effect, it also returns the number of characters printed. Most real programs never use the return value from printf, but most interview questions do. What is even more intriguing about the printf function is that it is able to take any number of parameters. How do we write such a function? The printf function uses something called as the variable arguments facility present in the C construct. While declaring a function, you may say that there might be any number of parameters to a function. In the definition of the function, one would act on all these parameters. Within the quotes, one may specify certain special character sequences to indicate that a value from some variable will be plugged into at this place. Typical usage is:
printf(“a=%d, b=%d”, a, b);

The number of % sequences inside the quotes specifies the number of variables that will follow. One may specify more or less variables, and depending on the compiler, the additional variables may be ignored and the lesser variable values may be displayed as garbage. The way such a function is defined is by using ellipses (…) inside the function definition.

int printf(char *_Format, ...);

It was not before my final year of engineering, when we wrote a logging function for our project, that I first wrote a function using the ellipses. At that time, just giving … on Google Search yielded a blank page with no results and nothing written below the Google search bar. It has changed since then.

By introducing the printf function on the very first day of programming, we actually introduce a monster whose true value is never appreciated by most students.
Now let us look at the final statement. We are returning a zero. This is in line with the main function having promised that it will return an int in its definition. But who are we retuning to? This brings us to the discussion of how a program is run. After an executable of some kind has been created, a program known as the loader is kicked in. The program is run by the Unix shell or by a DOS terminal; the program will return control to the respective parent. The return value will be accessible before executing any other program.

Still, this Hello World program is most commonly taught as the first program to students. Most people think that it is as simple as it gets. Students are only satisfied if they see some output on the screen. I shall not argue, but I do expect the teachers to one day go back to the hello world program and appreciate the real complexities of these simple 6 lines of C code.
I hope my explanation proves helpful to someone.