Sunday, January 17, 2010

How to find a mini-project

Every now and then, just as we near the end of a course we are entrusted with a mini project to be presented before the course finally comes to an end. Except to some entrenched souls who find this a relief from the monotone of assignments, most of us are immediately faced with the monstrous task of finding an appropriate project title and then see it to completion. A great many projects never get finished. A great many never really get started.

If we go back a few years and look into our mathematics text books, we will find that after every few chapters, a set of miscellaneous questions has been appended which requires us to apply the principles applied in the previous chapters. If your mathematics text books did not have this feature then probably those pages were glued together because some lawmakers felt that they were too much burden for the delicate minds. Jokes apart, these miscellaneous questions have a lot in common with the mini project that is expected of you today.

You have gone through a fair enough portion of the course and it is time to apply the knowledge that you have gained so far. I am not here to give you a list of projects to do. But I am going to provide you with ideas. I hope that you will think of similar projects on your own and not just copy my ideas.

A good feature of using TurboC in DOS is that you can create cool graphics by using the graphics.h header file. In actuality, a lot of work is being done for you by the header file, but we can remain blissfully ignorant about it. For someone who wants to know how the graphics header file accomplishes this feat, I have a project to offer. There is a great book by Peter Abel describing the DOS operating system interrupts and programming in the assembly language. With aid of this book, one may, inside TurboC create new functions to initialize the graphical screen, display pixels and change colors. Create a library of functions to draw simple shapes such as line, circle and rectangle using the most basic DOS interrupts and asm commands inside TurboC. Ask your friends to use your library instead of the graphics.h file. Tempt them with some additional features such as the ability to draw and fill tilted rectangles. This project will require knowledge of the algorithms to draw lines, circles etc. but they may be directly obtained from someone who has done the computer graphics course or from the internet. The intent of this project is to get the understanding that underlying every library is a set of basic constructs.

Someone has said that a good program is one which has a lot of inline documentation. Okay, there will be some argument about whether someone said it or nobody did, but now that I have, Google will surely find it. But how embarrassing comments would be to its author if he inadvertently slips in spelling mistakes? “But since the advent of the computers, who remembers spelling?” you want to ask. True. It is not necessary during writing your report because your word processor will do most of the work for you. But what about writing comments in code? Fortunately, editors such as Eclipse now do spell check in comments. What a God-send! But wait a minute. We don’t use Eclipse for writing our programs because our teachers feel that IDEs write too much code for you and leave less to learn. Too bad, except that this rationale of your teachers has created a great possibility for your project topic. Write a spell checker for comments in C or C++ programs. First of all, fish out a good enough dictionary from the web. An extremely complete dictionary is not required because most programmers do not use artistic language in their comments (if any one does, please remind them that the comments are for others to understand and it would serve its purpose better if written in plain English!) So just fish out a list of about a 100 thousand most commonly used words and store it in a file. Most such lists are sorted alphabetically. Now write a program to filter out comments from C/C++ programs and then check the spellings of all words in the comments. Simple, isn’t it? But there is a catch. Most programmers would freely make use of variables inside their comments to explain what they are doing. Since most variables are not proper English words, they would be flagged by your spell checker as incorrect and would irritate the user. So, your program must also store all declared variables and keywords and ignore them when applying the spell check. If this complexity was not too much for you, you will complete the program and observe something bitter. This program takes too long to execute even on small input texts. The reason is that you have stored the list of valid words on the disk and have to go through the entire list for finding each word. This computational effort can be reduced if you use something known as a hash-map. A simple way to do it would be to divide the list into 26 separate lists each for a symbol in the English alphabet and then search only the relevant list. Even inside a list one could use binary search to quickly converge on the correct word, or to report that there is a spelling mistake. The sharp minds amongst you will notice that dividing the list into 26 lists each for a symbol in the alphabet is not a very good idea because there are differing number of words starting from each letter. Some other, better hashing function can be devised, but I will leave you with that.

A common project done by many students is encryption of plain text. Substitution cipher is the most common method used. Professional cryptography requires that the algorithm of the cipher must be public, but these students writing the substitution cipher put too much trust in their formula of substitution and swear never to divulge it. Lo and behold! There are chits passing under your nose and you cannot understand a word of it. You cannot tolerate this and so you want to write a program that breaks this code. Remember the sentences passing under your nose are only small ones. Here is what you do. You first fetch a full word list from the web. Then you write a program that substitutes each letter in the message with another letter and check if the words appear in the word list. When a match is found, the program returns the association and a possible formula. Of course you know that if the messages become bigger or if the formula is changed, your program will take longer to run and figure out what the formula is. Also, if the formula changes per word / per letter, you are out of luck. Try running your program on more capable computers such as servers and you will see that the program returns results much faster than on a simple desktop. (If it does not happen so, do ask your system administrators to change the servers!) This kind of attack on ciphers is called brute force attack and can also be launched on standard encryption algorithms like AES. These algorithms use a key and a program that can guess the key can decipher the code. Keys are generally so long that it will take most computers a very long time to figure out the key and documents are encrypted with the appropriate length of key depending on their importance.

Your friend has keen interest in cryptography and one of the features in crypto that he is particularly intrigued by is the ability to storing and operating on big integers. These are numbers 128 or 256 bit long (or longer) and are to be stored accurately till the units place. Java provides an inbuilt package for operating on such large numbers, but you do not know if C allows it too. Instead of trying to find it out on the internet, your friend has asked you to take up the task and create a construct for the same in C / C++. Therefore you have taken it up to yourself to create a class called BigInt which will allow storage and operations on extremely large integers. You plan to store the integer in a linked list of integers and will support only the basic operations like addition, subtraction, multiplication and division. Your friend begged you, so you will also support modulo function. Enjoy! Is there some way of finding out if this big number is a prime?

A friend of yours is in aeronautical engineering and has called upon your computer expertise to solve a persistent problem. They frequently use integrals with limits and have to use calculators to solve them. But these problems are really only a part of a larger scheme of things and hence it would be best if they had a library that solved the integration problems. Think of some way to represent arbitrary integration problems for input and then solve those using numerical methods. You might consider keeping the log book as a look up table in the computer’s memory.

Confident that the computer can solve any problem, you have set out to see your future. For starters you plan to create a computer program that will predict the mass, position and velocity of all particles in a closed box at some future point in time given their mass, positions and velocity at some earlier time. Make all simplifying assumptions such as the walls of the box are perfectly rigid and the particles inside the box are under standard gravity or no gravity and are perfect spheres and also rigid. Create a program that will input an arbitrary number of particles, their positions and velocities in three dimensions and the time at which you seek to know their positions and velocities. Simulate all collisions with the walls and each other and predict the position at the required time. Be happy that you have predicted the future of these hypothetical particles after you complete the program, but bear in mind that Heisenberg’s uncertainty principle laughs at your simplifying assumptions.

You have learnt how a projectile travels and it can be easily demonstrated in a game. QBasic came with a game called gorilla. Two gorillas stood atop high rise buildings and threw banana skins at each other. It was a two player game and each player was expected to provide a velocity and angle to his gorilla for throwing the banana skin. There existed an element of wind and gravitation to make things more difficult. You may create some form of that game.

If you use Linux for writing C / C++ programs, sprint library may be used for graphics programs. I have only listed some projects that came to my mind of-hand. However, if you look around, or look in your own text books of other courses, you should be able to find dozens of interesting projects to do. If you belong to computer science, a project done for some other engineering field will be a humbling experience where you will understand that computers are really only aids for supporting the variety of branches of science and engineering. If you are from other fields, you will feel proud that you have made life easier for many of your friends by introducing a simple tool for everyone to use. Enjoy your project. Nothing prepares one for the real world as projects do. To see the projects that I undertook during my BE and MTech, please visit my webpage.

No comments: