CS14 - At-Home Programming Assignment 4
Project due Thursday, Feb 24 at 8:00 pm
In this project, you will write a program to hash 5 letter words. You will
decide on your own hash function. Use seperate chaining. You must use the
Standard Template Library list class for your chains. The program will
consist of two parts: passing the functionality tests and the success
of your hash function. Your grade will be based on the functionality
tests. We will then compare the success of your hash function will
the other students in the class. Extra credit will be given to a few
of the top hashing functions. For us to test the functionality,
you need to write at
least the following functions in your Hash class.
You must use the function prototypes
provided. If you change these function prototypes at all, your code will
not interface with the test file and your project will not be graded.
- void print ( ) - Print the hash table in the following format.
Each array location will be printed on a seperate line. The line will
start with the number of the array location followed by a : and followed
by one tab. You will then print each word stored at that hash array location
seperated by a , and a single space. After all words at that hash array
location are printed, you will output a return.
Here is a sample of what your output should look like
- void processFile ( string filename ) - Add all of the words in the file
specified as "filename" to the hash table. Here is
a sample input file. Use push_back to insert each of the words into
the list at the hash location that the words hashes to.
- bool search ( string word ) - Search for the string "word" in the
hash table. Return true if it is found and false if it is not found
- void remove ( string word ) - Remove the string "word" from the
hash table
- void output ( string filename ) - Output the hash table to the file
specified as "filename." Print instructions are the same as listed above.
- void printStats ( ) - Print the hash statistics. During program
execution, you will need to keep track of some statistics.
- Keep a count of the number of collisions when doing inserts. A collision
is when a word is being inserted into a non-empty list, which will add 1 to
the count (regardless of how many words are already in the list). All
other operations should not affect count in any way
- Keep track of the length of the longest list
- Keep a track of the running average list length (this means that you will
update this value after EACH SUCCESSFUL insert and remove operation and
not simply
find the average list length at the end of the program). So after EACH insert
or remove operation, determine the average list length of the current hash
table. Then average this with the previous average list length. Your average
list length must match mine exactly (you will see a sample below). So, to make
this a bit more clear, your average length should start at 0 and each time
a successful insert/remove is done update the average length. Here is
a formula you can use to update your average length:
avgLength = (newAvgListLen + avgLength) / 2.0
The Hash Function
You will need to write your own hash function for the project.
You will need to write a unique and original hash function so that you
can be eligible for extra credit (you are competing the creativity
of the other students). Your hash function must be contained within
hash_function.cc.
Hash_function.cc already contains a very
simple and very poor hash function. You will need to delete the contents
of the hash function and replace it with your own hash function. You MUST
place your hash function code in the hash function in
hash_function.cc and link this file to your project with your makefile.
This is extremely important because to test the functionality of your program,
we will be replacing your hash function with our own. Actually, we will
simply download a fresh copy of
hash_function.cc and compile and run your program. If we are unable
to do this, you will lose points on your project. Like the main.cc file,
DO NOT put any code necessary for correct operation of your program in
hash_function.cc besides your hash function.
At the top of this file, you will need to describe your hash function, why
it is good, and how you happened upon it.
Defining the hash table size
The size of your hash table should be specified as a defined value. If
you look at hash_function.cc you will
see that there is a value HASH_TABLE_SIZE that is not defined in this file.
For this assignment, you will be learning something new. Using a command
line parameter to g++, you can define the value of constants used within
your program during compilation. This is accomplished using the -D flag
(usage = "-D HASH_TABLE_SIZE=10"). This example sets the hash table size
to 10 for your program. This table size is extremely small so I suggest you
test your program with table sizes into the thousands. So for this project,
the flags to g++ in your makefile will be "-Wall -W -Werror -pedantic -g
-D HASH_TABLE_SIZE=X" where X will be replaced with whatever size you
want your table to be. Using this method will make it very simple for
us to test your programs for different situations.
Provided Files
- hash_function.cc - Put your hash function in
the function in this file and link this file in with your makefile. DO NOT
include your the file by having #include "hash_function.cc" in your
hash.cc file. This is very, very, very bad. You MUST link in the hash_function.cc
file with your makefile. (Remember
to be creative so that you can compete for the extra credit).
Like the main.cc file,
DO NOT put any code necessary for correct operation of your program in
hash_function.cc besides your hash function.
- hash.h - Your hash class with the declarations
of the functions needed to interface with main. The variables and function
prototypes provided are by no means inclusive and you will most likely
need to add to this class
- main.cc - We will be using this main file to
test your program. This means, the first thing that we will do when grading
is delete your main.cc and replace it with ours. DO NOT put any code necessary
for correct operation of your program in your main.cc file. The output of
your program must look EXACTLY like sample.out using
the provided hash function in hash_function.cc
and a hash table size of 10. This means that you need to
do a diff between our sample output and your actual output to determine if
your program will pass our testing (do a man on diff to read about it).
Be advised, that if your output does not look EXACTLY like our output, you
will lose points. This means that diff would output nothing when comparing
our sample output to your sample output.
In addition to main.cc, we may use additional test
cases or test files. If
I don't test something in main.cc do not simply assume
that you don't need to
handle it and that it won't be tested.
- sample.out - Sample output obtained by using
main.cc and
hash_function.cc
- test1.txt - The sample input file used for testing
main.cc.
- hash.out - The output obtained by printing the
hash table in main.cc to the output file. Your hash.out
must look EXACTLY like this one. Do a diff to make sure.
The hash competition
Extra credit will be available for the top few hash functions (don't ask
how much extra credit or how many students will get it.) To be
eligible for the hash competition, your hash statistics must match
mine EXACTLY during the functionality testing phase (see the stats
at the bottom of sample.out and make sure that
yours are EXACTLY the same. The reason for this is fairness. We have to make
sure that you tally your stats correctly before we can compare your stat
with the other students. We will test table sizes of 2004, 10007, and
15659 with
dict5.txt and random.txt.
Use these files and these table sizes to tune your hash function. Additionally,
we will be recording the time that your hash function takes to execute and
the time will also be considered when determining the winners. You want
to create a good hashing function but if it takes too long to execute,
the hashing function is not as useful.
Special note - you must have a unique hashing function to get extra credit. If
the top hashing functions are too similar, those students will share a small
amount of extra credit. This is to ensure that you do not talk with
the other students about your hashing function.
Programming
Be sure to use good programming where possible (use of classes, functions,
good parameter passing techniques, operator overloading, templates, error
checking). Part of your grade will be based on your actual coding.
What to turn in
You should have turned in what you have thus far on our program at least
6 hours before the due date (by 2pm). Then continue to work on your program
and turn in more current versions as you get them working. Ideally, you should
have your project done well before the due date.
You must turn in all .cc and .h files and your makefile. In addition, you
must follow any guidelines and include any information that is stated
on the
main page about at-home programming assignments.
DO NOT, I repeat, DO NOT turn in any files that are not required for the final
compilation of your assignment. Do not turn in your temporary .cc files
or your saved/olded revisions of your .cc files.
If you turn in files that aren't required for your
assignment, it hinders the grading processes. You will be docked points
if you turn in extraineous files.
A reminder about collaboration on home programming assignments
Please remember, at-home programming assignments are not lab assignments and
you may not team-code with your lab partner or any other individual. Limited
collaboration may be acceptable, but programs must represent
YOUR OWN original work. Sharing code or team-coding are strictly
not allowed. Copying code from ANY source (any book, current or
past students, past solutions, web, etc) is
strictly not allowed even with citation. Collaboration may
consist of discussing the general approach to solving the
problem, but should not involve communicating in code or even
pseudo-code. Students may help others find bugs. Your code MUST
be unique -- the odds of randomly producing similar code is
very low. Computing, like surgery or driving a car or playing
golf, can only be learned by doing it yourself!