Q: I'm not supposed to modify the .h files.
But I want to use a helper function.
Don't I need to modify the .h file to add a function prototype declaration for my helpers?
Can I still use helper functions even if I don't modify the .h file?
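A: Yes. Declare your helpers' prototypes near the top of your .cpp file instead of in the .h file. There is nothing special about a .h file: when you #include a file, the compiler literally just copy/pastes the contents of that file into the current file.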
We have already done this on hw1, hw2, and others.
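As a minimal sketch (the function names containsSpace and countChar are hypothetical, not part of the assignment), a helper's prototype can sit near the top of the .cpp file, so functions defined below it can call the helper even though the helper's full definition comes later in the file:

```cpp
#include <cassert>
#include <string>

// Prototype for a hypothetical helper, declared near the top of the .cpp
// file instead of in the .h file we are not allowed to modify.
int countChar(const std::string& text, char target);

// A hypothetical "required" function that calls the helper before the
// helper's full definition appears. This compiles because the prototype
// above has already told the compiler the helper's signature.
bool containsSpace(const std::string& text) {
    return countChar(text, ' ') > 0;
}

// The helper's full definition can appear anywhere later in the same file.
int countChar(const std::string& text, char target) {
    int count = 0;
    for (char ch : text) {
        if (ch == target) {
            count++;
        }
    }
    return count;
}
```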
PSEUDO_EOF is a global constant that is visible to your program.
It is just an int constant whose value happens to be 256, so you can put it in your map as a key with the value of 1. Something like this:
myMap.put(PSEUDO_EOF, 1);
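Here is a rough sketch of how a buildFrequencyTable-style function might add PSEUDO_EOF. It assumes std::map<int, int> in place of the course's Map, and a locally defined PSEUDO_EOF of 256 standing in for the support code's constant. Each byte is read with get() as an int, and PSEUDO_EOF is added once after the real data has been counted:

```cpp
#include <cassert>
#include <cstdio>
#include <istream>
#include <map>
#include <sstream>

// Illustrative stand-in for the support code's constant (assumed value 256).
const int PSEUDO_EOF = 256;

// Hypothetical sketch using std::map instead of the course's Map class.
// Each byte is counted as an int (0..255), and PSEUDO_EOF is added exactly once.
std::map<int, int> buildFrequencyTable(std::istream& input) {
    std::map<int, int> freqs;
    int byte;
    while ((byte = input.get()) != EOF) {
        freqs[byte]++;      // operator[] starts missing counts at 0
    }
    freqs[PSEUDO_EOF] = 1;  // one occurrence of the fake end-of-data marker
    return freqs;
}
```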
You also need to explicitly write out a single occurrence of PSEUDO_EOF's binary encoding when you compress a file, in Step 4 (the actual encoding of the data, represented by the encodeData function).
Write out all of the necessary bits to encode the file's data, and then after that, look up the binary encoding for PSEUDO_EOF and write out all of that encoding's bits to the file at the end.
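A sketch of that final step, under simplifying assumptions: the encoding map here is a hypothetical std::map<int, std::string> of '0'/'1' codes, and a std::vector<int> of bits stands in for the obitstream you would really write to. The key detail is the second loop, which appends PSEUDO_EOF's code exactly once after all of the data:

```cpp
#include <cassert>
#include <cstdio>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Illustrative stand-in for the support code's constant (assumed value 256).
const int PSEUDO_EOF = 256;

// Hypothetical encodeData sketch: codes are strings of '0'/'1', and the
// vector of bits plays the role of the output bit stream.
void encodeData(std::istream& input,
                const std::map<int, std::string>& encodingMap,
                std::vector<int>& outputBits) {
    int byte;
    while ((byte = input.get()) != EOF) {
        for (char bit : encodingMap.at(byte)) {
            outputBits.push_back(bit - '0');  // '0' -> 0, '1' -> 1
        }
    }
    // After all of the data, write PSEUDO_EOF's encoding exactly once.
    for (char bit : encodingMap.at(PSEUDO_EOF)) {
        outputBits.push_back(bit - '0');
    }
}
```

With codes a = 0, b = 10, and PSEUDO_EOF = 11, encoding the text "ab" produces the bits 0, 1, 0, 1, 1.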
Q: Why isn't PSEUDO_EOF just -1? File input functions like get() return -1 when you reach the end of the file, so are they returning a "real" EOF?
There is a difference between PSEUDO_EOF and the notion of a "real" EOF.
PSEUDO_EOF is 256, and it's a fake value that our program is using to signal the end of compressed data in a file.
A real EOF is not -1.
It is not a character or integer value at all; it is something decided internally by the operating system.
The real file system knows where the end of a file is because there is a master table of data about all the files on the disk, and that table stores every file's length in bytes.
The OS doesn't insert any special character at the end of each file; it just knows that you have hit the end of the file once you have read a number of bytes equal to that file's length.
The input stream's get function returns -1 when you're done because that is how the stream library's designers chose to tell you that the file has ended, not because an actual -1 is stored on the disk.
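You can see this distinction with a small standard C++ sketch (readAllBytes is a hypothetical name). A byte whose value is 0xFF, which would become -1 if squeezed into a signed char, still comes back from get() as the int 255; get() returns EOF (typically -1) only once the stream has no bytes left:

```cpp
#include <cassert>
#include <cstdio>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads every byte from a stream using get(). Data bytes come back as
// ints in the range 0..255; EOF is returned only after the stream runs
// out of data, so it is a return-value convention, not a stored byte.
std::vector<int> readAllBytes(std::istream& input) {
    std::vector<int> bytes;
    int value;
    while ((value = input.get()) != EOF) {
        bytes.push_back(value);
    }
    return bytes;
}
```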
Q: What is NOT_A_CHAR?
When will I see it?
What do I need to use it for?
NOT_A_CHAR, like PSEUDO_EOF, is a global constant that is visible to your program.
It is just an int, so you can use it in places where a character is expected.
The only place NOT_A_CHAR should be used in this assignment is when you create a HuffmanNode that has children, when you are combining nodes during Step 2 of the encoding process.
The parent node has two subtrees under it and it doesn't directly represent any one character, so you store NOT_A_CHAR as the character data field of the parent node.
That should be the only time you see NOT_A_CHAR and the only place you need to use it.
You'll never see that value in an input or output file or anything like that.
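To show where NOT_A_CHAR goes, here is a sketch of the node-combining step using std::priority_queue. The HuffmanNode struct, its field names, and the value 257 for NOT_A_CHAR are assumptions for illustration, and the real assignment may break ties between equal counts differently. Leaves keep their byte value; only the newly created parent nodes store NOT_A_CHAR:

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <vector>

// Illustrative constant; the support code's actual value may differ.
const int NOT_A_CHAR = 257;

// Hypothetical simplified node; field names are assumptions for this sketch.
struct HuffmanNode {
    int character;      // byte value, PSEUDO_EOF, or NOT_A_CHAR
    int count;          // total frequency of this subtree
    HuffmanNode* zero;  // child reached by a 0 bit
    HuffmanNode* one;   // child reached by a 1 bit
};

// Makes the priority queue hand back the node with the smallest count first.
struct CompareCounts {
    bool operator()(const HuffmanNode* a, const HuffmanNode* b) const {
        return a->count > b->count;
    }
};

// Repeatedly combine the two lowest-count nodes into a NOT_A_CHAR parent.
HuffmanNode* buildEncodingTree(const std::map<int, int>& freqs) {
    std::priority_queue<HuffmanNode*, std::vector<HuffmanNode*>, CompareCounts> pq;
    for (const auto& entry : freqs) {
        pq.push(new HuffmanNode{entry.first, entry.second, nullptr, nullptr});
    }
    while (pq.size() > 1) {
        HuffmanNode* left = pq.top();
        pq.pop();
        HuffmanNode* right = pq.top();
        pq.pop();
        pq.push(new HuffmanNode{NOT_A_CHAR, left->count + right->count, left, right});
    }
    return pq.empty() ? nullptr : pq.top();
}
```

For frequencies {'a': 3, 'b': 1, PSEUDO_EOF: 1}, the two count-1 leaves are combined first into a NOT_A_CHAR parent with count 2, which is then combined with 'a' into a NOT_A_CHAR root with count 5.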
HuffmanNode objects support the << operator, so you can print them out.
There is also a printSideways function provided that takes a HuffmanNode* and prints that entire tree sideways.
A: Here's a rundown of the different types of streams:
istream (for example, an ifstream) reads bytes from a file.
You'd use this to read a normal file byte-by-byte so that you can compress its contents.
ostream (for example, an ofstream) writes bytes to a file.
You'd use this to write to an uncompressed file byte-by-byte when you are decompressing.
ibitstream reads bits from a file.
You'd use this to read a compressed file bit-by-bit when you are decompressing it.
obitstream writes bits to a file.
You'd use this to write to a compressed file bit-by-bit when you are compressing.
Here's a diagram summarizing the streams:
compress:
+-----------------+  read bytes               write bits        +-----------------+
| normal file     |  istream           YOUR   obitstream        | compressed file |
| foo.txt         | ---------------->  CODE  ---------------->  | foo.huf         |
+-----------------+  'h', 'i', ...            010101010101      +-----------------+
===================================================================================
decompress:
+-----------------+  read bits                write bytes       +-----------------+
| compressed file |  ibitstream        YOUR   ostream           | normal file     |
| foo.huf         | ---------------->  CODE  ---------------->  | foo-out.txt     |
+-----------------+  010101010101             'h', 'i', ...     +-----------------+
You never need to create, open, or close a stream yourself; the client code does that and passes you a stream that is ready to use.
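To make the decompress half of the diagram concrete, here is a rough sketch of a decoding loop. It is not the assignment's required signature: a std::vector<int> of bits stands in for the ibitstream, and the simplified HuffmanNode struct and constant values (PSEUDO_EOF = 256, NOT_A_CHAR = 257) are assumptions. Each bit moves one step down the tree, each leaf produces one byte on the ostream, and reaching PSEUDO_EOF's leaf ends the loop, so any leftover bits at the end are ignored:

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <vector>

// Illustrative constants; the support code's actual values may differ.
const int PSEUDO_EOF = 256;
const int NOT_A_CHAR = 257;

// Hypothetical simplified node.
struct HuffmanNode {
    int character;
    HuffmanNode* zero;
    HuffmanNode* one;
};

// Reads bits, walks the tree, and writes one byte per leaf reached.
void decodeData(const std::vector<int>& bits, HuffmanNode* root, std::ostream& output) {
    HuffmanNode* node = root;
    for (int bit : bits) {
        node = (bit == 0) ? node->zero : node->one;
        if (node->zero == nullptr && node->one == nullptr) {  // reached a leaf
            if (node->character == PSEUDO_EOF) {
                return;  // end of the compressed data
            }
            output.put(static_cast<char>(node->character));
            node = root;  // start over at the root for the next character
        }
    }
}
```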
Use the << and >> operators to write your frequency map into the stream (or read it back out), and then after that, read or write the binary bits as appropriate. Something like this:
// compress
output << frequencyTable;       // write header
while (...) {
    output.writeBit(...);       // write compressed binary data
}

// decompress
Map<int, int> frequencyTable;
input >> frequencyTable;        // read header
while (...) {
    input.readBit(...);         // read compressed binary data
}
Only compress and decompress should read or write the header.
The other functions, such as encodeData and decodeData, should not worry about headers at all and should not have any code related to headers.
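As a sketch of keeping the header separate, here are hypothetical writeHeader and readHeader helpers that round-trip a std::map<int, int> through ordinary streams. They stand in for the course Map's << and >> operators, whose actual text format differs; the point is only that the header goes out (or comes in) first, before any compressed bits:

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// Writes the entry count, then "key value" pairs, then a newline
// (a made-up format for illustration only).
void writeHeader(std::ostream& output, const std::map<int, int>& freqs) {
    output << freqs.size();
    for (const auto& entry : freqs) {
        output << ' ' << entry.first << ' ' << entry.second;
    }
    output << '\n';  // separates the header from the data that follows
}

// Reads back exactly what writeHeader wrote.
std::map<int, int> readHeader(std::istream& input) {
    std::map<int, int> freqs;
    std::size_t size = 0;
    input >> size;
    for (std::size_t i = 0; i < size; i++) {
        int key = 0;
        int value = 0;
        input >> key >> value;
        freqs[key] = value;
    }
    input.get();  // consume the '\n' so the next read starts at the data
    return freqs;
}
```

Under this sketch, compress would call writeHeader before writing any bits and decompress would call readHeader before reading any, so encodeData and decodeData never see the header.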
Q: My other functions (buildFrequencyTable, encodeData, etc.) work fine, but my compress function always produces an empty file or a very small file. Why?
A: Your compress function reads over the input stream data twice: once to count the characters for the frequency table, and a second time to actually compress it using your encoding map.
Between those two actions, you must rewind the input stream by writing code such as:
input.clear();                  // removes any current eof/failure flags
input.seekg(0, ios::beg);       // tells the stream to seek back to the beginning
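The following self-contained sketch (countBytes and rewindStream are hypothetical names) shows why the rewind matters: after the first full read, the stream is at end-of-file with its fail flag set, so a second read returns nothing until clear() and seekg() are called:

```cpp
#include <cassert>
#include <cstdio>
#include <istream>
#include <sstream>

// Counts the bytes remaining in a stream by reading until get() reports EOF.
// Hitting the end sets the stream's eof and fail flags.
int countBytes(std::istream& input) {
    int count = 0;
    while (input.get() != EOF) {
        count++;
    }
    return count;
}

// Rewinds a stream so it can be read again from the beginning.
void rewindStream(std::istream& input) {
    input.clear();                  // remove the eof/fail flags first
    input.seekg(0, std::ios::beg);  // then seek back to the start
}
```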
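Q: Can I store the characters as type char rather than as type int?
A: Use int. Type char works fine for ASCII characters but not for byte values above 127, which commonly occur in binary files: char may be signed, so those bytes can turn into negative numbers.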
Q: What char value can I use to represent nothing, or the lack of a character?
A: The closest thing to an "empty" char value is '\0', sometimes called the "null character". (It is not the same as NULL, the null pointer.)
But Huffman nodes that have children should store NOT_A_CHAR, a constant declared by our support code.
Q: What is the purpose of the freeTree function?
Do I ever need to call it myself?
A: Your buildEncodingTree function should not free the tree, because it is supposed to return that tree to the client, and presumably that client will later free it.
But if you call buildEncodingTree somewhere in your code because you want to use an encoding tree to help you, then when you are done using it, you should immediately call freeTree on it.
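A freeTree implementation is typically a post-order traversal. Here is a sketch using a simplified HuffmanNode struct (an assumption, with a demo-only liveCount counter added so the frees can be observed):

```cpp
#include <cassert>

// Hypothetical simplified node with a demo-only counter of live nodes.
struct HuffmanNode {
    int character;
    HuffmanNode* zero;
    HuffmanNode* one;
    static int liveCount;  // demo only: how many nodes currently exist
    HuffmanNode(int ch, HuffmanNode* z, HuffmanNode* o)
        : character(ch), zero(z), one(o) { liveCount++; }
    ~HuffmanNode() { liveCount--; }
};
int HuffmanNode::liveCount = 0;

// Post-order: free both subtrees before the node itself, so no child
// pointer is read after its parent has been deleted.
void freeTree(HuffmanNode* node) {
    if (node == nullptr) {
        return;
    }
    freeTree(node->zero);
    freeTree(node->one);
    delete node;
}
```

For example, a helper of yours that calls buildEncodingTree just to get at the codes would finish with a call to freeTree on that tree once it no longer needs it.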