1

I have a project as an assignment and I haven't implemented it yet, but I have the idea for it figured out, but the last step is something that I can't figure out, how to even implement. This is what I want from my project:

I want to create a text based compression model using java. I will create a Huffman encoder and decoder and store these programs in both computers: PC1 and PC2.

Now in PC1, i will give a "data.txt" file as an input to my Huffman encoder program and it will create another file called "binary.txt" with 0's and 1's. Now instead of transmitting the "data.txt" file, I will email "binary.txt" file to PC2 and using the decoder java program there, I will try to recreate "data.txt". This way I would have compressed my file using Huffman mechanism.

So the problem is that I am new to java (which doesn't scare me as of now), and I don't know how to export the tree data structure from PC1 along with "binary.txt" file because for Huffman decoding (later on in PC2), I will need to have access to the Huffman tree which was created in PC1, so how do I go about solving or rather creating this project?

Are there any other better ways to implement this project? Hashmap or splay tree or any other data structure?

Thanks

theprogrammer
  • 1,698
  • 7
  • 28
  • 48
  • I do not know Huffman coding, but it sounds like you're interested in serializing and deserializing a tree structure into/from some transferable format like a text file which I guess you would send along with the binary.txt file. You should take a look at tree traversal algorithms and figure out a way to make use of those. This question might provide a solution as well: https://stackoverflow.com/questions/759707/efficient-way-of-storing-huffman-tree – Roman Apr 18 '16 at 02:19
  • Usually when I want to export a tree I do an "in-order walk" and print (or write to a stream) each node. https://en.wikipedia.org/wiki/Tree_traversal – markspace Apr 18 '16 at 03:01
  • 1
    Or on the other hand, say that I have created a class and created an object of that particular class and then set an attribute called "length" to 5 for that particular object. Is there anyway I can use this value in another java program? Also after posting this question, I came across another site where it said something about including header files in case you want to export references. Any idea about these two things? – theprogrammer Apr 18 '16 at 03:21
  • @rohitkrishna094 Perhaps you are thinking of object serialization? http://www.javapractices.com/topic/TopicAction.do?Id=57 – Roman Apr 18 '16 at 03:34
  • @Roman Yes I think that's exactly what I need. Thanks. – theprogrammer Apr 20 '16 at 03:59
  • @markspace I find your idea of doing an in-order walk very intuitive and simple. However is there a way I can recreate an entire tree using inorder traversal? – theprogrammer Apr 20 '16 at 04:00

2 Answers2

0

You don't need to export your Huffman tree into your binary.txt file. You need a Map, which maps all symbols from your text to (0,1)-strings - for coding, and you need a reversed Map - for decoding. You should use Huffman algorithm to build the tree, and then - use this tree to populate the Map. After that you can forget about the tree.

So, your 'binary.txt' file must contain only this reversed Map in some form, understandable by your decoder.

HEKTO
  • 3,876
  • 2
  • 24
  • 45
  • But the main aim of my project is to reduce the size of "data.txt" to the size of "binary.txt" (the 1's and 0's of Huffman encoding). So is there anyway I can just transmit this binary.txt file and make my decoder read it and decode it? and also if I used a map, I would have to transmit both "binary.txt" and the reversible map to the receiver end (PC2 in my case) right? So, wouldn't that take more space than the original "data.txt" file (in some cases)? – theprogrammer Apr 18 '16 at 04:40
  • If text is short then yes - the space, taken by the reversed map, will be significant percentage of the whole binary file. But in this case compressing doesn't make sense. For really long texts the reversed map will take small fraction of the binary file. – HEKTO Apr 18 '16 at 04:51
  • So if I have longer texts, then reversed map technique is easier to implement right? But if I have shorter files, is there a way to compress using Huffman technique, well in this case, compression doesn't make sense, but is there a way we can compress it without any loss of data). – theprogrammer Apr 18 '16 at 04:56
0

If you create a canonical Huffman representation, you can transmit only the code lengths. The decoder can recreate the tree from those.

flanglet
  • 564
  • 4
  • 11
  • I don't know about canonical Huffman Tree, but will check it for sure, but I have another doubt. If the decoder creates a tree from code lengths, wouldn't it be more inefficient instead of say a hash-map approach someone above suggested? Thanks – theprogrammer Apr 18 '16 at 22:43
  • Canonical Huffman is rather simple. Roughly speaking, it is using lexical (natural symbol) order as secondary key to create Huffman codes. You save some space by encoding only code lengths instead of a full tree. Now, if you want to be fast, do not use a tree at all: use arrays. It is a little bit more complex to write but way faster than a tree. Example here: https://github.com/flanglet/kanzi/blob/master/java/src/kanzi/entropy/HuffmanEncoder.java. – flanglet Apr 19 '16 at 00:28