Many compression algorithms take advantage of the fact that there's redundancy/patterns in data. aaaaaaaaaabbbbbbbbbbbcccccccccccc could be compressed to 10'a'11'b'12'c', for example.
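To make that concrete, here's a throwaway run-length-encoding sketch in Python (the count'char' format is just my made-up notation from above, not any standard):

```python
# A minimal run-length-encoding sketch: collapse each run of identical
# characters into count'char' tokens, e.g. "aaab" -> "3'a'1'b'".
from itertools import groupby

def rle_encode(text: str) -> str:
    return "".join(f"{len(list(group))}'{char}'" for char, group in groupby(text))

print(rle_encode("aaaaaaaaaabbbbbbbbbbbcccccccccccc"))  # prints 10'a'11'b'12'c'
```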
But now that there's no more redundancy in my compressed data, I can't really compress it further. However, I can encrypt or encode it and turn it into a different string of bytes: xyzxyzxyzxyzxyz.
If those random-looking bytes just so happened to have a pattern in them, it seems like it'd be easy to take advantage of that: 5'xyz'.
Here's what our flow looks like:
Original: aaaaaaaaaabbbbbbbbbbbcccccccccccc
Compressed: 10'a'11'b'12'c'
Encrypted: xyzxyzxyzxyzxyz
Compressed again: 5'xyz'
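Here's a runnable version of that flow in Python, with zlib standing in for the compressor and Base64 standing in for the "encrypt or encode" step (these are just concrete stand-ins I picked so the example runs; they won't reproduce the toy strings above):

```python
# Compress -> encode -> compress again, printing the size at each stage.
import base64
import zlib

original = b"a" * 10 + b"b" * 11 + b"c" * 12

compressed   = zlib.compress(original)        # squeeze out the runs
encoded      = base64.b64encode(compressed)   # "turn it into a different string of bytes"
recompressed = zlib.compress(encoded)         # try to compress the result again

for label, blob in [("original", original), ("compressed", compressed),
                    ("encoded", encoded), ("compressed again", recompressed)]:
    print(f"{label:>16}: {len(blob)} bytes")
```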
But the more data you have and the larger your file, the more effective many forms of compression will be. Huffman encoding in particular seems like it'd work really well on random-looking bytes, especially when the file gets pretty large!
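To show what I mean by Huffman encoding, here's a toy sketch I put together (my own throwaway code, not a real library): it builds a Huffman code from byte frequencies and counts how many bits the encoded data would take.

```python
# Build a Huffman code by repeatedly merging the two least-frequent subtrees.
import heapq
from collections import Counter

def huffman_code(data: bytes) -> dict[int, str]:
    # Each heap entry is (frequency, tie_breaker, {byte: code_so_far}).
    heap = [(freq, i, {sym: ""}) for i, (sym, freq) in enumerate(Counter(data).items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, i2, c2 = heapq.heappop(heap)
        # Prefix codes from each subtree with 0 and 1, then merge them.
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, i2, merged))
    return heap[0][2]

data = bytes([10] * 10 + [11] * 11 + [12] * 12)   # stand-in for the example above
code = huffman_code(data)
bits = sum(len(code[b]) for b in data)
print(code, f"{bits} bits vs {len(data) * 8} bits uncompressed")
```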
I imagine this would be atrocious when you need data fast, but I think it could have merit for storing archives or other stuff like that. Maybe downloading a movie over a network would only take 1MB of bandwidth instead of 4MB. Then you could unpack the movie as the download happened, getting the full 4MB file on your hard drive without destroying your network's bandwidth.
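The "unpack the movie as the download happened" part is basically streaming decompression. A rough sketch of what I'm picturing, using Python's zlib.decompressobj (the chunks are simulated in memory here rather than coming off a real socket):

```python
# Feed compressed chunks to a streaming decompressor as they "arrive"
# instead of waiting for the whole file to download first.
import zlib

payload = zlib.compress(b"movie data " * 100_000)                    # pretend this lives on a server
chunks = [payload[i:i + 4096] for i in range(0, len(payload), 4096)]  # simulated network chunks

decompressor = zlib.decompressobj()
with open("movie.raw", "wb") as out:
    for chunk in chunks:                 # in real life the chunks would come off a socket
        out.write(decompressor.decompress(chunk))
    out.write(decompressor.flush())      # flush any remaining buffered bytes
```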
So I have a few questions:
Do people ever encode data so it can be compressed better?
Do people ever "double-compress" their data?
Are there any well-known examples of "double" compression, where data is compressed, encrypted or encoded, then compressed again?