8

Possible Duplicate:
Reversing an MD5 Hash

Given this method in C#:

public string CalculateFileHash(string filePaths) {
    var csp = new MD5CryptoServiceProvider();
    var pathBytes = csp.ComputeHash(Encoding.UTF8.GetBytes(filePaths));
    return BitConverter.ToUInt64(pathBytes, 0).ToString();
}

How would one reverse this process with a "DecodeFileHash" method?

var fileQuery = "fileone.css,filetwo.css,file3.css";
var hashedQuery = CalculateFileHash(fileQuery); // e.g. "23948759234"
var decodedQuery = DecodeFileHash(hashedQuery); // "fileone.css,filetwo.css,file3.css"

where decodedQuery == fileQuery in the end.

Is this even possible? If it isn't possible, would there be any way to generate a hash that I could easily decode?

Edit: Just to be clear, I want to compress the variable "fileQuery" and later decompress it to recover what it originally was. Any suggestions for solving that problem, since reversing a hash is out?

Edit again: just doing a Base64 encode/decode sounds like the optimal solution, then.

public string EncodeTo64(string toEncode) {
    // UTF-8 (not ASCII) so that non-ASCII characters in file names survive the round trip
    var toEncodeAsBytes = Encoding.UTF8.GetBytes(toEncode);
    return System.Convert.ToBase64String(toEncodeAsBytes);
}

public string DecodeFrom64(string encodedData) {
    var encodedDataAsBytes = System.Convert.FromBase64String(encodedData);
    return Encoding.UTF8.GetString(encodedDataAsBytes);
}
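One caveat worth checking before settling on Base64: it is an encoding, not compression, so the result is longer than the input (4 output characters for every 3 input bytes). A minimal sketch of the round trip, assuming UTF-8 text; the class name `Base64Sketch` is made up for illustration:

```csharp
using System;
using System.Text;

public static class Base64Sketch
{
    // Encodes the UTF-8 bytes of the input as a Base64 string.
    public static string Encode(string toEncode)
    {
        return Convert.ToBase64String(Encoding.UTF8.GetBytes(toEncode));
    }

    // Reverses Encode: Base64 text back to the original string.
    public static string Decode(string encodedData)
    {
        return Encoding.UTF8.GetString(Convert.FromBase64String(encodedData));
    }
}
```

For the 33-character query "fileone.css,filetwo.css,file3.css", `Base64Sketch.Encode` returns a 44-character string, and `Base64Sketch.Decode` turns it back into the original. Note that standard Base64 output can contain '+', '/' and '=', which may matter if the result is used as a file name.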
tester
  • What and why do you want to reverse a hash in the first place? What are you trying to accomplish? – Thomas May 06 '11 at 22:29
  • I'm trying to build a chain of file names, compress it to use as the actual file name, and decode it later to figure out which file names are included in the concatenated file – tester May 06 '11 at 22:31
  • It sounds like you are trying to compress the files. I have posted an example of what that might look like. – Thomas May 06 '11 at 22:49

6 Answers

15

Impossible. By definition and design hashes cannot be reverted to plain text or their original input.


It sounds like you are actually trying to compress the files. If that is the case, here is a simple method to do so using GZip:

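// Requires: using System.IO; using System.IO.Compression;
public static byte[] Compress( byte[] data )
{
    using ( var output = new MemoryStream() )
    {
        // leaveOpen: true keeps the MemoryStream usable after the GZipStream is disposed
        using ( var gzip = new GZipStream( output, CompressionMode.Compress, true ) )
        {
            gzip.Write( data, 0, data.Length );
        } // disposing the GZipStream flushes the remaining compressed bytes
        return output.ToArray();
    }
}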
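To get the data back you also need the reverse direction. Here is a minimal sketch of a matching decompress method, bundled with its own copy of the compress method so it stands alone; the class name `GZipSketch` is made up for illustration. Keep in mind that gzip adds a fixed header and trailer, so for a short string the "compressed" bytes are likely to be longer than the input.

```csharp
using System.IO;
using System.IO.Compression;

public static class GZipSketch
{
    // Compresses a byte array into gzip format.
    public static byte[] Compress(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress, true))
            {
                gzip.Write(data, 0, data.Length);
            } // disposing the GZipStream flushes the final compressed bytes
            return output.ToArray();
        }
    }

    // Reverses Compress: reads gzip data and returns the original bytes.
    public static byte[] Decompress(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output); // Stream.CopyTo requires .NET 4 or later
            return output.ToArray();
        }
    }
}
```

For a string, convert with `Encoding.UTF8.GetBytes` before compressing and `Encoding.UTF8.GetString` after decompressing.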
Thomas
  • Yep, hash functions are one way. – House May 06 '11 at 22:28
  • how does something like this: http://www.md5decrypter.com/ work then? – tester May 06 '11 at 22:29
  • @tester: Most likely it is trying out a large number of strings to see if any of them produce the wanted hash code. – BrokenGlass May 06 '11 at 22:30
  • @tester - That is basically a cracker for MD5. In short, they have found a way to subvert the original purpose of the MD5 hash, which is the reason that MD5 is no longer recommended. The newer SHA-2 hashes (SHA-256 or higher) are recommended instead. – Thomas May 06 '11 at 22:31
  • 2
    Although you can be lucky and be able to reverse MD5 sums of short strings or similar (e.g. MD5(test) <=> 098f6bcd4621d373cade4e832627b4f6), you can never be sure whether the string you found was the original string or a collision. Also see http://www.mscs.dal.ca/~selinger/md5collision/, which has some interesting collision examples. – schnaader May 06 '11 at 22:31
  • 2
    @tester - Btw, that site does not account for salts. I.e., if your hash has a salt value added to the input prior to hashing, that site won't help you. (Which is also why you should use salts). – Thomas May 06 '11 at 22:33
  • Sorry, it is possible as there are flaws in algorithms used, and you can always generate an "equivalent" - not the exact input, but something that gives same hash result – TFD May 06 '11 at 22:34
  • @tester - If what you are trying to accomplish is to protect something such that you can revert it to its inputs, then you want to *encrypt* it rather than hash it. I.e., you should look into the AES (Rijndael) encryption algorithm. Hashes are designed to be a one-way trip. – Thomas May 06 '11 at 22:34
  • 10
    If you could reverse a hash, you would have the most powerful compression algorithm ever. No matter what you hash using MD5, the result is 128 bits. If you were able to reverse that, you would be able to get your 600MB video file back from the 128-bit hash! – Timothy Strimple May 06 '11 at 22:35
  • 1
    @tester: that site only reverses MD5s for strings that they've already created MD5s with md5encrypter.com. For proof of this, create an MD5 for some random string of text using another tool. Enter that MD5 into md5decrypter.com. It'll come up unknown. Then create the MD5 using md5encrypter.com, and enter that MD5 back into md5decrypter.com. It'll come up known. – CanSpice May 06 '11 at 22:42
  • @Thomas, just compressing the string of file names, not the actual files. Sorry if I didn't make that clear. Cool solution though – tester May 06 '11 at 22:50
  • 1
    @tester - You can use that solution for both. – Thomas May 06 '11 at 22:52
6

A hash is derived from the original information, but it does not contain the original information. If you want a shorter value that hides the original information, but can be resolved to the original value your options are fairly limited:

  • Compress the original information. If you need a string, then your original information would have to be fairly large in order for the compressed, base-64-encoded version to not be even bigger than the original data.
  • Encrypt the original information - this is more secure than just compressing it, and can be combined with compression, but it's also probably going to be larger than the original information.
  • Store the original information somewhere and return a lookup key.
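The third option can be sketched roughly as follows: keep the original string in a store and hand out the short hash as a lookup key. This is only an illustration under assumptions: `FileQueryStore` is a made-up name, the store is an in-memory dictionary (a real one would need to persist somewhere), and the key is derived the same way as in the question's CalculateFileHash.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

public class FileQueryStore
{
    // Maps the short key back to the original string.
    private readonly Dictionary<string, string> _byKey = new Dictionary<string, string>();

    // Stores the original value and returns a short key derived from its MD5 hash.
    public string Store(string fileQuery)
    {
        string key;
        using (var md5 = MD5.Create())
        {
            var hash = md5.ComputeHash(Encoding.UTF8.GetBytes(fileQuery));
            key = BitConverter.ToUInt64(hash, 0).ToString(); // first 64 bits only
        }

        string existing;
        if (_byKey.TryGetValue(key, out existing) && existing != fileQuery)
        {
            // Two different inputs produced the same truncated hash.
            throw new InvalidOperationException("Key collision for " + key);
        }

        _byKey[key] = fileQuery;
        return key;
    }

    // Looks up the original value; returns false if the key was never stored.
    public bool TryResolve(string key, out string fileQuery)
    {
        return _byKey.TryGetValue(key, out fileQuery);
    }
}
```

The key is not "decoded" here; the original is simply looked up, which is why this works for inputs of any length.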
Joel Mueller
  • 1
    *typically* derived from the original information? If not from the original information, then from where? Salted hash still derives from the original information... otherwise what's the point? – Matthew May 06 '11 at 22:34
  • 1
    Fine, fine, you win one nitpicking point. – Joel Mueller May 06 '11 at 22:35
4

What you want to do is Encrypt and Decrypt....

Not Hash and Unhash which, as @Thomas pointed out, is impossible. Hashes are typically defeated using rainbow tables or some other data set which includes something which produces the same hash... not guaranteed to be the input value, just some value which produces the same output in the hashing algorithm.

Jeff Atwood has some good code for understanding encryption here, if that's useful to you:
http://www.codeproject.com/KB/security/SimpleEncryption.aspx
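For a rough idea of what encrypt/decrypt looks like with the AES class built into .NET, here is a minimal sketch. It is not the code from the linked article, `AesSketch` is a made-up name, and it relies on the defaults of `Aes.Create()` (CBC mode, PKCS7 padding) without any integrity check, so treat it as a starting point rather than vetted security code.

```csharp
using System.Security.Cryptography;
using System.Text;

public static class AesSketch
{
    // Encrypts UTF-8 text with the given key and IV.
    public static byte[] Encrypt(string plainText, byte[] key, byte[] iv)
    {
        using (var aes = Aes.Create())
        using (var encryptor = aes.CreateEncryptor(key, iv))
        {
            var plainBytes = Encoding.UTF8.GetBytes(plainText);
            return encryptor.TransformFinalBlock(plainBytes, 0, plainBytes.Length);
        }
    }

    // Reverses Encrypt; needs the same key and IV.
    public static string Decrypt(byte[] cipherText, byte[] key, byte[] iv)
    {
        using (var aes = Aes.Create())
        using (var decryptor = aes.CreateDecryptor(key, iv))
        {
            var plainBytes = decryptor.TransformFinalBlock(cipherText, 0, cipherText.Length);
            return Encoding.UTF8.GetString(plainBytes);
        }
    }
}
```

A random key and IV can be taken from the `Key` and `IV` properties of a fresh `Aes.Create()` instance. Because the output is padded up to a multiple of the 16-byte block size, the ciphertext is never shorter than the input, so this hides the file names but does not shrink them.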

Matthew
4

A cryptographic hash is by definition not reversible with typical amounts of computation power. It's usually not even possible to find any input which has the same hash as your original input.

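Getting back the original input is mathematically impossible in general once there are more than 2^n different possible inputs, where n is the bit length of the hash (128 for MD5). By the pigeonhole principle, at least two of those inputs must share the same hash, so the hash alone cannot tell you which one you started with.

A hash is not a lossless compression function.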
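The pigeonhole argument is easy to see with a deliberately tiny hash. The sketch below is an illustration only: it uses a toy 8-bit "hash" (the first byte of the MD5 digest, so n = 8) and tries 257 distinct inputs, which guarantees that two of them collide. The names `PigeonholeDemo`, `TinyHash` and `FindCollision` are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

public static class PigeonholeDemo
{
    // Toy 8-bit "hash": the first byte of the MD5 digest.
    public static byte TinyHash(string input)
    {
        using (var md5 = MD5.Create())
        {
            return md5.ComputeHash(Encoding.UTF8.GetBytes(input))[0];
        }
    }

    // Tries the inputs "0" .. "256" and returns the first pair with the same TinyHash.
    public static Tuple<string, string> FindCollision()
    {
        var seen = new Dictionary<byte, string>();
        for (int i = 0; i <= 256; i++)
        {
            string candidate = i.ToString();
            byte h = TinyHash(candidate);
            string earlier;
            if (seen.TryGetValue(h, out earlier))
            {
                return Tuple.Create(earlier, candidate);
            }
            seen[h] = candidate;
        }
        // 257 distinct inputs cannot fit into 256 possible hash values.
        throw new InvalidOperationException("unreachable");
    }
}
```

Given only the shared hash byte, there is no way to say which of the two inputs was hashed. The same holds for full MD5, just with 2^128 possible values instead of 256.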

CodesInChaos
4

If you want to be able to get the data back, you want compression, not hashing.

Rob Agar
  • how might one write a method for compression that solves my problem? – tester May 06 '11 at 22:36
  • I'm not sure. General purpose compression algorithms wouldn't really work with short strings - the "compressed" version could be longer than the original, and is likely to include characters not allowed in file paths. How about another approach: why not just prepend the original file paths to the concatenated data? – Rob Agar May 06 '11 at 22:47
3

A cryptographic hash, like MD5, is designed to be a one-way function; that is, it is computationally infeasible to derive the source data from which a given hash was computed. MD5, though, hasn't been considered secure for some time, due to weaknesses that have been discovered:

  • Wikipedia on MD5 Security
  • MD5 Considered Harmful

Another weakness in MD5 is that, due to its relatively small size, large rainbow tables have been published that let you look up a given MD5 hash to get a source input that will collide with the specified hash value.
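The lookup idea can be sketched in a few lines. Note that this is a plain precomputed dictionary, not a real rainbow table (those use hash chains to trade lookup time for far less storage), and the names `Md5LookupSketch`, `Md5Hex` and `BuildTable` are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

public static class Md5LookupSketch
{
    // Returns the MD5 digest of the UTF-8 input as lowercase hex.
    public static string Md5Hex(string input)
    {
        using (var md5 = MD5.Create())
        {
            var digest = md5.ComputeHash(Encoding.UTF8.GetBytes(input));
            return BitConverter.ToString(digest).Replace("-", "").ToLowerInvariant();
        }
    }

    // Precomputes hash -> input for every candidate string.
    public static Dictionary<string, string> BuildTable(IEnumerable<string> candidates)
    {
        var table = new Dictionary<string, string>();
        foreach (var candidate in candidates)
        {
            table[Md5Hex(candidate)] = candidate;
        }
        return table;
    }
}
```

With a table built from "test", "password" and "fileone.css", looking up 098f6bcd4621d373cade4e832627b4f6 finds "test". A hash of anything that was never put into the table simply comes back as not found, which is all the "reversal" such a lookup can do.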

Nicholas Carey