
I'm writing a simple Java HTTP server that responds with JSON data. I'm trying to gzip the data before sending it, but the gzipped response usually produces an error in the browser. For example, Firefox says:

Content Encoding Error: The page you are trying to view cannot be shown because it uses an invalid or unsupported form of compression.

Sometimes it works if the string I'm compressing is small and contains no special characters, but it seems to break when there are brackets, etc. In particular, the example text below fails.

Is this some kind of character encoding issue? I've tried all sorts of things, but it just doesn't want to work easily.

String text;            
private Socket server;
DataInputStream in = new DataInputStream(server.getInputStream());
PrintStream out = new PrintStream(server.getOutputStream());

while ((text = in.readLine()) != null) {
    // ... process header info
    if (text.length() == 0) break;
}

out.println("HTTP/1.1 200 OK");
out.println("Content-Encoding: gzip");
out.println("Content-Type: text/html");
out.println("Connection: close");


// x is the text to compress
String x = "jsonp1330xxxxx462022184([[";
ByteArrayOutputStream outZip = new ByteArrayOutputStream();
GZIPOutputStream gzip = new GZIPOutputStream(outZip);

byte[] b = x.getBytes(); // Changing character encodings here makes no difference

gzip.write(b);
gzip.finish();
gzip.close();
outZip.close();
out.println();
out.print(outZip);
server.close();
Audrius Meškauskas
DFx
  • Just curious, which server are you using? Settings like these are easier to handle at the server level. E.g. for Tomcat, you enable `gzip` compression for content type `application/json` and you're done. Or are you actually WRITING a server yourself, as your first statement says? – Niks Feb 29 '12 at 07:38
  • You are at least missing a `CRLF` after the last response header line, before the content. – Bombe Feb 29 '12 at 07:50
  • Thanks for the comments guys - I am actually writing my own server as it's really just a simple task. I open a port and just listen for requests from Javascript JSONP requests. I hope there are no real security implications to that. Regarding the CRLF, I believe I have that near the bottom with: out.println(); – DFx Feb 29 '12 at 13:47
  • look at this answer http://stackoverflow.com/q/6717165/779408 – Bob Feb 23 '13 at 10:44

2 Answers


UPDATE: This is no longer the correct answer, see the answer by @amichair.

Counter-intuitively, I don't think GZIPOutputStream is suitable for streaming. Try this:

...
out.println("Content-Encoding: deflate");  // NOTICE deflate encoding
out.println("Content-Type: text/html");
out.println("Connection: close");
out.println();
String x = "jsonp1330xxxxx462022184([[";
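DeflaterOutputStream dos = new DeflaterOutputStream(out);  // an output stream is needed to write compressed data
dos.write(x.getBytes("utf-8"));   // JSON is UTF-8
dos.close();  // finishes the deflate stream and flushes it through 'out'
server.close(); // this is a bad idea, the client may not have read the data yet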
brettw
  • Thanks! I had to change DeflaterInputStream to DeflaterOutputStream, but this worked like a charm! What should I be waiting for before closing the server connection? It seems to work this way, but as you say, I don't want premature random drops to occur. – DFx Feb 29 '12 at 13:52
  • Actually, it's probably fine if you flush() the output stream ('out' in your case) before closing the socket. Please mark this answer correct if you are satisfied with it. – brettw Feb 29 '12 at 23:58

The accepted answer is incorrect.

GZIPOutputStream can indeed be used to implement gzip content encoding in HTTP. In fact, that's exactly how I implemented it in the JLHTTP lightweight HTTP server. Support for deflate content encoding is identical, just with DeflaterOutputStream used instead. The problem with the above code is simply that it's buggy :-)

  • All println statements (including the one at the bottom) should be replaced with print and an explicit \r\n at the end of the string. This is because the newline characters printed by println are platform-dependent, so e.g. on Linux it will only print a \n, whereas HTTP requires a full CRLF (\r\n).

  • out.print(outZip) basically calls outZip.toString() and prints that out to the stream. However, outZip contains compressed binary data, so converting it to a string (using the arbitrary platform default encoding, no less), is very likely to corrupt the data.

  • The code takes the string, converts it to bytes, compresses them, converts the result back to a string, converts that back to bytes and writes them out. Instead, it need only convert the string to bytes, compress them and write them out. You don't need the ByteArrayOutputStream for that either; the GZIPOutputStream can wrap the underlying output stream directly. Just don't forget to flush the print stream after the headers (and trailing CRLF), and only then start the compressed stream for the body.

  • Closing resources should be done in finally or try-with-resources blocks, and with the correct order and timing.

  • In this sample, the connection is closed at the end of the stream, which is fine. But in general, if you want to keep the connection alive and stream potentially large data with unknown length (you don't know the compressed size in advance), you need to implement the chunked transfer encoding as well (it's pretty simple).
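A minimal sketch of the first four points, assuming the same headers and sample text as the question. The class name `GzipResponseSketch` and the method `writeGzipResponse` are made up for illustration, and a `ByteArrayOutputStream` stands in for the socket's output stream so the result can be checked locally:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, not part of any library.
class GzipResponseSketch {

    // Writes the headers with explicit CRLFs, then the gzipped body as raw bytes.
    static void writeGzipResponse(OutputStream rawOut, String body) throws IOException {
        String headers = "HTTP/1.1 200 OK\r\n"
                + "Content-Encoding: gzip\r\n"
                + "Content-Type: text/html\r\n"
                + "Connection: close\r\n"
                + "\r\n"; // empty line terminates the header section
        rawOut.write(headers.getBytes(StandardCharsets.US_ASCII));
        rawOut.flush(); // headers go out before the compressed body starts

        // Wrap the underlying stream directly; the compressed bytes never pass through a String.
        // Closing the gzip stream writes the gzip trailer and also closes rawOut,
        // which matches "Connection: close" in this sample.
        try (GZIPOutputStream gzip = new GZIPOutputStream(rawOut)) {
            gzip.write(body.getBytes(StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stand-in for server.getOutputStream()
        writeGzipResponse(sink, "jsonp1330xxxxx462022184([[");

        // Split headers from body at the first empty line and decompress the body again.
        byte[] raw = sink.toByteArray();
        int bodyStart = new String(raw, StandardCharsets.ISO_8859_1).indexOf("\r\n\r\n") + 4;
        try (GZIPInputStream in = new GZIPInputStream(
                new ByteArrayInputStream(raw, bodyStart, raw.length - bodyStart))) {
            // readAllBytes requires Java 9 or later
            System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        }
    }
}
```

With a real socket, `rawOut` would be `server.getOutputStream()`, and the socket itself would be closed only after the gzip stream has been closed.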

With the code fixed, GZIPOutputStream works like a charm.
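For the keep-alive case mentioned in the last point, chunked transfer encoding could be sketched roughly as follows. `ChunkedOutputStream` is a hypothetical class written for this illustration, not a JDK class; it assumes the response headers already include `Transfer-Encoding: chunked` and ignores chunk extensions and trailers:

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical wrapper: frames every write as one HTTP/1.1 chunk.
class ChunkedOutputStream extends FilterOutputStream {
    private static final byte[] CRLF = { '\r', '\n' };
    private boolean finished;

    ChunkedOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        write(new byte[] { (byte) b }, 0, 1);
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        if (len == 0) return; // a zero-length chunk would mark the end of the body
        // chunk = size in hex, CRLF, data, CRLF
        out.write(Integer.toHexString(len).getBytes(StandardCharsets.US_ASCII));
        out.write(CRLF);
        out.write(buf, off, len);
        out.write(CRLF);
    }

    // Writes the terminating zero-length chunk exactly once.
    void finish() throws IOException {
        if (finished) return;
        finished = true;
        out.write("0\r\n\r\n".getBytes(StandardCharsets.US_ASCII));
        out.flush();
    }

    @Override
    public void close() throws IOException {
        finish(); // deliberately leaves the underlying stream open for keep-alive
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stand-in for the socket stream
        ChunkedOutputStream chunked = new ChunkedOutputStream(sink);
        chunked.write("hello".getBytes(StandardCharsets.US_ASCII));
        chunked.write(" world".getBytes(StandardCharsets.US_ASCII));
        chunked.close();
        // The sink now holds: 5 CRLF hello CRLF 6 CRLF " world" CRLF 0 CRLF CRLF
        System.out.println(sink.size() + " bytes written");
    }
}
```

To combine the two, a `GZIPOutputStream` would wrap the `ChunkedOutputStream`, so that the compressed bytes are what gets chunked. In practice a `BufferedOutputStream` between them would avoid emitting many tiny chunks.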

However, while great for educational purposes, please note that this is not an HTTP server, even if fixed. You could further read RFC 2616 or 7230 to learn what else HTTP is required to do... but why reinvent the wheel? There are a bunch of lightweight embeddable HTTP servers out there that you can use to get the job done properly with little effort, JLHTTP among them.

amichair
  • This is the correct answer. When I wrote my original answer, there were still browsers out there (IE6, I'm looking at you) that did not support gzip encoding (but did support deflate). – brettw Jul 03 '18 at 09:49