
I am reading a .txt file into a StringBuilder and writing the content into a Word document using OutputStreamWriter.

The problem is that the formatting is not retained in the document: the spaces and line breaks do not match the text file. The .txt file is formatted properly with spaces, page breaks, and tabs, and I want to replicate it in the Word document. How can I retain the same formatting? The link to the file is: http://s000.tinyupload.com/index.php?file_id=09876662859146558533.

This is the sample code:

private static String readTextFile() {
    BufferedReader br = null;
    String content = null;
    try {
        br = new BufferedReader(new FileReader("ORDER_INVOICE.TXT"));
        StringBuilder sb = new StringBuilder();
        String line = br.readLine();

        while (line != null) {
            sb.append(line);
            line = br.readLine(); 
            sb.append(System.lineSeparator());
        }
        content = sb.toString();
    } catch (FileNotFoundException e) {
        e.printStackTrace();

    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        try {
            // br is still null if the FileReader constructor threw
            if (br != null) {
                br.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    return content;
}

private static void createDocument(String docName, String content) {
    FileOutputStream fout = null;
    try {
        fout = new FileOutputStream(docName);
        OutputStreamWriter out = new OutputStreamWriter(fout);
        out.write(content);
        out.close();
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }

}
Megan
  • How is an OutputStreamWriter writing to a "Word" document in particular? Are you using a library that creates Microsoft Word documents, or are you simply giving the file a .DOC extension? – slipperyseal Mar 29 '16 at 04:56
    For a start, you are reading all the lines into the StringBuilder but not re-adding the line breaks. readLine() returns the string without the carriage returns or line feeds, so try sb.append(line); followed by sb.append('\n'); – slipperyseal Mar 29 '16 at 05:01
  • From the Javadoc of public String readLine() throws IOException: "Reads a line of text. A line is considered to be terminated by any one of a line feed ('\n'), a carriage return ('\r'), or a carriage return followed immediately by a linefeed. Returns: A String containing the contents of the line, not including any line-termination characters, or null if the end of the stream has been reached" – Megan Mar 29 '16 at 05:10
  • The lines still come out differently. A line wraps onto further lines, so content that is a single line in the text file goes onto the next line in the document. – Megan Mar 29 '16 at 05:23
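The Javadoc quoted in the comments can be checked with a small sketch that needs nothing beyond the JDK (the class and method names are illustrative). It feeds readLine() three different terminators plus a form feed (0x0c) and collects what comes back:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class ReadLineDemo {
    // Returns the lines exactly as BufferedReader.readLine() hands them back
    static List<String> readLines(String text) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line); // LF, CR or CR-LF has already been stripped here
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // LF, CR-LF and a bare CR all end a line; the form feed does not
        System.out.println(readLines("a\nb\r\nc\rd\fe")); // four lines: a, b, c, d<FF>e
    }
}
```

A form feed is not a line terminator, so it survives readLine() and is copied into whatever the text is written to afterwards.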

2 Answers


Try changing your readTextFile() like this:

    private static String readTextFile() {
        BufferedReader br = null;
        String content = null;
        try {
            br = new BufferedReader(new FileReader("ORDER_INVOICE.TXT"));
            StringBuilder sb = new StringBuilder();
            String line = br.readLine();
            while (line != null) {
                // readLine() strips the line terminator, so add one back
                sb.append(line).append("\n");
                line = br.readLine();
            }
            content = sb.toString();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try {
                // br is still null if the file could not be opened
                if (br != null) {
                    br.close();
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        return content;
    }

If you're using Java 7 or later, you can use try-with-resources to reduce the number of lines in your code.
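A minimal sketch of the try-with-resources version, assuming Java 7+. The fileName parameter and the class name are additions for illustration (the original hard-codes ORDER_INVOICE.TXT), and errors are propagated to the caller instead of being printed:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class ReadTextFile {
    // The reader is closed automatically, even if readLine() throws
    static String readTextFile(String fileName) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(new FileReader(fileName))) {
            String line;
            while ((line = br.readLine()) != null) {
                // re-add the line break that readLine() stripped
                sb.append(line).append(System.lineSeparator());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.print(readTextFile("ORDER_INVOICE.TXT"));
    }
}
```

There is no null check or nested try/finally left to get wrong, because the compiler generates the close() call.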

PHCJS
  • Actually, I am already doing sb.append(System.lineSeparator()) after reading each line. The problem is that the lines in the Word document go onto the next line. I want the lines to have the same formatting as in the text file. – Megan Mar 29 '16 at 06:32
  • Your code does not show what you're describing (sb.append(System.lineSeparator())). – PHCJS Mar 29 '16 at 06:37

Avoid printing bare \n characters; Windows uses \r\n. Remember that line separators differ across platforms.

A more reliable way is to use PrintWriter; see How to write new line in Java FileOutputStream.
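A sketch showing what PrintWriter.println() actually writes, using a StringWriter so the output can be inspected in memory (class and method names are hypothetical):

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class PrintWriterDemo {
    // Writes each line with println(), which appends the platform line separator
    static String writeLines(String... lines) {
        StringWriter buffer = new StringWriter();
        try (PrintWriter out = new PrintWriter(buffer)) {
            for (String line : lines) {
                out.println(line);
            }
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        String text = writeLines("first", "second");
        // "first\r\nsecond\r\n" on Windows, "first\nsecond\n" on Linux/macOS
        System.out.println(text.equals("first" + System.lineSeparator() + "second" + System.lineSeparator()));
    }
}
```

Because println() uses System.lineSeparator(), the same code produces CR-LF on Windows and LF on Unix-like systems without hard-coding either.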

After the discussion in the comments:

  • the source file has Unix line breaks
  • the output file is expected to have Windows line breaks
  • we should strip the 0x0c character (form feed, i.e. "move to the next page" on a printer) from the source file, as it is non-printable.
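To confirm these properties of a file before converting it, a sketch like the following counts form feeds and the different line endings byte by byte. It is an illustration rather than part of the original answer; the class name is hypothetical and ORDER_INVOICE.TXT is the file name from the question:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class LineEndingReport {
    // Returns {form feeds, CR-LF pairs, bare LFs, bare CRs}
    static int[] count(byte[] data) {
        int formFeeds = 0, crlf = 0, lf = 0, cr = 0;
        for (int i = 0; i < data.length; i++) {
            byte b = data[i];
            if (b == 0x0c) {
                formFeeds++;
            } else if (b == '\r') {
                if (i + 1 < data.length && data[i + 1] == '\n') {
                    crlf++;
                    i++; // skip the LF that belongs to this pair
                } else {
                    cr++;
                }
            } else if (b == '\n') {
                lf++;
            }
        }
        return new int[] {formFeeds, crlf, lf, cr};
    }

    public static void main(String[] args) throws IOException {
        int[] c = count(Files.readAllBytes(Paths.get("ORDER_INVOICE.TXT")));
        System.out.printf("form feeds=%d, CR-LF=%d, bare LF=%d, bare CR=%d%n", c[0], c[1], c[2], c[3]);
    }
}
```

A file like the one in the question would show a non-zero form-feed count, zero CR-LF pairs, and one bare LF per line.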

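    import java.io.FileWriter;
    import java.io.IOException;
    import java.io.PrintWriter;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class ConvertLineEndings {
        public static void main(String[] args) throws IOException {
            // read the whole file into a String and drop the form feed characters (0x0c)
            String content = new String(Files.readAllBytes(Paths.get("f:\\order_invoice.txt")))
                .replace("\u000c", "");

            // println() ends every line with the platform separator (CR-LF on Windows);
            // try-with-resources closes the writer even if an exception is thrown
            try (PrintWriter printWriter = new PrintWriter(new FileWriter("f:\\new_order_invoice.txt"))) {
                for (String line : content.split("\\n")) {
                    printWriter.println(line);
                }
            }
        }
    }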
    

So:

  • read the file as it is into a String
  • get rid of the form feed (0x0c, Unicode U+000C)
  • split the string at Unix line breaks (\n)
  • write it out line by line using PrintWriter, whose println() uses the platform default line ending, i.e. Windows CR-LF.

Remember that you can actually do this in one line: use a regular expression to replace the Unix line endings with Windows line endings in the string holding the whole file, and then write the whole file out with Files.write. However, the solution presented above is probably a bit better, as it always uses the platform-native line separator.
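A sketch of that whole-file variant. It reuses the answer's f:\ paths; the class name and the ISO-8859-1 charset (chosen on the assumption that the report is plain English text) are illustrative. Using System.lineSeparator() as the replacement keeps it platform-native as well:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;

public class ConvertInOneGo {
    // Drops form feeds, then turns every LF or CR-LF into the platform line separator
    static String normalize(String content) {
        return content.replace("\f", "")
                .replaceAll("\\r?\\n", Matcher.quoteReplacement(System.lineSeparator()));
    }

    public static void main(String[] args) throws IOException {
        // ISO-8859-1 maps each byte to exactly one char (assumption: English-only text)
        String content = new String(Files.readAllBytes(Paths.get("f:\\order_invoice.txt")),
                StandardCharsets.ISO_8859_1);
        Files.write(Paths.get("f:\\new_order_invoice.txt"),
                normalize(content).getBytes(StandardCharsets.ISO_8859_1));
    }
}
```

One difference from the split("\\n") loop: String.split drops trailing empty strings, so that version loses blank lines at the end of the file, while the replacement keeps them.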


Gee Bee
  • I am not using \n chars. – Megan Mar 30 '16 at 00:03
  • Sorry, my bad, wrong scroll :) Use PrintWriter, and put together a simple self-contained example, e.g. http://tutorials.jenkov.com/java-io/printwriter.html. Use its println methods to write lines with platform-native line breaks. It works :) – Gee Bee Mar 30 '16 at 00:07
  • I think the problem is this: if I compare the spaces before the content, the text file has 62 of them in TextPad, and the Word document also shows 62 spaces, but the content goes to the next line. – Megan Mar 30 '16 at 00:17
  • Well, if you share your text file with us, we can surely give better help. Otherwise we're all just shooting in the dark :) – Gee Bee Mar 30 '16 at 00:24
  • The link to the file is: http://s000.tinyupload.com/index.php?file_id=09876662859146558533 – Megan Mar 30 '16 at 03:30
  • OK, this is a tricky file. It starts with a 0x0c character, which is non-printable. The rest of the file uses Unix line breaks (0x0a). The next questions are: on which platform will you read this file (Linux or Windows), and which target format do you prefer (Linux or Windows line breaks)? Another question is whether you prefer UTF-8 output or plain ASCII. UTF-8 files may start with a byte order mark, which some old-school software might misinterpret. – Gee Bee Mar 30 '16 at 16:27
  • I am using the Windows platform, and deployment will be on Windows as well. I don't have any specific formatting requirements, as it is a simple report with only English characters and numbers. – Megan Mar 30 '16 at 22:52
  • Oh great! Then we know it all! You have a text file with Unix line separators, plus some extra 0x0c characters. You need to split it into lines at the 0x0a Unix separators, remove the 0x0c mess, and finally write it out with platform-specific newlines using PrintWriter. This way you get the file with Windows line separators! Let me know if you need a concrete code example! – Gee Bee Mar 31 '16 at 11:31
  • Could you please share an example? – Megan Mar 31 '16 at 22:57
  • I've added the example to my answer. Does it work well? :) – Gee Bee Apr 02 '16 at 00:21
  • Can you elaborate on *what* did not work, please? :) I have added a screenshot of the outputs, and for me it seems to work well. Did I misunderstand some of your requirements? – Gee Bee Apr 04 '16 at 12:31
  • I need to write the document, not the text file. Copying from txt to txt works well with the code I had shared. But I am writing a Word document, where the lines go onto the next lines and do not stay the same way as in the txt file. – Megan Apr 05 '16 at 00:23
  • Come on, you're definitely **not** writing a Word document. What you're doing is writing a plain text file with a .doc extension, which Word auto-converts when it opens it. I tried to open it in Word, and the lines really are totally off. That is not because of the file but because of the font size: at Word's default size the long lines do not fit the page width and wrap. Change the font size to something like 8 pt, and everything is in place. To write a real *Word* document, use e.g. http://www.docx4java.org/trac/docx4j :) – Gee Bee Apr 05 '16 at 19:29
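docx4j (or a similar library) is the way to produce a real .docx. If adding a library is not an option, one alternative that does not come from this thread is to write RTF, which Word opens natively and which, unlike a plain text file, can carry a font choice. The sketch below is a minimal illustration under stated assumptions: fixed-pitch Courier New at 8 pt (following the font-size observation in the comment above), 0.5-inch left and right margins, tabs mapped to \tab and form feeds to \page, and English-only input read as ISO-8859-1. The class name and the output file ORDER_INVOICE.rtf are hypothetical:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class TextToRtf {
    // Builds a minimal RTF document that keeps the column layout of plain text
    static String toRtf(String text) {
        StringBuilder rtf = new StringBuilder();
        // \fmodern = fixed-pitch font family; \margl/\margr are in twips (720 = 0.5 inch);
        // \fs16 = 8 pt, because RTF font sizes are given in half-points
        rtf.append("{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0\\fmodern Courier New;}}")
           .append("\\margl720\\margr720\\f0\\fs16\n");
        for (char c : text.toCharArray()) {
            switch (c) {
                case '\n': rtf.append("\\par\n"); break;  // new paragraph; raw CR/LF in RTF source is ignored
                case '\r': break;                        // drop the CR of CR-LF input
                case '\t': rtf.append("\\tab "); break;  // the space only terminates the control word
                case '\f': rtf.append("\\page "); break; // form feed becomes a hard page break
                case '\\': case '{': case '}':
                    rtf.append('\\').append(c); break;   // RTF special characters must be escaped
                default:
                    if (c > 0x7f) {
                        // non-ASCII: RTF Unicode escape (signed 16-bit value) with '?' as fallback
                        rtf.append("\\u").append((int) (short) c).append('?');
                    } else {
                        rtf.append(c);                   // ordinary text, runs of spaces are kept as-is
                    }
            }
        }
        return rtf.append('}').toString();
    }

    public static void main(String[] args) throws IOException {
        String text = new String(Files.readAllBytes(Paths.get("ORDER_INVOICE.TXT")), StandardCharsets.ISO_8859_1);
        Files.write(Paths.get("ORDER_INVOICE.rtf"), toRtf(text).getBytes(StandardCharsets.US_ASCII));
    }
}
```

Because the invoice starts with a form feed, this produces a blank first page; strip a leading '\f' first if that is unwanted. Whether every line fits on one row still depends on the longest line in the file compared with the usable page width.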