I would like to blank out the first line of a text file in Java. The file is several gigabytes and I do not want to make a copy. Using the suggestion from this post, I am attempting to do so with RandomAccessFile; however, it is overwriting too much.

Here is my code:

RandomAccessFile raInputFile = new RandomAccessFile(inputFile, "rw");
origHeaderRow = raInputFile.readLine();
raInputFile.seek(0);
raInputFile.writeChars(Strings.repeat(" ",origHeaderRow.length()));
raInputFile.close();

And if you want some sample input and output, here is what happens:

Before:

first_name,last_name,age
Doug,Funny,10
Skeeter,Valentine,9
Patti,Mayonnaise,11
Doug,AlsoFunny,10

After:

                        alentine,9
Patti,Mayonnaise,11
Doug,AlsoFunny,10

In this example, most editors show the file correctly beginning with 24 blank spaces, but 48 characters (including newlines) have been replaced. After pasting the output here, I see strange question-mark characters. The doubled replacement size makes me think something involving encoding is getting messed up, but I tried writeUTF with the same results.

Aaron Silverman
  • Just so you know, it's impossible to edit a file "in place" with modern filesystems. A new copy is always made. – toto2 Aug 19 '11 at 20:22
  • What encoding is the file in? 1521? UTF8? UCS2? – Dilum Ranatunga Aug 19 '11 at 20:22
  • @Dough, looks like Jon Skeet is there too as "Skeeter" :) – Kiril Aug 19 '11 at 20:23
  • @toto2: It's not impossible in this case. Overwriting individual bytes is very simple. Deleting or inserting a byte is the thing that requires copying. – Roland Illig Aug 19 '11 at 20:25
  • @Roland I can't find a reference, but I read that even if you overwrite bytes, a copy is made anyway on most modern filesystems. I was very surprised, so I remembered. I can't remember where it was... maybe it was not reliable. – toto2 Aug 19 '11 at 20:33
  • @Roland I remember I read about this in the context of trying to wipe a hard drive: overwriting the content of files leaves the original files on the disk anyway (with no reference to the file placement, but a hacker could possibly reconstruct some files). – toto2 Aug 19 '11 at 20:41
  • @Roland However, if you just overwrite bytes in one block, maybe only one new block is created. It would be easy to test how long it takes to add one byte to a file versus changing one byte... but I don't want to do it. – toto2 Aug 19 '11 at 20:50
  • @toto2: Yes, safely deleting data is difficult if you want to do it right, but I don't think that's the point here. I think the original poster just wants to remove the header from the CSV file so that other programs will (hopefully) ignore the first empty line. – Roland Illig Aug 19 '11 at 21:04
  • @Roland I think he does not want to waste time copying the file. I'm not sure if one method is faster than another. – toto2 Aug 19 '11 at 21:05

2 Answers

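A char in Java is 2 bytes, so writeChars writes two bytes for every character in the string.

Use writeBytes instead, which writes one byte per character: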

raInputFile.writeBytes(Strings.repeat(" ",origHeaderRow.length()));

Judging from the Javadoc, it looks like exactly what you are looking for.
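For completeness, here is a minimal sketch of the whole operation with that fix applied. The class and method names (HeaderBlanker, blankFirstLine) are made up for illustration, and the sketch assumes the file uses an ASCII-compatible single-byte-per-space encoding such as ASCII, Latin-1, or UTF-8. It relies on the fact that RandomAccessFile.readLine maps each byte to one char, so the returned string's length equals the number of bytes in the line, excluding the line terminator. It fills a byte array with spaces rather than using Guava's Strings.repeat, purely to avoid the dependency.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class HeaderBlanker {

    /**
     * Overwrites the first line of the file with spaces, in place.
     * The line terminator and everything after it are left untouched.
     * Returns the original header, or null if the file is empty.
     */
    static String blankFirstLine(Path file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            // readLine converts each byte to one char, so length() is a byte count.
            String header = raf.readLine();
            if (header == null) {
                return null;
            }
            byte[] blanks = new byte[header.length()];
            Arrays.fill(blanks, (byte) ' ');
            raf.seek(0);
            raf.write(blanks); // exactly one byte per original byte
            return header;
        }
    }

    public static void main(String[] args) throws IOException {
        // Small temporary file standing in for the multi-gigabyte input.
        Path tmp = Files.createTempFile("people", ".csv");
        Files.write(tmp, "first_name,last_name,age\nDoug,Funny,10\n"
                .getBytes(StandardCharsets.US_ASCII));
        blankFirstLine(tmp);
        System.out.print(new String(Files.readAllBytes(tmp), StandardCharsets.US_ASCII));
        Files.delete(tmp);
    }
}
```

Because only the bytes of the first line are rewritten, the cost should not depend on the size of the rest of the file.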

Alexander Pogrebnyak

As you are writing chars (which in Java are 16-bit), each character uses two bytes. I suggest you write exactly the number of bytes you want; otherwise each of your spaces will turn into a NUL byte followed by a space byte.
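The difference is easy to see by capturing the bytes each method produces. The sketch below uses DataOutputStream over an in-memory buffer as a stand-in for RandomAccessFile, on the assumption that both follow the same DataOutput contract for these methods; the class name WriteWidthDemo is made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

class WriteWidthDemo {

    // Two bytes per char, high byte first.
    static byte[] viaWriteChars(String s) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        new DataOutputStream(buf).writeChars(s);
        return buf.toByteArray();
    }

    // One byte per char (the high eight bits of each char are discarded).
    static byte[] viaWriteBytes(String s) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        new DataOutputStream(buf).writeBytes(s);
        return buf.toByteArray();
    }

    // Two-byte length prefix, then the string in modified UTF-8.
    static byte[] viaWriteUTF(String s) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        new DataOutputStream(buf).writeUTF(s);
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String twoSpaces = "  ";
        System.out.println(Arrays.toString(viaWriteChars(twoSpaces))); // [0, 32, 0, 32]
        System.out.println(Arrays.toString(viaWriteBytes(twoSpaces))); // [32, 32]
        System.out.println(Arrays.toString(viaWriteUTF(twoSpaces)));   // [0, 2, 32, 32]
    }
}
```

With writeChars, a 24-character header becomes 48 bytes, which matches the 48 characters overwritten in the question, and the interleaved zero bytes would explain the odd characters seen when pasting the output.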

Peter Lawrey