
I have written a program that reads a file word by word and counts how many times each word appears, using a Scanner. The problem appears when I run the code on different machines. On my Windows 10 machine, the code reads the entire file, but on my MacBook the Scanner stops reading partway through the file. The part of the code that loops through the file is below.

Scanner s = new Scanner(theFile);
List<String> words = new LinkedList<>();

while (s.hasNextLine())
{
    String word = s.next().replaceAll("\\p{Punct}", "");
    words.add(word.toLowerCase());
}

As I said, on Windows the entire file is read, but on Mac only a very small part is read. I am also using an SVN repository, and I have made sure that both the code and the file being read are identical on the two machines.

tjulich
  • Can you share the file? – LppEdd Feb 22 '19 at 18:30
  • What is the proper way to attach a file? – tjulich Feb 22 '19 at 18:33
  • If the file is too large to paste in its entirety, then post a small subsection of the file that reproduces the problem. Use the code formatting to render it as a single block (i.e. indented with 4 spaces). – Dunes Feb 22 '19 at 18:35
  • It is a text file containing all of "The Adventures of Sherlock Holmes". I have uploaded it to DropBox. [link](https://www.dropbox.com/s/eip4afera0fpuei/ASH.txt?dl=0) – tjulich Feb 22 '19 at 18:40
  • @tjulich thanks. Where does it stop? Could you maybe point at a specific line? – LppEdd Feb 22 '19 at 18:43
  • On my Mac, the Scanner reads 8359 tokens, while the Windows machine reads 106836 tokens, which matches the word count an online word counter reported for that file. @LppEdd I can try changing the code a bit to see which line it stops on. – tjulich Feb 22 '19 at 18:43
  • Changing the code to call scanner.nextLine() inside the loop, while keeping track of the number of iterations through the loop, says that it is calling nextLine 1135 times. – tjulich Feb 22 '19 at 18:46
  • @tjulich line should terminate with "and so made sure that I was" then – LppEdd Feb 22 '19 at 18:51
  • @tjulich I can't see any significant character which could stop the Scanner – LppEdd Feb 22 '19 at 18:53

1 Answer


I ended up switching from a Scanner to a BufferedReader. Instead of reading each word, I read a whole line at a time and then split the line into individual words. I am still not sure why the Scanner did not behave the same way on both platforms, but this method gave me the results I needed.
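
A minimal sketch of the line-by-line approach described above, not the exact code from this answer. The class name `WordCounter`, the method `countWords`, and the choice of UTF-8 are assumptions made for illustration; passing the charset explicitly is meant to keep decoding the same on every platform.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class WordCounter {

    // Hypothetical helper: reads whole lines, splits each line on whitespace,
    // strips ASCII punctuation, lowercases, and counts the remaining words.
    static Map<String, Integer> countWords(BufferedReader reader) {
        Map<String, Integer> counts = new HashMap<>();
        reader.lines().forEach(line -> {
            for (String token : line.split("\\s+")) {
                String word = token.replaceAll("\\p{Punct}", "").toLowerCase();
                // Blank lines and punctuation-only tokens end up empty; skip them.
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        });
        return counts;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java WordCounter <file>");
            return;
        }
        // Assumption: the file is UTF-8. Change the charset if it is not.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(args[0]), StandardCharsets.UTF_8))) {
            Map<String, Integer> counts = countWords(reader);
            System.out.println(counts.size() + " distinct words");
        }
    }
}
```

Because `countWords` takes a `BufferedReader`, it can be exercised with a `StringReader` as well as with a file.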

tjulich
  • maybe it's not scanner, maybe it is the file?! (https://stackoverflow.com/q/6373888/592355) – xerx593 Feb 22 '19 at 23:33
  • Possibly, but as I said above, I'm using an SVN repository so all of my code and all of the text files being read are identical across the platforms I'm working on. The file behaves fine on my Windows machine, but not on my MacOS machine. – tjulich Feb 24 '19 at 02:48
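
One way to test the suggestion in the comments: the Scanner documentation says that when the underlying source throws an IOException, the scanner treats it as the end of input and keeps the exception available through `ioException()`. The sketch below checks for that after the loop. It assumes, without confirmation from this thread, that the two machines use different default charsets; the names `ScannerCheck` and `countTokens` and the `"UTF-8"` argument are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.Scanner;

public class ScannerCheck {

    // Hypothetical helper: counts whitespace-separated tokens using an explicit
    // charset, then fails loudly if the Scanner stopped because of an I/O error
    // instead of a genuine end of input.
    static int countTokens(ReadableByteChannel source, String charsetName) {
        try (Scanner s = new Scanner(source, charsetName)) {
            int tokens = 0;
            while (s.hasNext()) {   // pair hasNext() with next()
                s.next();
                tokens++;
            }
            IOException swallowed = s.ioException();
            if (swallowed != null) {
                throw new UncheckedIOException(swallowed);
            }
            return tokens;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ScannerCheck <file>");
            return;
        }
        try (FileChannel channel = FileChannel.open(Paths.get(args[0]), StandardOpenOption.READ)) {
            // Assumption: try "UTF-8" first, then "windows-1252", and compare the results.
            System.out.println(countTokens(channel, "UTF-8") + " tokens");
        }
    }
}
```

If the token count changes with the charset name, or an exception is reported, that would point at decoding rather than at the Scanner or the file contents.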