
I'm trying to read an InputStream of String tokens with a Scanner. Every token ends with a comma (,). An empty string "" is also a valid token; in that case the whole token is just the comma that terminates it.

The InputStream is read slowly from another process, and each token should be handled as soon as it has been fully read. Reading the whole InputStream into a String is therefore out of the question.

An example input could look like this:

ab,,cde,fg,

If I set the delimiter of the Scanner to a comma, it seems to handle the job just fine.

InputStream input = slowlyArrivingStreamWithValues("ab,,cde,fg,");

Scanner scan = new Scanner(input);
scan.useDelimiter(Pattern.quote(","));
while (scan.hasNext()) {
    System.out.println(scan.next());
}

output:

ab

cde
fg

However, problems appear when the stream begins with an empty token. For some reason, Scanner just ignores the first token if it is empty.

/* begins with empty token */
InputStream input = slowlyArrivingStreamWithValues(",ab,,cde,fg,");
...

output:

ab

cde
fg
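Both cases can be reproduced without a second process. The following is a minimal sketch, assuming a ByteArrayInputStream stands in for the slowly arriving stream; the class name ScannerLeadingDelimiter and the helper tokens are hypothetical and not part of the original code.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

class ScannerLeadingDelimiter {
    // Hypothetical stand-in for the slow stream: all bytes are available at once.
    static List<String> tokens(String data) {
        InputStream input = new ByteArrayInputStream(data.getBytes(StandardCharsets.UTF_8));
        Scanner scan = new Scanner(input, "UTF-8");
        scan.useDelimiter(Pattern.quote(","));
        List<String> result = new ArrayList<>();
        while (scan.hasNext()) {
            result.add(scan.next());
        }
        return result;
    }

    public static void main(String[] args) {
        // Both inputs yield the same four tokens; the leading empty token is lost.
        System.out.println(tokens("ab,,cde,fg,"));  // [ab, , cde, fg]
        System.out.println(tokens(",ab,,cde,fg,")); // [ab, , cde, fg]
    }
}
```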

Why does Scanner ignore the first token? How can I include it?

Tuupertunut
  • As a side note, if you're trying to implement a CSV parser, STOP. There are endless parsers out there ([apache-csv](https://commons.apache.org/proper/commons-csv/), to name one). It's going to get very very complicated on some edge cases. – Mordechai Jan 02 '18 at 04:43
  • Possible duplicate of [Tokenising a String containing empty tokens](https://stackoverflow.com/questions/12395862/tokenising-a-string-containing-empty-tokens) – NineBerry Jan 02 '18 at 04:44

2 Answers


Try using a lookbehind as the pattern:

(?<=,)

Because the lookbehind matches the zero-width position right after each comma, every token the scanner returns keeps its trailing comma, which you then strip. Consider the following code:

String input = ",ab,,cde,fg,";
Scanner scan = new Scanner(input);
scan.useDelimiter("(?<=,)");
while (scan.hasNext()) {
    System.out.println(scan.next().replaceAll(",", ""));
}

This outputs the following:

(empty line)
ab

cde
fg
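Since the question reads from an InputStream rather than a String, the same delimiter can be wrapped in a small helper. This is only a sketch under assumptions: the class LookbehindTokens and the method readTokens are hypothetical names, the stream is assumed to be UTF-8, and a ByteArrayInputStream stands in for the real source. It removes only the final comma with substring instead of calling replaceAll.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class LookbehindTokens {
    // Hypothetical helper: collects comma-terminated tokens, keeping empty ones.
    static List<String> readTokens(InputStream in) {
        List<String> tokens = new ArrayList<>();
        Scanner scan = new Scanner(in, "UTF-8");
        // Zero-width delimiter: split after each comma, leaving it on the token.
        scan.useDelimiter("(?<=,)");
        while (scan.hasNext()) {
            String raw = scan.next();
            if (raw.endsWith(",")) {
                // Drop the terminating comma; an empty token becomes "".
                tokens.add(raw.substring(0, raw.length() - 1));
            }
            // A last piece without a comma is an unterminated token; this sketch skips it.
        }
        return tokens;
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream(
                ",ab,,cde,fg,".getBytes(StandardCharsets.UTF_8));
        System.out.println(readTokens(in)); // [, ab, , cde, fg]
    }
}
```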


Tim Biegeleisen

It's easier if you write it yourself, without using Scanner:

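import java.util.ArrayList;
import java.util.List;

// Splits source on commas, keeping empty tokens (including a leading one).
static List<String> getValues(String source) {
    List<String> list = new ArrayList<>();
    for (int i = 0; i < source.length(); i++) {
        StringBuilder s = new StringBuilder();
        // Collect characters up to the next comma or the end of the string.
        while (i < source.length() && source.charAt(i) != ',') {
            s.append(source.charAt(i++));
        }
        list.add(s.toString());
        // The for-loop increment then skips the comma that ended this token.
    }
    return list;
}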

For example, if source = ",a,,b,,c,d,e", the output will be "", "a", "", "b", "", "c", "d", "e".
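The method above needs the complete input as a String, which the question rules out. A streaming variant of the same idea might look like the sketch below. It assumes UTF-8 input; the names StreamingTokens and forEachToken are hypothetical, and a ByteArrayInputStream stands in for the slowly arriving stream. Each token is handed to the callback as soon as its terminating comma has been read.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class StreamingTokens {
    // Hypothetical helper: passes each comma-terminated token to the handler
    // as soon as its comma has been read, including empty tokens.
    static void forEachToken(InputStream in, Consumer<String> handler) throws IOException {
        Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
        StringBuilder token = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            if (c == ',') {
                handler.accept(token.toString());
                token.setLength(0); // start the next token
            } else {
                token.append((char) c);
            }
        }
        // Characters left in the buffer were never terminated by a comma;
        // this sketch drops them.
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(
                ",ab,,cde,fg,".getBytes(StandardCharsets.UTF_8));
        List<String> tokens = new ArrayList<>();
        forEachToken(in, tokens::add);
        System.out.println(tokens); // [, ab, , cde, fg]
    }
}
```

Passing a Consumer keeps the reading loop separate from whatever is done with each token, so the handler can be swapped without touching the parsing code.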