I am reading in a CSV file of Vice Presidents with their names and ages. The issue I am having is splitting each line on a space and a period.

import java.io.*;
import java.util.Scanner;
import java.util.ArrayList;

public class VicePresidents {

    public static void main(String[] args) throws IOException {
        // TODO Auto-generated method stub
        String filename = "VicePresidentAges.csv";
        File file = new File(filename);
        Scanner infile = new Scanner(file);

        ArrayList<String> names = new ArrayList<String>();
        ArrayList<Integer> ages = new ArrayList<Integer>();
        
        while (infile.hasNext())
        {
            String line = infile.nextLine();
            String[] tokens = line.split("[" ".]");
            
            System.out.println(tokens[0]);
            //put the tokens into their correct ArrayList
        }
        
        infile.close();
    }
}

I am getting an error message on the split line. What is weird is that if I split on a comma, the output is correct except that the first name comes out with stray characters in front of it (John Adams). What confuses me is that the file doesn't appear to have any commas, which is why I am trying to split on a space and a period (for the middle initial). I don't understand how splitting on a comma can work when there is no comma in the file.

My book shows two delimiters written as ("[@.]"). But when I try line.split("[" ".]"); I get the error: Syntax error on token "".]"", delete this token

*This has been edited to include the error message as requested. Can this please be re-opened?

Ikefactor

1 Answer

I’m not sure this is exactly what you are after, but this will split the line wherever there’s either a space or a dot:

       String[] tokens = line.split("[ .]");

Edit thanks to Arvind Kumar Avinash: while usually a dot in a regular expression needs to be escaped with a backslash, this is not necessary within square brackets. For a simple demonstration:

    String line = "John Adams 53";
    String[] tokens = line.split("[ .]");
    System.out.println(Arrays.toString(tokens));

This outputs:

[John, Adams, 53]
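
To make the difference between a dot inside and outside square brackets concrete, here is a small sketch. The class name, the helper names and the sample line with a middle initial are made up for illustration; only String.split and Arrays.toString are standard library calls.

```java
import java.util.Arrays;

class SplitDotDemo {

    // Inside [...] the dot is a literal period, not the "any character" wildcard.
    static String[] splitOnSpaceOrDot(String line) {
        return line.split("[ .]");
    }

    // Adding + treats a run such as ". " as one delimiter, so no empty token appears.
    static String[] splitOnRuns(String line) {
        return line.split("[ .]+");
    }

    public static void main(String[] args) {
        String line = "John Q. Adams 53"; // made-up sample line

        // Unescaped dot outside brackets: every character is a delimiter,
        // so all tokens are empty and split() discards them.
        System.out.println(Arrays.toString(line.split(".")));

        // Escaped dot outside brackets: splits only at the literal period.
        System.out.println(Arrays.toString(line.split("\\.")));

        // Character class: note the empty token between "." and the following space.
        System.out.println(Arrays.toString(splitOnSpaceOrDot(line)));

        // Character class with +: John, Q, Adams, 53
        System.out.println(Arrays.toString(splitOnRuns(line)));
    }
}
```

If a line can contain an initial followed by a space, the "[ .]+" variant is probably the one you want, since it avoids the empty token.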

What went wrong? Java takes this as two strings: "[" ".]". (1) "[" is a string. (2) ".]" is one more string. Since Java syntax does not allow one string literal immediately after another, Eclipse suggested that you delete the second string (which obviously was not the way you wanted to solve the error).

user15358848 adds: The funny characters at the start of the file look very much like the BOM (byte order mark) that Windows applications like to write to indicate that a file is UTF-8 encoded. Try saving the CSV without using the UTF-8 CSV file type (or check the first three bytes, EF BB BF, and skip them).
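
If re-saving the file is not an option, the BOM can also be dropped in code. The following is only a sketch under assumptions: it assumes the file really is UTF-8 with a BOM, and stripBom and the class name are hypothetical helpers, not anything from the question. When Java decodes the bytes as UTF-8, the BOM arrives as the single character U+FEFF at the start of the first line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class BomSkippingReader {

    private static final char BOM = '\uFEFF';

    // Hypothetical helper: removes one leading byte order mark, if present.
    static String stripBom(String line) {
        if (!line.isEmpty() && line.charAt(0) == BOM) {
            return line.substring(1);
        }
        return line;
    }

    public static void main(String[] args) throws IOException {
        // File name taken from the question; decoding as UTF-8 is an assumption.
        try (BufferedReader reader = Files.newBufferedReader(
                Paths.get("VicePresidentAges.csv"), StandardCharsets.UTF_8)) {
            String line = reader.readLine();
            if (line != null) {
                line = stripBom(line); // only the first line can carry the BOM
            }
            while (line != null) {
                System.out.println(line);
                line = reader.readLine();
            }
        }
    }
}
```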

Design tip: Putting the names into one ArrayList and the ages into another is a poor design. Instead I suggest you create a VicePresident class, or just a Person class, with instance variables (fields) for name and age, and keep a single ArrayList of such objects. It will be much more manageable in your further processing. Link: Anti-pattern: parallel collections.
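
A minimal sketch of what that design could look like. The class layout, the getter names and the second sample entry are my own choices for illustration; only "John Adams, 53" comes from your file.

```java
import java.util.ArrayList;
import java.util.List;

class VicePresident {
    private final String name;
    private final int age;

    VicePresident(String name, int age) {
        this.name = name;
        this.age = age;
    }

    String getName() {
        return name;
    }

    int getAge() {
        return age;
    }

    @Override
    public String toString() {
        return name + " (" + age + ")";
    }

    public static void main(String[] args) {
        // One list of objects instead of two parallel lists.
        List<VicePresident> vicePresidents = new ArrayList<>();
        vicePresidents.add(new VicePresident("John Adams", 53));
        vicePresidents.add(new VicePresident("Jane Doe", 40)); // placeholder entry

        for (VicePresident vp : vicePresidents) {
            System.out.println(vp.getName() + " was " + vp.getAge());
        }
    }
}
```

With this layout a name and its age cannot drift apart, for example when the list is sorted or an entry is removed.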

Related question: Java string split with “.” (dot) [duplicate].

Ole V.V.
  • Thank you. Every example of using a period was just split("."). I didn't know you could do it that way. Does anyone have an idea why the first name would come out with stray characters in front of it (John Adams)? – Ikefactor Apr 10 '21 at 18:06
  • It's an Excel file that I downloaded on my Windows computer, and it opens just fine? – Ikefactor Apr 10 '21 at 18:09
  • You do not need to escape `.` with a \ inside square brackets, e.g. `System.out.println(Arrays.toString("He.l l.o".split("[ .]")));`. However, it is necessary if you are using `.` outside the brackets. – Arvind Kumar Avinash Apr 10 '21 at 18:13
  • @Ole, thank you very much for explaining that. The assignment calls for two ArrayLists, one for the names and one for the ages. As much as it might be poor design, that is my assignment. – Ikefactor Apr 10 '21 at 18:35
  • I erred in opening the file in Excel rather than a text editor. The file actually looks like John Adams, 53 etc. So I split on a comma and get the name when I print the array. Now I'm trying to figure out how to parse the age. – Ikefactor Apr 10 '21 at 18:37
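
On parsing the age from the last comment: one way it might be done, assuming every line has the shape "name, age" as in the example John Adams, 53. The class and helper names are hypothetical, and the second sample line is a placeholder. It keeps the two ArrayLists the assignment asks for.

```java
import java.util.ArrayList;

class AgeParsingDemo {

    // Text before the comma, with surrounding whitespace removed.
    static String parseName(String line) {
        return line.split(",")[0].trim();
    }

    // Text after the comma; trim() removes the space that follows the comma,
    // which Integer.parseInt would otherwise reject.
    static int parseAge(String line) {
        return Integer.parseInt(line.split(",")[1].trim());
    }

    public static void main(String[] args) {
        // Stand-in for lines read from the file; the second one is made up.
        String[] lines = {"John Adams, 53", "Jane Doe, 40"};

        ArrayList<String> names = new ArrayList<String>();
        ArrayList<Integer> ages = new ArrayList<Integer>();

        for (String line : lines) {
            names.add(parseName(line));
            ages.add(parseAge(line));
        }

        System.out.println(names);
        System.out.println(ages);
    }
}
```

Integer.parseInt throws a NumberFormatException if the text is not a plain integer, so a line with a missing or malformed age would need extra handling.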