Yet another update:
This update addresses the following requirement, which the OP posted as a comment below the answer:
I would like each word as a separate element in the List. With your
solution, all the elements are in the same List entry. For example, I
would like 10000=[sub-Saharan, Africa, and, Australia.]
To get rid of the nested lists, do not split the string of words. Note that each list then holds the rest of the line as a single string, not one element per word.
Demo:
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class Main {
    public static void main(String[] args) {
        List<String> customList = Arrays.asList(
                "5000 Buruli ulcer is an infectious disease",
                "6000 characterized by the development",
                "7000 of painless open wounds.",
                "8000 The disease largely occurs in",
                "10000 sub-Saharan Africa and Australia."
        );

        // Split each line into the leading number (a[0]) and the rest of the text (a[1]).
        TreeMap<String, List<String>> collect = customList.stream().map(s -> s.split(" ", 2))
                .collect(Collectors.groupingBy(a -> a[0],
                        // Compare the String keys by their integer value so that 10000 sorts last.
                        () -> new TreeMap<String, List<String>>(Comparator.comparingInt(Integer::parseInt)),
                        Collectors.mapping(a -> a[1], Collectors.toList())));

        System.out.println(collect);
    }
}
Output:
{5000=[Buruli ulcer is an infectious disease], 6000=[characterized by the development], 7000=[of painless open wounds.], 8000=[The disease largely occurs in], 10000=[sub-Saharan Africa and Australia.]}
Or the one based on my original answer (TreeMap::new orders the keys as strings, so 10000 comes first):
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class Main {
    public static void main(String[] args) {
        List<String> customList = Arrays.asList(
                "5000 Buruli ulcer is an infectious disease",
                "6000 characterized by the development",
                "7000 of painless open wounds.",
                "8000 The disease largely occurs in",
                "10000 sub-Saharan Africa and Australia."
        );

        // Split on the first run of whitespace only: a[0] is the number, a[1] is the rest of the line.
        Map<String, List<String>> collect = customList.stream().map(s -> s.split("\\s+", 2))
                .collect(Collectors.groupingBy(a -> a[0], TreeMap::new,
                        Collectors.mapping(a -> a[1], Collectors.toList())));

        System.out.println(collect);
    }
}
Output:
{10000=[sub-Saharan Africa and Australia.], 5000=[Buruli ulcer is an infectious disease], 6000=[characterized by the development], 7000=[of painless open wounds.], 8000=[The disease largely occurs in]}
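In the output above, 10000 comes first because the keys are Strings and TreeMap::new sorts them lexicographically. If you do not need String keys, one possible variation (a sketch, not something the OP asked for) is to parse the key into an Integer, so that the natural ordering is already numeric and no comparator is needed. The class name NumericKeys and the method name groupByNumber are made up for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class NumericKeys {
    // Hypothetical helper: groups the text of each line under its leading number,
    // parsed as an Integer so that TreeMap's natural ordering is numeric.
    static Map<Integer, List<String>> groupByNumber(List<String> lines) {
        return lines.stream()
                .map(s -> s.split("\\s+", 2))
                .collect(Collectors.groupingBy(
                        a -> Integer.valueOf(a[0]),
                        TreeMap::new,
                        Collectors.mapping(a -> a[1], Collectors.toList())));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "8000 The disease largely occurs in",
                "10000 sub-Saharan Africa and Australia.");
        // Prints {8000=[The disease largely occurs in], 10000=[sub-Saharan Africa and Australia.]}
        System.out.println(groupByNumber(lines));
    }
}
```

This assumes every line starts with a valid integer; Integer.valueOf throws NumberFormatException otherwise.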
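The solution suggested by Aniket (Arrays.toString turns the words into one bracketed string, so each list still holds a single element):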
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class Main {
    public static void main(String[] args) {
        List<String> customList = Arrays.asList(
                "5000 Buruli ulcer is an infectious disease",
                "6000 characterized by the development",
                "7000 of painless open wounds.",
                "8000 The disease largely occurs in",
                "10000 sub-Saharan Africa and Australia."
        );

        TreeMap<String, List<String>> collect = customList.stream().map(s -> s.split(" ", 2))
                .collect(Collectors.groupingBy(a -> a[0],
                        // Compare the String keys by their integer value.
                        () -> new TreeMap<String, List<String>>(Comparator.comparingInt(Integer::parseInt)),
                        // Arrays.toString yields a single String such as "[Buruli, ulcer, ...]".
                        Collectors.mapping(a -> Arrays.toString(a[1].split(" ")), Collectors.toList())));

        System.out.println(collect);
    }
}
Output:
{5000=[[Buruli, ulcer, is, an, infectious, disease]], 6000=[[characterized, by, the, development]], 7000=[[of, painless, open, wounds.]], 8000=[[The, disease, largely, occurs, in]], 10000=[[sub-Saharan, Africa, and, Australia.]]}
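If each word really has to be its own element of the List, as the comment asks, one option on Java 9 or later is Collectors.flatMapping, which flattens the words of every line into the list for that key. The following is only a sketch under that assumption; the class name WordsPerKey and the method name groupWords are invented for the example.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;
import java.util.stream.Collectors;

class WordsPerKey {
    // Hypothetical helper: maps each leading number to the individual words that follow it.
    // Requires Java 9+ because of Collectors.flatMapping.
    static TreeMap<String, List<String>> groupWords(List<String> lines) {
        return lines.stream()
                .map(s -> s.split("\\s+", 2))
                .collect(Collectors.groupingBy(
                        a -> a[0],
                        // Keep the String keys but order them by integer value.
                        () -> new TreeMap<String, List<String>>(Comparator.comparingInt(Integer::parseInt)),
                        // Stream the words of the line so that each one becomes a list element.
                        Collectors.flatMapping(a -> Arrays.stream(a[1].split("\\s+")), Collectors.toList())));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "8000 The disease largely occurs in",
                "10000 sub-Saharan Africa and Australia.");
        // Prints {8000=[The, disease, largely, occurs, in], 10000=[sub-Saharan, Africa, and, Australia.]}
        System.out.println(groupWords(lines));
    }
}
```

If the same number appears on several lines, the words of those lines end up in one list, in encounter order.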
Original answer:
You were almost there. You can do it as follows:
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class Main {
    public static void main(String[] args) {
        List<String> customList = Arrays.asList(
                "5000 Buruli ulcer is an infectious disease",
                "6000 characterized by the development",
                "7000 of painless open wounds.",
                "8000 The disease largely occurs in",
                "10000 sub-Saharan Africa and Australia."
        );

        // Each key maps to a list of word lists, one inner list per input line with that key.
        Map<String, List<List<String>>> collect = customList.stream().map(s -> s.split("\\s+", 2))
                .collect(Collectors.groupingBy(a -> a[0], TreeMap::new,
                        Collectors.mapping(a -> Arrays.asList(a[1].split("\\s+")), Collectors.toList())));

        System.out.println(collect);
    }
}
Output:
{10000=[[sub-Saharan, Africa, and, Australia.]], 5000=[[Buruli, ulcer, is, an, infectious, disease]], 6000=[[characterized, by, the, development]], 7000=[[of, painless, open, wounds.]], 8000=[[The, disease, largely, occurs, in]]}
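The double brackets in this output appear because toList() wraps the word list of each line in another list. If you are on Java 8 and can assume that every number occurs on only one line (an assumption on my part, it holds for the sample data), a possible alternative is Collectors.toMap, which stores the word list directly as the value. The names UniqueKeys and wordsByKey below are made up for this sketch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class UniqueKeys {
    // Hypothetical helper: maps each leading number to the list of words on its line.
    // Assumes keys are unique; a repeated key makes the merge function throw.
    static Map<String, List<String>> wordsByKey(List<String> lines) {
        return lines.stream()
                .map(s -> s.split("\\s+", 2))
                .collect(Collectors.toMap(
                        a -> a[0],
                        a -> Arrays.asList(a[1].split("\\s+")),
                        (first, second) -> {
                            throw new IllegalStateException("Duplicate key");
                        },
                        TreeMap::new));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "8000 The disease largely occurs in",
                "10000 sub-Saharan Africa and Australia.");
        // Keys are Strings, so they are sorted lexicographically:
        // prints {10000=[sub-Saharan, Africa, and, Australia.], 8000=[The, disease, largely, occurs, in]}
        System.out.println(wordsByKey(lines));
    }
}
```

To sort numerically instead, the TreeMap::new supplier can be replaced by one that passes Comparator.comparingInt(Integer::parseInt) to the TreeMap constructor.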