4

I have a list like this:

List<String> customList = Arrays.asList(
   "5000  Buruli ulcer is an infectious disease",
   "6000  characterized by the development",
   "7000  of painless open wounds.",
   "8000  The disease largely occurs in",
   "10000  sub-Saharan Africa and Australia."
);

I want to convert that List into a TreeMap<String, List<String>> like this:

"5000", ["Buruli", "ulcer", "is", "an", "infectious", "disease"]
"6000", ["characterized", "by", "the", "development"]
// etc

My code so far:

TreeMap<String, List<String[]>> collect = customList.stream()
      .map(s -> s.split("  ", 2))
      .collect(Collectors
         .groupingBy(a -> a[0], TreeMap::new, Collectors.mapping(a -> a[1].split(" "), Collectors.toList())));

I have two problems.

  1. First, TreeMap::new is probably not working, because the order is not the same as in the original List.
  2. Second, I can't find a way to turn that List<String[]> into a List<String>.

Any ideas?

Thanos M

2 Answers

7

A TreeMap sorts its keys (String keys are compared lexicographically), so it cannot keep insertion order. Use a LinkedHashMap to preserve the original order. Your code should look like this:

Map<String, List<String>> collect = customList.stream()
    .map(s -> s.split(" +"))
    .collect(Collectors.toMap(
        a -> a[0],
        a -> Arrays.asList(a).subList(1, a.length),
        (a, b) -> a,
        LinkedHashMap::new));

If your keys are not unique, you can use the grouping collector with something like this (Collectors.flatMapping requires Java 9+):

collect = customList.stream()
    .map(s -> Arrays.asList(s.split(" +")))
    .collect(Collectors.groupingBy(l -> l.get(0), 
        LinkedHashMap::new, 
        Collectors.flatMapping(l -> l.stream().skip(1), Collectors.toList())));
ernest_k
  • This yields the expected output. – MC Emperor Mar 29 '21 at 10:11
  • Alternatively, if the original list could contain clashes, that `(a, b) -> a` could be a list merging function (don't forget that `Arrays.asList()` returns a non-extendable list). – Thomas Mar 29 '21 at 10:14
  • 2
    @Thomas Yes, I'm assuming the keys are unique. Otherwise, `groupingBy`, as Arvind suggests, is the way to do it (or of course a proper merge function in `toMap`). – ernest_k Mar 29 '21 at 10:17
  • Note that with a merge function (the 3rd parameter `Collectors.toMap()` takes) you'd do the same. `Collectors.groupingBy()` actually moves merging downstream, i.e. the `toList()` collector would handle that. – Thomas Mar 29 '21 at 10:21
  • I had almost the same in mind as this code. However, I would have preserved the `map(s -> s.split(" ", 2))`, and then as `valueMapper` function `a -> Arrays.asList(a[1].split(" "))` instead of `a -> Arrays.asList(a).subList(1, a.length)`. Now I am slightly curious about whether it would matter or not. – MC Emperor Mar 29 '21 at 10:22
  • @MCEmperor I wanted to avoid calling `String.split()` more than once :) – ernest_k Mar 29 '21 at 10:23
  • 1
    @ernest_k That seems like an understandable reason, and it also implies that you think it *does* matter. ;-) I can imagine that, for large input strings, you indeed *don't* want to split twice. – MC Emperor Mar 29 '21 at 10:25
  • Is it possible to show the answer using `groupingBy`? I am curious about how I can use `Arrays.asList(a).subList(1, a.length)` with the `groupingBy` function. – Thanos M Mar 29 '21 at 12:31
  • @ThanosM It gets a bit hard to read. Here's one solution: `Map<String, List<String>> collect = customList.stream() .map(s -> Arrays.asList(s.split(" +"))) .collect(Collectors.groupingBy(a -> a.get(0), LinkedHashMap::new, Collectors .collectingAndThen(Collectors.mapping(l -> l.subList(1, l.size()), Collectors.toList()), l -> l.stream() .flatMap(List::stream) .collect(Collectors.toList()))));`. It may be simplified a lot if you're using Java 11. – ernest_k Mar 29 '21 at 12:42
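
To make the merge-function idea from the comments above concrete, here is a minimal sketch of `Collectors.toMap` with a list-merging function instead of `(a, b) -> a`. The class and method names (`MergeOnDuplicateKey`, `toOrderedMap`) are made up for illustration and the sample input is hypothetical. Each value is copied into an `ArrayList` because the list returned by `Arrays.asList()` has a fixed size and cannot be extended.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class MergeOnDuplicateKey {

    // Hypothetical helper: the first token is the key, the remaining tokens are the words.
    static Map<String, List<String>> toOrderedMap(List<String> lines) {
        return lines.stream()
                .map(s -> s.split(" +"))
                .collect(Collectors.toMap(
                        a -> a[0],
                        // Copy into an ArrayList so the merge function below can grow the list.
                        a -> new ArrayList<>(Arrays.asList(a).subList(1, a.length)),
                        // On a duplicate key, append the later words to the earlier list.
                        (first, second) -> { first.addAll(second); return first; },
                        // LinkedHashMap keeps keys in first-encounter order.
                        LinkedHashMap::new));
    }

    public static void main(String[] args) {
        // Made-up input with a repeated key.
        List<String> lines = Arrays.asList("5000  Buruli ulcer", "6000  characterized", "5000  is an");
        System.out.println(toOrderedMap(lines));
    }
}
```

With unique keys the merge function is never called, so this behaves like the `toMap` version in the answer, except that the values are independent `ArrayList` copies.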
5

Yet another update:

This update is to fulfil the following requirement mentioned by the OP as a comment below the answer:

I would like each word as a separate element in the List. With your solution, all the elements are in the same List entry. For example, I would like 10000=[sub-Saharan, Africa, and, Australia.]

In order to achieve this, you should not split the string of words.

Demo:

import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class Main {
    public static void main(String[] args) {
        List<String> customList = Arrays.asList(
                   "5000  Buruli ulcer is an infectious disease",
                   "6000  characterized by the development",
                   "7000  of painless open wounds.",
                   "8000  The disease largely occurs in",
                   "10000  sub-Saharan Africa and Australia."
                );
        
        TreeMap<String, List<String>> collect = customList.stream().map(s -> s.split("  ", 2))
                .collect(Collectors.groupingBy(a -> a[0],
                        () -> new TreeMap<String, List<String>>(Comparator.comparingInt(Integer::parseInt)),
                        Collectors.mapping(a -> a[1], Collectors.toList())));
        
        System.out.println(collect);
    }
}

Output:

{5000=[Buruli ulcer is an infectious disease], 6000=[characterized by the development], 7000=[of painless open wounds.], 8000=[The disease largely occurs in], 10000=[sub-Saharan Africa and Australia.]}
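
Note that the output above keeps each line's text as a single list element. If every word should be its own element, as in the quoted `10000=[sub-Saharan, Africa, and, Australia.]`, one possible variant is sketched below. It keeps the numerically ordered `TreeMap` from the demo and swaps the downstream collector for a hand-written one built with `Collector.of`, which also works on Java 8. The names `WordsPerKey`, `wordsByNumericKey` and `flattenWords` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;
import java.util.stream.Collector;
import java.util.stream.Collectors;

public class WordsPerKey {

    // Hypothetical helper: groups by the leading number and stores one word per list element.
    static TreeMap<String, List<String>> wordsByNumericKey(List<String> lines) {
        // Downstream collector: appends all words of each line to one flat list per key.
        Collector<String[], List<String>, List<String>> flattenWords = Collector.of(
                ArrayList::new,
                (words, parts) -> words.addAll(Arrays.asList(parts[1].split(" "))),
                (left, right) -> { left.addAll(right); return left; });

        return lines.stream()
                // Assumes key and text are separated by two spaces, as in the question's data.
                .map(s -> s.split("  ", 2))
                .collect(Collectors.groupingBy(
                        parts -> parts[0],
                        // Compare keys as numbers so that 10000 sorts after 8000.
                        () -> new TreeMap<String, List<String>>(Comparator.comparingInt(Integer::parseInt)),
                        flattenWords));
    }

    public static void main(String[] args) {
        List<String> customList = Arrays.asList(
                "5000  Buruli ulcer is an infectious disease",
                "10000  sub-Saharan Africa and Australia.");
        // The entry for 10000 comes last, with one word per list element.
        System.out.println(wordsByNumericKey(customList));
    }
}
```

On Java 9 or later, `Collectors.flatMapping` could replace the hand-written collector.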

Or the one based on my original answer:

import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class Main {
    public static void main(String[] args) {
        List<String> customList = Arrays.asList(
                   "5000  Buruli ulcer is an infectious disease",
                   "6000  characterized by the development",
                   "7000  of painless open wounds.",
                   "8000  The disease largely occurs in",
                   "10000  sub-Saharan Africa and Australia."
                );

        Map<String, List<String>> collect = customList.stream().map(s -> s.split("\\s+", 2))
                .collect(Collectors.groupingBy(a -> a[0], TreeMap::new,
                        Collectors.mapping(a -> a[1], Collectors.toList())));

        System.out.println(collect);
    }
}

Output:

{10000=[sub-Saharan Africa and Australia.], 5000=[Buruli ulcer is an infectious disease], 6000=[characterized by the development], 7000=[of painless open wounds.], 8000=[The disease largely occurs in]}
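
The key 10000 comes first in this output because `TreeMap::new` sorts String keys in their natural (lexicographic) order, where "10000" is smaller than "5000" since '1' precedes '5'. The small sketch below contrasts that with the `Comparator.comparingInt(Integer::parseInt)` ordering used in the other demos. `KeyOrdering` and `sortedKeys` are illustrative names, not part of any library.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

public class KeyOrdering {

    // Hypothetical helper: puts the keys into a TreeMap with the given ordering
    // and returns them in the map's iteration order.
    static List<String> sortedKeys(Comparator<String> order, String... keys) {
        TreeMap<String, Boolean> map = new TreeMap<>(order);
        for (String key : keys) {
            map.put(key, Boolean.TRUE);
        }
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        // Natural String order compares character by character.
        System.out.println(sortedKeys(Comparator.naturalOrder(), "5000", "6000", "10000"));
        // prints [10000, 5000, 6000]

        // Parsing the keys as ints gives numeric order.
        System.out.println(sortedKeys(Comparator.comparingInt(Integer::parseInt), "5000", "6000", "10000"));
        // prints [5000, 6000, 10000]
    }
}
```

Two caveats with the numeric comparator: it throws a `NumberFormatException` for keys that are not valid ints, and it treats keys such as "007" and "7" as the same key.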

The solution suggested by Aniket:

import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class Main {
    public static void main(String[] args) {
        List<String> customList = Arrays.asList(
                   "5000  Buruli ulcer is an infectious disease",
                   "6000  characterized by the development",
                   "7000  of painless open wounds.",
                   "8000  The disease largely occurs in",
                   "10000  sub-Saharan Africa and Australia."
                );
        
        TreeMap<String, List<String>> collect = customList.stream().map(s -> s.split("  ", 2))
                .collect(Collectors.groupingBy(a -> a[0],
                        () -> new TreeMap<String, List<String>>(Comparator.comparingInt(Integer::parseInt)),
                        Collectors.mapping(a -> Arrays.toString(a[1].split(" ")), Collectors.toList())));

        System.out.println(collect);
    }
}

Output:

{5000=[[Buruli, ulcer, is, an, infectious, disease]], 6000=[[characterized, by, the, development]], 7000=[[of, painless, open, wounds.]], 8000=[[The, disease, largely, occurs, in]], 10000=[[sub-Saharan, Africa, and, Australia.]]}

Original answer:

You were almost there. You can do it as follows:

import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class Main {
    public static void main(String[] args) {
        List<String> customList = Arrays.asList(
                   "5000  Buruli ulcer is an infectious disease",
                   "6000  characterized by the development",
                   "7000  of painless open wounds.",
                   "8000  The disease largely occurs in",
                   "10000  sub-Saharan Africa and Australia."
                );

        Map<String, List<List<String>>> collect = customList.stream().map(s -> s.split("\\s+", 2))
                .collect(Collectors.groupingBy(a -> a[0], TreeMap::new,
                        Collectors.mapping(a -> Arrays.asList(a[1].split("\\s+")), Collectors.toList())));

        System.out.println(collect);
    }
}

Output:

{10000=[[sub-Saharan, Africa, and, Australia.]], 5000=[[Buruli, ulcer, is, an, infectious, disease]], 6000=[[characterized, by, the, development]], 7000=[[of, painless, open, wounds.]], 8000=[[The, disease, largely, occurs, in]]}
Arvind Kumar Avinash
  • Are there any advantages to using `groupingBy` instead of `toMap` if you know the keys are unique? – Thanos M Mar 29 '21 at 10:20
  • @ThanosM - There are many excellent answers already available describing the difference between these two e.g. you can check [this](https://stackoverflow.com/q/45231351/10819573). – Arvind Kumar Avinash Mar 29 '21 at 10:23
  • I would like each word as a separate element in the `List`. With your solution, all the elements are in the same `List` entry. For example, I would like `10000=[sub-Saharan, Africa, and, Australia.]` – Thanos M Mar 29 '21 at 12:29
  • 1
    @ThanosM - I was out for lunch. Just saw your comment and posted an update as the solution to this requirement. Feel free to comment in case of any further doubt/issue. – Arvind Kumar Avinash Mar 29 '21 at 13:51