3

If I want to add a list to another list, I call target.adAll(source).

But what if I need to process each value from a list first?

I can do something like

for(String s: source) {
  target.add(s.toLowerCase());
}

or using java 8:

source.stream().map(x->x.toLowerCase()).forEachOrdered(target::add);

But either way, I appear to lose the performance benefit of addAll. What is the most performant way of doing this?

ykaganovich
  • 14,736
  • 8
  • 59
  • 96
  • 3
    I guess the JIT should be able to transform the second piece of code to be the first. BTW see http://stackoverflow.com/questions/39495347/whats-the-better-way-to-add-elements-from-a-stream-to-an-existing-list. – kennytm Oct 15 '16 at 07:00
  • There's nothing better you can do. – Louis Wasserman Oct 15 '16 at 15:50

2 Answers2

2

You can use the collect() pattern from Eclipse Collections, collect() is equivalent of map() from JDK.

MutableList<String> source = Lists.mutable.with("A", "B", "C");
MutableList<String> target = source.collect(String::toLowerCase);

If you have an existing target list, you can use the variant of collect() which accepts a target collection:

MutableList<String> source = Lists.mutable.with("A", "B", "C");
source.collect(String::toLowerCase, target);

If you can't change list from List:

List<String> source = Arrays.asList("A", "B", "C");
List<String> target = ListAdapter.adapt(source).collect(String::toLowerCase);

You can also use replaceAll() on List:

List<String> source = Arrays.asList("A", "B", "C");
List<String> target = new ArrayList<>(source);
target.replaceAll(String::toLowerCase);

The above solutions may or may not be more performant, however, they all pre-size the target list.

Note: I am a contributor to Eclipse Collections.

Nikhil Nanivadekar
  • 1,152
  • 11
  • 10
2

Well, what’s the “performance benefit of addAll”? In the end, addAll has to add all elements to the target Collection anyway. In case, the target is an ArrayList, the main benefit is to ensure that there are no unnecessary capacity increase operations.

But note that this happens at the expense of creating a temporary array, see implementation of ArrayList.addAll. To outweigh this expense, you have to add a significant number of elements.

If we are going to add more elements than the target’s current capacity, an increase operation is unavoidable. So addAll offers only a benefit, if the target has to increase the capacity more than once if you simply use add. Since the capacity is raised by factor 1.5 and the capacity is equal or higher than the current size, we have to add at least more elements than half of it’s current size to expect an unnecessary capacity increase operation.

If you really think, this is going to be an issue, it’s easy to fix:

if(target instanceof ArrayList)
    ((ArrayList)target).ensureCapacity(target.size()+source.size());
source.stream().map(String::toLowerCase).forEachOrdered(target::add);

Of course, there are a few corner cases where the costs of add are much higher, e.g. CopyOnWriteArrayList. For this target collection type, collecting into a List via collect(Collectors.toList()) first, followed by addAll might be beneficial. Or you create a simple lazy Collection as intermediate step:

public static <T> Collection<T> lazyCollection(Supplier<? extends Stream<T>> s) {
    return new AbstractCollection<T>() {
        public Iterator<T> iterator() { return s.get().iterator(); }
        public int size() { return (int)s.get().count(); }
        public Object[] toArray() { return s.get().toArray(); }
    };
}

which can be used like:

target.addAll(lazyCollection(() -> source.stream().map(String::toLowerCase)));

This approach would suffer from evaluating the Stream twice, if a collection asks for size() first, before acquiring an Iterator, but afaik, no standard collection does. They either, use the iterator without relying on a predicted size, or resort to toArray(), like ArrayList.addAll or CopyOnWriteArrayList.addAll do.

Holger
  • 285,553
  • 42
  • 434
  • 765
  • Thank you, that's a very good explanation. Can you explain why LinkedList.addAll implementation calls toArray on the source collection? It made wonder if there's some other "performance magic" I may be missing (like L2 caching), because I don't see any good reason for it otherwise. – ykaganovich Oct 17 '16 at 16:42
  • 1
    Well, if the source is a thread safe collection, acquiring all elements in a single call *might* be atomic, depending on the source’s type, making certain combinations safe, but since such a behavior is not specified, there is no reason to implement it. Also, this ensures that it works, if you add a list to itself, but on the other hand, there’s a cheap way to test this special case, so it’s not necessary to handle all sources that way. Generally, asking for the complete array shows some distrust against the source’s iterator, but we don’t know for sure what drove this decision. – Holger Oct 18 '16 at 08:43