10

I got a program which is using an ArrayList<T> and that type T also implements Comparable<T>. I need to keep that list sorted.

For now, when I insert a new item, I add it to the ArrayList and then invoke Collections.sort(myArrayList).

Is sorting with Collections.sort every time I insert a new item seriously hurt run time complexity?

Is there a better suited data structure I can use to always keep the list sorted? I know of a structure called a PriorityQueue but I also need to be able to get the list's elements by index.

EDIT: In my specific case, inserting a new item happens much less than geting an already existing item, so eventually a good advice could also be to stay with the ArrayList since it got a constant time complexity of getting an item. But if you know of anything else...

CodeMonkey
  • 11,196
  • 30
  • 112
  • 203
  • 6
    `TreeSet` is your friend. – Konstantin Yovkov Jul 01 '15 at 14:16
  • Any particular reason you are not able/permitted to insert any new element into its sorted location? – Stig Tore Jul 01 '15 at 14:19
  • @kocko `TreeSet` does not provide random access either. – Robin Krahl Jul 01 '15 at 14:19
  • 4
    instead of Collections.sort, you could use a binary search to find the insertion point - that would be more efficient. – assylias Jul 01 '15 at 14:21
  • 1
    May be this http://stackoverflow.com/questions/2661065/a-good-sorted-list-for-java will help you – prasad Jul 01 '15 at 14:39
  • For efficiency, search (library binary search) and insert yourself will almost certainly be the fastest in practice. For complexity, I'm pretty sure most jdks will use timsort and that will be O(n) for one element out of order, so it's likely to be the same complexity – moreON Jul 01 '15 at 15:31

4 Answers4

5

It seems like Collection.Sort is actually the way to go here since when the collection is already almost sorted, the sorting will take not longer than O(n) in the worst case.

CodeMonkey
  • 11,196
  • 30
  • 112
  • 203
4

Instead of using Collections.sort(myArrayList) after every insertion you might want to do something a bit smarter as you know that every time you insert an element your collection is already ordered.

Collections.sort(myArrayList) takes 0(nlogn) time, you could do an ordered insert in an ordered collection in O(n) time using Collections.binarySearch. If the collection is ordered in ascending order Collections.binarySearch returns the index of the element you are looking for if it exists or (-(insertion point) - 1). Before inserting an element you can look for it with Collections.binarySearch (O(logn) time). Done that you can derive the index at which inserting the new element. You can then add the element with addAt in O(n) time. The whole insertion complexity is bounded by the addAt so you can do an ordered insert in an ArrayList in O(n) time.

mziccard
  • 2,158
  • 9
  • 17
  • From the documentation of `Collections.sort(List)`: *"This implementation is a stable, adaptive, iterative mergesort that requires far fewer than n lg(n) comparisons when the input array is partially sorted."* – Andy Thomas Jul 01 '15 at 14:48
  • _"far fewer than n lg(n)"_ does not necessarily mean O(n) in the worst case. Moreover, as far as I know merge sort also requires a temporary copy of the collection that you do not need with binarySearch (I might be wrong). – mziccard Jul 01 '15 at 14:52
  • As you noted yourself, the list is already sorted but for one element. The worst-case behavior of `Collections.sort()` does not apply. *"If the input array is nearly sorted, the implementation requires approximately n comparisons. Temporary storage requirements vary from a small constant for nearly sorted input arrays ..."* – Andy Thomas Jul 01 '15 at 15:01
  • So from Andy's comment, I understand that Collections.sort() (how do you mark a code in grey like you do in an answer or question? ctrl+k doesn't work here), is actually not that bad since when it's almost sorted it's gonna be O(n) in the worst case? which is similiar if I just iterate over the list and look for the index. – CodeMonkey Jul 02 '15 at 08:41
  • From Java 7 `Collections.sort` implements a modified version of Python's [TimSort](http://svn.python.org/projects/python/trunk/Objects/listsort.txt) which for partially ordered collections needs less than lg(N!) comparisons and as few as N-1. Since your array is always sorted but one element we can soundly assume that the worst case for `Collections.sort` is O(n). – mziccard Jul 02 '15 at 08:49
  • @mziccard change the answer according to the comments and I'll accept it. – CodeMonkey Aug 09 '16 at 18:03
  • O(n) for every insertion sounds very bad. Binary search is O(log n). – Blake Miller Jun 24 '17 at 00:43
4

List is an ordered collection, which means you need to have the ability to access with index. If a collection internally shuffles or sorts the elements, the insertion order wont be same as the order of the elements in the internal data structure. So you cannot depend on index based access anymore. Hence Sun didn't provide a SortedList or a TreeList class. That is why you use Collections.sort(..)

Apache commons-collections does provide a TreeList class but it is not a sorted List and is called so because it uses a tree data structure to store the elements internally. Check its documentation here - http://commons.apache.org/proper/commons-collections/javadocs/api-3.2.1/org/apache/commons/collections/list/TreeList.html

Raman Shrivastava
  • 2,923
  • 15
  • 26
1

A single data structure cannot provide both sorting and index based retrieval. If the input set is limited to a few hundreds or thousands, you can keep two data structures in parallel.

For example, ArrayList for index based retrieval and TreeMap (or priority queue) for sorting.

Amber Beriwal
  • 1,568
  • 16
  • 30
  • 1
    -1 a simple binary tree can easily support O(log n) index-based retrieval by keeping track of how many children are in each sub-tree. Or a sorted array-list with O(1) retrieval but O(n) inserts. – BlueRaja - Danny Pflughoeft Sep 23 '16 at 21:56
  • @BlueRaja-DannyPflughoeft Please elaborate the index based retrieval in tree. – Amber Beriwal Sep 24 '16 at 15:33
  • 2
    There are 20 items in the left subtree and 30 items in the right subtree, and the user wants index 25. Since the tree is ordered, items 1-20 are in the left subtree, 21-50 are in the right. So recurse into the right subtree, looking for index 5. I use exactly this method for my [weighted item randomizer](https://github.com/BlueRaja/Weighted-Item-Randomizer-for-C-Sharp) (specifically [here](https://github.com/BlueRaja/Weighted-Item-Randomizer-for-C-Sharp/blob/master/Weighted%20Randomizer/DynamicWeightedRandomizer.cs#L415-L433)) – BlueRaja - Danny Pflughoeft Sep 24 '16 at 17:29