0

Given the following data:

[4]
[5,  8]
[9,  12, 20]
[10, 15, 23, 28]
[14, 19, 31, 36, 48]
[15, 22, 34, 41, 53,  60]
[19, 26, 42, 49, 65,  72,  88]
[20, 29, 45, 54, 70,  79,  95,  104]
[24, 33, 53, 62, 82,  91,  111, 120, 140]
[25, 36, 56, 67, 87,  98,  118, 129, 149, 160]
[29, 40, 64, 75, 99,  110, 134, 145, 169, 180, 204]
[30, 43, 67, 80, 104, 117, 141, 154, 178, 191, 215, 228]
[34, 47, 75, 88, 116, 129, 157, 170, 198, 211, 239, 252, 280]
[35, 50, 78, 93, 121, 136, 164, 179, 207, 222, 250, 265, 293, 308]
[Etc.]

What could be the best searching algorithm with the most optimal Time Complexity for finding a given number?

  • The rows are sorted
  • The columns are sorted
  • A number may occur more than once

Extra info:

Suppose we are looking for the number 26:

  • Due to order, this means we can eliminate the first 3 rows and the remaining columns to the right.

  • Due to order, this also means we can ignore every row after row=11.

Which results to this:

[10, 15, 23]
[14, 19, 31]
[15, 22, 34]
[19, 26, 42]
[20, 29, 45]
[24, 33, 53]
[25, 36, 56]
[29, 40, 64]

My current algorithm has a time complexity of O(x log(y)) where x is the amount of columns and y is the size for the Binary Search algorithm for each column.

I'm looking for something faster because I'm dealing with huge amount of data.

Currently I'm using BST on every column, but could I use BST on rows aswell? maybe achieving a O(log(x) log(y))?

ruakh
  • 175,680
  • 26
  • 273
  • 307
Hades
  • 865
  • 2
  • 11
  • 28

2 Answers2

2

It can be done in O(x)

Let's call the element we are trying to find n

Start with the bottom left element.

For each element we search through (let's call it e):

  1. if e == n: we found it

  2. if e < n: move to the right

Justification:

All elements to the left of e, including the column that e is in, are less than e. Those elements cannot == n and can be eliminated.

  1. if e > n: move up

Justification:

All elements below e are greater than e and can be eliminated. What about the values less than e to the left of e? Can't those be == n? No. For e to make those moves to the right and have values to it's left, those values would have been already eliminated in step 2

Repeat until n found or index out of bounds in which case such an element does not exist.

Time complexity:

The worst case scenario is if the element isn't in the array and we have an index out of bounds. This occurs at the main diagonal and the total distance to the right and total distance up to any element on the long diagonal always sums to x.

Primusa
  • 13,136
  • 3
  • 33
  • 53
  • But this is slower? – Hades Jan 03 '19 at 21:52
  • It's not, you lose the `log y` factor, this is `O(x)`. In the initial edit it was `O(N)`, that N meant the number of rows not the number of total elements. – Primusa Jan 03 '19 at 21:52
  • I think it's `O(x + y)`. Worst case you check a value in each row and each column. – AShelly Jan 03 '19 at 21:54
  • I'm pretty sure this is slower, I presume that your O(x) is where x = m*n where m = rows and n = columns. Think about it, you start at bottom left, if the value is top right, it will take the amount of columns and the amount of rows to get there, ergo x = m*n which is slower than O(m log(n)) ? – Hades Jan 03 '19 at 21:55
  • Right `O(x + y)` – Primusa Jan 03 '19 at 21:56
  • I stand correct O(m+n) , still tho :P – Hades Jan 03 '19 at 21:57
  • That's faster than `O(mlog(n))` – Primusa Jan 03 '19 at 21:57
  • @Primusa yep, but m+n is still a lot, got anything faster? ^^ – Hades Jan 03 '19 at 21:58
  • Well if you don't mind an `O(NM)` preprocessing step you can get an `O(1)` in operation for each subsequent query with a gigantic hash table :P – Primusa Jan 03 '19 at 22:00
  • @AShelly hold up, it's not a square array but instead diagonal so that worst case is just if it's on the long diagonal, which is just `O(x)` – Primusa Jan 03 '19 at 22:04
  • The `2. if e < n: move to the right` iteration can be replaced with binary search, no? Same for the `3. if e > n: move up`. And I might be wrong, but this would become exactly O(log(x) + log(y)) – Georgi Gerganov Jan 03 '19 at 22:06
  • 1
    @GeorgiGerganov that's true but the worst case for that is still `O(x)`, which is u gotta move up and to the right once each time – Primusa Jan 03 '19 at 22:09
1

You can find the bottom left of your trimmed array with a binary search of the first column, and the top right with a binary search of the last column of each row.

From there, the problem degenerates to How do I search for a number in a 2d array sorted left to right and top to bottom? which is well-studied in the linked question. The best algorithm is dependent on the shape of the result.

AShelly
  • 34,686
  • 15
  • 91
  • 152
  • +1, though I think you're actually better off skipping the "You can find the bottom left of your trimmed array with a binary search of the first column" part: finding the top right is enough to reduce this problem to the one that you link to, so you should use the strategies there instead of trying to pre-shrink the problem in a way that might turn out to not be optimal. – ruakh Jan 04 '19 at 17:10