0
  1. With respect to the while loop to keep the search running while (low <= high), is there any consensus if it's better to use a '<' or '<=' since I see instances of both online?

  2. The exit conditions of some binary search templates differ in the sense that some of them use Style A where as soon as target equals the midpoint we return target. B, on the other hand, only happens after the while loop is exited and returns nums[low]. Is this just a matter of personal preference?

A. if (target === nums[mid]) return target;

B. return nums[low] after exiting while loop

Mike Chan
  • 167
  • 1
  • 11
  • It depends on the problem you're trying to solve. Finding an occurrence would use A, finding the first occurrence would use B – Abhinav Mathur Jan 28 '21 at 15:38
  • Duplicate of https://stackoverflow.com/questions/39221303/binary-search-algorithm-implementations, but there isn't a nice long answer like @trincot's about this one specific facet – Matt Timmermans Jan 29 '21 at 01:06
  • You may also enjoy https://stackoverflow.com/questions/26564658/binary-search-and-invariant-relation/26569973#26569973 – Gene Jan 29 '21 at 04:15

1 Answers1

2

The difference in the while condition comes from the difference in the meaning of high: sometimes it is the index after the last element that belongs to the range, and sometimes it represent the index of the last element itself. There is no consensus on which definition of high is to be used.

However, you could align this choice to how ranges are usually represented in the language environment you use. For instance, in Python, range(a, b) excludes b, and in JavaScript arr.slice(a, b) excludes b too. So in those languages I would go for a definition of high that excludes it from the range, and so your while condition would be <. In VBA there are LBound and UBound, which suggest an inclusive definition of high, and so the while condition would be <=.

Either way the while condition should reflect that the current range is non-empty.

The addition of this line in the loop

if (target === nums[mid]) return target;

...is another matter.

First of all, you would not return target, as that doesn't really help the caller (they already know the value of target, since they provided it). It is more appropriate to return the index where the value was found, so that the caller might act on that index:

if (target === nums[mid]) return mid;

For the same reasons, the alternative code would do:

return low;

On the one hand the first code variant offers a possibility to exit the loop early. In the extreme case we might even get lucky, and find a match in the very first iteration of the while loop, and thereby save maybe 10 useless iterations that you would otherwise still do using the alternative code that does not include this if. This may seem a good thing.

But, statistically, if the value is in the list, you'll have like a 50% change to only find it in the last possible iteration (when the range is just 1 wide). And if the value is not in the list, you will certainly have to perform all iterations. So it makes sense to make each iteration do as little as possible. And that is the reason not to include that if statement in the loop, but only execute it once, after the loop.

To choose what is the best strategy for your case, implement them both, and do a benchmark comparison based on realistic data and searches.

trincot
  • 317,000
  • 35
  • 244
  • 286
  • A great, thoughtful answer, but I wonder if there's more consensus about ranges than stated. Isn't it fair to say, in the most common tools, that ranges exclude their max parameter, arrays are indexed from 0 to n-1 and that loops test `< n` more commonly than `<= n-1`? – danh Jan 29 '21 at 01:34
  • @danh I have asked about 30 people (job candidates) to write a binary search, and all but 2 of the successful ones used an inclusive range, and every single one used a ternary condition. – Matt Timmermans Jan 29 '21 at 02:26
  • @MattTimmermans - I just found a bsearch I had lying around, and I'm now embarrassed to admit that it tests for the pointers crossing with `<=`. – danh Jan 29 '21 at 02:40
  • Yes, I would agree that in most common tools, ranges exclude their max parameter. But there are language environments that deviate from that. For instance, if in VBA you declare `Dim arr(5)`, then the array has an index 5. And a basic `For i = 1 To 10` loop will include an iteration with `i=10`. We may criticise this pattern, but in such a language environment it would make sense to remain consistent with it. But like in c-style languages, it makes more sense to have an exclusive max parameter. – trincot Jan 29 '21 at 08:22