You get to pick the invariant. It's a skill learned from practice. Even with experience, it usually involves some trial and error. Pick one. See how it goes. Look for opportunities to choose a different one that will require less work to maintain. The invariant you choose can make a significant difference in your code's complexity and/or efficiency.
There are at least four reasonable choices for invariant in a binary search:
a[lo] < target < a[hi]
a[lo] <= target < a[hi]
a[lo] < target <= a[hi]
a[lo] <= target <= a[hi]
You'll usually see the last one because it's the easiest to explain and doesn't involve tricky initialization with out-of-range array indices, which the others do.
Now there is a reason to use an invariant like a[lo] < target <= a[hi]
. If you want always to find the first of a repeated series of the target, this invariant will do it O(log n) time. When hi - lo == 1
, hi
points to the first occurrence of the target.
int find(int target, int *a, int size) {
// Establish invariant: a[lo] < target <= a[hi] || target does not exist
// We assume a[-1] contains an element less than target. But since it is never
// accessed, we don't need a real element there.
int lo = -1, hi = size - 1;
while (hi - lo > 1) {
// mid == -1 is impossible because hi-lo >= 2 due to while condition
int mid = lo + (hi - lo) / 2; // or (hi + lo) / 2 on 32 bit machines
if (a[mid] < target)
lo = mid; // a[mid] < target, so this maintains invariant
else
hi = mid; // target <= a[mid], so this maintains invariant
}
// if hi - lo == 1, then hi must be first occurrence of target, if it exists.
return hi > lo && a[hi] == target ? hi : NOT_FOUND;
}
NB this code is untested, but ought to work by the invariant logic.
The invariant with two <=
's will only find some instance of the target. You have no control over which one.
This invariant does required initialization with lo = -1
. This adds a proof requirement. You must show that mid
can never be set to -1
, which would cause out-of-range access. Fortunately this proof is not hard.
The article you cited is a poor one. It has several mistakes and inconsistencies. Look elsewhere for examples. Programming Pearls is a good choice.
Your proposed change is correct but may be a bit slower because it replaces a test that runs only one time with one that runs once per iteration.