Code to read: binary search

As an apprentice programmer, your growing command of programming practice comes mostly through writing and debugging your own code, but you can also learn quite a lot from reviewing code written by others. The prolific open-source development community provides an amazing wealth of interesting code to use as an object of study. Reading code shows you how various programming techniques are used in context, gives insight into industry best practices, and provides opportunities for reflection and critique.

Many standard libraries include an implementation of binary search. Below are a few versions taken from the standard libraries of Python, Java, C, and C++. Even without strong knowledge of the syntactic details of the particular language, you can generally follow along well enough to get a reasonable understanding. I think it is fascinating to compare and contrast how the pros do it! Here are some points to ponder as you read:

How readable do you find the code to be?
Which of its stylistic conventions aid/impede your understanding?
Is the code commented? Are the comments helpful/sufficient?
In which ways does the code follow the "classic" algorithm and where does it take a different tack? What is different about it and why?
How confident are you that the code is fully correct?
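
One way to approach that last question is to check a binary search against a slow but obviously correct alternative on many small random inputs. The sketch below is only an illustration of that idea, not code from any of the libraries that follow; the names `classic_binary_search` and `agrees_with_linear_scan` are made up for this example.

```python
import random

def classic_binary_search(a, key):
    """Textbook binary search (illustrative): index of key in sorted list a, or -1."""
    low, high = 0, len(a) - 1
    while low <= high:
        mid = (low + high) // 2
        if a[mid] < key:
            low = mid + 1
        elif a[mid] > key:
            high = mid - 1
        else:
            return mid
    return -1

def agrees_with_linear_scan(trials=1000, seed=0):
    """Compare the binary search with Python's linear `in` test on random inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        # Small sorted lists (including the empty list) hit edge cases quickly.
        a = sorted(rng.randint(0, 20) for _ in range(rng.randint(0, 10)))
        key = rng.randint(-1, 21)
        idx = classic_binary_search(a, key)
        if key in a:
            if idx == -1 or a[idx] != key:
                return False
        elif idx != -1:
            return False
    return True
```

Passing a check like this does not prove the code correct, but small random inputs tend to expose off-by-one mistakes at the ends of the range very quickly.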

Note: The bsearch in C has to use a primitive and messy syntax (pointer arithmetic) to access the element at an index. When you get to CS107, you'll look into this in depth; for now, just skim past that part and focus on the algorithm "skeleton".

Python bisect_left

def bisect_left(a, x, lo=0, hi=None):
    """Return the index where to insert item x in list a, assuming a is sorted.
    The return value i is such that all e in a[:i] have e < x, and all e in
    a[i:] have e >= x.  So if x already appears in the list, a.insert(i, x) will
    insert just before the leftmost x already there.
    Optional args lo (default 0) and hi (default len(a)) bound the
    slice of a to be searched.
    """

    if lo < 0:
        raise ValueError('lo must be non-negative')
    if hi is None:
        hi = len(a)
    while lo < hi:
        mid = (lo+hi)//2
        # Use __lt__ to match the logic in list.sort() and in heapq
        if a[mid] < x: lo = mid+1
        else: hi = mid
    return lo
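
To see what the docstring's promise means in practice, here is a small usage sketch. It calls the real `bisect_left` from Python's `bisect` module; the sample list and the helper `contains` are assumptions made up for this example.

```python
from bisect import bisect_left

a = [10, 20, 20, 20, 30]

# i is the leftmost position where 20 could be inserted while keeping a sorted.
i = bisect_left(a, 20)

# The docstring's invariant: everything before i is < x, everything from i on is >= x.
assert all(e < 20 for e in a[:i])
assert all(e >= 20 for e in a[i:])

def contains(sorted_list, x):
    """Membership test built on bisect_left (hypothetical helper)."""
    pos = bisect_left(sorted_list, x)
    # bisect_left never reports "not found", so the caller must check the slot.
    return pos < len(sorted_list) and sorted_list[pos] == x
```

Notice that `bisect_left` always returns a position rather than a found/not-found answer, which is why the helper has to compare the element at that position itself.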

java.util.Arrays.binarySearch

private static int binarySearch(int[] a, int key) {
    int low = 0;
    int high = a.length - 1;

    while (low <= high) {
        int mid = (low + high) >>> 1;
        int midVal = a[mid];

        if (midVal < key)
            low = mid + 1;
        else if (midVal > key)
            high = mid - 1;
        else
            return mid; // key found
    }
    return -(low + 1);  // key not found.
}
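
Two details in the Java version are worth a closer look: the negative return value encodes where a missing key would be inserted, and `>>> 1` computes the midpoint even if `low + high` overflows a 32-bit int. The Python sketch below mirrors the loop and simulates the 32-bit arithmetic; `java_style_binary_search`, `insertion_point`, `to_int32`, and `unsigned_shift_right_1` are hypothetical names for this illustration, not part of any library.

```python
def java_style_binary_search(a, key):
    """Python port of the loop above, keeping the same return convention."""
    low, high = 0, len(a) - 1
    while low <= high:
        mid = (low + high) >> 1   # Python ints never overflow, so >> is enough here
        if a[mid] < key:
            low = mid + 1
        elif a[mid] > key:
            high = mid - 1
        else:
            return mid            # key found
    return -(low + 1)             # key not found

def insertion_point(result):
    """Decode a negative result back into the index where key would go."""
    return -result - 1

def to_int32(n):
    """Wrap n the way a 32-bit signed int would (simulation helper)."""
    return (n + 2**31) % 2**32 - 2**31

def unsigned_shift_right_1(n):
    """Mimic `n >>> 1` on a 32-bit int: treat the bits as unsigned, then shift."""
    return (n & 0xFFFFFFFF) >> 1

# With very large indices the sum wraps negative, yet >>> 1 still gives the midpoint.
low, high = 2**30, 2**30 + 2
wrapped_sum = to_int32(low + high)
mid = unsigned_shift_right_1(wrapped_sum)
```

Returning `-(low + 1)` rather than `-low` guarantees the result is negative even when the insertion point is 0, so a caller can tell "found at index 0" apart from "not found, insert at 0".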

Musl bsearch

void *bsearch(const void *key, const void *base, size_t nel, size_t width, int (*cmp)(const void *, const void *))
{
    void *try;
    int sign;
    while (nel > 0) {
        try = (char *)base + width*(nel/2);
        sign = cmp(key, try);
        if (!sign) return try;
        else if (nel == 1) break;
        else if (sign < 0)
            nel /= 2;
        else {
            base = try;
            nel -= nel/2;
        }
    }
    return NULL;
}
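
If the pointer arithmetic gets in the way, it can help to restate the loop with plain indices. The sketch below is a hand-written Python port for study purposes, under the assumption that `base` can be modeled as an index and `cmp` as ordinary comparisons; `musl_style_search` is a hypothetical name.

```python
def musl_style_search(a, key):
    """Index-based port of the loop above: index of key in sorted list a, or None."""
    base, nel = 0, len(a)          # current range is a[base : base + nel]
    while nel > 0:
        t = base + nel // 2        # plays the role of `try`
        if key == a[t]:
            return t
        elif nel == 1:
            break                  # one element left and it did not match
        elif key < a[t]:
            nel //= 2              # keep the left part, a[base : t]
        else:
            base = t               # keep the right part, which still includes a[t]
            nel -= nel // 2
    return None
```

Moving right keeps the probed element inside the range (`base = try`, not one past it). That is why the `nel == 1` check matters: with a single element and a larger key, `base` and `nel` would not change and the loop would never end.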

Apple Darwin bsearch

/*
 * Perform a binary search.
 *
 * The code below is a bit sneaky.  After a comparison fails, we
 * divide the work in half by moving either left or right. If lim
 * is odd, moving left simply involves halving lim: e.g., when lim
 * is 5 we look at item 2, so we change lim to 2 so that we will
 * look at items 0 & 1.  If lim is even, the same applies.  If lim
 * is odd, moving right again involves halving lim, this time moving
 * the base up one item past p: e.g., when lim is 5 we change base
 * to item 3 and make lim 2 so that we will look at items 3 and 4.
 * If lim is even, however, we have to shrink it by one before
 * halving: e.g., when lim is 4, we still looked at item 2, so we
 * have to make lim 3, then halve, obtaining 1, so that we will only
 * look at item 3.
 */
void *
bsearch(key, base0, nmemb, size, compar)
    const void *key;
    const void *base0;
    size_t nmemb;
    size_t size;
    int (*compar)(const void *, const void *);
{
    const char *base = base0;
    size_t lim;
    int cmp;
    const void *p;

    for (lim = nmemb; lim != 0; lim >>= 1) {
        p = base + (lim >> 1) * size;
        cmp = (*compar)(key, p);
        if (cmp == 0)
            return ((void *)p);
        if (cmp > 0) {  /* key > p: move right */
            base = (char *)p + size;
            lim--;
        }       /* else move left */
    }
    return (NULL);
} 
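
The worked examples in the comment are easier to trust once you trace them. Below is a Python sketch that restates the loop with indices and records which items get looked at; it assumes `base` can be modeled as an index, and `darwin_style_probes` is a hypothetical name for this illustration.

```python
def darwin_style_probes(a, key):
    """Index-based port of the loop above.

    Returns (index of key or None, list of indices probed along the way).
    """
    base, lim = 0, len(a)
    probes = []
    while lim != 0:
        p = base + (lim >> 1)
        probes.append(p)
        if key == a[p]:
            return p, probes
        if key > a[p]:             # key > p: move right
            base = p + 1
            lim -= 1               # shrink by one before the halving below
        lim >>= 1                  # the for loop's update step
    return None, probes

# lim is 5: look at item 2, then move right to items 3 and 4.
five = [0, 10, 20, 30, 40]
found_five, probes_five = darwin_style_probes(five, 35)

# lim is 4: look at item 2, then only item 3 remains on the right.
four = [0, 10, 20, 30]
found_four, probes_four = darwin_style_probes(four, 30)
```

For the five-element list and a missing key of 35, the probes are items 2, 4, and 3, which matches the comment's claim that moving right from item 2 leaves items 3 and 4. For the four-element list, the probes are items 2 and then 3.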

Glibc bsearch

__extern_inline void *
bsearch (const void *__key, const void *__base, size_t __nmemb, size_t __size,
     __compar_fn_t __compar)
{
  size_t __l, __u, __idx;
  const void *__p;
  int __comparison;

  __l = 0;
  __u = __nmemb;
  while (__l < __u)
    {
      __idx = (__l + __u) / 2;
      __p = (void *) (((const char *) __base) + (__idx * __size));
      __comparison = (*__compar) (__key, __p);
      if (__comparison < 0)
        __u = __idx;
      else if (__comparison > 0)
        __l = __idx + 1;
      else
        return (void *) __p;
    }

  return NULL;
}

C++ std::lower_bound

  template<typename _ForwardIterator, typename _Tp, typename _Compare>
    _ForwardIterator
    __lower_bound(_ForwardIterator __first, _ForwardIterator __last,
          const _Tp& __val, _Compare __comp)
    {
      typedef typename iterator_traits<_ForwardIterator>::difference_type
    _DistanceType;

      _DistanceType __len = std::distance(__first, __last);

      while (__len > 0)
        {
          _DistanceType __half = __len >> 1;
          _ForwardIterator __middle = __first;
          std::advance(__middle, __half);
          if (__comp(__middle, __val))
            {
              __first = __middle;
              ++__first;
              __len = __len - __half - 1;
            }
          else
            __len = __half;
        }
      
      return __first;
    }
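
The C++ version tracks a starting position and a length instead of a low/high pair, which lets it work with iterators that can only step forward. Here is a Python sketch of the same loop using list indices; `lower_bound` and its `less` parameter are names chosen for this illustration, and the port simplifies the comparator so that it receives an element and a value.

```python
from bisect import bisect_left

def lower_bound(a, val, less=lambda elem, v: elem < v):
    """Index of the first element of sorted list a that is not less than val."""
    first, length = 0, len(a)
    while length > 0:
        half = length >> 1
        middle = first + half          # std::advance(__middle, __half)
        if less(a[middle], val):
            first = middle + 1         # answer lies strictly right of middle
            length = length - half - 1
        else:
            length = half              # answer is at middle or to its left
    return first

sample = [10, 20, 20, 20, 30]
positions = [lower_bound(sample, v) for v in (5, 20, 25, 35)]
```

On the same inputs this returns the same positions as Python's `bisect_left`, which is a useful cross-check: the two libraries implement the same idea with different bookkeeping.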