Another thing about Java binary search routines

Java (J2SE) has binary search routines in the java.util package, as Collections.binarySearch (for searching in List implementations), and Arrays.binarySearch (for searching in arrays).

The JavaDoc for both of these methods states:

Returns:
index of the search key, if it is contained in the list; otherwise, (-(insertion point) - 1). The insertion point is defined as the point at which the key would be inserted into the list: the index of the first element greater than the key, or list.size(), if all elements in the list are less than the specified key. Note that this guarantees that the return value will be >= 0 if and only if the key is found.

Now, Java specifies two's complement arithmetic for its primitive integer types. With two's complement (and using “~” to stand for the bitwise complement operator, as in Java), the identity

-x = ~x + 1

holds, so

-x - 1 = (~x + 1) - 1 = ~x.

So (-(insertion point) - 1) in the quote above could equally well be given as ~(insertion point), which to my eyes at least is a more straightforward expression. And when reading up on the equivalent facility in .NET - System.Collections.ArrayList.BinarySearch - I was interested to find:

Return Value

The zero-based index of value in the sorted ArrayList, if value is found; otherwise, a negative number, which is the bitwise complement of the index of the next element that is larger than value or, if there is no larger element, the bitwise complement of Count.

It occurred to me that maybe Sun thinks that typical Java programmers cannot handle bitwise operators. But the results of javap -c java.util.Arrays shows that, in the relevant cases, the Sun J2SE implementation actually does -(low + 1), just as shown in Josh Bloch's recent entry. So I suppose this was simply copied from Bentley's Programming Pearls, with the trivial expansion from -(low + 1) to (-(insertion point) - 1) by whoever wrote the JavaDoc for those methods.

Something else for the VM trivia fans: The Java Virtual Machine does not actually have a bitwise complement instruction corresponding to the ~ operator in the Java language. Instead, the compiler generates an iconst_m1, ixor sequence (i.e. x ^ -1). In contrast, the CIL instruction set used by .NET (as described by ECMA-335) does have a not instruction which performs a bitwise complement. (Neither VM has the specification of its instructions online in a convenient form for linking!)

Comment from crimsonflaw

hi david.. i know next to nothing of the fancy equations and the noble pursuits. I find the term'' lassitude and longitude'' so beautiful... i congratulate you on coming up with this twist of meaning....

Comment from Eugene

But the results of javap -c java.util.Arrays shows that, in the relevant cases, the Sun J2SE implementation actually does -(low + 1)

For what it's worth, they do provide the source code:
$ unzip -l /usr/java/jdk1.5.0_07/src.zip | grep Arrays
116729 05-03-06 07:08 java/util/Arrays.java

Neither VM has the specification of its instructions online in a convenient form for linking!

Now that depends on your definition of "convenient". Here is, for example, the specification for JVM's ixor.

Comment from David

For what it's worth, they do provide the source code:
$ unzip -l /usr/java/jdk1.5.0_07/src.zip | grep Arrays
116729 05-03-06 07:08 java/util/Arrays.java


But if I read that, can I work on GNU classpath later?

Now that depends on your definition of "convenient". Here is, for example, the specification for JVM's ixor.

Is there an instruction index somewhere containing that link, or did you look at the HTML to find the anchor?

Comment from eugene

But if I read that, can I work on GNU classpath later?

Standard libraries source code is apparently covered by the same license agreement as the rest of the JDK. I do not see any viral clauses in it.

At the same time, the license agreement explicitly prohibits decompiling their software. Russian law allows it for various reasonable purposes, so you will probably get away with it. :)

Is there an instruction index somewhere containing that link, or did you look at the HTML to find the anchor?

The latter. With anchor being simpy the instruction name it does not seem too complicated anyways. And there is an index in the VM spec with links to instructions' definitions among other things.

(Yes, my comments are nit-picking. But so is the original posting. :) )