17

I was attempting to create a faster version of String.equals() method and started by simply copying it. The result I found was quite confusing. When I ran the copy pasted version, timed it and compared it against the JVM one, the JVM version was faster. The difference ranged from 6x to 34x faster! Simply put, the longer the string, larger is the difference.

boolean equals(final char a[], final char b[]) {
    int n = a.length;
    int i = 0;

    while (n-- != 0) {
        if (a[i] != b[i]) return false;
        i++;
    }
    return true;
}

public static void main() throws Exception {
    String a = "blah balh balh";
    String b = "blah balh balb";

    long me = 0, jvm = 0;

    Field value = String.class.getDeclaredField("value");
    value.setAccessible(true);

    final char lhs[] = (char[]) value.get(a);
    final char rhs[] = (char[]) value.get(b);
    for (int i = 0; i < 100; i++) {
        long t = System.nanoTime();
        equals(lhs, rhs);
        t = System.nanoTime() - t;
        me += t;
    }

    for (int i = 0; i < 100; i++) {
        long t = System.nanoTime();
        a.equals(b);
        t = System.nanoTime() - t;
        jvm += t;
    }

    System.out.println("me  = " + me);
    System.out.println("jvm = " + jvm);
}

Output:

me  = 258931
jvm = 14991

The equals() method I wrote is a copy-pasted version of the one found in String.equals() method. Why is the JVM version faster than its copy-pasted version? Isn't it effectively the same?

Could someone explain why I see such visible differences?

PS: If you wish to see large differences, you could create long (really, really long) strings with just one character differing at the end.

Gus
  • 3,534
  • 1
  • 30
  • 39
Unmanned Player
  • 1,109
  • 9
  • 30
  • 1
    Just a guess, but i think it might be related to the runtime optimization of the JVM. The native version is likely used a lot internally. Methods which are used frequently are more likely to be optimized by the JVM. – Philipp Mar 23 '13 at 12:56
  • 1
    I think `jvm` optimizes `String.equals` to equivalent assembler instruction based on its name rather than code. And probably also inlines it. When you copy the code, optimizations are lost. –  Mar 23 '13 at 12:57
  • 1
    @Phillip: I guessed that too. If that's the case then JVM is treating its own classes specially! – Unmanned Player Mar 23 '13 at 12:59
  • 2
    @doublep: Isn't String.equals() a non-static method? – Unmanned Player Mar 23 '13 at 13:00
  • 2
    Micro-tests like this usually aren't meaningful. I don't have an explanation, but I think this is one of those. Benchmarks can be misleading. – duffymo Mar 23 '13 at 13:00
  • @duffymo: There is a bigger picture to this simple test. This attempt is rougly equivalent to my baby step. – Unmanned Player Mar 23 '13 at 13:02
  • I actually achieve better results for the `me` version, not the `jvm`; this is expected since the JDK version checks that is in the same class and if they point to the same object (which happens often for Strings since most of them rely in a pool). – Random42 Mar 23 '13 at 13:06
  • 1
    Switch the order of operations around: perform the "jvm" measurement first, then the "me" measurement, then things change. Reminds me of warming up the cache first on databases. Original run: me = 142638, jvm = 67669; order switched: me = 40926, jvm = 47815 – Glenn Mar 23 '13 at 13:12
  • @Gelnn: Switched it but still the same. me = 261359, jvm = 15802. Isn't cache supposed to be warmed up after the first few iterations? – Unmanned Player Mar 23 '13 at 13:17
  • @m3th0dman: Didn't get the JDK version part. I use 1.7.0-17. And just so that there are two different objects even in pool, I changed one character at end. – Unmanned Player Mar 23 '13 at 13:19
  • @Eshan The equals from class String is implemented in the JDK String class; it is not a native method. I have used 1.6 u 37. – Random42 Mar 23 '13 at 14:00
  • @Eshan: `String` is a `final` class, so compiler can often still optimize if it knows an object is a string. I.e. `s.equals` when `s` is a `String` is bound to be this exact method, cannot be overriden. –  Mar 23 '13 at 14:26
  • One of the fundamental checks in your version is missing: the string length. If String b is larger then you will incorrectly return true if the strings are the same up until that point. This is not exactly a copy of the official version. – Sebastiaan van den Broek Apr 13 '19 at 05:41
  • @SebastiaanvandenBroek I think [so](http://hg.openjdk.java.net/jdk6/jdk6/jdk/file/b2317f5542ce/src/share/classes/java/lang/String.java#l1024). The sample there clearly shows the meat of it which is good enough to convey my very specific message here. – Unmanned Player Jun 03 '19 at 02:10

3 Answers3

17

Why is the JVM version faster than it's copy-pasted version. Isn't it effectively the same?

Surprisingly, it isn't.

String comparison is such an ubiquitous operation that it is almost certainly the case that your JIT compiler has an intrinsic for String.equals(). This means that the compiler knows how to generate specially-crafted machine code for comparing strings. This is done transparently to you, the programmer, when you use String.equals().

This would explain why String.equals() is so much faster than your method, even if superficially they appear identical.

A quick search finds several bug reports that mention such an intrinsic in HotSpot. For example, 7041100 : The load in String.equals intrinsic executed before null check.

The relevant HotSpot source can be found here. The functions in question are:

  848 Node* LibraryCallKit::make_string_method_node(int opcode, Node* str1, Node* cnt1, Node* str2, Node* cnt2) {

and

  943 bool LibraryCallKit::inline_string_equals() {
NPE
  • 486,780
  • 108
  • 951
  • 1,012
  • Even if the JIT doesn't have an intrinsic, the JVM may. It's possible to replace a JDK method with a native method "under the covers", and I wouldn't bet that it's not done. – Hot Licks Mar 23 '13 at 13:23
  • @HotLicks: Does that mean I can write some JNI under the hood so that it uses SIMD like instruction set to perform comparisions which could beat intrinsics or JVM/JDK natives? – Unmanned Player Mar 23 '13 at 13:28
  • @Eshan: You could try. Of course, there is no guarantee that you will succeed. – NPE Mar 23 '13 at 13:32
2

Hotspot allows developers to provide a native implementation of a method in addition of the Java implementation. The Java code is swapped out at runtime and replaced by the optimized version. It is called an intrinsic. Few hundred of methods from base classes are optimized by intrinsics.

By looking at the OpenJDK source code you can see the x86_64 implementation of String.equals. You can also look into vmSymbols to get the list of all instrinsics (search for do_intrinsic)

Clément MATHIEU
  • 3,030
  • 23
  • 25
-1

As you know, JAVA truly is neither a compiler based nor interpreter based language. By this I mean is java uses both i.e. a compiler to convert source code into an intermediate form which is then interpreted at runtime.

It marks the LoC (Line of Code) executed to know which part of the code is run more frequently. As soon as JAVA figures out a part of code running more than a given threshold, it marks it hot and sends this piece of code to the compiler on the fly for a better-optimized version to be run when asked for the next time. This is called JIT (Just in Time)

Now, since both the code are exactly similar, JAVA HotSpot should have optimized both the methods exactly the same and hence same execution time. Sadly, this isn't the case.

As soon as, JIT figures out equals() method is hot and is being called too frequently, it goes for a special optimization at the hardware level.
Yes, There is an entirely new set of instructions created by Intel to speed up text processing code. This means, there is a separate instruction to make Strings.equals() method more faster than your copy-pasted equals method.

Now the question is how does this happen. Well, this is simple, Whenever String.equals() is warm (i.e. used more often but not heavily) the compiler does the same optimization as with the copy-pasted method. But as the equal() method gets hot, JIT directly uses the new instruction set for string comparison and Hence fast execution.

swayamraina
  • 2,958
  • 26
  • 28
  • like? As far as I understand JIT, this is how it works – swayamraina Aug 01 '18 at 06:59
  • "JAVA truly is neither a compiler" -- Without a compiler, you can't translate source code to byte code. What is `javac` then? :) warm, cold is not explained.. This is not your university answer sheet and I'm not your professor.. Please study the technology first, then answer. – Unmanned Player Aug 14 '18 at 03:52
  • By this, I meant was java uses both i.e. a compiler to convert source code into an intermediate form which is then interpreted at runtime – swayamraina Apr 13 '19 at 05:29