Benchmarking Java with the Richards benchmark

Richards is an interesting medium-sized language benchmark (400-500 lines). It simulates the task dispatcher in the kernel of an operating system. The original version was written in BCPL by Martin Richards at Cambridge University, England. It has since been translated into many languages, including C, C++ and Smalltalk.

One translation into the Java programming language is due to Jonathan Gibbons, based on a C version by Mick Jordan. This character of this translation is very similar to that of the original BCPL program by Martin Richards, and does not take advantage of the object-oriented features of Java.

I wrote a different version in Java, derived from a Smalltalk implementation by L. Peter Deutsch. Like the Smalltalk version, this is much more object-oriented, and has quite different performance characteristics. This version, in C++ (by Urs Hölzle), Smalltalk and Self, has been used as the basis for comparison in all the papers published by the Self project.

After creating the latter version, I was intrigued by the performance difference between the two Java translations, and decided to study the two versions more closely. In the end I created seven different versions of this benchmark, each exhibiting a different stylistic element and performance characteristic.

The different versions span a range in object-orientedness. One version is the `natural' translation from C code, without particular attention to source-level optimization. Another version uses features of the Java language to improve performance. A third is like the C-based version except that the program state is not as highly encoded.

The other four versions are based on the Smalltalk original, from which a C++ version was also derived. One is a `natural' translation from the C++, taking the most obvious mapping from C++ to Java. The others are closer to the Smalltalk version, in that all instance variables are accessed via methods; they vary in how those methods are declared.

What does Richards do?

At the heart of Richards is a task dispatcher (Richards.schedule). Tasks come in 4 different flavors, each represented by a class (DeviceTask, HandlerTask, IdleTask, WorkTask, all subclasses of Task). Each kind of task has an associated work function (fn). At startup (Richards.run), a particular task mix is created, and then the tasks are scheduled, each having its work function invoked. The work functions manipulate work packets and packet queues. At the end of the benchmark (in Richards.run) the number of queued and held packets is checked against the correct value to assist in verifying that the benchmark ran correctly.

1) The Gibbons translation from C (richards_gibbons)

The first version is due to Jonathan Gibbons, and is close to the original BCPL code in operation. The C version is by Mick Jordan. It is very close to the original version in BCPL, and is not very object-oriented. It uses a switch statement (line 127) to distinguish between one of 8 states, encoded in an integer variable (Task.state).

2) Using final to avoid call overhead (richards_gibbons_final)

All calls in richards_gibbons are virtual; richards_gibbons_final defines classes and methods to be final where possible, to diminish the call overhead.

3) Gibbons, but without a `switch' (richards_gibbons_no_switch)

The third version is a simple modification of the first: the 8-valued integer (Task.state) is replaced with three booleans (packetPending, taskWaiting and taskHolding), and the switch in Richards.schedule is replaced by tests of these variables and an extra function, Richards.runTask. This inner loop now is used in all subsequent versions.

4) The Deutsch version without accessors (richards_deutsch_no_acc)

The implementation by Deutsch moves some of the task state into separate objects (DeviceTaskRec, IdleTaskRec, HandlerTaskRec and WorkerTaskRec, all subclasses of TaskRec). This version follows that scheme, requiring more, finer classes and objects, and more calls.

The state within these objects is represented by public instance variables, which are directly accessed where necessary.

5) Deutsch with virtual accessors (richards_deutsch_acc_virtual)

In the remaining versions based on Deutsch, all object state is accessed via methods, according to good encapsulation practices.

This is a natural translation from the C++ version of Deutsch, which in turn was based on the Smalltalk version. Note, however, that where C++ methods default to being statically bound, Java methods default to being virtually bound. This is also the closest to the versions in Smalltalk and Self.

6) Deutsch with final methods and classes (richards_deutsch_acc_final)

In an attempt to regain any efficiency lost by the virtual calls, in this version we make the calls non-virtual by making classes and methods final where possible. Note that there is still a virtual method (Task.fn, overridden in subclasses) at the heart of the program, called on line 247.

7) Deutsch with virtual methods and interfaces (richards_deutsch_acc_interface)

Finally, we make all classes inherit from an interface class, to simulate the most object-oriented framework-like behavior. All methods are virtual.

Performance results and interpretation

Below is a typical performance graph for the 7 versions of Richards run on a JavaTM Virtual Machine (lower points are faster).

Versions 1 and 2 (richards_gibbons and richards_gibbons_final) are almost the same in performance, the extra finals buying little speedup. Version 3 (richards_gibbons_no_switch) is slightly slower. Version 4 (richards_deutsch_no_acc) is slightly slower again. However, when virtual accessors are introduced in #5 (richards_deutsch_acc_virtual) performance degrades significantly. Using finals (#6, richards_deutsch_acc_final) can recover this performance. Using interfaces (#7, richards_deutsch_acc_interface) is as bad as using virtuals, if not slightly worse.

Compare this with the results for C++ (below). There are no results for #2 (it is the same as #1) and #7 (there are no interfaces in C++).

Note that an all-virtual program is much slower in C++ compared to the non-virtual versions (#5 vs #6). Because of this, C++ programmers tend to use virtuals sparingly. When translating from C++ to Java extra effort must be expended to make methods final, as the default binding in Java is virtual.

Contrast this with an experimental implementation of the JavaTM Virtual Machine, called Pep, which was implemented on the Self Virtual Machine (do not compare absolute times as these runs are on a much slower machine).

The Self Virtual Machine performs adaptive optimization and is even capable of inlining virtuals. Note that the curve is essentially flat, with the exception of #1 (Self has no good way of implementing a switch statement). This suggests that adaptive optimization will work for Java (some programs, at least). More details can be found in "Design and Implementation of Pep, a Java Just-In-Time Translator", by Ole Agesen, published in the Theory and Practice of Object Systems, 3(2), 1997, pp.127-155. Similar techniques are used in the `HotSpot' Java VM.

Back to benchmarking overview


Maintained by Mario Wolczko
Last modified: Wed Feb 9 10:55:44 PST