
The Virtual Machine (cont.)

A Micro Benchmark

To illustrate the impact of adaptive optimization on Smalltalk performance, let's look at a very small micro-benchmark. A micro-benchmark is too small to predict performance very accurately; however, it can be used to demonstrate the basic points and to show the compilation process in action.


[Browser: a micro-benchmark in the Test class]

The above micro-benchmark, #simpleTest:, simply loops 10 million times, each time storing an integer into an array. A really good optimizer could probably optimize away the store itself, but Strongtalk does not currently do this, so the benchmark really does do the work it appears to. Run the benchmark by clicking on the following doIt: Test benchmark: [ Test simpleTest: 10000000 ]. The benchmark is run 10 times, with the number of milliseconds for each run displayed in the Transcript, followed by the best time. I intentionally did not use any type annotations in this benchmark, so that it is clear that the type system is not used for optimization.
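For readers who don't have the browser open, here is a minimal sketch of the shape of #simpleTest: as just described; the actual method on the class side of Test may differ in detail (the array size and the value stored are illustrative):

    simpleTest: n
        "Loop n times, each time storing an integer into an array."
        | array |
        array := Array new: 1.
        1 to: n do: [ :i | array at: 1 put: i ]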

The first thing to notice is that after one or two runs, the times drop dramatically. This is because the code is initially not determined to be performance-critical, so it is run by the interpreter. At some point it is compiled, and subsequent runs are much faster. The compiler actually runs in the middle of the first run or two, but the current version of the VM doesn't use the compiled code until the next run, which is why it is important to run benchmarks multiple times. It is technically possible for the VM to switch to fully optimized code in the middle of the loop, but that work was not completed on this VM.
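The timing harness itself is straightforward. As a sketch (the real Test class>>#benchmark: may be written differently; Time millisecondsToRun: is assumed here purely for illustration), it amounts to something like:

    benchmark: aBlock
        "Run aBlock 10 times, print each run's milliseconds to the
         Transcript, then print the best time."
        | best ms |
        best := nil.
        10 timesRepeat: [
            ms := Time millisecondsToRun: aBlock.
            Transcript show: ms printString; cr.
            (best isNil or: [ ms < best ]) ifTrue: [ best := ms ] ].
        Transcript show: 'best: ', best printString; cr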

On my machine, a 1 GHz Intel Pentium III (Tualatin) running Windows XP, this benchmark produced a best time of 100 milliseconds. Under VisualWorks V5i.4 (non-commercial), which does dynamic translation to native code and is, as far as I know, the fastest previous implementation of Smalltalk, it produced a best time of 456ms. This is a speedup factor of 4.6. As mentioned before, micro-benchmarks are not that accurate, and this is probably an overstatement of the actual speedup that Strongtalk would produce for a real program. On a larger, more representative set of benchmarks, we computed a number of years ago that Strongtalk's speedup over VisualWorks was approximately 3.5 on Smalltalk code written in a normal style, although this has not been rechecked recently.

Adding Sends and a Block Closure

Now, let's do something more interesting. One of the main points we have been making is that Strongtalk is not only much faster in an absolute sense, but that it also dramatically reduces the relative cost of writing well-structured code (i.e. code that is more finely factored and uses block closures freely to implement custom control structures). So let's add some message sends and a block closure to our benchmark and see what the effect on performance is.

Open the method #notSoSimpleTest: in the browser above. Follow its execution path down into #fancyStoreIntoArray: and then into #evaluateBlock:. You can see that it does the same basic computation that #simpleTest: did, but it moves the array store down into a block closure in another method, which is then passed to yet another method and finally evaluated. In other Smalltalks this adds a large amount of additional work: not only do we have three additional message sends, but we are forcing a block closure (of the copying type) to be created in the intervening method, which normally requires an actual closure object to be allocated every time #fancyStoreIntoArray: is called (to hold the array reference). This is a big cost in performance-critical code. Here is a doIt that runs the #notSoSimpleTest: benchmark: Test benchmark: [ Test notSoSimpleTest: 10000000 ].
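As a rough sketch of the call structure just described (the real methods in the Test class will differ in details such as the exact value stored; what matters here is the extra sends and the closure capturing the array):

    notSoSimpleTest: n
        "Same loop as #simpleTest:, but the store is delegated
         through two more methods and a block closure."
        | array |
        array := Array new: 1.
        1 to: n do: [ :i | self fancyStoreIntoArray: array ]

    fancyStoreIntoArray: array
        "The block created here captures the array reference, so a
         closure object would normally be allocated on every call."
        self evaluateBlock: [ array at: 1 put: 1 ]

    evaluateBlock: aBlock
        "Evaluate the block, performing the actual store."
        ^aBlock value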

On my machine, VisualWorks runs this new benchmark in 1232ms, which is 170% slower than the first version of the benchmark. Strongtalk runs it in 136ms, which is only 36% slower than the first version. And in fact, with a small amount of improvement to the Strongtalk code generator, it should be able to run this benchmark without any slowdown at all, since the adaptive optimizer is already completely eliminating the additional message sends and the block closure.
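To see why no slowdown should ultimately be needed: once the optimizer has inlined #fancyStoreIntoArray:, #evaluateBlock:, and the block itself back into the calling loop, the work left per iteration is roughly just

    1 to: n do: [ :i | array at: 1 put: 1 ]

which is the same store that #simpleTest: performs, so the remaining 36% gap is a matter of code generation rather than of message dispatch or closure allocation.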
