Although our HLVM project is intended to bring modern VM techniques to modern programming languages without requiring any new research, we have ended up producing several enlightening new results. This article takes a look at some of the weird and wonderful techniques we used in HLVM.
GC as a metaprogram (good!)
Whereas most VMs use a garbage collector written in C or C++, the GC in HLVM is written in its own intermediate representation. This unusual design has a variety of benefits:
- The GC acts as a test for the compiler.
- The GC benefits from improvements we make to the compiler.
- The GC is simplified by being able to use features like tail call elimination and higher-order functions.
Overall, implementing the GC in HLVM's own IR proved to be hugely beneficial: the GC was very easy to write and is easy to maintain.
Fat references (bad!)
Most VMs store header data in each heap-allocated block that describes the size and type of a value in order to allow the garbage collector to traverse it. In HLVM, we moved this header data into the reference itself. This allows C data structures to be used directly without having to be copied in order add the header, simplifying and improving the efficiency of the foreign function interface (FFI).
This adds a 4-word space overhead for duplicated references. Fortunately, duplicate references are rare and no significant space overhead has ever been observed in practice.
However, there are two disadvantages. Firstly, we have observed significant performance degradation as fat references are pushed onto the stadow stack, requiring 4× more bandwidth than 1-word references would. Secondly, our fat references cannot be updated atomically so our VM is not memory safe.
Lack of memory safety is a major concern and the best solution we have come up with to date is to resort to 1-word references as pointers to heap-allocated header information. Note that HLVM currently calls malloc twice for each allocation (once to allocate a mark bit and again to allocate the actual data) so moving the header information back into the heap need not require more allocations. The obvious solution of placing the header alongside the mark bit would incur false sharing when the GC thread writes to the mark bit and a mutator reads the header information. If that proves to be a problem then a better solution might be to store an index that can be used to look up the mark bit and header data from two separate arrays.
Shadow stacks (adequate!)
Xavier Leroy dismissed our proposed shadow stack, saying "probably the worst approach performance-wise; even a conservative GC should work better". Our motivation for using shadow stacks was always simplicity rather than performance. Shadow stacks are simply because LLVM's GC support is highly experimental (to date, only two people claim to have managed to get it to work) and shadow stacks facilitate accurate GC in an uncooperative environment). In fact, our results have already proven that Xavier's statement is not true in general and, in fact, we believe it may even be wrong in general.
Specifically, of the six benchmarks from our benchmark suite than handle references in the inner loop (thus incurring manipulation of the shadow stack), HLVM is significantly faster than OCaml on four of them. So our shadow stack cannot be degrading performance that much! Moreover, OCaml is 9× faster than HLVM on our n-queens benchmark only because its generational collector allows short-lived values to be recycled very efficiently and that has nothing to do with the shadow stack.
So we believe shadow stacks are an adequate solution.