Path: blob/master/doc/compiler/jitserver/CHTable.md
6000 views
JITServer CHTable design and implementation
To begin, this document will describe the vanilla (non-JITServer) implementation of the CHTable, then will go on to describe the changes made to support JITServer.
Vanilla CHTable
The CHTable (Class Hierarchy Table) is used to perform codegen based on assumptions about the structure of an application's class hierarchy. It's split into two (perhaps poorly named) components: the TR_CHTable
and the TR_PersistentCHTable
. The CHTable can be disabled using the JIT option disableCHOpts
. It is not used for AOT except under rare circumstances.
The TR_CHTable
(found in env/CHTable.{hpp,cpp}
) isn't really a class hierarchy table, but instead stores information about which assumptions have been made during the codegen of a particular method. However, not all of that information is stored in the CHTable
. Some of it (virtual guards and side effect guard patch sites) is instead stored in the Compilation
object, for reasons which are unclear since they are only ever used by the CHTable. At the end of a compilation, TR_CHTable::commit
is called by the compilation thread. The commit operation verifies that all the assumptions made during the compilation still hold, and if so, it converts them into runtime assumptions which are stored in the method metadata. These assumptions generally cause a recompilation if they are violated, but can sometimes be fixed by patching the affected method. If any assumptions have been violated during the compilation before the commit operation, the compilation will be failed.
The TR_PersistentCHTable
(found in env/PersistentCHTable.{hpp,cpp}
) is where the class hierarchy is actually stored. There is a single global instance of it stored in the persistent info, which is created in rossa.cpp. It contains a hash table holding TR_PersistentClassInfo
for all loaded classes, keyed by the corresponding OpaqueClassBlock*
. The persistent class info holds a list of subclasses as well as various other flags and metadata. It also optionally holds a TR_PersistentFieldInfo
for each field in the class, but that is only set by class lookahead, which is disabled under normal circumstances. The PersisitentCHTable has methods to
directly find info about a specific class (
findClassInfo[AfterLocking]
), which are dangerously low-level and should probably be removed (details provided later).query specific properties of a class (
findSingleInterfaceImplementor
,isKnownToHaveMoreThanTwoInterfaceImplementers
, etc).update the table with info about newly loaded/unloaded classes (
classGotLoaded
,methodGotExtended
, etc).commitSideEffectGuards
, which is related to the commit operation ofTR_CHTable
and seemingly has no good reason to be a member ofTR_PersistentCHTable
instead of being along with the rest of the commit functionality inTR_CHTable
.update the
visited
flag on classes (resetVisitedClasses
), which is used inTR_CHTable
for building up lists of classes and iterating over them, and again should probably be in that class instead.
Most of the operations performed on the TR_PersistentCHTable
require the class table mutex to be held. The class hash table is implemented in a peculiar way, where TR_PersistentClassInfo
inherits from TR_Link0<TR_PersistentClassInfo>
(CRTP in action!), effectively making every TR_PersistentClassInfo
object into a linked list. This object cannot be represented without being in a linked list! These linked lists are only used in the existing code to store entires within a bucket of the hash table. This is part of why I claim findClassInfo
is dangerously low level - it exposes the internal structure of the hash table in an extremely unusual way, in which it could easily be mistaken by a programmer for being a different data structure. Use caution when working with this code.
JITServer CHTable
4 new files were added:
env/JITServerCHTable.{hpp,cpp}
env/JITServerPersistentCHTable.{hpp,cpp}
The architecture of CHTable functionality remains similar with JITServer support, with env/PersistentCHTable.{hpp,cpp}
containing JIT{Server,Client}PersistentCHTable
, and env/JITServerCHTable.{hpp,cpp}
containing commit related functionality. Some changes were made elsewhere in the codebase to accommodate for JITServer support.
In rossa.cpp
, TR_PersistentCHTable
allocation was modified so that the correct JITServer subclass is allocated when in JITServer mode. The reason why a client override is necessary is due to the heavy use of caching by the JITServer CHTable implementation. The server holds a cache of each client's TR_PersistentCHTable
, which is updated with a delta once per compilation. The cache is initialized with any classes that are loaded early on upon the first interaction with the CHTable. To send the delta, the client needs to keep track of what changes have been made to the table since the last delta was sent. JITClientPersistentCHTable
simply keeps track of changes while deferring functionality to the base class. (De)serialization of deltas is performed by methods matching [de]serialize*
in env/JITServerPersistentCHTable.cpp
and is currently done manually by writing data into a string because the new tuple serializer had not yet been written at the time (TODO: use the tuple serializer). This is also where the other issue with findClassInfo
shows up - we don't know if the class info will be modified following a call to findClassInfo
, so we have to conservatively assume it will be modified, which may lead to larger than necessary deltas. These deltas are requested by the server before within wrappedCompile
before codegen starts.
Since different clients can have different class hierarchies, a server needs to keep a seperate cache for each client. This is done by associating caches with client IDs within the JITServerPersistentCHTable
. The server persistent CHTable overrides the findClassInfo
methods to return cached class info, which gets the other query methods working for free. The table update related methods will never be called on the server since it never loads classes, so we ignore them. The server keeps track of the last time it contacted each client, and will free the associated memory if it hasn't heard from a client in over 5 minutes.
Normally, the commit
operation is performed by the compilation thread following a compilation. For JITServer, we cannot perform it in the same place because creating runtime assumptions on the server is meaningless; the server never loads any class and thus would not have anything to check the assumptions against. So, we need to perform the commit on the client, using the data generated by the server during compilation. However, the point at which the server would normally call commit
is before the client has loaded and relocated the code, which prevents the runtime assumptions from being relocated. But, the client loads and relocates the code after the final message sent by the server. So we have to send the data required to perform the commit along with the final message and have the client perform the commit after loading the code. This is implemented in JITServerCompilationThread.cpp
. The call to computeDataForCHTableCommit
is performed on the server and the data is transferred to the client where JITClientCHTableCommit
is called.
Because the vanilla commit
code uses a bunch of linked lists of complex types including symbol references (which only exist on the server) and networking code gives us vectors when deserializing, the commit code has been duplicated to make use of stripped down data types which only include relevant information. This is unfortunate, but it would be rather complex to support a single templated implementation which handles both data formats, and it would be inefficient to convert between the two data formats. So for this "protoype" CHTable implementation, code duplication is the only reasonable way to make things work. Still, the data is quite complex, so the serialization infrastructure was upgraded to support arbitrarily nested tuples and vectors of data.