Arm Limited, the company behind the ARM architecture, publishes the AARCHMRS dataset (“Arm Architecture Machine Readable Specification”), a detailed description of an ARM processor as a set of JSON files.
Today I published a set of crates:
aarchmrs-parser: a set of definitions for parsing the AARCHMRS dataset; for now it covers only the instruction description file.
aarchmrs-instructions: a set of functions derived from the AARCHMRS that let you build instruction codes.
aarchmrs-types: a helper crate with useful types for aarchmrs-instructions, which now contains only generated code.
The license is the same as the original dataset: BSD-3-Clause.
The sources and generation tooling are available at https://github.com/monoid/harm/.
Bits of history
When I experimented with this dataset, my initial version simply generated functions for instruction synthesis[1].
Then I thought about a more flexible approach: what if, instead of a function, each instruction were a struct that holds the arguments, so that you could change those arguments and only then build the instruction?
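To make the difference between the two approaches concrete, here is a minimal sketch for a single instruction, the 64-bit A64 `ADD <Xd>, <Xn>, #imm12`. The names `add_x_imm` and `AddXImm` are made up for this illustration and are not the API of aarchmrs-instructions; the shift flag is left at zero to keep it short.

```rust
/// Function style: all arguments in, 32-bit instruction code out.
/// Hypothetical name, not taken from the published crates.
pub const fn add_x_imm(rd: u32, rn: u32, imm12: u32) -> u32 {
    // 0x9100_0000 sets sf=1, op=0, S=0, the fixed bits 100010 and sh=0.
    0x9100_0000
        | ((imm12 & 0xfff) << 10) // imm12 lives in bits 21:10
        | ((rn & 0x1f) << 5)      // Rn lives in bits 9:5
        | (rd & 0x1f)             // Rd lives in bits 4:0
}

/// Struct style: the arguments are stored, can be changed, and are
/// encoded only at the end. Also a hypothetical name.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct AddXImm {
    pub rd: u32,
    pub rn: u32,
    pub imm12: u32,
}

impl AddXImm {
    pub const fn encode(self) -> u32 {
        add_x_imm(self.rd, self.rn, self.imm12)
    }
}

fn main() {
    // Function style: one call, one instruction code.
    let direct = add_x_imm(0, 1, 1);

    // Struct style: build, tweak an argument, then encode.
    let mut insn = AddXImm { rd: 0, rn: 1, imm12: 1 };
    insn.imm12 = 2;

    println!("{:#010x} {:#010x}", direct, insn.encode());
}
```

The struct form is more convenient when arguments are filled in step by step, while the function form is less code to generate and to compile.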
My third approach was using various bit-field crates (every ARM instruction is
a 32-bit value, so bit-fields seem to be a natural approach).
Unfortunately, it proved to be extremely slow: bit-field structs are implemented
with a proc-macro, so every definition goes through a separate macro expansion
that parses the input code and produces the output. It also generated huge
amounts of data in the target directory.
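For readers who have not used bit-field crates: the idea is to view the 32-bit instruction word as a set of named fields with getters and setters. The sketch below is a hand-written stand-in for that idea, not the output of any particular crate; `Insn`, `field` and `with_field` are names invented for this example.

```rust
/// A 32-bit instruction word viewed as a collection of bit-fields.
/// Hand-written illustration; real bit-field crates generate similar
/// accessors from a proc-macro.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Insn(pub u32);

impl Insn {
    /// Read `width` bits starting at bit `lsb` (width must be 1..=31).
    pub const fn field(self, lsb: u32, width: u32) -> u32 {
        (self.0 >> lsb) & ((1u32 << width) - 1)
    }

    /// Return a copy with the `width` bits at `lsb` replaced by `value`.
    /// Bits of `value` that do not fit into the field are dropped.
    pub const fn with_field(self, lsb: u32, width: u32, value: u32) -> Insn {
        let mask = ((1u32 << width) - 1) << lsb;
        Insn((self.0 & !mask) | ((value << lsb) & mask))
    }
}

fn main() {
    // 0x91000420 is the A64 encoding of `ADD X0, X1, #1`.
    let add = Insn(0x9100_0420);
    println!(
        "rd={} rn={} imm12={}",
        add.field(0, 5),
        add.field(5, 5),
        add.field(10, 12)
    );

    // Change the destination register to X3.
    let changed = add.with_field(0, 5, 3);
    println!("{:#010x}", changed.0);
}
```

Written by hand like this, the accessors are cheap to compile; the cost described above comes from expanding a macro for every one of the many instruction definitions.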
After some experiments I decided to switch back to function definitions to reduce compilation time as much as possible.
But what impacted the compilation time most was the build.rs file. My original
idea was to generate the code with build.rs. However, there is no point in
generating it on every compilation: it would produce the same output. So
generation ran only when a specific environment variable was set manually.
Even so, the build.rs pulled in a lot of dependencies for an HTTP client (it
would automatically download the AARCHMRS file if it was missing or broken),
and they were compiled anyway, even when generation was not needed. This is a
huge dependency, so eventually I removed the build.rs and created a tool
(aarchmrs-generate in the GitHub repository above) that generates code only
when it is really needed (when the generator changes or when a new AARCHMRS is
released).
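The environment-variable gate described above might have looked roughly like the following build.rs sketch. The variable name `AARCHMRS_REGENERATE` and the helper `should_regenerate` are assumptions made for this example; the post does not say what the original script used.

```rust
use std::env;

/// Hypothetical variable name, chosen for this example only.
const REGEN_VAR: &str = "AARCHMRS_REGENERATE";

/// Regenerate only when the variable is set to something other than
/// an empty string or "0".
fn should_regenerate(value: Option<&str>) -> bool {
    matches!(value, Some(v) if !v.is_empty() && v != "0")
}

fn main() {
    // Ask Cargo to rerun the script when the variable changes.
    println!("cargo:rerun-if-env-changed={}", REGEN_VAR);

    let value = env::var(REGEN_VAR).ok();
    if should_regenerate(value.as_deref()) {
        // The download and code generation would go here. Whatever
        // crates this branch uses are build-dependencies, and Cargo
        // compiles them even when the branch is never taken.
        println!("cargo:warning=regenerating instruction definitions");
    }
}
```

The gate only skips the work at run time. The build-dependencies are still compiled before the script can run at all, which is why moving generation into a separate tool removes that cost for users of the crate.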
So it now takes just a few seconds to compile aarchmrs-instructions on my good
old MacBook Air M1. I’m quite satisfied with the result.
[1] While “synthesis” sounds really smart, here it simply means “computing something from a set of inputs”.