Thursday, 2nd April 2023
Back when I started my career in tech I was hired as an Erlang developer. At the time I didn’t know the language but quickly fell in love with its pattern matching and concurrency mechanism.
In a nutshell, Erlang spawns lightweight processes to handle concurrency with message passing between them and the main process. This allows it to parallelize code in a significant way, not to mention handle incoming HTTP/HTTPS connections on a very large scale. However, Erlang syntax can be tricky especially for developers without a ton of experience. Elixir exists and does have a much more user-friendly syntax, however it’s not quite at the level of ease-of-use that I would like. Because of this, I was inspired to implement this concurrency in a language which is more plain-English and reduces the barrier of entry for programmers coming from other languages. It’s python, I was essentially inspired to write a Vm which runs a Python dialect.
There are essentially three components I have to implements:
To start with, I decided I didn’t want to use an existing instruction set. I figured, I’m controlling the VM so might as well make the assembly easier on myself (especially so I can write Quiver assembly (QASM) more easily for testing purposes). Parsing was easy, take each line, ignore comments and blank lines, then split the remaining into slices of string (I’m using Golang for this just because I like the language, will probably port it over to C or C++ at some point for performance, maybe even Rust!). For the splitting we want to respect things like curly braces, quotes, etc. but that’s easy to add into a parser. This takes us something like:
# Store data in memory
.literal newline "\n"
.literal message "Hello, "
.literal name "World!"
# Output our message
output message
output name
output newline
# Explicitly bail out of the program
stop 0
[".literal", "newline", "\"\n\""]
[".literal", "message", "\"Hello, \""]
[".literal", "name", "\"World!\""]
["output", "message"]
["output", "name"]
["output", "newline"]
["stop", "0"]
We can now go through each of these lines and process them into bytecode. To start with, we look for “dotcodes”, These are opcodes (each line starts with one) that start with a period. In this case, the .literal
dotcode denotes variables that should be initialized with a value at the start of the program.
After that, we process the other opcodes and their arguments and turn those into bytecode as well.
The structure of a Quiver compiled (QVC) program is as follows:
[ ............................. ] [ ............................... ]
| | | |
+---- initial variable data ----+ +---- bytecode for operations ----+
Initial variable data is encoded as such:
[ 1 byte ] [ 1 byte ] [ ? bytes ] [ 1 byte ] [ ? bytes ]
| | | | | | | | | |
+---- datatype ----+ +---- variable name length ----+ +---- variable name ----+ +---- data length ----+ +---- data ----+
All the variable are strung together into a single block at the start of the bytecode which has a prefix denoting how mny bytes the variable data takes up.
After the variable data, we encode the opcodes and their arguments.
These are encoded as:
[ 1 byte ] [ 1 byte ] [ ? bytes ] [ .......................................... ]
| | | | | | | |
+---- opcode ----+ +---- argument length ----+ +---- argument ----+ +---- additional arugments if applicable ----+
We then append this to the variable data to get a final bytecode with the structure
[ 8 bytes ] [ ? bytes ] [ ? bytes ]
| | | | | |
+---- initial variable data length ----+ +---- initial variable data ----+ +---- opcodes ----+
Now we get to the real fun part.
First, we have to parse the bytecode from our input file. To do this, we look at the first 8 bytes and get the amount of data we need to load in. After that, we go through that many bytes and read in each variable, storing them in a data struct which our VM holds.
Next, we read the remaining bytecode loading them into a slice of Instruction structs. Finally, we start at the first instruction and run through each one, keeping track of where in the program we are (so that jumps are able to go to another line if the are called). Each operation is called based on the opcode with its arguments parsed and used in the operation. For example, calling:
output message
or, represented in our bytecode as:
[ 15 ] [ 7 ] [ 109 101 115 115 97 103 101 ]
| | | | | |
+---- output opcode ----+ +---- length of variable name ----+ +---- variable name ----+
the VM performs the following actions:
Access our variable struct and check to see if that variable name
Grab the value from our variable struct at that variable name
Print the value to the screen
Each opcode is run in some manner similar to that, with each one performing different actions.
Currently concurrency (and a bunch of other features) haven’t been implemented yet, however with this structure it should be relatively simple to implement.
Stay tuned for more updates on Quiver as I implement concurrency and a standard library!