Programming an Assembly Code Simulator

As a DSL in Julia

In the past I’ve played a lot around with the AVR microprocessor used on the Arduino board. For people hobbyists it is a great board. I’ve used it to control a toy factory I’ve built with Fischertechnik parts. It is somewhat similar to Lego Mindstorm.

Image for post
The Arduino board which popularized the 8-bit AVR micro-controller. Essentially it is a 20 dollar computer which fits in your hand, which you can use to control electronics and hardware projects easily.

Arduino is usually programmed with C/C++, but I quite liked using assembly language. I’ve always had a fascination for assembly language ever since I played around with Motorola 68000 assembly on my Amiga 1000 computer.

I was curious to see how hard it would be to create a simulator of AVR assembly code using the Julia programming language. While you can do this with almost any language, Julia has a number of powerful features which makes this job a lot easier.

Lets first discuss how you would normally do this in a normal programming language.

From Lexer to Parser to an Abstract Syntax Tree

Typically, we would divide the task of compiling and running a program into several parts. First we use a Lexer to turn all the text representing code into a list of tokens. Next the parser finds out which tokens belong together grammatically. Eventually this is used to create an abstract syntax tree.

Image for post
Image for post
Example of an abstract syntax tree for the expression l + p — 10.00

If we are making an interpreter we can evaluate nodes in this syntax tree, recursively to effectively run the program.

So if we were to do this in C/C++, Swift or Java e.g. we would have to create the lexer, parser and define the individual nodes in the abstract syntax tree. We could implement this using the interpreter pattern. I got a previous article where I implement this using the Swift programming language.

However with Julia, we can cheat and avoid doing all of this by using Julia’s own parser, abstract syntax tree and machinery for evaluation of this. Julia is much like LISP, allowing you to treat code as data.

Assembly Code as a DSL

Here is an example of how assembly code can be represented in Julia, using Julia syntax. You can find the whole project on my github repo.

It is a super simple program where we repeatedly add numbers utilizing a loop.

    ldi(r3, 4)        # Load 4 into register R3
ldi(r1, 2) # Load 2 into register R2
:loop # A label to jump to
add(r2, r1)
dec(r3) # Decrement register R3 with one
brne(:loop) # Branch to loop if R3 isn't zero

If this had been in normal assembly syntax it would not have looked that different:

    ldi r3, 4         # Load 4 into register R3
ldi r1, 2 # Load 2 into register R2
loop: # A label to jump to
add r2, r1
dec r3 # Decrement register R3 with one
brne loop # Branch to loop if R3 isn't zero

Implementing Assembly Opcodes

How would you implement this? A naive first approach would be to implement each assembly instructions such as ldi and add as functions in your language of choice.

function ldi(Rd::Register, K::Integer)
Rd.value = tobyte(K)

I have made a register as a variable with a single member, which allows us to store the value of the register. We can’t assign directly to registers because Julia as many other language don’t have pointers. Because had this e.g. been a C program, I could have utilized pointers like this:

void ldi(char *Rd, int K) {
*Rd = tobyte(K);

The tobyte function in to deal with the fact that we want to store integers in a single byte regardless of whether they were specified as signed or unsigned. There is no way of specifying this in C, as C only works with concrete types, while Julia can handle abstract types such as Integer which represents integer types of any bit size, signed or unsigned.

However in C/C++ we could probably have solved the problem with a union type:

union Integer {
signed char uint8;
char int8;

Below is how we have implemented add. Interestingly it takes a bit of code to simulate adding on a microprocessor because we need to simulate the updating of status flags, represented as single bits in the status registry. They have name such as:

  • C Carry flag. Set to one if adding two 8 bit numbers is larger than 255.
  • Z Zero flag. Set to one if the result of the last operation was zero.
  • N Negative flag.
  • V Two’s complement.
function add(Rd::Register, Rr::Register)
sum::Int8 = Rd + Rr
if (Rd >= 0 && Rr >= 0 && sum < 0) ||
(Rd < 0 && Rr < 0 && sum > 0)
Rd.value = sum

Our update_sreg() function deals with updates status bits which are the same for most assembly opcodes.

How to Simulate Control Flow

The implementation of ldi and add would be easy to do in any programming language, not just Julia.

However trying to implement an assembly code simulator as a DSL this way, will only allow you to create programs where every instruction is executed in sequence exactly once.

But that would be useless. We want to execute some instructions multiple times and others conditionally.

In our program example we jump to the :loop label repeatedly until register r3 has reached zero.

We can to this by putting our instructions in an array, and then deciding which instruction should be executed next by using an index into this array of instructions.

function run_program(cpu::CPU, pcode::Vector{Expr})
const max_no_instructions = 100
cpu.pc = 1 # Set Program Counter to Beginning
for i in 1:max_no_instructions
eval(pcode[cpu.pc]) # Run a single assembly instruction
cpu.pc += 1 # Advance to next instruction
if cpu.pc > length(pcode)
println("Out of bounds: $(cpu.pc), exiting program.")

That is what I do in this code. The code is in an array called pcode where there is a single instruction at each index. A simple struct called cpu holds various status about the simulated state of my virtual AVR microprocessor. pc simulates the program counter in a microprocessor. I use it to pick an instruction to evaluate next with eval().

By default pc advances by one after the execution of each instruction. But we can manipulate this to change the order of execution.

This is how I implement the BRanch if Not Equal (BRNE) opcode:

function brne(k::Integer)
if !is_set_Z()
cpu.pc += k

It works by checking if the previous operation, dec(r3) in our case, resulted in zero. This would cause the status register bit Z to be set to 1.

You can see this does a relative jump. We don’t set the program counter to an absolute address, but update it with an offset.

Relative branching is normal on microprocessors because otherwise you would need larger instructions. Each instruction on the AVR are 16 bit long, but so is the addressable memory on the chip as well. Obviously we can’t fit a 16 bit address into a 16 bit instruction, hence the usage of relative addresses.

Anyway, this instruction rely on people manually computing offsets to the place they want to jump to. Normally people writing assembly code don’t want to wast time on that, so we use labels:

brne(symbols[label] - cpu.pc)

The version using labels, just look up the position of the label in a dictionary. Then we have to compute the offset to this label.

Creating an Abstract Syntax Tree the Lazy Way

So we know how to jump around in the code, but how do we actually get the objects representing individual instructions, which we can evaluate whenever we want to?

Previously I’ve shown how to create an abstract syntax tree in Swift, but we’ll cheat and use the syntax tree produced when Julia code is parsed.

Let’s write a simple expression at the Julia REPL:

julia> 2 + 3

You see it is evaluated immediately to 5. But we want to only evaluate at our chosen time. We can quote expressions using :() to turn them into expression objects:

julia> exp = :(2 + 3)
:(2 + 3)

These objects can be evaluated using Julia’s eval() function.

julia> eval(exp)

However since we are going to keep our assembly code in a separate file, we want to be able creating expressions from strings:

julia> exp = parse("2 + 3")
:(2 + 3)

To do that, I use this function which takes a filename, reads every line in the file as a string, and applies the parse() function to each of the lines.

function assemble_program(filename::String)
expressions = map(parse, readlines(filename))

This gives us an array of expression objects in expressions which we pass onto an assemble_program() function dealing with an array of expressions.

Before we can process expression objects, we need to know what they look like.

julia> exp = :(2 + 3)
:(2 + 3)

We can use dump() to look at the structure of an object.

julia> dump(exp)
head: Symbol call
args: Array{Any}((3,))
1: Symbol +
2: Int64 2
3: Int64 3
typ: Any

head and args are the most important fields. head says what type of expression we got, while args, in the case of function calls contains the function being called, and its arguments.

So this is what I utilize in my assemble_program() function processing an array of expressions and or symbols.

function assemble_program{T}(expressions::Vector{T})
pcode = Expr[]
symbols = Dict{Symbol, Int16}()

for exp in expressions
if isa(exp, Expr) && exp.head == :call
push!(pcode, exp)
elseif isa(exp, QuoteNode) && isa(exp.value, Symbol)
symbols[exp.value] = length(pcode)
(pcode, symbols)

In the for-loop we check the types of the objects in the array to check whether we are dealing with an expression or symbol (used as labels).

If we got an array we just add it to our code array. If we got a symbol we figure out the current index in the code array, so we can store that as the “address” the label represents.

This is essentially how my simulator works. But let me elaborate a bit on how Julia simplifies the work of doing this kind of work, using code generation and macros.

Creating Functions for Handling Status Register Bits

Remember we got status register bits saying something about the status of the last assembly instruction.

const C  = 0 # Carry flag
const Z = 1 # Zero flag
const N = 2 # Negative flag
const V = 3 # Two's complement
const S = 4 # For signed tests
const H = 5 # Half carry
const T = 6 # Transfer bit

I got functions such as set_C() and clear_C() to deal with this.

set_C()    = sreg |= 1 << C
clear_C() = sreg &= ~UInt8(1 << C)

In fact I got more functions for each bit. I also got is_set_C() and get_C() as well. Writing that for every status bit is cumbersome. I can get Julia macros to solve that problem.

for flag in collect("CZNVSHT")
# Get which bit number the flag is at
flag_bit = eval(Symbol(flag))
@eval begin
$(Symbol("set_", flag))() = sreg |= 1 << $flag_bit
$(Symbol("get_", flag))() = (sreg >> $flag_bit) & 0x01
$(Symbol("is_set_", flag))() = sreg & (1 << $flag_bit) != 0
$(Symbol("clear_", flag))() = sreg &= ~UInt8(1 << $flag_bit)

Going through everything about how this works, is probably a bit outside the scope of this post. But let me explain roughly what happens.

  1. I turn the string “CZNVSHT” into an array of individual characters, and loop over each of them.
  2. I turn the character into a symbol, which when evaluated gives us the value of the variable with the same name, hence the number of the bit.
  3. With the @eval macro I can define a block of code, where I can interpolate variables, just as if this was a text string. So e.g. $flag_bit turns into the actual number of one of the status bits.
  4. I can use Symbol() to combine strings into a symbol. The names of variables and functions are symbols in Julia.

Defining Registers as a Special Number Type

The AVR CPU has registers in the range r0 to r31. I don’t want to accidentally mix up registers with numbers representing addresses or immediate values. I also want to make sure I don’t accidentally specify a number for a register out of range. I can solve this by making a special register type:

mutable struct Register
Register() = new(0)

The value field holds the current value of the register. I can create each register like this.

const r0 = Register()
const r1 = Register()

But that is cumbersome and boring when you got powerful macros. So instead we do:

for i in 0:31
@eval const $(Symbol("r", i)) = Register()

If you’ve ever worked with C/C++ macros using a preprocessor you probably think all this looks very scary. But I promise you it is nothing like that. For one thing it is not based on primitive string substitution, and Julia actually keeps track of where generated code was generated. So if I try to lookup a generated function at the interactive Julia REPL, it will take me straight to the code generating that particular function.

Typically I want to perform operations on registers such as adding or subtracting and I don’t want to write this every time:

r1.value + r2.value
r2.value - r1.value

So again we can use Julia code generation to define all these operations.

for op in [:+, :-, :*, :/, :>, :<, :(==), :(!=), :(<=), :(>=)]
@eval $op(x::Register, y::Register) = $op(x.value, y.value)

To understand how this works, you need to know that by prefixing with a colon, we turn, a variable or function into it’s corresponding symbol. Also observe that operators are just functions in Julia.

3 + 2
+(3, 2)

Are equivalent forms, which means you can define the plus operator like this:

+(x::Register, y::Register) = +(x.value, y.value)

Next Time

This got longer than I expected. The problem is that when you get used to writing Julia code, you forget how much “magic” in a sense you are using, which would not necessarily be obvious to a reader with just casual interest in Julia. I will try to follow up this later by writing about how you would implement the interpreter pattern in Julia, as well as more about the AVR microprocessor and how we can built an actual assembler for it in Julia, not just a simulator.

We can use many of the same tricks for that.

Written by

Geek dad, living in Oslo, Norway with passion for UX, Julia programming, science, teaching, reading and writing.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store