What the Heck is a Micro-Operation?
When reading about microprocessors, RISC and CISC, you will stumble upon this term. What does it mean?
If you have been reading my various microprocessor-related stories, you have probably come across the term micro-operation more than once. Here I will try to go a bit deeper and explain better what a micro-operation is.
Read more: How Does a Modern Microprocessor Work?
Read more: What Does RISC and CISC Mean in 2020?
The diagram below zooms in on one particular part of what the microprocessor does. It shows instructions coming from memory and moving into the instruction decoder inside the microprocessor (CPU). The decoder chops each instruction up into what we call micro-operations, which get fed to the control unit.
That is a bunch of stuff, which we need to unpack. First of all let me remind you what an instruction is. A computer program is made up of multiple instructions stored in memory. When the program runs, the CPU pulls instructions, one at a time, into the decoder.
Okay, that is a simplification. In reality there could be CPU cache in between memory and the decoder. There could be multiple decoders. But let us ignore all that. You don’t need those details to understand the principles going on. Cache, for example, is really just another form of memory. In an idealized world there would be no cache, just a bunch of super fast memory (RAM).
My diagram is actually supposed to show a CISC processor such as an x86 from AMD or Intel. The little yellow boxes representing instructions are of variable length because that is what they are on x86. They could be anything from 1 to 15 bytes. On most RISC processors such as ARM, RISC-V, MIPS and PowerPC, they have a fixed length: 4 bytes.
But the decoder chops all these variable length instructions into equal length micro-operations.
How is a Micro-Op Different From an Instruction?
In my illustration I am trying to get across two important points about micro-operations:
- They do one tiny little thing. Typically a 1 clock cycle operation. That is why I made them short. One instruction will often turn into multiple micro-operations.
- They are wider. Meaning they consume more bytes than a regular instruction.
This is why I drew the orange boxes thinner than the yellow ones: They do much less. But they are taller than the yellow instruction boxes because they consume far more space.
Hence the word “micro” comes from doing very little, not from being small in size. Going by size, they should have been called bigly-ops.
Interface vs Implementation
The key difference between instructions and micro-operations is still that instructions represent the interface to a CPU, while a micro-operation is part of the implementation.
An instruction is what you want to do. Micro-ops are how you actually do it. Interfaces are important. Just look at the ports on the back of your computer. The makers of your mouse, network equipment, headphones and keyboard don’t really need to know anything about how your computer actually works. They just need to have a plug physically compatible with the ports in the back of your computer. Further, they need to follow the electrical specification and communication protocols defined.
However stuff such as how much memory your computer has, what kind of CPU it has, what cooling system is used, is totally irrelevant.
Likewise when making a CPU, we don’t want programmers to have to know how many decoders or Arithmetic Logic Units (ALUs) it has, what type of branch predictor it uses, whether it does in-order or out-of-order execution, micro-op caching, or whatever black magic the CPU architects have dreamt up.
The benefit of standardized instructions is that as long as people write programs using these instructions, they know the program will run on your microprocessor. CPU architects are free to change the internals of the CPU to make it run faster as long as they make sure it still understands the same instructions.
Micro-operations are just an implementation detail. A CPU architect can change their format to whatever he or she wants. In fact, he or she doesn’t even need to let there be any micro-operations. Earlier CISC microprocessors did not use micro-ops. Many RISC processors, in particular in the lower end, still don’t use micro-operations.
A very simple CPU without micro-operations will work as shown in the diagram above. Instructions are fetched from memory and brought to the instruction register, which feeds them to the decoder. You see all the red arrows? That is the decoder bossing around all the other parts of the CPU, telling them to activate and do stuff. In a simple CPU, an instruction is decoded and this causes various control lines to be turned on and others to be turned off. The combination of stuff turned on and off decides what the CPU actually ends up doing.
E.g. the control lines may tell the Registers to send out the numbers held in register r2 and register r4. Another control line may enable the ALU so it receives those numbers and adds them.
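To make the idea of control lines a bit more concrete, here is a minimal Python sketch of such a hard-wired decoder. The names `ControlLines`, `decode` and `step`, the tuple format of the instruction and the eight-register file are all my own assumptions for illustration, not how any particular CPU is built.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ControlLines:
    """Hypothetical control lines for a toy CPU (illustrative names only)."""
    read_a: Optional[int]     # register sent to ALU input A, None = nothing
    read_b: Optional[int]     # register sent to ALU input B, None = nothing
    alu_enable: bool          # is the ALU switched on for this instruction?
    write_reg: Optional[int]  # register that receives the ALU result


def decode(instruction):
    """Hard-wired decoder: one instruction in, one set of control lines out."""
    opcode, dest, src_a, src_b = instruction
    if opcode == "add":
        return ControlLines(read_a=src_a, read_b=src_b,
                            alu_enable=True, write_reg=dest)
    if opcode == "nop":
        return ControlLines(read_a=None, read_b=None,
                            alu_enable=False, write_reg=None)
    raise ValueError(f"unknown opcode: {opcode}")


def step(registers, lines):
    """Apply the control lines to the registers; this toy ALU can only add."""
    if lines.alu_enable:
        registers[lines.write_reg] = (
            registers[lines.read_a] + registers[lines.read_b]
        )


registers = [0] * 8
registers[2], registers[4] = 5, 6
step(registers, decode(("add", 1, 2, 4)))  # r1 ← r2 + r4
print(registers[1])  # prints 11
```

Notice that there is no intermediate format here: the decoder output directly decides what the registers and the ALU do in that step.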
Micro-ops and the Control Unit
With micro-operations this gets more complicated. The control unit takes over the job of the decoder. In this setup the decoder only turns the instructions into micro-ops.
However this is not as fundamentally different as it seems. The decoder would normally have toggled on a number of outputs (control lines). In many ways it still does, but these outputs now get canned in a micro-operation. Hence the micro-operation is really an encoding of all the various parts of the CPU which should be activated and deactivated.
Let us try to clarify this with some examples. I will use pseudo assembly code for a CISC instruction:
add r3, 7    ; r3 ← r3 + memory[7]
This instruction basically says: fetch a number from memory location 7, add it to the number stored in register r3, and store the final result in register r3.
How could we encode this instruction? We need a number to say what opcode we are using. Is it an add or a divide? Computers can only work with numbers so this will have to be a number.
Next we need a number for the register we are using. We could use the number 3 for that. Finally we need a number for the address. If we encode all of this in binary numbers, we get this:
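As a rough sketch of such an encoding in Python, assume a 32-bit instruction word with an 8-bit opcode, an 8-bit register number and a 16-bit address. The field widths, the opcode numbers and the function names `encode` and `decode_fields` are made up for this example and do not correspond to any real instruction set.

```python
# Assumed layout of one 32-bit instruction word (not a real instruction set):
#   bits 31..24  opcode
#   bits 23..16  register number
#   bits 15..0   memory address
OPCODES = {"add": 1, "sub": 2, "mul": 3, "div": 4}  # made-up opcode numbers


def encode(op, register, address):
    """Pack opcode, register number and address into one integer."""
    if not 0 <= register < 2**8 or not 0 <= address < 2**16:
        raise ValueError("field does not fit in its assumed width")
    return (OPCODES[op] << 24) | (register << 16) | address


def decode_fields(word):
    """Unpack the three fields again, which is the decoder's first job."""
    return (word >> 24) & 0xFF, (word >> 16) & 0xFF, word & 0xFFFF


word = encode("add", 3, 7)   # add r3, 7
print(f"{word:032b}")        # the instruction as a string of 32 bits
print(decode_fields(word))   # prints (1, 3, 7)
```

The whole instruction fits in four bytes here, because it only says what we want done, not how the hardware should do it.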
Conceptually this is straightforward to explain. But for the hardware this is not easy. It involves multiple operations:
- It has to fetch the content of memory address 7 and store it in a temporary register called the memory data register (MDR), which is not shown in the diagram.
- Set register r3 and MDR to be the two inputs to the ALU. Enable the ALU and set it to perform addition.
- Store the result of the operation in register r3.
This means our humble little add instruction would have to be turned into 3 different micro-operations. And these have to encode all sorts of details about the CPU operations themselves: which registers are opened, and whether we are reading from or writing to them. We have to enable or disable the MDR register which receives data from memory. Below you can see each of the three micro-operations with a sort of pseudo encoding:
I had to simplify how I show this. In reality we would need fields to specify, for example, enabling and disabling of the program counter register. The output of the ALU would likely be stored in a temporary register. To perform storing of the result, we would need to enable this register for reading, while at the same time enabling the r3 register for writing.
But let us walk through what we have encoded for each row:
- First micro-op: the LSU (Load/Store Unit) is enabled to transmit an address to the memory, selecting a particular memory cell. We specify the address to read. The r/w field is set to 0 to indicate that we are reading from memory. The memory data register MDR is enabled so data fetched from memory can be stored in it.
- Second micro-op: the ALU gets configured to perform addition, while the LSU is disabled. There are potentially two general purpose registers which could serve as input, but we only pick one of them, number 3. MDR is enabled so it can be input to the ALU.
- Third micro-op: MDR is disabled since we should not read from it. Instead we enable register 3 again, to write the result of the ALU into it.
This is a simplification. In reality there would likely be some output register for the ALU. Thus for the third micro-op, we would need to enable reading from the ALU result register and writing to register r3 to make transfer of final result happen.
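The three steps can be sketched as a tiny Python simulation. Everything here is an assumption made for illustration: the field names in `MicroOp`, the `ToyCPU` class, the eight registers and the use of an `alu_out` register to hold the ALU result. A real micro-operation format is a private detail of each CPU design and looks nothing like a Python dataclass.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MicroOp:
    """Hypothetical micro-op: one field per part of the CPU to switch on."""
    lsu_enable: bool = False          # send an address to memory
    address: int = 0                  # address used when the LSU is enabled
    mem_write: bool = False           # r/w line: False = read, True = write
    mdr_enable: bool = False          # MDR takes part in this step
    alu_op: Optional[str] = None      # None means the ALU is disabled
    reg_select: Optional[int] = None  # general purpose register involved
    reg_write: bool = False           # write alu_out into reg_select


class ToyCPU:
    def __init__(self, memory):
        self.memory = memory
        self.registers = [0] * 8
        self.mdr = 0      # memory data register
        self.alu_out = 0  # assumed ALU result register

    def execute(self, uop):
        """Perform the single small step described by one micro-op."""
        if uop.lsu_enable and not uop.mem_write and uop.mdr_enable:
            self.mdr = self.memory[uop.address]
        elif uop.alu_op == "add":
            other = self.mdr if uop.mdr_enable else 0
            self.alu_out = self.registers[uop.reg_select] + other
        elif uop.reg_write:
            self.registers[uop.reg_select] = self.alu_out


# add r3, 7 expressed as three micro-ops
ADD_R3_MEM7 = [
    MicroOp(lsu_enable=True, address=7, mdr_enable=True),  # MDR ← memory[7]
    MicroOp(alu_op="add", reg_select=3, mdr_enable=True),  # alu_out ← r3 + MDR
    MicroOp(reg_select=3, reg_write=True),                 # r3 ← alu_out
]

cpu = ToyCPU(memory={7: 10})
cpu.registers[3] = 5
for uop in ADD_R3_MEM7:
    cpu.execute(uop)
print(cpu.registers[3])  # prints 15
```

Each micro-op carries far more fields than the instruction it came from, which is the point made earlier about micro-ops being wide but doing very little.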
Why Do We Have Micro-Operations?
As I previously said, we don’t actually have to have micro-operations in a CPU. CPUs were made without them before and still are.
For CISC and RISC microprocessors the needs are a bit different. With a CISC processor, instructions vary in both length and complexity, and the two are not the same thing. For example, a four-byte instruction could in principle take 20 clock cycles to perform, while a 15-byte instruction could take 10 cycles.
This irregularity in the clock cycles and the fact that one CISC instruction could need to use a multitude of functional units in the CPU, make it very hard to pipeline CISC instructions. By splitting complex CISC instructions into micro-operations one can pipeline them in similar fashion to RISC instructions.
In fact RISC instructions typically map to single micro-operations.
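Here is a small Python sketch of that difference. The function name `crack`, the tuple layout of the instructions and the micro-op names are hypothetical, chosen only to show a memory-operand instruction turning into several steps while a register-only instruction turns into one.

```python
def crack(instruction):
    """Split one toy instruction into a list of micro-ops (plain tuples)."""
    op, dest, (kind, value) = instruction
    if kind == "mem":
        # CISC-style memory operand: load, compute, then write back.
        return [
            ("load", "mdr", value),        # mdr ← memory[value]
            (op, "alu_out", dest, "mdr"),  # alu_out ← dest op mdr
            ("move", dest, "alu_out"),     # dest ← alu_out
        ]
    if kind == "reg":
        # RISC-style register operand: a single micro-op is enough here.
        return [(op, dest, dest, value)]
    raise ValueError(f"unknown operand kind: {kind}")


cisc_add = ("add", "r3", ("mem", 7))     # add r3, 7   (memory operand)
risc_add = ("add", "r3", ("reg", "r4"))  # add r3, r4  (registers only)
print(len(crack(cisc_add)), len(crack(risc_add)))  # prints: 3 1
```

Once everything is a list of small, similar steps, the later stages of the CPU no longer need to care how complicated the original instruction was.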
But this does not explain why RISC processors may need micro-operations. A key reason is to support superscalar processors. That is what we call processors which execute more than one instruction at the same time. There are many ways of doing that. The most common way today is to use Out-of-Order Execution.
Here we have multiple decoders creating a lot more micro-operations, which we put in an instruction queue. A scheduler then assigns the micro-operations to the control unit to actually execute them. With Out-of-Order execution some micro-operations may skip several places in the queue and run in parallel with other micro-operations further ahead in the queue. In this case it is important to keep track of the original order of the micro-operations, so results can be written back to registers and memory in the correct order later.
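A toy model in Python may help show the idea. This is only a sketch under loud assumptions: the names `QueuedOp`, `is_ready` and `run_out_of_order` are mine, latencies are invented, and the model only checks whether the inputs of a micro-op have been produced yet. It ignores everything else a real scheduler has to deal with.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueuedOp:
    seq: int        # position in program order (0, 1, 2, ...)
    dest: str       # register written
    sources: tuple  # registers read
    latency: int    # assumed number of cycles to finish


def is_ready(uop, uops, done):
    """A micro-op is ready once the producer of every source has finished."""
    for src in uop.sources:
        producers = [p for p in uops if p.seq < uop.seq and p.dest == src]
        if producers and producers[-1].seq not in done:
            return False
    return True


def run_out_of_order(uops, issue_width=2):
    """Return (issue order, retire order) as lists of sequence numbers."""
    waiting, in_flight, done = list(uops), {}, set()
    issued, retired = [], []
    cycle = 0
    while len(retired) < len(uops):
        # 1. Micro-ops whose latency has elapsed are finished.
        for seq in [s for s, finish in in_flight.items() if finish <= cycle]:
            done.add(seq)
            del in_flight[seq]
        # 2. Results are committed strictly in program order.
        while len(retired) < len(uops) and uops[len(retired)].seq in done:
            retired.append(uops[len(retired)].seq)
        # 3. Any ready micro-op may start, even if older ones are stuck.
        for uop in [u for u in waiting if is_ready(u, uops, done)][:issue_width]:
            waiting.remove(uop)
            in_flight[uop.seq] = cycle + uop.latency
            issued.append(uop.seq)
        cycle += 1
    return issued, retired


program = [
    QueuedOp(0, "r1", (), 3),       # slow load into r1
    QueuedOp(1, "r2", ("r1",), 1),  # needs r1, so it has to wait
    QueuedOp(2, "r3", ("r4",), 1),  # independent, can jump ahead
    QueuedOp(3, "r5", ("r3",), 1),  # needs r3 only
]
issued, retired = run_out_of_order(program)
print(issued)   # prints [0, 2, 3, 1]
print(retired)  # prints [0, 1, 2, 3]
```

Micro-ops 2 and 3 run ahead of micro-op 1 while it waits for the slow load, yet the results are still committed in the original program order.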
In short, RISC processors need micro-operations to be able to run multiple instructions in parallel.
Are RISC Instructions and Micro-Operations the Same?
This confusion frequently pops up. But from what I have covered thus far I hope it is clear that RISC instructions and micro-operations are not the same thing. RISC instructions tend to cleanly map to single micro-operations which may give this impression.
But as we have seen, they are quite different. A RISC instruction is still a high level compact description of what you want done. A micro-operation is a low level detailed specification of how to do it, requiring a lot more bytes to encode.
Thus, as I have remarked in a previous article about RISC vs CISC, claiming that modern CISC processors just have a RISC CPU inside and a translator on the outside is highly inaccurate. That is really just marketing speak. An Intel x86 does not translate its CISC instructions to RISC instructions which some kind of normal RISC processor runs. No, it translates them to micro-ops, just like any high-end superscalar RISC processor.
I skipped over saying much about what pipelines are here, so I will attempt to give pipelines a similarly thorough coverage in the future.