Apple M1 foreshadows Rise of RISC-V

The M1 is the beginning of a paradigm shift which will benefit RISC-V microprocessors, but not in the way you think.

  • Neural Engine. Specialized hardware for doing machine learning.
  • Digital signal processing hardware for image processing.
  • Video encoding in hardware.
Google's TPUs are Application-Specific Integrated Circuits (ASICs). I will refer to such specialized chips as coprocessors.

What is a Coprocessor?

Unlike a CPU, a coprocessor cannot live alone. You cannot make a computer by sticking just a coprocessor into it. Coprocessors are special-purpose processors which do a particular task really well.

Intel 8087. One of the early coprocessors, used for performing floating-point calculations.

    loadi r3, 0          ; Load 0 into register r3
multiply:
    add r3, r1           ; r3 ← r3 + r1
    dec r2               ; r2 ← r2 - 1
    bgt r2, multiply     ; goto multiply if r2 > 0

How Data is Transmitted to and from Coprocessors

Let us look at the diagram below to get a better sense of how a coprocessor works together with the microprocessor (CPU), or general-purpose processor, if you will.

Overview of how a Microprocessor works. Numbers move along colored lines. Input/Output can be coprocessors, mouse, keyboard and other devices.
You can think of data buses as pipes with valves opened and closed by the red control lines. In electronics, however, this is done with what we call multiplexers, not actual valves.
  1. A valve is opened on the Memory box so it can receive the address. It gets delivered by the green pipe (the address bus). All other valves are closed, so e.g. Input/Output cannot receive the address.
  2. The memory cell with the given address is selected. Its content flows out onto the blue data bus, because the Decoder has opened the valve to the data bus.
  3. The data in the memory cell could flow anywhere, but the Decoder has only opened the input valve to the Registers.

Hardware is accessed just like memory locations by specifying addresses.

What exactly do I mean by that? Well, let me just make up some addresses. If your processor attempts to read from memory address 84, that may mean the x-coordinate of your computer mouse, while address 85 means the y-coordinate. So to get the mouse coordinates, you would do something like this in assembly code:

load r1, 84   ; get x-coordinate
load r2, 85   ; get y-coordinate
The same trick is used to program a DMA (Direct Memory Access) controller, which copies memory on the CPU's behalf:

loadi r1, 1024  ; set register r1 to source address
loadi r2, 50    ; bytes to copy
loadi r3, 2048  ; destination address

store r1, 110   ; tell DMA controller the start address
store r2, 111   ; tell DMA how many bytes to copy
store r3, 112   ; tell DMA where to copy the bytes to
Video memory works the same way. On old PCs you could draw on the screen by writing to addresses mapped to the video card:

char *video_buffer = (char *)0xB8000;  // set pointer to CGA video buffer
video_buffer[3] = 42;                  // change the color of the 4th byte on screen

How does an Interrupt Work?

Various cards you stick into your PC, whether they are graphics cards or network cards, will have been assigned some interrupt line. It is kind of like a line that goes straight to your CPU. When this line gets activated, the CPU drops everything it is holding to deal with your interrupt.

RISC-V based board from SiFive capable of running Linux

The Rise of RISC-V

Back in 2010 at UC Berkeley, the Parallel Computing Laboratory saw the trend towards heavier use of coprocessors. They saw how the end of Moore's Law meant that you could no longer easily squeeze more performance out of general-purpose CPU cores. You needed specialized hardware: coprocessors.

Transistor Budget: CPU Cores or Coprocessors?

You can keep playing that game and eventually you have 128 general-purpose cores, like the Ampere Altra Max ARM processor. But is that really the best use of our silicon? For servers in the cloud, it is great: one can probably keep all 128 cores busy with various client requests. However, a desktop system may not be able to effectively use more than 8 cores on common desktop workloads. Thus if you go to, say, 32 cores, you are wasting silicon on lots of cores which will sit idle most of the time.

Transistor Abundance Changes the Strategy

Thus in early designs, one needed to focus the transistor budget on general-purpose computing. But today we can stuff chips with so many transistors that we hardly know what to do with them.

RISC-V Was Tailor-Made to Control Accelerators

This is exactly what RISC-V was designed for. It has a bare-minimum instruction set of about 40–50 instructions which lets it do all the typical CPU stuff. That may not sound minimal, but keep in mind that an x86 CPU has over 1500 instructions.

What is the Benefit of Sticking with RISC-V for Coprocessors?

Making chips has become a complicated and costly affair. Building up the tools to verify your chip, running test programs, diagnostics and a host of other things requires a lot of effort.

Nvidia using RISC-V Based Controllers

Why is that such a benefit? Nvidia's use of RISC-V offers some insight. On their big GPUs they need some kind of general-purpose CPU to be used as a controller. However, the amount of silicon they can set aside for this, and the amount of heat it is allowed to produce, is minimal. Keep in mind that lots of things are competing for space.

The small and simple instruction-set of RISC-V makes it possible to implement RISC-V cores in much less silicon than ARM.

Because RISC-V has such a small and simple instruction set, it beats the competition, including ARM. Nvidia found they could make smaller controllers with RISC-V than with anything else, and reduce power consumption to a minimum.

RISC-V Machine Learning Accelerator (ET-SOC-1)

Esperanto Technologies is another company that found value in RISC-V. They are making an SoC, called the ET-SOC-1, which is slightly larger than the M1 SoC: it has 23.8 billion transistors compared to the 16 billion on the M1.

The Esperanto ET-SoC-1 die plot. Image: Art Swift.

ARM Will Be The New x86

Ironically, we may see a future where Macs and PCs are powered by ARM processors, but where all the custom hardware around them, all their coprocessors, will be dominated by RISC-V. As coprocessors get more popular, more silicon in your System-on-a-Chip (SoC) may be running RISC-V than ARM.

ARM Commanding an Army of RISC-V Coprocessors

General-purpose ARM processors will be at the center, with an army of RISC-V powered coprocessors accelerating every possible task: graphics, encryption, video encoding, machine learning, signal processing, and processing network packets.

Raspberry Pi 4 single-board computer, currently using an ARM processor.
NVIDIA Jetson Nano Developer Kit.

RISC-V as Main CPU?

Many ask: why not replace ARM entirely with RISC-V? Others claim that this would never work, because RISC-V has a "puny and simple" instruction set which cannot deliver the kind of high performance that ARM and x86 offer.

Share Your Thoughts

Let me know what you think. There is a lot going on here which is hard to predict. We now see, for instance, claims of RISC-V CPUs which beat ARM on both power consumption and performance. This makes you wonder whether RISC-V could indeed become the central CPU of computers.

Geek dad, living in Oslo, Norway with passion for UX, Julia programming, science, teaching, reading and writing.
