Part I: Ampere Altra ARM Microprocessor Notes
This story is my own research and notes on things I find interesting about the Ampere Altra. As such it isn’t much of a story driving towards any important point. However, I put it out here in case others find the various collected factoids interesting.
Hopefully I will get around to writing a proper story on the Ampere Altra once I get a better sense of what story there is to tell about it.
Ampere Computing (the Company behind Chip)
Ampere Computing is a relatively new company, just three years old, founded in 2017. Can we trust that they have a clue what they are doing? Well, they were founded by former Intel president Renée James, so that must count for something.
The Altra is based on ARM’s new Neoverse-N1 CPU core. This core was specifically designed for servers, unlike most other ARM cores, which are made for embedded devices, cell phones, etc.
Amazon’s Graviton2 server chip is also based on Neoverse-N1 cores.
The Altra product line is not just something they slapped together from whatever ARM pushed out. Apparently it is the result of several years of cooperation with ARM to create a server chip. So we have a former Intel president talking with ARM over several years to get a good design for a server chip. That sounds promising.
They are not just making a chip. Ampere is also making a whole computer designed to fit in a server rack. AnandTech says this is a 2-socket 2U rack unit server.
So what does “2U rack unit” mean anyway? It is a standard for the size of rack-mounted servers. Basically you have a metal frame where you can slot in different electronic components, such as whole servers. 1U was the earlier, thinner size; 2U means the units are twice as tall. The rationale for this is apparently that more volume makes cooling easier, hence 2U seems to be the new standard for rack-mounted servers. Here is an illustration:
AnandTech got their hands on one of these rack unit servers:
The server came supplied with two Altra Q80–33 processors, Ampere’s top-of-the-line SKU with each featuring 80 cores running at up to 3.3GHz, with TDP reaching up to 250W per socket.
Again a bunch of abbreviations to look at. Ampere uses a simple naming convention, so Q80–33 refers to a CPU with 80 cores and a clock frequency of 3.3 GHz.
What is an SKU then? SKU stands for Stock Keeping Unit. It is a unique identifier for a product or service a company sells. So Q80–33 is a unique identifier for a chip Ampere sells; Q80–33 is the SKU of the product.
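Just to make the convention concrete, here is a little Python sketch that pulls the two numbers out of an SKU string. The parsing rule is my own guess from the pattern above; Ampere’s official scheme may well encode more than these two fields.

```python
import re

def parse_altra_sku(sku: str) -> dict:
    """Split an Ampere Altra SKU like 'Q80-33' into core count and clock.

    Guessed rule: the first number is the core count, the second is the
    max clock in units of 100 MHz (33 -> 3.3 GHz).
    """
    match = re.fullmatch(r"Q(\d+)-(\d+)", sku)
    if match is None:
        raise ValueError(f"unrecognized SKU format: {sku}")
    return {
        "cores": int(match.group(1)),
        "clock_ghz": int(match.group(2)) / 10,
    }

print(parse_altra_sku("Q80-33"))  # {'cores': 80, 'clock_ghz': 3.3}
```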
Next we have TDP, which means Thermal Design Power. This is how much heat a CPU produces under realistic max loads. Hence it could go higher under unrealistic scenarios and is likely lower under normal operating conditions. An interesting comparison might be that Apple’s M1 has a TDP of 10W. But Apple wants their chips in laptops, so their needs are very different from Ampere’s. An Ampere Altra is not going into any laptop any time soon.
250W is not an arbitrary number however. Rack-mounted servers typically have to hit this number, because that is the maximum heat a typical data center can deal with using regular air cooling. If you go above 250W to 300W, then more elaborate and expensive cooling solutions will have to be installed in the data center. That adds cost.
One has to consider that the hardware in a data center is usually cheaper than the operating expenses over time. Thus going for max performance without considering electricity costs for cooling and running the servers is bad for business. Data centers want as much performance per watt as possible. But usually there is a sweet spot: if performance per watt is really high but total performance is really low, then too much space will be required to install the hardware, which also costs money.
Thus you want as much performance out of a 250W rack-mountable server as possible. It would not be good for business to install 10W servers in the rack even if they had great performance per watt.
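To make that sweet-spot argument concrete, here is a toy calculation in Python. All the numbers are made up for illustration, they are not real server specs.

```python
def units_to_match(target_perf: float, perf_per_unit: float) -> float:
    """How many servers of a given performance are needed to hit a target."""
    return target_perf / perf_per_unit

# Two hypothetical servers with identical performance per watt (4 perf/W).
big_server = {"watts": 250, "perf": 1000}   # one 250W rack-mounted server
tiny_server = {"watts": 10, "perf": 40}     # one 10W low-power server

assert (big_server["perf"] / big_server["watts"]
        == tiny_server["perf"] / tiny_server["watts"])

# Same efficiency, but matching one big server takes 25 tiny ones,
# i.e. 25 rack slots instead of one, and rack space costs money too.
print(units_to_match(big_server["perf"], tiny_server["perf"]))  # 25.0
```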
Ampere has a reference board called “Mount Jade” to fit the CPUs, memory, hard drives and everything else onto. It being a reference board, I assume it is meant for customers to buy and modify to fit their particular hardware needs.
The Ampere-branded Mount Jade DVT reference motherboard comes in a typical server blue colour scheme and features 2 sockets with up to 16 DIMM slots per socket, reaching up to 4TB DRAM capacity per socket, although our review unit came equipped with 256GB per socket across 8 DIMMs to fully populate the chip’s 8-channel memory controllers.
Again a bunch of jargon to unpack. DIMM stands for dual in-line memory module. If you have ever built a PC, this is the standard form that memory comes in. That is not an option on e.g. the M1, where the memory sits on the chip package itself.
What does AnandTech mean by socket in this context? They mean the socket you slot an Altra chip into. There are two sockets because you can put two Altra chips on the Mount Jade reference board. Each of these chips, I assume, gets its own memory, since AnandTech talks about the number of slots per socket. So you can add 16 memory modules for each chip. That sounds like a lot! And indeed each chip can get access to 4TB.
Next, what is the deal with the 8 DIMMs and the 8-channel memory controller? Basically, with a 1-channel memory controller you can access only one memory module (DIMM) at a time. However, with 8 channels the CPU can access eight memory modules at the same time. Basically that gives an 8x increase in memory transfer speed. Of course, that depends on there actually being data you want on each of those modules.
This whole thing is organized into what is dubbed banks. The Mount Jade board has eight banks. In each bank you can fit two DIMM memory modules. Only one of the modules in a bank can be accessed at any given time. Thus if you have 8 memory modules you could spread them over 4 banks by putting two modules in each bank. However, by spreading them over all 8 banks you can theoretically get twice the transfer speed.
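Here is a tiny Python model of that bank constraint, just to make the counting explicit. The two-modules-per-bank, one-active-at-a-time rule is as described above; everything else is made up for illustration.

```python
def max_concurrent_dimms(dimms: int, banks_used: int) -> int:
    """How many DIMMs can be accessed at once, given the bank rule:
    each bank holds up to two DIMMs, but only one DIMM per bank can be
    active at a time, so concurrency is capped by populated banks."""
    if dimms > 2 * banks_used:
        raise ValueError("each bank holds at most two DIMMs")
    return min(dimms, banks_used)

# 8 DIMMs packed two-per-bank into 4 banks: only 4 reachable at once.
print(max_concurrent_dimms(8, banks_used=4))  # 4
# The same 8 DIMMs spread one-per-bank over all 8 banks: all 8 at once,
# so theoretically twice the transfer speed.
print(max_concurrent_dimms(8, banks_used=8))  # 8
```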
So how big of a deal is this? Well, if you buy a high-end gaming PC it will most likely have a dual-channel memory controller. So this is 4x the kind of transfer speed you can expect from a desktop PC.
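A back-of-the-envelope calculation in Python shows where that 4x comes from. I am assuming DDR4-3200 DIMMs on both systems (3200 mega-transfers per second, 8 bytes per transfer per channel); those are standard DDR4 figures, not numbers taken from the review.

```python
def peak_bandwidth_gb_s(channels: int,
                        transfers_per_s: float = 3.2e9,
                        bytes_per_transfer: int = 8) -> float:
    """Theoretical peak memory bandwidth in GB/s for a given channel count,
    assuming DDR4-3200 (3200 MT/s, 64-bit = 8-byte channels)."""
    return channels * transfers_per_s * bytes_per_transfer / 1e9

print(peak_bandwidth_gb_s(8))  # 8-channel server chip: 204.8 GB/s
print(peak_bandwidth_gb_s(2))  # dual-channel desktop: 51.2 GB/s
```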
So how unique is this in the server space? Not very. Here is what Wikipedia says about EPYC, AMD’s chip line for servers:
The platform includes one- and two-socket systems. In multi-processor configurations, two Epyc CPUs communicate via AMD’s Infinity Fabric. Each server chip supports 8 channels of memory and 128 PCIe 3.0 lanes, of which 64 lanes from each are used for CPU-to-CPU communication through Infinity Fabric when installed in a dual-processor configuration.
So it also has 8 channels of memory. This is based on the Zen2 microarchitecture, and in 2020 there will be EPYC chips with Zen3 cores. This is actually quite interesting to reflect upon, because it means AMD is using the same cores as in, say, their gaming PCs. That is very different from what is going on in the ARM space. E.g. Apple’s M1 chip has Firestorm cores designed for maximum desktop performance, not server performance, while Ampere is using Neoverse-N1 cores specifically made for server workloads. This makes me speculate that AMD will get into trouble competing here, as it seems unlikely that Zen cores can be awesome for both server and desktop workloads.
If you look at the AnandTech article you will see that the chip is huge:
Now you may wonder why it is so large. Is it because it has so much silicon? So many transistors? Well, that is probably part of it, but the key reason is all the IO. This chip can connect to a lot of memory and PCIe cards, and every connection requires pins on the chip. So it is the pins taking up space, not the silicon per se. This also explains why the cooling solution looks unusual: the cooling does not cover the whole exterior surface of the chip, only the area with the silicon beneath it, because that is where the heat primarily gets generated.
These are really just notes from reading the first page of AnandTech’s review. It will take some more time to wade through all the other pages of dense info.