Thanks for the feedback, Yichao! I am by no means an expert; I am a popularizer, so sometimes I get things wrong. But helpful feedback like yours helps me make corrections.
I did not, in fact, know about ARM SVE. I actually know RISC-V assembly better.
From a quick look at the ARM SVE manual, it does indeed seem to be the same idea as the RISC-V vector extension. The manual even remarks that it is intended for High Performance Computing (HPC), which makes it sound very much like Cray-1-style vector processing. I will have to read more about this, and then either make some additions to this article or write a new one.
Do you know any of the story behind ARM SVE? How did it come about, what discussions led up to it, and how was it received by the industry?
3. I rewrote that part to make it clearer. In this case the maximum number of elements in a vector register was 64.
So yes, your remark that it is similar to SVE seems reasonable.
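To make the 64-element limit concrete, here is a minimal sketch of strip-mining, written in Python purely for illustration: a long array is processed in chunks no larger than the maximum vector length, with the last chunk possibly shorter. The names MAX_VL and vector_add are hypothetical, and each chunk's list comprehension only stands in for a single vector instruction; this is not real vector hardware code.

```python
# Hypothetical maximum number of elements per vector register (64, as above).
MAX_VL = 64


def vector_add(a, b):
    """Add two equal-length lists in strips of at most MAX_VL elements.

    Each strip stands in for one vector instruction operating on `vl`
    elements; the final strip may be shorter than MAX_VL.
    Returns the sums and the list of strip lengths used.
    """
    assert len(a) == len(b)
    result = [0] * len(a)
    strip_lengths = []
    i = 0
    while i < len(a):
        # Like setting a vector length register: min(remaining, max length).
        vl = min(len(a) - i, MAX_VL)
        strip_lengths.append(vl)
        # "One vector instruction": elementwise add over vl elements.
        result[i:i + vl] = [x + y for x, y in zip(a[i:i + vl], b[i:i + vl])]
        i += vl
    return result, strip_lengths


sums, strips = vector_add(list(range(150)), [1] * 150)
print(strips)  # [64, 64, 22]
```

With 150 elements the loop runs three strips of 64, 64 and 22 elements. The same loop works unchanged for any array length, which is the appeal of this style of vector processing.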
4. Ah, you should really check out Julia. I have written a number of articles on Julia (and a book). I should perhaps have linked to them or clarified this, since it may not be clear to people whose prior experience is with other high-level languages.
Julia is a multiple-dispatch language, so a function does not just take arbitrary objects as arguments: which code runs depends on the types of all its arguments.
Basically a function is just a name, and to this name you attach any number of methods. These are not methods in the OOP sense: each method is basically a function signature with a different number of arguments or different argument types.
The Julia JIT will compile a unique method for each unique combination of argument types.
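For readers coming from other languages, here is a toy sketch in Python of the core idea: a function is just a name, and a call picks a method by looking at the types of all its arguments, not just the first one. The MultiFunction class and its method decorator are hypothetical names made up for this illustration. Real Julia also handles subtyping, picks the most specific matching method, and JIT-compiles a specialized version per type combination; this sketch does none of that and only matches exact types.

```python
class MultiFunction:
    """A name to which any number of methods can be attached.

    Each method is registered under a tuple of argument types, and a call
    is dispatched on the exact types of all its arguments (a toy version:
    no subtyping or "most specific method" rules as in Julia).
    """

    def __init__(self, name):
        self.name = name
        self.methods = {}  # maps (type, type, ...) -> implementation

    def method(self, *types):
        def register(impl):
            self.methods[types] = impl
            # Return the MultiFunction so the decorated name keeps pointing at it.
            return self
        return register

    def __call__(self, *args):
        key = tuple(type(arg) for arg in args)
        impl = self.methods.get(key)
        if impl is None:
            raise TypeError(f"no method {self.name}{key}")
        return impl(*args)


combine = MultiFunction("combine")


@combine.method(int, int)
def combine(x, y):
    return x + y


@combine.method(str, str)
def combine(x, y):
    return x + " " + y


@combine.method(int, str)
def combine(n, s):
    return s * n


@combine.method(int)
def combine(x):
    return -x


print(combine(2, 3))             # 5
print(combine("hello", "world"))  # hello world
print(combine(3, "ab"))           # ababab
print(combine(4))                 # -4
```

Each decorated definition adds a method to the same name, loosely mirroring how Julia lets you write several definitions of combine with different signatures. A call such as combine(1.5, 2) raises TypeError here because no method matches (float, int).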
It is a bit too big a topic to get into here, but Julia has very advanced metaprogramming facilities and the ability to hook into the JIT code generator. This is already used in Julia to let you write code that runs on Google's Tensor Processing Units (TPUs) as well as on GPUs.
You can see a Julia GPU programming example here: https://juliagpu.github.io/CUDA.jl/stable/usage/overview/
Btw, I am not saying this will be possible. I don't actually know how vectorization currently works in Julia well enough to say. But I know it is being done.