Yeah, I have harped on about the importance of volume in many earlier articles, but there is another key element I probably haven't highlighted enough, something I noticed Jon Stokes remark on in his book "Inside the Machine":
The relative overhead of x86 decoder complexity shrank as overall transistor count increased.
For early chips with fewer transistors, x86 decoding ate up a large share of the total transistor budget.
But as those transistor budgets grew much larger, the relative cost of complex x86 decoding fell.
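Just to make that concrete, here is a rough back-of-envelope sketch. The transistor numbers are made-up placeholders, not actual Intel figures; the only point is how a roughly fixed decoder cost shrinks as a share of a growing budget.

```python
# Back-of-envelope: a roughly fixed decoder cost against growing transistor budgets.
# All numbers are illustrative placeholders, not real chip figures.

DECODER_TRANSISTORS = 300_000  # assume x86 decode logic costs roughly this much

core_budgets = {
    "early 1990s core": 1_200_000,
    "late 1990s core": 10_000_000,
    "2010s big core": 500_000_000,
}

for name, budget in core_budgets.items():
    share = DECODER_TRANSISTORS / budget
    print(f"{name:18s}: decoder is {share:6.1%} of the core's transistor budget")
```

With these pretend numbers the decoder goes from eating a quarter of the core to being a rounding error.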
Thus time was always on the side of x86. If not for this effect, I think the RISC guys would have managed to keep an edge over x86 even while Intel enjoyed much higher volume. After all, that volume difference existed from the very start.
Now the volume game is changing, as you remark. That means x86 has less of an obvious advantage.
But I think there is maybe even more to this. For a long time our chips just got fatter and fatter; we didn't get more cores. That meant the relative cost of decoding x86 kept shrinking.
Yet when we change strategy completely and focus on parallel processing rather than high single-thread performance, we end up with lots of cores. To fit more cores on a die, it becomes interesting to make really small cores. But with really small, simple cores you are back in the early 1990s again: cores where x86 decoding overhead becomes a problem.
Now this isn't an issue for something like 64 to 128 cores. But if you want to do what Esperanto Technologies does on their SoC and place over 1000 RISC-V cores on a die, then those cores have got to be really small. When cores get that small, decoder complexity will start to be a big deal again.
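Flipping the same back-of-envelope arithmetic around shows why. Again, all the numbers below are placeholders I made up for illustration, not Esperanto's actual figures: take a fixed die budget, split it across more and more cores, and see how big a fixed-cost decoder looks inside each tiny core.

```python
# Same arithmetic flipped around: a fixed die budget split across many tiny cores.
# All numbers are illustrative placeholders, not figures from any real SoC.

DIE_BUDGET = 20_000_000_000    # pretend: 20 billion transistors for the whole die
CORE_LOGIC_SHARE = 0.25        # pretend: a quarter of the die is core logic,
                               # the rest going to caches, interconnect, I/O
DECODER_TRANSISTORS = 300_000  # same assumed fixed decoder cost as before

for n_cores in (64, 128, 1024):
    per_core = DIE_BUDGET * CORE_LOGIC_SHARE / n_cores
    share = DECODER_TRANSISTORS / per_core
    print(f"{n_cores:5d} cores: ~{per_core / 1e6:5.1f}M transistors of logic per core, "
          f"decoder would eat {share:5.2%} of it")
```

At 64 cores the decoder is still noise; at 1000+ cores it is back to being a meaningful slice of every single core.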
And here I think RISC-V has an edge over Arm as well. Arm today has gotten fairly complex. If you want to make really small cores I am not sure if 64-bit Arm is a good choice.
But who knows how this will play out. It is a question both of the volume game and of how aggressively we pursue parallel processing. Trying to push ever higher single-thread performance with deeper pipelines, more sophisticated branch predictors, and out-of-order execution has got to hit a wall sooner or later. Or at least that is what I assume.