A lot of that comes down to matrix multiplications, and there Apple's Neural Engine and their secret AMX matrix coprocessor likely give them an edge. I covered that here: https://medium.com/swlh/apples-m1-secret-coprocessor-6599492fc1e1
How much performance you can get in PyTorch and TensorFlow depends, I suppose, on whether they take advantage of the specialized Apple hardware. It may be that you need to use Apple's own APIs to benefit right now; I am not sure whether the ML libraries have taken advantage of this hardware yet.
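As a rough illustration, here is a minimal sketch of how you could check from Python whether your PyTorch build can reach Apple's GPU at all. This assumes a PyTorch build that ships the MPS (Metal Performance Shaders) backend, which only landed in later releases, so the code guards against the attribute not existing:

```python
# Minimal sketch: check whether this PyTorch build exposes the MPS
# (Metal Performance Shaders) backend for Apple GPUs. On older builds
# the attribute doesn't exist at all, hence the getattr guard.
import torch

mps = getattr(torch.backends, "mps", None)
if mps is not None and mps.is_available():
    device = torch.device("mps")
    print("Using Apple's Metal backend")
else:
    device = torch.device("cpu")
    print("Falling back to CPU")

# A matrix multiplication dispatched to whichever device we found.
a = torch.randn(1024, 1024, device=device)
b = torch.randn(1024, 1024, device=device)
c = a @ b
print(c.shape)
```

Note this only tells you whether the GPU path is wired up; whether a given operation actually routes through the Neural Engine or AMX is a separate question that the library may not expose at all.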
E.g. I think you need to go through the Accelerate framework. With Julia, which I know better than the Python side of machine learning, I believe porting to Accelerate has been a fairly recent endeavour.
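On the Python side there is an analogous question: whether your NumPy build is linked against Accelerate's BLAS (which can use AMX on Apple Silicon) or against a generic backend like OpenBLAS. A quick, hedged way to inspect this, with a rough matmul timing whose absolute numbers will of course vary by machine:

```python
# Check which BLAS backend NumPy was built against. A build linked
# against Apple's Accelerate will mention it in this output; most
# default wheels instead show OpenBLAS, in which case matmuls won't
# go through Accelerate's AMX-backed routines.
import numpy as np
import time

np.show_config()

# Rough timing of a large single-precision matmul. Comparing this
# number across an Accelerate-linked and an OpenBLAS-linked build on
# the same M1 machine gives a crude sense of what the coprocessor buys.
a = np.random.rand(2048, 2048).astype(np.float32)
b = np.random.rand(2048, 2048).astype(np.float32)
t0 = time.perf_counter()
c = a @ b
print(f"2048x2048 sgemm: {time.perf_counter() - t0:.3f} s")
```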