gRPC Load Balancing
By default, Kubernetes does not understand gRPC: it balances load per TCP connection, not per RPC. A gRPC client typically reuses a single long-lived TCP connection, so every RPC it makes goes to the same Pod and is not load-balanced. This results in load being unevenly distributed across the Pods of an app.
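The effect can be illustrated with a small simulation. This is only a sketch and does not use Kubernetes or gRPC: the Pod names, the ConnectionLevelBalancer class, and the round-robin choice at connect time are all assumptions made for illustration.

```python
from collections import Counter
from itertools import cycle

PODS = ["pod-a", "pod-b", "pod-c"]  # hypothetical Pod names


class ConnectionLevelBalancer:
    """Toy balancer that picks a Pod once per TCP connection."""

    def __init__(self, pods):
        self._next_pod = cycle(pods)

    def connect(self):
        # The Pod is chosen when the connection is opened and never changes.
        return next(self._next_pod)


def send_rpcs(balancer, count):
    """Open one long-lived connection and send `count` RPCs over it."""
    pod = balancer.connect()
    # Every RPC travels over the same connection, so it reaches the same Pod.
    return Counter(pod for _ in range(count))


load = send_rpcs(ConnectionLevelBalancer(PODS), 9)
print(sorted(load.items()))  # [('pod-a', 9)]
```

All nine RPCs land on one Pod while the other two stay idle, which is the uneven distribution described above.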
With Knative, each RPC is load-balanced between Pods and is also used to make autoscaling and activation decisions. This works even though you keep reusing the same underlying TCP connection.
As described above, each unary gRPC call is load-balanced.
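For contrast with connection-level balancing, per-RPC routing can be sketched as follows. This is a toy model rather than Knative's actual implementation: the RequestLevelBalancer class, the Pod names, and the round-robin policy are assumptions chosen to keep the example deterministic.

```python
from collections import Counter
from itertools import cycle

PODS = ["pod-a", "pod-b", "pod-c"]  # hypothetical Pod names


class RequestLevelBalancer:
    """Toy balancer that picks a Pod for every RPC."""

    def __init__(self, pods):
        self._next_pod = cycle(pods)

    def route(self, connection_id):
        # connection_id is deliberately ignored: the decision is made per RPC,
        # so a single connection can reach every Pod.
        return next(self._next_pod)


balancer = RequestLevelBalancer(PODS)
# Nine unary RPCs, all sent over the same connection (id 1).
load = Counter(balancer.route(connection_id=1) for _ in range(9))
print(sorted(load.items()))  # [('pod-a', 3), ('pod-b', 3), ('pod-c', 3)]
```

Although the client never opens a second connection, the calls are spread evenly across the three Pods.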
A streaming gRPC call is stateful: a single backend needs to receive all of the messages sent on the stream.
That is why you stay connected to a single Pod for the duration of a streaming RPC. However, when you finish one RPC and start a new one, the new call is load-balanced between Pods again, even though it reuses the same TCP connection.
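The difference between messages within a stream and separate RPCs can be sketched like this. As before, this is an illustrative model only: the balancer class, the stream helper, the Pod names, and the round-robin policy are assumptions, not Knative or gRPC APIs.

```python
from itertools import cycle

PODS = ["pod-a", "pod-b", "pod-c"]  # hypothetical Pod names


class RequestLevelBalancer:
    """Toy balancer that picks a Pod once per RPC."""

    def __init__(self, pods):
        self._next_pod = cycle(pods)

    def start_rpc(self):
        # Called once when an RPC (unary or streaming) begins.
        return next(self._next_pod)


def stream(balancer, messages):
    """Send all messages of one streaming RPC; returns (message, pod) pairs."""
    # The Pod is chosen when the stream opens; every message then follows it.
    pod = balancer.start_rpc()
    return [(message, pod) for message in messages]


balancer = RequestLevelBalancer(PODS)
first = stream(balancer, ["m1", "m2", "m3"])
second = stream(balancer, ["m4", "m5"])  # a new RPC on the same connection
print(first)   # [('m1', 'pod-a'), ('m2', 'pod-a'), ('m3', 'pod-a')]
print(second)  # [('m4', 'pod-b'), ('m5', 'pod-b')]
```

Every message of the first stream reaches the same Pod, while the second stream, started afterwards, is routed independently.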