Knative: Operator’s Handbook

gRPC Load Balancing

By default, Kubernetes balances traffic per TCP connection and has no awareness of gRPC. Because gRPC sends many RPCs over one long-lived connection, every RPC you make on that connection goes to the same Pod instead of being load-balanced. The result is unevenly distributed load across the Pods of an app.

As you learned in request-based autoscaling, Knative can recognize each request (HTTP or gRPC), thanks to its application-layer (L7) load balancer.

With Knative, each RPC is load-balanced between Pods and is counted toward autoscaling and activation decisions. This works even though you keep reusing the same underlying TCP connection.
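
The difference between connection-level and request-level balancing can be illustrated with a small simulation. This is only a sketch and not Knative code: the Pod names are hypothetical, and round-robin is an assumed policy chosen for simplicity, not a statement about how Knative actually picks a Pod.

```python
from collections import Counter
from itertools import cycle

# Hypothetical Pod names, used only for illustration.
PODS = ["pod-a", "pod-b", "pod-c"]


def connection_level(num_rpcs, pods=PODS):
    """Connection-level (L4) balancing: a Pod is picked once, when the
    TCP connection opens, and every RPC on that connection goes to it."""
    picker = cycle(pods)
    pinned = next(picker)  # chosen at connect time, never re-evaluated
    return Counter(pinned for _ in range(num_rpcs))


def request_level(num_rpcs, pods=PODS):
    """Request-level (L7) balancing: a Pod is picked again for every RPC,
    even though all RPCs share one connection. Round-robin is an assumption."""
    picker = cycle(pods)
    return Counter(next(picker) for _ in range(num_rpcs))


if __name__ == "__main__":
    print("per connection:", dict(connection_level(9)))
    print("per request:   ", dict(request_level(9)))
```

With nine RPCs on one connection, the first function sends all nine to a single Pod, while the second spreads them three per Pod.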

Unary RPCs

As described above, each unary gRPC call is load balanced.

Streaming RPCs

A streaming RPC is stateful: a single backend needs to receive all messages sent on the stream.

That's why you stay connected to a single Pod for the duration of a streaming RPC. However, when you finish that RPC and start a new one, the new RPC is load-balanced between Pods again (despite reusing the same TCP connection).
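
The streaming behavior can be sketched with a toy router. This is an illustrative model only: the `ToyRouter` class, its method names, and the round-robin choice are all assumptions made for the example, not part of Knative or gRPC.

```python
from itertools import cycle


class ToyRouter:
    """Toy request-level router: picks a Pod once per RPC.
    Round-robin selection is an assumption made for this sketch."""

    def __init__(self, pods):
        self._picker = cycle(pods)

    def stream(self, messages):
        """Route one streaming RPC: the Pod is chosen when the stream
        opens, and every message on that stream goes to the same Pod."""
        pod = next(self._picker)
        return [(pod, message) for message in messages]


def demo():
    # Both streams share one router, standing in for one reused connection.
    router = ToyRouter(["pod-a", "pod-b"])  # hypothetical Pod names
    first = router.stream(["m1", "m2", "m3"])
    second = router.stream(["m4", "m5"])
    return first, second


if __name__ == "__main__":
    for routed in demo():
        print(routed)
```

Every message of the first stream lands on one Pod, and the second stream, opened afterwards, is assigned independently and lands on the other Pod in this model.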