Knative: Operator’s Handbook

Limiting concurrent requests

If your application can process at most N requests at a time, you can tell Knative to stop sending traffic to a Pod that is already processing N requests by setting the containerConcurrency field. This is a “hard limit”.

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello
spec:
  template:
    spec:
      containerConcurrency: 20
      containers:
      [...]

When every Pod is at this limit, Knative will autoscale the app and add more Pods, or hold on to the request until one of the Pods becomes available.

Set containerConcurrency: 0 or omit this field to impose no hard limit. The autoscaler then falls back to the system-wide soft target, container-concurrency-target-default, which is "100" by default.
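
For reference, the system-wide default is read from the autoscaler's ConfigMap. The fragment below is a minimal sketch that assumes a standard installation, where the ConfigMap is named config-autoscaler and lives in the knative-serving namespace; your cluster may differ, and a real ConfigMap usually carries many other keys that are omitted here.

```yaml
# Sketch only: assumes a default Knative Serving install.
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-autoscaler      # assumed standard name
  namespace: knative-serving   # assumed standard namespace
data:
  # Per-Pod concurrency the autoscaler aims for when a Revision
  # does not set its own limit or target. Values must be strings.
  container-concurrency-target-default: "100"
```

Changing this value affects every Revision that does not set its own limit, so a per-Service containerConcurrency is usually the safer knob.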

Setting this value is recommended, because the autoscaler uses it to decide when to add Pods.

Watch: How Knative Uses Concurrency and Rps (Requests per Second) For Autoscaling - Tara Gu