Limiting concurrent requests
If your application can process at most N requests at a time, you can
tell Knative to stop sending traffic to a Pod that is already processing N
requests by specifying the
containerConcurrency field. This is a "hard limit".
apiVersion: serving.knative.dev/v1alpha1
kind: Service
metadata:
  name: hello
spec:
  template:
    spec:
      containerConcurrency: 20
      containers: [...]
When every Pod is already at its limit, Knative will autoscale the app and add more Pods, or hold onto the request until one of the Pods becomes available.
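The hold-or-dispatch behavior can be pictured with a small toy model. This is only a sketch and not Knative's routing code: the Pod class and the route and complete functions are hypothetical names made up for illustration, and scaling up new Pods is deliberately left out.

```python
from collections import deque


class Pod:
    """Toy stand-in for a Pod; `limit` plays the role of containerConcurrency."""

    def __init__(self, name, limit):
        self.name = name
        self.limit = limit
        self.in_flight = 0  # requests currently being processed


def route(pods, held, request):
    """Send the request to the first Pod below its limit, else hold it."""
    for pod in pods:
        if pod.in_flight < pod.limit:
            pod.in_flight += 1
            return pod.name
    held.append(request)  # every Pod is full: the request waits
    return None


def complete(pod, pods, held):
    """Mark one request on `pod` as done, then dispatch a held request if any."""
    pod.in_flight -= 1
    if held:
        return route(pods, held, held.popleft())
    return None


pods = [Pod("pod-a", limit=2)]
held = deque()
print([route(pods, held, r) for r in ("r1", "r2", "r3")])  # ['pod-a', 'pod-a', None]
print(list(held))                                          # ['r3']
print(complete(pods[0], pods, held))                       # pod-a
```

With a limit of 2, the third request is held rather than sent to the busy Pod, and it is dispatched only once an earlier request finishes.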
Set containerConcurrency: 0 or omit this field to use the system-wide default.
It's recommended to set this value as it feeds the autoscaling system.