Configuring concurrency target
The concurrency target is a “soft limit” on the number of in-flight requests each Pod serves. The autoscaler uses it for request-based autoscaling.
This is different from the hard concurrency limit, which caps how many requests a Pod accepts at one time.
When the “average concurrency per Pod” of a Service exceeds a threshold (200% of the target by default), the Knative autoscaler scales up the number of Pods to bring the average back down to the target.
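The arithmetic behind this behavior can be sketched roughly as follows. This is a simplified illustration, not Knative's actual implementation: the real autoscaler averages metrics over time windows and applies further limits. The function names and the `threshold_ratio` parameter are hypothetical, chosen only to mirror the 200% figure above.

```python
import math


def exceeds_scale_up_threshold(avg_per_pod: float,
                               target_per_pod: float,
                               threshold_ratio: float = 2.0) -> bool:
    """Return True once average concurrency reaches threshold_ratio x target.

    threshold_ratio=2.0 is an assumption that mirrors the 200% default
    mentioned in the text.
    """
    return avg_per_pod >= threshold_ratio * target_per_pod


def desired_pods(total_in_flight: float, target_per_pod: float) -> int:
    """Smallest Pod count that keeps average concurrency at or below target."""
    if target_per_pod <= 0:
        raise ValueError("target_per_pod must be positive")
    return math.ceil(total_in_flight / target_per_pod)


# Hypothetical numbers: 3 Pods, a target of 20, and 150 in-flight requests.
pods, target, in_flight = 3, 20, 150
average = in_flight / pods  # 50 requests per Pod
if exceeds_scale_up_threshold(average, target):
    print(desired_pods(in_flight, target))
```

With these numbers the average (50) is above twice the target (40), so the sketch asks for enough Pods to bring the per-Pod average back to 20 or less.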
You can also override this per Service using an annotation in the metadata of its revision template. Annotation values must be strings, so the number is quoted:

apiVersion: serving.knative.dev/v1alpha1
kind: Service
metadata:
  name: hello
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/target: "20"
    spec: [...]