Knative: Operator’s Handbook

Configuring concurrency target

The concurrency target is a “soft limit” on the number of in-flight requests each Pod serves. It is used for request-based autoscaling.

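This is different from the hard concurrency limit (containerConcurrency), which caps how many requests a Pod is sent at once; the target only guides scaling decisions.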

Once the “average concurrency per Pod” of a Service exceeds a certain threshold (200% of the target by default), the Knative autoscaler scales up the number of Pods to bring the average back to the target. For example, with a target of 100, roughly 450 in-flight requests across the Service call for about 5 Pods.

The autoscaler's default target is 100 concurrent requests per Pod. (You can edit this global value.)
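
As a sketch of where the global value lives: in a typical installation the default is read from the config-autoscaler ConfigMap, under the container-concurrency-target-default key. The knative-serving namespace and the value of 50 below are assumptions for illustration; check your own installation before changing anything.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-autoscaler
  namespace: knative-serving   # assumed default install namespace
data:
  # Soft concurrency target for Services that do not set their own.
  # ConfigMap values must be strings, hence the quotes. 50 is an example value.
  container-concurrency-target-default: "50"
```

Only the relevant key is shown. The real ConfigMap holds other settings, so it is safer to edit it in place (for example with kubectl edit) than to replace it with this fragment.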

You can also override this per Service with an annotation on the revision template's metadata (spec.template.metadata.annotations):

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/target: "20"  # annotation values must be strings
    spec: [...]
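
For contrast, the hard concurrency limit mentioned earlier is not an annotation; it is the containerConcurrency field of the revision spec. A minimal sketch of the spec section of the same hello Service, assuming an illustrative hard limit of 50 (the number is an example, and the container definition is left out as in the manifest above):

```yaml
spec:
  template:
    metadata:
      annotations:
        # Soft target: the autoscaler aims for about 20 in-flight requests per Pod.
        autoscaling.knative.dev/target: "20"
    spec:
      # Hard limit: each Pod is sent at most 50 concurrent requests.
      containerConcurrency: 50
      # containers: [...] (omitted, as in the manifest above)
```

Keeping the soft target below the hard limit leaves each Pod some headroom while new Pods start up.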