Knative: Operator’s Handbook

Configuring concurrency target

The concurrency target is a “soft limit” on the number of in-flight requests each Pod serves. The autoscaler uses it for request-based autoscaling.

This is different from the hard concurrency limit (containerConcurrency), which is an enforced upper bound on concurrent requests per Pod.

Once the “average concurrency per Pod” of a Service exceeds a threshold (70% of the target by default), the Knative autoscaler scales up the number of Pods to bring the average back down.

The autoscaler's default concurrency target is 100 requests per Pod. (You can edit this global value.)
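
As a minimal sketch of editing the global value, assuming a standard Knative Serving installation in the knative-serving namespace: the setting normally lives in the config-autoscaler ConfigMap under the container-concurrency-target-default key. The value "50" below is only an illustrative choice, not a recommendation.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-autoscaler
  namespace: knative-serving   # assumes the default install namespace
data:
  # Cluster-wide soft concurrency target per Pod.
  # "50" is an illustrative value; the shipped default is "100".
  container-concurrency-target-default: "50"
```

A change here applies to every Service that does not set its own target, so it is usually safer to try a new value on a single Service first.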

You can also override this per Service using an annotation on its revision template (spec.template.metadata.annotations):

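apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/target: "20"
    spec: [...]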