Plans and concurrency

When you deploy a function to STACKIT, STACKIT Functions takes care of launching instances of your function in response to incoming requests.

STACKIT Functions offers a lot of flexibility for configuring the scaling and resource usage of your functions.

Available configuration options

Plan

Your function’s plan determines the memory available to individual instances of your function.

For example, if you use the plan with 128 MB of RAM, then when a request comes in, the STACKIT Functions Runtime allocates 128 MB to run an instance of your function. Further requests are processed by the same instance, unless the STACKIT Functions Runtime determines that it needs to launch a second instance (as described in the Concurrency section), or it scales your function back to zero instances after sufficient time has passed without requests.

All plans come equipped with 1 shared vCPU core. Please contact us if you want to deploy functions that require multiple vCPU cores per instance or dedicated vCPU cores.

Concurrency

Your function’s concurrency determines the maximum number of in-flight requests it can process at the same time.
An in-flight request is a request that the function has not fully responded to yet.

Note that concurrency is not the number of requests per second an instance of your function can process.
For example, if your function takes 0.2 s to process each request, then to handle 10 requests per second it only needs to process 2 in-flight requests at the same time.
More generally, maximum request rate = concurrency / time per request.
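The relationship between request rate, processing time, and concurrency can be sketched in a few lines of Python. This is only an illustration of the arithmetic above; the function names are hypothetical and are not part of any STACKIT API.

```python
import math


def required_concurrency(requests_per_second: float, seconds_per_request: float) -> int:
    """Smallest whole number of in-flight requests needed to sustain the given rate."""
    in_flight = requests_per_second * seconds_per_request
    # Round first to avoid floating-point noise pushing the result up by one.
    return math.ceil(round(in_flight, 9))


def max_request_rate(concurrency: int, seconds_per_request: float) -> float:
    """Highest request rate (per second) one instance can sustain at this concurrency."""
    return concurrency / seconds_per_request


# The example from the text: 0.2 s per request at 10 requests per second.
print(required_concurrency(10, 0.2))  # 2
```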

Set concurrency to a value your function can sustain: it must be able to process that many requests at the same time within the memory limit set by the plan.

Maximum scale

(Not yet available) Your function’s maximum scale determines the maximum number of instances it can have running at the same time within the STACKIT Functions platform.

You can use this to control the maximum spending of a function deployment.

Selecting correct scaling parameters

The main factor when selecting a plan and concurrency for your function is the memory used by that function.

Ideally, run your function locally and measure its resource usage.

You care about two values:

The memory usage of your function “at rest”, when no requests are coming in.
The peak memory usage of your function per request, that is, the overhead associated with processing each request.
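As a rough sketch of how the per-request value might be measured locally, assuming your function is written in Python, the standard library's tracemalloc module can report the peak allocation during a single call. Note that tracemalloc only tracks memory allocated through Python's allocators, so the result is an approximation and can understate total process memory. The handler below is a hypothetical stand-in for your own request logic.

```python
import tracemalloc


def handle_request(payload: bytes) -> int:
    # Hypothetical handler standing in for your function's request logic.
    buffer = bytearray(payload) * 4
    return len(buffer)


def measure_per_request_peak(handler, payload: bytes) -> int:
    """Return the peak extra bytes allocated by Python while the handler runs once."""
    tracemalloc.start()
    try:
        baseline, _ = tracemalloc.get_traced_memory()
        tracemalloc.reset_peak()  # available since Python 3.9
        handler(payload)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return max(peak - baseline, 0)


overhead = measure_per_request_peak(handle_request, b"x" * 1_000_000)
print(f"approximate per-request peak: {overhead / 1_000_000:.1f} MB")
```

For the at-rest value, a process-level tool such as your operating system's process monitor gives a more complete picture than an in-process measurement.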

You can then select any plan that is larger than your function’s at-rest usage. Ideally, the plan should be at least twice as large as the at-rest memory usage, unless your function is seldom used and you expect only a few concurrent requests.
For example, if your function takes 200 MB of RAM at rest, pick a plan with at least 400 MB of memory.

From there, you can calculate a concurrency value by dividing the leftover space in your plan by the per-request usage.
For example, if you use a 512 MB plan for the function described above, you would have 512 MB - 200 MB = 312 MB left over. If that function takes a maximum of 6 MB to process a request, you can set concurrency to 312 MB / 6 MB = 52.
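The same calculation can be written as a small helper, which is handy when comparing several plans. This is a minimal sketch of the arithmetic described here; the function name and parameters are hypothetical, not a STACKIT interface.

```python
def suggest_concurrency(plan_mb: float, at_rest_mb: float, per_request_mb: float) -> int:
    """Divide the memory left over after at-rest usage by the per-request peak."""
    if per_request_mb <= 0:
        raise ValueError("per-request memory must be positive")
    if at_rest_mb >= plan_mb:
        raise ValueError("plan must be larger than the at-rest memory usage")
    leftover_mb = plan_mb - at_rest_mb
    # Round down: a partial request slot cannot be used.
    return int(leftover_mb // per_request_mb)


# The example from the text: 512 MB plan, 200 MB at rest, 6 MB per request.
print(suggest_concurrency(512, 200, 6))  # 52
```

Rounding down keeps the result conservative, so the instance stays within the plan's memory limit even when every concurrent request reaches its peak usage at once.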