
Load testing

Fleans ships a load-test suite under tests/load/. Two drivers are wired up: k6 (the primary suite) and Locust (port of the same scenarios for Azure Load Testing). This page summarises the most recent baseline measurements and the bottleneck profile they expose.

Four scenario scripts live in parallel under tests/load/scripts/ (k6) and tests/load/locust/ (Locust). They share fixtures under tests/load/fixtures/.

| Scenario | Purpose | Fixture / process id |
| --- | --- | --- |
| linear | Pure throughput: Start → ScriptTask → End. Measures /Workflow/start HTTP latency only. | load-linear |
| parallel | 3-branch fork/join, each branch a ScriptTask. Same HTTP-latency surface, more state writes per start. | load-parallel |
| events | Three-phase event-driven loop: start → poll for waitMessage → POST /Workflow/message. | load-events |
| mixed | 40 / 30 / 30 weighted blend of the above three. | (composite) |
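The 40 / 30 / 30 blend in the mixed scenario can be pictured as one weighted random pick per iteration. The sketch below is a minimal illustration using only Python's standard library, not the code that lives in tests/load/. It assumes the weights follow the table order (linear 40, parallel 30, events 30); the names `pick_scenario` and `sample_mix` and the fixed seed are made up for the example.

```python
import random
from collections import Counter

# Assumption: weights follow the order of the scenario table.
SCENARIO_WEIGHTS = {"linear": 40, "parallel": 30, "events": 30}


def pick_scenario(rng: random.Random) -> str:
    """Return one scenario name, chosen in proportion to its weight."""
    names = list(SCENARIO_WEIGHTS)
    weights = list(SCENARIO_WEIGHTS.values())
    return rng.choices(names, weights=weights, k=1)[0]


def sample_mix(iterations: int, seed: int = 0) -> Counter:
    """Count how often each scenario is picked over a number of iterations."""
    rng = random.Random(seed)
    return Counter(pick_scenario(rng) for _ in range(iterations))
```

Over a few thousand iterations the observed shares settle close to the configured weights, which is why short mixed runs can look skewed toward one scenario.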

The suite is documented in detail in tests/load/README.md.

Two reports exist in the repository for the current release. Each is anchored to its driving hardware/SKU and the date it was run:

  • Local Docker Compose (tests/load/results/local/report.md) — 2 silos in docker compose, 500 concurrent virtual users per scenario. Hardware: developer laptop. Originally produced for issue #243.
  • Azure Container Apps + Azure Load Testing (tests/load/results/azure-2026-04-29/report.md) — 2 silos behind the Container Apps built-in load balancer, Postgres Burstable B2s, Redis Basic C0. 500 VUs per scenario plus a 2 000-VU scale test against the same topology.

Both reports include per-scenario throughput / latency tables, peak container CPU, threshold compliance, and ranked recommendations. The summary below pulls the headline numbers; reach for the report files for raw CSVs and full bottleneck attribution.

Fleans v0.1.0 — 2026-04-28 — Local Docker Compose @ 500 VU

| Scenario | Max VUs | Iterations | Throughput (req/s) | p(95) duration | Error rate | First bottleneck |
| --- | --- | --- | --- | --- | --- | --- |
| linear | 500 | 344 815 | 1 049 / s | 174.88 ms | 96.90 % | Postgres write saturation |
| parallel | 500 | 322 452 | 989 / s | 175.22 ms | 98.89 % | Postgres write saturation |
| events | 200 | 2 166 | 6.6 / s | 32 000 ms | 21.40 % | Postgres + event coordination |
| mixed | 100 | 1 050 | 3.2 / s | 38 220 ms | 66.66 % | Postgres + event coordination |

The local laptop run is database-bound: PostgreSQL CPU peaks above 600 % across all scenarios. The error spikes come from connection exhaustion (the default max_connections=100 against 200+ EF Core connections under burst), not from slow queries.
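The connection arithmetic behind that failure is simple enough to check before a run. The helper below is a hypothetical sketch: `connection_headroom` is not part of Fleans, and the demand figure is the lower bound quoted above (200+), not a measured peak.

```python
def connection_headroom(max_connections: int, demanded: int) -> int:
    """Connections left over; a negative value means the server is oversubscribed."""
    return max_connections - demanded


# Figures from the local run: default max_connections=100, at least 200 demanded.
headroom = connection_headroom(max_connections=100, demanded=200)
print(f"headroom: {headroom} connections")
```

A negative headroom means some callers are refused a connection outright, which shows up as errors rather than as slow responses.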

Fleans v0.1.0 — 2026-04-29 — Azure Container Apps @ 500 VU (and a 2 000-VU scale test)

| Scenario | Reqs | RPS | HTTP fail % | p50 | p95 | max |
| --- | --- | --- | --- | --- | --- | --- |
| linear @ 500 VU | 33 907 | 113 | 0.0 % | 1.9 s | 8.4 s | 11.4 s |
| parallel @ 500 VU | 22 701 | 76 | 0.0 % | 1.9 s | 13.2 s | 15.5 s |
| events @ 500 VU (workflow_start) | 13 915 | 47 | 0.0 % | 9.2 s | 18.5 s | 24.4 s |
| events @ 500 VU (poll_stall) | 13 760 | 46 | 100 % | | | |
| mixed @ 500 VU (parallel branch) | 3 653 | 12 | 6.8 % | 4.1 s | 30.7 s | 33.7 s |
| linear @ 2 000 VU | 32 492 | 108.5 | 47.0 % | 13.6 s | 33.5 s | 39.8 s |

The Azure Container Apps target is silo-CPU bound. With 2 × 0.5 vCPU silos the API tops out around 110 RPS regardless of VU count: adding load adds queue, not work. The 2 000-VU run has the same throughput as the 500-VU run, but 47 % of its requests come back as HTTP 500s instead of waiting in the queue.
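One way to see the "queue, not work" effect is Little's law for a closed loop: with N virtual users and throughput X, each iteration takes N / X seconds on average. The sketch below applies it to the linear rows of the Azure table. It is a rough model, assuming every VU is always either waiting on a request or in client-side think time, so the result is an upper bound on mean response time rather than a measured value; `mean_time_per_iteration` is a name invented for the example.

```python
def mean_time_per_iteration(virtual_users: int, throughput_rps: float) -> float:
    """Little's law for a closed loop: N = X * R, so R = N / X (seconds)."""
    if throughput_rps <= 0:
        raise ValueError("throughput must be positive")
    return virtual_users / throughput_rps


# linear @ 500 VU and linear @ 2 000 VU from the Azure table.
at_500 = mean_time_per_iteration(500, 113)
at_2000 = mean_time_per_iteration(2000, 108.5)
print(f"500 VU: {at_500:.2f} s, 2 000 VU: {at_2000:.2f} s")
```

With throughput pinned near 110 RPS, quadrupling the VU count roughly quadruples the time each VU spends per iteration.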

Two bottleneck profiles compared at the same VU count:

| Run | Throughput | p95 (workflow_start) | HTTP fail % |
| --- | --- | --- | --- |
| Docker laptop, 2 silos, 8-core host | 1 049 / s | 175 ms | 97 % |
| Container Apps, 2 silos × 0.5 vCPU | 113 / s | 8 405 ms | 0 % |

Same code, same fixtures, same scripts. The bottleneck moves from Postgres to silo CPU when the silo SKU shrinks far enough; both regimes are valuable for shaping a sizing decision.
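When comparing the two rows it can help to look at successful requests per second rather than raw throughput. The sketch below assumes the throughput column counts every request, failed ones included; that is an assumption about how the reports were produced, and `goodput_rps` is a hypothetical helper.

```python
def goodput_rps(throughput_rps: float, fail_pct: float) -> float:
    """Successful requests per second, given total throughput and a failure percentage."""
    return throughput_rps * (1.0 - fail_pct / 100.0)


# Rows from the comparison table.
laptop = goodput_rps(1049, 97)
container_apps = goodput_rps(113, 0)
print(f"laptop: {laptop:.1f} ok/s, Container Apps: {container_apps:.1f} ok/s")
```

Under that assumption the laptop run completes fewer successful starts per second than the Container Apps run, despite the much higher headline throughput.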

The events scenario reported a 100 % poll_stall rate on Azure in both the 2-silo run and the 1-silo re-run. We chased it because that’s a striking result. After investigation:

  • The fixture and the Locust port are correct. A single-instance probe with no other load showed start → step1 → waitMessage in <500 ms; the message correlated and the workflow completed.
  • Under 500 VU on 0.5 vCPU silos, the silo cannot dispatch the step1 → waitMessage transition within the test’s 3 s poll budget. The test catches this correctly.
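The poll budget works like a deadline loop: keep checking until the workflow reaches waitMessage or the 3 s run out. The sketch below shows the general shape, not the actual Locust or k6 code. The 0.25 s interval, the `poll_until` name, and the injectable `clock` and `sleep` parameters are all assumptions made so the example can run without a server.

```python
import time
from typing import Callable


def poll_until(
    check: Callable[[], bool],
    budget_s: float = 3.0,      # poll budget quoted in the text
    interval_s: float = 0.25,   # assumed polling interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call check() until it returns True or the budget is spent."""
    deadline = clock() + budget_s
    while True:
        if check():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False  # the caller would record this as a poll_stall
        sleep(min(interval_s, remaining))
```

A False return here says only that the transition did not happen within the budget, which is the distinction the investigation above draws between a broken fixture and an overloaded silo.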

Three incidental engine bugs surfaced and are worth knowing about when you scale Fleans:

1. Stale Orleans membership after Container Apps scale-down


When Container Apps replicas decrease, the Redis clustering table keeps the dead silo’s endpoint until Orleans probes converge it (~1 minute). During that window, grain calls route to the dead silo and stateless-worker activations stay registered there:

warn: Orleans.Runtime.Messaging.NetworkingTrace
Connection attempt to endpoint S100.100.200.74:11111 failed
warn: Orleans.Runtime.MembershipService.SiloHealthMonitor
Did not get response for probe #18 to silo ...

If you actively scale Container Apps replicas in Fleans clusters today, expect a window where new starts complete but their script tasks hang. The mitigation is graceful-shutdown logic that explicitly tombstones the silo’s row in Orleans’s membership table; the fast workaround is to force a new revision (any env-var change triggers it) which causes a clean cluster re-election.

2. Default Fleans:Streaming:Provider=memory is unsafe for more than 1 silo


In-memory Orleans streams are per-silo only. When the publisher and the consumer activation live on different silos, events are dropped:

warn: Orleans.Streams.StreamConsumerExtension
[GrainId worfklowevaluateconditioneventhandler ...]
got an item for subscription ..., but I don't have any subscriber
for that stream. Dropping on the floor.

Any non-memory provider (see Streaming providers) is multi-silo-safe; the default memory provider is dev-only despite being the default. Set Fleans__Streaming__Provider=Kafka or Fleans__Streaming__Provider=AzureQueue for any deployment with more than one silo.
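A deployment script can guard against this combination before rollout. The check below is a hypothetical preflight sketch, not something Fleans ships. It assumes an unset variable means the memory default and that the provider name can be compared case-insensitively.

```python
import os
from typing import Mapping


def streaming_provider_is_safe(
    silo_count: int, env: Mapping[str, str] = os.environ
) -> bool:
    """Return False when more than one silo would run on the memory provider."""
    # Assumption: an unset variable falls back to the memory default.
    provider = env.get("Fleans__Streaming__Provider", "memory")
    return silo_count <= 1 or provider.strip().lower() != "memory"
```

For example, `streaming_provider_is_safe(2, {"Fleans__Streaming__Provider": "Kafka"})` passes, while the same call with an empty environment does not.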

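3. Message correlation key expressions are prefix-stripped, not evaluated

WorkflowExecution.ProcessRegisterMessage extracts the correlation variable name by stripping a literal "= " prefix (an equals sign followed by a space); it does not run a real expression evaluator: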

var variableName = messageDef.CorrelationKeyExpression.StartsWith("= ")
    ? messageDef.CorrelationKeyExpression[2..]
    : messageDef.CorrelationKeyExpression;

Any expression that isn’t the exact form "= variableName" fails. "=requestId" has no space after the equals sign, so it is left unstripped; "= request.Id" and "= upper(requestId)" are stripped to "request.Id" and "upper(requestId)". In every case the remaining string is looked up as a literal variable name, and the lookup throws InvalidOperationException at runtime when no such variable exists. Real Zeebe-style FEEL expressions are therefore unsupported, and the failure only surfaces at runtime. The fixture used in our load tests happens to use the supported form, so we never hit it, but other BPMN you import almost certainly won’t.
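To make the behaviour concrete, here is a Python rendering of the same prefix strip applied to the example expressions. It mirrors only the C# excerpt above. The function name is made up for the example, and the later variable lookup (where the exception is thrown) is not modelled.

```python
def extract_variable_name(correlation_key_expression: str) -> str:
    """Mirror of the C# prefix strip: drop a leading '= ' and nothing else."""
    if correlation_key_expression.startswith("= "):
        return correlation_key_expression[2:]
    return correlation_key_expression


for expression in ["= requestId", "=requestId", "= request.Id", "= upper(requestId)"]:
    # The result is what the engine would then look up as a literal variable name.
    print(repr(expression), "->", repr(extract_variable_name(expression)))
```

Only the first expression yields a plain variable name; the other three produce strings that are unlikely to match any workflow variable.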

In priority order, based on what we measured:

| # | Action | Why |
| --- | --- | --- |
| P0 | Bigger silo CPU. 0.5 vCPU is too small. Move to 1 vCPU / 2 GiB minimum, ideally 2 vCPU. | Silo CPU is the first thing that pegs on Azure. |
| P1 | Add PgBouncer between silos and Postgres. EF Core opens a connection per DbContext; under burst this exhausts max_connections. | The local Docker run failed at this exact wall (97 % errors). |
| P2 | Bigger Postgres SKU. Burstable B2s is fine for the events case at low VU; for sustained 500 VU+ pick General Purpose D2ds_v5 or larger. | Removes the secondary cliff that follows P0. |
| P3 | Switch to a non-memory stream provider. Fleans__Streaming__Provider=Kafka or Fleans__Streaming__Provider=AzureQueue once you’re past 1 silo. | The memory default silently drops cross-silo events. |
| P4 | Pre-warm before measurement. The first 30 s of each run mixes engine ramp-up with cold-start grain activation. | Cleaner reported tails. |
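For runs that were not pre-warmed, a post-hoc approximation of P4 is to drop the first 30 s of samples before computing tail latency. This is a sketch of that analysis step, not a replacement for warming the cluster. The sample layout (seconds since test start, latency in ms) and the function names are assumptions, and the percentile uses the simple nearest-rank method.

```python
import math
from typing import Iterable, List, Tuple

Sample = Tuple[float, float]  # (seconds since test start, latency in ms)


def drop_warmup(samples: Iterable[Sample], warmup_s: float = 30.0) -> List[Sample]:
    """Keep only samples recorded at or after the warm-up cutoff."""
    return [sample for sample in samples if sample[0] >= warmup_s]


def p95(latencies_ms: List[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list."""
    if not latencies_ms:
        raise ValueError("no samples")
    ordered = sorted(latencies_ms)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]
```

Comparing `p95` before and after `drop_warmup` on the same raw CSV gives a quick read on how much of the reported tail is cold-start activation rather than steady-state behaviour.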

Both report files include a “How to reproduce” section that lists the exact az/docker commands. The Locust scripts under tests/load/locust/ are uploaded directly to Azure Load Testing as the test plan; the k6 scripts under tests/load/scripts/ are run locally against either the Aspire stack or tests/load/generated/docker-compose.yml.

For the Azure run specifically, you’ll need:

  • Azure subscription with access to Microsoft.ContainerRegistry, Microsoft.DBforPostgreSQL, Microsoft.Cache, Microsoft.App, and Microsoft.LoadTestService resource providers (auto-register on first use).

  • Either local az CLI (Python 3.13 may have a pyexpat issue depending on platform; if so, use mcr.microsoft.com/azure-cli from Docker), or just the Azure portal.

  • The released fleans-api container image, re-tagged into your ACR. docker pull the image from ghcr.io/nightbaker/fleans-api:<version>, then docker tag + docker push it to your ACR (e.g. myacr.azurecr.io/fleans-api:<version>). Optionally verify the upstream image via cosign verify first (see Self-host with Docker Compose for the canonical command).

    VERSION=v0.1.0-beta
    ACR=myacr.azurecr.io
    docker pull ghcr.io/nightbaker/fleans-api:$VERSION
    docker tag ghcr.io/nightbaker/fleans-api:$VERSION $ACR/fleans-api:$VERSION
    docker push $ACR/fleans-api:$VERSION

    Add fleans-web only if you also want the management UI in Azure — it isn’t strictly needed for the tests themselves.

Estimated cost: ~$3 / day idle for the resource group, ~$5 in test runs.