Skip to content

Streaming providers

Fleans publishes domain events (EvaluateConditionEvent, EvaluateActivationConditionEvent, ExecuteScriptEvent, …) through Orleans Streams. The stream provider is pluggable at startup: the default is Orleans’ in-memory provider, and a Kafka-backed provider ships alongside it for cross-silo event durability.

The provider name is read once at silo startup from configuration:

Configuration keyValuesDefault
Fleans:Streaming:Providermemory, kafka, azurequeuememory
Fleans:Streaming:Kafka:Brokershost:port[,…]
Fleans:Streaming:Kafka:ConsumerGroupstringfleans
Fleans:Streaming:Kafka:TopicPrefixstringfleans-
Fleans:Streaming:Kafka:QueueCountinteger1
Fleans:Streaming:Kafka:NumPartitionsinteger1
Fleans:Streaming:Kafka:ReplicationFactorinteger1
Fleans:Streaming:AzureQueue:ConnectionStringAzure Storage connection string
Fleans:Streaming:AzureQueue:AccountNameAzure Storage account name
Fleans:Streaming:AzureQueue:QueueNamesJSON array of strings["fleans-stream-0"…"fleans-stream-7"]
Fleans:Streaming:AzureQueue:MessageVisibilityTimeoutTimeSpan string (e.g. 00:01:00)Azure SDK default (30 s)

Match is case-insensitive. Any other value throws ArgumentException at startup — add a new case to FleanStreamingExtensions.AddFleanStreaming if you ship another provider.

For azurequeue, exactly one of ConnectionString or AccountName must be set. ConnectionString is used for local dev (Azurite). AccountName enables Managed Identity (DefaultAzureCredential) for production Azure deployments.

Backed by AddMemoryStreams. Zero infrastructure. Events are lost on silo crash. Subscription state survives because Aspire wires PubSubStore to Redis, but in-flight stream messages do not.

Right for: local development, all default tests, and single-silo deployments where you accept the event-loss profile of a silo crash.

Backed by an in-repo Fleans.Streaming.Kafka adapter built on Confluent.Kafka 2.6.1 (librdkafka 2.6.x bundled).

Right for: production rollouts that need event durability beyond a single silo, CI environments that want broker parity with production, and topologies where a non-Orleans consumer may join the topics later (note: the codec is currently the Orleans codec — see limitations below).

azurequeue — opt-in cross-silo durability (Azure-native)

Section titled “azurequeue — opt-in cross-silo durability (Azure-native)”

Backed by Microsoft.Orleans.Streaming.AzureStorage — the first-party Microsoft Orleans adapter for Azure Queue Storage. This is a thin wrapper, not a custom adapter, so behaviour, retry semantics, and dead-letter handling are governed by the Microsoft package.

Right for: Azure-native deployments that need cross-silo event durability without running Kafka. Uses Azure Queue Storage (cheaper than Service Bus at scale; Orleans StreamSequenceToken provides per-queue ordering). In dev mode, Aspire auto-provisions Azurite so no Azure subscription is needed.

For auth options: set ConnectionString for Azurite or an explicit connection string; set AccountName to use DefaultAzureCredential (Managed Identity, recommended for Container Apps).

The Kafka adapter shipped in v1 is intentionally minimal. The following gaps are tracked under #474 — Production-ready Kafka streaming; pick the tier that matches your risk tolerance.

These are deploy-day blockers — your silo will fail to connect, period.

GapFailure modeAffected services
No SecurityProtocol knoblibrdkafka prints Disconnected: SASL authentication required and the silo health check goes redConfluent Cloud, Aiven, Redpanda Cloud, any TLS-only listener
No SASL/PLAIN, SCRAM, or OAUTHBEARERlibrdkafka cannot authenticate; broker rejects the connection during the AdminClient GetMetadata warm-upAll managed brokers requiring credential auth
No AWS MSK IAMlibrdkafka ssl.ca.location + token vendor not wiredAmazon MSK with IAM auth
No client-cert / mTLSNo ssl.certificate.pem / ssl.key.pem configSelf-managed clusters with mutual TLS

These are SLO trade-offs — connection succeeds but the durability profile is below production-typical.

GapFailure modeMitigation today
ReplicationFactor=1 defaultBroker crash with unflushed log segments → events permanently lostSet Fleans:Streaming:Kafka:ReplicationFactor=3 once you have a 3-broker cluster
At-least-once delivery (offset commit after handler success)Silo crash between handler success and offset commit → event replaysSafe by design — every consumer routes to a WorkflowInstance method guarded by HasActiveEntry. Do not remove that guard. See Delivery contract
No idempotent producerProducer retries can reorder messages within a partitionNone today; consumer-side guard above absorbs duplicates

These won’t break a deploy, but they will cost you operationally.

GapWhy it hurtsWorkaround
NumPartitions=1 defaultNew topics created with 1 partition — no broker-side write fan-out per topicBump Fleans:Streaming:Kafka:NumPartitions and re-create or kafka-topics --alter --partitions N existing topics (forward-only; cannot shrink). See Tuning throughput
No schema registry integrationCodec is the Orleans codec; non-Orleans consumers cannot decode the streamTreat Kafka topics as silo-internal until a follow-up adds Avro/Protobuf framing
No DLQ for poison messagesA consistently-failing handler replays foreverManual offset commit via AdminClient is your only escape today
No metrics/health-check for consumer lagLag is invisible to the Aspire dashboard’s health UIRun kafka-consumer-groups.sh --describe --group fleans against the broker

Manual test plan #35 exercises the happy path against a single-broker Aspire-provisioned fleans-kafka resource — it validates that the at-least-once contract holds across a silo restart, but it does not test any production failure mode listed above.

Production-readiness gaps — Azure Queue Storage

Section titled “Production-readiness gaps — Azure Queue Storage”

The Azure Queue provider is backed by the Microsoft-maintained Microsoft.Orleans.Streaming.AzureStorage package, which covers several production concerns automatically.

ConcernStatus
Managed Identity (AccountName path)✅ Covered — uses DefaultAzureCredential
Message retry / poison-message handling✅ Covered — MS provider retries up to configurable limit; exhausted messages move to *-poison queue automatically
OTLP monitoring✅ Covered — Orleans Streams telemetry emitted via Orleans.Telemetry; no extra wiring needed
Managed Identity token rotationDefaultAzureCredential handles refresh transparently
Cross-queue message ordering⚠️ Per-queue FIFO only — ordering across the 8 default fleans-stream-* queues is not guaranteed (same trade-off as Kafka partitions)

Manual test plan #44 exercises the happy path with Azurite.

The Kafka adapter checkpoints offsets after the consumer-side handler completes. On a silo crash between handler success and offset commit, the next consumer instance resumes from the last committed offset, which means the in-flight event will replay.

The replay is safe because every Fleans consumer eventually routes to a method on WorkflowInstance (CompleteActivity, FailActivity, …) that opens with a stale-callback guard:

if (!State.HasActiveEntry(activityInstanceId))
{
LogStaleCallbackIgnored(...);
return;
}

Once the activity instance has completed once, its entry leaves HasActiveEntry, so a second delivery is a logged no-op. Do not remove that guard without re-reading this page — it is what makes the at-least-once contract safe for the workflow engine.

KafkaQueueAdapterFactory.CreateAdapter() runs a one-shot topic-ensure step on first activation:

  1. Build the expected topic list ({TopicPrefix.TrimEnd('-')}-{0..QueueCount-1}).
  2. AdminClient.GetMetadata to discover what already exists on the broker.
  3. AdminClient.CreateTopicsAsync for missing topics with the configured NumPartitions and ReplicationFactor.
  4. TopicAlreadyExists and NoError are both treated as success (a peer silo may have just created the topic).

The broker-side allow.auto.create.topics flag is not required — it is documented here as a fallback for self-managed clusters that prefer it, but managed clusters (Confluent Cloud, MSK) disable it by default and the client-side path is what we ship.

Set Fleans__Streaming__Provider on every fleans-* workload via your deployment tool. The engine accepts four values: Redis (default), Memory, Kafka, AzureQueue. Both the Docker Compose bundle and the Helm chart ship with sensible defaults — you only need an explicit switch when moving off the default.

The bundle ships with Redis as the default streaming provider. You only need an override file to switch away from Redis — for the production-default Redis behaviour, no edits are required.

Drop a docker-compose.override.yaml next to the bundle’s docker-compose.yaml with the same env var on every fleans-* service:

services:
fleans-core:
environment:
Fleans__Streaming__Provider: Memory # or Kafka, AzureQueue
fleans-management:
environment:
Fleans__Streaming__Provider: Memory
fleans-mcp:
environment:
Fleans__Streaming__Provider: Memory
fleans-worker:
environment:
Fleans__Streaming__Provider: Memory

For Kafka, add Fleans__Streaming__Kafka__Brokers: "kafka:9092" (replace with your broker host:port). For Azure Queue, add Fleans__Streaming__AzureQueue__ConnectionString: "<your-connection-string>". Then docker compose up -d from the bundle directory — Compose merges the override automatically.

This streaming override is a pure service-level env addition (unlike the plugin-host override in Writing custom-task plugins, which depends on the fleans_default network being created by an engine up first). You can docker compose down && docker compose up -d from the bundle directory at any time to apply the change — engine-first ordering is not required.

(For the source-tree dev workflow — only relevant if you cloned the engine repo — see the engine CLAUDE.md’s Things to Know section for the AppHost-side FLEANS_STREAMING_PROVIDER knob.)

Redis (default): Reuses the same Redis instance the engine already runs for clustering and PubSubStore, so stream events cost zero extra infrastructure. Uses the third-party Universley.OrleansContrib.StreamsProvider.Redis package (MIT, Orleans 10.x-compatible), pinned in Fleans.ServiceDefaults.csproj. Production requirement: enable Redis persistence (AOF or RDB) — without it, a Redis restart loses in-flight messages (same caveat as PubSubStore). The chart’s bundled fleans-redis enables AOF by default; see Self-host with Helm for the Redis subchart values.

Memory: No external infrastructure. Drops in-flight events on silo restart — single-silo debug-only. Suitable for local dev and CI smoke tests; never for production.

Kafka: Supply your own Kafka cluster — the chart’s streaming.kafka.brokers value takes a Kafka bootstrap-server string. For an in-cluster Kafka, the Bitnami Kafka Helm chart is the conventional choice. The silo forwards Fleans__Streaming__Provider=Kafka + Fleans__Streaming__Kafka__Brokers to the Kafka producer.

AzureQueue: Supply an Azure Queue Storage connection string — see Microsoft’s Azure Storage connection-string docs for the format. For local dev, run Azurite as a sibling Compose service and point the connection string at it. The silo forwards Fleans__Streaming__Provider=AzureQueue + Fleans__Streaming__AzureQueue__ConnectionString to the Azure Queue producer.

Connection multiplexer aliasing (Redis only)

Section titled “Connection multiplexer aliasing (Redis only)”

The third-party Redis stream provider resolves a non-keyed IConnectionMultiplexer from DI. Aspire registers it as a keyed service (AddKeyedRedisClient("orleans-redis")). FleanStreamingExtensions.AddRedisStreams bridges the two with TryAddSingleton<IConnectionMultiplexer>(...) — reusing the same connection pool, no duplicate sockets. If another component needs a different non-keyed multiplexer, register it explicitly before AddFleanStreaming runs and TryAddSingleton will defer.

PubSubStore is the Orleans grain-storage location for subscription state (who is subscribed to which stream id). Aspire wires it to Redis at Fleans.Aspire/Program.cs, so subscription state already survives silo restarts under both providers.

What changes when you opt into Kafka is event durability — the bytes flowing between WorkflowEventsPublisher.Publish and WorkflowExecuteScriptEventHandler.OnNextAsync. With the in-memory provider, those bytes live in the silo’s process memory. With Kafka, they land in a broker topic before the handler runs.

Confluent.Kafka 2.6.1 / librdkafka 2.6.x are the supported versions. The adapter targets net10.0 (matching every other silo csproj in this repo).

Throughput across the queue-backed providers is governed by two independent classes of knobs. The first is an Orleans-level concept shared by every provider; the second is Kafka-specific and does not have analogues elsewhere.

Orleans parallelism (all queue-backed providers)

Section titled “Orleans parallelism (all queue-backed providers)”

Fleans:Streaming:Redis:TotalQueueCount, Fleans:Streaming:Kafka:QueueCount, and the length of Fleans:Streaming:AzureQueue:QueueNames all set the same thing: the number of Orleans pulling-agent grains that activate across the cluster. Stream IDs hash across this many partitions; each partition is consumed by one pulling agent. More partitions = more parallel consumption.

ProviderKnobDefaultNotes
RedisFleans__Streaming__Redis__TotalQueueCount8Configurable as of v0.2.0 (#567).
KafkaFleans__Streaming__Kafka__QueueCount8Was 1 before v0.2.0 (#567) — call out as a behavior change for deployments that left it unset.
AzureQueueFleans__Streaming__AzureQueue__QueueNames__0..N8 entries (fleans-stream-0..7)No scalar count — tuning above OR below 8 requires populating the explicit list by hand. Mid-deployment length changes rehash Stream IDs across queues.

Sizing heuristic. Start with max(8, 2 × silo_count × cpu_per_silo) for high-volume deployments and measure with the Orleans dashboard (wired at Fleans.Api/Program.cs) before raising further. In a multi-silo deployment, agents distribute across silos via the hash ring — total cluster-wide count = the configured value, per-silo count ≈ ceil(value / silo_count).

Rehash caveat (all providers). Bumping the count rehashes Stream IDs across queues. Consistent with the project’s pre-v1 stance, expect in-flight workflow stalls across the bump window — no formal drain procedure is provided. After the bump, new workflow instances distribute over the new queue count; in-flight instances that straddle the rehash may stall until reactivation picks them up on the new queue assignment.

Tradeoff. Each partition adds a pulling-agent grain per silo plus provider-side resource overhead (a Redis connection, a Kafka consumer, an AzureQueue receiver lease).

Production caveats already documented elsewhere. Redis streaming requires AOF or RDB persistence — see the Streaming section above; Kafka requires its own cluster lifecycle ops — see Production-readiness gaps — Kafka.

These knobs control the Kafka broker-side topology only. They are independent of QueueCount and do not multiply Orleans consumer parallelism.

KnobDefaultWhat it does
Fleans__Streaming__Kafka__NumPartitions1Partition count for each Kafka topic at topic creation. Adds broker-side write parallelism (the producer can write to N partitions concurrently) and partition-level ordering granularity within one Orleans queue. Does not add Orleans consumer parallelism — the consumer subscribes via consumer-group Subscribe mode (one consumer per topic, regardless of partition count). Forward-only: kafka-topics --alter --partitions N can grow but not shrink, and downgrading the engine after a bump leaves topics over-partitioned.
Fleans__Streaming__Kafka__ReplicationFactor1Per-topic replication factor at topic creation. Production deployments should set to 3 once a 3-broker cluster is available — see the durability gap above.
  • Observability — health checks, metrics, logging, tracing, dashboards, alerting
  • Deployment — how to wire Fleans:Streaming:Provider / Fleans:Streaming:Kafka:Brokers into Docker Compose, Kubernetes, and bare-VM deployments.
  • Self-Hosting on Kubernetes — Kafka opt-in on the Helm chart.