Workflow Engines (Durable Execution)
Long-running, retry-safe, multi-step server workflows.
The "queue" question (see Queues) is "how do I run a job later?" The "workflow" question is "how do I run a 17-step process that takes 3 days, survives restarts, retries each step independently, and is observable end-to-end?" — that's durable execution.
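To make "survives restarts, retries each step independently" concrete, here is a toy, in-memory sketch of the journal-and-replay idea behind durable execution. All of it is hypothetical (WorkflowCtx, run, now, runWorkflow are illustrative names, not any engine's API), and real engines persist the journal durably and differ in the details. Each step's result is recorded under a step id. After a crash, the workflow function simply runs again from the top, and finished steps return their recorded results instead of executing again.

```typescript
// Toy, in-memory durable-execution engine. All names here are hypothetical,
// not any real SDK's API. Each step's result is written to a journal; re-running
// the workflow after a crash replays finished steps instead of re-executing them.

type Journal = Map<string, unknown>;

class WorkflowCtx {
  private readonly journal: Journal;

  constructor(journal: Journal) {
    this.journal = journal;
  }

  // If `id` already has a recorded result, return it; otherwise run `fn` and record it.
  async run<T>(id: string, fn: () => T | Promise<T>): Promise<T> {
    if (this.journal.has(id)) return this.journal.get(id) as T; // replay
    const result = await fn(); // a throw here records nothing, so the step reruns later
    this.journal.set(id, result); // a real engine persists this durably
    return result;
  }

  // The current time is non-deterministic, so it is journaled like a step.
  now(id: string): Promise<number> {
    return this.run(id, () => Date.now());
  }
}

function runWorkflow<T>(journal: Journal, wf: (ctx: WorkflowCtx) => Promise<T>): Promise<T> {
  return wf(new WorkflowCtx(journal));
}

// Example workflow: count how often each side effect actually executes.
const calls = { charge: 0, email: 0, ship: 0 };
const startTimes: number[] = [];
let crashDuringEmail = true; // simulate one worker crash

async function orderWorkflow(ctx: WorkflowCtx): Promise<string> {
  startTimes.push(await ctx.now("started-at"));
  const chargeId = await ctx.run("charge", () => {
    calls.charge++;
    return "ch_1";
  });
  await ctx.run("email", () => {
    calls.email++;
    if (crashDuringEmail) {
      crashDuringEmail = false;
      throw new Error("worker crashed mid-step");
    }
    return "sent";
  });
  const tracking = await ctx.run("ship", () => {
    calls.ship++;
    return "trk_9";
  });
  return `${chargeId}/${tracking}`;
}

async function demo(): Promise<string> {
  const journal: Journal = new Map();
  try {
    await runWorkflow(journal, orderWorkflow); // first attempt dies in "email"
  } catch {
    // a real engine would schedule a retry; here we just run again
  }
  return runWorkflow(journal, orderWorkflow); // "charge" is replayed, not re-run
}
```

After demo() resolves, the charge and ship steps have each executed once, the failed email step has executed twice, and both runs saw the same recorded start time. Note the gap this toy shares with real engines: a crash after a side effect but before the journal write reruns that step, which is why the step bodies themselves must be idempotent.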
TS / Node-friendly
- ★ Inngest — TS SDK, declarative steps (step.run, step.sleep, step.waitForEvent), great free tier; runs your code on your infra. The most-recommended pick for new TS projects.
- ★ Trigger.dev — open source + hosted; same niche as Inngest with its own ergonomic SDK; generous free tier; self-hostable.
- Restate — open source durable execution; SDK in TS, Java, Kotlin, Go, Rust, Python; deploy as a stateful service.
- DBOS (database-backed durable execution) — TS + Python SDKs, durable on Postgres; relatively new.
- Cloudflare Workflows — durable, multi-step workflows on Workers; native to Cloudflare.
Polyglot heavyweight
- Temporal — the gold standard for serious enterprise workflows; SDKs in TS, Go, Java, Python, .NET. Powerful, complex; self-host or Temporal Cloud.
- Cadence — Uber's predecessor to Temporal; less common today.
- Conductor (Netflix) — JSON workflow definitions; OSS.
- Camunda / Activiti — BPMN-based; for big-enterprise process modeling.
- Argo Workflows — Kubernetes-native.
Lighter / event-driven
- Mastra — TS framework for AI agents, with built-in workflows; see AI/LLM.
- Zeebe — the workflow engine at the core of Camunda 8.
- Prefect / Dagster — Python-flavored workflow tools (data pipeline focused).
Patterns to know
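- Determinism — the engine replays workflow code to rebuild state after a restart, so that code must be deterministic; non-deterministic inputs (randomness, the current time, network calls) go through SDK helpers (step.run, ctx.now).
- Idempotent steps — required so retries don't double-charge / double-send; a step can run again if the process dies after its side effect but before its result is recorded.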
- Side-effect isolation — wrap calls to external systems in steps; the engine remembers the result.
- Wait-for-event — pause until "user clicked link" or "Stripe webhook arrived"; the wait can last weeks and survives restarts.
- Compensation / sagas — if step 5 of 7 fails, run rollback steps 4..1.
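The compensation pattern can be sketched as a small saga runner: each step pairs a forward action with a compensating action, and when a step fails, the steps that already completed are undone newest-first. runSaga and SagaStep are hypothetical names for illustration, not a built-in of any engine listed here; in a real engine each action and each compensation would itself be a recorded, idempotent step.

```typescript
// Hypothetical saga helper, not any engine's built-in API.
interface SagaStep {
  name: string;
  action: () => Promise<void>; // forward step
  compensate: () => Promise<void>; // undoes `action`
}

async function runSaga(steps: SagaStep[]): Promise<void> {
  const completed: SagaStep[] = [];
  for (const step of steps) {
    try {
      await step.action();
    } catch (err) {
      // Undo finished steps newest-first, then surface the original failure.
      for (const done of [...completed].reverse()) {
        await done.compensate();
      }
      throw err;
    }
    completed.push(step);
  }
}

// Demo: 7 steps where step 5 fails, so steps 4, 3, 2, 1 are compensated.
async function demo(): Promise<{ log: string[]; error: string }> {
  const log: string[] = [];
  const makeStep = (i: number): SagaStep => ({
    name: `step${i}`,
    action: async () => {
      if (i === 5) throw new Error(`step${i} failed`); // simulated failure
      log.push(`do step${i}`);
    },
    compensate: async () => {
      log.push(`undo step${i}`);
    },
  });
  try {
    await runSaga([1, 2, 3, 4, 5, 6, 7].map(makeStep));
    return { log, error: "" };
  } catch (err) {
    return { log, error: err instanceof Error ? err.message : String(err) };
  }
}
```

Steps 6 and 7 never run, and the original error is rethrown after rollback. In this sketch a compensation that throws aborts the remaining rollbacks; durable engines instead retry compensations, which is another reason they need to be idempotent.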
When to use a workflow engine vs. a queue
- Queue: "send this email later." Queue + retry is enough.
- Workflow: "send email, wait 3 days for click, if no click send reminder, after another 7 days mark dormant." Reach for durable execution.
- State machine in your DB + cron: works up to ~5 steps, then falls over fast as retries, timeouts, and branches multiply.
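The "Workflow" example above can be sketched as straight-line code, run here against a simulated runtime with a virtual clock so the 3- and 7-day waits finish instantly. SimRuntime, sendEmail, waitForEvent and the "email.clicked" event name are hypothetical stand-ins; SDKs such as Inngest expose comparable primitives (step.sleep, step.waitForEvent), and real engines persist the timer and event subscription so the wait survives restarts, which this simulation does not. "After another 7 days mark dormant" is read here as "if there is still no click 7 days after the reminder".

```typescript
// Simulated runtime with a virtual clock. SimRuntime, sendEmail, waitForEvent and
// the "email.clicked" event name are hypothetical; real engines persist timers and
// event subscriptions so a multi-day wait survives restarts.

const DAY = 24 * 60 * 60 * 1000;

interface AppEvent {
  name: string;
  at: number; // virtual milliseconds since the workflow started
}

class SimRuntime {
  now = 0;
  readonly sent: string[] = [];
  private readonly events: AppEvent[];

  constructor(events: AppEvent[]) {
    this.events = events;
  }

  async sendEmail(template: string): Promise<void> {
    this.sent.push(template);
  }

  // Resolve with the earliest matching event before the deadline, or null on timeout.
  async waitForEvent(name: string, timeoutMs: number): Promise<AppEvent | null> {
    const deadline = this.now + timeoutMs;
    const hit =
      this.events
        .filter((e) => e.name === name && e.at >= this.now && e.at <= deadline)
        .sort((a, b) => a.at - b.at)[0] ?? null;
    this.now = hit ? hit.at : deadline; // advance the virtual clock
    return hit;
  }
}

type Outcome = "engaged" | "dormant";

// The workflow is ordinary control flow; each wait is just an await.
async function reengagement(rt: SimRuntime): Promise<Outcome> {
  await rt.sendEmail("welcome");
  if (await rt.waitForEvent("email.clicked", 3 * DAY)) return "engaged";
  await rt.sendEmail("reminder");
  if (await rt.waitForEvent("email.clicked", 7 * DAY)) return "engaged";
  return "dormant"; // no click within 3 + 7 days
}
```

With a DB + cron state machine, each of those awaits would instead become a stored status, a scheduled check, and a transition handler; the workflow version keeps the whole process readable in one function.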
Pick this if…
- Default new TS project, hosted free tier: Inngest or Trigger.dev.
- Self-host, OSS, polyglot: Restate or Temporal.
- Enterprise scale, mature ecosystem: Temporal.
- All-Cloudflare: Cloudflare Workflows.
- Postgres-backed, simplest possible: DBOS.