

n8n Dead Man's Switch and Heartbeat, Without the DIY

A dead man's switch alerts you when a job stops running. Here is why n8n workflows need one, why the DIY version quietly rots, and how to watch for the missing run without building it.

By Dima K. Published

A Dead Man’s Switch for n8n, Without Building One

Most alerts tell you something happened. A dead man’s switch tells you something didn’t.

That is the whole idea. You set up a signal that has to keep arriving on a schedule, and the moment it stops, you get told. It is the only kind of check that fires on silence instead of on noise, which is exactly the failure n8n is worst at showing you: the run that simply never came.

The catch is that the DIY version is more fragile than the problem it solves. So most people either skip it or build one that quietly rots.

Why n8n needs a switch that fires on silence

n8n is good at telling you a run failed. It is bad at telling you a run never started.

A failed execution leaves a record: a red node, an error, something to click. A missed run leaves nothing. The workflow still reads Active, the executions list looks calm, and the report that should have gone out at 09:00 just isn’t there. There is no failed execution because there was no execution at all.

Every normal alert is built around an event that happened. None of them fire on the absence of an event. That is the gap a dead man’s switch is meant to fill, and why it matters more for scheduled work than almost anything else. We went deeper on the missed run itself in our post on the scheduled run that never fired; the switch is the other half of that story.

The DIY heartbeat, and why it rots

The do-it-yourself version is well known. You make your workflow send a ping on every run, to a small service or a cron-checker, and you configure that service to alert you if the ping does not arrive in time. When the workflow stops, the ping stops, the alert fires. A heartbeat.
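To make the moving parts concrete, here is a minimal sketch of the checker side of a heartbeat. It is a toy that keeps everything in memory, and the names (`HeartbeatChecker`, `register`, `ping`, `overdue`) are made up for illustration rather than taken from any real service. A real setup would have the workflow call the checker over HTTP on every run.

```python
from datetime import datetime, timedelta


class HeartbeatChecker:
    """Toy in-memory checker: one fixed timeout per named job (hypothetical API)."""

    def __init__(self) -> None:
        self._timeout: dict[str, timedelta] = {}
        self._last_ping: dict[str, datetime] = {}

    def register(self, job: str, timeout: timedelta, now: datetime) -> None:
        # Registration counts as the first ping, so a job that never
        # pings at all still becomes overdue once the timeout passes.
        self._timeout[job] = timeout
        self._last_ping[job] = now

    def ping(self, job: str, now: datetime) -> None:
        if job not in self._timeout:
            raise KeyError(f"unknown job: {job}")
        self._last_ping[job] = now

    def overdue(self, now: datetime) -> list[str]:
        # A job is overdue when the silence since its last ping
        # is longer than the timeout configured for it.
        return sorted(
            job
            for job, timeout in self._timeout.items()
            if now - self._last_ping[job] > timeout
        )


start = datetime(2024, 1, 1, 9, 0)
checker = HeartbeatChecker()
checker.register("daily-report", timedelta(hours=25), now=start)
checker.ping("daily-report", now=start + timedelta(hours=24))

print(checker.overdue(now=start + timedelta(hours=30)))  # []
print(checker.overdue(now=start + timedelta(hours=50)))  # ['daily-report']
```

Note that the alert comes from `overdue`, which has to be polled by something that is not the workflow itself. The workflow only ever says "I ran"; noticing the silence is someone else's job.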

It works, right up until it doesn’t, and it usually fails in quiet ways.

The first problem is that the heartbeat rides inside the same workflow it is supposed to watch. If the workflow never starts, the ping never sends, which is the point, but it also means the heartbeat shares every weakness of the thing it watches. If the instance is down, every heartbeat on it goes quiet at once, and the missing ping cannot tell you whether the workflow, the schedule, or the whole instance is at fault.

The second problem is the interval. You have to tell the checker how long is too long, and a real workflow’s timing drifts. So you either set the window tight and get false alarms on every slightly-late run, or you set it loose and the alert arrives hours after it mattered.
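The trade-off is easy to see with numbers. The sketch below uses invented gaps between pings for a job scheduled hourly and compares a tight window with a loose one. The function names and the sample data are assumptions for illustration only.

```python
from datetime import timedelta


def false_alarms(gaps: list[timedelta], window: timedelta) -> int:
    """Count healthy runs whose gap since the previous ping exceeded the window."""
    return sum(1 for gap in gaps if gap > window)


def detection_delay(window: timedelta, schedule: timedelta) -> timedelta:
    """How long after the missed slot the alert fires, given the window."""
    return window - schedule


schedule = timedelta(minutes=60)
# Invented gaps between consecutive pings of a healthy hourly job.
observed = [timedelta(minutes=m) for m in (60, 58, 63, 71, 60, 66, 59)]

tight = timedelta(minutes=62)
loose = timedelta(minutes=180)

print(false_alarms(observed, tight), detection_delay(tight, schedule))
print(false_alarms(observed, loose), detection_delay(loose, schedule))
```

With this sample, the 62-minute window pages three times on runs that were merely late, while the 180-minute window stays quiet but would take two hours past the missed slot to say anything.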

The third problem is the one that actually gets you: it depends on you remembering to add it, correctly, to every workflow that matters, and to maintain it as schedules change. Across one workflow that is fine. Across a few dozen client workflows, “remember to wire and tune a heartbeat on each one” is not a plan. It is a wish.

So the DIY switch is real engineering applied to a problem that keeps shifting under it. For an agency, that is rarely the right place to spend the hours.

What a switch actually has to get right

Strip it down and a dead man’s switch for n8n has to do three things well.

It has to know what “on time” means for each workflow, not as a single global timeout but per workflow, because a once-a-week sync and a near-realtime feed are not late at the same point. It has to tolerate normal jitter, so a run that lands a little behind does not page you, while a run that genuinely vanished does. And it has to watch from outside the workflow, so the thing doing the watching does not die with the thing it watches.
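Those requirements can be sketched in a few lines. This is an illustration under assumptions, not a reference design: `Expectation`, `missing_runs`, and the workflow names are hypothetical, and the last-run timestamps are assumed to come from somewhere outside the workflows, for example a separate process reading execution history.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Expectation:
    interval: timedelta  # how often this workflow is expected to run
    grace: timedelta     # how much lateness counts as normal jitter

    def deadline(self, last_run: datetime) -> datetime:
        # The next run is only "missing" once interval plus grace has passed.
        return last_run + self.interval + self.grace


def missing_runs(
    expectations: dict[str, Expectation],
    last_runs: dict[str, datetime],
    now: datetime,
) -> list[str]:
    """Return workflows whose next run is past its own deadline."""
    return sorted(
        name
        for name, expected in expectations.items()
        if now > expected.deadline(last_runs[name])
    )


expectations = {
    "weekly-sync": Expectation(timedelta(days=7), timedelta(hours=6)),
    "lead-feed": Expectation(timedelta(minutes=5), timedelta(minutes=2)),
}
now = datetime(2024, 1, 8, 12, 0)
# Both workflows last ran 20 minutes ago.
last_runs = {name: now - timedelta(minutes=20) for name in expectations}

print(missing_runs(expectations, last_runs, now))  # ['lead-feed']
```

The same 20 minutes of silence is a problem for the feed and nothing at all for the weekly sync, which is why a single global timeout cannot serve both.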

Those three together are the difference between an alert you trust and an alert you learn to ignore.

Watching for the missing run without building it

This is the part NoCrash (n8n reliability) is built to do, and it is worth being specific about how, because “it watches your workflows” is not an answer.

NoCrash connects to n8n through the API and watches from outside. For scheduled workflows it learns each one’s normal cadence from its own history rather than asking you to declare a timeout, and it only flags a missed run when the gap exceeds that learned interval by a real margin, so ordinary lateness is absorbed instead of paged. Event-driven and webhook flows are judged on their outcome rather than on a schedule, so they do not throw false missed-run alarms.

The result is the switch without the wiring. You do not add a ping node to every workflow, you do not tune a window per job, and the watch does not go dark when your instance does. When an expected run does not appear, that becomes a plain-language event instead of a silence nobody noticed.

That same outside-the-run logic is what catches the other quiet failures too: a green run that did nothing, Continue On Fail hiding an error, and an AI agent reporting success after one of its tool calls failed. A missed run is just the version where there is no run at all.

Why this matters more for agencies

For your own automations, a missing run is an annoyance you will probably catch eventually. For a client’s, it is a trust problem you find out about late.

The client does not know the difference between a failed run and a missed one. They know the leads stopped arriving, the report did not land, the sync went stale. And because nothing turned red, nobody looked, sometimes for days. A switch that fires on silence is what turns “the client told us” into “we told the client,” and for an agency that one reversal is most of the value.

Check your workflow before the next quiet failure

A green execution is not always proof that the workflow did the job.

Run your exported n8n workflow through the free NoCrash Workflow Grader and get a quick read on the spots worth watching first. No access needed. No signup needed for the first look.


If you want ongoing coverage after that, start free and watch up to 3 things continuously.

— NoCrash

Common questions


What is a dead man's switch for n8n?

It is a check that alerts you when an expected run stops happening, instead of when a run fails. It fires on the absence of a signal, which is the failure n8n's normal alerts cannot see.

Can I build an n8n heartbeat myself?

Yes, by having each workflow ping an external checker on every run. It works, but it rides inside the workflow it watches, needs a hand-tuned interval per job, and has to be remembered and maintained on every workflow, which is where it tends to rot.

Why doesn't n8n alert me when a scheduled run is missed?

Because a missed run creates no execution, and n8n's alerts are built around executions that happened. With no failed run to react to, there is nothing for an Error Workflow to catch.

How do I avoid false alarms on a slightly late run?

Use a per-workflow expectation with a grace margin rather than one global timeout, so normal jitter is absorbed and only a genuinely missing run is flagged.
