PagerDuty for device fleets

In any part of the process, from the first line of code to v1.0, it is in the developer’s best interest to outsource commonplace tools instead of building and maintaining their own.

Today the cloud has matured to the point where DevOps is, for the most part, handled by a set of tools, which makes tasks more automated and reliable. Meanwhile, the embedded world lags behind. Part of what resin.io does is bring IoT up to speed, at least from a deployment and management perspective. For more on how we are doing that, read this.

But not everything is unique to IoT, and, as I mentioned, it’s silly to reinvent the wheel. Many tools translate directly from the cloud to hardware, and PagerDuty is one of them. It’s a great incident alerting system, which is good for notifying you when your servers crash at 4 a.m. But what about when your customer’s device loses its internet connection? In many use cases, that should be treated as an emergency. Resin.io + PagerDuty to the rescue!

Image: PagerDuty incident

To minimize downtime, or at least unacknowledged downtime, across your fleet, I’ve created a Heroku worker process. The script polls the resin.io API to check whether all devices are online. If a device is offline, it creates a PagerDuty incident. From there you can escalate the incident as you ordinarily would in PagerDuty. Simple but effective.
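A minimal sketch of that polling pass in Python, for illustration only (it is not the code in the repo). It assumes each device from the resin.io API arrives as a dictionary with `uuid`, `device_name` and `is_online` fields; those field names, and the caller-supplied `fetch_devices` function that would wrap the actual resin.io API request, are assumptions. Alerts are sent to PagerDuty’s Events API v2 `enqueue` endpoint as "trigger" events.

```python
import json
import time
import urllib.request
from typing import Callable, Dict, Iterable, List

# PagerDuty Events API v2 endpoint for sending trigger/resolve events.
PAGERDUTY_EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"

Device = Dict[str, object]
Event = Dict[str, object]


def offline_devices(devices: Iterable[Device]) -> List[Device]:
    """Return devices whose `is_online` flag (assumed field name) is falsy."""
    return [d for d in devices if not d.get("is_online")]


def build_trigger_event(routing_key: str, device: Device) -> Event:
    """Build a PagerDuty Events API v2 'trigger' event for one offline device.

    The dedup_key is derived from the device UUID, so repeated polls while the
    device stays offline update one incident instead of opening new ones.
    """
    uuid = str(device.get("uuid", "unknown"))        # assumed field name
    name = str(device.get("device_name", uuid))      # assumed field name
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "dedup_key": f"device-offline-{uuid}",
        "payload": {
            "summary": f"Device {name} ({uuid}) is offline",
            "source": uuid,
            "severity": "critical",
        },
    }


def send_event(event: Event) -> int:
    """POST one event to PagerDuty and return the HTTP status code."""
    req = urllib.request.Request(
        PAGERDUTY_EVENTS_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


def poll_once(
    fetch_devices: Callable[[], Iterable[Device]],
    routing_key: str,
    send: Callable[[Event], object] = send_event,
) -> List[Event]:
    """One polling pass: fetch the fleet and alert on every offline device."""
    events = [build_trigger_event(routing_key, d)
              for d in offline_devices(fetch_devices())]
    for event in events:
        send(event)
    return events


def run_forever(fetch_devices: Callable[[], Iterable[Device]],
                routing_key: str, interval_s: float = 60.0) -> None:
    """Worker loop: poll, alert, sleep, repeat (never returns)."""
    while True:
        poll_once(fetch_devices, routing_key)
        time.sleep(interval_s)
```

Keying `dedup_key` on the device UUID means a device that stays offline across many polls maps to a single open PagerDuty incident rather than one incident per poll. Passing `send` as a parameter also lets the logic be exercised without touching the network.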

Here is the repo: just push it to Heroku and set a couple of environment variables to configure it.
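Heroku exposes config vars to the worker as environment variables. A minimal sketch of how such a worker might read its settings, assuming hypothetical variable names (`RESIN_API_TOKEN`, `PAGERDUTY_ROUTING_KEY`, `POLL_INTERVAL_SECONDS`); check the repo for the names it actually expects.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class WorkerConfig:
    resin_api_token: str
    pagerduty_routing_key: str
    poll_interval_s: float = 60.0


# Hypothetical variable names, not taken from the repo.
REQUIRED_VARS = ("RESIN_API_TOKEN", "PAGERDUTY_ROUTING_KEY")


def load_config(env: Optional[Mapping[str, str]] = None) -> WorkerConfig:
    """Read worker settings from the environment, failing fast if any are missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing required environment variables: " + ", ".join(missing))
    return WorkerConfig(
        resin_api_token=env["RESIN_API_TOKEN"],
        pagerduty_routing_key=env["PAGERDUTY_ROUTING_KEY"],
        # Optional; defaults to polling once a minute.
        poll_interval_s=float(env.get("POLL_INTERVAL_SECONDS", "60")),
    )
```

With names like these, configuration would be a single `heroku config:set RESIN_API_TOKEN=... PAGERDUTY_ROUTING_KEY=...` command, and a `worker` process type declared in the Procfile is started with `heroku ps:scale worker=1`.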

That’s it! One step closer to operational maturity for the embedded world!

Have questions, or just want to say hi? Find us on our community chat.
