Restart a failed service

Sometimes my MariaDB database fails and crashes. I am not sure why, and I am not sure I want to find out, as I am guessing it would require some investigation that may be above my pay grade.

It happens quite rarely, but every now and then it does, and when it does my monitoring keeps pinging me on Slack and sending emails until it’s fixed.

One really bad option to hide this problem would be to automatically restart the database when it fails. I could run a cron job every minute to check that the database is up.

It’s a very bad option. Taken to its extreme, this could leave me with a database that crashes every few minutes and gets restarted right away, before monitoring detects anything, so I would never be aware of the problem. Needless to say, having a database crashing every few minutes is suboptimal…

But I could still do it for now, since the crashes seem to be really rare.

The easiest option I’ve found is to use systemctl.

Running this in a cron will do the job:

* * * * * systemctl is-failed --quiet mariadb.service && systemctl start mariadb.service

And maybe I could get a notification that this happened by sending myself an email… except emailing from a server is hard these days… If you know, you know.
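
One way to leave a trace without sending email is to write a line to syslog whenever a restart happens. Here is a minimal sketch of that idea, not something I have battle-tested: the script name, the `restart_if_failed` function and the `restart-if-failed` log tag are made up for illustration, and it assumes `logger` is available on the server (it usually is).

```shell
#!/bin/sh
# Hypothetical helper, e.g. saved as /usr/local/bin/restart-if-failed.sh
# Usage: restart-if-failed.sh [unit]   (defaults to mariadb.service)

SERVICE="${1:-mariadb.service}"

restart_if_failed() {
    # "systemctl is-failed" exits 0 only when the unit is in the failed state
    if systemctl is-failed --quiet "$1"; then
        # Record the event in syslog before trying to start the unit again
        logger -t restart-if-failed "$1 was in failed state, starting it again"
        systemctl start "$1"
    fi
}

restart_if_failed "$SERVICE"
```

The cron entry would then call the script instead of the one-liner, for example `* * * * * /usr/local/bin/restart-if-failed.sh mariadb.service` (again, a path I picked arbitrarily). On a systemd host the messages should end up in the journal, so `journalctl -t restart-if-failed` ought to show how often the database has been restarted behind my back.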

By Marc Olivier Meunier

Marc has spent the past few years putting oil on the fire of a hyper growth ad tech company. At he was in charge of scaling the support and its culture. At Eficode he is now leading an engineering team and running operations. He leads by example and puts a lot of emphasis on diversity and inclusion, constantly working to create a safe environment. A warm leader with a passion for memorable experiences and innovation.
