Solving Sensu's No Keepalive Sent
No keepalive sent from client for 120 seconds
Those using Sensu for monitoring and alerting have seen this dreaded message many times. Almost always it indicates that the node you are monitoring (the Sensu client) has failed to send a keepalive for a preconfigured period of time. In other words, there is no communication between the Sensu client and the Sensu server. This could indicate network or host issues on the client side, or problems with the network route between the client and the server.
However, at times you are pretty sure that your client host is up, running and healthy. In fact, it's serving production traffic with ease. You can also ping the server from the client and vice versa. I came across such a scenario recently.
Troubleshooting
-
Restart the client service, e.g. on CentOS:
systemctl restart sensu-client.service
-
Restart the server service, e.g. on CentOS:
systemctl restart sensu-server.service
-
Remove the client from the Sensu server:
curl -X DELETE http://sensu-server:4567/clients/client-name
The above request is so handy in day-to-day operations that I have even made it into an alias. I normally use it when manually decommissioning an end-of-life server.
Add this to your ~/.bash_profile or ~/.zprofile:
alias shutup='function _shutup(){ curl --user username:password -s -i -X DELETE http://sensu.prod.loveholidays.com:4567/clients/$1; };_shutup'
shutup deprecated.server.com
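The same idea can also be written as a plain shell function instead of an alias that wraps a function. This is only a sketch: the `SENSU_API` variable, its default value and the `sensu_client_url` helper are names made up for illustration, and `username:password` is a placeholder as in the alias above.

```shell
# Base URL of the Sensu API (placeholder default; adjust to your setup).
SENSU_API="${SENSU_API:-http://sensu-server:4567}"

# Print the API URL for a given client name.
sensu_client_url() {
  printf '%s/clients/%s\n' "$SENSU_API" "$1"
}

# Delete a client from Sensu; refuses to run without a client name.
shutup() {
  if [ -z "${1:-}" ]; then
    echo "usage: shutup <client-name>" >&2
    return 1
  fi
  curl --user username:password -s -i -X DELETE "$(sensu_client_url "$1")"
}
```

The guard on the missing argument avoids sending a DELETE to the bare clients endpoint when the name is forgotten.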
-
Restart the Redis service on the server, e.g. on CentOS:
systemctl restart redis.service
-
Check the time on the client and the server.
This is an odd problem. We had a KVM VM pause due to resource overcommitment on the host. We discovered and resolved the issue and started the VM. However, Sensu’s No keepalive sent notification would not stop.
Further investigation has revealed the following:
[user@sensu-client-host ~]# date
Fri 24 Mar 14:55:28 GMT 2017
[user@sensu-server-host ~]# date
Fri 24 Mar 15:20:01 GMT 2017
The clock on our client VM is roughly 25 minutes behind the server.
The easy way to fix it is to install and run ntpdate:
yum install ntpdate -y
ntpdate pool.ntp.org
24 Mar 15:21:29 ntpdate[32253]: step time server 82.220.2.2 offset 1461.980689 sec
[user@sensu-client-host ~]# systemctl restart sensu-client
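To spot this kind of drift without eyeballing two `date` outputs, the comparison can be scripted. The following is a minimal sketch, not part of Sensu: `skew_seconds` and `skew_exceeds` are hypothetical helper names, and the default threshold of 120 seconds is simply borrowed from the message at the top of this post.

```shell
# Absolute difference, in seconds, between two Unix epoch timestamps.
skew_seconds() {
  local diff=$(( $1 - $2 ))
  echo "${diff#-}"   # strip a leading minus sign, if any
}

# Succeeds (exit 0) when the skew is larger than the threshold.
# Third argument is the threshold in seconds (assumed default: 120).
skew_exceeds() {
  local skew
  skew=$(skew_seconds "$1" "$2")
  [ "$skew" -gt "${3:-120}" ]
}

# Example usage, run on the client (assumes SSH access to the server;
# sensu-server-host is a placeholder hostname):
#   local_now=$(date +%s)
#   remote_now=$(ssh sensu-server-host date +%s)
#   if skew_exceeds "$local_now" "$remote_now"; then
#     echo "clock skew: $(skew_seconds "$local_now" "$remote_now")s"
#   fi
```

The SSH round trip adds some latency to the measurement, so this is only useful for catching offsets of many seconds or minutes, like the one above.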
Other helpful resources:
StackOverflow http://stackoverflow.com/questions/33734921/no-keepalive-sent-from-client
AgileTesting http://agiletesting.blogspot.co.uk/2012/11/quick-troubleshooting-of-sensu-no.html