No keepalive sent from client for 120 seconds

Those using Sensu for monitoring and alerting have seen this dreaded message many times. Almost always it indicates that the node you are monitoring (the Sensu client) has failed to send a keepalive for the preconfigured duration. In other words, there is no communication between the Sensu client and the Sensu server. This could indicate network or host issues on the client side, or problems with the network route between the client and the server.

However, at times you are pretty sure that your client host is up, running and healthy. In fact, it's serving production traffic with ease. You can also ping the server from the client and vice versa. I came across such a scenario recently.

Troubleshooting

  • Restart the client service, e.g. on CentOS:

    systemctl restart sensu-client.service
    

  • Restart the server service, e.g. on CentOS:

    systemctl restart sensu-server.service
    

  • Remove the client from the Sensu server:

    curl -X DELETE http://sensu-server:4567/clients/client-name
    

The request above is so handy in day-to-day operations that I have even made it into an alias. I normally use it when manually retiring a server at end-of-life.

Add this to your ~/.bash_profile or ~/.zprofile:

    alias shutup='function _shutup(){ curl --user username:password -s -i -X DELETE http://sensu.prod.loveholidays.com:4567/clients/$1; };_shutup'

Now run it as:

    shutup deprecated.server.com

  • Restart the Redis service on the server, e.g. on CentOS:

    systemctl restart redis.service
    

  • Check the time on the client and the server

This is an odd problem. We had a KVM VM pause due to resource overcommitment on the host. We discovered and resolved the issue and started the VM again. However, Sensu’s "No keepalive sent" notification would not stop.

Further investigation revealed the following:

[user@sensu-client-host ~]# date
Fri 24 Mar 14:55:28 GMT 2017

[user@sensu-server-host ~]# date
Fri 24 Mar 15:20:01 GMT 2017

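The clock on our client VM was about 25 minutes behind the server. Each keepalive carries a timestamp set by the client, so to the server every keepalive from this host looked roughly 25 minutes old, well past the 120 second threshold.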
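The comparison above can be scripted so that it does not rely on eyeballing two `date` outputs. The following is a minimal sketch, not part of Sensu: the function names `clock_skew` and `skew_exceeds` are hypothetical helpers, and the 120 second threshold is simply taken from the alert message.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; they are not part of Sensu.

# clock_skew LOCAL_EPOCH REMOTE_EPOCH
# Prints the absolute difference, in seconds, between two Unix timestamps.
clock_skew() {
  diff=$(( $1 - $2 ))
  if [ "$diff" -lt 0 ]; then
    diff=$(( 0 - $diff ))
  fi
  echo "$diff"
}

# skew_exceeds LOCAL_EPOCH REMOTE_EPOCH THRESHOLD
# Exits 0 when the skew is larger than THRESHOLD seconds, 1 otherwise.
skew_exceeds() {
  [ "$(clock_skew "$1" "$2")" -gt "$3" ]
}
```

Assuming SSH access from the client to the server, it could be run as `skew_exceeds "$(date +%s)" "$(ssh sensu-server-host date +%s)" 120 && echo "clock skew too large"`. The SSH round trip adds a second or so of noise, which does not matter for a skew measured in minutes.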

An easy way to fix it is to install and run ntpdate, then restart the client:

yum install ntpdate -y
ntpdate pool.ntp.org
24 Mar 15:21:29 ntpdate[32253]: step time server 82.220.2.2 offset 1461.980689 sec

[user@sensu-client-host ~]# systemctl restart sensu-client
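To confirm that the server now sees fresh keepalives, one option is to look at the client's last keepalive timestamp as stored on the server. The sketch below assumes that the Sensu API returns the client as JSON with a numeric `timestamp` field holding the last keepalive time; verify this against your Sensu version. `keepalive_age` is a hypothetical helper name, and the parsing is deliberately naive (plain `sed`, first `timestamp` match only).

```shell
#!/bin/sh
# Hypothetical helper for illustration; it is not part of Sensu.

# keepalive_age CLIENT_JSON NOW_EPOCH
# Prints how many seconds old the client's last keepalive is, based on
# the "timestamp" field assumed to be present in the client JSON.
keepalive_age() {
  ts=$(printf '%s\n' "$1" \
    | sed -n 's/.*"timestamp"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' \
    | head -n 1)
  if [ -z "$ts" ]; then
    echo "no timestamp field found" >&2
    return 1
  fi
  echo $(( $2 - $ts ))
}
```

With the same host and client names used in the curl example earlier, it could be called as `keepalive_age "$(curl -s http://sensu-server:4567/clients/client-name)" "$(date +%s)"`. Run it on the server so that "now" comes from the server's clock; a result that stays well under 120 suggests the keepalives are being accepted again.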

Other helpful resources:

StackOverflow http://stackoverflow.com/questions/33734921/no-keepalive-sent-from-client

AgileTesting http://agiletesting.blogspot.co.uk/2012/11/quick-troubleshooting-of-sensu-no.html