Reconfigure vSphere HA Host: Operation Timed Out


I think it is worth sharing what I learned today. We found that HA was failing on all the hosts in a vSphere cluster. Every attempt to reconfigure vSphere HA on a host failed with the error message ‘Operation timed out’. I checked fdm.log on a couple of servers and found that they all contained messages like this:

error ‘Election’ opID=SWI-cb1a0483] ReadMsg: [120 times] Wrong fault domain ID: 9148BCE8-A6E7-45D7-B591-76C15A3F6470-26-9e10b65-my-vCenter!= 9148BCE8-A6E7-45D7-B591-76C15A3F6470-26-8284ae7-my-vCenter from 192.168.1.102

My understanding of this message is that the master election failed because the fault domain ID on the local host did not match the one sent by the remote host (192.168.1.102). The two IDs share the same prefix and vCenter name and differ only in one segment (9e10b65 vs 8284ae7). The strange thing is that 192.168.1.102 is a host that had been placed into maintenance mode. So my guess was that 192.168.1.102 had been the master of the fault domain, but somehow failed to notify the other hosts when it left the domain and entered maintenance mode. To test my guess, I took the host out of maintenance mode, and all the red HA failure alarms disappeared right away!!
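When several hosts log this error, one quick way to confirm which peer is sending the mismatched fault domain ID is to tally the matching fdm.log lines by source address. The sketch below is hypothetical. Its regular expression is modelled only on the single log line quoted above, and the real format may differ between ESXi/FDM versions. It also assumes the `[N times]` prefix is a repeat counter for suppressed duplicate lines. The function name `tally_mismatches` is my own.

```python
import re
import sys
from collections import Counter

# Assumption: pattern derived from the one fdm.log sample above; adjust if
# your FDM version formats the message differently.
WRONG_FD = re.compile(
    r"(?:\[(?P<times>\d+) times\] )?"          # optional repeat counter (assumed meaning)
    r"Wrong fault domain ID: (?P<local>\S+?)\s*!=\s*(?P<remote>\S+)"
    r" from (?P<ip>\S+)"
)

def tally_mismatches(lines):
    """Count 'Wrong fault domain ID' errors per (source IP, remote ID)."""
    counts = Counter()
    for line in lines:
        m = WRONG_FD.search(line)
        if m:
            # Weight by the repeat counter when present, otherwise count once.
            counts[(m.group("ip"), m.group("remote"))] += int(m.group("times") or 1)
    return counts

def main(path):
    with open(path, encoding="utf-8", errors="replace") as f:
        counts = tally_mismatches(f)
    # Most frequent offender first: source IP, count, fault domain ID it sent.
    for (ip, remote_id), n in counts.most_common():
        print(f"{ip}\t{n}\t{remote_id}")

if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Run it against a copy of fdm.log from each affected host (for example `python tally_fd.py fdm.log`). If every host points at the same source IP, that host is the likely holder of the stale fault domain.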

I checked the HA status and everything looked good: a new master had been elected successfully. Then I placed 192.168.1.102 into maintenance mode again, and HA on all the other hosts kept functioning well.

