Tuesday, October 13, 2009

VMware ESXi Infrastructure Client Issues revisited

In a previous blog entry I posted about how to restart the VMware management agents in the event that you could not connect via the Infrastructure Client. I have found, however, that this does not always work, and sometimes you have to get a bit more involved to regain control of a non-responsive host.

After restarting the management agents in the DCUI (or from an SSH prompt), you will sometimes find that the host is still unresponsive. You may also observe the following.

Before our restart in DCUI


~ # ps aux | grep hostd
62991508 22972707 hostd hostd
30481665 22972707 hostd hostd
22972707 22972707 hostd hostd
42826021 22972707 hostd hostd
42826022 22972707 hostd hostd
42826023 22972707 hostd hostd
62069036 22972707 hostd hostd
62069039 22972707 hostd hostd
45624624 22972707 hostd hostd
58286499 22972707 hostd hostd
63749610 22972707 hostd hostd

After the restart in DCUI

~ # ps aux | grep hostd
62991508 22972707 hostd hostd
30481665 22972707 hostd hostd
22972707 22972707 hostd hostd
42826021 22972707 hostd hostd
42826022 22972707 hostd hostd
42826023 22972707 hostd hostd
62069036 22972707 hostd hostd
62069039 22972707 hostd hostd
45624624 22972707 hostd hostd
58286499 22972707 hostd hostd
63749610 22972707 hostd hostd

Notice that none of the PIDs changed? The restart never actually replaced the hung hostd processes.
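Comparing the two listings by eye is error-prone, so here is a minimal sketch of how the check could be scripted. The helper names hostd_pids and same_hostd_pids are my own (hypothetical), and the sketch assumes the first column of the ps output is the process ID and that the host's shell provides awk and sort.

```shell
# Hypothetical helper: read ps output on stdin and print the first
# column (assumed to be the process ID) of every hostd line, sorted.
hostd_pids() {
    awk '/hostd/ && !/grep/ {print $1}' | sort
}

# Hypothetical helper: exit 0 if two saved ps snapshots list exactly
# the same hostd PIDs, meaning the restart did not replace them.
same_hostd_pids() {
    before=$(hostd_pids < "$1")
    after=$(hostd_pids < "$2")
    [ "$before" = "$after" ]
}

# Example use (file paths are placeholders):
#   ps aux > /tmp/before.txt
#   ...restart the management agents...
#   ps aux > /tmp/after.txt
#   same_hostd_pids /tmp/before.txt /tmp/after.txt && echo "hostd was not restarted"
```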

To get the host responding to the Infrastructure Client again, we have to kill off these processes with kill -9 and then start hostd again. Once that is done you should be able to connect with the Infrastructure Client. If you don't have SSH enabled on your host, you can also do this from the host console.

kill -9 PID#
/etc/init.d/hostd start
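With a dozen or so hostd PIDs, typing each kill by hand gets tedious. The following is a rough sketch, not a tested procedure, of generating the kill commands from the ps output. The function name hostd_kill_commands is hypothetical, and it assumes the first column of ps is the PID and that awk is available on the host. It only prints the commands, so you can review them before anything is killed.

```shell
# Hypothetical helper: read ps output on stdin and print one
# "kill -9 <pid>" command per hostd line. This is a dry run:
# the commands are printed, not executed.
hostd_kill_commands() {
    awk '/hostd/ && !/grep/ {print "kill -9 " $1}'
}

# Review the generated commands first:
#   ps aux | hostd_kill_commands
# If they look right, run them and then start hostd again:
#   ps aux | hostd_kill_commands | sh
#   /etc/init.d/hostd start
```

Printing the commands instead of running them directly is deliberate, since kill -9 on the wrong process of a hypervisor is not something you want to discover after the fact.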

2 comments:

Iain said...

Thanks so much! I am running ESXi 3.5 and have not been able to connect to it for 2 weeks, been going mad!!!! All the info is for ESX and restart options I don't have!

Cheers again.

Robert Chase said...

@Iain

I was in a similar position of losing my console and not being able to restart the hypervisor. Rather than restart an entire CDN I did some digging.

I have noticed that later versions of ESXi are not as susceptible to this problem as earlier versions. You may want to consider upgrading when it comes time for your next maintenance window.