Over the last few days I have been troubleshooting a very strange issue with the C# vSphere Client on a new vSphere 6.0 install for a customer. The vSphere Client was initially working fine, until I replaced the Machine SSL Certificate for vCenter. After the Machine SSL Certificate was replaced, the vSphere Client would time out on connection. The issue only occurred when connecting to vCenter; connecting the vSphere Client directly to hosts worked fine.
If I reverted to VMCA-signed certs, the vSphere Client would begin working again. Stranger still, sometimes the client would actually connect, but it would take upwards of 60 seconds to do so.
This particular customer is using an externally published CA. To clarify, the vSphere Web Client was working; it was just the C# client that was causing issues.
The error shown by the vSphere Client on login is as follows:
To begin troubleshooting, I used BareTail to tail the VI Client logs while the vSphere Client was connecting. It is an excellent tool that is available for free here.
I created a filter to highlight lines containing the word “Error” in red and “Warning” in yellow, and opened the VI Client log located in the following directory.
The following log snippet shows a socket error while the client is connecting, just before the connection fails:
Relevant text from the log is below. I have masked the name of the customer's vCenter server.
[viclient:Error :W: 6] 2016-05-28 17:48:19.743 RMI Error Vmomi.ServiceInstance.RetrieveContent – 1
<Message>The request failed because the remote server ‘SERVER FQDN’ took too long to respond. (The command has timed out as the remote server is taking too long to respond.)</Message>
<Message>The command has timed out as the remote server is taking too long to respond.</Message>
<Target type="ManagedObject">ServiceInstance:ServiceInstance [SERVER FQDN]</Target>
To dig deeper into why I was getting a socket error, I fired up Procmon from Sysinternals to find out what the client was doing when it failed. In Procmon I created a filter to only show activity generated by vpxclient.exe.
Notice the time difference of seven seconds between the two TCP Reconnects. This TCP reconnect would recur multiple times until the vSphere Client timed out and the connection failed.
I was curious about the status of this TCP connection, so I started another great Sysinternals tool called Process Explorer. Process Explorer allows you to check a process's network status, including remote addresses and ports, along with the state of each connection. Selecting vpxclient.exe in Process Explorer showed the following under TCP/IP:
You can see the same remote connection to Akamai in Process Explorer. The status of the connection is SYN_SENT, yet the connection is never established.
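If you don't have Process Explorer to hand, the same stalled-handshake symptom can be reproduced with a quick connect probe and a short timeout. This is a minimal Python sketch of that idea (not part of the original troubleshooting session); the host and port you pass in would be the remote address seen in Procmon:

```python
import socket

def check_tcp(host, port, timeout=5):
    """Attempt a TCP connection; return True if the handshake completes.

    A host that never answers the SYN (the SYN_SENT state seen in
    Process Explorer) shows up here as a timeout, i.e. False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers timeouts, refusals and unreachable networks
        return False
```

For example, `check_tcp("crl.example.com", 80)` (a placeholder for the real CRL host) returning False from the management server would confirm the outbound connection is being dropped.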
I was now certain this external connection was causing the vSphere Client to time out. Since the customer is using a third-party-issued cert, the client checks the certificate's CRL on the internet. This is why I did not see the error with the self-signed VMCA vCenter Machine SSL cert. You can see the cert uses an external CRL distribution point in the screenshot below.
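A quick way to verify whether a CRL distribution point is actually reachable from the management server is to try fetching it with a short timeout. A hedged Python sketch; the URL would come from the certificate's CRL Distribution Points field, not the placeholder used here:

```python
import urllib.request
import urllib.error

def crl_reachable(url, timeout=5):
    """Return True if the CRL URL answers over HTTP within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # DNS failures, refused connections and timeouts all land here.
        return False
```

In this environment the fetch would have hung and returned False until the proxy was configured, mirroring what the vSphere Client was experiencing on login.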
I ran an NSLOOKUP on the CRL distribution point hostname, and the address matched Akamai's address space, with a CNAME pointing to the CRL.
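The same lookup can be scripted if you want to check this from several management servers at once. A small Python sketch equivalent to the NSLOOKUP step (the hostname is a placeholder for the real CRL distribution point):

```python
import socket

def resolve_ipv4(hostname):
    """Return the IPv4 addresses a hostname resolves to (CNAMEs are followed)."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    return sorted({info[4][0] for info in infos})
```

Comparing the output of `resolve_ipv4("crl.example.com")` against known Akamai ranges would confirm where the client is trying to connect.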
After all this, I began troubleshooting why the vSphere Client could not connect to the CRL distribution point. It turned out the corporate proxy was not configured in Internet Explorer, so the management servers where the vSphere Client was installed could not reach the CRL address for the certs.
Once I had the details for the proxy and configured it in Internet Explorer, the vSphere Client successfully created a TCP connection to the CRL on login, then connected to vCenter with no timeout. This only needed to be done once: I removed the proxy for subsequent logins and the vSphere Client still connected fine.
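For scripted reachability checks, the proxy can be supplied explicitly rather than relying on the Internet Explorer settings that the C# client picks up. A minimal Python sketch; the proxy address is hypothetical:

```python
import urllib.request

# Hypothetical corporate proxy -- substitute the real address and port.
PROXY = "http://proxy.corp.example:8080"

# Route both HTTP and HTTPS requests from this opener through the proxy.
proxy_handler = urllib.request.ProxyHandler({"http": PROXY, "https": PROXY})
opener = urllib.request.build_opener(proxy_handler)
# opener.open("http://crl.example.com/ca.crl", timeout=10) would now fetch
# the CRL via the proxy instead of attempting a direct connection.
```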
My recommendation, if you do replace vSphere certificates, is to use an internally managed enterprise CA with a certificate revocation list that can be accessed internally. Also, add a copy of Procmon, Process Explorer and BareTail to your troubleshooting toolkit if you haven't already. They are all great tools that have helped me many times in the past.