The case of the flapping VMware Secure Token Service

So after upgrading to vCenter 5.5.0b we encountered a problem where the VMware Secure Token Service would not stay started. It would start and then immediately fail. Some initial poking around led to the STS logs in C:\ProgramData\VMware\CIS\runtime\VMwareSTS\logs. After checking the catalina log for the current date (catalina.2014-03-21.log) I noticed a bunch of SEVERE errors like the following:

SEVERE [WrapperSimpleAppMain] org.apache.coyote.AbstractProtocol.init Failed to initialize end point associated with ProtocolHandler [“http-bio-7080”]

This error was in the vpxd log:

Unable to create SSO facade: No connection could be made because the target machine actively refused it.

And finally a few java errors:

java.net.BindException: Address already in use: JVM_Bind <null>:7080

Staring at those errors led me to remember where I had seen “7080” before. Long ago vCenter Converter Standalone was installed on the system, and during its configuration port 7080 was selected. As it turns out, this port is needed for the Secure Token Service to run, but it’s nowhere to be found in the Required ports for vCenter 5.5 KB article. You can check which ports vCenter Converter is using by opening C:\ProgramData\VMware\VMware vCenter Converter Standalone\converter-server.xml and drilling down to the proxySvc\ports\http node.
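
If you would rather not open the file by hand, the lookup can be sketched in Python. This is a minimal sketch, assuming the file is well-formed XML and that proxySvc sits somewhere below the root element. The function name and the sample document (including its Config root and the https entry) are made up for illustration and are not copied from a real converter-server.xml.

```python
import xml.etree.ElementTree as ET


def converter_http_port(xml_text):
    """Return the text of the proxySvc/ports/http node, or None if it is absent."""
    root = ET.fromstring(xml_text)
    # Searches descendants of the root; assumes proxySvc is not the root itself.
    node = root.find(".//proxySvc/ports/http")
    if node is None or node.text is None:
        return None
    return node.text.strip()


# Hypothetical sample, not copied from a real converter-server.xml.
SAMPLE = """<Config>
  <proxySvc>
    <ports>
      <http>7080</http>
      <https>443</https>
    </ports>
  </proxySvc>
</Config>"""

if __name__ == "__main__":
    print(converter_http_port(SAMPLE))
```

To check a real system, read the file from the path above and pass its contents to the function.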

Stopping the vCenter Converter services and/or changing the port resolves this issue. Converter probably won’t be the culprit for most of you, so look for any service using port 7080 (netstat -abn might help).
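
A quick way to confirm that something is holding the port is to try binding to it yourself, which is roughly what the JVM was failing to do. The sketch below is an assumption-laden helper (the function name is mine) that only answers yes or no. It does not tell you which process owns the port, so netstat is still needed for that.

```python
import socket


def port_in_use(port, host=""):
    """Return True if binding host:port fails, meaning something else holds it.

    An empty host means all IPv4 interfaces. Any OSError counts as "in use",
    which also covers permission errors on privileged ports.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return True
    return False


if __name__ == "__main__":
    print("port 7080 in use:", port_in_use(7080))
```

Run it while the Secure Token Service is stopped. If it still reports the port as in use, some other service has claimed 7080.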

vCenter Operations Manager IP Pool Error

Following the vCenter Operations Manager 5.8.1 installation and deployment guide leads you to a notice that in order for the deployment of the vApp to work properly you must create an IP Pool and associate it to the portgroup where the vApp is to be connected. IP Pools are created at the Datacenter level. After creating the pool and deploying the app I was all set to power up the vApp. At power on an error was returned:

Cannot initialize property ‘vami.netmask0.VM_1’. Network has no associated network protocol profile.

Googling this error leads to a few places that say the issue is a missing IP Pool. The problem was that I had in fact created the pool. What the issue turned out to be was that we have multiple dvSwitches for different clusters that have the same portgroup names. Even though I triple checked that the portgroup the vApp was connected to did indeed have an IP Pool associated, the error persisted. The fix was to go back to the IP Pool configuration section, right click on the pool, and edit its properties. Once inside, go to the Associations tab and select all portgroups that have similar names.

Another quick fact about this IP Pool is that you do not need to select Enable IP Pool inside of the pool settings. This checkbox is only necessary if you intend to specify a range of IPs.

Warning to ESXi 5.5U1 upgraders

Just a few reminders for those looking to upgrade to ESXi 5.5U1 from anything that is not 5.5. Keep in mind that with this version VMware removed drivers for devices that are not on the HCL. This includes a few NICs, such as Realtek and Marvell, and possibly a few SATA controllers. To avoid this disaster, the best way to do the upgrade is with the esxcli profile update command. Details to follow soon!

 

VM Guest Customizations failing

A new VM was deployed from a template but the VM guest customizations would not complete. The error that was received was:

LaunchDll:Could not load DLL C:\Windows\system32\iesysprep.dll

Solution:

The template was made with Windows 2008 x64. Before the VM was made into a template it had IE9 installed. IE9 was then uninstalled to downgrade back to IE7. It appears that when IE is upgraded, some extra sysprep steps are added for IE. Removing IE9 did not remove these extra steps, so when sysprep goes to call the .dll it is not present.

The additional sysprep steps were removed from the registry:

Navigate to:

HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Setup\Sysprep

Under each of these keys (Cleanup, Generalize, Specialize), delete any value that looks like:

 C:\Windows\System32\iesysprep.dll,Sysprep_Cleanup_IE
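
Before deleting anything it can help to list what is actually there. The following is a rough Python sketch that only reads the three keys and prints the values that mention iesysprep.dll. It assumes it is run on the affected Windows machine with rights to read HKLM, and the function names are my own. It deliberately does not delete anything.

```python
import sys

SYSPREP_KEY = r"SOFTWARE\Microsoft\Windows\CurrentVersion\Setup\Sysprep"
PHASES = ("Cleanup", "Generalize", "Specialize")


def stale_ie_values(values):
    """Given (name, data) pairs, return the names of values that reference iesysprep.dll."""
    return [name for name, data in values
            if "iesysprep.dll" in f"{name} {data}".lower()]


def read_values(phase):
    """Windows only: return (name, data) pairs for one Sysprep phase key."""
    import winreg  # imported here so stale_ie_values() works on any OS

    # KEY_WOW64_64KEY avoids registry redirection when Python is 32-bit.
    access = winreg.KEY_READ | winreg.KEY_WOW64_64KEY
    path = SYSPREP_KEY + "\\" + phase
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path, 0, access) as key:
        count = winreg.QueryInfoKey(key)[1]  # number of values under the key
        return [winreg.EnumValue(key, i)[:2] for i in range(count)]


if __name__ == "__main__" and sys.platform == "win32":
    for phase in PHASES:
        for name in stale_ie_values(read_values(phase)):
            print(phase, name)
```

Anything it prints is a candidate for removal in regedit. Exporting the Sysprep key first gives you a way back if the wrong value is deleted.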

Cloning vCenter VM to another cluster

Here are the steps I took to move the vCenter VM from one cluster to a new cluster with new storage. This new cluster also had hosts which were of a different CPU type.

1) Remove any snapshots from the vCenter VM.

2) Clone vCenter server to new cluster and/or storage.

3) Configure new vCenter clone with proper hardware, network, and switch settings.

4) Make note of which ESXi hosts each vCenter server resides on.

5) Power off source vCenter Server – I issued this command while still in the vSphere client and then disconnected from vCenter and connected to the ESXi host that vCenter resides on. I did this to monitor the shutdown. Monitoring the shutdown is optional.

6) Connect to ESXi host where the vCenter clone resides.

7) Power on vCenter clone.

8) Connect to console on new vCenter clone – Log on to the machine. You will most likely need to configure the network settings and/or reboot the machine once more after new hardware is detected (if the host machine is a different architecture).

9) Verify the new vCenter can connect with the vCenter DB and verify you can connect with either the vSphere client or vSphere Web client.

When the new vCenter clone was done configuring new hardware and such it was up and running without any issues. We left the old vCenter machine around for a few days before we removed it.

Register vCenter server with vSphere Web Client using the command line

Open up your command line interface of choice and navigate to the vSphere Web client scripts directory, which is located under the install directory. For example:

InstallDir\VMware\Infrastructure\vSphereWebClient\Scripts

Once there execute:

Admin-cmd.bat register <vsphere-web-client-url> <vcenter-server-url> user password
(ex. Admin-cmd.bat register https://MyNewVcenter:9443/vsphere-client https://MyNewVcenter AdminUser AdminPW123)

You will then need to take action on the untrusted certificate prompt.

It will notify you of a successful completion. If you run the “admin-cmd” by itself it will output some help.
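
If you register vCenter servers often, the call can be wrapped in a few lines of Python. This is only a sketch: the helper names are mine, it assumes Windows, and it assumes the argument order shown above. The certificate prompt is still interactive, and passwords containing characters that cmd.exe treats specially may need extra care.

```python
import subprocess


def register_command(web_client_url, vcenter_url, user, password,
                     script="Admin-cmd.bat"):
    """Build the argument list for the register call described above."""
    return [script, "register", web_client_url, vcenter_url, user, password]


def register(scripts_dir, web_client_url, vcenter_url, user, password):
    """Run the register command from the Scripts directory (Windows only).

    shell=True hands the command to cmd.exe so the .bat file can be run
    from scripts_dir. Returns the exit code of the script.
    """
    cmd = register_command(web_client_url, vcenter_url, user, password)
    return subprocess.run(cmd, cwd=scripts_dir, shell=True).returncode


if __name__ == "__main__":
    # Prints the command only; nothing is executed here.
    print(register_command("https://MyNewVcenter:9443/vsphere-client",
                           "https://MyNewVcenter", "AdminUser", "AdminPW123"))
```

Calling register() with the Scripts directory as the first argument runs the same command as the example above.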

Management Traffic VMK checkbox in vCenter

I was never quite sure what constitutes management traffic when going through a VMkernel port. Well, this post from Duncan Epping over at Yellow Bricks pretty much sums it up.

The feature described as “Management traffic” does nothing more than enabling that VMkernel NIC for HA heartbeat traffic.

Much clearer when you put it like that. The vCenter Server Best Practices for Networking touches on this but it’s worded differently.

On ESXi hosts in the cluster, vSphere HA communications, by default, travel over VMkernel networks, except those marked for use with vMotion. If there is only one VMkernel network, vSphere HA shares it with vMotion, if necessary. With ESXi 4.x and ESXi, you must also explicitly enable the Management traffic checkbox for vSphere HA to use this network.