Warning to ESXi 5.5U1 upgraders

Just a few reminders for those looking to upgrade to ESXi 5.5U1 from anything that is not 5.5. Keep in mind that with this version VMware removed drivers for devices that are not on the HCL. This includes a few NICs, like Realtek and Marvell, and possibly a few SATA controllers. To avoid this disaster, the best way to accomplish the upgrade is with the esxcli profile update command. Details to follow soon!
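As a rough sketch of that upgrade path, run from the ESXi shell or an SSH session: the depot path and profile name below are placeholders for illustration, so substitute the offline bundle and image profile that match your target build.

```shell
# List the image profiles contained in the offline bundle
# (datastore path and bundle name are examples)
esxcli software sources profile list \
    -d /vmfs/volumes/datastore1/update-bundle.zip

# "update" keeps VIBs already installed on the host (such as third-party
# NIC drivers), whereas "install" replaces the whole image with the profile
esxcli software profile update \
    -d /vmfs/volumes/datastore1/update-bundle.zip \
    -p ESXi-5.5.0-standard
```

The important distinction is `update` versus `install`: `install` wipes anything not in the image profile, which is exactly how you lose a community Realtek driver mid-upgrade.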

 

Management Traffic VMK checkbox in vCenter

I was always leery about what actually constitutes management traffic on a VMkernel port. Well, this post from Duncan Epping over at Yellow Bricks pretty much sums it up.

The feature described as “Management traffic” does nothing more than enabling that VMkernel NIC for HA heartbeat traffic.

Much clearer when you put it like that. The vCenter Server Best Practices for Networking document touches on this, but it's worded differently.

On ESXi hosts in the cluster, vSphere HA communications, by default, travel over VMkernel networks, except those marked for use with vMotion. If there is only one VMkernel network, vSphere HA shares it with vMotion, if necessary. With ESXi 4.x and ESXi, you must also explicitly enable the Management traffic checkbox for vSphere HA to use this network.

 

ZFS, NFS, ESXi and slow performance

In a previous post I mentioned putting the ZIL on a SLOG. Let me get into why this was done. There is a known problem with NFS as it relates to synchronous writes, especially with ESXi, and I knew of it when I set out to use NFS for ESXi datastores: ESXi sets the O_SYNC flag, so every write to an NFS datastore is synchronous.

O_SYNC – The file is opened for synchronous I/O.  Any write(2)s on the resulting file descriptor will block the calling process until the data has been physically written to the underlying hardware.
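You can feel the cost of that flag outside of ESXi with a quick local experiment using dd (the scratch file path below is just an example): the second command forces each block to stable storage before the next one is issued, mirroring what O_SYNC does.

```shell
# Buffered writes: the kernel acknowledges once data hits the page cache
dd if=/dev/zero of=/tmp/sync_test bs=1M count=32 2>&1 | tail -n 1

# Synchronous writes: each block must reach stable storage before the next,
# which is the behavior ESXi imposes on its NFS datastore traffic
dd if=/dev/zero of=/tmp/sync_test bs=1M count=32 oflag=sync 2>&1 | tail -n 1

rm -f /tmp/sync_test
```

On spinning disks the throughput gap between the two runs is dramatic, which is exactly the gap the SLOG is meant to close.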

So what’s happening here is that ESXi requests an acknowledgement that the data it sent has actually been written to stable storage before it issues the next write. Normally the ZIL (ZFS Intent Log) lives on the same storage pool as the data unless you move it to a SLOG (Separate Intent Log) device. The ZIL holds synchronous writes and then flushes them out as transactional writes. Without the ZIL on a SLOG I got about 8MB/s writes. After I moved the ZIL to a SLOG, which is on an SSD, I now get about 30MB/s, a significant improvement. Keep in mind this is a 3-drive RAIDZ using 7200rpm disks. So in my opinion you have a few choices if you’re going to be using ESXi with NFS on ZFS:

  1. Put your ZIL on a separate device with a fast write speed, like an SSD or some type of flash RAM device.
  2. Use iSCSI.
  3. Don’t use ZFS.
  4. Disable sync writes. (I’d only do this if you don’t give a hoot about the data that’s being written)
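For reference, options 1 and 4 each come down to a one-liner; the pool name `tank` and device name `da4` below are placeholders, so substitute your own pool and SSD.

```shell
# Option 1: attach an SSD as a SLOG so synchronous writes land on fast media
zpool add tank log da4

# Option 4: stop honoring synchronous write requests entirely
# (data in flight is lost on power failure, hence the warning above)
zfs set sync=disabled tank

# Revert option 4 to the default behavior
zfs set sync=standard tank
```

Note that `sync=disabled` applies to the dataset and everything below it, so you can scope it to a single test datastore (e.g. `tank/scratch`) rather than the whole pool.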

Now you may need to experiment with your setup, as there could be other factors creating your problems, or perhaps another solution. If you go with option 1, I’d recommend an SSD with a fast write speed that uses SLC NAND chips.

Sources:
http://pthree.org/2012/12/06/zfs-administration-part-iii-the-zfs-intent-log/
http://forums.freenas.org/threads/sync-writes-or-why-is-my-esxi-nfs-so-slow-and-why-is-iscsi-faster.12506/
https://blogs.oracle.com/realneel/entry/the_zfs_intent_log
http://constantin.glez.de/blog/2010/07/solaris-zfs-synchronous-writes-and-zil-explained