Configure VMware Fault Tolerance

  • General FT Requirements and Recommendations
    •   Cluster and host requirements
    •   Storage requirements
    •   Networking recommendations
  • Timekeeping Recommendations
  • Configuration Recommendations to be observed when configuring FT
  • Best Practices for Fault Tolerance
  • Cluster requirements for FT
  • Host requirements for FT
  • VM requirements for FT
  • Configuration steps
    • Configure Networking
    • Check compliance

General FT Requirements and Recommendations

Cluster and host requirements:

  • VMware FT can only be used in a VMware HA cluster.
  • Ensure that all ESX hosts in the VMware HA cluster have identical ESX versions and patch levels. vLockstep technology only works between Primary and Secondary VMs on hosts running identical versions of ESX. See the section on patching hosts running VMware FT virtual machines for recommendations on how to upgrade hosts that are running FT virtual machines.
  • ESX host processors must be VMware FT capable and belong to the same processor model family. VMware FT capable processors required changes in both the performance counter architecture and the virtualization hardware assists of both AMD and Intel.
  • VMware FT does not disable AMD's Rapid Virtualization Indexing (i.e., nested page tables) or Intel's Extended Page Tables for the ESX host, but these features are automatically disabled for the virtual machine when VMware FT is turned on. Virtual machines without FT enabled can still take advantage of these hardware-assisted virtualization features.
  • VMware FT is supported on ESX hosts with hyper-threading enabled or disabled; hyper-threading does not have to be disabled for VMware FT to work.

Storage requirements:

  • Shared storage required – Fibre channel, iSCSI, or NAS.
  • Turning on VMware FT for a virtual machine requires the virtual machine's virtual disk (VMDK) files to be eager-zeroed and thick-provisioned. During the process of turning on VMware FT, a message states this requirement and asks whether the virtual disk should be converted to the supported eager-zeroed thick format. The user must convert the virtual disk at this time in order to proceed. Alternatively, the virtual disks can be converted before turning on VMware FT, which makes the later turn-on quicker. Thin-provisioned or lazy-zeroed disks can be converted during off-peak times through the following methods:
  • Use the vmkfstools --diskformat eagerzeroedthick option in the vSphere CLI when the virtual machine is powered off.
  • Inflate the virtual disk (thin to thick provisioning); inflation defaults to eagerzeroedthick: /vmfs/volumes/54e226a0-5e31baf3-0477-005056ba5da5/Win7-01 # vmkfstools --inflatedisk Win7-01_1.vmdk
  • Set cbtmotion.forceEagerZeroedThick = “true”  flag in the .vmx file before powering on the virtual machine. Then use VMware Storage VMotion to do the conversion.
  • Backup solutions within the guest operating system for file- or disk-level backups are supported. However, these applications may saturate the VMware FT logging network if heavy read access is performed; in fact, any disk-intensive workload can saturate the FT logging network. The resulting network saturation may lower the performance of the VMware FT-enabled virtual machine. Avoid running many VMware FT virtual machines with high disk reads and high network input on the same ESX host.
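The disk-conversion options above can be sketched as ESXi shell commands; the disk names are taken from the example path earlier and are illustrative only.

```shell
# Option 1 (VM powered off): clone the disk to eager-zeroed thick and
# then repoint the VM at the new VMDK.
vmkfstools -i Win7-01_1.vmdk --diskformat eagerzeroedthick Win7-01_1-ezt.vmdk

# Option 2 (in place): inflate a thin disk; inflation defaults to
# eagerzeroedthick.
vmkfstools --inflatedisk Win7-01_1.vmdk
```

Both commands run in the datastore directory of the virtual machine; the clone variant needs enough free space for the second copy of the disk.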

Networking recommendations

  • At a minimum, use 1 GbE NICs for the VMware FT logging network. Use 10 GbE NICs for increased FT logging bandwidth.
  • Ensure that the networking latency between ESX hosts is low. Sub-millisecond latency is recommended for the FT logging network. Use vmkping to measure the latency.
  • VMware vSwitch settings on the hosts should also be uniform, such as using the same VLAN for VMware FT logging, to make these hosts available for placement of Secondary VMs. Consider using a VMware vNetwork Distributed Switch to avoid inconsistencies in the vSwitch settings.
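The latency check above can be run from the ESXi shell with vmkping; the address below is a placeholder for the peer host's FT logging VMkernel IP.

```shell
# Ping the other host's FT logging VMkernel interface and inspect the
# round-trip times; sub-millisecond is the target. 10.0.1.12 is a
# hypothetical peer address.
vmkping -c 10 10.0.1.12
```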

Networking Baseline recommendation:

  • Preferably, each host has separate 1 GbE NICs for FT logging traffic and VMotion. The reason for recommending separate NICs is that the creation of the Secondary VM is done by migrating the Primary VM with VMotion. This can produce significant traffic on the VMotion NIC and could affect VMware FT logging traffic if the NICs are shared.
  • It is preferable that the VMware FT logging NIC has redundancy, so that no unnecessary failovers occur if a single NIC is lost. As described in the steps below, the VMware FT logging NIC and VMotion NIC can be configured so that they will automatically share the remaining NIC if one or the other NIC fails.
  1. Create a vSwitch that is connected to at least two physical NICs.
  2. Create a VMkernel connection (displayed as VMkernel Port in the vSphere Client) for VMotion and another one for FT traffic.
  3. Make sure that different IP addresses are set for the two VMkernel connections.
  4. Assign the NIC teaming properties to ensure that VMotion and FT use different NICs as the active NIC:
  • For VMotion: Set NIC A as active and NIC B as standby.
  • For FT: Set NIC B as active and NIC A as standby.
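On ESXi 5.x and later, the teaming in step 4 can also be scripted from the host shell with esxcli; the port group names and vmnic numbers below are assumptions for illustration.

```shell
# Give the VMotion and FT port groups opposite active/standby uplinks,
# so each NIC carries one traffic type but backs up the other.
esxcli network vswitch standard portgroup policy failover set \
  --portgroup-name VMotion --active-uplinks vmnic2 --standby-uplinks vmnic3
esxcli network vswitch standard portgroup policy failover set \
  --portgroup-name FT-Logging --active-uplinks vmnic3 --standby-uplinks vmnic2
```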

Note that it is possible to run VMware FT with just a single NIC. The vSwitch stack is flexible enough to route all the traffic (e.g., console, virtual machine, VMware FT, VMotion) through one NIC. However, this configuration is strongly discouraged, since VMware FT will perform better and more reliably with redundancy at all levels.

Not supported: Source port ID or source MAC address based load balancing policies do not distribute FT logging traffic. However, if there are multiple VMware FT host pairs, some load balancing is possible with an IP-hash load balancing scheme, though IP-hash may require physical switch changes such as EtherChannel setup. VMware FT will not automatically change any vSwitch settings.

Timekeeping Recommendations

To avoid time-mismatch issues in a virtual machine after a VMware FT failover, perform the following steps:

  1. Synchronize the guest operating system time with a time source; the method depends on whether the guest is Windows or Linux.
  2. Synchronize the time of each ESX server host with a network time protocol (NTP) server.
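Step 2 can be done per host from the ESXi shell (or through the host's Time Configuration settings in the vSphere Client); the NTP server name below is a placeholder.

```shell
# Point the host at an NTP server, open the firewall for the NTP
# client, and start ntpd now and on boot. ntp.example.com is a
# placeholder address.
echo "server ntp.example.com" >> /etc/ntp.conf
esxcli network firewall ruleset set --ruleset-id ntpClient --enabled true
/etc/init.d/ntpd restart
chkconfig ntpd on
```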

Best Practices for Fault Tolerance

To ensure optimal Fault Tolerance results, you should follow certain best practices.

Host Configuration Best Practices: Consider the following best practices when configuring your hosts:

  • Hosts running the Primary and Secondary VMs should operate at approximately the same processor frequencies, otherwise the Secondary VM might be restarted more frequently. Platform power management features that do not adjust based on workload (for example, power capping and enforced low frequency modes to save power) can cause processor frequencies to vary greatly. If Secondary VMs are being restarted on a regular basis, disable all power management modes on the hosts running fault tolerant virtual machines or ensure that all hosts are running in the same power management modes.
  • Apply the same instruction set extension configuration (enabled or disabled) to all hosts. The process for enabling or disabling instruction sets varies among BIOSes. See the documentation for your hosts’ BIOSes about how to configure instruction sets.

Homogeneous Clusters

vSphere Fault Tolerance can function in clusters with nonuniform hosts, but it works best in clusters with compatible nodes. When constructing your cluster, all hosts should have the following configuration:

  • Processors from the same compatible processor group.
  • Common access to datastores used by the virtual machines.
  • The same virtual machine network configuration.
  • The same ESXi version.
  • The same Fault Tolerance version number.
  • The same BIOS settings (power management and hyperthreading) for all hosts.

Run Check Compliance to identify incompatibilities and to correct them.

Performance

To increase the bandwidth available for the logging traffic between Primary and Secondary VMs, use a 10Gbit NIC and enable the use of jumbo frames.

Store ISOs on Shared Storage for Continuous Access

Store ISOs that are accessed by virtual machines with Fault Tolerance enabled on shared storage that is accessible to both instances of the fault tolerant virtual machine. With this configuration, the CD-ROM in the virtual machine continues operating normally, even when a failover occurs. For virtual machines with Fault Tolerance enabled, you might use ISO images that are accessible only to the Primary VM. In that case, the Primary VM can access the ISO, but if a failover occurs, the CD-ROM reports errors as if there is no media. This situation might be acceptable if the CD-ROM is being used for a temporary, noncritical operation such as an installation.

Avoid Network Partitions

A network partition occurs when a vSphere HA cluster has a management network failure that isolates some of the hosts from vCenter Server and from one another. See Network Partitions. When a partition occurs, Fault Tolerance protection might be degraded.
In a partitioned vSphere HA cluster using Fault Tolerance, the Primary VM (or its Secondary VM) could end up in a partition managed by a master host that is not responsible for the virtual machine. When a failover is needed, a Secondary VM is restarted only if the Primary VM was in a partition managed by the master host responsible for it. To ensure that your management network is less likely to have a failure that leads to a network partition, follow the recommendations in Best Practices for Networking.
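The jumbo-frame recommendation under Performance can be applied with esxcli on ESXi 5.x; the vSwitch and VMkernel interface names below are assumptions, and the physical switches on the path must also support an MTU of 9000.

```shell
# Raise the MTU on the vSwitch carrying FT logging traffic and on the
# FT logging VMkernel interface (vSwitch1 and vmk2 are illustrative).
esxcli network vswitch standard set --vswitch-name vSwitch1 --mtu 9000
esxcli network ip interface set --interface-name vmk2 --mtu 9000
```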
Configuration Recommendations to observe when configuring FT:

  • In addition to non-fault tolerant virtual machines, you should have no more than four fault tolerant virtual machines (primaries or secondaries) on any single host. The number of fault tolerant virtual machines that you can safely run on each host is based on the sizes and workloads of the ESXi host and virtual machines, all of which can vary.
  • If you are using NFS to access shared storage, use dedicated NAS hardware with at least a 1Gbit NIC to obtain the network performance required for Fault Tolerance to work properly.
  • Ensure that a resource pool containing fault tolerant virtual machines has excess memory above the memory size of the virtual machines. The memory reservation of a fault tolerant virtual machine is set to the virtual machine’s memory size when Fault Tolerance is turned on. Without this excess in the resource pool, there might not be any memory available to use as overhead memory.
  • Use a maximum of 16 virtual disks per fault tolerant virtual machine.
  • To ensure redundancy and maximum Fault Tolerance protection, you should have a minimum of three hosts in the cluster. In a failover situation, this provides a host that can accommodate the new Secondary VM that is created.
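The resource-pool sizing recommendation above amounts to simple arithmetic: reserve at least the sum of the FT VMs' memory sizes plus their per-VM overhead memory. A sketch with illustrative figures only (three 4096 MB VMs and an assumed 512 MB overhead each; real overhead values vary):

```shell
#!/bin/sh
# Minimum resource-pool memory for FT VMs: each FT VM's reservation is
# pinned to its full memory size when FT is turned on, so the pool must
# also cover per-VM overhead memory on top of that.
VM_COUNT=3
VM_MEM_MB=4096      # memory size of each FT VM (illustrative)
OVERHEAD_MB=512     # assumed per-VM overhead; actual values vary
POOL_MB=$(( (VM_MEM_MB + OVERHEAD_MB) * VM_COUNT ))
echo "Minimum pool reservation: ${POOL_MB} MB"
```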

Cluster requirements for FT:

  • Host certificate checking must be enabled (enabled by default since vSphere 4.1).
  • At least two FT-certified hosts running the same Fault Tolerance version or host build number.
  • ESXi hosts must have access to the same VM datastores and networks.
  • FT logging and vMotion networking must be configured.

  • vSphere HA cluster is needed. In other words: FT depends on HA.

Host requirements for FT:

  • Hosts must have processors from the FT-compatible processor group. It is also highly recommended that the hosts’ processors are compatible with one another.
  • Hosts must be licensed for FT (Enterprise(Plus)).
  • Hosts must be certified for FT. Use the VMware Compatibility Guide and select Search by Fault Tolerant Compatible Sets.

VM requirements for FT:

  • No unsupported devices attached to the VM (SMP, physical RDM, CD-ROMs, floppy drives, USB, sound devices, NPIV, Vlance NICs, thin-provisioned disks, hot-plugging, serial/parallel ports, 3D video, and IPv6).
  • Disks should be virtual RDM or Thick provisioned VMDK.
  • vSphere Fault Tolerance is not supported with a 2TB+ VMDK.
  • VM files must be stored on shared storage.
  • VM must have a single vCPU.
  • VM maximum RAM is 64 GB (unverified).
  • VM must run a supported guest OS. See KB “Processors and guest operating systems that support VMware Fault Tolerance”.
  • Snapshots must be removed or committed.
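Snapshots can be checked and removed from the ESXi shell with vim-cmd; the VM ID below is hypothetical (look it up first with getallvms).

```shell
# Find the VM's ID, list its snapshots, then commit/remove them all.
vim-cmd vmsvc/getallvms
vim-cmd vmsvc/snapshot.get 12        # 12 is a hypothetical VM ID
vim-cmd vmsvc/snapshot.removeall 12
```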

Configuration steps:

  1. Enable host certificate checking (already discussed).
  2. Configure networking.
  3. Create the HA cluster and add hosts.
  4. Check compliance.

Configure Networking

  • Multiple gigabit NICs are required. For each host supporting Fault Tolerance, you need a minimum of two physical gigabit NICs: for example, one dedicated to Fault Tolerance logging and one dedicated to vMotion. Three or more NICs are recommended to ensure availability.
  • The vMotion and FT logging NICs must be on different subnets and IPv6 is not supported on the FT logging NIC.

Check compliance

  • To confirm that you successfully enabled both vMotion and Fault Tolerance on the host, view its Summary tab in the vSphere Client. In the General pane, the fields vMotion Enabled and Host Configured for FT should show Yes.


  • On the "Profile Compliance" tab at the cluster level, you can check whether the cluster is configured correctly and complies with the requirements for the successful enablement of Fault Tolerance. Click "Description" to view the criteria. Click "Check Compliance Now" to run the tests.

Sources: http://paulgrevink.wordpress.com/, vsphere-esxi-vcenter-server-55-availability-guide.pdf, fault_tolerance_recommendations_considerations_on_vmw_vsphere4.pdf

About Ahmad Sabry ElGendi

https://www.linkedin.com/pub/ahmad-elgendi/94/223/559
This entry was posted in Vmware.
