- General FT Requirements and Recommendations
- Cluster and host requirements
- Storage requirements
- Networking recommendations
- Timekeeping Recommendations
- Configuration Recommendations to be observed when configuring FT
- Best Practices for Fault Tolerance
- Cluster requirements for FT
- Host requirements for FT
- VM requirements for FT
- Configuration steps
- Configure Networking
- Check compliance
- Keep in mind: when you enable FT, you get a message that guides you on how things should be configured:
- Watch: https://www.youtube.com/watch?v=7lPf4OiMKug
- Configuration Error Messages: vSphere HA and FT Error Messages (1033634) http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1033634
General FT Requirements and Recommendations
Cluster and host requirements:
- VMware FT can only be used in a VMware HA cluster.
- Ensure that all ESX hosts in the VMware HA cluster have identical ESX versions and patch levels. vLockstep technology only works between Primary and Secondary VMs on hosts running identical versions of ESX. Please see the section on patching hosts running VMware FT virtual machines for recommendations on how to upgrade hosts that are running FT virtual machines.
- ESX host processors must be VMware FT capable and belong to the same processor model family. VMware FT capable processors required changes in both the performance counter architecture and the virtualization hardware assists of both AMD and Intel.
- VMware FT does not disable AMD’s Rapid Virtualization Indexing (i.e., nested page tables) or Intel’s Extended Page Tables for the ESX host, but these features are automatically disabled for a virtual machine when VMware FT is turned on. Virtual machines without FT enabled can still take advantage of these hardware-assisted virtualization features.
- VMware FT is supported on ESX hosts with hyper-threading either enabled or disabled; hyper-threading does not have to be disabled for VMware FT to work.
Storage requirements:
- Shared storage required – Fibre channel, iSCSI, or NAS.
- Turning on VMware FT for a virtual machine requires that the virtual machine’s virtual disk (VMDK) files be thick-provisioned and eager-zeroed. During the process of turning on VMware FT, a message states this requirement and asks whether the virtual disk should be converted to the supported eager-zeroed, thick-provisioned format. The user must convert the virtual disk at this time in order to proceed with turning on VMware FT. Alternatively, the virtual disks can be converted before VMware FT is turned on, which makes the turn-on process quicker later. Thin-provisioned or lazy-zeroed disks can be converted during off-peak times using the following methods:
- Use the vmkfstools --diskformat eagerzeroedthick option in the vSphere CLI when the virtual machine is powered off.
- Inflate the virtual disk (thin to thick provisioning); the result defaults to eagerzeroedthick: /vmfs/volumes/54e226a0-5e31baf3-0477-005056ba5da5/Win7-01 # vmkfstools --inflatedisk Win7-01_1.vmdk
- Set the cbtmotion.forceEagerZeroedThick = “true” flag in the .vmx file before powering on the virtual machine, then use VMware Storage VMotion to do the conversion.
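The in-place conversion methods above can be sketched from the ESXi shell. This is illustrative only: the datastore path and VMDK names are taken from the example above, and the commands must be run on an ESXi host with the virtual machine powered off.

```shell
# Run on the ESXi shell with the VM powered off.
cd /vmfs/volumes/54e226a0-5e31baf3-0477-005056ba5da5/Win7-01

# Inflate a thin-provisioned disk in place; the result is eager-zeroed thick:
vmkfstools --inflatedisk Win7-01_1.vmdk

# Alternatively, clone the disk into the eager-zeroed thick format
# (the target name is an example) and then repoint the VM at the clone:
vmkfstools --clonevirtualdisk Win7-01_1.vmdk Win7-01_1-ezt.vmdk \
    --diskformat eagerzeroedthick
```

Inflation converts the disk in place, while cloning needs free space for a second copy but leaves the original untouched until you swap it out.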
- Backup solutions within the guest operating system for file or disk-level backups are supported. However, these applications may saturate the VMware FT logging network if heavy read access is performed. In fact, saturation of the FT logging network could occur for any disk-intensive workload. The resulting network saturation may lower the performance of the VMware FT-enabled virtual machine. Avoid running many VMware FT virtual machines with high disk reads and high network input on the same ESX host.
- At a minimum, use 1 GbE NICs for VMware FT logging network. Use 10 GbE NICs for increased bandwidth of FT logging traffic.
- Ensure that the networking latency between ESX hosts is low. Sub-millisecond latency is recommended for the FT logging network. Use vmkping to measure the latency.
- VMware vSwitch settings on the hosts should also be uniform, such as using the same VLAN for VMware FT logging, to make these hosts available for placement of Secondary VMs. Consider using a VMware® vNetwork Distributed Switch to avoid inconsistencies in the vSwitch settings.
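The latency check above can be done with vmkping from the ESXi shell. A minimal sketch, assuming vmk2 is the local FT-logging vmkernel interface and 192.168.20.12 is the remote host's FT-logging address (both are examples):

```shell
# Ping the remote host's FT-logging address through the local FT-logging
# vmkernel interface (-I selects the interface, -c sets the ping count):
vmkping -I vmk2 -c 10 192.168.20.12
# Sub-millisecond "time=" values in the output indicate latency within
# the recommended range for the FT logging network.
```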
Networking Baseline recommendation:
- Preferably, each host has separate 1 GbE NICs for FT logging traffic and VMotion. The reason for recommending separate NICs is that the creation of the Secondary VM is done by migrating the Primary VM with VMotion. This can produce significant traffic on the VMotion NIC and could affect VMware FT logging traffic if the NICs are shared.
- It is preferable that the VMware FT logging NIC has redundancy, so that no unnecessary failovers occur if a single NIC is lost. As described in the steps below, the VMware FT logging NIC and VMotion NIC can be configured so that they will automatically share the remaining NIC if one or the other NIC fails.
- Create a vSwitch that is connected to at least two physical NICs.
- Create a VMware VMkernel connection (displayed as VM kernel Port in vSphere Client) for VMotion and another one for FT traffic.
- Make sure that different IP addresses are set for the two VMkernel connections.
- Assign the NIC teaming properties to ensure that VMotion and FT use different NICs as the active NIC:
- For VMotion: Set NIC A as active and NIC B as passive.
- For FT: Set NIC B as active and NIC A as passive.
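The crossed active/standby teaming above can also be set from the command line. A sketch using ESXi 5.x esxcli syntax, where the portgroup names ("VMotion", "FT-Logging") and uplinks (vmnic1, vmnic2) are example names for your environment:

```shell
# Give VMotion and FT logging opposite active/standby uplinks on the
# same vSwitch, so each keeps a dedicated NIC but fails over to the
# other's NIC if its own link is lost:
esxcli network vswitch standard portgroup policy failover set \
    --portgroup-name "VMotion" \
    --active-uplinks vmnic1 --standby-uplinks vmnic2

esxcli network vswitch standard portgroup policy failover set \
    --portgroup-name "FT-Logging" \
    --active-uplinks vmnic2 --standby-uplinks vmnic1
```

With this layout, losing either physical NIC degrades bandwidth (both traffic types share the survivor) but does not cause an unnecessary FT failover.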
Note that it is possible to run VMware FT with just a single NIC. The vSwitch stack is flexible enough to route all the traffic (e.g., console, virtual machine, VMware FT, VMotion) through one NIC. However, this configuration is strongly discouraged, since VMware FT will perform better and more reliably with redundancy at all levels of the network infrastructure.
Not supported: source port ID or source MAC address based load balancing policies do not distribute FT logging traffic. However, if there are multiple VMware FT host pairs, some load balancing is possible with an IP-hash load balancing scheme, though IP-hash may require physical switch changes such as EtherChannel setup. VMware FT will not automatically change any vSwitch settings.
Timekeeping Recommendations
In order to avoid time mismatch issues in a virtual machine after a VMware FT failover, perform the following steps:
- Synchronize the guest operating system time with a time source; the method will depend on whether the guest is Windows or Linux.
- Synchronize the time of each ESX server host with a network time protocol (NTP) server.
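The host-side NTP step can be sketched from the ESXi shell. Illustrative only: pool.ntp.org is a placeholder for your own time source, and the classic ntpd service paths below assume an ESXi 5.x host (NTP can equally be configured via the vSphere Client).

```shell
# Point the host's ntpd at a time source:
echo "server pool.ntp.org" >> /etc/ntp.conf

# Restart the NTP daemon to pick up the change:
/etc/init.d/ntpd restart

# Ensure the service starts with the host:
chkconfig ntpd on
```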
Best Practices for Fault Tolerance
To ensure optimal Fault Tolerance results, you should follow certain best practices.
Host Configuration Best Practices: Consider the following best practices when configuring your hosts:
- Hosts running the Primary and Secondary VMs should operate at approximately the same processor frequencies, otherwise the Secondary VM might be restarted more frequently. Platform power management features that do not adjust based on workload (for example, power capping and enforced low frequency modes to save power) can cause processor frequencies to vary greatly. If Secondary VMs are being restarted on a regular basis, disable all power management modes on the hosts running fault tolerant virtual machines or ensure that all hosts are running in the same power management modes.
- Apply the same instruction set extension configuration (enabled or disabled) to all hosts. The process for enabling or disabling instruction sets varies among BIOSes. See the documentation for your hosts’ BIOSes about how to configure instruction sets.
Homogeneous Clusters: vSphere Fault Tolerance can function in clusters with nonuniform hosts, but it works best in clusters with compatible nodes. When constructing your cluster, all hosts should have the following configuration:
- Processors from the same compatible processor group.
- Common access to the datastores used by the virtual machines.
- The same virtual machine network configuration.
- The same ESXi version.
- The same Fault Tolerance version number.
- The same BIOS settings (power management and hyperthreading) for all hosts.
Run Check Compliance to identify incompatibilities and to correct them.
Performance: To increase the bandwidth available for the logging traffic between Primary and Secondary VMs, use a 10Gbit NIC and enable the use of jumbo frames.
Store ISOs on Shared Storage for Continuous Access: Store ISOs that are accessed by virtual machines with Fault Tolerance enabled on shared storage that is accessible to both instances of the fault tolerant virtual machine. With this configuration, the CD-ROM in the virtual machine continues operating normally, even when a failover occurs. Alternatively, you might use ISO images that are accessible only to the Primary VM. In such a case, the Primary VM can access the ISO, but if a failover occurs, the CD-ROM reports errors as if there is no media. This situation might be acceptable if the CD-ROM is being used for a temporary, noncritical operation such as an installation.
Avoid Network Partitions: A network partition occurs when a vSphere HA cluster has a management network failure that isolates some of the hosts from vCenter Server and from one another. See Network Partitions. When a partition occurs, Fault Tolerance protection might be degraded.
In a partitioned vSphere HA cluster using Fault Tolerance, the Primary VM (or its Secondary VM) could end up in a partition managed by a master host that is not responsible for the virtual machine. When a failover is needed, a Secondary VM is restarted only if the Primary VM was in a partition managed by the master host responsible for it. To ensure that your management network is less likely to have a failure that leads to a network partition, follow the recommendations in Best Practices for Networking.
Configuration Recommendations to be observed when configuring FT:
- In addition to non-fault tolerant virtual machines, you should have no more than four fault tolerant virtual machines (primaries or secondaries) on any single host. The number of fault tolerant virtual machines that you can safely run on each host is based on the sizes and workloads of the ESXi host and virtual machines, all of which can vary.
- If you are using NFS to access shared storage, use dedicated NAS hardware with at least a 1Gbit NIC to obtain the network performance required for Fault Tolerance to work properly.
- Ensure that a resource pool containing fault tolerant virtual machines has excess memory above the memory size of the virtual machines. The memory reservation of a fault tolerant virtual machine is set to the virtual machine’s memory size when Fault Tolerance is turned on. Without this excess in the resource pool, there might not be any memory available to use as overhead memory.
- Use a maximum of 16 virtual disks per fault tolerant virtual machine.
- To ensure redundancy and maximum Fault Tolerance protection, you should have a minimum of three hosts in the cluster. In a failover situation, this provides a host that can accommodate the new Secondary VM that is created.
Cluster requirements for FT :
- Host certificate checking must be enabled (enabled by default since vSphere 4.1).
- At least two FT-certified hosts running the same Fault Tolerance version or host build number.
- ESXi hosts have access to same VM datastores and networks.
- FT logging and vMotion networking must be configured.
- vSphere HA cluster is needed. In other words: FT depends on HA.
Host requirements for FT :
- Hosts must have processors from the FT-compatible processor group. It is also highly recommended that the hosts’ processors are compatible with one another.
- Hosts must be licensed for FT (Enterprise or Enterprise Plus).
- Hosts must be certified for FT. Use the VMware Compatibility Guide and select Search by Fault Tolerant Compatible Sets.
VM requirements for FT :
- No unsupported devices attached to the VM (vSMP, physical RDM, CD-ROMs, floppy drives, USB and sound devices, NPIV, Vlance NICs, thin-provisioned disks, hot-plugging, serial/parallel ports, 3D video, and IPv6).
- Disks should be virtual RDM or Thick provisioned VMDK.
- vSphere Fault Tolerance is not supported with VMDKs larger than 2 TB.
- VM files must be stored on shared storage.
- VM must have a single vCPU.
- VM maximum RAM is 64 GB.
- VM must run a supported guest OS. See KB “Processors and guest operating systems that support VMware Fault Tolerance”.
- Snapshots must be removed or committed.
Configuration steps:
- Enable host certificate checking (already discussed).
- Configure networking.
- Create the HA cluster and add hosts.
- Check compliance.
Configure Networking
- Multiple gigabit NICs are required. For each host supporting Fault Tolerance, you need a minimum of two physical gigabit NICs. For example, you need one dedicated to Fault Tolerance logging and one dedicated to vMotion. Three or more NICs are recommended to ensure availability.
- The vMotion and FT logging NICs must be on different subnets and IPv6 is not supported on the FT logging NIC.
- To confirm that you successfully enabled both vMotion and Fault Tolerance on the host, view its Summary tab in the vSphere Client. In the General pane, the fields vMotion Enabled and Host Configured for FT should show Yes.
- On the “Profile Compliance” tab at the cluster level, you can check whether the cluster is configured correctly and complies with the requirements for the successful enablement of Fault Tolerance. Click “Description” to view the criteria. Click “Check Compliance Now” to run the tests.
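The same vMotion/FT-networking check can be run per host from the ESXi shell (ESXi 5.1+ syntax). A sketch, assuming vmk1 and vmk2 are the vmkernel interfaces carrying vMotion and FT logging in your environment:

```shell
# List all vmkernel interfaces on the host:
esxcli network ip interface list

# Show the service tags assigned to each interface; one interface should
# carry the VMotion tag and another the faultToleranceLogging tag:
esxcli network ip interface tag get -i vmk1
esxcli network ip interface tag get -i vmk2
```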
Sources: http://paulgrevink.wordpress.com/ vsphere-esxi-vcenter-server-55-availability-guide.pdf fault_tolerance_recommendations_considerations_on_vmw_vsphere4.pdf