VMware Best Practice Guide

Introduction

This document describes recommended configurations for various aspects of a VMware environment used in tandem with a StorTrends SAN. Examples are given within particular sections to illustrate how a certain feature or configuration benefits the user. The StorTrends family of SANs utilizes a variety of product integrations and performance tuning mechanisms to enhance and simplify the overall performance and manageability of a virtualized environment.

Prerequisites

To properly understand the information found within this document, readers should have a good understanding of the following topics:

  • Configuration and operation of VMware vSphere
  • Configuration and operation of the StorTrends SAN
  • Operating systems such as Windows and various Linux distributions

Additional Resources

In addition to the recommendations found within this document, the following guides provide recommendations for specific applications within the virtualized environment:

Intended Audience

This guide is intended for IT Managers, Solutions Architects and Server Administrators with the desire to implement a StorTrends SAN within their current, or a new, VMware environment.

NOTE: The information found in this guide is intended to give general recommendations that should apply to a majority of VMware environments. In some cases, these recommendations may not be applicable. That determination should be based on individual and business requirements.

Host Based Connectivity

The StorTrends family of SANs offers iSCSI connectivity at both 1 GbE and 10 GbE. The recommendations below will allow users to establish optimized connections between their StorTrends SAN and their VMware environment.

Networking Recommendations

Within each VMware ESXi host, there should be separate network adapters for management and iSCSI traffic. If multiple iSCSI ports within a VMware ESXi host are on the same subnet, these ports should be added to the 'Network Port Binding' list for that host.

NOTE: For more information on port binding, see VMware KB article 2038869, which covers the subject in further detail.

In order to add these ports to that list:

  • Go to the specific VMware ESXi host within your vCenter and go under the 'Manage' tab.
  • From this tab, go to the 'Storage' sub-tab and choose 'Storage Adapters'.
  • Find the iSCSI Software Adapter (usually at the end of the list).
  • Under the 'Adapter Details' section, find the 'Network Port Binding' and click on the green '+' to bring up a list of available VMkernel adapters.
NOTE: Do NOT add management port adapters to this list.
NOTE: VMware vSphere versions 4.x, 5.0, 5.1, 5.5 and 6.0 do not support routing for any port being used for port binding. For this reason, be sure that the StorTrends SAN and all iSCSI VMkernel ports are on the same subnet and are able to communicate directly with each other.
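The same binding can also be performed from the ESXi Shell with esxcli. Below is a minimal sketch, assuming the software iSCSI adapter is vmhba33 and the iSCSI VMkernel ports are vmk1 and vmk2; confirm the actual names in your environment with the list commands first.

```shell
# List adapters and VMkernel interfaces to confirm the names to use
esxcli iscsi adapter list
esxcli network ip interface list

# Bind each iSCSI VMkernel port to the software iSCSI adapter
# (vmhba33, vmk1 and vmk2 are example names)
esxcli iscsi networkportal add --adapter=vmhba33 --nic=vmk1
esxcli iscsi networkportal add --adapter=vmhba33 --nic=vmk2

# Verify the resulting port bindings
esxcli iscsi networkportal list --adapter=vmhba33
```

As with the vCenter procedure, do not bind management VMkernel ports.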

Jumbo Frames

Jumbo frames, while not required, are recommended for environments that have high throughput requirements. Jumbo frames must be enabled on the StorTrends SAN, the switch, and each individual VMware ESXi host.

In order to enable this on the StorTrends SAN:

  • Login to the ManageTrends UI and click on the 'Control Panel' for the Left Controller found on the left-hand navigation tree.
  • Under the 'Network' section, click on 'TCP-IP'.
  • Find the interface that will have jumbo frames enabled from the dropdown and check the jumbo frame status to the right of the dropdown. If the status shows a gray 'X', click the 'Enable' button to enable jumbo frames for that interface.

In order to enable this on the VMware ESXi host:

  • Login to vCenter and navigate to the host in question. Click on the 'Manage' tab and then click on the 'Networking' sub-tab.
  • Under the 'Virtual switches' option, find the proper vSwitch from the list and click on the pencil icon to change its settings. Change the MTU value from 1500 to 9000.
  • Under the 'VMkernel adapters' option, find the VMkernel that correlates to the vSwitch that was modified above and click the pencil icon to change its settings. Go under 'NIC settings' and change the MTU to 9000.

In order to enable jumbo frames for your switch, consult the switch's user manual.
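For reference, the ESXi side of this change can also be made from the ESXi Shell, and the end-to-end path can then be verified with vmkping. The vSwitch and VMkernel names below are examples:

```shell
# Raise the MTU on the vSwitch and on the iSCSI VMkernel port
# (vSwitch1 and vmk1 are example names)
esxcli network vswitch standard set --vswitch-name=vSwitch1 --mtu=9000
esxcli network ip interface set --interface-name=vmk1 --mtu=9000

# Verify jumbo frames end to end: 8972 bytes of payload plus 28 bytes
# of IP/ICMP headers equals a 9000-byte frame; -d sets do-not-fragment
vmkping -d -s 8972 <StorTrends iSCSI IP>
```

If the vmkping fails while a standard ping succeeds, one of the devices in the path is not passing jumbo frames.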

Teaming Options and MCS/MPIO

There are a few options for increasing the available throughput of the environment as a whole, both on the StorTrends SAN side and on the individual VMs (if the data drives are presented through the VM's iSCSI initiator).

On the StorTrends SAN side, teaming the physical NICs of each controller can give a boost to throughput capabilities. There are a few options for teaming including Round Robin, Adaptive Load Balancing, and Link Aggregation Control Protocol (802.3ad). To gain more knowledge on these options, take a look at the StorTrends Alias Configuration Guide.

For individual VMs, Multiple Connections per Session (MCS) or Multipath I/O (MPIO) can be helpful for increasing the throughput of the VM. The main difference between the two is that MCS creates multiple connections within a single iSCSI session, whereas MPIO creates multiple sessions over separate physical paths. For more information on both protocols, and to see how to implement them in Windows environments, take a look at the StorTrends MPIO/MCS Comparison Guide.

Round Robin Multipathing

Round robin multipathing is a VMware ESXi parameter that will create both redundancy and increased throughput for iSCSI connections. This setting needs to be applied on a per target basis. Below are steps to follow in order to do this manually:

  • Login to vCenter and navigate to the host in question. Click on the 'Manage' tab and then click on the 'Storage' sub-tab. Click on the 'Storage Devices' option and choose a device that is connected to the StorTrends SAN. Then, under the 'Device Details', click on 'Edit Multipathing…'
  • In the dropdown under 'Path selection policy:', choose 'Round Robin (VMware)'.
NOTE: If there are a multitude of targets to go through, StorTrends Support can offer a customized script that will make the changes for all targets that are connecting to a StorTrends LUN.
NOTE: It is recommended that round robin multipath be set for all iSCSI connections associated with the StorTrends SAN. This will allow for additional performance as seen in the performance section, below.
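For a small number of targets, the same policy change can be made from the ESXi Shell. Below is a sketch with a placeholder naa identifier; list the devices first to find the StorTrends LUNs:

```shell
# Identify the StorTrends devices and their naa identifiers
esxcli storage nmp device list

# Set the path selection policy for one device to Round Robin
# (the naa identifier below is a placeholder)
esxcli storage nmp device set \
    --device=naa.600000000000000000000000000000a1 --psp=VMW_PSP_RR
```

Repeat the second command for each device connected to the StorTrends SAN.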

iSCSI Timeout Values

All StorTrends SANs are fully redundant with dual controllers. In the event of a failover, the user must ensure that proper timeout values are set throughout the environment to avoid any loss of connectivity. For VMware, change the following option in the iSCSI adapter.

  • Login to vCenter and navigate to the host in question. Click on the 'Manage' tab and then click on the 'Storage' sub-tab. Under the 'Storage Adapters' option, scroll down until the iSCSI Software Adapter is found and then click on the 'Advanced Options' tab in the 'Adapter Details' section. Click on the 'Edit…' button.
  • Scroll down until the entry for 'RecoveryTimeout' is found. Change the value to 120.
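This value can also be set from the ESXi Shell, assuming the software iSCSI adapter is vmhba33 (an example name):

```shell
# Set the iSCSI login recovery timeout to 120 seconds
# (vmhba33 is an example adapter name)
esxcli iscsi adapter param set --adapter=vmhba33 \
    --key=RecoveryTimeout --value=120

# Confirm the change
esxcli iscsi adapter param get --adapter=vmhba33 | grep RecoveryTimeout
```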

Other timeout recommendations may be found in the StorTrends Timeout Guide.

Data Drive Considerations

There are a few considerations when planning how to deploy datastores (both for OS data and for application data). These include how the datastores will be provisioned and how to connect the StorTrends volumes to the virtual machines (VMs) in question. The sections below shed light on each of these considerations.

Provisioning Considerations

A general rule of thumb when deciding which provisioning type to use for a specific datastore is to use the opposite type from what was used on the StorTrends SAN.

NOTE: In the StorTrends 2610i, all volumes are created as thin provisioned volumes on the SAN.

Thin Provisioning

VMware vSphere thin provisioning allows for space overcommit at the datastore level. A virtual disk provisioned as thin allocates space on the VMFS datastore only as it is needed. If the virtual machines being deployed in the VMware environment have unpredictable space growth, thin provisioning is a viable option. With this type of provisioning, however, VMs running on these datastores are susceptible to out-of-space conditions. For this reason, it is recommended that the 'Datastore usage on disk' alarm is enabled. To enable this alarm, log in to vCenter from the vSphere Web Client and follow the steps below:

  • In the left-hand navigation tree, click on vCenter object and choose the 'Manage' tab. Choose the 'Alarm Definitions' sub-tab and type 'usage' in the search box to narrow down the alarms.
  • Click 'Edit…' to get to the settings for the alarm. Be sure that 'Datastores' are selected as the type being monitored and that the radio button for 'specific conditions or state, for example CPU usage' is chosen. Click 'Next'.
  • The default values are 75% usage for a warning condition and 85% usage for a critical condition. These may be changed by clicking on each percentage and choosing a new value from the dropdown menu.
  • In the 'Actions' section, choose what actions to take for each condition. It is recommended to have email notifications sent out when these conditions are met to ensure that the correct people are notified.
NOTE: If the virtual disk supports clustering solutions such as Fault Tolerance, it should NOT be provisioned as a thin provisioned datastore.

Lazy Zeroed and Eager Zeroed Thick Provisioning

Lazy zeroed thick provisioning creates a virtual disk in the default thick format. With this format, the space required for the virtual disk is allocated at creation time. Data remaining on the physical device is not erased during creation; it is zeroed out on demand, upon first write from the VM.

Eager zeroed thick provisioning supports clustering features such as Fault Tolerance. With this format, the space required for the virtual disk is allocated at creation time. The data remaining on the physical disk is zeroed out when the virtual disk is created. This type of provisioning could take longer than other types of provisioning.

NOTE: Eager zeroed thick provisioning is the recommended option for provisioning virtual disks.
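As an illustration, an eager zeroed thick virtual disk can also be created directly from the ESXi Shell with vmkfstools; the size and paths below are placeholders:

```shell
# Create a 100 GB eager zeroed thick virtual disk
# (-c sets the size, -d sets the disk format;
#  the datastore and VM paths are placeholders)
vmkfstools -c 100G -d eagerzeroedthick \
    /vmfs/volumes/datastore1/myvm/myvm_data.vmdk
```

Note that zeroing the full 100 GB up front means this command takes noticeably longer than creating a thin or lazy zeroed disk of the same size.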

Methods for Connectivity

There are a few options to consider when deciding how to connect a disk to a VM. Two options involve the VMware ESXi hosts: raw device mapped (RDM) LUNs, and creating a disk on a datastore and presenting it to a VM. The third option utilizes the VM's own iSCSI initiator.

VMDK on VMFS

One of the most common methods of connecting data drives to VMs is creating a VMDK disk on a VMFS volume. This method requires a user to carve out space on a datastore and assign that new space to a VM for use.

Advantages

  • Ability to add storage to a VM from free space in the datastore that is hosting the VM or provision out a new datastore
  • Visibility in vCenter keeps administrative overhead light
    • Allows the user to take advantage of vCenter tasks such as vMotion and cloning

Disadvantages

  • Unable to take application aware snapshots of data residing on VMDK

Raw Device Mapped (RDM) LUNs

Another method for connecting data drives to a VM within VMware is to utilize RDM LUNs. This method bypasses the need to create a datastore and connects the raw device directly to the VM.

Advantages

  • No real advantages over other methods

Disadvantages

  • Unable to take application aware snapshots of data
  • No performance gains as compared to other methods
  • There is a 256 target maximum per iSCSI initiator per VMware ESXi host and each RDM has to be a separate target

VM's iSCSI Initiator

A third option for connecting data drives to a VM is to use the VM's built-in iSCSI initiator. With this method, Windows and Linux VMs depend on their own iSCSI initiators and completely bypass the VMware layer.

Advantages

  • For Windows servers: ability to take application-aware, VSS-based snapshots with the use of SnapTrends.
  • Ability to isolate data drives for disaster recovery (DR) purposes. This allows the data drive to be on a different replication schedule than the OS drive.
  • Can be easily mounted on either a virtual or physical server for quick recovery on a physical server in the case that the virtual environment is down.
  • When going from a physical server to a virtual one, there is no need to change any of the best practices already in place

Disadvantages

  • Not visible to vCenter, which can cause management overhead.

ATS Heartbeat

Starting with ESXi 5.5 Update 2, VMware changed its method for VMFS heartbeat updates from plain SCSI reads and writes, with the VMware ESXi kernel handling validation, to offloading this procedure to the SAN via the ATS VAAI primitive. In some cases, this can cause unwanted temporary loss of access to datastores. If you see any of the following symptoms, you may be experiencing this phenomenon:

  • In the /var/run/log/vobd.log file and Virtual Center Events, you see the VOB message: Lost access to volume <uuid><volume name> due to connectivity issues. Recovery attempt is in progress and the outcome will be reported shortly
  • In the /var/run/log/vmkernel.log file, you see the message: ATS Miscompare detected between test and set HB images at offset XXX on vol YYY
  • In the /var/log/vmkernel.log file, you see similar error messages indicating an ATS miscompare: 2015-11-20T22:12:47.194Z cpu13:33467)ScsiDeviceIO: 2645: Cmd(0x439dd0d7c400) 0x89, CmdSN 0x2f3dd6 from world 3937473 to dev "naa.50002ac0049412fa" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0xe 0x1d 0x0.
  • Hosts disconnecting from vSphere vCenter.
  • Virtual machines hanging on I/O operations.

StorTrends recommends disabling the ATS heartbeat, whether you have experienced any of the symptoms above or not, to ensure proper and stable connections to the StorTrends LUNs. VMware's KB article 2113956 has the proper steps to take depending on whether you utilize VMFS5 or VMFS3 datastores.
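Per that KB article, the ATS heartbeat can be disabled from each ESXi host's Shell with the advanced settings below (setting the value back to 1 re-enables it):

```shell
# Disable ATS heartbeat for VMFS5 datastores
esxcli system settings advanced set -i 0 -o /VMFS3/UseATSForHBOnVMFS5

# Disable ATS heartbeat for VMFS3 datastores
esxcli system settings advanced set -i 0 -o /VMFS3/UseATSForHBOnVMFS3

# Verify: the 'Int Value' field should now read 0
esxcli system settings advanced list -o /VMFS3/UseATSForHBOnVMFS5
```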

Performance Tuning Recommendations

The following section walks through changes that can be made to the environment in order to tune it for greater overall performance.

Native Multipathing Plugin (NMP) Configuration

Starting in version 4.0, VMware improved its Native Multipathing Plugin (NMP) by introducing Storage Array Type Plugins (SATPs) and Path Selection Plugins (PSPs) as part of the VMware APIs for Pluggable Storage Architecture (PSA). With these plugins, storage systems are given the ability to aggregate I/Os across multiple connections as well as implement failover methods between those connections. VMware has three options for handling path selection, and StorTrends recommends the 'Round Robin' option, as seen in the 'Round Robin Multipathing' section earlier in this guide.

In this section, we explore additional settings that can be applied when using Round Robin multipathing to realize increased throughput. When Round Robin is configured, a parameter dictates how often each connection is used to service I/Os. By default, VMware switches paths every 1,000 I/Os. This can lead to uneven usage of the connections as well as unrealized throughput gains. StorTrends' recommendation is to change this setting so that paths are switched on every I/O, enabling a complete balance of utilization between connections and ensuring that each connection is fully utilized. As an added bonus, in the case of a failover, the time it takes to complete the failover from VMware's perspective is reduced, as the switching between connections happens more quickly. Making this change does incur a minute amount of additional CPU cost on the ESXi hosts, but not enough to cause any undesirable side effects.

Depending on your ESXi version, making this change will vary slightly, but regardless, this change needs to be made from each ESXi host's SSH shell. The scripts below will ensure that any new StorTrends LUNs that are presented to the ESXi host automatically take these settings when Round Robin is set as well as make the change to any current StorTrends LUNs.

NOTE: In order for the rules for new connections to take effect, a reboot of the ESXi host is required.
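As a sketch of what such a script can look like, the commands below add a claim rule so that newly presented LUNs matching a vendor string default to Round Robin with a path switch on every I/O, and then apply the same setting to one existing device. The SATP name, vendor string, and naa identifier are assumptions for illustration; confirm the actual values with 'esxcli storage core device list' before running anything, or request the customized script from StorTrends Support.

```shell
# Claim rule: new LUNs matching this vendor string default to
# Round Robin with a path switch after every I/O
# (the SATP and vendor string are placeholders)
esxcli storage nmp satp rule add --satp=VMW_SATP_DEFAULT_AA \
    --vendor="StorTrends" --psp=VMW_PSP_RR --psp-option="iops=1"

# Apply iops=1 to an existing device already set to Round Robin
# (the naa identifier is a placeholder)
esxcli storage nmp psp roundrobin deviceconfig set \
    --device=naa.600000000000000000000000000000a1 --type=iops --iops=1
```

The deviceconfig command takes effect immediately; the claim rule applies to LUNs presented after the host reboot noted above.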