VMotion, Storage vMotion and Cold Migration
Introduction: VMotion, Storage VMotion, and Cold Migration
Version: vSphere 5.5
This topic is all about moving a virtual machine from one location to another. Industry experts and other virtualization vendors occasionally refer to this as workload portability or 'live migration'. VMware's flagship technology is called VMotion. Indeed, it was VMware who pioneered the technology that allows the SysAdmin to move a running VM from one physical host to another, without powering off the VM and without disconnecting users. Storage VMotion describes a similar process by which the files that make up the VM (.VMX, .VMDK) are relocated from one datastore to another, again without powering off the VM and without disconnecting users. Finally, cold migration describes a process by which the VM is relocated to another host, another datastore or both, with the VM powered off. This can be necessary because the requirements of VMotion or Storage VMotion, for whatever reason, cannot be met.
The requirements for VMotion and Storage VMotion have changed over various releases, with VMware introducing new requirements, and weakening or in some cases removing others altogether. For instance, early versions of Storage VMotion required that VMotion was enabled first. Now Storage VMotion is built into the platform and does not require that VMotion be enabled first. Similarly, early versions of VMotion required the VM to be located on shared storage (such as FC, iSCSI, NFS), and non-shared storage such as local VMFS volumes was not supported. By combining the functionality of VMotion and Storage VMotion it is now possible to enjoy the benefits of workload portability without the need for shared storage, although shared storage remains a desirable feature to many customers.
When VMotion was first demonstrated, few people really appreciated how revolutionary the technology was going to be. However, once customers got over the initial thrill and disbelief, they quickly came to accept the capability as a given. Now, where VMotion shows its value is in the related features and technologies that leverage it. It's these business benefits that really make VMotion an important feature in vSphere. For instance:
- Distributed Resource Scheduler (DRS)
A feature of VMware Clustering, DRS will dynamically move VMs around hosts within a cluster to improve the overall performance of the VMs.
- Distributed Power Management (DPM)
A feature of VMware Clustering, DPM will calculate, based on load, whether all of the servers need to be online. It will then selectively evacuate a host of its VMs (using a combination of VMotion and DRS), and place the host in a powered off/standby state, ready to be resumed when load increases on the remaining active nodes in the cluster.
- Update Management (VUM)
Like all software, vSphere requires patching for security and bug fixes. Used in conjunction with DRS, VUM can automatically evacuate a vSphere host of VMs and carry out the patch management routine, allowing the host to rejoin the cluster once the process has been completed.
- Hardware Maintenance
In the past hardware refresh cycles dictated a renewal of hardware every 3 years. Industry watchers are now beginning to see this period being extended out to 5 years. However, when a server is decommissioned, in many cases the VMs can be moved by VMotion from the old servers to the new seamlessly. The same can be said of storage. Just like servers, storage arrays are purchased with a period of warranty which will eventually expire. As with servers, it's often more cost effective to acquire new storage and get the benefits of support, as well as hopefully new features and improved performance. Using
VMotion can be easily configured without understanding the background processes that allow it to work. However, it can be useful to know them because they help explain the requirements and assist in troubleshooting the technology. In VMotion the files that make up the VM are not copied or moved, but remain on the datastore. What is copied across the network is the memory contents of the VM, to a hidden or "ghost" VM on the destination vSphere host. Once the two VMs are in the same memory state, the source is turned off and deregistered, and the target is started. This all happens so quickly that often no packet loss occurs as the VM is "moved" from one vSphere host to another. When VMotion is initiated an initial pre-copy takes place, and a log of the memory that changes during the copy is generated - sometimes referred to as a "memory bitmap". Once the two VMs are practically at the same state, the bitmap on the source host is transferred to the destination. Generally, the memory bitmap must be very small for this to happen, so the transfer takes milliseconds to complete. To allow for complete synchronization of memory contents, just before the transition of the VM from source to destination host takes place, the VM on the source is put into a quiesced state which temporarily pauses activity.
VMotion can be seen as a transactional process, as it either succeeds or fails. The SysAdmin should never have a situation where, due to a VMotion, there are two VMs with the same identity on different hosts. So if there was a network failure during a VMotion, the process would stop, and the SysAdmin would find the VM still running on the original host.
Once the VM is running on the destination host, the VMotion process triggers a RARP packet to update tables on the network. Although the MAC address of the VM never changes during a VMotion, packets destined for it would be directed to the source vSphere host rather than the destination unless the wider physical network is made aware of the change in the VM's physical location. The time it takes for RARP packets to proliferate, and for MAC address caches to reach their TTL, can result in one or two lost packets. In modern networks it's been observed that packets aren't lost but merely delayed. This "network jitter" effect is not significant enough to create data corruption, or even upset the workings of so-called real-time protocols such as Microsoft RDP, VMware View PCoIP or Citrix HDX. Indeed, since VMotion's inception we have been rapidly moving into a period where "Metro VMotion" is possible across previously inconceivable distances. As the years progress, the hardware and software requirements needed to move a VM from one city to another, for instance, will be easier to meet.
Requirements and Recommendations
- VMKernel Networking: There needs to be at least 1Gbps connectivity between the hosts, with a VMKernel port enabled for VMotion. VMotion will work with 10Gbps, and with bundled VMKernel Ports and NIC Teaming.
- Virtual Machine Networking: The portgroup labels on the source/destination vSphere hosts for which the VM is configured need to be consistent. In reality this is not a concern with Distributed Switches, where centralised management ensures consistent naming. It is more of a challenge for environments that utilize the Standard Switch, which is configured on a per-host basis. A portgroup named "Production" on the source host, where the destination used "production", would cause a problem as portgroup labels on Standard Switches are case sensitive.
- Shared Storage: In the past it was a requirement for the source/destination vSphere hosts to have access to the same datastore. This practice persists today, however it is now possible to move a VM from one host to another where shared storage is not available - such as moving a VM from a local datastore on one host to a local datastore on another.
- CPU Compatibilities: In recent years much work has been carried out to mitigate the requirement to have matching CPU attributes on the source/destination hosts. For instance, in VMware's DRS technology that utilizes VMotion, there is an Enhanced VMotion Compatibility (EVC) feature that allows for the masking (hiding) of CPU attributes to allow VMotion to take place. Sadly, however, no technology exists presently to allow VMotion to take place between Intel and AMD chipsets. So it is likely we will need to live with CPU compatibility issues for the foreseeable future. Two KB articles cover the CPU attributes from the main vendors, outlining the compatibility requirements:
Typically, these articles will focus on the fact that different CPU generations have different attributes, such as variances in the version of Streaming SIMD Extensions (SSE), hardware assist features such as Intel VT and AMD-V, and the presence of different data protection features such as Intel XD and AMD NX. Generally, the simplest solution to this challenge is to buy compatibility, by buying servers in blocks all of the same type. However, this can sometimes be tricky for SMBs who don't have the budget to take this approach. In most cases customers opt for buying at least 3 or 4 servers. This allows them to meet the requirements of other VMware technologies such as HA, DRS and VSAN. Home lab users must take care to acquire hardware that not only meets the requirements to install VMware ESX, but also has matching attributes - so care must be taken if buying second-hand hardware from sources such as eBay. One way out of this bind for home labs is to use Nested VMware ESX.
Identifying the CPU attributes, and whether they have been enabled in an existing environment, can be a challenge. For some SysAdmins merely looking at the details of the CPU from the Web Client is sufficient. These are located in Hosts & Clusters > Select Host > Manage Tab > Settings Column > Hardware and Processors:
Fortunately, there are tools available that will flag up the attributes that are pertinent to VMware. From VMware there are two utilities - Site Survey and CPUID.iso. Site Survey is no longer supported as of vSphere 5.1. CPUID.iso is a boot DVD that reports CPU information at the physical console of the vSphere host. CPUID.iso was last updated in 2009, and is available at the Shared Utilities page.
Below is a screen grab from an HP ML350 Gen8 server using the CPUID.iso image:
Although it's somewhat dated, the run-virtual.com website hosts a "VMware CPU Host Info 2.01" utility that was last updated in 2008. It still works with vSphere 5.5 vCenter, and does not require the physical host to be rebooted to examine the attributes.
Typically, VMotion requires a VMKernel Portgroup. If this is on a Standard Switch, care should be taken to ensure the portgroup label is consistent across all hosts, and the IP address assigned is unique. VMotion traffic can be significant and is not secured or encrypted. Best practice is to ensure it takes place within a unique VLAN, and with a dedicated 1Gbps or higher interface. Where customers are using a limited number of 10Gbps interfaces, it's not uncommon to either use IO-Virtualization at the blade level to separate and dedicate bandwidth to the process, or else use VMware's NIOC to impose limits and priorities on VMotion traffic to prevent it saturating links and interfering with other traffic types. In SMB and home lab configurations it's not uncommon for SysAdmins to enable the default Management Network for VMotion as a workaround for a limited number of physical VMnics.
VMotion Enabled on Standard Switch Portgroup:
VMotion Enabled on Distributed Switch Portgroup:
For further information on: Standard Switches and Portgroups
For further information on: Distributed Switches and Portgroups
A multi-NIC configuration for VMotion is supported, which allows for load-balancing across multiple VMnic interfaces. It does not require any special load-balancing configuration on the physical switch, such as LACP or IP Hash load balancing, to make this work, and there is no need to use Load Based Teaming (Route based on physical NIC load). This multi-NIC configuration has been shown to accelerate VMotion events triggered by maintenance mode, and to speed up the process of evacuating many VMs from the same host. The configuration is very similar to that used when enabling iSCSI Binding load-balancing. In this case a single Standard Switch is created, backed by two or more VMnics. Two or more portgroups are created, such as VMotion0 and VMotion1, and using the Teaming and Failover options on the properties of each portgroup the first VMnic is set to be Active and the second VMnic to be Standby. So for example:
The portgroup VMotion0 would use vmnic0 as "Active" and vmnic1 as "Not in Use"
The portgroup VMotion1 would use vmnic1 as "Active" and vmnic0 as "Not in Use"
Whatever your configuration, you can confirm that VMotion has been successfully enabled by inspecting the Summary tab of the vSphere host, and the Configuration panel.
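The same confirmation can be made from PowerCLI. The following is a minimal sketch, assuming a session has already been opened with Connect-VIServer; it simply lists every VMkernel adapter on every host and reports whether VMotion is enabled on it.

```powershell
# Assumes an existing Connect-VIServer session.
# List each VMkernel adapter per host and show whether VMotion is enabled on it.
Get-VMHost |
    Get-VMHostNetworkAdapter -VMKernel |
    Select-Object @{Name = 'Host'; Expression = { $_.VMHost.Name }},
                  Name, PortGroupName, IP, VMotionEnabled |
    Sort-Object Host, Name |
    Format-Table -AutoSize
```

A host that shows no adapter with VMotionEnabled set to True will fail the compatibility check in the Migration Wizard.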
Triggering a VMotion Event - Change Host
VMotion can be carried out by drag and drop - locating the VM on the source vSphere host, and dropping it on the destination host. This will open the Migration Wizard. Alternatively, the SysAdmin can right-click a VM and select the Migrate option. By default the wizard will select the "Change Host" radio button. If carrying out a manual move it's worth checking which vSphere host the VM currently resides on, as the Migration Wizard does not warn the SysAdmin of an attempt to move the VM to the same location.
1. Select Change Host
2. If VMware Clustering (HA/DRS) has been enabled select the cluster, and select the option to Allow host selection within this cluster
3. Next select the destination vSphere host you wish to move the VM to. The Web Client will trigger a Compatibility Check to ensure the configuration is good.
4. In VMotion Priority, it's possible to adjust how VMotion reacts to a lack of spare CPU cycles. In reality most people ignore this option as they have plenty of spare CPU cycles available.
The progress of the VMotion event can be monitored from the Recent Tasks pane. The time VMotion takes to complete is based on a number of factors - bandwidth, load-balancing, and the quantity of active memory to be transmitted across the wire to the destination host.
Triggering a VMotion Event - Change Host and Datastore
In VMotion's first iteration both the source and destination hosts needed access to the same shared storage. This limited the scope of VMotion to protocols like FC, iSCSI and NFS. It also meant possible limits around moving the VM from one vSphere Cluster to another - as in many cases the storage of one cluster isn't visible to another. Many customers worked around this problem by having at least one datastore accessible to both clusters - often referred to in popular parlance as the "Swing LUN". Later releases of vSphere have introduced the capacity to move VMs from one host to another when there isn't a common shared datastore - what is sometimes referred to as a "shared nothing" environment.
Essentially, the Change Host and Datastore option chains together two processes - the SVMotion of the VM's files to another host's storage, followed by the switching of ownership of the running VM from one host to another through the process of VMotion. This means it is possible to move a running VM on local storage on one host to a different host's local storage. Additionally, it allows VMs to be moved from one cluster to another where no common shared storage exists.
In this case a VM located on the local storage of one vSphere host (esx01nyc) is moved to another vSphere host's (esx03nyc) local storage:
1. Select Change Host and Datastore
2. Select a Destination Resource (aka VMware Cluster or Resource Pool)
3. Select a Host
4. Select a Datastore
5. Select a VMotion Priority
As ever with software, most problems stem from poor or inconsistent administration. Once the basic requirements have been met there are a number of situations that can trigger errors or warnings during the "compatibility" check, or cause VMotion to fail altogether. These occur due to settings on the physical vSphere host or the properties of the VM, and are summarized below. Errors must be resolved for the feature to work, but warnings can be bypassed in the interface.
Ideally, a manual VMotion should trigger no errors or warnings, to give the VM Operator a seamless experience. Privileges and Permissions can be used to disable options and features for these vCenter users to stop the problems occurring in the first place.
- Storage Errors
In a classic configuration VMotion requires that the VM files be accessible to both the source and destination vSphere hosts. If, for example, a VM is placed on a local VMFS volume which is accessible just to the source vSphere host, then an error message will appear. Most SysAdmins resolve this problem by using Storage VMotion or Cold Migration to relocate the files. To prevent VM Operators from creating VMs on local storage there are a couple of options. If the host boots via FC-SAN, iSCSI-SAN, USB/SD-CARD or PXE then there is no real requirement for a local VMFS volume. Alternatively, local storage can be grouped into datastore folders, and permissions assigned to stop operators from accessing it.
- Network Connectivity Errors
Typical network connection errors display themselves with VMotion reaching 14% in the progress bar, and then timing out completely. The causes can be manifold and include incorrect TCP/IP configuration of the VMKernel Port, invalid VLAN tagging configuration and/or incorrect VMnic configuration. In the screen grabs below, although the VMotion VMKernel port had the correct IP data, the VLAN tagging was incorrect.
VM Stuck at 14%
VMotion Error message indicates problem with networking
Same VMotion Error with extended detail
- Inconsistently Named Portgroups/Case-Sensitivity
When a VM is moved from one vSphere host to another its network portgroup configuration cannot be modified. This means the portgroup name, be it "Production" or "VLAN101", must be consistent. Remember that with Standard Switches the portgroup labels are case-sensitive. In some cases the error message can simply be that a portgroup on one host is not present on another.
In this case a VM is being moved from esx03nyc to esx01nyc. Sadly the portgroup is labelled VLAN101 on one host and vlan101 on the destination host.
- CD-ROM/Floppy Connected and Configured to a Local Resource
If the CD-ROM of the VM is connected and configured to the physical host's CD-ROM, or alternatively connected and configured to an .ISO image which is not available to both the source and destination vSphere hosts, this will result in an error:
Local CD-ROM Device:
CD-ROM .ISO image on non-shared storage:
- CPU Affinity
It is possible to peg a VM to specific CPU socket(s) or core(s) using Scheduling Options on the virtual CPU. However, this option does generate a VMotion error. If DRS is in use in fully-automated mode, the option to modify CPU affinities is suppressed in the UI. For this reason many SysAdmins choose to ensure that CPU affinity is not configured prior to enabling DRS.
- RDMs and LUN Visibility
Visibility of storage to all the nodes in a VMware Cluster is a commonly recognised requirement. This is especially true with the use of Raw Device Mappings (RDM). Whilst it's relatively easy to confirm that ALL hosts in a VMware Cluster have access to the same VMFS/NFS datastores, it is trickier to check that the LUNs/Volumes that are mapped to a single VM are also accessible to every host that VM could be scheduled to run on.
- Inconsistent Security Settings on Switch/Portgroup
If the Security Settings on the Switch or Portgroup are inconsistent from one host to another, this will produce a VMotion error. For instance, if one host has Accept/Reject/Reject and the other host has Accept/Accept/Accept.
- No Heartbeat
It is possible to receive warnings about missing "heartbeats" from the VM. This can happen for two main reasons. Firstly, VMware Tools may not have been installed, or the VMware Tools service itself is not running. Secondly, it may be that the VM was recently moved by VMotion, and another VMotion attempt is being made very shortly afterwards. In this case the VM hasn't been situated on the vSphere host long enough for sufficient heartbeat signals to be received. Installing VMware Tools is strongly recommended, although there may be cases where installation isn't possible due to the use of an unsupported distribution of Linux. Generally, heartbeat warnings are benign, but it is worth researching why the message has appeared if the VM hasn't been recently moved by another process.
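Two of the conditions listed above, portgroup labels that differ only by case and CD-ROM devices backed by a local resource, can be spotted before a VMotion is attempted. The following is a rough PowerCLI sketch rather than a complete pre-flight check; it assumes an existing Connect-VIServer session and only examines Standard Switch portgroups.

```powershell
# Assumes an existing Connect-VIServer session.

# 1. Standard Switch portgroup labels that differ only by case (e.g. VLAN101 vs vlan101).
#    Group-Object is case-insensitive by default, so both spellings land in one group;
#    the case-sensitive unique sort then reveals whether more than one spelling exists.
Get-VMHost | Get-VirtualPortGroup -Standard |
    Group-Object -Property Name |
    ForEach-Object {
        $spellings = @($_.Group | ForEach-Object { $_.Name } |
            Sort-Object -Unique -CaseSensitive)
        if ($spellings.Count -gt 1) {
            "Inconsistent portgroup label: " + ($spellings -join ', ')
        }
    }

# 2. VMs whose CD-ROM is backed by a physical host device or an .ISO image.
#    An .ISO on shared storage is fine; one on a local datastore will block VMotion.
Get-VM | Get-CDDrive |
    Where-Object { $_.HostDevice -or $_.IsoPath } |
    Select-Object Parent, HostDevice, IsoPath
```

The second query only reports where each CD-ROM points; deciding whether a given .ISO path sits on shared or local storage is left to the SysAdmin.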
Managing Storage VMotion
Storage VMotion (SVMotion) Explained
Storage VMotion (SVMotion) was initially introduced as a method of relocating VMs from an older VMFS file system to a new VMFS file system. Since then it has evolved into a technology that can serve many functions. In a manual configuration it can be used to relocate the files of the VM from one datastore to another. However, when enabled with Storage DRS and Datastore Clusters, SVMotion provides the engine for moving VMs around to improve overall disk performance, as well as assisting with placing each VM on the type of storage that suits its IOPS requirements. In the early days, SVMotion required that VMotion was first configured. Since then SVMotion has become a core feature of the vSphere platform, and therefore no such dependency exists.
SVMotion events are significantly more intrusive to the vSphere environment than the more common VMotion events. This makes perfect sense because by definition SVMotion means the copying of significant amounts of data. SVMotion can be carried out without moving the VM from the host, so it can be seen as the polar opposite of VMotion. With VMotion the VM's files stay still, but the VM is moved to a different host; with SVMotion the VM stays still, but the files are moved to a different datastore. Due to the increase in load during the period of an SVMotion, it could theoretically degrade the performance of the VM, and if simultaneous SVMotions are carried out then the overall performance of the vSphere host could be degraded. For this reason many SysAdmins opt to carry out SVMotion at times when the load on the VM and host is at its lowest. The VM can be left powered on, but the overall impact is reduced.
There are many different reasons to want to use SVMotion and these include:
- Decommissioning an old storage array whose maintenance warranty is about to expire
- Switching from one storage protocol (NFS, iSCSI, FC, VSAN) to another
- Relocating VMs from a LUN/Volume that is running out of capacity or IOPS (or both)
- Converting RDM disks to virtual disks
Requirements and Recommendations
It goes without saying that to carry out a manual SVMotion the vSphere host requires access to both the source and destination datastores, and the destination datastore must have sufficient capacity to hold the relocated VM. Remember this includes not just the .VMDK virtual disks, but other ancillary files such as the VMkernel swap file and any snapshot files.
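The capacity requirement can be checked ahead of time from PowerCLI. The following is a minimal sketch, assuming an existing Connect-VIServer session; the VM name corphqdb01 and the datastore name gold-nyc are borrowed from the PowerCLI examples later in this topic and stand in for your own names.

```powershell
# Assumes an existing Connect-VIServer session.
# corphqdb01 and gold-nyc are example names; substitute your own.
$vm = Get-VM -Name corphqdb01
$ds = Get-Datastore -Name gold-nyc

# ProvisionedSpaceGB is the worst case (thin disks fully grown);
# UsedSpaceGB is what the VM consumes on disk today.
if ($ds.FreeSpaceGB -gt $vm.ProvisionedSpaceGB) {
    "{0} has room for {1}" -f $ds.Name, $vm.Name
} else {
    "{0} has {1:N1} GB free but {2} may need up to {3:N1} GB" -f `
        $ds.Name, $ds.FreeSpaceGB, $vm.Name, $vm.ProvisionedSpaceGB
}
```

Comparing against the provisioned figure is deliberately conservative; a thin-provisioned VM may fit in less space, but it can grow after the move.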
The number of simultaneous SVMotions you can carry out will be dependent on the number of other events happening at the same time. You can think of each event - VMotion, SVMotion, VMotion without shared storage and other provisioning events - as having a cost. These cost calculations, or scalability maximums, change with each new iteration of vSphere, as well as being dependent on the events happening in your environment at the time. The current limits are available here
Triggering Storage VMotion (SVMotion)
It's rare to see a VM with a single virtual disk that contains the OS, applications, data and log files. It's more common to provision a VM with multiple virtual disks, with each different type of disk on a different tier of storage - balancing capacity and performance against the demands of the different types of activity. This does assume that the environment isn't a flat storage layer without tiers - for example, currently in a VSAN 1.0 environment, the VMware Cluster presents just one layer of storage accessible to all the hosts.
In this example we have a VM with 3 virtual disks, where disks 1, 2 and 3 have all been placed on the highest tier of storage, called Platinum. This storage is iSCSI enabled and is SSD based. It's been determined that although the application and data located on disks 1 and 2 are on the correct tier of storage, disk 3, which contains the log files, could be relocated down to the Gold, Silver or Bronze tiers. This is not an uncommon scenario in organizations that are new to virtualization, and have yet to go through a process of automation where these settings would be correct from day one. Adding a new virtual disk to a VM defaults to creating it in the same location as the VM's .vmx file, and although this can be changed, VM Operators often forget this fact.
1. SVMotion can be triggered by right-clicking the VM, and selecting Migrate
2. Next select the Change datastore option
3. In the "Basic" view ALL the files of the VM would be relocated to the selected datastore. Notice that SVMotion has the capacity to convert the virtual disk format as it is being moved from one datastore to another. To relocate individual virtual disks we need the Advanced view.
4. The Advanced view allows the SysAdmin to select the virtual disk and browse to the required destination datastore
5. Once selected, the administrator can change the virtual disk format - remember that if the datastore is NFS based, then all virtual disks are held in the thin provisioning format only.
6. Clicking Next and Finish will trigger the SVMotion process.
At the end of the process you should be able to see the virtual disk has been relocated to the new storage.
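The Advanced view steps above have a rough PowerCLI equivalent in the Move-HardDisk cmdlet, available in PowerCLI releases that include it. The sketch below is illustrative only: it assumes an existing Connect-VIServer session, that the VM is called corphqdb01, that the log disk carries the default label "Hard disk 3", and that the lower tier datastore is called gold-nyc. None of those names come from the walkthrough itself.

```powershell
# Assumes an existing Connect-VIServer session.
# VM name, disk label and datastore name are illustrative assumptions.
$vm   = Get-VM -Name corphqdb01
$disk = Get-HardDisk -VM $vm | Where-Object { $_.Name -eq 'Hard disk 3' }

# Relocate only the third virtual disk; the other disks and the .vmx stay where they are.
Move-HardDisk -HardDisk $disk -Datastore (Get-Datastore -Name gold-nyc) -Confirm:$false

# Confirm where each virtual disk now lives.
Get-HardDisk -VM $vm | Select-Object Name, Filename, StorageFormat, CapacityGB
```

The Filename column begins with the datastore name in square brackets, which makes it easy to confirm that only the intended disk has moved.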
Triggering Cold Migration
For everything else there's cold migration. There may be situations where neither VMotion nor Storage VMotion is possible. A number of reasons can be cited, such as incompatibilities when moving from one processor architecture to another. Cold Migration is not without its own requirements. For instance, it isn't possible to simply move a VM from one vSphere host to another without the VM first being on shared storage. Either the files of the VM must first be cold migrated to shared storage, or else the option to Change Host and Datastore must be selected.
In the screen grab below the option "Change Host" was used on a VM located on local storage - as you can see it cannot simply be moved from esx03_nyc to esx01_nyc because the destination host does not have access to the storage
The time it takes to complete a cold migration can be as little as a matter of seconds. If the files of the VM are on shared storage it's a very quick procedure. What takes time is claiming an appropriate maintenance window to gracefully bring down the application, and then the time to validate that it is working again. For applications that are network load-balanced, stateless or backed by some 3rd party clustering service running inside the guest operating system, it can be a process that is seamless to the user, if correctly managed. If however the files of the VM need to be relocated to a new datastore, the time for the datastore migration can vary based on many competing variables such as speed of storage, speed of storage fabric (Ethernet/FC), and whether the move is between LUNs/Volumes within the same VAAI capable storage array or between two different array vendors.
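A cold migration can also be scripted in PowerCLI. The following is a sketch under stated assumptions rather than a production procedure: it assumes an existing Connect-VIServer session, that VMware Tools is running in the guest so a clean shutdown can be requested, and the VM, host and datastore names are illustrative. Shutdown-VMGuest is the cmdlet name in PowerCLI releases of the vSphere 5.5 era; later releases name it Stop-VMGuest.

```powershell
# Assumes an existing Connect-VIServer session; all names are illustrative.
$vmName = 'corphqdb01'

# Ask the guest OS to shut down cleanly (requires VMware Tools in the guest).
Shutdown-VMGuest -VM (Get-VM -Name $vmName) -Confirm:$false | Out-Null

# Wait for the power-off, giving up after 10 minutes rather than looping forever.
$deadline = (Get-Date).AddMinutes(10)
while ((Get-VM -Name $vmName).PowerState -ne 'PoweredOff') {
    if ((Get-Date) -gt $deadline) { throw "$vmName did not power off in time" }
    Start-Sleep -Seconds 5
}

# Relocate the powered-off VM to a new host and datastore, then power it back on.
Move-VM -VM (Get-VM -Name $vmName) `
        -Destination (Get-VMHost -Name esx01nyc.corp.com) `
        -Datastore (Get-Datastore -Name gold-nyc) | Out-Null
Start-VM -VM (Get-VM -Name $vmName) | Out-Null
```

The script covers only the mechanics; validating that the application is healthy after power-on still belongs in the maintenance window.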
vMotion, Storage vMotion, Cold Migration (PowerCLI)
Moving a VM with vMotion
The general purpose Move-VM cmdlet can be used to move a VM physically around the vSphere infrastructure, as well as relocating the VM object within the vCenter inventory. So Move-VM could be used to move VMs from one VM Folder to another, or from one physical vSphere host to another. Remember, if you are using VMware Distributed Resource Scheduler (DRS) and it is set to "Fully Automated", your manual moves may well be fruitless as the system moves VMs around to improve overall performance. To empty (evacuate) a vSphere host of all its VMs, it is perhaps more efficient to use "maintenance mode" instead.
The -RunAsync option can be used to trigger the command, and then release the prompt to allow you to carry on working whilst the PowerCLI job completes. Without it the prompt is locked until the entire process completes. This can take some time, especially with large Storage vMotions.
Moving a Single VM:
Move-VM corphqdb01 -Destination esx01nyc.corp.com -RunAsync
Moving Multiple VMs:
Multiple simultaneous moves can be achieved with the use of wildcards and a good naming convention. Alternatively, you could use attributes such as folder location or the notes field to move all the VMs belonging to a particular team, BU or individual.
Move-VM corphqdb* -Destination esx01nyc.corp.com -RunAsync
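The folder and notes field alternatives mentioned above might look like the following sketch. It assumes an existing Connect-VIServer session; the folder name "Finance" and the use of the same word in the notes field are hypothetical and only illustrate the filtering.

```powershell
# Assumes an existing Connect-VIServer session.
# "Finance" is a hypothetical VM folder name and notes keyword.
$target = Get-VMHost -Name esx01nyc.corp.com

# Move every VM held in a given VM folder.
Get-Folder -Name 'Finance' -Type VM | Get-VM |
    Move-VM -Destination $target -RunAsync

# Move every VM whose notes field mentions a given team.
Get-VM | Where-Object { $_.Notes -like '*Finance*' } |
    Move-VM -Destination $target -RunAsync
```

Running the Get-VM half of either pipeline on its own first is a cheap way to confirm the selection before any VM is moved.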
Moving a VM with Storage vMotion
Again, the multi-functional Move-VM can be used to relocate a VM to a different datastore by simply changing the destination type.
Simple Storage vMotion:
In this simple Storage vMotion example all the corphqdb* VMs are relocated from the platinum-nyc datastore to the gold-nyc datastore tier.
Move-VM corphqdb* -datastore gold-nyc -RunAsync
Evacuating VMs from a host with maintenance mode
If you do have VMware DRS enabled, one method of moving many VMs from one host to another (for instance to reboot it prior to firmware updates) is by using maintenance mode. The PowerCLI to achieve this is:
Set-VMHost -VMHost esx01nyc.corp.com -State "Maintenance"