Welcome to SRM

From vmWIKI

Currently, SRM is a DR automation tool. It automates the testing and invocation of disaster recovery (DR), or as it is now called in the preferred parlance of the day, “business continuity” (BC), of virtual machines. Actually, it’s more complicated than that. For many, DR is a procedural event. A disaster occurs and steps are required to get the business functional and up and running again. On the other hand, BC is more a strategic event, which is concerned with the long-term prospects of the business post-disaster, and it should include a plan for how the business might one day return to the primary site or carry on in another location entirely. Someone could write an entire book on this topic; indeed, books have been written along these lines, so I do not intend to ramble on about recovery time objectives (RTOs), recovery point objectives (RPOs), and maximum tolerable downtimes (MTDs)—that’s not really the subject of this book. In a nutshell, VMware SRM isn’t a “silver bullet” for DR or BC, but a tool that facilitates those decision processes planned way before the disaster occurred. After all, your environment may only be 20% or 30% virtualized, and there will be important physical servers to consider as well.

This book is about how to get up and running with VMware’s SRM. I started this section with the word currently. Whenever I do that, I’m giving you a hint that either technology will change or I believe it will. Personally, I think VMware’s long-term strategy will be to lose the “R” in SRM and for the product to evolve into a Site Management utility. This will enable people to move VMs from the internal/private cloud to an external/public cloud. It might also assist in datacenter moves from one geographical location to another, for example, because a lease on the datacenter might expire, and either it can’t be renewed or it is too expensive to renew.

With VMware SRM, if you lose your primary or Protected Site, the goal is to be able to go to the secondary or Recovery Site, click a button, and find your VMs being powered on at the Recovery Site. To achieve this, your third-party storage vendor must provide an engine for replicating your VMs from the Protected Site to the Recovery Site. Your storage vendor will also provide a Storage Replication Adapter (SRA), which is installed on your SRM server.

As replication or snapshots are an absolute requirement for SRM to work, I felt it was a good idea to begin by covering a couple of different storage arrays from the SRM perspective. This will give you a basic run-through on how to get the storage replication or snapshot piece working, especially if you are like me and you would not classify yourself as a storage expert. This book does not constitute a replacement for good training and education in these technologies, ideally coming directly from the storage array vendor. If you are already confident with your particular vendor’s storage array replication or snapshot features, you could decide to skip ahead to the chapter called Installing VMware SRM. Alternatively, if you work for an SMB/SME or you are working in your own home lab, you may not have the luxury of access to array-based replication. If this is the case, I would heartily recommend that you skip ahead to the chapter called Configuring vSphere Replication (Optional).

In terms of the initial setup, I will deliberately keep it simple, starting with a single LUN/volume replicated to another array. However, later on I will change the configuration so that I have multiple LUNs/volumes with virtual machines that have virtual disks on those LUNs. Clearly, managing replication frequency will be important. If we have multiple VMDK files on multiple LUNs/volumes, the parts of the VM could easily become un-synchronized or even missed altogether in the replication strategy, thus creating half-baked, half-complete VMs at the DR location. Additionally, at a VMware ESX host level, if you use VMFS extents but fail to include all the LUNs/volumes that make up those extents, the extent will be broken at the recovery location and the files making up the VM will be corrupted. So, how you use LUNs and where you store your VMs can be more complicated than this simple example will first allow. This doesn’t even take into account the fact that different virtual disks that make up a VM can be located on different LUNs/volumes with radically divergent I/O capabilities. Our focus is on VMware SRM, not storage. However, with this said, a well-thought-out storage and replication structure is fundamental to an implementation of SRM.
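
To make the multiple-LUN risk concrete, here is a minimal Python sketch of the kind of sanity check you might run against your own inventory before trusting a replication plan. It is not part of SRM or of any vendor tool. The class names, function names, and the LUN, datastore, and VMDK names are all hypothetical, and it assumes you have already gathered by hand which LUN backs each virtual disk and which LUNs make up each VMFS datastore.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class VirtualDisk:
    name: str  # hypothetical VMDK file name
    lun: str   # LUN/volume that holds this virtual disk


@dataclass
class VirtualMachine:
    name: str
    disks: List[VirtualDisk] = field(default_factory=list)


def unprotected_disks(vm: VirtualMachine, replicated_luns: Set[str]) -> List[VirtualDisk]:
    """Return the disks of `vm` that sit on a LUN which is not replicated."""
    return [disk for disk in vm.disks if disk.lun not in replicated_luns]


def broken_extents(datastore_extents: Dict[str, Set[str]],
                   replicated_luns: Set[str]) -> Dict[str, Set[str]]:
    """Map each partially replicated datastore to the extent LUNs it is missing.

    A datastore with no replicated extents is simply unprotected and is not
    reported here; one with some but not all extents replicated would be
    broken at the recovery location.
    """
    broken = {}
    for datastore, luns in datastore_extents.items():
        missing = luns - replicated_luns
        if missing and missing != luns:
            broken[datastore] = missing
    return broken


# Hypothetical inventory: one VM with disks on two LUNs, only one replicated.
replicated = {"lun1"}
vm = VirtualMachine("db01", [VirtualDisk("db01.vmdk", "lun1"),
                             VirtualDisk("db01_1.vmdk", "lun2")])
for disk in unprotected_disks(vm, replicated):
    print(f"{vm.name}: {disk.name} on {disk.lun} is not replicated")

# Hypothetical VMFS layout: ds1 spans two extents, ds2 is a single LUN.
extents = {"ds1": {"lun1", "lun3"}, "ds2": {"lun4"}}
for datastore, missing in broken_extents(extents, replicated).items():
    print(f"{datastore}: missing extents {sorted(missing)}")
```

The check deliberately says nothing about replication frequency or I/O capability. It only flags the two structural mistakes described above: a VM with a disk on an unreplicated LUN, and a VMFS datastore with only some of its extents replicated.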