Bidirectional Relationships and Shared Site Configurations

From vmWIKI
Originating Author

Michelle Laverick

Bidirectional Relationships

Version: vCenter SRM 5.0

So far this book has focused on a situation where the Recovery Site is dedicated to the purpose of recovery. This Recovery Site could be hired rack space provisioned by a third-party company; this is very popular in smaller organizations that perhaps only have one datacenter or have a datacenter that is small and does not have the resources to be both a Protected Site and a Recovery Site at the same time. As with conventional redundancy, this “dedicated” Recovery Site model is not especially efficient, as you are spending valuable financial resources to protect yourself from an event that might never happen. Like home insurance and car insurance, it looks like a waste of money until, that is, someone breaks into your house, or steals your car, drives around like a lunatic, and sets it on fire for kicks.

Due to licensing and other associated costs, it is much more efficient for two or more datacenters to be paired together to offer DR resources to each other. The official VMware SRM documentation refers to such a configuration as a bidirectional configuration. I’ve left this type of configuration for the end of the book, not because I thought most people wouldn’t be interested, but for three main reasons. First, I wanted to make it 100% crystal-clear which tasks are carried out at the Protected Site (site pairing, the array manager configuration, and inventory mappings) and which are carried out at the Recovery Site (Recovery Plans). Second, permissions are simpler to explain and test in a conventional configuration with a Protected Site and a dedicated Recovery Site. And third, I hope that at this stage you have a very good understanding of how SRM works, so it shouldn’t be too difficult to add a bidirectional configuration to an existing unidirectional one.

I want to say something about the process of setting up bidirectional configurations in general. Simply put, it is much easier than in any previous release of SRM, although that is hard to appreciate if you haven’t used an earlier release. Configuring this kind of active/active relationship used to require several steps, especially around the array manager configuration. I think the new UI has a lot to do with this: now that the SRM sites are more visible to one another, rather than being locked to their specific vCenter instances, the whole operation is much more seamless. At this stage, I did make some major storage changes. Previously, the Recovery Site (New Jersey) only had access to volumes replicated from the Protected Site (New York). For a bidirectional configuration to work, you clearly need replication in the opposite direction as well. Once that is in place, the clear distinction between the Protected and Recovery Sites breaks down, as each site is both a Recovery Site and a Protected Site for the other. If it helps, I’m changing from an active/passive DR model to an active/active DR model, in which both sites run a production load while using their spare capacity to offer DR resources to each other.

Configuring Inventory Mappings

I created a new resource pool and folder structure at both the New Jersey and New York sites, so that inventory mappings could be created from New Jersey to New York for the first time. Personally, I feel that for simplicity the sites should be an almost complete “mirror” of each other; in other words, the resource pools and folder structures should be identical wherever possible. In reality, this mirroring may not be practical or realistic, as no two sites are ever identical in terms of their infrastructure or operational capabilities. To reflect this, and to make this section more interesting, I decided that the New Jersey location would take a totally different approach to allocating resources. The New Jersey location uses LUNs/volumes to reflect different line-of-business applications such as Finance, Help Desk, Sales, and Collaboration, and it also uses vSphere vApps in its configuration (see Figure 13.1).

SRM now supports the mapping of vApps at the Protected Site to vApps in the Recovery Site. So I created a resource pool called DR_NJ and mapped all the vApps in New Jersey to vApps created manually in New York (see Figure 13.2). The important fact to note here is that there is no fancy option; just right-click a vApp and click the Protect button. vApps are themselves just collections of VMs. What’s new here is SRM’s ability to map a vApp in the Protected Site to a vApp in the Recovery Site.

Figure 13.1 The New Jersey site with a collection of applications in the format of vApps

IMPORTANT NOTE: SRM does not link vApps together, so the settings in the vApps in the Protected Site are not duplicated over at the Recovery Site. SRM does not use the start-up orders present in the vApp, but by default leverages the priority orders and VM dependencies created in the Recovery Plan.

Of course, this process didn’t stop with the resource mappings; I also needed to create mappings for folders and networks, and I confirmed that there was a placeholder datastore. I mapped the single folder that contained New Jersey’s vApps to a single folder in the New York location called NJ_DR_vApps (see Figure 13.3).

I also mapped the port groups representing the VLANs at New Jersey to the port groups in New York (see Figure 13.4).

Figure 13.2 vApps in New Jersey (vcnj.corp.com) mapped to vApps in New York. The recovery vApps are labeled NJ_ApplicationName.

Figure 13.3 vApps exist as a container in the VMs and Templates view, so there is no need to create a folder for each vApp.

Figure 13.4 In this case, vlan51 is mapped to vlan11 and vlan52 is mapped to vlan12, and so on.
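To make the direction of these mappings concrete, here is a minimal Python sketch that models the New Jersey to New York inventory mappings as three lookup tables (resources, folders, networks). NJ_FinanceApp, NJ_DR_vApps, and the VLAN pairs come from the text; the other vApp names, the New Jersey folder name NJ_vApps, and the resolve_placement helper are hypothetical, and SRM does not expose its mappings this way.

```python
from dataclasses import dataclass

# Inventory mappings from New Jersey (protected) to New York (recovery).
# NJ_FinanceApp, NJ_DR_vApps, and the VLAN pairs come from the text;
# every other name here is hypothetical.
RESOURCE_MAP = {
    "FinanceApp": "NJ_FinanceApp",
    "HelpDeskApp": "NJ_HelpDeskApp",
    "SalesApp": "NJ_SalesApp",
    "CollaborationApp": "NJ_CollaborationApp",
}
FOLDER_MAP = {"NJ_vApps": "NJ_DR_vApps"}  # one folder mapped to one folder
NETWORK_MAP = {"vlan51": "vlan11", "vlan52": "vlan12"}


@dataclass(frozen=True)
class ProtectedVM:
    name: str
    vapp: str        # vApp (resource) the VM lives in at New Jersey
    folder: str      # VMs and Templates folder at New Jersey
    port_group: str  # port group the VM's NIC connects to


def resolve_placement(vm: ProtectedVM) -> dict:
    """Return the New York objects the VM's placeholder would use.

    A missing mapping raises KeyError, roughly analogous to SRM flagging
    a VM whose inventory objects have no mapping.
    """
    return {
        "vapp": RESOURCE_MAP[vm.vapp],
        "folder": FOLDER_MAP[vm.folder],
        "port_group": NETWORK_MAP[vm.port_group],
    }


if __name__ == "__main__":
    db01 = ProtectedVM("db01", "FinanceApp", "NJ_vApps", "vlan51")
    print(resolve_placement(db01))
```

Keeping each mapping type in its own table mirrors the separate resource, folder, and network mapping tabs, so a gap in any one of them is easy to spot.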

Refreshing the Array Manager

Unlike the first time we set up SRM, the two locations are already paired, so there is no need to pair them again. We must refresh the array manager so that the SRM and SRA at New Jersey are aware of which volumes are available and which are replicated. I updated my datastore folder structures to take into account the fact that I was building out a bidirectional configuration (see Figure 13.5). This was largely a cosmetic decision, but I thought it would make the layout clearer for me, and hopefully for you. For me, this aptly shows how flexible and dynamic a vSphere 5.0 and SRM 5.0 environment can be. It takes a matter of minutes to totally restructure your environment based on your business needs and organizational changes.

To refresh the array manager, follow these steps.

1. In SRM, select the Array Manager pane.

2. Select the Devices tab.

3. Locate the Refresh button (see Figure 13.6) on the far right, and click it.

Once the refresh has completed, you should see that the device list is updated and includes arrows pointing in both directions (see Figure 13.7). In my case, these represent replication from New Jersey to New York and from New York to New Jersey. I had the New Jersey site selected as “local”; if I’d logged in to the New York vCenter first, the arrows would point in the opposite directions. This view is useful before, during, and after the failover and failback process to validate your configuration.

You can see that my new datastores at the New Jersey site have yet to be allocated to Protection Groups.
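The arrows in the Devices tab are relative to whichever site you are logged in to as local. The small Python model below, with hypothetical datastore names and a hypothetical arrow_for helper, illustrates why the same device list reads in the opposite direction when viewed from New York; it uses text arrows in place of the UI icons.

```python
from typing import NamedTuple


class ReplicatedDevice(NamedTuple):
    datastore: str    # hypothetical datastore name
    source_site: str  # site whose array replicates the volume
    target_site: str  # site that receives the replica


def arrow_for(device: ReplicatedDevice, local_site: str) -> str:
    """Return "->" for outgoing replication and "<-" for incoming,
    relative to the site you logged in to as local."""
    if device.source_site == local_site:
        return "->"
    if device.target_site == local_site:
        return "<-"
    raise ValueError(f"{local_site} takes no part in replicating {device.datastore}")


DEVICES = [
    ReplicatedDevice("nj_finance", "New Jersey", "New York"),
    ReplicatedDevice("nyc_vms", "New York", "New Jersey"),
]

if __name__ == "__main__":
    for local in ("New Jersey", "New York"):
        print(local, [(d.datastore, arrow_for(d, local)) for d in DEVICES])
```

The replication itself never changes between the two views; only the reference point does, which is why checking the local site first avoids confusion during failover and failback.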

Figure 13.5 VMware vSphere folders are an effective way to organize datastores. Remember, these folders can have permissions set on them.

Figure 13.6 If new datastores are created it’s good to refresh the array manager to confirm that SRM is aware of the new storage.

Figure 13.7 This view, taken from New Jersey, shows that the “application” datastores are being replicated from New Jersey to New York.

Creating the Protection Group

Again, creating a Protection Group does not differ substantially in a bidirectional configuration. But you might now begin to see the value of Protection Group folders. When Protection Groups are created they are visible to both the Protected and Recovery Sites. This is because they are needed for failover as well as failback. By default, they are presented in a flat window, so unless you have a good naming convention and you know your environment well, determining which Protection Groups belong where could become tricky (see Figure 13.8). Simply using Protection Group folders remedies this potential confusion quickly and easily. In my case, there is almost a one-to-one relationship between datastores and Protection Groups, except in the case of the Database Protection Group that contains two datastores.
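SRM presents Protection Groups in a flat list and leaves the grouping to you. The following Python sketch, with hypothetical group and datastore names, shows the idea behind per-site folders: mostly one datastore per group, plus a Database group spanning two datastores, as described above. It also flags any datastore claimed by more than one group; that check reflects my understanding that a replicated datastore belongs to only one Protection Group, so treat it as an assumption.

```python
from collections import defaultdict

# Hypothetical Protection Groups: (name, site that owns the VMs, datastores).
# Only the shape follows the text: mostly one datastore per group, with
# a Database group spanning two datastores.
PROTECTION_GROUPS = [
    ("NYC_VMs", "New York", ["nyc_vms"]),
    ("NJ_Finance", "New Jersey", ["nj_finance"]),
    ("NJ_HelpDesk", "New Jersey", ["nj_helpdesk"]),
    ("NJ_Database", "New Jersey", ["nj_db_data", "nj_db_logs"]),
]


def folders_by_site(groups):
    """Sort the flat list of Protection Groups into one folder per site."""
    folders = defaultdict(list)
    for name, site, _ in groups:
        folders[f"{site} Protection Groups"].append(name)
    return {folder: sorted(names) for folder, names in folders.items()}


def shared_datastores(groups):
    """Return datastores claimed by more than one group (assumed invalid)."""
    owners = defaultdict(list)
    for name, _, datastores in groups:
        for ds in datastores:
            owners[ds].append(name)
    return {ds: names for ds, names in owners.items() if len(names) > 1}


if __name__ == "__main__":
    print(folders_by_site(PROTECTION_GROUPS))
    print(shared_datastores(PROTECTION_GROUPS))
```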

Figure 13.8 Protection Group folders help you quickly and easily separate the Protection Groups that belong to each site.

Figure 13.9 Recovery Plans are placed into Recovery Plan folders to make it easy to see that they are associated with particular sites.

Creating the Recovery Plan

Again, Recovery Plans do not differ substantially in a bidirectional configuration, and you will see that using Recovery Plan folders makes more sense once you have two locations offering DR resources to each other (see Figure 13.9).

Using vApps to Control Start-up Orders

As you might know, vApps do have their own start-up orders that are configured on the properties of the vApp itself (see Figure 13.10). It is possible to “turn off” SRM’s method of powering on VMs and expressing VM dependencies. I don’t recommend this approach as it largely defeats the point of using SRM as a tool for automating DR, and it has not been tested by VMware as a supported configuration. Nonetheless, I thought you might be interested in how it is done, and what the advantages and disadvantages are. You can turn off SRM’s power-on function using the Virtual Machines tab on the Recovery Plan; select all the VMs, and change their Startup Action to be Do Not Power On (see Figure 13.11). A legitimate use of this feature is to recover VMs and then manually decide if they are needed in the DR event. This could be used as a way to preserve key resources such as memory.

Figure 13.10 Using the up and down arrows, it is possible to set the start-up orders on VMs.

Figure 13.11 Through the standard multiple-select options present in Windows it is possible to select groups of VMs to not be powered on.

Then you need to add a custom script that uses PowerCLI to power on the vApp, which forces the VMs to start in the order defined in the vApp itself. This script leverages the redirect.cmd batch file I created in Chapter 11, Custom Recovery Plans, together with PowerCLI’s pass-through authentication, which works once the SRM service account has been changed to one that has rights in the vCenter environment.

c:\windows\system32\cmd.exe /C c:\redirect.cmd c:\start-finance.ps1

The start-finance.ps1 file contains a PowerCLI command that uses the Start-VApp cmdlet together with the name of the vApp in the Recovery Site. The Start-Sleep cmdlet gives the VM storage recovery process time to complete, so that the VMs contained in the vApp are in a fit state to be powered on.

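# Wait 300 seconds so SRM can finish recovering the VM storage
Start-Sleep -Seconds 300

# Power on the vApp; vSphere then applies the vApp's own start-up order
Start-VApp -VApp NJ_FinanceApp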

Finally, all you need to do is add a command step to the Recovery Plan that uses redirect.cmd to call the PowerCLI script. That command step is step 5 in Figure 13.12.

For me, the biggest downside of this approach becomes evident when a VM is referenced in more than one Recovery Plan. Disabling a VM’s power-on function in one Recovery Plan disables it in every Recovery Plan that references that VM (see Figure 13.13).

Figure 13.12 A 300-second wait is added to the script to ensure that the VM storage is correctly configured before initiating a power on.

Figure 13.13 Removing a VM’s power-on function in the Virtual Machines tab disables it from being started in SRM in all Recovery Plans.

In short, this approach means abandoning SRM’s built-in power-on and VM dependency functionality and handling the power-on of all affected VMs yourself. I hope that by seeing what’s involved in this process you realize this approach is not viable. Sometimes you can learn a lot by knowing the wrong way to do something, as well as the right way. Personally, I think the power-on process is more easily managed using the priority orders in the Recovery Plan. If my finance application has three roles (a back-end database, an application server, and a Web front end), nothing would stop me from putting db01, db02, and db03 in priority order 1, app01, app02, and app03 in priority order 2, and the Web front-end servers in priority order 3.
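The priority-order alternative can be illustrated with a short Python sketch. It assumes SRM 5.0’s five priority groups (1 runs first, 5 runs last) and models only the grouping: VMs sharing a priority form one batch, and batches run in priority order. Per-VM dependencies are not modeled, and the Web server names and the power_on_batches helper are hypothetical.

```python
from typing import Dict, List

# SRM 5.0 Recovery Plans offer five priority groups, 1 (first) to 5 (last).
PRIORITY_LEVELS = range(1, 6)


def power_on_batches(assignments: Dict[str, int]) -> List[List[str]]:
    """Group VMs by priority, returning the batches in power-on order.

    In this model, VMs in the same batch may start in parallel and a batch
    starts only after the previous one. Per-VM dependencies are ignored.
    """
    for vm, level in assignments.items():
        if level not in PRIORITY_LEVELS:
            raise ValueError(f"{vm}: priority {level} is outside 1-5")
    batches = []
    for level in PRIORITY_LEVELS:
        batch = sorted(vm for vm, p in assignments.items() if p == level)
        if batch:
            batches.append(batch)
    return batches


finance_app = {
    "db01": 1, "db02": 1, "db03": 1,
    "app01": 2, "app02": 2, "app03": 2,
    "web01": 3, "web02": 3, "web03": 3,  # hypothetical Web front-end names
}

if __name__ == "__main__":
    for order, batch in enumerate(power_on_batches(finance_app), start=1):
        print(order, batch)
```

Because the ordering lives in the Recovery Plan rather than on each VM’s power-on setting, this approach avoids the problem of one plan’s change leaking into every other plan that references the same VM.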

This completes our discussion of configuring a bidirectional relationship. Hopefully you can see that the heavy lifting is mainly taken care of by the initial configuration of the Protected Site when SRM was first deployed. Some one-off settings from that initial deployment are simply reused when building out a bidirectional model: the sites are already paired together, and the array configuration is already in place.

Shared Site Configurations

WARNING: The term shared site should not be uttered by someone who has poorly fitting false teeth or who has consumed large amounts of alcohol.

A shared site configuration exists in SRM when more than one production location is set to receive its DR resources from SRM’s Recovery Site services. It allows for a so-called “spoke and hub” configuration to be created, where one Recovery Site offers DR resources for many production sites or Protected Sites. Such a configuration would be used by a large company or by a service provider that offers SRM as an outsourced solution to companies that do not have the resources to manage their own DR location. Clearly, in this commercial scenario permissions must be correctly assigned so that one customer does not start managing the Recovery Plans of a totally different business! It’s also necessary to prevent duplicate name errors, which could happen if two different companies had VMs with the same VM name. Although this separation sounds complete, even with the correct permissions one customer can see the events and tasks of a different customer.

At the moment, the shared site configuration is triggered by running the SRM installer with command-line switches. In time this may be integrated into the installation wizard or, better still, into the wizard used to pair sites together. The command-line switches allow you to generate what VMware calls an SRM “extension” ID. In terms of the product, the configuration of SRM remains the same (array manager, Protection Groups, inventory mappings, and Recovery Plans); the extension ID simply allows each SRM instance to be uniquely identified. The requirements for the shared site configuration are quite simple. A new site such as Washington DC would need its own ESX hosts, vCenter, VMs, and SRM server. The New Jersey site would act as its DR location and would have a second SRM host added to it. In this scenario, New Jersey would have two SRM hosts configured against one vCenter: one facilitating failovers from New York and another facilitating failovers from Washington DC. In effect, New Jersey becomes a DR “hub” for New York and Washington DC.

From a DR provider’s perspective, this means customers have their own dedicated SRM host, but they can share the compute resources of the entire DR location. Some customers might not be happy with this, and may want their own dedicated ESX hosts that are used to bring VMs online. Of course, they will pay a premium for this level and quality of service. It’s perhaps worth stating that some service providers I have spoken to are not yet satisfied that this level of separation is sufficient for their needs. Admittedly, these providers are looking toward DR delivered via the cloud model that is still in early development.

When you log in to SRM, you log in to a particular extension that relates to the Recovery Site you are managing. To create a custom extension, rather than running the installation normally, you run the SRM installer with this switch:

/V"CUSTOM_SETUP=1"

This adds two additional steps to the installation wizard. First, a mandatory step enables the custom plug-in identifier, followed by the settings for the custom SRM extension. The extension has three parameters.

• The SRM ID

A string of no more than 29 characters. You can use characters such as underscores, hyphens, and periods, but I recommend you stick to purely alphanumeric characters, as underscores, hyphens, and periods can cause problems if they are used at the beginning or end of the SRM ID. The SRM ID uniquely identifies each SRM instance, and it must be the same on the Protected Site and Recovery Site SRM hosts for them to be paired successfully; in effect, SRM pairs the sites on the shared SRM ID value. In my case, there will be an SRM server in Washington DC and an SRM server in New Jersey, and the SRM ID for both installs will be the string WashingtonDCsite.

• Organization

A friendly 50-character name. No restrictions apply to this field with respect to special characters.

• Description

A friendly 50-character description. No restrictions apply to this field with respect to special characters.
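Because a typo in these fields only surfaces later, when the pairing fails, it can be worth checking the values before running the installer. The Python sketch below encodes the limits described above (29 characters for the SRM ID, 50 for Organization and Description) plus the author’s advice about leading and trailing punctuation. The check_extension function is hypothetical, and it assumes only letters, digits, underscores, hyphens, and periods are acceptable, which may be stricter than the installer itself.

```python
import re

MAX_SRM_ID = 29
MAX_FRIENDLY = 50
# Assumption: only letters, digits, "_", "-" and "." are accepted. The
# installer may allow more; this follows the advice to stay close to
# alphanumerics.
SRM_ID_CHARS = re.compile(r"[A-Za-z0-9_.-]+")
EDGE_CHARS = "_-."


def check_extension(srm_id: str, organization: str, description: str) -> list:
    """Return a list of problems with the custom extension settings."""
    problems = []
    if not srm_id:
        problems.append("SRM ID is empty")
    elif len(srm_id) > MAX_SRM_ID:
        problems.append(f"SRM ID is longer than {MAX_SRM_ID} characters")
    elif not SRM_ID_CHARS.fullmatch(srm_id):
        problems.append("SRM ID contains unsupported characters")
    elif srm_id[0] in EDGE_CHARS or srm_id[-1] in EDGE_CHARS:
        problems.append("SRM ID starts or ends with '_', '-' or '.'")
    if len(organization) > MAX_FRIENDLY:
        problems.append(f"Organization is longer than {MAX_FRIENDLY} characters")
    if len(description) > MAX_FRIENDLY:
        problems.append(f"Description is longer than {MAX_FRIENDLY} characters")
    return problems


if __name__ == "__main__":
    print(check_extension("WashingtonDCsite", "Corp", "Washington DC shared site"))
```

Run the same check with the same SRM ID for both the Protected Site and Recovery Site installs, since the two values must match exactly for pairing.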

Once the install has been completed, you pair up the sites as you would normally. When the pairing process has completed, you will be able to see all the custom extension IDs, together with their descriptions, during the logon process. While this information is not commercially sensitive, it is not possible to hide these references; they need to be visible so that folks can select which site contains their recovery configuration.

In my current configuration, my Protected Site and Recovery Site are already paired together, and this configuration cannot be changed without uninstalling the SRM product. So it’s worth thinking about how important the shared site feature is to you before you embark on a rollout. That might sound like bad news, but it is not the end of the world. There is nothing stopping me from creating a new site representing, say, Washington DC, and then adding a new SRM server in the Recovery Site to be paired with the new site that needs protection. As you can see, I created a new vCenter instance for the Washington DC datacenter (vcwdc.corp.com) together with the SRM server for that location (srmwdc.corp.com). To allow for the shared site configuration, I created another SRM server at the Recovery Site, called srmwdc-rs. I also created a resource pool called WDC_DR in the New Jersey datacenter to hold the placeholders for Washington DC. Of course, for this to work, a SQL database was created for Washington’s vCenter and SRM instances (see Figure 13.14).

Figure 13.14 vSphere client view, now with three datacenters and multiple instances of SRM servers

Installing VMware SRM with Custom Options to the New Site (Washington DC)

The first task will be to install the SRM server at the new location—in my case, Washington DC. We will run the SRM installer with a special custom option that will show new steps in the graphical wizard that assists in the installation process. This will allow us to set the SRM ID for the site which will be used at both the Protected and Recovery Sites to enable the pairing process.

1. Log in to the new Protected Site SRM server. In my case, this is srmwdc.

2. Open a command prompt and run the SRM installer with the following switch:

/V"CUSTOM_SETUP=1"

The complete string will look something like this, with N representing the version and build number of SRM:

VMware-srm-5.N.N-NNNNNN.exe /V"CUSTOM_SETUP=1"

3. Complete the setup routine as normal. At the Protected Site, I used the name “Washington DC” (see Figure 13.15).

4. In the plug-in identifier window, select the Custom SRM Plug-in Identifier option (see Figure 13.16).

5. Enter the SRM ID, organization name, and a description (see Figure 13.17).

Once the installation is complete, you will be able to log in to the site’s vCenter and connect to the SRM service. The only noticeable difference at this stage is that there is an additional field in the Summary tab of the site that shows the SRM ID field (see Figure 13.18).

Figure 13.15 The local site name of “Washington DC” will make it easy to identify in the user interface.

Figure 13.16 The SRM plug-in identifier options only appear if you use the custom setup option.

Figure 13.17 SRM will use the SRM ID in the pairing process, and will expect the Washington DC SRM server in the Recovery Site to have the same ID.

Figure 13.18 A shared site configuration differs visually only by the reference to the SRM ID value in the Summary tab of the site.

Installing VMware SRM Server with Custom Options to the Recovery Site

Now that the Protected Site is installed, it’s time to run the same routine on the new SRM server at the Recovery Site; this is the VM I called srmwdc-rs. During the install, remember that when prompted for the vCenter details you need to specify the vCenter at the Recovery Site (in my case, vcnj.corp.com). The site name must be unique, as you cannot pair two sites that have identical names; the two sites are instead paired together and identified uniquely by the SRM ID specified during the custom installation. Again, run the installation as before with the additional /V switch:

VMware-srm-5.N.N-NNNNNN.exe /V"CUSTOM_SETUP=1"

At the Recovery Site I used “Washington DC Recovery Site” as the friendly name (see Figure 13.19). Washington DC doesn’t have a dedicated Recovery Site. Instead, it shares the DR resources of New Jersey with its sister location of New York.

During the configuration of the Custom SRM plug-in you must enter the same SRM ID as you did during the install of the Protected Site SRM (see Figure 13.20).

Figure 13.19 In this case, I labeled the site differently by adding the word Recovery to its friendly name.

Figure 13.20 The SRM ID is the same for both the Protected Site SRM host in Washington DC and its dedicated SRM host in New Jersey.

Pairing the Sites Together

After this process, you use the Configure Connection Wizard to pair the two sites together. Previous versions of SRM required a special plug-in to be installed to extend the vSphere client functionality; this is no longer required, although the installer still uses the term plug-in identifier. All that is required now is to install SRM with the SRM ID specified. During the pairing process, the Protected Site SRM host is matched with the Recovery Site SRM host that has the same SRM ID. So, at the heart of any shared site configuration is the SRM ID value, which must be the same at the Protected and Recovery Sites for the pairing to work. If you like, the pairing is made not just between one SRM server and another, but to the specific SRM ID value shared by both sites (see Figure 13.21).

In Figure 13.21, it looks as though Washington DC has its own Recovery Site. In reality, there is still only one vCenter in New Jersey, which is now the DR location for both New York and Washington DC. When the pairing process takes place, the administrator for Washington DC still requires login details for the New Jersey vCenter. Then, when configuring the inventory mappings, that administrator would see all the resources in New Jersey that their user rights and privileges allow.

Once this pairing is complete, the rest of the configuration proceeds in exactly the same manner as the standard installation, and you can begin to configure the array manager, Protection Groups, and Recovery Plans. Figure 13.21 shows my Washington DC Site and Washington DC Recovery Site paired together. Although they have different site names, they are linked by virtue of the SRM ID.
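The hub arrangement can be summarized with a small Python model of the four SRM servers in this chapter. Host, site, and vCenter names come from the text, except srmnyc and vcnyc.corp.com, which are hypothetical, and “<default>” stands for the original non-custom pairing. The checks in pair_by_srm_id (exactly two servers per SRM ID, different vCenters, different site names) are my reading of the text, not SRM’s actual validation logic.

```python
from collections import defaultdict
from typing import NamedTuple


class SrmServer(NamedTuple):
    host: str
    site_name: str
    vcenter: str
    srm_id: str  # "<default>" stands for the standard, non-custom install


# Names from the text, except srmnyc and vcnyc.corp.com (hypothetical).
SERVERS = [
    SrmServer("srmnyc", "New York", "vcnyc.corp.com", "<default>"),
    SrmServer("srmnj", "New Jersey", "vcnj.corp.com", "<default>"),
    SrmServer("srmwdc", "Washington DC", "vcwdc.corp.com", "WashingtonDCsite"),
    SrmServer("srmwdc-rs", "Washington DC Recovery Site", "vcnj.corp.com",
              "WashingtonDCsite"),
]


def pair_by_srm_id(servers):
    """Pair SRM servers that share an SRM ID; return {srm_id: (site, site)}."""
    by_id = defaultdict(list)
    for server in servers:
        by_id[server.srm_id].append(server)
    pairs = {}
    for srm_id, members in by_id.items():
        if len(members) != 2:
            raise ValueError(f"{srm_id}: expected 2 SRM servers, found {len(members)}")
        first, second = members
        if first.vcenter == second.vcenter:
            raise ValueError(f"{srm_id}: both SRM servers use {first.vcenter}")
        if first.site_name == second.site_name:
            raise ValueError(f"{srm_id}: site names must differ")
        pairs[srm_id] = (first.site_name, second.site_name)
    return pairs


def extensions_on(vcenter, servers):
    """SRM IDs registered against one vCenter, as offered at login."""
    return sorted(s.srm_id for s in servers if s.vcenter == vcenter)


if __name__ == "__main__":
    print(pair_by_srm_id(SERVERS))
    print(extensions_on("vcnj.corp.com", SERVERS))
```

The New Jersey vCenter carries two SRM IDs, which is exactly what makes it the hub: each ID pairs with a different spoke.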

Figure 13.21 The Protected Site (Washington DC) and Recovery Site (Washington DC Recovery Site). The shared SRM ID pairs them together.

Figure 13.22 This dialog box appears whenever the administrator at Washington DC switches from the local site to the Recovery Site.

When you switch from the Protected Site (New York or Washington DC) to the Recovery Site (New Jersey), you will be presented with a dialog box that lists each SRM ID with its friendly description, from which you select the one you want (see Figure 13.22). The <default> entry is the initial site pairing between my New York and New Jersey sites, which I created in Chapter 7, Installing VMware SRM.

Finally, if I were connected to the Recovery Site (New Jersey) and wanted to switch from one SRM ID to another (say, to the New York pairing), I could click the Log Out option at the Recovery Site and then reconnect to the SRM service (see Figure 13.23). This option is listed under Commands on the far right-hand side of the Sites view in SRM.

Figure 13.23 The Log Out option lets you log out of the current instance of SRM.

Decommissioning a Site

Let’s assume, for whatever reason, that the Washington DC site was to be decommissioned. A good example of this is if you were using SRM to facilitate a datacenter move from Washington DC to New Jersey. From an SRM perspective, the removal process is essentially the setup process in reverse. Once the virtual machines from the old site (Washington DC) were up and running in New Jersey, the old site could be removed. In this case you would do the following.

1. Remove the Recovery Plans used for the planned migration.

2. Remove the Protection Group used for the planned migration. This should unregister and delete the placeholder VMs and their corresponding .vmx files.

3. Remove the array manager configuration from the old Protected Site (Washington DC in my case).

4. “Break” the pairing of the Protected Site to the Recovery Site.

Strictly speaking, there is no requirement to “break” the site relationship; I just like to do it. The real purpose of breaking the pairing is to undo the pairing process in cases where the administrator makes an error, such as pairing two sites together accidentally.

5. Rerun the vCenter Linked Mode Wizard to remove the old site’s vCenter instance and isolate it as a stand-alone vCenter.

6. Uninstall the SRM product from the Protected Site SRM and from its previously paired Recovery Site SRM server.

7. Revisit your licensing and remove any stale sites that may be listed. Right-click a stale site and choose the Remove Asset option to complete this process.

This process must be carried out in this particular order; otherwise, error messages will appear. For example, removal of the array manager configuration while it is in use with Protection Groups will create a pop-up message preventing you from making that change.
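Because the order matters, some administrators keep the sequence as a scripted checklist. The Python sketch below is one way to express that: the step labels and the Decommission class are hypothetical, they only track ordering, and they do not call any SRM API. Breaking the pairing is marked optional because, as noted above, it is not strictly required; every other step must be completed before the next one is allowed.

```python
# The decommissioning steps in the order given above. Only
# "break site pairing" is optional, as the text says it is not required.
STEPS = [
    ("remove recovery plans", False),
    ("remove protection groups", False),
    ("remove array manager configuration", False),
    ("break site pairing", True),
    ("rerun linked mode wizard", False),
    ("uninstall SRM at both sites", False),
    ("remove stale license assets", False),
]
NAMES = [name for name, _ in STEPS]


class Decommission:
    """Track progress and refuse steps that would run out of order."""

    def __init__(self):
        self.position = 0  # index of the next step not yet reached
        self.done = []

    def run(self, step: str) -> None:
        if step not in NAMES:
            raise ValueError(f"unknown step: {step}")
        index = NAMES.index(step)
        if index < self.position:
            raise RuntimeError(f"{step!r} has already been done or skipped")
        skipped = [name for name, optional in STEPS[self.position:index]
                   if not optional]
        if skipped:
            raise RuntimeError(f"cannot {step} before: {', '.join(skipped)}")
        self.done.append(step)
        self.position = index + 1

    @property
    def complete(self) -> bool:
        return self.position == len(STEPS)


if __name__ == "__main__":
    plan = Decommission()
    try:
        plan.run("remove array manager configuration")
    except RuntimeError as err:
        print(err)
```

This mirrors the pop-up SRM shows when you try to remove an array manager configuration that Protection Groups still depend on, but catches the mistake before you touch the console.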

Summary

Once you understand the principles and concepts behind SRM, a bidirectional or shared site configuration is really an extension of the same principles we covered in earlier chapters. The only complexity is getting your mind around the relationships. Perhaps occasionally you stopped while reading this chapter to clarify to yourself what the relationships were between the two locations, both in SRM and in the storage array. You are not alone; I did the same thing. I got so wrapped up in the Protected Site/Recovery Site view of the world that it took me some time to accept that each location can have a dual function. Of course, I always knew it could, but once you are used to thinking of Site A as the Protected Site and Site B as the Recovery Site, switching takes a little time. In a bidirectional configuration where your Protected Site (New York) and Recovery Site (New Jersey) are configured similarly, it’s sometimes tricky to keep the relationships clear in your head, and that’s with just two sites!

You will find this is even more true when we deal with failover and failback, especially failback. I really had to concentrate when I was doing my first failback and writing it up. Anyway, I digress, as that is the subject of the next chapter: failover and failback, and running our Recovery Plans for real (what some folks call “Hitting the Big Red Button”).