Alarms, Exporting History, and Access Control


Originating Author

Michelle Laverick




Version: vCenter SRM 5.0

You will be very pleased to learn that SRM has a large number of configurable alarms and a useful reporting feature. Alarms are especially well defined, with lots of conditions that we can check on. SRM 5.0 introduces a whole series of new alarms, including ones that cover the vSphere Replication (VR) process. The actions we can take when an alarm is triggered include sending an email, sending an SNMP trap, or executing a script. It’s perhaps worth stating something very obvious here: SMTP and SNMP are both networked services. These services may not be available during a real disaster; as such, you may not wish to rely on them too heavily.

Additionally, you will find that SRM does not have its own specific “events” tab. Instead, SRM events are included alongside your day-to-day events. I think that as long as you have allocated user roles and permissions for SRM you should be able to filter by these accounts, which should improve your traceability. After I have covered the topic of access control, I will include some filtering/searching screen grabs to illustrate what I mean.

vCenter Linked Mode and Site Recovery Manager

Before I jump into looking at alarms, I want to take a moment to discuss the importance of vCenter linked mode to this chapter. The usefulness of linked mode to the SRM administrator is somewhat diminished now that improvements to the SRM interface allow for almost seamless switching from one site to another using the core UI. Previously, without linked mode, an SRM administrator would require two vSphere client windows to be open: one on the Protected Site vCenter and one on the Recovery Site vCenter.

With that said, I still think it is useful to have linked mode available in the day-to-day management of environments comprising multiple vCenters.

When I started to write this edition of the book I was tempted to introduce this feature in Chapter 7, Installing VMware SRM, but I was worried that the distinction between a Protected Site and a Recovery Site might be obscured by the use of this feature. At this point in the book, I think that if you are new to VMware SRM this distinction is more than clear. I would like to introduce linked mode into the picture for one main reason. When you have your vCenters set up with linked mode, along with the licensing data the vCenters also share the “roles” created in vCenter; this will have a direct impact on our coverage of access control in this chapter. I feel that if you use linked mode it will dramatically cut down the number of logins and windows you will need open on your desktop, and significantly ease the burden of setting permissions and rights within the Site Recovery Manager product itself.

My only worry about linked mode in the context of SRM is that you are creating a relationship or dependency between the Protected and Recovery Sites. SRM doesn’t require the linked mode feature to function. However, when balanced against the advantages of linked mode I think this anxiety is unfounded. Anything that eases administration and reduces the complexity of permissions and rights has to be embraced.

Normally, you enable linked mode when installing vCenter. If you haven’t done this, a wizard exists on the vCenter Start menu that lets you rerun that portion of the vCenter installation. When you run the Linked Mode Wizard you must use a domain account that has administrative rights on both the Protected Site and the Recovery Site vCenters. To run the wizard, follow these steps.

1. Log in to the Recovery Site vCenter.

2. Select Start | Programs | VMware and then click the vCenter Linked Mode Configuration Properties option. Click Next.

3. Select the radio button to “Modify linked mode configuration.”

4. Ensure that in the “Linked Mode configuration” dialog box the option to “Join vCenter server instance to an existing linked mode group or instance” is selected. Click Next.

5. In the Connect to a vCenter Instance dialog box, enter the FQDN of the Protected Site vCenter. After the installation is complete, you will see something similar to Figure 12.1.

Access-control- (01).jpg

Figure 12.1 Two vCenter instances in one window, aggregating the vCenter inventory into a single view

When you click the Site Recovery Manager icon after the first login, you will still be asked for a username and password to communicate to the Site Recovery Manager server. Switching between the Protected and Recovery Sites is simply a matter of selecting them from the navigation bar (see Figure 12.2).

When you first do this you will be asked for the credentials of the SRM host. Although linked mode cuts down on the number of vCenter logins, you still must authenticate to the SRM host. This is an additional layer of security, and it is worth noting that your vCenter credentials might not be the same as your SRM credentials. When you are asked for authentication via SRM be sure to use the DOMAIN\username format to complete the login process.

Access-control- (02).jpg

Figure 12.2 Select the site from the navigation bar.

Alarms Overview

Alarms cover a huge array of possible events including, but not limited to, such conditions as the following:

• License status

• Permission status

• Storage connectivity

• Low available resources, including:

    • Disk

    • CPU

    • Memory

• Status of the Recovery Site, including:

    • Recovery Site SRM is up/down

    • Not pingable

    • Created/deleted

• Creation of:

    • Protection Groups

    • “Shadow” placeholder virtual machines

• Status of Recovery Plans, including:

    • Created

    • Destroyed

    • Modified

The thresholds for disk, CPU, and memory alarms are set within the GUI from the Advanced Settings dialog box within the vSphere client. You will see the Advanced Settings option when you right-click the Site Recovery node (see Figure 12.3).

Once the Advanced Settings dialog box has opened, it’s the localSiteStatus node that contains the default values for these alarms (see Figure 12.4).

Access-control- (03).jpg

Figure 12.3 The Advanced Settings option is available when you right-click each site in the SRM inventory.

Access-control- (04).jpg

Figure 12.4 You can adjust alarm tolerances in the localSiteStatus node. Notice how the friendly name of the site can be modified here also.

As you would expect, some alarms are more useful than others, and in some respects they can facilitate the correct utilization or configuration of the SRM product. Additionally, you will notice that both the Recovery and Protected Sites hold the same alarms. Configuring both sites would be appropriate in a bidirectional configuration.

Here are some examples.

• Example 1

You may want an alarm to be raised whenever there is a failure to protect a VM within a Protection Group.

• Example 2

Although Recovery Plans have a notification or message feature, you will only see the message if you have the vSphere client open with the Site Recovery Manager plug-in. It might be desirable to send an email to the appropriate person as well.

• Example 3

Failure to receive a ping or response from the Recovery Site could indicate misconfiguration of the SRM product, or some kind of network outage.

Creating a New Virtual Machine to Be Protected by an Alarm (Script)

Unlike the scripts executed in a Recovery Plan, alarm scripts are executed by either the Protected Site vCenter or the Recovery Site vCenter. As such, these scripts must be created and stored on the vCenter responsible for the event. This can be identified by the use of the word Protected or Recovery in the event name. One of the most common administrator mistakes is simply forgetting to enable the alarm. Administrators seem to be so focused on configuring the alarm condition that they forget that once it is defined they must remember to click back to the General tab and enable the alarm in the first instance. The icons for enabled and disabled alarms are very similar to each other and have the same color, so you may struggle to spot the difference. An alarm that has yet to be enabled has a red cross next to its name; an enabled alarm does not have this cross (see Figure 12.5).

Access-control- (05).jpg

Figure 12.5 Alarms that are not configured are marked with a red cross (bottom), whereas configured alarms are clear (top).

To set an alarm, follow these steps.

1. Select the Protected Site.

2. Select the Alarm tab and double-click the VM Not Protected alarm (see Figure 12.6).

3. In the Edit Alarm dialog box, select the Actions tab.

4. Click the Add button.

5. From the pull-down list, select Run a Script, and enter the following, as shown in Figure 12.7:

C:\Windows\System32\cmd.exe /c c:\newvmscript.cmd

One alarm can have multiple actions, so it’s possible to create a condition that will send an email and SNMP trap, as well as run a script.

6. On the Protected Site vCenter, which is the system that executes alarm scripts, create a script called newvmscript.cmd in the root of the C: drive with this content:

@echo off

msg /server:mf1 administrator "A new VM has been created and is waiting to be configured for protection"

This script is only intended as an example. I do not recommend use of the Messenger Service in production.
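Since the Messenger Service is not something to rely on, one alternative is for the alarm script to append a line to a log file instead. What follows is a minimal Python sketch of that idea, assuming Python has been installed on the vCenter server (neither vCenter nor SRM provides it). vCenter is understood to hand alarm details to scripts through environment variables with a VMWARE_ALARM_ prefix; the three names used here should be confirmed against your own vCenter version, and the function names are hypothetical.

```python
import os
from datetime import datetime, timezone

# Environment variables that vCenter is understood to set for alarm scripts.
# Treat these names as assumptions and confirm them on your own vCenter.
ALARM_VARS = (
    "VMWARE_ALARM_NAME",
    "VMWARE_ALARM_TARGET_NAME",
    "VMWARE_ALARM_NEWSTATUS",
)


def format_alarm_line(env, now=None):
    """Build one log line from the alarm variables found in env."""
    now = now or datetime.now(timezone.utc)
    fields = [f"{name}={env.get(name, 'unknown')}" for name in ALARM_VARS]
    return now.strftime("%Y-%m-%dT%H:%M:%SZ") + " " + " ".join(fields)


def append_alarm(log_path, env=None):
    """Append the alarm line to log_path and return the line written."""
    line = format_alarm_line(os.environ if env is None else env)
    with open(log_path, "a", encoding="utf-8") as handle:
        handle.write(line + "\n")
    return line
```

A wrapper script that calls append_alarm with a path of your choosing could then be named in the Run a Script action in place of the batch file. Variables that vCenter does not supply are written as "unknown" rather than causing the script to fail.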

Access-control- (06).jpg

Figure 12.6 A range of alarms covering virtual machines, and their relationship with SRM

Access-control- (07).jpg

Figure 12.7 Alarms can contain scripts just like Recovery Plans.

Creating a Message Alarm (SNMP)

To create a message alarm, follow these steps.

1. At the Recovery Site, click the SRM button.

2. Select the Alarm tab and double-click the Recovery Profile Prompt Display alarm.

The Recovery Profile Prompt Display alarm means the Recovery Plan has paused with a message step and is waiting for an operator to respond to it.

3. In the Edit Alarm dialog box, select the Actions tab.

4. Click the Add button.

5. From the pull-down list, select “Send a notification trap.”

Unlike with the “Send a notification email” option, the destination/recipient is not defined here; instead, it is defined in vCenter by selecting Administration | vCenter Server Settings | SNMP. By default, if you run an SNMP management tool on vCenter in the “public” community you will receive notifications. To test this functionality I used the free utility called Trap Receiver (www.trapreceiver.com/); VMware also uses this in its training courses to test/demonstrate SNMP functionality without the need for something like HP OpenView. I installed Trap Receiver to the Recovery Site vCenter server to test the SNMP functionality.

In my case, I added a message at the end of my Database Recovery Plan that simply states “Plan Completed!” (see Figure 12.8).
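If all you want to confirm is that traps are leaving vCenter and arriving at the management host, a few lines of Python can stand in for a trap receiver. The sketch below waits for a single UDP datagram and reports who sent it and how large it was; it makes no attempt to decode the SNMP payload. SNMP traps conventionally arrive on UDP port 162, and binding to that port normally requires administrative rights. The function names are hypothetical.

```python
import socket


def open_listener(host="0.0.0.0", port=162):
    """Create a UDP socket bound to the conventional SNMP trap port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def wait_for_datagram(sock, timeout=30.0, max_bytes=65535):
    """Block until one UDP datagram arrives on sock or the timeout expires.

    Returns (sender_ip, payload_length), or None if nothing arrived.
    """
    sock.settimeout(timeout)
    try:
        payload, (sender_ip, _port) = sock.recvfrom(max_bytes)
    except socket.timeout:
        return None
    return sender_ip, len(payload)
```

Run the listener, trigger the alarm, and a returned sender address that matches your vCenter suggests the trap path is working. A result of None after the timeout points to the SNMP receiver settings in vCenter or to a firewall in between.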

Creating an SRM Service Alarm (SMTP)

To create an SRM service alarm, follow these steps.

1. At the Recovery Site, click the SRM button.

2. Select the Alarm tab and double-click the Remote Site Down alarm; repeat these steps for the Remote Site Ping Failed alarm.

3. In the Edit Alarm dialog box, select the Actions tab.

4. Click the Add button.

5. From the pull-down list, select “Send a notification email” and enter the destination/recipient email address (see Figure 12.9).

Access-control- (08).jpg

Figure 12.8 Although not suitable for production environments, Trap Receiver is useful for basic configuration tests.

Access-control- (09).jpg

Figure 12.9 The Send Email action defaults to the email address specified in the SRM installation; you can change it to any email address required.

In the Edit box, enter an email address of an individual or a group that should receive the email. Again, configuration of the SMTP service is set in vCenter, this time by selecting Administration | vCenter Server Settings | Mail. By default, SRM will use the email address provided during installation of SRM for the administrator’s email. Emails will be triggered when the “Disconnected” message appears in the SRM Summary page. The actual emails produced with alarms can sometimes be cryptic, but I think they do the job required.
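Because alarm emails depend entirely on the SMTP settings held in vCenter, it can be worth proving that the mail relay accepts messages before you rely on the alarm itself. The following Python sketch builds a small test message and sends it through a relay without authentication. It uses only the standard smtplib and email.message modules; the function names, and any host name and addresses you pass in, are placeholders rather than anything SRM defines.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(sender, recipient, site_name):
    """Create a plain-text message resembling an SRM alarm notification."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = f"SRM alarm test for site {site_name}"
    message.set_content(
        "This is a test of the SMTP relay used for SRM alarm emails.\n"
    )
    return message


def send_test_message(smtp_host, message, port=25, timeout=10):
    """Send the message through smtp_host without authentication."""
    with smtplib.SMTP(smtp_host, port, timeout=timeout) as server:
        server.send_message(message)
```

If send_test_message raises a connection or refusal error from the vCenter server, the alarm emails are unlikely to arrive either, whatever the alarm configuration says.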

Exporting and History

It is possible to export a Recovery Plan out of Site Recovery Manager as well as to export the results of a Recovery Plan out of Site Recovery Manager. The export process can include the following formats:

• Word

• Excel

• Web page

• CSV

• XML

Although Recovery Plans can be exported out of SRM, they cannot be imported into SRM. The intention of the export process is to give you a hard copy of the Recovery Plan, which you can share and distribute without necessarily needing access to SRM. Currently, SRM defaults to opening the exported file on the system where your vSphere client is running. If that system does not have Microsoft Word/Excel, this will fail. The plan is still exported, but your system will fail to open the file. In my experiments, Microsoft Word Viewer 2007 worked but Microsoft Excel Viewer 2007 did not. Additionally, Excel Viewer could not open the CSV format. I found I needed the full version of Excel to open these files successfully. The XLS file comes with formatting, but as you would expect the CSV file comes with no formats whatsoever.
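One advantage of the unformatted CSV export is that it can be read by a script on a system with no Office applications at all. The Python sketch below lists the steps in an exported result whose status is anything other than “Success”. The column names “Step” and “Status” are assumptions made purely for illustration; check the header row of your own export and pass the real names in.

```python
import csv
import io


def failed_steps(csv_text, step_column="Step", status_column="Status"):
    """Return the step names whose status is not 'Success'.

    The column names are placeholders; adjust them to match the header
    row of your own export.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    failures = []
    for row in reader:
        status = (row.get(status_column) or "").strip()
        if status.lower() != "success":
            failures.append((row.get(step_column) or "").strip())
    return failures
```

Note that a row with an empty or missing status is reported as a failure, which errs on the side of drawing attention to it.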

Exporting Recovery Plans

To export a Recovery Plan, follow these steps.

1. In SRM, select your Recovery Plan.

2. Click the Export Steps icon (see Figure 12.10).

3. From the Save As dialog box, select the format type. The output of the plan looks like Figure 12.11, which was taken from Word Viewer.

Access-control- (10).jpg

Figure 12.10 Recovery Plans and the history of Recovery Plans are exportable in many different formats.

Access-control- (11).jpg

Figure 12.11 A sample export of a Recovery Plan

Recovery Plan History

SRM has a History tab which will show success, failure, and error summaries, and allows you to view previous runs of the Recovery Plan in HTML format or export them in other formats as indicated earlier. For many people this is one of the top features of SRM; it means they can test their Recovery Plan throughout the year and demonstrate to external auditors that correct provisions have been made for potential disasters. To see the Recovery Plan history you simply do the following.

1. At the Recovery Site SRM, select a Recovery Plan.

2. Click the History tab, select a previously run Recovery Plan, and click View or Export (see Figure 12.12).

Access-control- (12).jpg

Figure 12.12 Recovery Plan history is viewable in HTML and exportable to many other formats.

Access-control- (13).jpg

Figure 12.13 The results of a successful Recovery Plan

Clicking View automatically outputs the report of the Recovery Plan in HTML format, whereas clicking Export enables output in the other formats discussed previously. Figure 12.13 shows the results of a successful Recovery Plan.

Although the report shows a successful test, these reports truly become useful in troubleshooting scenarios. When errors do occur in a Recovery Plan it is sometimes tricky to see the long “status” messages when the plan is executing. The easiest way to grab these error messages is to generate an output of the report, and then use copy and paste to grab the message that way. The message can then be pasted into service requests, forums, and blog posts.
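The copy-and-paste step can also be scripted when a report is long. The Python sketch below strips the markup from a saved HTML report and returns only the text fragments that contain a keyword such as “error”. It makes no assumption about how SRM lays out the report; it simply collects every piece of text between tags, so the class and function names are hypothetical and the matching is deliberately crude.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect the non-empty text fragments of an HTML document."""

    def __init__(self):
        super().__init__()
        self.fragments = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.fragments.append(text)


def find_messages(html_text, keyword="error"):
    """Return text fragments containing keyword, case-insensitively."""
    collector = TextCollector()
    collector.feed(html_text)
    collector.close()
    needle = keyword.lower()
    return [f for f in collector.fragments if needle in f.lower()]
```

Reading the saved report into a string and passing it to find_messages gives you the full status messages in a form that is easy to paste into a service request.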

Access Control

Call it what you will, but permissions, access control, and change management are part and parcel of most corporate environments. So far we have been managing SRM using a default “administrator” account for every task. This is not only unrealistic, but also very dangerous, especially in the realm of DR. DR is such a dangerous undertaking that it should not be triggered lightly or accidentally. Correctly setting permissions should allow SRM to be configured and tested separately from the process of invoking DR for real. Although invoking DR is a high-level, “C-level” executive decision, management of the process should be in the hands of highly competent, trained, and ideally well-paid IT staff. Alarms can be very useful if you have an environment that is changing rapidly or if you don’t have tight change control management in place. At least with alarms the SRM administrator can be alerted to changes in the environment caused by others. In an ideal world, the left hand knows what the right hand is doing. But we don’t always live in an ideal world, so if alarms are configured, your attention is at least drawn to changes in the environment that could impact SRM and a successful recovery process.

SRM introduces a whole raft of roles to vCenter, and as with day-to-day vCenter rights and privileges, the SRM product displays the same “hierarchical” nature as vCenter. An additional layer of complexity is added by having two vCenter systems (Protected and Recovery Site vCenters) that are delegated separately. It’s worth saying that in a bidirectional configuration these permissions would have to be mutually reciprocal to allow for the right people to carry out their designated tasks properly.

As with the alert actions, access control is driven by authentication services. For many people, this will mean Microsoft Active Directory and Microsoft DNS. If these services fail or are unavailable you may not even be able to log in to vCenter to trigger your Recovery Plan. Proper planning and preparation are required to prevent this from happening; also, you may wish to develop a Plan B where a Recovery Plan could be triggered without the need for Active Directory. Depending on your corporate policies, this could include the use of physical domain controllers, or even the use of local user accounts on your vCenter and SRM system. From a security perspective, local user accounts are frowned upon, to say the least, in most corporate environments. So the first step is to review the default vCenter permissions which allow full access to vCenter using the local administrator account on the vCenter server itself. You should also notice that with the release of VR new roles have been added to vCenter.

The Site Recovery Manager roles include

• SRM Administrator

• SRM Protection Groups Administrator

• SRM Recovery Administrator

• SRM Recovery Plans Administrator

• SRM Recovery Test Administrator

• VR Replication Viewer

• VR Recovery Manager

• VR Virtual Machine Replication User

• VR Administrator

• VR Diagnostics

• VR Target Datastore User

You can see these roles in the vSphere client by selecting Home | Administration | Roles. If you copy these roles to create new ones, it can take time if you are in linked mode for them to be replicated to other vCenters in the environment. At the bottom of the roles list in the vSphere client you will see the warning shown in Figure 12.14.

Creating an SRM Administrator

It might seem useful to define each of these roles, but I think transposing their definitions into this book would be quite tedious. Instead, I think it’s more helpful to think about the kinds of tasks an SRM administrator might need to perform during the course of maintaining and managing an environment. For example, if a new datastore were created, potentially a new Protection Group would be needed. Similarly, as new virtual machines are created, they must be correctly configured for protection. We would also want to allow someone to create, modify, and test Recovery Plans as our needs change. In the following scenario, I’m going to create some users (Michael, Lee, Alex, and Luke) and allocate them to a group in Active Directory called SRM Administrators. I will then log in as each of these users to test the configuration and validate that they can carry out the day-to-day tasks they need to perform. The plan is to allow these users to carry out only SRM tasks, with the minimum rights needed for day-to-day maintenance of the SRM environment. The configuration will allow these four individuals to manage a unidirectional or active/passive SRM configuration. In other words, they will be limited to merely creating and executing Recovery Plans at the Recovery Site.

For me, one of the first stages in taking control of any system is to make sure I do not need or use the built-in administrator(s) delegation, and that there is a dedicated account that has access to the environment. This will allow me to consider removing the default allocation of privileges to the system. As usual, Windows defaults to allow the first administrator to log in. To create an SRM administrators delegation you need to create the group in Active Directory, before assigning it to the inventory in SRM. Then, from SRM, you can start your delegation. It doesn’t really matter which vCenter you connect to as your rights and privileges will be assigned to whichever site you select. However, in keeping with the style of the book, I connected to the Protected Site vCenter, because the two sites are paired together.

Access-control- (14).jpg

Figure 12.14 New roles in vCenter require some time to be duplicated to other vCenters in a linked mode configuration.

1. Select the Protected Site (Local) and click the Permissions tab.

2. Right-click underneath Administrators for the local site, and select Add Permission (see Figure 12.15).

3. Use the Add button in the Assign Permissions dialog box to locate your SRM Administrators group. From the Assigned Role pull-down list select the role of SRM Administrator, and ensure that the option to Propagate to Child Objects is selected (see Figure 12.16).

You can repeat this process for the Recovery Site location as well. This should give the SRM Administrator group complete control over the SRM system, and it means you no longer need the built-in administrator delegation.
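To make the effect of the Propagate to Child Objects option concrete, here is a toy model in Python. It is not the vSphere API and does not talk to vCenter; the class, the function, and the inventory object names are all hypothetical. It only illustrates the rule being relied on in step 3: a permission set directly on an object applies to that object, while a permission set on a parent reaches the children only if it was set to propagate.

```python
class InventoryNode:
    """One object in a simplified inventory tree."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        # Maps a group name to (role, propagate_to_children).
        self.permissions = {}

    def add_permission(self, group, role, propagate=True):
        self.permissions[group] = (role, propagate)


def effective_role(node, group):
    """Return the role that applies to group on node, or None.

    A permission set directly on the node always applies; a permission on
    an ancestor applies only if it was set to propagate.
    """
    if group in node.permissions:
        return node.permissions[group][0]
    ancestor = node.parent
    while ancestor is not None:
        if group in ancestor.permissions:
            role, propagate = ancestor.permissions[group]
            if propagate:
                return role
        ancestor = ancestor.parent
    return None
```

In this model, assigning the SRM Administrator role to the SRM Administrators group at the site level with propagation enabled yields that role on every Protection Group and Recovery Plan beneath it. With propagation disabled, the group would hold the role on the site object alone.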

Access-control- (15).jpg

Figure 12.15 SRM has its own Permissions tab; as elsewhere in the vCenter inventory, a right-click is used to add permissions.

Access-control- (16).jpg

Figure 12.16 From the pull-down list you can select the built-in SRM roles.

Summary

As you can see, SRM significantly extends and adds to vCenter’s alarm, report, and access control features. And while alarms may not have the configurable options you might see in the main vCenter product, such as triggers and reporting tabs, the sheer number of alarms or conditions we can trap on is a very welcome addition to what was once an underdeveloped aspect of the core vCenter product until vSphere 4 was released. Again, the ability to run reports in SRM is a great addition, as once again it’s a feature we don’t usually see in the core vCenter product. In one respect, VMware’s investment in vCenter is paying dividends in allowing its developers to extend the product’s functionality with plug-ins. Personally, I feel other additions to the VMware stable of applications, such as VMware View, need to join in the party too. In this respect, I think VMware SRM engineers have lit a torch for others to follow.

This more or less concludes this particular configuration type. So far this book has adopted a scenario where your organization has a dedicated site purely for recovery purposes, and I now want to change this scenario to one in which two datacenters have spare CPU, memory, and disk capacity from which they can reciprocate recovery—for example, a situation where New Jersey is the Recovery Site for New York and New York is the Recovery Site for New Jersey, or where Reading is the Recovery Site for London and London is the Recovery Site for Reading. For large corporations, this offers the chance to save money, especially on those important and precious VMware licenses. In the next chapter we will also look at the multisite features which allow for so-called “spoke-and-hub” configurations in which one Recovery Site offers protection to many Protected Sites.