[Screenshot 01]
Why does upgrading Microsoft Windows have to be like climbing Mount Everest?

You can read this whole blogpost if you like. But here’s the very short version. Clean install.

Nothing has really changed in terms of upgrading Microsoft Windows….

Throughout my career I've been dogged by a constant question from customers, clients, consultants and students: what works best, a clean install or an upgrade? What always gets me about this question is that the folks asking it generally already know the answer. In the shaggy-dog story of Windows upgrades, we all know that clean installs generally work better. In fact a lot of so-called "upgrades" are really just de-installs of one product, with a clean install of the new product on top. In the world of the upgrade very little actually happens "in place". In our industry upgrade processes have a particularly poor reputation. I think that's because most software companies make little or no effort to test upgrades against the rigours of a real-life installation. I imagine many just do a clean install of the previous version plus some patches, and then a couple of hours later test their upgrade. That's why it's so often the case that upgrade issues only surface after a product is released, and they are generally found by numb-nuts like me upgrading a product on the day of the GA.

As for VMware, I've always felt that whilst in-place upgrades of both vCenter and ESX are viable, it's really only vCenter that should qualify as a genuine in-place upgrade target (rather than a re-install). The VMware ESX host should be regarded as just a block of CPU/memory. You should (in theory) be able to place an ESX host into maintenance mode (removing it from the Distributed vSwitch if you're using them), then remove the host and re-install. That's because I feel you should really have a robust scripted installation (or PowerCLI scripts) that puts back all the configuration of the host once it's back in vCenter. For everyone else there's an in-place upgrade that actually works pretty well in most cases.
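To give a flavour of that "scripted rebuild" approach, here's a minimal PowerCLI sketch. The server names, credentials, datacenter, switch and portgroup are all hypothetical stand-ins – the point is that the host's personality lives in the script, not on the host:

# Connect to vCenter and re-add the freshly rebuilt host (all names are examples)
Connect-VIServer -Server "vcenter01" -User "administrator" -Password "Passw0rd!"
Add-VMHost -Name "esx01.corp.local" -Location (Get-Datacenter -Name "NYC") -User "root" -Password "Passw0rd!" -Force

# Re-apply the standard networking the host had before the rebuild
$vmhost = Get-VMHost -Name "esx01.corp.local"
$vswitch = New-VirtualSwitch -VMHost $vmhost -Name "vSwitch1" -Nic "vmnic1"
New-VirtualPortGroup -VirtualSwitch $vswitch -Name "Production" -VLanId 101

# Finally, take the host out of maintenance mode
Set-VMHost -VMHost $vmhost -State Connected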

In this first part I'm trying to do an upgrade from SCVMM 2012 SP1 (Cumulative Roll-up 4) to SCVMM 2012 R2. I thought I would start with the management layer first, and then move on to an upgrade of the Windows Hyper-V hosts. Unfortunately, when I went to run the SCVMM R2 setup program it only gave me the option to uninstall the previous release. The online guides from Microsoft suggest that I shouldn't get any messages at all – not even a "hi, we see you have SCVMM already installed. Do you want to upgrade it?" 🙂

Edited Low-lights: SCVMM R2 Upgrade

  • The Biggie. You need two clusters – one of Windows Server 2012 Hyper-V and one of Windows Server 2012 R2 Hyper-V – to do an "online" upgrade where the VMs stay powered on throughout. That means you need the "spare" hardware to do the upgrade, and there's storage IOPS generated as you "migrate" VMs from one cluster to the other – as each cluster has its own Cluster Shared Volumes (CSVs). This might be viable for large environments, but for SMEs having the resources to build an additional cluster merely to facilitate a software upgrade might be a bit of an ask, in my humble opinion.
  • Backup/Snapshot your SCVMM!
  • Check & Double-check your database settings
  • Uninstall SCVMM using the installer for SCVMM R2, choosing to retain the SCVMM database
  • Uninstall the previous Microsoft ADK
  • Download & Install the ADK 8.1
  • Reader, reboot! [Remember this IS Windows…]
  • Install SCVMM R2
  • Appearances can be deceptive. Don't assume that just because a Windows Hyper-V Server is addable to SCVMM that somehow it's working. Check its status to confirm the VMM Agent is working, and isn't out of date (see the sketch after this list). Failure to do this can cause jobs to fail inexplicably. It is VERY easy to be caught out by this. You'll find you do a piece of admin to resolve an upgrade issue – but that change doesn't stick. And you'll end up looking in the "jobs" view to find your admin didn't take effect.
  • However, in comparison to upgrading Windows Server Hyper-V 2012 an upgrade of SCVMM is a veritable walk in the park.
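For that agent check, the VMM PowerShell module is quicker than clicking through the console. A minimal sketch, assuming the module is loaded on the SCVMM server – the exact property names may differ slightly between builds, so eyeball the output of Format-List * if in doubt:

# List every computer SCVMM manages, with its agent version and state
Get-SCVMMManagedComputer | Select-Object Name, AgentVersion, State

# Dig into a single host's state- and agent-related properties (wildcards avoid guessing exact names)
Get-SCVMHost -ComputerName "lab17" | Format-List Name, *State*, *Agent*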

SCVMM R2 Upgrade

Before you even consider a SCVMM R2 upgrade, check and double-check the database settings. Use SCVMM's own console to remind yourself of the database parameters.

Under Settings, General and Database Connection, SCVMM 2012 SP1 reports its DB configuration.

[Screenshot 02]

It's important to know this information because SCVMM doesn't use ODBC connections, and once the old SCVMM has been de-installed, if you don't know your DB configuration you will be a little stuck. In hindsight a better naming convention would have helped stop any confusion on my part.

[Note: In my later clean install of R2 I opted for a naming convention of scvmmdb-nyc and scvmmdb-nj]
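If you've already lost track of the settings, they can also be read from the registry on the SCVMM server. A sketch based on where my install kept them – treat the exact path and value names as assumptions and verify them on your own build:

# Read the SQL settings SCVMM recorded at install time (path as found on my server)
$regPath = "HKLM:\SOFTWARE\Microsoft\Microsoft System Center Virtual Machine Manager Server\Settings\Sql"
Get-ItemProperty -Path $regPath | Select-Object MachineName, InstanceName, DatabaseName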

It does look like an “upgrade” to SCVMM 2012 R2 begins with an uninstallation of the previous version.

[Screenshot 03]
This can take two passes to complete. Despite selecting to remove both the SCVMM server and console, I've occasionally seen this fail and remove only one component. So you may have to run the installer twice in order to remove both.

The critical thing is that despite removing the previous version of SCVMM (both server and console), you must remember to retain the database.

[Screenshot 04]

Before you embark on the install of SCVMM R2 you must first uninstall the old ADK (Assessment and Deployment Kit) and install the new ADK for Windows 8.1, otherwise the pre-requisite check will fail. After the install of the ADK 8.1 you must reboot before attempting the (re)installation.
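If you prefer to script this step, the ADK's setup bootstrapper can run silently. A sketch assuming adksetup.exe from the ADK 8.1 download and the usual feature IDs – run adksetup.exe /? to confirm the options your copy supports:

# Silently remove the old ADK using its own bootstrapper
.\adksetup.exe /quiet /uninstall

# Silently install the ADK 8.1 features SCVMM needs (Deployment Tools and Windows PE)
.\adksetup.exe /quiet /features OptionId.DeploymentTools OptionId.WindowsPreinstallationEnvironment

# And remember: reboot before re-running the SCVMM R2 installer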

[Screenshot 05]

Once you've met the pre-requisites it's a case of installing the software. During the install, the setup program should detect the presence of the previous SCVMM database, and perform an upgrade.

[Screenshot 06]

At the end of the upgrade there is a warning about post-configuration steps that must be undertaken. Specifically Microsoft makes reference to a “LinkID” document 214130 located at this URL:

http://go.microsoft.com/fwlink/?LinkID=214130

[Screenshot 07]

The article outlines a number of post-configuration steps including:

  • Re-associating Hosts & Library Servers – The article talks about needing to do this if you haven't done an in-place upgrade. I find that odd given the "upgrade" is really an uninstall and re-install. I think what's meant is that this might be necessary if you have done a clean install to a new VMM server which is pointed at the old database.
  • Updating VMM Agents – The older Windows Hyper-V 2012 hosts I have should still be manageable, but it is recommended to upgrade the agents to get full functionality. Microsoft doesn't detail in the article what the lost functionality would be. But I did notice that I had a "Needs Attention" warning on all of my hosts (see the agent-update sketch after this list). [Screenshot 08]
  • Updating Templates – This second point is not to be overlooked. It rather sneaks out an acknowledgement of a change in the networking, where VM Networks now work very similarly to "portgroups" in VMware, in that the VM Network actually holds a VLAN property. To tell the truth I don't think Microsoft networking makes much sense, but it made even less sense before R2. It's quite a subtle change, and partly what motivated me to do the upgrade. I had problems with Azure Recovery Manager – mainly around network mappings – that I hoped the upgrade would fix (it did). I think this is SO important it's worthy of a screen grab!
  • Updating Driver Packages – These have to be removed and re-added to be discovered.
  • Relocate the VMM Library – In a HA scenario the VMM Library needs relocating to a highly available file server. That means copying stuff from the existing library to the new highly available one.
  • To do an online upgrade with zero downtime on the VMs, you need two clusters managed by SCVMM R2. VMs are migrated from one cluster to the other as part of the upgrade. This does assume you have enough free servers/capacity to build another cluster to facilitate the upgrade. That might not be an option for SMBs. You could argue that SMBs could tolerate extended downtime on the VMs, but that assumes a relatively trouble-free upgrade process.
  • In-place upgrade – this requires all the VMs to be powered down as you upgrade the cluster. Not very useful for those who need maximum uptime. But it's the situation I'm in: two SCVMM environments – one cluster in each – and no spare capacity. That said, you cannot have mixed Windows versions in the same cluster – so the reality is you end up creating a second cluster anyway. So an in-place upgrade doesn't really differ that much from an online upgrade.
  • If you had enough spare capacity, you'd cut to the chase: build a new R2-backed cluster, add it into SCVMM and get migrating.
  • Rebooting stuff fixes problems. Rebooting Windows Hyper-V and SCVMM R2 fixes a lot of problems. So don't pull out your hair with all that pointless troubleshooting, when rebooting stuff cures all your problems! [I'm being ironic here, by the way. I've never seen a reboot fix an IP conflict – well, it does, but only temporarily!!!]
  • Adding & removing Windows Hyper-V 2012 hosts must be undertaken with caution. I know getting out of the car, getting back in again – and then trying the engine – is a popular fix. But every time you do that, your server can lose its "Logical Switch" configuration, which then puts SCVMM in a tizz until you put it back again!
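For the agent updates flagged above, the VMM shell can do the rounds for you. I'd treat this parameter set as an assumption and test it against a single host before letting it loose on the lot:

# Push an agent update to every managed computer (the credential is whatever admin account you use)
$cred = Get-Credential
Get-SCVMMManagedComputer | ForEach-Object {
    Update-SCVMMManagedComputer -VMMManagedComputer $_ -Credential $cred
}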

Edited Low-lights: Windows Hyper-V 2012 R2 Upgrade

Of course upgrading my Windows Hyper-V 2012 cluster was a bit of a worry to me. Should I upgrade the Windows Hyper-V server in the cluster, or take it out? Should I do a clean install of R2 and then try "registering" the VMs again? Like anybody I read the documentation, but also searched around the blogosphere as well. Particularly helpful was this blogpost:

http://camie.dyndns.org/technical/hyper-v-2012-to-2012-r2/

Essentially, the debate as ever centres on what works best – a clean install or an upgrade? As ever you should never base your opinion on a single blogpost – EVEN MINE! 🙂

But I do value direct real-world experience over what vendors tell us. It's clearly in the interest of all vendors to cast the best light on the process of moving from one version to another – because it improves the likelihood of selling additional licenses and reduces the costs of supporting legacy editions. I'm of the opinion that despite what vendors say, customers should upgrade at the frequency that suits their business cycle. I'm also convinced that the best and easiest way of doing that is making the underlying virtualization layer so thin and slim that upgrading or replacing it should be a relatively small undertaking. Anyway, this is what our man at camie.dyndns.org had to say:

[Screenshot 09]

It's also clear that a Windows Hyper-V 2012 R2 host cannot reside in a Windows Hyper-V 2012 cluster. Whether you like it or not you're forced to take one or more Windows Hyper-V hosts, perform an upgrade, and then create a new cluster (of just one or more nodes) – and then migrate everything across that way.

I was beginning to see why this guy had written:

“Having gone through the process, ultimately there is not a great time saving benefit in upgrading the cluster vs. performing a clean install – and it is obviously tidier.”

But it seems such a shame to walk away from a configuration that has taken me so much time to set up, especially with the use of both Hyper-V Replica and Azure Hyper-V Recovery Manager being dependent on it.

So it was with a deep breath that I decided to jump in with both feet and have a go myself. Perhaps this guy had a bad experience, and things would go smoother for me. I figured if the upgrade went pear-shaped, I could always go through some hoop-jumping or rebuild the Windows Hyper-V hosts altogether.

Windows Hyper-V Failover Node Eviction

The first step was to free up a node in the cluster to be the target for a Windows Hyper-V 2012 R2 "upgrade". I used maintenance mode to evacuate the server of all the VMs. Beware that SCVMM doesn't give you a special icon to indicate a host is in maintenance mode.

Next, using Failover Cluster Manager, I needed to make sure that my selected server wasn't the cluster owner. For me that meant checking that the server I selected wasn't holding the Hyper-V Replica Broker role or owning the storage. I picked my lab17 host, but double-checked the configuration in Failover Cluster Manager.
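The same drain-and-check can be done with the Failover Clustering cmdlets. A minimal sketch – the node name is mine, swap in your own:

# Pause the node and drain its VMs off to the other nodes
Suspend-ClusterNode -Name "lab17" -Drain

# Confirm which node holds each clustered role, including the Hyper-V Replica Broker
Get-ClusterGroup | Format-Table Name, OwnerNode, State

# Confirm which node owns the Cluster Shared Volumes
Get-ClusterSharedVolume | Format-Table Name, OwnerNode, State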

[Screenshot 10]
Here in the "Roles" view we can see server LAB17 is paused, and that the Hyper-V Replica Broker is running on lab18.
[Screenshot 11]
Here in the "Disks" view we can see that server LAB19 is the owner of the storage.

From SCVMM I could see that LAB17 was completely empty of VMs (i.e. there were no stubborn locally stored VMs left behind). So next I had to remove the server from the Failover Cluster. This is called an "eviction" process – think of it like a contestant being removed from the Big Brother house. You must first stop the cluster service before the eviction option lights up.
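For reference, here's the eviction from PowerShell rather than the Failover Cluster Manager GUI – again using my node name:

# Stop the cluster service on the node, then evict it from the cluster
Stop-ClusterNode -Name "lab17"
Remove-ClusterNode -Name "lab17" -Force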

[Screenshot 12]

Note: This eviction process does produce bogus alarms/warnings on the cluster – telling the system administrator that a node has been removed. As in The Hitchhiker's Guide to the Galaxy, it's important not to panic!

[Screenshot 13]

Various guides strongly recommend capturing the network card configuration before the upgrade, as this appears to be clobbered by the R2 install. So I took a screen grab and also an ipconfig /all.
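A quick way to capture all of that to a file before anything gets wiped – the output path is just my choice:

# Dump the classic view plus the richer PowerShell view of the NIC config to a file
ipconfig /all > C:\pre-upgrade-nics.txt
Get-NetIPConfiguration -Detailed | Out-File -Append C:\pre-upgrade-nics.txt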

[Screenshot 14]

[Screenshot 15]

I would recommend at this stage removing the Windows Hyper-V Server from SCVMM. SCVMM seems to have a problem cleaning out stale records and entries from Failover Clustering. It's easier to remove the server while it's still working – rather than wiping the system, and then trying to remove it. Otherwise SCVMM can endlessly look for a Windows Hyper-V Server that just doesn't exist anymore. This is particularly true if you opt for the "clean install" approach.

Upgrading Windows Hyper-V 2012 R2

Note: Before you begin, be ready with your Windows 2012 key. Unlike Windows 2008, 2012 does demand a valid key to install or upgrade.

In my case I copied the R2 DVD .iso to a local partition on the server, and mounted it. Given what I'd heard about networking I was a bit concerned about losing connection to the media during the upgrade itself. The upgrade was attempted remotely over an RDP session, but I did have access to an HP iLO if anything went pear-shaped. So it was with a deep breath that I double-clicked "setup.exe".

[Screenshot 16]

Despite having patched the box to the hilt, it was recommended to give Windows Update another crack at the cherry.

[Screenshot 17]

Once this completes, setup will restart, and you will be asked to supply a license key and choose which edition of Windows you want to use. After this you will get the option to upgrade.

[Screenshot 18]

However, despite evicting the server from the cluster, the upgrade failed, claiming the server was still part of a cluster.

[Screenshot 19]

Checking back through the blogposts, I noticed that simply removing a server from a cluster isn't enough. There's a "clean-up" process that needs to take place using PowerShell:

Clear-ClusterNode -Name "lab17" -Cluster "gold" -Force

Sure enough this step did the trick. Restarting the setup, I was left with just the vanilla warning about checking third-party app vendors' support for R2. Gee, I thought I was dealing with a hypervisor, not an operating system. Just sayin'.

[Screenshot 20]

One thing I would say about this "eviction" process is that it isn't as "clean" as you might think. I found that even when a Windows Hyper-V node has been evicted from the cluster, it is still referenced in SCVMM.

Whilst this upgrade was proceeding I decided to make a start on the Windows Hyper-V servers in my "New Jersey" location. I discovered through digging around that the server "LAB14" wasn't running the Hyper-V Replica role and wasn't the owner. Sadly, I was less successful with this server – despite using the same DVD ISO I kept on getting this error message:

[Screenshot 21]

I couldn't work out why the previous server had upgraded without a problem, but this second server would not. At the root of this is Microsoft's policy around upgrades as they pertain to evaluations:

http://technet.microsoft.com/en-us/library/jj574204.aspx

It's rather convoluted, but technically they don't support upgrades of evaluation editions. They do support you paying for the product, and then upgrading from that position. This rather blew a massive hole in my attempt to try out the upgrade. Somehow I'd stumbled upon an "evaluation" edition of Windows Server 2012 rather than the full retail release. After some googling I came across a utility called DISM (the Deployment Image Servicing and Management tool) that can be used to change editions – converting an evaluation into a licensed edition – assuming you have the right keys, like mine from TechNet.
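For the record, this is the DISM dance I mean – the product key below is purely a placeholder, substitute your own TechNet/retail key:

# Check what edition you're on, and what it can become
dism /online /Get-CurrentEdition
dism /online /Get-TargetEditions

# Convert the evaluation to a full edition with a valid key (placeholder shown)
dism /online /Set-Edition:ServerStandard /ProductKey:XXXXX-XXXXX-XXXXX-XXXXX-XXXXX /AcceptEula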

[Screenshot 22]

Post-Upgrade Tasks

Folks were right about the upgrade's effect on the network. Despite the Windows Hyper-V Server being in SCVMM the whole time, the Logical Switch configuration was blown away.

[Screenshot 23]

[Screenshot 24]

This is clearly disappointing given the effort made to put together a Logical Network. However, it didn't take long to update the configuration, adding in the Logical Networks under "Hardware" and the Logical Switch under "Virtual Switches".

As I was going to need another cluster, I arranged for two additional LUNs to be presented to the R2 host. These are disks 4/5 in my configuration. This is because the other LUNs are owned by my first cluster, and cannot be used by the second cluster.

[Screenshot 25]

These two disks needed to be brought online and initialised before creating the cluster.
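Scripted, that step looks something like this – the disk numbers match my configuration, so check the Get-Disk output before running anything:

# Identify the new LUNs, then bring one online and initialise it (repeat for the second disk)
Get-Disk | Format-Table Number, FriendlyName, Size, IsOffline

Set-Disk -Number 4 -IsOffline $false
Set-Disk -Number 4 -IsReadOnly $false
Initialize-Disk -Number 4 -PartitionStyle GPT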

I'm sorry to say that, not for the first time, trying to create a cluster using SCVMM did not work, and I was forced to use Failover Cluster Manager instead to diagnose why. I also had to remove the Windows Hyper-V host (because the cluster refused to appear in SCVMM) and install the SCVMM agent manually on the host. Without it, the cluster had the Windows Hyper-V server stuck in a "pending" state because of a version mismatch with SCVMM. In the end I uninstalled all the VMM agents, rebooted the Windows Hyper-V host and re-installed it again.
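If SCVMM won't play ball, the cluster can be created directly with the Failover Clustering cmdlets and added to SCVMM afterwards. A sketch with a hypothetical cluster name and address:

# Validate the node and create a one-node cluster outside of SCVMM
Test-Cluster -Node "lab17"
New-Cluster -Name "silver" -Node "lab17" -StaticAddress "192.168.3.99"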

[Screenshots 26 & 27]

I did eventually get this resolved. But all this removing and re-adding of Windows Hyper-V Servers had mucked up my networking. So it took me a while to put that configuration back.

My only outstanding issue was an error on the second cluster – which incidentally contained just one Windows Hyper-V node. It was reporting as being "over-committed". That's understandable: the default "Cluster Reserve" keeps back a node's worth of failover capacity, which a cluster of one server can never offer (a cluster of one server isn't really a cluster). So I did need to go into the cluster settings within SCVMM and change the "Cluster Reserve" value to 0. If I hadn't made this reconfiguration the host would never have passed any of the validation tests used when creating or moving a VM.
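That tweak can also be made from the VMM shell. A short sketch, assuming a hypothetical cluster name of "silver":

# Set the cluster reserve to zero so a one-node cluster passes placement checks
$cluster = Get-SCVMHostCluster -Name "silver"
Set-SCVMHostCluster -VMHostCluster $cluster -ClusterReserve 0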

[Screenshot 28]

Now I had two clusters running – one of Windows Hyper-V 2012 and the other composed of Windows Hyper-V 2012 R2 – and it was time to begin the process of evacuating the old cluster of its VMs. The trouble was every time I went to right-click a VM and select Migrate, the SCVMM console would crash.

[Screenshot 30]

It was time to hope that our old friend "reboot" would fix this issue.

Reader, the reboot fixed my problem.

Migrate – At Last

Believe it or not it's 9.12pm at night, and I've been at this all day. But I have finally managed to get two clusters – one of Windows Hyper-V 2012 and the other of Windows Hyper-V 2012 R2 – in the same management plane with the network working. And I was able to take a test machine and migrate it from the old cluster to the new. You should know that this is a unidirectional move: you're allowed, with a powered-on VM, to move from old to new, but not from new to old. This is despite the fact that the VM is still the same "Gen1" legacy VM. The precise error looks like this:

[Screenshot 31]

So I was rather pleased that it was just a test VM I'd moved, rather than one of my VMs using Windows Hyper-V Replica. To move this test VM back to the old cluster I had to power it off and do a cold migration. That said, the move failed, resulted in a failed job, and required a 'repair' process and a second attempt. I never did get my test VM to go back to the old cluster – so the easy way to fix the issue was to delete the VM. It was only a test VM after all.
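Incidentally, the old-to-new move itself can be scripted from the VMM shell. A sketch with hypothetical VM, host and path names – and I'd expect the reverse direction to fail from here just as it does in the console:

# Migrate a VM to a host in the new R2 cluster (names and path are examples)
$vm = Get-SCVirtualMachine -Name "testvm01"
$targetHost = Get-SCVMHost -ComputerName "lab17"
Move-SCVirtualMachine -VM $vm -VMHost $targetHost -Path "C:\ClusterStorage\Volume3"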

Deleting stuff always solves problems. 🙂

The other issue I had was a VM that wouldn't move to the new cluster because the Windows Hyper-V Server was in maintenance mode.

[Screenshot 32]

Simple, you might think: take the host out of maintenance mode, and job done. Sadly, that was not working for me. This was a bit of a Catch-22. I couldn't get the host out of maintenance mode, because SCVMM couldn't manage the host.

[Screenshot 33]

I did try the command mentioned in the dialog box concerning "winrm". Running this command did fix that particular error. However, it merely exposed another problem – the host still wasn't manageable.
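In my case the dialog pointed at WinRM. The usual first aid (assuming the same class of error) is the quick-config on the host, plus a connectivity test from the SCVMM server:

# On the troublesome host: re-register the WinRM service and listener defaults
winrm quickconfig

# From the SCVMM server: confirm WS-Man is answering on the host
Test-WSMan -ComputerName "lab14"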

[Screenshot 34]

I will fess up here. The server LAB14 wasn't upgraded to Windows Server Hyper-V R2 – it was cleanly installed. So technically this isn't an upgrade issue as such, although it is being added to an upgraded SCVMM. A quick look at the status of the server showed there was a problem.

[Screenshot 35]

What's not clear to me is why SCVMM R2 believes a cleanly installed Windows Hyper-V R2 server is using an out-of-date agent version. Why this happened still isn't clear to me. Looking at "Programs and Features" on LAB14 indicated no agent software had been installed. I found that an install of the Local Agent from the SCVMM DVD .iso fixed this issue…

This then exposed yet another problem. I was beginning to see a pattern here. As I worked to resolve one error, I was merely confronted by another – as if each problem merely disguised the next. I was left wondering where this would end, as there was no visibility of these new problems until I'd resolved the previous ten. This time around the issue appeared to be that the VM was protected by Hyper-V Replica. I was trying to move a 40GB dynamic disk (7.7GB in actual size) to a 200GB Cluster Shared Volume, but SCVMM was saying I didn't have enough space to carry out that task. In a separate window the system was bellyaching about the Hyper-V Replica.

[Screenshots 36 & 37]

I decided to see if turning off Hyper-V Replica on one of my VMs would resolve the issue. Sadly, my upgrade of SCVMM to the R2 release had broken/removed the option to do this from the Azure Recovery Manager "provider" – so I was forced to do this via Hyper-V Manager. I put a re-install of the provider software at a later date on my checklist.
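Hyper-V Manager's "Remove Replication" has a PowerShell equivalent in the Hyper-V module, which is handy if you're doing this for several VMs. A sketch with a hypothetical VM name:

# On the primary server, stop replicating the VM (tidy up on the Replica server too)
Remove-VMReplication -VMName "testvm01"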

Anyway, it once again turned out that the install of the Local Agent hadn't been successful, and that SCVMM was not able to interrogate the server's capabilities – so I ended up re-installing the SCVMM Local Agent and rebooting for what felt like the Nth time, only to discover I could carry out the migration at last. It was now day two, and lunchtime.

My next step was to migrate the VMs out of one cluster into the other. This is essentially a copy process. As VMs are copied from the Cluster Shared Volume of one Failover Cluster to a different Cluster Shared Volume in a different cluster – even with "SAN-assisted" accelerations and a 2Gb fibre-channel connection – this was going to take some time. The theory is to eventually evacuate the legacy cluster, and then remove it. That does pre-suppose you have the spare storage and compute capacity to carry out the process. The other assumption is that every migration will be successful, and not fail and then require the notorious "repair" function. For instance, I had two VMs moved to the new cluster which stubbornly refused to repair, and are in a perpetual "Unsupported Cluster Configuration" state.

[Screenshot 38]

No amount of "repairing" fixed the issue, despite the jobs view telling me that's what I needed to do. In the end I deleted the VMs. Life's too short for this. It was 1.44pm.

Well, at least all the VMs were on the R2 cluster, and I was able to begin the process of upgrading the other host. As this was the last node in the cluster, it couldn't be evicted. Instead the cluster needed to be shut down….
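For that last node the teardown is a cluster destroy, rather than an eviction. A sketch, to be run only once the cluster is truly empty:

# Destroy the now-empty cluster (this stops clustering on the last node; -CleanupAD tidies its AD objects)
Get-Cluster -Name "gold" | Remove-Cluster -CleanupAD -Force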

Conclusions

I did eventually end up with upgraded R2 clusters for both of my SCVMMs. The trouble is they were in such a pigging mess, they weren't much use to me. I wanted to reconfigure my networking, but I couldn't. The VM Network had too many references to VMs that simply didn't exist. I could not delete my VM Network because of these orphaned objects – and of course I couldn't delete the zombie VMs because they didn't appear in SCVMM anymore! It was the ultimate Catch-22: being asked to delete objects that don't exist anymore.

I know VMware, like any software company, has had its own upgrade issues in the past. After all, there isn't an ISV out there with a completely unblemished record. But I do feel that the procedure for getting from 2012 to 2012 R2 is overly convoluted. I mean, come on, this is within the same release family! With VMware all I would do is upgrade vCenter, and then use something like VMware Update Manager to upgrade the hosts. Maintenance mode would evacuate the host, and the VMware ESXi host would remain in the cluster. There'd be none of this needing a second cluster to facilitate the migration.

Whilst an upgrade of SCVMM to the R2 release is painless enough, the knock-on effect is felt on the Windows Hyper-V 2012 R2 servers. As for Windows Hyper-V R2 itself, I agree with my fellow blogger: upgrades are a waste of time. You might as well re-install. The easiest way would be to have enough spare capacity to build a new cluster. Then at least you could avoid having to power down the VMs. If customers don't possess this spare capacity, they will have to do what I did: gradually move across bundles of VMs, until their legacy cluster is empty.

Will I be using my upgraded environment in future? Probably not, as I don't have confidence in it – "is it me or is it the upgrade?" is the question I'd be asking myself all the time. So this was interesting as an experiment.

At the beginning of this article I talked about endlessly being asked "what works best, an upgrade or a clean install?". I think when it comes to Microsoft we all know the answer to that one by now, don't we? This is something I knew already, to tell you the truth, but I wanted to be able to make this statement based on fresh first-hand experience, rather than previous painful ones. It's almost as if this question of upgrade vs. install was forged in the Microsoft era within which I grew up. It's sort of become a mantra of our industry. But I feel strongly that such a status quo cannot continue. With our multi-tiered virtualization infrastructure and its innumerable inter-dependencies, the whole idea of a ground-zero approach or a flaky upgrade process must stop. Our world's too complicated to have to rebuild stuff every two or three years.

Having gone through the painful process of an upgrade, I think it's likely that Microsoft's long-suffering customers would be more inclined to run legacy and R2 side by side in separate management domains – gradually winding down, and repurposing the old kit when appropriate. That is a classic corporate strategy when faced with upgrade/migration paths that resemble climbing Mount Everest.