In my previous blog post I walked you through what happens when you add an additional EVO:RAIL appliance to an existing deployment or cluster. Now I want to look at the next important workflow, which relates to serviceability. There are a number of scenarios that need to be considered from a hardware perspective, including:

  • Replacing an entire node within the EVO:RAIL appliance
  • Replacing a boot disk
  • Replacing an HDD or SSD used by VSAN
  • Replacing a failed NIC

There are surprisingly few circumstances that would trigger the replacement of an entire node. They usually fall into the category of a failed CPU, memory or motherboard. It’s perhaps worth stating that each of our Qualified EVO:RAIL Partners (QEPs) handles these sorts of failures in line with how they handle them for their other hardware offerings. For instance, one partner might prefer to replace a failed motherboard, whereas another will find it easier to ship a replacement node altogether. That’s the subject of this blog post – the scenario where an entire node is replaced by the QEP.

As you might know from my previous post, every EVO:RAIL appliance has its own unique appliance ID, say MAR12345604, and every node within that appliance has its own node ID expressed as a dash and a number, for instance -01, -02, -03 and -04. Combined, the appliance ID and node ID form a globally unique identifier for that node on the network. These values are stored in the “AssetTag” portion of each node’s system BIOS settings, and are generated and assigned at the factory.
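Just to make that identifier format concrete, here’s a tiny Python sketch – my own illustration, not anything that ships with EVO:RAIL – that splits an AssetTag value such as MAR12345604-03 into its two halves and checks whether two tags belong to the same appliance:

```python
from typing import NamedTuple

class EvoRailNodeIdentity(NamedTuple):
    appliance_id: str   # e.g. "MAR12345604" - shared by all four nodes
    node_id: str        # e.g. "-03" - unique within the appliance

def parse_asset_tag(asset_tag: str) -> EvoRailNodeIdentity:
    """Split a BIOS AssetTag value such as 'MAR12345604-03' into its parts."""
    appliance_id, _, node_number = asset_tag.rpartition("-")
    return EvoRailNodeIdentity(appliance_id, f"-{node_number}")

def same_appliance(tag_a: str, tag_b: str) -> bool:
    """True when two asset tags belong to the same EVO:RAIL appliance."""
    return parse_asset_tag(tag_a).appliance_id == parse_asset_tag(tag_b).appliance_id

# The failed node and its factory-built replacement share the same identity.
print(parse_asset_tag("MAR12345604-03"))                    # ('MAR12345604', '-03')
print(same_appliance("MAR12345604-03", "MAR12345604-01"))   # True
```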

So if, for instance, node03 died and had an identity of MAR12345604-03, then a replacement node would be built at the factory and shipped to the customer with the same ID. The old node would be removed and dumped in the trash, and the new node would be slotted into its place and powered on for the first time. At this point a little EVO:RAIL magic takes place. When the replacement node powers on for the first time, it advertises itself on the network using the “VMware Loudmouth” daemon. This advertisement is picked up by the existing EVO:RAIL appliance, which recognizes, firstly, that the node should be part of the same appliance because it has a matching appliance ID, and secondly, that it is there specifically to replace the failed node.
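Loudmouth is VMware’s Zero Network Configuration (zeroconf) based discovery service, so you can picture the discovery side of this roughly like the Python sketch below, which uses the python-zeroconf library. To be clear, the service type and the TXT-record key are stand-ins I’ve invented for illustration – they are not the real Loudmouth definitions:

```python
from zeroconf import Zeroconf, ServiceBrowser, ServiceListener

EXISTING_APPLIANCE_ID = "MAR12345604"   # the appliance already deployed

class ReplacementNodeListener(ServiceListener):
    def add_service(self, zc, type_, name):
        info = zc.get_service_info(type_, name)
        if info is None:
            return
        # Hypothetical TXT record carrying the node's BIOS AssetTag.
        asset_tag = info.properties.get(b"asset_tag", b"").decode()
        appliance_id, _, node_id = asset_tag.rpartition("-")
        if appliance_id == EXISTING_APPLIANCE_ID:
            print(f"Node -{node_id} has a matching appliance ID - "
                  "offer the 'Add EVO:RAIL Node' workflow.")

    def remove_service(self, zc, type_, name):   # required by the interface
        pass

    def update_service(self, zc, type_, name):
        pass

zc = Zeroconf()
# "_evorail._tcp.local." is a made-up stand-in, not the actual service type.
browser = ServiceBrowser(zc, "_evorail._tcp.local.", ReplacementNodeListener())
```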

In the EVO:RAIL UI this appears as an “Add EVO:RAIL Node” pop-up message – indicating that a node was “serviced” and can be “replaced”.

[Screenshot: the “Add EVO:RAIL Node” pop-up in the EVO:RAIL UI]

The steps taken by this workflow are similar to, but not the same as, those for adding an additional appliance to an existing cluster:

  1. Check Settings
  2. Unregister conflicting ESXi host from vCenter Server
  3. Delete System VMs from replacement server
  4. Place ESXi hosts into maintenance mode
  5. Set up management network on hosts
  6. Configure NTP Settings
  7. Configure Syslog Settings
  8. Delete Default port groups on ESXi host
  9. Disable Virtual SAN on ESXi host
  10. Register ESXi hosts to vCenter
  11. Setup NIC Bonding on ESXi host
  12. Setup FQDN on ESXi host
  13. Setup Virtual SAN, vSphere vMotion and VM Networks on ESXi host
  14. Setup DNS
  15. Restart Loudmouth on ESXi host
  16. Setup clustering for ESXi host
  17. Configure root password on ESXi host
  18. Exit maintenance mode on the ESXi host

Once again, there’s a key step worth calling out – Step 2. One process that the “Add EVO:RAIL Node” workflow automates (amongst many others!) is clearing out dead, stale and orphaned references in vCenter to ESXi hosts that have shuffled off this mortal coil.
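If you were to do that cleanup by hand against the vSphere API, it would look something along the lines of the pyVmomi sketch below. I should stress this is just my illustration of the idea – it is not the EVO:RAIL engine’s own code, and the hostnames and credentials are placeholders:

```python
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

def remove_stale_host(si, hostname):
    """Find a dead ESXi host reference by name and drop it from the inventory."""
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True)
    stale = next((h for h in view.view if h.name == hostname), None)
    view.Destroy()
    if stale is None or stale.runtime.connectionState != "notResponding":
        return   # only clean up hosts that really are dead/orphaned
    # A real workflow would disconnect the host first and wait on each task;
    # error handling is omitted to keep the sketch short.
    stale.Destroy_Task()   # remove the orphaned host object from vCenter

ctx = ssl._create_unverified_context()
si = SmartConnect(host="vcsa.example.local",
                  user="administrator@vsphere.local",
                  pwd="********", sslContext=ctx)
remove_stale_host(si, "mar12345604-03.example.local")
Disconnect(si)
```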

That might leave you with one lingering question: given that the replacement node has the same appliance ID, how does the EVO:RAIL engine “know” that this is a replacement node? The answer is that before the “Add EVO:RAIL Node” pop-up appears, the node reports its configuration to the core EVO:RAIL engine running inside the vCenter Server Appliance (vCSA). The engine inspects the node to check that it is blank and carries only a generic factory specification.
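Conceptually, the decision boils down to something like the short sketch below. Again, this is purely illustrative – the fields the node reports here are hypothetical stand-ins, not the engine’s real data model:

```python
def classify_discovered_node(reported: dict, existing_appliance_id: str) -> str:
    """Illustrative only: decide how to treat a newly discovered node."""
    appliance_id, _, _ = reported["asset_tag"].rpartition("-")
    if appliance_id != existing_appliance_id:
        return "different appliance - ignore"
    # Hypothetical self-reported state: a genuine replacement still looks
    # factory-fresh, with no configuration or workloads of its own.
    factory_fresh = (not reported.get("configured")
                     and not reported.get("virtual_machines"))
    return "replacement node" if factory_fresh else "conflicting node"

print(classify_discovered_node(
    {"asset_tag": "MAR12345604-03", "configured": False, "virtual_machines": []},
    "MAR12345604"))   # -> "replacement node"
```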

If you want to experience this process of adding a replacement EVO:RAIL node first-hand, don’t forget that our hands-on lab now showcases it. Check out the HOL at:

HOL-SDC-1428 – Introduction to EVO:RAIL