EVO:RAIL – Under The Covers – Networking (Part 3)
This is my third and final blog post about the networking side of the configuration of EVO:RAIL. In this blog post I want to talk about the current configuration of the networking from a vSphere Virtual Switch perspective. But before I get into that here is a reminder of the physical world. Each EVO:RAIL node (there are four of them per appliance) has two 10Gbps network cards which are presented as vmnic0 and vmnic1.
In the diagram below you can see that vmnic0 of node01/02/03/04 is being patched to one physical switch, and vmnic1 is patched to a second – with both switches linked by ISL interfaces for switch-to-switch traffic. This ensures network availability to the appliance.
So how are these physical interfaces used by vSphere once EVO:RAIL has done its work of configuring the appliance itself? Firstly, EVO:RAIL 1.x uses the vSphere Standard Switch. To be more specific, it uses vSwitch0, which is built into the VMware ESXi hypervisor, and patches both “vmnic0” and “vmnic1” to that Standard Switch. This means any traffic on vSwitch0 has network fault tolerance and redundancy.
You can see this configuration on any of the four nodes that make up an EVO:RAIL system from the Web Client. If you select an ESXi host in the Inventory, select the Manage tab and Networking – you can see the vSwitch0 in the interface and see that vmnic0 and vmnic1 are patched to it.
The “Virtual Machine” portgroups called “Staging”, “Development” and “Production” were created from the values the customer supplied in the EVO:RAIL Configuration UI. The other “vmkernel” portgroups you see, such as “Virtual SAN” and so on, are system generated by the EVO:RAIL Configuration Engine. Of course, customers are free to add extra virtual machine portgroups as they define new VLANs on the physical switch. EVO:RAIL does support being configured for external IP storage such as NFS or iSCSI using the standard vSphere Clients. But it’s perhaps best not to change the settings of the system generated “vmkernel” portgroups unless you are very experienced with vSphere, as casual changes there could cause problems if they aren’t properly thought through.
It’s worth mentioning that the configuration of the Standard vSwitch0 doesn’t end there, and that per-portgroup settings are applied as well. Essentially, what happens is that the vCenter Server Network, vSphere vMotion, Management Network and MARVIN Management networks are pegged to use vmnic0 as their “active” adapter, with vmnic1 set to be “standby”. You can view this configuration by selecting one of these portgroups, and clicking the ‘pencil icon’ to access the portgroup settings dialog box. In the screen grab below I opened the settings for the vSphere vMotion vmkernel portgroup, and selected the “Teaming and failover” section. Here you can see that per-portgroup settings have been used to peg the vMotion process to vmnic0. This means when all things are good the traffic prefers to use vmnic0. vMotion traffic would only traverse the vmnic1 interface if vmnic0 failed, or if the physical switch that vmnic0 is attached to failed.
In contrast the “Virtual SAN” vmkernel portgroup has the reverse configuration – such that vmnic1 is its preferred/active interface, and vmnic0 is the standby.
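To make the active/standby behaviour easier to reason about, here is a minimal Python sketch of the teaming order described above. It is only a model for illustration: the `TEAMING` table and the `effective_uplink` function are hypothetical names of my own, not part of any EVO:RAIL or vSphere API, and a physical switch failure is treated as the failure of the vmnic patched to it.

```python
# Illustrative model only: this does not talk to ESXi or vCenter.
# Each portgroup maps to (active uplink, standby uplink) as described in the post.
TEAMING = {
    "vCenter Server Network": ("vmnic0", "vmnic1"),
    "vSphere vMotion": ("vmnic0", "vmnic1"),
    "Management Network": ("vmnic0", "vmnic1"),
    "MARVIN Management": ("vmnic0", "vmnic1"),
    "Virtual SAN": ("vmnic1", "vmnic0"),
}


def effective_uplink(portgroup, failed=frozenset()):
    """Return the uplink that carries the portgroup's traffic.

    `failed` is the set of uplinks that are down (NIC failure, or failure
    of the switch the NIC is patched to). Returns None if both are down.
    """
    active, standby = TEAMING[portgroup]
    for nic in (active, standby):
        if nic not in failed:
            return nic
    return None


if __name__ == "__main__":
    # Walk through the healthy case and each single-uplink failure.
    for failed in (set(), {"vmnic0"}, {"vmnic1"}):
        print(f"failed uplinks: {sorted(failed) or 'none'}")
        for portgroup in TEAMING:
            print(f"  {portgroup}: {effective_uplink(portgroup, failed)}")
```

In the healthy case only “Virtual SAN” lands on vmnic1; with either uplink marked as failed, every portgroup ends up on the surviving NIC.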
Clearly, there are a couple of expected outcomes from this style of configuration. Firstly, Virtual SAN (or VSAN if you prefer) has a dedicated 10Gbps NIC assigned to it that it does not share with any other process within the vSphere stack. This means it has exclusive access to all the bandwidth that NIC can provide, on a separate physical switch from other traffic. The only time that all the traffic can be on the same NIC is if there is a NIC failure or switch failure.
Secondly, as the vMotion, Management and Virtual Machine traffic would normally reside on vmnic0, some care has to be taken to make sure that vMotion itself doesn’t ‘tread on the toes’ of other traffic types. This is something the EVO:RAIL engine takes care of automatically for you. The EVO:RAIL Configuration engine will cap the vMotion process at 4Gbps of bandwidth.
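As a rough back-of-the-envelope illustration of what a 4Gbps cap on a 10Gbps NIC means, the Python sketch below works out the bandwidth left over for other traffic and an idealised lower bound on how long it takes to move a given amount of data at the cap. The function names and the 16GB example are my own assumptions, and the calculation ignores protocol overhead and memory that changes during a migration, so treat the numbers as a floor rather than a prediction.

```python
# Back-of-the-envelope arithmetic only; how the cap is enforced is not modelled.
LINK_GBPS = 10        # one 10Gbps NIC (vmnic0)
VMOTION_CAP_GBPS = 4  # vMotion cap described in the post


def headroom_gbps(link_gbps=LINK_GBPS, cap_gbps=VMOTION_CAP_GBPS):
    """Bandwidth left for other traffic while vMotion runs at its cap."""
    return link_gbps - cap_gbps


def min_transfer_seconds(data_gigabytes, rate_gbps=VMOTION_CAP_GBPS):
    """Idealised lower bound: gigabytes * 8 bits per byte / gigabits per second."""
    return data_gigabytes * 8 / rate_gbps


if __name__ == "__main__":
    print(f"Headroom on vmnic0 during vMotion: {headroom_gbps()} Gbps")
    # Hypothetical example: a VM with 16GB of memory to copy.
    print(f"Minimum time to copy 16GB at the cap: {min_transfer_seconds(16):.0f} s")
```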
Finally, you might be interested to know the role and function of the “MARVIN Management” portgroup. In case you don’t know “MARVIN” was the original project name for EVO:RAIL prior to GA. I suspect that over time we will be replacing this name with the official name of EVO:RAIL. As you can see we have two management networks. The portgroup called “Management Network” holds the customer’s static IP address for each ESXi host in the cluster. This is the primary management port. You could consider the “MARVIN Management” as a type of “Appliance Network”, a network that is used internally by the EVO:RAIL engine for its own internal communications. This means that, should a customer tamper with their own “Management Network”, internal EVO:RAIL processes continue. In an environment where there is no DHCP server residing on the management VLAN, you would expect to see this “MARVIN Management” portgroup default to a 169.254.x.y address. However, if there were a DHCP Server running on the default management network then it would pick up an IPv4 address from it.
Note: In this case as there was no DHCP server running on the network the ESXi host was assigned a ‘link local’ or ‘auto-IP’ IP address. This isn’t a problem. EVO:RAIL uses the VMware Loudmouth service to discover EVO:RAIL nodes on the network. The EVO:RAIL engine will take care of configuring the host and vCenter with the customer IP configuration.
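If you want to check quickly whether an address on the “MARVIN Management” portgroup came from DHCP or is an auto-IP address, the 169.254.x.y range is the standard IPv4 link-local block (169.254.0.0/16). A small sketch using Python’s standard `ipaddress` module follows; the two sample addresses are made up for the example.

```python
import ipaddress


def is_link_local(address: str) -> bool:
    """True if the address falls in a link-local range such as 169.254.0.0/16."""
    return ipaddress.ip_address(address).is_link_local


if __name__ == "__main__":
    # Hypothetical sample addresses: one auto-IP style, one DHCP style.
    for address in ("169.254.10.20", "192.168.10.50"):
        kind = "link-local (no DHCP lease)" if is_link_local(address) else "not link-local"
        print(f"{address}: {kind}")
```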
So to conclude – EVO:RAIL 1.x currently uses Standard Switches in vSphere for networking. All traffic is pegged to vmnic0, except for the VSAN network, which gets a dedicated 10Gbps NIC (vmnic1). Customers are free to add additional portgroups for the virtual machine network, but it’s perhaps wise to leave the ‘system generated’ vmkernel ports alone. EVO:RAIL has two management networks – one using the static IP pool provided by the customer, and a second which is used by the appliance itself.