3-layers-vCD-vSphere-Physical-NetworkPools.png

I’ve been reading a lot about network pools in vCloud Director. There’s still so much to learn, but I feel I’m gradually making headway.

There are a lot of attributes, requirements, best practices and so on to keep in mind. So in this blogpost (mainly for my own benefit, to be honest) I want to try and sum those up, and also at the end share my thoughts about how and when to use them – in an attempt to relate the technologies to how they might be adopted given the constraints of the real world.

The first thing to get clear from the get-go is which object in vCD owns which network resource:

A Provider vDC:

Owns the physical resources of CPU/Memory and Storage – and it also “owns” the external networks that exist. External Networks give the Organizations access to the Internet, to the wider corporate network, or possibly enough network access to allow a VM to mount storage directly in the guest – such as installing/configuring the NFS/iSCSI stack in the VM to give it “direct” access to an NFS export or iSCSI target. These external networks are, to a great degree, relatively static, and to set them up you must first manually create a portgroup on the DvSwitch, most likely with some kind of VLAN tagging to allow this to take place. That’s why whenever you create a new Provider vDC in vCloud Director it shows you what external networks are available to it…

http://communities.vmware.com/servlet/JiveServlet/showImage/38-16806-23923/Screen+Shot+2012-11-01+at+17.18.37.png
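Incidentally, that upfront portgroup creation lends itself to a bit of scripting. Here’s a minimal PowerCLI sketch (assuming PowerCLI 5.1 or later with the Distributed vSwitch cmdlets) – the vCenter name, portgroup name and VLAN ID are placeholders for whatever your environment and network team dictate; the switch name is my lab’s:

Connect-VIServer -Server vcenter.corp.local        # hypothetical vCenter
$dvs = Get-VDSwitch -Name "Virtual DataCenter DvSwitch"
# Create a tagged portgroup, ready to be defined as an external network in vCD
New-VDPortgroup -VDSwitch $dvs -Name "External-Internet" -VlanId 100 -NumPorts 128

Once the portgroup exists, it can be offered up as an external network and will show up against the Provider vDC as above.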

An Organization vDC and vApps:
In contrast, Organization vDCs and their vApps get their networks from the network pool. These are much more “dynamic” allocations which have a high degree of automation (in the ideal cases). It means Organizations can create new networks on demand without having to trouble the network teams. Rather than making the network teams “redundant”, it actually frees them up for more interesting and useful work than keeping virtualization people happy. I think that is something that is often overlooked in the whole Software-Defined Networking/Storage debate. It isn’t so much about making network/storage people unemployed, but freeing/empowering them to do more interesting work than provisioning new networks or LUNs/Volumes…

The Rules and Regulations of Network Pools and Organization vDCs:

You can have many network pools, many Organizations and many Organization vDCs – however there are some regulations around those relationships. I see these relationships a bit like the old database relationships that I had to learn back in the 90’s when I was an “apps” guy. So Organization Virtual Datacenters are created from network pools, and can only be assigned to one pool at a time. Although it’s perfectly possible for one Organization that contains many Virtual Datacenters to share the same network pool. Indeed it’s possible for a vDC in one Organization to share the same network pool as a vDC in a different Organization. What cannot happen is for an Organization vDC to be attached to TWO network pools.

The Requirements & Advantages/Disadvantages of Different Network Pool Types:

Network Pools don’t exist in thin air. Fundamentally they allow a VM in a vApp to speak on the network. As such any network pool must be backed by a method that allows that access to occur. In vCD 1.5 there were 3 types of network resource (VLAN, VCD-NI, Portgroup) and vCD 5.1 adds a fourth called VXLAN. Each method comes with its own set of requirements – and these are often the source of the “disadvantages”. You could say “that’s life”. Any software has requirements – and often it falls to us to make sure they are met – otherwise it doesn’t work, OR it comes with a performance or reliability hit. It’s a matter of opinion whether a particular set of requirements adds up to be so difficult to meet that it makes one method more desirable than another. I’ve come across many cases in my IT career where a sub-optimal method was chosen simply because it was the easiest to get approval for. My attitude has always been: if the project deadline is such that you must do this – then do it – but at the same time raise the change-management request to re-configure to the more optimal method at some stage. Because you know one day you will rub up against the limits/downsides of the least optimal method. To be brief – sometimes you have to blow with the prevailing wind, but don’t let the temporary direction of the weather blow away your preferred destination… In many ways I see this as being similar to the wider history of the adoption of virtualization in the previous decade. I was told very often, by many people, that this virtualization thang was going to be corralled into the domain of test/dev and DR. It took virtualization some time to prove itself to the degree that now folks adopt a “virtualization first” policy for any net-new applications.

VLAN Backed Network Pools (Requires Distributed vSwitches)

These are network pools that are backed by pools of VLANs being configured on the physical switch(es). So what the tenant of the Organization vDC doesn’t see is that these pools of VLANs were created long before they arrived in the building. Often the portgroups on the Distributed vSwitch to access these networks aren’t created until they are needed, such as on the first power-on of a vApp. Unlike the other network backing methods, VLAN backed network pools come from a well-known method (the VLAN!) and are directly routable to other network devices (because they know what VLANs do). They have no special requirements such as multicast networking or changes to the MTU values of either the VM, the virtual switches or the physical switches. On the downside it means for every virtual network you have, you need an equal and corresponding VLAN. That could be seen as an “ask” of the team who manage the physical network – as we would need them to create potentially large pools of VLANs upfront – before we even need them. There’s also a limitation on the maximum number of VLANs physical switches support – in my case, the NetGear GS748T I have in my lab only supports 64 VLANs. That should be more than enough in my lab environment, but I doubt it would be enough in a large multi-tenancy environment. Of course the method used by vSphere to access these Physical-Virtual-LANs is VLAN trunking and 802.1Q tagging. I dare say if you have been using vSphere for a while this was already enabled some time ago…

One thing that has occurred to me about this method is the potential for admin error. So a naughty vCD admin could create network pools that point to VLANs that haven’t been created at the physical layer. You’re also very much dependent on the chaps at the physical and network layer getting all their ducks in a row – so the VLANs can be consumed on demand as and when needed. It makes me wonder how much testing and validation might need to happen beforehand to have 100% confidence that it will be all right on the night…
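One small mitigation – and this is just a sketch, not gospel – would be to periodically dump the VLAN IDs that the auto-created portgroups on the DvSwitch are actually using and cross-check them against the range the network team says is trunked. Something along these lines in PowerCLI, where the trunked range is a placeholder and the switch name is my lab’s:

$trunkedVlans = 100..163        # hypothetical range agreed with the network team
Get-VDSwitch -Name "Virtual DataCenter DvSwitch" | Get-VDPortgroup | ForEach-Object {
    $vlan = $_.VlanConfiguration            # e.g. "VLAN 105", or empty if untagged
    if ($vlan -and $vlan.VlanId -and ($trunkedVlans -notcontains $vlan.VlanId)) {
        Write-Warning "$($_.Name) uses VLAN $($vlan.VlanId) which isn't in the trunked range"
    }
}

It won’t prove the physical switch ports are configured correctly, but at least it flags the obvious mismatches.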

VCD-NI Backed (Requires Distributed vSwitches)

(Believe me, vCD-NI is easier to say than “vCloud Director Network Isolation Network Pools”)

This method creates many software-defined “networks” within a single wire. It’s a doddle to set up from a vCD perspective, and requires no massive pool of VLANs to be created upfront. The VCD-NI method uses what’s called “mac-in-mac” encapsulation to separate the networks – so as packets leave one vSphere host to arrive at another vSphere host (when one VM speaks to another VM) it adds 24 bytes to the Ethernet packet. To me this is not unlike the tag & untag process we do with VLANs at the moment. I guess another analogy is that this mac-in-mac tunnel is not unlike the VPN tunnels that get created from one site to another – in this case it’s a tunnel between two ESXi hosts. I know that’s a bit of a stretch as an analogy, but that’s the nature of analogies, isn’t it…? This encapsulation contains the source/destination MAC addresses of the two hosts as well as the source/destination of the VMs themselves. Once the packet arrives at the destination ESXi host the 24 bytes are stripped off. This process is handled by the vslad agent that is part of the vsla module in the VMkernel. Once VCD-NI backed networks are in use you can confirm the module is loaded on an ESXi host with the command:

esxcfg-module -l | grep vsla

Once you know how VCD-NI backed networks function it’s pretty easy to see the requirements:

  • Increase the MTU on the physical switches/virtual switches (to at least 1524 bytes) to stop packets being fragmented – see the PowerCLI sketch below for a quick way to check
  • (Optional) Ensure the MTU within the Guest Operating System is set to be 24 bytes less than the switch MTU
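On that first bullet, checking (and if necessary raising) the MTU on the vSphere side is a one-liner in PowerCLI. A minimal sketch – 1524 is simply the standard 1500 plus the 24-byte VCD-NI overhead, and the switch name is my lab’s:

# Report the current MTU (and version) of each Distributed vSwitch
Get-VDSwitch | Select-Object Name, Mtu, Version
# Raise the MTU to accommodate the 24-byte VCD-NI encapsulation
Get-VDSwitch -Name "Virtual DataCenter DvSwitch" | Set-VDSwitch -Mtu 1524

Remember the physical switches still need to be configured to match – PowerCLI can’t do that bit for you.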

Also bear in mind that only an ESXi host can remove the VCD-NI tags that represent the mac-in-mac data (the 24 bytes). Therefore if a VCD-NI backed VM tried to communicate directly with a system like a router, the router would look at the tagged packets and say “What the hell is this vCD-NI data? I don’t know what to do with this packet…”

I think given the requirements you would probably want to use the new health check options in the web client on vSphere 5.1 to ensure everything was lined up correctly. The penalty for not getting the MTU ducks in a row is not insignificant. The official vCD course states that fragmented packets can degrade network performance by as much as 50%. Fragmentation happens when a frame that is 1524 bytes in size hits a device configured for just 1500. Ethernet is forced to break up the packet into bite-sized chunks. So a 9000 MTU packet would be split into roughly 6 x 1500-byte packets – each with its own Ethernet & IP header overhead. The graphic below shows me enabling the “Health Check” feature on my “Virtual DataCenter DvSwitch” on the “Manage” tab, under “Health Check”.

Screen Shot 2012-11-05 at 16.47.29.png

If a misconfiguration or change in configuration is detected on the Distributed vSwitch then this will generate an alarm within vCenter like so:

Screen Shot 2012-11-05 at 17.40.29.png

If all things are right in the world, the status should show green under the “Monitor” tab’s “Health” view.

Screen Shot 2012-11-05 at 17.46.56.png

PortGroup Backed (Works with all vSwitches and Cisco Nexus 1K)

I think the portgroup backed method is very similar to the VLAN-backed method – with one critical difference. Not only do all the VLANs at the physical switch need to be defined before you create the pool, but so do all the portgroups that point to them. This allows vCD to map one network to one portgroup. On the plus side, portgroup backed network pools don’t require the Enterprise Plus version of vSphere – but I personally think a vCD environment without Enterprise Plus is a bit like having a nice new Ferrari with a Fiat 500 engine inside it. The two go hand-in-hand. Portgroup backed network pools are also compatible with the Cisco Nexus 1K, but given you have to have Enterprise Plus to use it, you’d need a compelling use case to want to couple the two together. I’m assuming that anyone using this configuration would have a great deal of automation in place already – and therefore the automation that comes with VLAN-backed or VCD-NI backed network pools is less significant to them. That probably means if you’re using Standard vSwitches you have some pretty good PowerCLI scripting in your environment, or you’re using the skills of the Cisco folks to both create the VLANs at the physical layer and create the vSphere portgroups that back them at the same time. I guess you could say, at a push, that if your vCD environment at an Organization level is pretty static – then this work could be done upfront before going live or enrolling a new tenant. But that hardly sits well with this dynamic, elastic, on-demand la-la-la world we’re all supposed to be living in…
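Since I mentioned PowerCLI scripting, here’s the sort of thing I have in mind – a rough sketch that stamps out a batch of pre-tagged portgroups on a Standard vSwitch across every host in a cluster, ready to be consumed by a portgroup-backed pool. The cluster name, vSwitch name and VLAN range are invented for the example, and the matching VLANs still have to be created on the physical switches separately:

$vlanRange = 200..219                       # hypothetical block reserved for this tenant
foreach ($vmhost in (Get-Cluster -Name "Gold-Cluster" | Get-VMHost)) {
    $vss = Get-VirtualSwitch -VMHost $vmhost -Standard -Name "vSwitch0"
    foreach ($vlan in $vlanRange) {
        # Portgroup names must be identical on every host for vMotion to behave
        New-VirtualPortGroup -VirtualSwitch $vss -Name "Tenant-A-VLAN$vlan" -VLanId $vlan
    }
}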

VXLAN Backed (Requires Distributed vSwitches and vSphere 5.1)

For me the VXLAN backed method is very similar to the VCD-NI approach – and you may remember I blogged about some of the requirements a couple of weeks ago:

Part 10: My vCloud Journey Journal – Add a Provider vDC and VXLAN

As you might (or might not) recall, the very act of adding a Provider vDC results in VXLAN backed network pools being created. This is fine if you have already enabled VXLAN on the Distributed Switches (refer to the post above to see what this is like). VXLAN has a much larger namespace of networks available to it – 16 million per Distributed vSwitch. There are a couple of requirements to consider – firstly, if you’re upgrading from an earlier version of vSphere to vSphere 5.1 you will need to “upgrade” the Distributed vSwitch to support the feature, as well as upgrading your vShield environment. During the configuration of VXLAN from the vSphere Client (currently you cannot configure it via the web client – it’s one of the few new features that isn’t yet exposed there) you will need to set the MTU (1600 or higher) and what type of teaming you’re using (Failover only; Static EtherChannel; LACP in either Active or Passive mode). You also need to set which VLAN is used for the VXLAN traffic. So from one VLAN with one Distributed vSwitch, as many as 16 million networks can be generated. How’s that for scalability in da cloud?
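As with VCD-NI, the bit most likely to trip people up is the MTU (1600 this time) and making sure the Distributed vSwitches really have been upgraded to 5.1.0. A quick PowerCLI check along these lines (a sketch, assuming the standard VDS cmdlets) would flag any stragglers:

# Flag any Distributed vSwitch that hasn't been upgraded to 5.1.0,
# or whose MTU is still too small for the VXLAN encapsulation overhead
Get-VDSwitch | Where-Object { $_.Version -ne "5.1.0" -or $_.Mtu -lt 1600 } | Select-Object Name, Version, Mtu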

Some Thoughts:

So it’s possible to have just one jumbo network pool that is used by many Organizations and their respective Organization vDCs. This would have to be the simplest configuration – a single pool of networks used by every Organization. It doesn’t strike me as a very “controlled” way to manage the resource, though. It would be like having one jumbo datastore that every Organization dumped its VMs and vApps into… [Ed. Mike, isn’t this precisely what you have done with the storage?]

What would make sense to me is if each Org had at least one network pool of its own – that way we could monitor the consumption of the network IDs used by each Organization – and stop one Organization eating up all the networks in the whole of vCloud Director at the expense of the other tenants.

We could go even further and create a network pool for each vDC within each Organization – so there’d be a Production vDC network pool and a Test/Dev network pool. This would allow for different allocations of networks to each vDC and also different methods. So perhaps a Production vDC would have a network pool backed by a range of VLANs, and the Test/Dev vDC would be backed by a method that results in far fewer VLANs being consumed, such as the vCD-NI or VXLAN method. That’s not an attempt to relegate vCD-NI/VXLAN to being some “bleeding edge” technology only suitable for a test/dev environment, but more of a recognition that the requirements of these methods place “demands” on the network team and physical layer that might be harder to gain approval for. Additionally, I see Test/Dev environments as generally being more “dynamic”, with vApps being created/destroyed in quick succession or perhaps not even “living” very long – compared to production environments where a routable and more static environment is more common. My hope is that over time, as confidence grows in software-defined methods of networking, we might see the decline of the “VLAN everything” approach to networking…

After writing the above I started to think about how I managed “network pools” and to compare that to how I managed “storage pools”. Yes, I know in vCD the term “storage pools” doesn’t exist. But you might begin to get the picture if I elaborate. With network pools I decided that a pool per Organization was good practice – perhaps even a pool per Organization vDC. But it occurred to me last night that I did NOT do that with my storage. With my storage I carved it up very simply and offered it up to the vSphere clusters, and in turn I offered that up to my Provider vDCs pretty much directly. What I didn’t do is create specific LUNs for each Organization.

Perhaps that’s because vCD doesn’t lend itself to this configuration. The Provider vDC “owns” the compute, external networks and storage. It’s only the Organization and vApp networks that get their resources from the “network pools”. So it lends itself more naturally to being partitioned in the way I’ve described. That got me thinking about how well I’d be able to monitor the storage consumption of each Organization or tenant. And if I didn’t want to expose all storage to all Organizations, would there be a way to do that? Would it be desirable or not? The way I see it, the only way to control what storage is visible to an Organization is to have a Provider vDC for each one – and that hardly seems viable or fitting with the cloud.

I guess what I’m getting at here is that the way we manage pools of storage is quite different from the way we manage network pools. Although we can apply the concept of a “pool o’ resource” to both storage and network, they are not consumed in the same way, and they are not the same type of resource. After all, whether you use VLAN, vCD-NI, VXLAN or portgroup backed network pools – none of them (when correctly configured) have any real impact on performance…