September 2

VMworld 2013: What’s new in vSphere5.5: Networking

Well, I’m back in the UK after my trip to the USofA last week for the 10th annual VMworld fest. I was rushed off my feet last week, so I wasn’t able to write up all of my “What’s New” content. Today’s post is all about networking in vSphere. As ever there are big and small improvements to the overall networking stack. So firstly, vSphere 5.5 ships with support for 40Gbps network interfaces. Support is limited to Mellanox ConnectX-3 VPI adapters, but I’m sure others will follow on the HCL in due course.

Improved LACP

LACP support has been around for some time in VMware’s DvSwitches, but previously only one LACP configuration was allowed per DvSwitch. That meant you needed to set up more than one DvSwitch (and have the pNICs to do that) if you wanted more. vSphere 5.5 introduces support for LACP Link Aggregation Groups (LAG – an unfortunate industry acronym; after all, who wants lag in their networking!). You can have up to 64 LAGs per ESX host, and 64 per DvSwitch (1 x DvSwitch with 64 LAGs, or 64 x DvSwitches with 1 LAG each, or any combination in between). LACP now supports up to 22 different load-balancing algorithms (not just IP hash).
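Once a LAG is defined and your uplinks are assigned to it, you can sanity-check the configuration and the negotiation state from the ESXi shell. A minimal sketch, assuming the esxcli LACP namespace that ships with 5.5:

esxcli network vswitch dvs vmware lacp config get
esxcli network vswitch dvs vmware lacp status get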

Some Grabs!

In this example LACP deployment there are two portgroups associated with two LAGs on one DvSwitch – each portgroup/LAG is backed by two different pNICs.

vpshere-niclb-lacp-lags

Configuration is a two-step process. First add the LAGs, and then assign them to a portgroup…

vsphere-lags-step1-definelags

vsphere-lags-step2-assigntoportgroups

New Packet Capture Tool

vSphere 5.5 will introduce a new packet capturing tool. In previous editions of ESX there was a fully-fledged console based on RHEL, and folks would use tools such as tcpdump to capture packets. We used to use this in training courses to show how security could be weakened on the vSwitch to allow promiscuous-mode captures. Don’t forget that a DvSwitch supports NetFlow and port mirroring methods of gathering network information as well. The new utility can capture at three different levels: the vmnic, the vSwitch, and the DvUplink.
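On the ESXi 5.5 shell the tool in question is pktcap-uw. A minimal sketch of capturing at each of the different levels – the port ID, vmk number and output paths below are just placeholders:

pktcap-uw --uplink vmnic0 -o /tmp/vmnic0.pcap       # capture at the physical NIC / DvUplink
pktcap-uw --switchport 50331665 -o /tmp/port.pcap   # capture at a vSwitch/DvSwitch port
pktcap-uw --vmk vmk0 -o /tmp/vmk0.pcap              # capture at a VMkernel interface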

Port Security via Traffic Filtering

This is sometimes referred to as an ACL on physical switches. What it allows you to do is drop packets based on Ethernet header information, such that traffic matching the rule never even gets to leave the switch. You can drop based on source/destination MAC address, TCP port, or source/destination IP address. It’s even possible to drop internal VMkernel communications such as vMotion, although the usage case for doing so has yet to be found. The traffic can be dropped whether it’s ingress or egress.

QoS Tagging for end-to-end SLA

vSphere 5.5 introduces support for Differentiated Services Code Point (DSCP) marking or tagging. DvSwitches have supported vSphere Network I/O Control (NIOC) for some time, and for many customers these per-portgroup priority/bandwidth allocations fit their needs. However, they only go so far – so vSphere 5.5 offers a greater level of granularity, which allows tagging of traffic based on its type. For example, we can tag and prioritise traffic to a server based on whether it is HTTP or HTTPS. A combination of NIOC and tagging should allow customers to meet their SLA needs. DSCP tagging works at L3, and uses a 6-bit field to define the type of traffic, supporting up to 64 different traffic classes. Whether folks will actually configure this feature will vary with need, but it’s likely to keep the network teams happy from a “tick box” perspective. As ever, virtualization admins often experience resistance to change, merely because networking guys expect a virtual switch to have ALL the features and functionality of a physical switch.

 vsphere-network-tags-forQoA

Category: VMworld, vSphere | Comments Off on VMworld 2013: What’s new in vSphere5.5: Networking
August 29

VMworld 2013: What’s New in vSphere 5.5 – The Virtual Machine – 62TB VMDK

vmware_monster_vm

Executive Summary:

This became a bit of a monster post, but here are the key points in a nutshell.

  • VMware vSphere 5.5 supports 62TB disks – it’s not 64TB because we reserve 2TB of disk space for snapshots etc.
  • To take a <2TB disk beyond the 2TB range you currently have to power off the VM. However, you can make the virtual disks bigger without a special conversion process involving a lengthy file copy – unlike a certain vendor up in Seattle 🙂
  • Disks with MBR need “converting” to GPT to go beyond the 2TB boundary. Currently, Windows 2008/2012 have no native tools to do this; however, 3rd-party tools do exist. I’ve used them, and they work!
  • Even once the VMDK in VMware, and the DISK in Windows, is 62TB, you may still face an issue with the “allocation unit” or “cluster” size within the NTFS PARTITION. NTFS defaults to a 4K “allocation unit” size on smaller partitions, but to use a 62TB partition you need an “allocation unit” size of 16K or higher (there’s a quick PowerShell example after this list). So your mileage will ultimately vary based on the limits/features of the file system/guest operating system.
  • These challenges go some way to explaining why Windows 2012 HyperV has an add-disk, copy-from-old-disk-to-new-disk process, which takes time based on the volume of data and the speed of storage – and also requires the VM to be powered off…
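On that allocation-unit point, here’s a hedged PowerShell sketch of formatting the big data volume inside a Windows 2012 guest with a 16K cluster size – the drive letter and label are placeholders, and 32K or 64K would also clear the 62TB limit:

Format-Volume -DriveLetter E -FileSystem NTFS -AllocationUnitSize 16384 -NewFileSystemLabel "BigData"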

Continue reading

Category: vSphere | Comments Off on VMworld 2013: What’s New in vSphere 5.5 – The Virtual Machine – 62TB VMDK
August 28

VMworld 2013: What’s new in vSphere Replication 5.5

The extension of vSphere Replication into the main vSphere SKU last year with the release of 5.1 must have been one of the most welcome changes – and I was overwhelmed with people expressing an interest and asking questions about the technology. I hope it will also spur people to take a greater interest in the subject of DR automation generally – VMware Site Recovery Manager remains a product close to my heart for obvious reasons.

If you want more detail than this blogpost provides – VMworld TV recorded Lee Dilworth & Ken Wenneberg’s session at VMworld 2013 recently:

There are a couple of key enhancements to VR in the 5.5 release. Critically, VR 5.5 adds the ability to deploy new appliances to allow for replication between clusters and in non-shared storage deployments. There’s also Storage DRS interoperability, which allows replicated VMs to be Storage vMotioned across datastores with no interruption to ongoing replication. That’s something that’s been a pain point for SRM users for some time, and challenges remain around SvMotion with array-based replication (ABR), but at least VMware is making inroads into the problem with VR. Of course the UI has been substantially improved – but the bigger news is the introduction of multiple point-in-time support, which allows administrators to recover to a previous snapshot. That’s something folks have been asking about for a while.

VR ships as a zip file containing two VMs – a core VR appliance and an “add-on” VR appliance in case you need additional VR servers in your environment. These additional VR appliances allow you to configure replication to up to a maximum of 10 other target locations. Of course there’s nothing stopping you from using vSphere Replication within a site, to allow for the quick recovery of VMs rather than resorting to backup.

Once VR has been imported it should register itself with the vSphere Web Client, and the Manage button will lead you to create your first site definition if you have a multiple vCenter configuration.

Configuring Replication to Another Site:

Screen Shot 2013-04-29 at 15.26.53

Screen Shot 2013-04-29 at 15.28.04

If you just want to protect a VM within vCenter – by replicating it from one datastore to another – you merely find the VM you want to protect and select “Configure Replication”.

Enabling Replication on a VM:

1. Right-click a VM(s) and select All vSphere Replication Actions, and Configure Replication.

Screen Shot 2013-04-29 at 15.35.38

2. In the wizard you can select your own site or another vCenter that you added earlier. If you select your own site, then replication remains within your site.

Screen Shot 2013-04-29 at 15.37.22

3. It is possible to have multiple VR appliances – and have vSphere select the VR that will handle the replication – or manually assign it yourself.

Screen Shot 2013-04-29 at 15.37.37

4. Next select a destination for the replication

Screen Shot 2013-04-29 at 15.38.41

5. Indicate whether you want to use the Microsoft Volume Shadow Copy Service (VSS) to create a quiesced, consistent replica.

Screen Shot 2013-04-29 at 15.38.56

6. Finally, and this is where it gets interesting – define your recovery settings. It’s here you can see the “point in time” instances. Enabling this option allows you to control the number of point-in-time references you retain. So if you keep 3 instances per day for the next 5 days, you end up with 15 undo levels. Of course it’s entirely possible to set an RPO where you couldn’t take 3 snapshots/cycles per day – you can’t replicate only once every 12 hours and still keep 4 snapshots per day… Hence the dialog box states clearly “You may need to adjust the RPO to achieve the desired number of instances per day…”. Also notice that the maximum number of point-in-time instances maintained is 24 – so if you wanted a week’s worth of roll-back you would need to adjust the retention to stay within that limit, which works out at 3 instances per day over a 7-day period (3×7=21; don’t you just love my helpful maths!).

Screen Shot 2013-04-29 at 15.39.13

Category: vSphere | Comments Off on VMworld 2013: What’s new in vSphere Replication 5.5
August 27

VMworld 2013: What’s New in vCenter 5.5

I guess there’s going to be a bit of a “mea culpa” to be had around vCenter and SSO. I guess we were a little caught out on this feature. Hopefully that’s been addressed with updates and plenty of (some say too many) KB articles. It’s hoped that SSO in vSphere 5.5 goes a long way to repairing that trust relationship we have with customers – the expectation that VMware technologies just work. For my own part I had a rather pleasant vSphere 5.1 roll-out: I started with a blank slate and deployed the vCSA with SSO. The new version of SSO should support multi-domain, multi-forest configurations with ease.

vCenter also offers improvements in connectivity to the DB backend. Firstly, support for clustered DB backends with Oracle and MS-SQL has been re-introduced. And for Microsoft SQL there’s now support for secure connectivity to MS-SQL, together with Windows Authentication.

With the vCenter Server Appliance, the internal Postgres database now offers vastly improved scalability – up to 100 hosts and 3,000 VMs. It’s hoped this improved scalability will entice more customers to consider a move away from Ye Olde Windows vCenter. With that said, the vCSA still only supports Oracle as an external database, which I know will be a concern for some customers. But it’s felt that the new scalability of the Postgres database means the demand for an external database may decline. I remember back in my instructor days people moaning about needing Microsoft SQL and licenses…

The web client gets a bit of an overhaul. I know some folks are still using Ye Olde vSphere Client, and admittedly I’ve noticed in vSphere 5.1 those “gears” did take some time to turn before a refresh or a menu opened. In my experience of the beta the web client’s performance is vastly improved, with the wait time for opening menus or tabs so short I had to take a video to capture the gears for a recent blogpost! As a Mac user I’m pleased to hear the web client fully supports OS X. Previously the plug-in to the web client was Windows only – and that meant I needed to run a Windows instance in VMware Fusion on my Mac. With the new web client I won’t need to do that – and the previously missing functionality (VM console, Deploy OVF Template and client devices) is now fully available.

Finally, there are improvements to the UI including the ability to do drag & drop, see a list of recently used items to speed up navigation – and filters to clear views down to just the items you want to see.

vsphereplatvorm-filters-in-vcenter

Category: VMworld, vSphere | Comments Off on VMworld 2013: What’s New in vCenter 5.5
August 26

VMworld 2013: What’s New in 5.5 – vSAN

Yes, I know it sounds a bit weird to have a “what’s new” post on a new product – but in an effort to keep these posts together it seemed to make sense. Besides which, this post is more than just a round-up of features – it’s more a discussion about what vSAN is, what it is capable of, and what it is not capable of…

vSAN is a brand new product from VMware, although it got its first tech preview at VMworld last year. That’s why I always think if you’re attending VMworld you should search for and attend the “Tech Preview” sessions. We tend not to crow about future stuff outside of a product roadmap and an NDA session – so the Tech Previews are useful for people outside of that process to get a feel for the blue-sky future.

So what is vSAN? Well, it addresses a long-time challenge of all virtualization projects – how to get the right type of storage to allow for advanced features such as DRS/HA, and deliver the right IOPS, at the right price point. In the early days of VMware (in the now Jurassic period of ESX 2.x/2003/4) the only option was FC-SAN. That perhaps wasn’t a big ask for early adopters of virtualization in the corporate domain, but it rather excluded medium/small businesses. Thankfully, Virtual Infrastructure 3.x introduced support for both NFS and iSCSI, and those customers were able to source storage that was more competitive. However, even with those enhancements businesses were still left with storage challenges, depending on the application. How do you deliver cost-effective storage to test/dev or VDI projects, whilst keeping the price point low? Of course, you could always buy an entry-level array to keep the costs down, but would it offer the performance required? In recent years we’ve seen a host of new appliance-led start-ups (Nutanix, SimpliVity, and Pivot3) offer bundles of hardware with combos of local storage (HDD, SSD and in some cases Fusion-io cards) in an effort to bring the IOPS back to the PCI bus, and allow the use of commodity hardware. You could say that VMware vSAN is a software version of this approach. So there’s a definite Y in the road when it comes to this model – do you buy into a physical appliance, or do you “roll your own” and stick with your existing hardware supplier?

You could say vSAN and its competitors are attempts to deliver “software-defined storage”. I’ve always felt a bit ambivalent about the SDS acronym. Why? Well, because every storage vendor I’ve met since 2003 has said to me “We’re not really hardware vendors; what we’re really selling you is software”. Perhaps I’m naïve and gullible, and have too readily accepted this at face value, I don’t know. I’m no storage guru after all. In fact I’m not a guru in anything really. But I see vSAN as an attempt to get away from the old storage constructs that I started to learn more and more about in 2003, when I was learning VMware for the first time. So with vSAN (and technologies like TinTri) there are no “LUNs” or “Volumes” to manage, mask, present and zone. vSAN presents a single datastore to all the members of the cluster. And the idea of using something like Storage vMotion to move VMs around to free up space or improve their IOPS (by moving a VM to a bigger or faster datastore) is largely irrelevant. That’s not to say Storage vMotion is a dead-in-the-water feature. After all, you may still want to move VMs from legacy storage arrays to vSAN, or move a test/dev VM from vSAN to your state-of-the-art storage arrays. As an aside, it’s worth saying that Storage vMotion from vSAN-to-array would be slightly quicker than from array-to-vSAN. That’s because the architecture of vSAN is so different from conventional shared storage.

vSAN has a number of hardware requirements – you need at least one SSD, and you cannot use the ESX boot disk as a datastore. I imagine a lot of homelabbers will choose to boot from USB to free up a local HDD. You need not buy an SSD drive to make vSAN run on your home rig. You might have noticed both William Lam and Duncan Epping have shown ways of fooling ESX into thinking a HDD is SSD-based. Of course, if you want to enjoy the performance that vSAN delivers you will need the real deal. The SSD portion of vSAN is used purely to address the IOPS demands – it acts as a cache-only storage layer, with data written to disk first, before it’s cached, to improve performance and to prevent the SSD component becoming a single point of failure.
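For the curious, the Lam/Epping trick boils down to a storage claim-rule change from the ESXi shell. This is a hedged sketch only – the device identifier is a placeholder for your own local HDD:

esxcli storage nmp satp rule add --satp VMW_SATP_LOCAL --device mpx.vmhba1:C0:T1:L0 --option "enable_ssd"
esxcli storage core claiming reclaim -d mpx.vmhba1:C0:T1:L0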

I don’t want to use this blogpost to explain how to set up or configure vSAN, but rather to highlight some of the design requirements and gotchas associated with it. So with that, let’s start with use cases: what it’s good for, and what it’s not good for.

vSAN Use Cases

vsan

Because vSAN uses commodity hardware (local storage), one place where vSAN sings is in the area of virtual desktops. There’s been a lot of progress in storage over the last couple of years to reduce the performance and cost penalty of virtual desktops, both from the regular storage players (EMC, NetApp, Dell and so on) as well as a host of SSD or hybrid storage start-ups (TinTri, Nimble, PureStorage, WhipTail etc). All the storage vendors have cottoned on that the biggest challenge of virtual desktops (apart from having quality images and a good application delivery story!) is storage. They’ve successfully changed that penalty into an opportunity to sell more storage. I’ve often felt that a lot of engineering dollars and time have been thrown at this problem, which is largely one of design. Even before VDI took off, storage was a systemic/endemic issue. The hope is that a new architecture will allow genuine economies of scale. I’m not alone in this view. In fact even avowed VDI sceptics are becoming VDI converts (well, kind of. Folks do like a good news headline that drives traffic to their blogs, don’t they? I guess that’s the media for you. Never let the truth get in the way of a good story, eh?)

The second big area is test/dev. The accepted wisdom is that test/dev doesn’t need high-end performance. Good enough will do, and we should save our hardware dollars for production. There is some merit in this, but it’s also the case that developers are no less demanding as consumers, and there are some test/dev environments that experience more disk IOPS churn than production platforms. There’s also a confidence factor – folks who experience poor responsiveness in a test/dev environment are likely to express scepticism about that platform in production. Finally, there are the ever-present public cloud challenges. Developers turn to the public cloud because enterprise platforms using shared storage require more due diligence when it comes to the provisioning process. Imagine a situation where developers are silo’d in a sandbox using commodity storage, miles away from your enterprise-class storage arrays that are demarcated for production use. The goal of vSAN is to generate almost a 96% cache hit rate. That means 96% of the time the reads are coming off solid-state drives with no moving parts.

Finally, there’s DR. vSAN is fully compatible with VMware’s vSphere Replication (in truth VR sits so high in the stack it has no clue what the underlying storage platform is – it will replicate VMs from one type of storage (FC) to another (NFS) without a care). So your DR location could be commodity servers using commodity, locally attached storage.

So it’s all brilliant and wonderful, and evangelists like me will be able to stare into people’s foreheads for the next couple of years – brainwashing them that VMware is perfect, and vSAN is an out-of-the-box solution with no best practises or gotchas to think of. Right? Erm, not quite. Like any tech, vSAN comes with some settings you may or may not need to change. In most cases you won’t want to change them – if you do, make sure you’re fully aware of the consequences…

Important vSAN Settings

VSANpolicies

Firstly, there’s a setting that controls the “Read Cache Reservation”. This is turned on by default, and the vSAN scheduler will take care of what’s called “Fair Cache Allocation”. By default vSAN splits the SSD – 70% for reads, and the rest for writes – and the algorithms behind vSAN are written to expect this distribution. Changing this reservation is possible, but it can include files that have nothing to do with the VM’s workload – such as the .vmx file, log files and so on. The reservation is set per-VM, and when changed it includes all the files that make up a VM. Ask yourself this question – do you really want to cache log files, and waste valuable SSD space as a consequence? So you should really know the IO profile of a VM before tinkering with this setting. Although the option is there, I suspect many people are best advised to leave it alone.

Secondly, there’s a setting called “Space Reservation”. The default is that it is set to 0, and as a consequence all the virtual disks provisioned on the vSAN datastore are thinly provisioned. The important thing to note is that from a vSAN perspective virtual disk formats are largely irrelevant – unless the application requires them (remember guest clustering and VMware Fault Tolerance require the eagerzeroedthick format). There’s absolutely no performance benefit to using thick disks. That’s mainly because of the use of SSD drives, but also because it’s a grossly wasteful use of precious SSD capacity. What’s the point of zeroing out blocks on an SSD drive, unless you’re a fan of burning money?
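To make that concrete, a hedged PowerCLI sketch – the VM name and size are placeholders; thin is the sensible default on a vSAN datastore, with eagerzeroedthick reserved for the guest-clustering/FT cases above:

New-HardDisk -VM (Get-VM "testdev-vm01") -CapacityGB 100 -StorageFormat Thin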

In fairness you might be the sort of shop that isn’t a fan of monitoring disk capacity, and you’re paranoid about massively over-committing your storage. At the back of your mind you picture Wile E. Coyote from the cartoons – running off the end of a cliff. My view is that if you’re not monitoring your storage, the whole thin/thick debate is largely superfluous. The scariest thing is that you’re not monitoring your free space! You must be really tired from all those sleepless nights you’re having, worrying whether your VMs are about to fill up a datastore!

Finally, there’s a setting called “Force Provisioning”. At the heart of vSAN are storage policies. These control the number of failures tolerated, and the settings I’ve discussed above. What happens if a provisioning request to create a new VM is made, but it can’t be satisfied by the storage policy? Should it fail, or should it be allowed to continue regardless? Ostensibly this setting is there for VDI environments, where a large number of maintenance tasks (refresh and recompose) or the deployment of a new desktop pool could unintentionally generate a burst of storage IOPS. There are situations where the storage policy settings mean that these tasks would not be allowed to proceed. So see it as a fallback position. It allows you to complete your management tasks, and once the tasks have completed vSAN can respond to the decline in disk IOPS.

Gotchas & Best Practises

Is vSAN ready for Tier 1 production applications? As ever with VMware technologies – so long as you stay within the parameters and definition of the design you should be fine. Stray outside of those, and start using it as a wrench to drive home a nail, and you could experience unexpected outcomes. First of all, usage cases – although big data is listed on the graphic, I don’t think VMware is really expecting customers to run Tier 1 applications in production on vSAN. It’s fine to run Oracle, SAP and Exchange on vSAN within the context of a test/dev environment – and of course, our long-term goal is to support precisely that in production. But you must remember that vSAN is a 1.0 release, and Rome wasn’t built in a day. Whenever I’ve seen customers come a cropper with VMware technologies (or any technology for that matter), it’s when they take something that was designed for Y, and stretch it to do X. Oddly enough, when you take an elastic band and use it to lift a bowling ball it has a tendency to snap under the load…

Don’t Make a SAN out of vSAN: The other thing that came out of beta testing was a misunderstanding in the way customers design their vSAN implementation. Despite the name, vSAN isn’t a SAN. It’s a distributed storage system, and designed as such. So what’s a bad idea is this: buying a number of monster servers, packing them with SSD – and dedicating them to this task – and then presenting the storage to a bunch of diskless ESX hosts. In other words, building a conventional SAN out of vSAN. Perhaps vSAN isn’t the best of names; if I remember rightly, the original name was VMware Distributed Storage. I guess vSAN is catchier as a product name than vDS! Now it may be that in the future this is a direction vSAN could (not will) take, but at the moment it is not a good idea. vSAN is designed to be distributed, with a small amount of SSD used as cache, and a large amount of HDD as conventional capacity. It’s also been designed for HDDs that excel in storage capacity rather than spindle speed – so it’s 7K disks, not 15K disks, for which it’s been optimized. So a vSAN made only from SSD won’t give you the performance improvements you expect – but it will give you an invoice that will make you wince!

VMware HA. Once again a new innovation from VMware has necessitated an overhaul in VMware’s clustering technology. A VM resides on an ESX host, and so do its files. The whole point is keeping the resources of the VM close to each other – memory, CPU, network and now disk are all within the form-factor of a server or blade. But what if a server dies? What then? If an ESX host fails or is put into maintenance mode, that will trigger either a graceful evacuation of the host or a disgraceful one. When the host comes back online it not only re-joins the HA/DRS cluster, it also re-joins the vSAN as a member. Now, if it was a maintenance mode event, a rebuild begins. In the beta this was delayed for 30 minutes, but following testing it has been extended to an hour. This is to avoid spurious rebuilds that were not required – by rebuild we mean that the metadata/data backing an individual node that has been down for a period receives delta updates. I guess the analogy would be if you shut down a Microsoft Active Directory Domain Controller for an hour or so: when it came back up it would trigger a directory services sync. The important thing from a virtualization perspective is that we want the ESX host to complete this sync successfully before DRS starts repopulating the host with VMs. Now think about that for a second. After a reboot (for whatever reason) an ESX host now takes an hour before it fully rejoins the cluster. Therefore you may need to factor in additional ESX host resources to cover this period. The operative word is “may”, not “must”. I think much depends on the spare capacity you have left over once a server is unavailable due to an outage or maintenance. The situation is a bit different if the problem is detected as a component failure. If there is a disk failure or read error then vSAN doesn’t wait an hour – the rebuild process begins immediately.

When is commodity hardware, commodity hardware? This is one for labbers who might want to run vSAN at home, but it could be relevant to a vSAN configured at work too. I’ve been looking into moving back to a home lab. That means buying commodity hardware. Right now I’m very attracted to the HP ML350e series. It supports a truckload of RAM (196GB max with two CPUs), although it’s big and expensive compared to white boxes and the Shuttle XPC. The reseller offered me a choice of disks. The hot-pluggable ones from HP are proprietary and pricey. The more generic SATA drives are not hot-pluggable, and are much cheaper. For my home lab I know which I will be choosing. The other thing I need to think about is what capacity and ratio of HDD and SSD I need for my lab. There could be a tendency to overspend. After all, my plan calls for the use of a Synology NAS which I hope to pack with high-capacity SSD. Although I want to use vSAN, I can imagine that in my very volatile lab environment (where it’s built and destroyed very frequently) having my “core” VMs (domain controller, MS-SQL, View, management virtual desktops) on external storage might give me peace of mind should I do something stupid…

Category: VMworld, vSphere | Comments Off on VMworld 2013: What’s New in 5.5 – vSAN
August 26

VMworld 2013: What’s New in ESX 5.5

So I guess you’re beginning to detect a theme in my recent posts. Please bear with me; normal service will be resumed shortly.

You may not be surprised to hear that the configuration maximums in ESX have gone up even further. So you can now have up to:

  • 320 Logical CPUs
  • 4TB of RAM
  • 16 NUMA nodes
  • 4096 vCPUs

Per ESX host. That’s more or less a doubling of capacity over ESX 5.0/5.1. Now I doubt many folks will actually configure such a system, mainly because filling a physical box with that much memory is cost-prohibitive for most people. But we have seen the “standard” for how much memory an ESX host has grow as physical boxes get beefier and memory prices go down. I remember when 32GB/64GB was the sweet spot; I guess now it’s more the 96-128GB range. So there’s a bit of future-proofing here, but it’s also about making sure that if any other virtualization vendor wants to get into a pissing contest we can more than deal with that scenario. 🙂

There are also new ESX host features, such as support for hot-plug of SSD drives – and the ability to leverage new physical memory that exposes “reliable memory” information. It means the ESX host can pick up information from the RAM about portions of memory that are marked as “reliable”. It will then make resident in those regions the parts of ESX that are critical to its uptime, such as the VMkernel itself, user worlds, init threads, hostd and the watchdog processes. It should mean the chances of a PSOD due to bad blocks of memory are minimized. I guess the days of burning in your RAM with memtest tools are increasingly behind us, due to the quantity of memory we now have and the time it takes to do a couple of passes.

Category: VMworld, vSphere | Comments Off on VMworld 2013: What’s New in ESX 5.5
August 25

VMworld 2013: What’s New in vSphere 5.5: Virtual Machines

Note: Sorry for the poor quality of graphics in my post. Some of them were taken from videos and powerpoints (so I could get this content to you quickly), and not from a live system. I will probably set a reminder to myself to update them once I’ve got my paws on the GA release.

vSphere 5.5 introduces a new virtual machine compatibility version, which is now 10. This is sometimes abbreviated to vHW 10. You might recall we dispensed with “Virtual Hardware Level” values for something a bit more user-friendly: compatibility levels. This allows us to express compatibility based on both hardware and VMware Tools levels as a single entity.

62TB Virtual Disks (yes, 62TB, not 64TB!)

vHW 10 introduces support for 62TB virtual disks. That’s something folks have been wanting VMware to do for some time. Up until now only physical-mode RDMs supported 64TB volumes/LUNs. That sounds okay, until you remember that some of our products, such as vCloud Director, are incompatible with RDMs. So 62TB virtual disks are now supported. Why 62TB and not 64TB? Well, we reserve 2TB of disk space for features like VM snapshots. The last thing you want to do is create a 64TB LUN, fill it with a 64TB virtual disk, and find you have no space left for snapshots. This release also introduces support for 62TB virtual RDMs as well.

Now there are some limitations. We currently don’t support extending an existing <2TB disk into the >2TB range whilst the VM is powered on. The important point is that there is no onerous conversion process required to get to a 62TB virtual disk, unlike some other virtualization vendor who shall remain unmentioned. 🙂 There are, however, some requirements and incompatibilities – some of them are not within VMware’s control, and some are. Here’s a quick hit list (with a PowerCLI sketch of the resize itself after the list).

1. BusLogic Controllers (commonly used by default by Windows NT/2000) are not supported

2. The partition(s) within the disk need to be GPT, not MBR, formatted. There are tools that will convert this for you, but not for boot disks. Also beware of using small cluster sizes on partitions: anything <16K means you won’t be able to grow a partition to the maximum 62TB size. So in a nutshell there are some guest operating system limits.

3. vSAN is not supported

4. VMware FT is not supported

5. You must use the new web client, as Ye Olde vSphere C# client will spit back errors.

vsphereclient-64tb-must-use-the-web-client
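On the resize itself, growing an existing virtual disk up to the new maximum is the usual Set-HardDisk operation – a hedged PowerCLI sketch, with the VM powered off first if you’re crossing the 2TB boundary; the VM name, disk label and target size are placeholders:

Get-HardDisk -VM "FS01" -Name "Hard disk 2" | Set-HardDisk -CapacityGB 63488 -Confirm:$false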

Stay tuned to the blog, as I have a much longer blog post on 62TB support. But I need to verify it against the GA release before I click the publish button.

New SATA Controller – Advanced Host Controller Interface (AHCI)

vHW 10 introduces a new device type called AHCI. This SATA controller allows for up to 30 devices per controller, and we support 4 of them. That’s a max of 120 devices (just in case you need to take out a calculator to work out 4×30). Compatibility is pretty good – and there’s still an IDE controller if needed. One anomaly is if you’re running Mac OS X on ESX (now there’s a popular configuration!), which requires the CD-ROM to be on the AHCI controller, because Apple dropped support for IDE some years ago.

vsphereplatvorm-sata-disk

GPU Support:

I’m embarrassed to say I don’t know much about GPU support, mainly because my ESX hosts are running on Jurassic hardware and none of my GPUs are supported. I’m also embarrassed to say I didn’t delve into it in my last outing with Horizon View. The good news is the hardware support is improving – not just NVIDIA cards, but Intel/AMD as well. If you haven’t looked at this before, GPU support has three modes – manual, hardware and automatic. Automatic seems the way to go, as it tries the hardware-assisted method first, but falls back to software mode if the GPU isn’t supported. The hardware mode seems risky to me, as it can break vMotion if the destination host doesn’t have the supported hardware. The other interesting news is we now support Linux device drivers as well. That rather sets us apart as the only virtualization vendor with a complete set of enhanced drivers for Linux.

Category: VMworld, vSphere | Comments Off on VMworld 2013: What’s New in vSphere 5.5: Virtual Machines
July 12

vCenter Server Appliance: Recovering from out of diskspace…

Houston, we have a problem

This week I had a major problem. My vCenter Server Appliance just stopped working for no apparent reason. I couldn’t log in via the web or vSphere client. All I was getting was an error about not being able to connect to the lookup service, which would appear after the logon attempt, stating “Failed to connect to VMware Lookup Service – https://vcnyc.corp..com:7444/lookupservice/sdk”.

Screen Shot 2013-07-06 at 05.08.52

Things started to look worrying when I sniffed around the appliance management web page (the one on port 5480). Everything looked to be running; there was just one thing – the DB was at 100%.

Screen Shot 2013-07-06 at 04.48.33

In my experience something utilised to 100% is generally a problem, especially when it comes to storage. In my case I was using the local Postgres database which is stored on the appliance itself – and it had run out of space. I’m prepared to admit that might have been my fault. The vCSA scales to some 5 ESX hosts and some 50 VMs, but there have been times I’ve had 9 hosts. I can’t honestly say either way whether I have had more than 50 VMs (although right now I have nearly 60 VMs and templates) – I doubt it. I don’t have enough RAM for that! It also turns out that I could have dialed down the retention of data in the database to keep it skinny.

Sure enough, after a quick google I was able to find folks in the vCommunity who had had the same experience. I decided to open an internal bugzilla ticket with our support folks, and my worst fears were confirmed. There are a couple of resolutions to this problem:

  1. Hit the big reset button and zap the database…
  2. Create a new disk, partition and copy the DB files to the new location…
  3. Increase the size of the VMDK, and use a re-partitioning tool to increase the disk space

Clearly, (1) is not a terrifically good option – I’ve got vCD, vCAC, vDP, VR, vCC and View all in some way configured for this vCenter. Zapping the database would pull the rug out from under the configuration of all these other components. Option (2) seems a bit of a faff to me. And option (3) is what I would do if I was experiencing this issue on any other system.

Houston, we have a solution

The first job was to identify which disk and which partition the database is located on. I worked that out by interpreting some stuff the support guys had asked me to do. They’d got me to run:

du -h /storage/db/

Using the mount command I worked out that the DB was on /dev/sdb3 – the second disk (b), third partition (3):

Screen Shot 2013-07-06 at 05.12.18
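In other words, something along these lines from the appliance console (the exact device letters will vary, but on my appliance /storage/db sat on /dev/sdb3):

df -h /storage/db
mount | grep storage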

The plan was to increase the second VMDK, then snapshot the VM, and then use a tool like GParted to increase /dev/sdb3. If something went wrong I could revert the VM and try option (2). The order here is important. You cannot snapshot a VM and then increase the size of the disk, but you can increase the size of the disk, snapshot – and then repartition. The snapshot will protect you from the situation where the repartition goes belly-up.

TIP: I recommend gracefully shutting down the vCenter in this situation. It has an 8GB memory allocation, so taking a snapshot whilst powered on means creating an 8GB memory file. The vCenter is inaccessible anyway, and if you’re using GParted you have to boot from a DVD .ISO to get exclusive access to the disk. So that would be:

1. Shutdown

2. Increase disk size

3. Snapshot

4. Attach a DVD .iso containing GParted

5. Use GParted to resize the partition.

Increasing the disk size was easy; all I needed to do was increase a spinner. How much by? Well, the VMDK is thin provisioned, and on a volume with 1379GB of free space, so I decided to crank it up from 60GB to 200GB.

Screen Shot 2013-07-06 at 07.34.13
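If you’d rather script steps 1–3 than click spinners, here’s a hedged PowerCLI equivalent – the appliance VM name and the disk label are assumptions about your inventory:

Shutdown-VMGuest -VM "vCSA" -Confirm:$false
Get-HardDisk -VM "vCSA" -Name "Hard disk 2" | Set-HardDisk -CapacityGB 200 -Confirm:$false
New-Snapshot -VM "vCSA" -Name "pre-gparted-resize"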

6. Next we boot to the DVD. GParted is a very simple utility – switch to the second disk (sda to sdb), right-click the partition, select Resize, drag the partition to take up the new free space, click OK and click Apply…

Screen Shot 2013-07-06 at 07.32.40

7. A reboot of the appliance brought the vCenter up again, and the DB had enough space…

Screen Shot 2013-07-07 at 10.54.49

Houston, how can we stop this happening again…

I think there are a couple of ways to avoid this:

  • Don’t use the embedded DB with the vCSA, and always use an external DB where it may be easier to monitor and manage the disk space… The difficulty here is that the only external DB supported by the vCSA is Oracle. So whilst a home lab might get away with Oracle XE, that’s probably not supported in production – and it’s a pricey option as well.
  • Wait for a bug fix from VMware. Apparently, this is a known issue with the embedded DB and it will be resolved in future releases.
  • Personal recommendation: I would recommend increasing the disk space, and changing your retention settings to make sure the DB is being purged of stale data…

Database Retention Settings and Statistics Intervals.

 

Category: vSphere | Comments Off on vCenter Server Appliance: Recovering from out of diskspace…
May 16

vINCEPTION: Nested Windows 2012 HyperV on vSphere5.1

Acknowledgement:

I’d like to acknowledge Ricky El-Qasem’s blogpost on the Veeam Blog. That was written back in September 2011, and Ricky has since moved on to work for canopy-cloud.com.

IMPORTANT:
Of course all this is just for lab testing purposes – remember, vINCEPTION (or nesting, as it is more commonly known) is unsupported…

One of the things I want to do is run Windows 2012 HyperV inside a VMware vSphere 5.1 environment. There are a couple of reasons why:

Firstly, my big project at the moment is learning vCloud Automation Center, or vCAC. There are a couple of ways of pronouncing the acronym. Some people call it VC-AC (think of how you would say AC-DC) and others I’ve heard pronounce CAC as “cake”. So take your pick. As you might know, vCAC is to some degree hypervisor and cloud agnostic – although personally I think you’re better off using vSphere (but hell, I would say that, wouldn’t I?). So I want to learn how to set up ALL the provisioning resource types – as you can’t really go about deploying VMs/vApps without any resources to point to (although apparently with vCAC there is a way of spoofing it into thinking it has some… which I think might be rather interesting to investigate at a later date).

Secondly, we recently released “Multi-Hypervisor Management” (MHM for short – shouldn’t that be vMHM?), so I wanted to set up a temporary nested Windows 2012 HyperV environment to test that as well.

Thirdly, it’s occurred to me that as VMware’s Senior Cloud Infrastructure Evangelist (you gotta love these US job titles!) I can’t really do that in a VMware bubble. So it’s time to start looking over the fence at what others are doing – because only by seeing what these systems are like for real can I truly see what differentiates the VMware racehorse from the also-rans. It makes sense to start with Microsoft’s offering because, let’s face it, they are the biggest elephant in the room. But once I’m done with them, I do want to look at other vendors and projects too, such as OpenStack.

Of course my focus has to be VMware, for reasons which I think are patently obvious – and for that matter I didn’t want to dedicate hardware in my lab to running them. Nested seems to be the way to go, and that’s what I’m already doing for internal builds of vSphere, whereas in the physical world I’m running the latest and greatest GA code. I won’t be able to draw any conclusions from a performance perspective because it will be nested, not native. My interest is really in product functionality and integration.

Now down to business. If you try to install the Windows 2012 Server OS and then enable the HyperV role via the GUI or via PowerShell, you’re likely to get this error message. One of the complexities around Windows 2012 HyperV is how you even get this far. Do you install Windows with or without a GUI front-end (or Server Core, as Microsoft calls it)? Plus, when you look at TechNet there are even .ISOs of things called “HyperV”. Is it me or is that just unnecessarily confusing – compared to a dedicated hypervisor with no graphical front-end?

Screen Shot 2013-05-01 at 10.58.17

To do this with PowerShell on Server Core you would use:

Install-WindowsFeature Hyper-V -Restart

That cmdlet would result in a similar response:

Screen Shot 2013-05-01 at 11.03.34
Thought: Hang on! It’s a “ROLE” in the GUI, and a “FEATURE” in PowerShell. Come on Microsoft, make your mind up! 🙂

So to get this nesting to work we need to make some changes on the ESX hosts, and also to the VM’s VMX file. Before you go any further, you do realise this sort of thang isn’t remotely supported? Thought so. But I thought I’d better say that to CYA…

STEP 1: Enable VHV Allow on the ESX Host:

First you need to open a console on the ESX host and modify a text file. The easiest way to do this is to temporarily open SSH on the host and PuTTY in. Once there, run this command:

echo 'vhv.allow = "TRUE"' >> /etc/vmware/config

Rinse and repeat for all the remaining hosts in your cluster.

STEP 2: Enable VMX Settings

Next we need to add two entries to the .VMX file of the nested Windows 2012 HyperV VM – the monitor.virtual_exec = “hardware” and hypervisor.cpuid.v0 = “FALSE” parameters:

Screen Shot 2013-05-01 at 16.53.25
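For reference, here’s how those two lines look when added directly to the .vmx file (standard .vmx quoting assumed):

monitor.virtual_exec = "hardware"
hypervisor.cpuid.v0 = "FALSE"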

STEP 3: Ensure CPU/MMU Virtualization is engaged

Screen Shot 2013-05-01 at 16.55.29

STEP 4: Add a CPU Mask:

Finally, add a CPU mask to the Level 1 register on ecx:

---- ---- ---- ---- ---- ---- --H- ----

 Screen Shot 2013-05-01 at 16.57.08

From this point, if your pCPU is of the right type, you should be able to power on the VM and enable the HyperV role – sadly for me, my CPU still didn’t support this nested approach (although nested VMware ESX 5.x works perfectly fine).

Screen Shot 2013-05-01 at 18.23.27

 Conclusions:

So for me it’s back to the drawing board. I guess what I could do is take 4 of my 9 servers and install Windows HyperV on two of them, and Xen on the others. It’s not ideal. My Lenovos at the colo don’t have the blue widget that enables remote console access. That means scheduling a visit to the colocation facility to install to physical. Plus I didn’t really want to dedicate physical hardware to this sort of thing – just spin it up on demand, and power it off when I’m done…

Category: vSphere | Comments Off on vINCEPTION: Nested Windows 2012 HyperV on vSphere5.1
May 15

UDA 2.0 (Build 23) – Adds Support for Windows 2008, Windows 7/8 and Windows 2012

I’ve been doing some recent experimentation with nesting Microsoft Windows 2012 HyperV under vSphere 5.1 – and that led me to look at the UDA as a way of pushing out a scripted installation of Windows 2012. I had a quick word with Carl Thijssen and he very kindly put together a patch bundle to add support for Windows 2008, Windows 7/8 and Windows 2012.

Screen Shot 2013-05-15 at 12.15.13

If you run the UDA, or intend to, I’d highly recommend this patch. Although you might want to back up or snapshot your UDA first, just in case something goes astray during the upgrade itself. Pop along to the download page here or at ultimatedeployment.org/download.html to grab the new patch.

Category: vSphere | Comments Off on UDA 2.0 (Build 23) – Adds Support for Windows 2008, Windows 7/8 and Windows 2012