Reading the Runes with Runecast Analyzer
A runestone is typically a raised stone with a runic inscription, but the term can also be applied to inscriptions on boulders and on bedrock. The tradition began in the 4th century and lasted into the 12th century, but most of the runestones date from the late Viking Age. Most runestones are located in Scandinavia, but there are also scattered runestones in locations that were visited by Norsemen during the Viking Age. Runestones are often memorials to dead men. Runestones were usually brightly coloured when erected, though this is no longer evident as the colour has worn off.
This week I was fortunate to have a briefing with Stan Markov (VCDX #74 and VCI), the CEO of Runecast. In case you don’t know, Runecast Analyzer is a tool that gathers info from your vSphere environment and compares it to the VMware KB, Best Practices and the Security Hardening guide. The idea is that it lets you act proactively on what it discovers, reducing the time spent reacting to events as they happen – in that typical “firefighting manner”.
Typically, we are so busy in the IT world that we tend to respond to situations as they arise, and hope that by following design best practice we reduce these events to a minimum. In recent years a number of software vendors have been developing tools to break this cycle of behavior. Despite bold attempts to “automate all the things”, you’d be surprised how many people are still using a combination of Excel spreadsheets and Googling both to keep track of changes and to respond to new issues as VMware finds them. And, of course, there are those pesky things called “default settings” that are often left as is, and never reviewed.
When the poop hits the fan such admins are forced into “Cutting and Pasting” cryptic log entries into Google, in the hope that a narrowly defined string will reduce the long list of false positives – it’s become a skill in its own right, scrolling through search results and translating the verbiage of KB articles to see if it answers your problem. And I can speak of situations first hand where I’ve had to “stitch together” KB articles to fix an issue. It’s this sort of first-hand pain that the folks at Runecast are addressing.
I was given an NFR license for a year (thank you) and spent yesterday getting my lab environment up and running to ingest their offer. I spent most of my time making the lab work again and replacing my expired vSphere license! The Runecast Analyzer appliance (in OVF format) took less time to set up than it did to download. I pointed it at my vCenter and I was up and running.
As you might gather, with the lab being down for more than a year it’s not been patched in ages, and I’ve never bothered with any security hardening either. So my results will not be reflective of most production environments (or will they?). As you’ve probably gathered, Runecast Analyzer is an on-premises appliance, and although it pulls data down from the Runecast Central Repository, which in turn keeps track of the VMware KB, nothing is pushed out of your environment. Runecast Analyzer does support offline patch-management for those people who require an air gap between themselves and the outside world for compliance purposes.
The Runecast Analyzer appliance sits in your environment and pulls the data it needs for analysis from the Central Repository, which in turn keeps track of VMware’s KBs. One interesting aspect of this is the recognition from the folks at Runecast that problems often first get noticed by customers and the community. It’s not widely understood that some KB articles started life as SRs raised by customers, or with the community picking up on commonly held experiences. So Runecast keep an eye on the blogosphere and social media too, so they can react quickly to issues as they develop. Stan was able to give me an example of this in practice. An issue with the “Changed Block Tracking” (CBT) feature in vSphere 6.0 U2 was picked up by customers using backup technologies such as Veeam; Runecast reacted to this issue as it arose in the blogosphere and online forums. The specific KB article was VMware KB 2145895 and an example blogpost from Andreas Lesslhume
Runecast Analyzer currently categorizes its knowledgebase using the same four-level system that VMware has adopted (Critical, Major, Medium and Low) and it’s possible to filter on these categories in order to triage your response.
By All Knowledge, KB, Best Practices, Security Hardening
In my lab environment I had some 16 Critical, 33 Major and 18 Medium issues. What can I say? I’m a naughty admin! There were one or two issues, to my embarrassment, that had fallen through the net. I decided to focus my attention on best practices, because with my level of experience there’s no reason why I shouldn’t be at least following those. Within the category of “Best Practices” and “Critical”, I had three fails and three passes.
Despite being out of the loop from things VMware for a year, most of these results made sense right away. The two network warnings are to do with lowered security settings on a portgroup on a vSwitch. I think this was a deliberate configuration change by myself. Despite having plenty of CPU/RAM resources, I do run VMware ESXi in a “nested” configuration as well as on physical. I “only” have three hosts (HP ML350e Gen8’s) and occasionally need more nested ESXi hosts for lab simulations. In order to get comms working in and out of these nested ESXi hosts you do have to lower the security to allow that to happen. Outside of this configuration it’s very rare to need to change these security settings – only the occasional piece of software, such as NLB or intrusion detection systems, needs them lowered in order to function. So I already knew about these settings; if you don’t, the little plus symbol next to the alarm gives both explanatory text and a hyperlink to the source data:
Clicking the “Findings” tab in this pop-up shows which object that setting is on in your vCenter Inventory. In my case it was on every ESXi host on vSwitch0. In the past I’ve set up unique portgroups just for a nested configuration. Latterly I think I just got lazy and started to use the standard portgroups I create for running the lab. I’ve done this on VMkernel portgroups to present physical storage to my nested ESXi instances – this allows for such things as Storage vMotion/vMotion – as well as pretty good performance, especially when the appropriate “VMware Flings” are added to boost network performance in nested instances:
To avoid being alerted about this in future, I created a filter – and made a mental note that it was probably better practice if I’m going to do nesting – to go back to my previous approach which is discrete portgroups for these lowered security settings. That’s perhaps a little better than just using the “Management” portgroup as I’ve done more recently.
Applying the filter made these issues go away – which I guess is one way to deal with life’s little problems, annoyances and irritations – just ignore them! If only such an approach worked with our politicians!
The other reference was to the fact that TSM (Tech Support Mode) is enabled. TSM is the BusyBox based “shell” that provides a console front-end to the physical ESXi host. In a hardened environment, it should ideally be turned off, with access restricted such that only the vSphere Client, command-line tools and API access are enabled. It’s a relatively trivial matter to re-enable access should you need it for out-of-band management directly at the console itself. Like a lot of home-labbers – although I do have iLO access to my hosts – I have enabled SSH and suppressed the warning messages for ease of use. I’ve found the “Findings” easier to decode than the more abstract reference brought in from the VMware KB articles. But hey, that’s a KB article for you – they read more like legal statements than something designed for ordinary mortals, which I guess is the gap in the market that Runecast is hoping to fill.
What I like about this – is despite my attempts to hide my nefarious access to SSH by suppressing the default SSH warnings – Runecast Analyzer saw through my duplicity.
As I progressed I got used to applying the filters – to show just “Major” events in terms of “Best Practice” and only showing “Fails”. From there I was able to pick out the main issues. Firstly, a couple of datastores have less than 15% free capacity – fortunately these are ancillary stores – for backup, software and legacy templates. But the lack of free space on the backup store is more of an operational worry, as I could easily run out of space for my backup schedule.
Additionally, Runecast Analyzer picked up on the fact I hadn’t set an alternative IP address for HA isolation. Typically, you have one management network, and it pings the router to arbitrate any split-brain scenarios. Difficulties arise if you have an unstable or unreliable network. The reality is my home lab is up and down like a yo-yo, and the Wi-Fi router is the most ‘stable’ part of my infrastructure, but it is nonetheless an important consideration in a production environment.
So on and on, and on, I could go on documenting each and every critical, major, medium and minor configuration setting across the areas of KB, Best Practices, and Security Hardening. I would have thought that could become quite tedious – I hope it hasn’t already!
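The free-space check is simple arithmetic, and a rough idea of it can be sketched in a few lines of Python. This is only an illustration and not how Runecast Analyzer implements it: the `Datastore` class, the `low_on_space` helper and the datastore names and sizes are all invented here, and only the 15% threshold comes from the finding above.

```python
from dataclasses import dataclass


@dataclass
class Datastore:
    """Hypothetical record for a datastore; not a vSphere API object."""
    name: str
    capacity_gb: float
    free_gb: float

    @property
    def free_percent(self) -> float:
        # Free space expressed as a percentage of total capacity.
        return 100.0 * self.free_gb / self.capacity_gb


def low_on_space(datastores, threshold_percent=15.0):
    """Return the datastores whose free space is below the threshold."""
    return [ds for ds in datastores if ds.free_percent < threshold_percent]


# Invented lab datastores, sizes made up purely for illustration.
lab = [
    Datastore("backup", capacity_gb=2000, free_gb=180),
    Datastore("templates", capacity_gb=500, free_gb=60),
    Datastore("vmfs-main", capacity_gb=4000, free_gb=1900),
]

for ds in low_on_space(lab):
    print(f"{ds.name}: {ds.free_percent:.1f}% free")
```

With these made-up numbers the backup and templates stores are flagged and the main store is not.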
By Verbose Dashboards
One last important feature of Runecast Analyzer is the “verbose dashboards”. This component is new to me, and I don’t recall seeing it in any other technology of this ilk. It monitors the logs for certain key words and phrases that are associated with problems, and then gives you a statistical analysis of their frequency across a timeline. I’m not sure how useful this feature would be for trapping problems, after all that’s a function of the rest of the software, but it could be interesting for spotting trends. For instance, more problems and issues being triggered at a particular time of day, week, or month could be indicative of problems that arise during peak workloads. This “Verbose Dashboard” is in addition to the “KBs discovered” view under the “Log Analysis” part of the UI. When it comes to log analysis, I think “KBs discovered” is the more important part of the product compared to Verbose Dashboards: it shows any problems found in the logs that link to known issues in the VMware KB. Verbose Dashboards is an additional feature that provides valuable insights into the logs and allows for quick troubleshooting if the problem you are experiencing is not already related to a known issue. If it were a known issue, it would come up in the KBs discovered section.
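To make the “frequency across a timeline” idea concrete, here is a minimal Python sketch that counts keyword hits per hour. It is my own illustration, not Runecast’s implementation: the keyword list, the timestamp format and the sample log lines are all assumptions made up for the example.

```python
from collections import Counter
from datetime import datetime

# Hypothetical keywords; the real product's list is not documented here.
KEYWORDS = ("error", "warning", "timeout")


def keyword_counts_by_hour(lines, keywords=KEYWORDS):
    """Count keyword occurrences per (hour, keyword) bucket.

    Assumes each line starts with a timestamp such as
    2017-06-01T09:05:11Z followed by a space and the message.
    Lines that do not match that shape are skipped.
    """
    counts = Counter()
    for line in lines:
        stamp, _, message = line.partition(" ")
        try:
            when = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")
        except ValueError:
            continue  # not a recognisable log line
        hour = when.replace(minute=0, second=0)
        lowered = message.lower()
        for kw in keywords:
            if kw in lowered:
                counts[(hour, kw)] += 1
    return counts


# Invented sample lines, for illustration only.
sample = [
    "2017-06-01T09:05:11Z vmkernel: SCSI timeout on path vmhba1",
    "2017-06-01T09:40:02Z hostd: Warning: datastore latency high",
    "2017-06-01T09:59:59Z vmkernel: SCSI timeout on path vmhba1",
    "2017-06-01T14:12:30Z vpxa: error connecting to vCenter",
    "not a log line",
]

counts = keyword_counts_by_hour(sample)
for (hour, kw), n in sorted(counts.items()):
    print(f"{hour:%Y-%m-%d %H}:00  {kw:<8} {n}")
```

Bucketing by hour is an arbitrary choice here; the same counter keyed by day or week would show the longer-term trends mentioned above.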
For this feature to work you need to hit the “gears” again, and check that Runecast Analyzer is receiving the ESXi and VM logs – this is most likely not the case as my screen grab below shows.
In my case I had set up the Syslog service on my vCenter Server, and pointed my ESXi hosts at it. So I changed the Advanced Settings on each host and updated the Syslog.global.logHost setting. It’s possible to use a comma as a separator and have ESXi send its logs to multiple syslog servers. Later on I realized that even if you are already using other syslog collection service(s), you can still use Runecast Analyzer to configure the ESXi hosts for redirection: the IP address of the Runecast Analyzer appliance will be appended to the existing list of syslog destinations. Whether or not you are already using another syslog collection service (why not? Are you insane?), you can get Runecast Analyzer to configure the ESXi hosts to redirect their logs to it by using the little “spanner” icon like so…
To get logging working from the VMs to Runecast Analyzer, it needs to be enabled on each VM. Fortunately this task is easy to do from the web console, assuming the vCenter account has enough privileges. Once again, this is done by clicking the “spanner” icon under the list of VMs, and using the “Select All” option.
After a wee while the verbose dashboard will start populating its statistical chart and gathering more detailed info from the logs. It’s hard to evaluate this feature given the time I have to devote to evals like this. I’ll keep Runecast Analyzer up and running for the next year, and update this blog post as and when I find something worthy to report.
I can imagine an external auditor or consultancy contracted to do a “Health Check” could find Runecast (and other tools) useful in their armory – as wading through all this manually is almost unthinkable. I see both an ‘internal’ use of this for regular monitoring, as well as an external use. I wasn’t sure if Runecast Analyzer has a program by which the product can be licensed in this way, so I asked, and discovered that Runecast don’t have a program per se, but they can offer a one-month license for such a purpose. However, despite this initial thought of mine, it’s important to remember that Runecast Analyzer isn’t a one-time health check tool. The point is to run it continuously: it offers real-time and historical log analysis and regular KB updates, and combines these with changes in the vSphere environment, so running Runecast continuously is strongly recommended.
On the subject of regular monitoring and notifications, those defaults do need reviewing in the setup of the appliance. By default the “Analyze Now” is manually triggered, but you can set up a schedule using the “gear” icon in the top left which is typical of pretty much every web-based UI these days:
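As an aside, the “append to the existing comma-separated list” behaviour is easy to picture with a small Python helper. This is a sketch of the string handling only, assuming the value of Syslog.global.logHost has already been read from the host; the `append_loghost` function name and both addresses are hypothetical, and nothing here talks to vCenter or ESXi.

```python
def append_loghost(current: str, new_host: str) -> str:
    """Append new_host to a comma-separated loghost value.

    Keeps existing destinations in order and avoids adding a
    duplicate if new_host is already present.
    """
    hosts = [h.strip() for h in current.split(",") if h.strip()]
    if new_host not in hosts:
        hosts.append(new_host)
    return ",".join(hosts)


# Hypothetical values: an existing syslog collector and the appliance.
existing = "udp://vcenter.lab.local:514"
analyzer = "udp://192.168.1.50:514"

print(append_loghost(existing, analyzer))
```

Calling it a second time with the same address leaves the value unchanged, which is the behaviour you would want if the redirection were configured more than once.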
Once coupled to an email configuration in “alerting” you will start to populate your inbox with lots of lovely email. We all know how much we like that!
Joking apart, stories of emails being ignored or even redirected to a “delete” folder are the stuff of legend. It might be interesting to see these notifications sent via more modern methods of communication such as a Slack channel. I’m not sure if people would add this to their other dashboards in the NOC, although Runecast told me they have customers that do. I think if they did, they would want to work through the long list of existing issues first – either resolving them or ignoring them – so that when Runecast Analyzer finds something new it gets bubbled up on their radars and noticed. Of course, these emails could be directed at a ticketing tool if needs be. I suspect it won’t be long before Runecast offers a plethora of different messaging options including such things as SNMP.
So, that’s what Runecast Analyzer does. It’s not alone in the marketplace – and if this kind of system floats your boat you need to do your due diligence around selecting an appropriate provider. Two things stood out for me – Runecast flags up the blogger/social media aspect of reacting quickly to developing situations – that’s not something I’ve heard other companies talk about. Also, the fact that they are an on-premises system rather than SaaS based makes them different from some of the other players. Not that I think that is that important in this day and age. But it is a difference, and that difference could be significant to you from a security and compliance perspective.
What are Runecast’s plans for the future? I didn’t find the guys unduly secretive about their plans, and there was no NDA around the briefing. To be honest we’re all grown-ups here, so someone like myself isn’t really bothered about timelines and timeframes. Having worked for a large software provider I know that roadmaps come and go, and timelines adjust accordingly. So what I was really looking for was an idea of intentions coupled with a vision. Not some airy-fairy blue-sky thing, more practical actions that will make the product better.
Security Profiles. The four-level categories are all well and good, but what’s really needed is profiles aligned to various different compliance bodies such as STIG, HIPAA, and PCI-DSS. Let’s face it, we all want to be compliant because it is a good in its own right, but the other side of the coin is being able to demonstrate compliance in order to meet your business requirements to be certified to operate in the appropriate marketplace – plus there are often penalties when you fail to do so. This profile approach is on their radar, and seen as a high priority amongst the management and development team at Runecast.
Beyond Virtual Infrastructure. One of the big themes is how everyone is trying to be less fixated on the plumbing, and more focused on the higher-level services for which the plumbing provides a foundation. So whilst core vSphere (vCenter, ESX, and VMs) will always remain the bedrock of any good implementation – long, long, long ago VMware stopped being purely a virtualization vendor. So there are other VMware technologies in the portfolio that would be receptive to similar treatment – for me this includes such things as NSX and VMware Horizon. Of course, there is LOTS for Runecast Analyzer to ingest from the VMware technology stack and it will have to triage its response by listening to customers, and looking at the volume of use these sorts of technologies are seeing. Again, Runecast gave me an indication this is something they intend to do – they just need to make sure they don’t bite off more than they can chew.
It’s all About the API. I’ve lost count of the number of times I’ve heard delegates at User Groups and VMworld ask about APIs: how open they are and how complete they are. So whilst there’s always a demand for off-the-shelf solutions that are rapid to deploy, people can and do want to integrate them into other tools and dashboards as well. Once again, this looks to be one of the top priorities at Runecast…
And finally. Want to learn more? Do you happen to be in Frankfurt on the 14th June? Well, Runecast have a session at the Frankfurt VMUG UserCon. I say Runecast, actually it’s more of a customer testimonial than a straightforward vendor pitch. I’ve presented at the Frankfurt VMUG, and I know the folks there aren’t shy with their questions. So hopefully it will be a technically detailed and worthwhile session. In fact this isn’t any old VMUG, it’s the Germany-wide UserCon, so all the more reason to attend…