Disk Monitoring with VMware vROps
Updated: Aug 29, 2021
Out of the box, vRealize Operations (vROps) will monitor and alert on Virtual Machine Disk Space. The Alert itself is called "One or more virtual machine guest filesystems are running out of disk space" and can be seen here.
This will capture Linux filesystems and Windows drives that are running out of space. This is a Symptom Based Alert, so let's explore the underlying Symptom/s.
There are two Symptoms defined, and the Alert will trigger if either of them is true. In this case a Warning Alert is generated for filesystems/drives over 90% full and a Critical Alert is generated for filesystems/drives over 95% full.
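To make the two thresholds concrete, here is a minimal Python sketch of the same logic. It is not how vROps evaluates Symptoms internally; the function name and constants are made up for illustration, and it assumes "over 90%" means strictly greater than 90.

```python
from typing import Optional

# Thresholds taken from the two Symptoms described above (percent used).
WARNING_THRESHOLD = 90.0
CRITICAL_THRESHOLD = 95.0


def classify_usage(percent_used: float) -> Optional[str]:
    """Return the most severe criticality whose threshold is exceeded, or None."""
    # Check the higher threshold first so 97% reports Critical, not Warning.
    if percent_used > CRITICAL_THRESHOLD:
        return "Critical"
    if percent_used > WARNING_THRESHOLD:
        return "Warning"
    return None


if __name__ == "__main__":
    for pct in (85.0, 92.5, 97.0):
        print(pct, classify_usage(pct))
```

A filesystem at 97% breaches both thresholds, so the sketch reports only the more severe of the two.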
Looking at the Symptoms in more detail, you can see the name, metric, value, and criticality.
Advanced Settings gives you the ability to define Wait Cycles and Cancel Cycles, so you can wait until a Symptom has occurred twice, three times, or more before it counts. Hovering over the "Evaluate on instanced metrics" check box gives you some detail as well.
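If Wait Cycles and Cancel Cycles are new to you, the sketch below models the idea in plain Python. It rests on an assumption about the semantics: the condition must hold for `wait_cycles` consecutive collection cycles before the Symptom becomes active, and must be clear for `cancel_cycles` consecutive cycles before it is cancelled. The function and parameter names are hypothetical, not part of vROps.

```python
from typing import Iterable, List


def symptom_states(
    breaches: Iterable[bool], wait_cycles: int = 1, cancel_cycles: int = 1
) -> List[bool]:
    """Return whether the symptom is active after each collection cycle."""
    active = False
    true_run = 0   # consecutive cycles where the threshold was breached
    false_run = 0  # consecutive cycles where it was not
    states = []
    for breached in breaches:
        if breached:
            true_run += 1
            false_run = 0
        else:
            false_run += 1
            true_run = 0
        if not active and true_run >= wait_cycles:
            active = True
        elif active and false_run >= cancel_cycles:
            active = False
        states.append(active)
    return states


if __name__ == "__main__":
    samples = [True, False, True, True, True, False, True, False, False, False]
    print(symptom_states(samples, wait_cycles=3, cancel_cycles=2))
```

With a wait of 3 and a cancel of 2, a single noisy sample neither raises nor clears the Symptom, which is the whole point of these settings.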
Checking this box evaluates the Symptom on instances of the selected metric. In the case of Disk, that means not only will aggregate Disk be considered, but individual Disks as well. For Linux VMs, think all filesystems. For Windows VMs, think all drives. We also have the ability to exclude instances here. For example, if you'd like to exclude all T: drives on Windows servers, you would do that here by dragging the T: drive metric over.
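Here is a rough Python sketch of what instanced evaluation with an exclusion amounts to. The usage numbers and the function name are invented for illustration; vROps does this for you once the box is checked.

```python
from typing import Dict, Iterable, List


def breaching_instances(
    usage_by_instance: Dict[str, float],
    threshold: float,
    excluded: Iterable[str] = (),
) -> List[str]:
    """Return the instances over the threshold, skipping excluded ones."""
    skip = set(excluded)
    return sorted(
        name
        for name, percent_used in usage_by_instance.items()
        if name not in skip and percent_used > threshold
    )


# Hypothetical per-drive usage for one Windows VM (percent used).
windows_vm = {"C:": 72.0, "D:": 93.4, "T:": 99.1}

if __name__ == "__main__":
    # Exclude the T: drive, as in the example above.
    print(breaching_instances(windows_vm, 90.0, excluded=["T:"]))
```

Each drive or filesystem is checked on its own, so one full drive is not hidden by plenty of free space on the others.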
How do we make sure we're capturing all Linux filesystems and Windows drives? Policies! Find your Active Policy and edit it.
Click EDIT POLICY.
Select the tile you'd like to adjust, in our case Metrics and Properties. Choose Virtual Machine in the dropdown, then go to Metrics - Guest File System and confirm State and Instanced State are Enabled.
This confirms the metric is being collected and is enabled for all instances (all Windows drives and Linux filesystems). Not all metrics are State enabled or Instanced State enabled, so always confirm via Policies.
Back to Alerts, let's dig into one.
Clicking on the Alert will show you the Symptom details.
You can now see the Symptom being triggered, the name of the filesystem (/storage/log), and the threshold being breached (90%). The same thing can be done for Windows VMs. Exploring the VM in detail shows the filesystems vROps is aware of.
This is similar for a Windows VM and its drives.
Out of the box, vROps will only see local filesystems or drives; it won't see NFS-mounted Linux filesystems or Windows shares. How can we get these? Telegraf Agents! Let's install the Telegraf Agent on a Linux server with an NFS mount and explore.
Once the Telegraf Agent has been deployed, it'll present as a child object of the VM.
Single click that object and you'll be presented with the Metrics and Properties it captures.
You'll notice an NFS-mounted filesystem here: 10.216.178.69:/share1.
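If you want to check ahead of time which mounts on a Linux guest are network filesystems, something like the following Python sketch can help. It parses text in the `/proc/mounts` layout (source, mount point, filesystem type, ...). The set of network filesystem types is my own assumption and is not exhaustive, and the sample devices and the `/mnt/share1` mount point are made up; only the `10.216.178.69:/share1` source comes from the example above.

```python
from typing import List, Tuple

# Assumed set of network filesystem types; extend as needed.
NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs", "smb3"}


def network_mounts(mount_table: str) -> List[Tuple[str, str, str]]:
    """Return (source, mount_point, fs_type) for network mounts."""
    found = []
    for line in mount_table.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        source, mount_point, fs_type = fields[:3]
        if fs_type in NETWORK_FS_TYPES:
            found.append((source, mount_point, fs_type))
    return found


# Hypothetical mount table in /proc/mounts format.
SAMPLE = """\
/dev/sda1 / ext4 rw,relatime 0 0
/dev/sdb1 /storage/log ext4 rw,relatime 0 0
10.216.178.69:/share1 /mnt/share1 nfs4 rw,relatime 0 0
"""

if __name__ == "__main__":
    for source, mount_point, fs_type in network_mounts(SAMPLE):
        print(f"{source} on {mount_point} ({fs_type})")
```

On a real Linux guest you could pass in the contents of `/proc/mounts` instead of `SAMPLE`.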
Exploring a Windows VM, let's install the Telegraf Agent.
Once installed, the Agent will present as a child object to the VM, as it did for the Linux VM previously.
In this case, the Windows share presents as HarddiskVolume1. To alert on these non-local filesystems and drives, you'll have to create a new Symptom/Alert combination, as these are new object types. For example, if I wanted to alert on the Linux NFS filesystem we previously discovered, I would create a Symptom like this.
The associated Alert would look something like this.
I set the threshold so that an Alert would be created. Here it is.
This is just one use case for Telegraf Agents, but a very powerful one. Keep in mind that Telegraf Agents can also monitor OS processes, run custom scripts, and perform ICMP/UDP/TCP/HTTP remote checks. Telegraf Agents capture quite a few more metrics as well; they can be found here.
Disk monitoring in vROps can be quite powerful, with all sorts of levels of granularity and alerting capabilities. To explore vROps in more detail go here!