VMware vRealize Ping Management Pack
Updated: Oct 14, 2022
VMware dropped vRealize Operations (vROps) 8.2 a couple weeks ago. Lots of great new stuff, including the VMware vRealize Ping management pack. Matt Bradford published a great blog post discussing everything new in 8.2: https://blogs.vmware.com/management/2020/10/whats-new-vrops-82.html.
I'd like to go into a bit more detail around the Ping management pack. For years, users have been asking for up/down functionality with varying degrees of success. I've personally used the VM Power State property, the VM Powered On metric, and the VM OS Uptime metric to determine VM availability. I've created symptoms, alerts, and a dashboard using these to show VM availability: https://bpatvmware.wixsite.com/bpatvmware/post/vmware-vm-availability
While helpful, this was mostly reactive, with no historical trends. I could tell when a VM was powered down and when a VM had recently rebooted, but I wanted more. The new Ping management pack provides the additional detail I was looking for. Let's explore the management pack itself. It's available as a Native Management Pack out of the box with vROps.
It includes a dashboard, five views, a symptom, and an alert:
To enable the Ping adapter, click the ACTIVATE button at the bottom of the management pack tile. It will take you to the adapter configuration:
I've called mine Ping_Adapter. According to the Information bubble, each adapter instance can support up to 5000 addresses, but in practice I've seen that it supports up to 5000 characters. The configuration provides two different ways to list your endpoints. If the Address List is empty, the Configuration File will be used. If both are empty, nothing will be used.
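To get a feel for those limits before pasting a long list into the adapter, here is a minimal Python sketch. It is not part of the management pack: the function names are hypothetical, and counting every address in a CIDR block is an assumption about how the adapter expands ranges. It reports the character count, the number of targets the list expands to, and which source would be used under the precedence described above.

```python
import ipaddress

MAX_ADDRESSES = 5000  # limit stated in the adapter's Information bubble
MAX_CHARS = 5000      # limit observed in practice, per this post


def summarize_address_list(address_list: str) -> dict:
    """Count characters and the targets a comma-separated list expands to."""
    entries = [e.strip() for e in address_list.split(",") if e.strip()]
    targets = 0
    for entry in entries:
        if "/" in entry:
            # CIDR range: count every address in the block (assumption;
            # the adapter may skip network and broadcast addresses).
            targets += ipaddress.ip_network(entry, strict=False).num_addresses
        else:
            targets += 1  # single IP or FQDN
    return {
        "entries": len(entries),
        "targets": targets,
        "chars": len(address_list),
        "fits": targets <= MAX_ADDRESSES and len(address_list) <= MAX_CHARS,
    }


def pick_source(address_list: str, config_file_name: str) -> str:
    """Address List wins; the Configuration File is the fallback."""
    if address_list.strip():
        return "address_list"
    if config_file_name.strip():
        return "config_file"
    return "none"
```

For example, `summarize_address_list("10.0.0.1, vc01.example.com, 192.168.1.0/30")` counts three entries that expand to six targets, and `pick_source("", "targets.xml")` falls back to the configuration file (`targets.xml` is a made-up file name).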
Address List - this is a comma separated list of endpoints. IPs, FQDNs, and CIDR notation IP ranges are supported.
Configuration File Name - this is the name of the file (XML) that contains your list of endpoints. IPs, FQDNs, CIDR notation IP ranges, and general IP ranges are supported. An example might look something like this:
Advanced Settings provides the user with even more functionality:
Wait Interval Time (second) - number of seconds to wait before running the next batch of pings, default is 0, range is 0-300 seconds.
Batch Size - number of request objects sent to each target, default is 20, range is 20-300.
Interval (millisecond) - number of milliseconds that fping waits between successive packets to each target. Default is 2000 milliseconds, with no upper bound.
DNS Name Resolve Interval (minute) - number of minutes between re-resolutions of DNS names. Default is 30, minimum is 15, and there is no upper bound.
Packet Size (byte) - number of bytes to be sent in ping. Default is 56, range is 56-65536.
Don't Fragment - turns off/on the "Don't Fragment" bit in the IP header. Default is False.
Generate FQDN Child IPs - turns off/on the generation of IP objects and adds them as children of the original FQDN. Default is False.
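If you keep your intended values in a script or a change record, the ranges above are easy to check before you touch the UI. The sketch below is only an illustration: the setting keys are names I made up, not the adapter's real property names, and the bounds are simply the ones listed in this post.

```python
# (default, minimum, maximum); None means no bound is stated in this post.
# The keys are hypothetical names, not the adapter's actual property names.
ADVANCED_SETTINGS = {
    "wait_interval_s": (0, 0, 300),
    "batch_size": (20, 20, 300),
    "interval_ms": (2000, None, None),
    "dns_resolve_interval_min": (30, 15, None),
    "packet_size_bytes": (56, 56, 65536),
}


def validate(settings: dict) -> list:
    """Return a list of human-readable problems; empty means all in range."""
    problems = []
    for name, value in settings.items():
        if name not in ADVANCED_SETTINGS:
            problems.append(f"{name}: unknown setting")
            continue
        _default, low, high = ADVANCED_SETTINGS[name]
        if low is not None and value < low:
            problems.append(f"{name}: {value} is below the minimum {low}")
        if high is not None and value > high:
            problems.append(f"{name}: {value} is above the maximum {high}")
    return problems
```

Calling `validate({"batch_size": 10, "packet_size_bytes": 1500})` flags only the batch size, since 1500 bytes sits inside the 56-65536 range.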
The Ping Overview dashboard provides the user with visibility into the availability of their endpoints. The top row shows the latency and packet loss distribution for all targets. This allows the user to visualize the distribution of problems. The second row allows the user to select a ping target, which then populates metrics in the remaining widgets, such as peak latency, average latency, peak packet loss, and average packet loss.
Sorting on peak latency and/or peak packet loss allows the user to easily see problematic endpoints. In lieu of watching dashboards, the user can also create an Alert based on any of these new metrics. It makes sense to create a Symptom/Alert pair based on peak packet loss of 100%, indicating a target is potentially down. Here's my Symptom. It will trigger when peak packet loss is 100%.
Here's my Alert. It will be generated after two consecutive Symptom triggers, and I've configured it to be a Critical Alert.
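The "two consecutive triggers" rule is worth spelling out, because it is what keeps a single dropped collection cycle from paging anyone. Here's a small Python sketch of that logic as I understand it. It is not how vROps evaluates symptoms internally: the function name and the sample lists are made up for illustration.

```python
def alert_fires(peak_packet_loss_samples, threshold=100.0, consecutive=2):
    """Return True once the symptom condition holds for N samples in a row."""
    streak = 0
    for loss in peak_packet_loss_samples:
        # The symptom condition: peak packet loss at (or above) the threshold.
        streak = streak + 1 if loss >= threshold else 0
        if streak >= consecutive:
            return True
    return False


# One bad cycle surrounded by good ones does not raise the alert...
print(alert_fires([0, 100, 0, 100]))
# ...but two bad cycles back to back do.
print(alert_fires([0, 100, 100]))
```

The first call prints `False` and the second prints `True`.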
Finally, I enabled the Alert in the vSphere Solution's Default Policy. I added a down target to my list of IPs/FQDNs and the Alert triggered.
Administrators can now be notified of down endpoints, be they VMs, ESXi Hosts, or any other object.