Windows Service Monitoring with VMware vROps
Updated: Apr 27, 2022
Years ago we had Hyperic Agents, then we had Endpoint Operations Agents, and now we have Telegraf Agents! As of vRealize Operations (vROps) 7.5, VMware is using Telegraf technology as its Agent-based solution.
InfluxData's open source Telegraf project started in 2015 with an agent framework and six plugins. It has since grown to over 200 plugins: https://www.influxdata.com/products/integrations. According to InfluxData:
"Telegraf is a plugin-driven server agent for collecting and sending metrics and events from databases, systems, and IoT sensors. Telegraf is written in Go and compiles into a single binary with no external dependencies, and requires a very minimal memory footprint."
This blog will discuss the Telegraf Agent for Windows, which is to say the Telegraf Agent framework and the plugins related to data collection for Windows. We will explore the Telegraf Agents for Linux and other targets in a subsequent blog.
Before jumping into Telegraf, it should be noted that vROps Service Discovery can perform some Windows Service (and Linux Process) monitoring. It discovers roughly 39 different services/processes. The documentation is here: https://docs.vmware.com/en/VMware-vRealize-Operations-Cloud/services/config-guide/GUID-CC683117-D936-432C-B30B-CCCA08E863FB.html
We aren't going to go into the details of vROps Service Discovery in this blog, but VMware Staff Technical Marketing Manager Matt Bradford has. Read his blog here: https://blogs.vmware.com/management/2020/10/credential-less-service-discovery-with-vrealize-operations.html
Back to Telegraf: the Agent sits on the VM itself and pulls data from the Guest OS via the Input Plugins. As of vROps 8.4, it's supported for both virtual and physical machines.
We'll perform remote deploys here, but local installs are supported as well. Instructions can be found here: https://docs.vmware.com/en/vRealize-Operations-Manager/8.4/com.vmware.vcom.config.doc/GUID-1ED6D96F-BF6E-4E92-8547-43BE68617A45.html
Before deploying there are a couple things we need to do:
Deploy a Cloud Proxy (CP) via Administration - Management - Cloud Proxies, click New. Before vROps 8.4 these were called Application Remote Collectors (ARCs). You can convert your existing ARCs to CPs via this KB: https://kb.vmware.com/s/article/83059
Reconfigure your vSphere Adapter/s to collect through the CP via Administration - Solutions - Cloud Accounts. Select the Adapter/s you'd like to send through the CP and click Edit. Adjust the Collector/Group to the CP under the vCenter tab.
We are now ready to deploy Agents from the vROps UI. Go to Administration - Inventory - Manage Agents - Manage Agents tab, and select the server/s you'd like to deploy to. In this case, we'll deploy to a Windows VM.
Select the Install icon and the appropriate radio button.
Then click INSTALL AGENT.
The last operation status will show how each install is going. In this case one is in progress and one is completed.
Now that we have the Telegraf Agent installed and running, we can configure it to monitor whatever Services we'd like. Go to Administration - Inventory - Manage Agents tab and select the Manage Service icon.
There are several options; in this case we'll choose Monitor Windows Services.
We now configure the Display Name and Service Name we'd like to monitor. You can configure monitoring for as many services as you'd like via the + sign. I've configured monitoring for a couple of SQL-related services.
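If you aren't sure which names to enter, the Windows command `sc query state= all` lists every service with both its SERVICE_NAME and DISPLAY_NAME. Here is a minimal Python sketch that pulls those pairs out of that output. It assumes the Service Name field expects the short service name (for example MSSQLSERVER) rather than the display name. The sample text and the function name `parse_sc_query` are illustrative only, not captured from a real machine or taken from vROps.

```python
import re

# Illustrative sample in the layout that `sc query state= all` prints.
SAMPLE = """
SERVICE_NAME: MSSQLSERVER
DISPLAY_NAME: SQL Server (MSSQLSERVER)
        TYPE               : 10  WIN32_OWN_PROCESS
        STATE              : 4  RUNNING
                                (STOPPABLE, PAUSABLE, ACCEPTS_SHUTDOWN)
        WIN32_EXIT_CODE    : 0  (0x0)

SERVICE_NAME: SQLBrowser
DISPLAY_NAME: SQL Server Browser
        TYPE               : 10  WIN32_OWN_PROCESS
        STATE              : 1  STOPPED
        WIN32_EXIT_CODE    : 1077  (0x435)
"""


def parse_sc_query(text):
    """Return one dict per service found in `sc query` style output."""
    services = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("SERVICE_NAME:"):
            # A SERVICE_NAME line starts a new service record.
            current = {
                "service_name": line.split(":", 1)[1].strip(),
                "display_name": None,
                "state_code": None,
                "state": None,
            }
            services.append(current)
        elif current is not None and line.startswith("DISPLAY_NAME:"):
            current["display_name"] = line.split(":", 1)[1].strip()
        elif current is not None and line.startswith("STATE"):
            # Example: "STATE              : 4  RUNNING"
            match = re.match(r"STATE\s*:\s*(\d+)\s+(\w+)", line)
            if match:
                current["state_code"] = int(match.group(1))
                current["state"] = match.group(2)
    return services


if __name__ == "__main__":
    for svc in parse_sc_query(SAMPLE):
        print(svc["display_name"], "->", svc["service_name"], svc["state"])
```

On a real server you would feed the function the text returned by running `sc query state= all` instead of the sample.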
You can check on collection of the services by selecting the Services listed for the VM in question from the Administration - Inventory - Manage Agents tab.
The services will now show as child objects of the VM they are running on.
Notice the SQL Server (MSSQLSERVER) service object is healthy, as it's up and running. The up/down status of the service will be indicated via the AVAILABILITY metric. Service CPU and Memory metrics are available via UTILIZATION.
Notice the SQL Server Browser service object isn't healthy, which makes sense as it's not running. The AVAILABILITY metric will be 0, indicating it's down.
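To make the up/down idea concrete, here is a small Python sketch of how a Windows service state could be reduced to a 1/0 availability value. The state codes are the standard Windows service states. The rule that only a running service counts as available, and the value 1 for "up", are assumptions for illustration. This is not necessarily how vROps or the Telegraf Agent computes AVAILABILITY, and the function names are hypothetical.

```python
# Standard Windows service state codes (dwCurrentState values).
SERVICE_STATES = {
    1: "stopped",
    2: "start_pending",
    3: "stop_pending",
    4: "running",
    5: "continue_pending",
    6: "pause_pending",
    7: "paused",
}


def availability(state_code):
    """Assumed rule: 1 when the service is running, 0 for any other state."""
    return 1 if state_code == 4 else 0


def down_services(states):
    """Given {service_name: state_code}, return the names with availability 0."""
    return sorted(name for name, code in states.items() if availability(code) == 0)


if __name__ == "__main__":
    # Hypothetical snapshot matching the two services in this walkthrough.
    snapshot = {"MSSQLSERVER": 4, "SQLBrowser": 1}
    for name, code in snapshot.items():
        print(name, SERVICE_STATES[code], availability(code))
    print("down:", down_services(snapshot))
```

Treating every non-running state as down is the simplest choice; a stricter or looser rule (for example, counting paused as up) would only change the `availability` function.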
Even better, an Alert has been generated indicating the service is down.
You are now monitoring Windows Services with vROps and the Telegraf Agent! In our next blog we'll explore the Telegraf Agent for Linux.