Monitoring Windows Services/Processes and Linux Processes with VMware Aria Operations
Want to monitor a Windows Service/Process or a Linux Process? We can do it in Aria Operations with the Telegraf Agent. Assuming the Telegraf Agent has been installed on the necessary servers, let's explore each use case.
We blogged about Windows Service monitoring previously, but we'll briefly discuss it again here. Find your Telegraf Agents at Configure - Applications and Services - MANAGE AGENTS.
Click MANAGE AGENTS to see all Telegraf Agents. Find a Windows VM, expand it, click the three dots next to Services and select Add.
You are then prompted for the Windows Service you want to monitor.
The Display Name here is what you will see in Aria Operations; it is not the Windows Display Name. The Service Name is the Windows Service name as displayed in the Windows Services utility.
If I want to monitor the VMware Tools service, I'd use the Service Name VMTools. It would look like this.
Once saved, it will look like this. Documentation can be found here.
To see the details, click GO TO DETAILS, which takes you to the VM Summary page. Once there, go to the Metrics tab.
Here you will see the available metrics and properties to monitor. Currently, we cannot add Windows processes for monitoring via the UI, but we can adjust telegraf.conf to do so. Here's how.
You'll notice that the Telegraf Agent is already capturing two Windows processes, namely _Total and telegraf as shown here.
These are configured in the telegraf.conf file, which is encrypted upon install, but can be extracted via the following command:
C:\VMware\UCP\ucp-minion\bin\ucp-minion --config C:\VMware\UCP\salt\conf\grains --action=xtract_config --dest_dir=C:\VMware\UCP\ucp-telegraf
This will extract the encrypted telegraf.conf to a readable format and put it in the destination directory, in my case C:\VMware\UCP\ucp-telegraf. Once run, you'll find it here.
You can edit it via Notepad. You'll notice this section in the INPUT PLUGINS stanza:
[[inputs.win_perf_counters.object]]
  ObjectName = "Process"
  Counters = ["% Privileged Time", "% Processor Time", "% User Time", "Elapsed Time", "Handle Count", "IO Read Bytes/sec", "IO Read Operations/sec", "IO Write Bytes/sec", "IO Write Operations/sec", "Private Bytes", "Thread Count", "Virtual Bytes", "Working Set", "Working Set - Private"]
  Instances = ["_Total", "telegraf"] # Replace this with a list of process names that you want to monitor. "_Total" is all processes combined
  Measurement = "win.process"
You'll notice _Total and telegraf in the Instances list. This is where you can add additional processes to monitor, something like this:
[[inputs.win_perf_counters.object]]
  ObjectName = "Process"
  Counters = ["% Privileged Time", "% Processor Time", "% User Time", "Elapsed Time", "Handle Count", "IO Read Bytes/sec", "IO Read Operations/sec", "IO Write Bytes/sec", "IO Write Operations/sec", "Private Bytes", "Thread Count", "Virtual Bytes", "Working Set", "Working Set - Private"]
  Instances = ["_Total", "telegraf", "notepad", "python"]
  Measurement = "win.process"
I've added two additional processes to monitor here: notepad and python. Once done, save the new telegraf.conf file and restart the agent. Back at the VM Summary Metrics tab, you will now see the new processes being monitored.
You can now monitor the status and performance of these Windows processes. Note that this approach isn't currently supported, as telegraf.conf can easily be overwritten by agent updates. Also, if there are credentials in the encrypted telegraf.conf, they will be exposed in the unencrypted copy. I've opened a feature request for the ability to create Windows process monitors via the UI, like we do for Windows Services.
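If you need to make the same edit on several servers, the manual Notepad step can be scripted. The following is a minimal Python sketch, not part of the product: the function name add_process_instances and the shortened sample text are my own, and it assumes the extracted telegraf.conf contains a single "Process" object block laid out like the one shown above.

```python
import re


def add_process_instances(conf_text, new_names):
    """Append process names to the Instances list of the "Process" object block.

    Hypothetical helper: assumes the block layout shown in the extracted
    telegraf.conf (ObjectName, then Counters, then Instances).
    """
    # Skip lazily from ObjectName = "Process" to the Instances list that follows.
    pattern = re.compile(
        r'(ObjectName\s*=\s*"Process".*?Instances\s*=\s*\[)([^\]]*)(\])',
        re.DOTALL,
    )

    def _rewrite(match):
        merged = re.findall(r'"([^"]*)"', match.group(2))
        for name in new_names:
            if name not in merged:  # avoid duplicate entries
                merged.append(name)
        quoted = ", ".join(f'"{name}"' for name in merged)
        return match.group(1) + quoted + match.group(3)

    updated, count = pattern.subn(_rewrite, conf_text, count=1)
    if count == 0:
        raise ValueError('no "Process" object block with an Instances list found')
    return updated


# Shortened sample of the stanza; the real Counters list is longer.
sample = '''[[inputs.win_perf_counters.object]]
  ObjectName = "Process"
  Counters = ["% Processor Time", "Working Set"]
  Instances = ["_Total", "telegraf"]
  Measurement = "win.process"
'''

print(add_process_instances(sample, ["notepad", "python"]))
```

The same caveats apply as for the manual edit: the result is an unsupported change to an unencrypted file, and the agent still has to be restarted afterwards.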
We can also monitor Linux processes with the built-in Telegraf Agent. Back to our list of Telegraf Agents, let's explore one running on a Linux VM.
Expand the VM and click the three dots next to Processes, then click Add.
The Display Name is what will show in Aria Operations. There are three Filter Types available:
Executable Name - /process/executable_name_here for example.
Regex Pattern - ntpd* for example.
Pid File - pid file path, /var/run/crond.pid for example.
Let's start with the Executable Name Filter Type. Consider the httpd process on this Linux VM:
[root@web1 ~]# ps -ef | grep -i http | grep -v grep
apache     433 24557  0 05:24 ?        00:00:03 /usr/sbin/httpd -DFOREGROUND
apache    5517 24557  0 14:30 ?        00:00:00 /usr/sbin/httpd -DFOREGROUND
apache    5907 24557  0 14:36 ?        00:00:00 /usr/sbin/httpd -DFOREGROUND
apache    6871 24557  0 06:57 ?        00:00:02 /usr/sbin/httpd -DFOREGROUND
apache    9628 24557  0 15:30 ?        00:00:00 /usr/sbin/httpd -DFOREGROUND
apache   11444 24557  0 15:57 ?        00:00:00 /usr/sbin/httpd -DFOREGROUND
apache   12269 24557  0 16:09 ?        00:00:00 /usr/sbin/httpd -DFOREGROUND
apache   18041 24557  0 Jan09 ?        00:00:09 /usr/sbin/httpd -DFOREGROUND
apache   18948 24557  0 09:54 ?        00:00:01 /usr/sbin/httpd -DFOREGROUND
root     24557     1  0  2022 ?        00:19:04 /usr/sbin/httpd -DFOREGROUND
apache   31909 24557  0 13:03 ?        00:00:00 /usr/sbin/httpd -DFOREGROUND
Let's create a monitor for it using the Executable Name, which in this case is httpd and would look like this.
On the VM Summary page, the Metrics tab now shows the details for the Linux process httpd.
In addition to CPU Usage and Memory Usage, we can also monitor how many httpd processes are running.
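To make the process count concrete, here is a rough Python sketch of what counting by executable name amounts to. It is an illustration only, not how the agent is implemented: the helper count_by_executable is hypothetical, the sample is a few lines of ps -ef output from this VM, and I'm assuming the filter compares against the base name of the executable.

```python
import posixpath

# A few lines of `ps -ef` output taken from the VM above.
PS_OUTPUT = """\
apache     433 24557  0 05:24 ?        00:00:03 /usr/sbin/httpd -DFOREGROUND
apache   18041 24557  0 Jan09 ?        00:00:09 /usr/sbin/httpd -DFOREGROUND
root     24557     1  0  2022 ?        00:19:04 /usr/sbin/httpd -DFOREGROUND
root      1009     1  0  2021 ?        00:00:01 /usr/sbin/sshd -D
"""


def count_by_executable(ps_text, exe_name):
    """Count ps -ef rows whose command's executable base name equals exe_name."""
    count = 0
    for line in ps_text.splitlines():
        # ps -ef columns: UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces)
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue
        executable = fields[7].split()[0]
        if posixpath.basename(executable) == exe_name:
            count += 1
    return count


print(count_by_executable(PS_OUTPUT, "httpd"))  # 3 httpd rows in this sample
```

A sudden drop in this number, or a count of zero, is the kind of condition you would typically alert on.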
Next, let's create a Linux process monitor for sshd on the same Linux VM using the Regex Pattern Filter Type. The process looks like this.
[root@web1 ~]# ps -ef | grep -i sshd | grep -v grep
root      1009     1  0  2021 ?        00:00:01 /usr/sbin/sshd -D
root     12741  1009  0 16:15 ?        00:00:00 sshd: root@pts/0
Create a monitor for it using the Regex Pattern Filter Type:
This is just one way to do it; any regex that matches will work. On the VM Summary page, the Metrics tab now shows the details for the Linux process sshd.
You can now create Alerts for CPU, Memory, and the Number of SSHD processes.
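Since any matching regex will work, it can help to try a pattern against your command lines before saving the monitor. Below is a small Python sketch for that. It assumes the pattern is searched against the full command line (similar to pgrep -f), which I have not confirmed for the agent, and Python's regex syntax can differ in details from the engine the agent uses. The helper name matching_commands is made up for this example.

```python
import re

# Command lines taken from the ps output shown in this post.
COMMAND_LINES = [
    "/usr/sbin/sshd -D",
    "sshd: root@pts/0",
    "/usr/sbin/crond -n",
    "/usr/sbin/httpd -DFOREGROUND",
]


def matching_commands(pattern, command_lines):
    """Return the command lines in which the pattern is found anywhere."""
    regex = re.compile(pattern)
    return [cmd for cmd in command_lines if regex.search(cmd)]


for candidate in ("sshd", r"sshd.*", r"^/usr/sbin/sshd"):
    print(candidate, "->", matching_commands(candidate, COMMAND_LINES))
```

The anchored pattern matches only the sshd daemon, while the looser ones also match the per-session process. Keep in mind that in a regex the * applies to the preceding character, so a pattern like ntpd* means "ntp followed by zero or more d" rather than a shell-style wildcard.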
Finally, let's use the Pid File Filter Type to monitor the crond process. This filter type checks for the existence of the file holding the process ID, then checks to confirm it's running. Our crond process looks like this:
[root@web1 httpd]# ps -ef |grep -i crond
root       766     1  0  2021 ?        00:01:07 /usr/sbin/crond -n
root     16370 12745  0 16:59 pts/0    00:00:00 grep --color=auto -i crond
Our Pid File looks like this:
[root@web1 run]# ls -latr crond.pid
-rw-r--r-- 1 root root 4 Apr 29 2021 crond.pid
[root@web1 run]# pwd
/var/run
Its contents look like this:
[root@web1 run]# cat crond.pid
766
This is the Process ID the monitor will be looking for. Back to Telegraf, this is what the monitor looks like.
Back to the VM Summary Page Metrics tab, we'll now see details for the crond process.
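The Pid File check described above (does the file exist, and is the process it names running?) can be sketched in a few lines of Python. This is only an illustration of the idea for Linux hosts, not the agent's actual code, and the function name pid_file_status is hypothetical.

```python
import os
import tempfile


def pid_file_status(pid_file):
    """Return (pid, running) for the process recorded in pid_file.

    POSIX only: os.kill(pid, 0) sends no signal there, it just checks
    whether the process exists. Do not run this on Windows.
    """
    try:
        with open(pid_file) as handle:
            pid = int(handle.read().strip())
    except (OSError, ValueError):
        return None, False  # file missing, unreadable, or not a number
    if pid <= 0:
        return None, False  # 0 and negatives address process groups, not a PID
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return pid, False  # stale pid file: no such process
    except PermissionError:
        return pid, True  # process exists but belongs to another user
    return pid, True


# Demo with a temporary pid file that holds this script's own PID.
with tempfile.TemporaryDirectory() as tmp:
    demo_file = os.path.join(tmp, "demo.pid")
    with open(demo_file, "w") as handle:
        handle.write(str(os.getpid()))
    print(pid_file_status(demo_file))
```

On the VM above you would point it at /var/run/crond.pid. A stale file left behind by a crashed daemon shows up as a PID with running set to False.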
Telegraf Agents are powerful. Windows Service/Process monitoring and Linux Process monitoring are just two use cases. Enjoy!