Brock Peterson

VMware Aria Operations Dynamic Thresholding

At its core, VMware Aria Operations is a powerful data analytics engine: it consumes data from vSphere, NSX, vSAN, and non-VMware sources, analyzes that data, and provides feedback to the consumer. Part of that analysis is determining what's normal and what's not, using things like Dynamic Thresholds (DTs) for abnormality monitoring. But how are these DTs established? Let's explore!




You can toggle them on/off here and configure the time at which they are calculated daily.


In product, they look like this.



I've selected a couple of VM metrics and toggled on DTs via the highlighted icons in the toolbar. You can now see the DTs as the grey band surrounding each metric chart. Hovering over it will show you the details.



You can use these DTs when creating Symptom Definitions, which is a nice alternative to Static Thresholds.



But how are they calculated? Here's how!



Aria Operations filters data into six groups, ultimately applying different algorithms to each to determine applicable DTs. Let's briefly explore the six groups.


  • Multinomial Data: integer values only, like {0,1} or {2,4,6,8}, with a maximum of five different values. Here's an example:

If data is identified as Multinomial, the Multinomial Data algorithm is applied to determine Dynamic Thresholds. If it is not Multinomial, we consider the next filter, Transient.
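To make the Multinomial rule concrete, here's a minimal sketch of a check built only from the description above: integer values only, with at most five distinct values. The function name `is_multinomial` and the `max_distinct` parameter are my own illustrative choices, not anything from the product, and the real filter may well apply additional criteria.

```python
def is_multinomial(values, max_distinct=5):
    """Illustrative check: every sample is an integer and there are
    at most `max_distinct` different values in the series.

    This mirrors the description in the article only; it is not the
    Aria Operations implementation.
    """
    distinct = set()
    for v in values:
        # Reject any sample that is not a whole number (e.g. 0.5).
        if v != int(v):
            return False
        distinct.add(int(v))
        # Stop early once too many different values have been seen.
        if len(distinct) > max_distinct:
            return False
    # An empty series is not treated as Multinomial in this sketch.
    return len(distinct) > 0
```

For example, a power-state style metric such as `[0, 1, 1, 0]` passes, while `[1, 2, 3, 4, 5, 6]` (six distinct values) and `[0.5, 1]` (a non-integer) do not.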


  • Transient Data: multi-modal data, meaning the metric holds one constant value for a certain period, then another value for a period of time, and so on. I think of these mathematically as piecewise-constant step functions.

If data is identified as Transient Data, the Transient Data algorithm is applied to determine Dynamic Thresholds. If it is not Transient, the next filter is checked, which is Semi-Constant Data.


  • Semi-Constant Data: checks the data for “almost constant” behavior. I think of this data as "spiky". Here's an example:

If data is identified as Semi-Constant, the Semi-Constant algorithm is applied to determine Dynamic Thresholds. If it is not Semi-Constant, the data is checked against the Trendy Data filter.


  • Trendy Data: checks data against the Mann-Kendall Trend Test, which categorizes data as Trendy or Non-Trendy. Trendy Data is further categorized as Linear or Non-Linear. Here's an example of Trendy-Linear:

If data is Trendy, then Dynamic Thresholds are determined based on the Trendy algorithm. If the data isn't Trendy, we check the Sparse Data filter.
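The Mann-Kendall test itself is a standard, publicly documented statistical test, so it can be sketched independently of the product. Below is a minimal textbook version in plain Python: it counts how often later samples are larger or smaller than earlier ones (the S statistic), normalizes that count, and reports a two-sided p-value. The function name `mann_kendall` and the `alpha=0.05` significance level are assumptions for illustration; how Aria Operations parameterizes the test, and how it separates Linear from Non-Linear trends, is not covered here.

```python
import math
from collections import Counter


def mann_kendall(values, alpha=0.05):
    """Textbook Mann-Kendall trend test (illustrative sketch).

    Returns (s, z, p_value, is_trendy). `alpha` is an assumed
    significance level, not a value taken from Aria Operations.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 samples")

    # S: number of increasing pairs minus number of decreasing pairs.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)

    # Variance of S, with the standard correction for tied values.
    tie_term = sum(t * (t - 1) * (2 * t + 5)
                   for t in Counter(values).values() if t > 1)
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18.0

    # Continuity-corrected normal approximation.
    if var_s == 0 or s == 0:
        z = 0.0
    elif s > 0:
        z = (s - 1) / math.sqrt(var_s)
    else:
        z = (s + 1) / math.sqrt(var_s)

    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p_value, p_value < alpha
```

A steadily growing metric such as `list(range(20))` comes back as trendy, while a flat series or one that simply alternates between two values does not.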


  • Sparse Data: explores data gaps, their frequency, and distribution over time.

Data is classified as Sparse Data if its gaps have a uniform distribution over time, in which case the Sparse Data algorithm is applied to determine Dynamic Thresholds. If it is not Sparse Data, we go to the final filter, which is Variable Data.


  • Variable Data: this is the final filter, which categorizes data as either High-Variability or Low-Variability and applies Dynamic Thresholding algorithms accordingly. Here's an example of Variable (Low-Variability):


There are six different DT algorithms, one for each set of data. While the formulas themselves are proprietary, they are described in great detail here. This paper was presented by VMware Engineers at the 11th International Conference on Autonomic Computing. An updated article by the same authors can be found here.
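The filtering order described above (Multinomial, then Transient, Semi-Constant, Trendy, Sparse, and finally Variable as the catch-all) can be pictured as a simple first-match cascade. Here's a rough sketch of that control flow only. The names `FILTER_ORDER` and `classify` are hypothetical, and the predicates are supplied by the caller as placeholders, since the actual detection logic for each group is not described in this article.

```python
# Order of the filters as described in the article. Variable Data is
# the fallback, so it does not need a predicate of its own.
FILTER_ORDER = ["Multinomial", "Transient", "Semi-Constant", "Trendy", "Sparse"]


def classify(series, predicates):
    """Return the name of the first group whose predicate accepts the series.

    `predicates` maps a group name to a callable(series) -> bool. These
    callables are hypothetical stand-ins for the real filters.
    """
    for name in FILTER_ORDER:
        predicate = predicates.get(name)
        if predicate is not None and predicate(series):
            return name
    # Nothing matched: the series falls through to the final group.
    return "Variable"


# Usage with deliberately simplistic stand-in predicates.
stand_ins = {
    "Multinomial": lambda s: len(set(s)) <= 5,
    "Trendy": lambda s: s == sorted(s),
}
group = classify([3, 1, 4, 1, 5, 9, 2, 6], stand_ins)
```

The point of the first-match design is that each series lands in exactly one group, so exactly one of the six DT algorithms is applied to it.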


Generally, each algorithm first considers the periodicity of the data and eliminates irrelevant historical data before making DT predictions for the future. That points to another use case for DTs: if, for example, the algorithms predict a heavy workload on a certain day, the user can allocate more compute resources for that day and remove them the following day.


DTs can be used to detect abnormally low resource consumption as well. For example, if certain workloads run on a cycle (maybe nightly backups), DTs can be used to alert if/when those backups aren't running.


Dynamic Thresholds are a powerful way to monitor abnormality in Aria Operations! Thanks to VMware engineers Naira Grigoryan, Arnak Poghosyan, and Ashot Harutyunyan for consulting on this article.





