vROps Sizing and Architecture
Updated: Apr 27, 2022
Several factors go into sizing a vROps cluster: vCPU, memory, network latency, and disk. The new online sizing tool (https://vropssizer.vmware.com), used together with the older spreadsheet-style sizing document (https://kb.vmware.com/s/article/78495), can help you properly size your vROps cluster, but first let's explore standard vROps reference architectures.
In general, your vROps cluster will look something like this (green indicates an active service, white indicates an inactive service):
Every vROps cluster will have a Master Node, an optional Master Replica Node for high availability (HA), optional Data Nodes, optional Remote Collector Nodes, and optional Witness Nodes for continuous availability (CA). Descriptions of each: https://docs.vmware.com/en/vRealize-Operations-Manager/8.1/com.vmware.vcom.vapp.doc/GUID-4BCED85C-950D-4B6D-B96E-7876393BE556.html
Master Node - the initial, required node in vROps. All other nodes are managed by the master node. In a single-node installation, the master node manages itself, has adapters installed on it, and performs data collection and analysis.
Master Replica Node - the master replica node is required for vROps HA. For more information on HA:
Data Node - in larger environments, data nodes help with workload. Specifically, they host adapters performing data collection as well as analysis. Larger deployments usually include adapters only on the data nodes so that master and master replica nodes can be dedicated to cluster management.
Remote Collector Node - distributed deployments might require a remote collector node that can span firewalls, collect data from a remote source, reduce bandwidth across data centers, and/or reduce workload on the vROps cluster. Remote collectors only gather objects for inventory; they don't store data or perform analysis.
Witness Node - to use vROps continuous availability (CA), the cluster requires a witness node. If network connectivity between two fault domains is lost, the witness node acts as a decision maker regarding the availability of vROps. For more information on CA:
Now that we have node descriptions, how are these nodes used to build a vROps cluster? Let's start with the most straightforward design and work our way up to the largest available 16-node cluster.
1. Single-node cluster - by definition, this will be a single master node. This configuration is used in smaller environments and often during POCs or trials. The master node is the first (and only required) node in vROps; all other nodes are managed by the master node.
2. Two-node clusters - note that anything beyond two nodes requires nodes larger than size S.
   a. Master node and master replica node - this is the smallest configuration available for those requiring HA. HA creates a replica of the master node on the master replica node, protecting it from failure.
   b. Master node and data node - this configuration is used when a single master node can't handle data collection and analysis for the entire environment. Adapters and analysis of data are spread across both nodes in this cluster configuration.
   c. Master node and remote collector node - this cluster is created for environments that are distributed, perhaps span firewalls, and/or have remote locations. While the remote collector doesn't perform any analysis, it is used to collect data and help with workload.
3. Three-node clusters
   a. Master node, master replica node, and data node - similar to 2a above, but with an added data node to assist with data collection and analysis.
   b. Master node, master replica node, and remote collector - similar to 2a above, but with an added remote collector to assist with data collection and distributed locations.
   c. Master node, data node, and remote collector - similar to 2c above, but with an added data node to help with collection and analysis.
   d. Master node and two data nodes - similar to 2b above, but with an additional data node to assist with data collection and analysis.
   e. Master node and two remote collectors - similar to 2c above, but with two remote locations.
And so on...
16-node clusters - this is the largest cluster supported in vROps 8.1. For more information on the largest clusters: https://kb.vmware.com/s/article/2093783
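The two-node and three-node configurations listed above follow a simple pattern: one master node, an optional master replica, and any mix of data nodes and remote collectors filling the remaining slots. Here is a minimal Python sketch that enumerates those mixes for a given node count. The names `Composition` and `compositions` are hypothetical, and the sketch makes two assumptions: it counts remote collectors toward the node total (as the list above does), and it leaves out witness nodes and any size-based limits.

```python
from typing import Iterator, NamedTuple


class Composition(NamedTuple):
    """One possible role mix for a cluster (illustrative only)."""
    master: int             # always 1
    master_replica: int     # 0 or 1
    data_nodes: int
    remote_collectors: int


def compositions(total_nodes: int) -> Iterator[Composition]:
    """Yield every role mix for a cluster with total_nodes nodes."""
    if total_nodes < 1:
        raise ValueError("a cluster needs at least a master node")
    remaining = total_nodes - 1  # one slot is always the master node
    for replica in (0, 1):
        if replica > remaining:
            continue
        # Split what is left between data nodes and remote collectors.
        for data in range(remaining - replica + 1):
            collectors = remaining - replica - data
            yield Composition(1, replica, data, collectors)


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, len(list(compositions(n))))
```

Under these assumptions, the sketch produces three mixes for two nodes and five for three nodes, which matches the configurations described above.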
Now that we know about clusters and the nodes that make them up, how big should these nodes be? There are two sizing models: one for Master Nodes, Master Replica Nodes, Data Nodes, and Witness Nodes, and another for Remote Collector Nodes. Let's explore the former first. It is best practice for all nodes in your cluster to be the same size. In terms of vCPU and RAM, they are sized as follows:
XS - 2 vCPU and 8GB RAM
S - 4 vCPU and 16GB RAM (up to 32GB RAM)
M - 8 vCPU and 32GB RAM (up to 64GB RAM)
L - 16 vCPU and 48GB RAM (up to 96GB RAM)
XL - 24 vCPU and 128GB RAM
It is best practice for all your Remote Collector Nodes to be the same size. They are sized as follows:
Standard - 2 vCPU and 4GB RAM (up to 8GB RAM)
Large - 4 vCPU and 16GB RAM (up to 32GB RAM)
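To get a feel for the total footprint of a design, the two size tables above can be captured as a lookup and multiplied out. This is a rough Python sketch, not an official calculator. The names `NodeSize`, `ANALYTICS_SIZES`, `REMOTE_COLLECTOR_SIZES`, and `cluster_footprint` are hypothetical, and the figures are the default RAM values listed above, not the "up to" maximums.

```python
from typing import NamedTuple, Optional


class NodeSize(NamedTuple):
    vcpu: int
    ram_gb: int


# Default vCPU/RAM per size, copied from the lists above.
ANALYTICS_SIZES = {
    "XS": NodeSize(2, 8),
    "S": NodeSize(4, 16),
    "M": NodeSize(8, 32),
    "L": NodeSize(16, 48),
    "XL": NodeSize(24, 128),
}
REMOTE_COLLECTOR_SIZES = {
    "Standard": NodeSize(2, 4),
    "Large": NodeSize(4, 16),
}


def cluster_footprint(
    analytics_size: str,
    analytics_count: int,
    rc_size: Optional[str] = None,
    rc_count: int = 0,
) -> NodeSize:
    """Total vCPU and RAM, assuming every node of a kind is the same size."""
    node = ANALYTICS_SIZES[analytics_size]
    vcpu = node.vcpu * analytics_count
    ram_gb = node.ram_gb * analytics_count
    if rc_count:
        collector = REMOTE_COLLECTOR_SIZES[rc_size]
        vcpu += collector.vcpu * rc_count
        ram_gb += collector.ram_gb * rc_count
    return NodeSize(vcpu, ram_gb)


if __name__ == "__main__":
    # Three M nodes plus two Standard remote collectors.
    print(cluster_footprint("M", 3, "Standard", 2))
```

For example, three M nodes plus two Standard remote collectors come to 28 vCPU and 104GB RAM in total.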
The amount of disk allocated to each node will depend on your collection interval, retention period, factored growth, and availability configuration (HA, CA, or neither). There are two ways to calculate vROps cluster sizing, both based on the number of objects and metrics being collected:
https://kb.vmware.com/s/article/78495 - this is the standard sizing guide that has been used since vROps 6.3. It includes vCPU, RAM, Disk, Network latency, Datastore latency, and IOPS requirements.
https://vropssizer.vmware.com - this is the newer online sizing tool that allows you to input vSphere objects as well as third-party targets. If you're re-sizing an existing vROps cluster, you can use the Other Data Sources option at the bottom and enter your total object and metric counts. Once done, click VIEW RECOMMENDATIONS to see the results.
Both will give you vCPU, RAM, Disk, and IOPS. Latency requirements are given as follows:
Network Latency for data nodes < 5ms
Network Latency for remote collectors < 200ms
Network Latency for agents (to nodes) < 20ms
Datastore Latency < 10ms
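If you already collect latency measurements, the limits above are easy to check programmatically. This is a small Python sketch; the key names in `LATENCY_LIMITS_MS` and the function `latency_violations` are hypothetical, and how you obtain the measurements is left to your own tooling.

```python
# Upper limits in milliseconds, taken from the list above.
# Each requirement is a strict "less than".
LATENCY_LIMITS_MS = {
    "data_node_network": 5,
    "remote_collector_network": 200,
    "agent_network": 20,
    "datastore": 10,
}


def latency_violations(measured_ms: dict) -> dict:
    """Return the measurements that are at or above their limit."""
    return {
        name: value
        for name, value in measured_ms.items()
        if value >= LATENCY_LIMITS_MS[name]
    }


if __name__ == "__main__":
    sample = {
        "data_node_network": 3.2,
        "remote_collector_network": 250,
        "datastore": 10,
    }
    print(latency_violations(sample))
```

Because the requirements are strict, a measurement exactly equal to the limit (such as the 10ms datastore value in the sample) is reported as a violation.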
A properly sized vROps cluster will provide a stable platform for your data analytics engine. It's important to review your sizing periodically as your environment grows or contracts. It's trivial to allocate more vCPU, memory, or disk to existing nodes. It's also quite easy to add nodes to your cluster.