vROps Cluster Architecture: Stand-Alone, HA, and CA
Updated: May 9, 2022
We last discussed vROps Clustering almost two years ago, so it's time for an update! Let's first define the Nodes that comprise vROps Clusters. They are formally documented here, but it'll be helpful to quickly list them.
Primary Node - the initial and only required Node in vROps. All other Nodes are managed by the Primary Node. In a Single-Node installation, the Primary Node does everything.
Data Node - these have adapters installed on them, collect data, and perform analysis. Larger deployments usually include adapters only on the Data Nodes so that Primary and Replica Nodes can be dedicated to cluster management.
Replica Node - vROps High Availability (HA) and Continuous Availability (CA) require that you convert a Data Node to a Replica Node. The Replica Node is a copy of the Primary Node that takes over in the event the Primary Node fails.
Witness Node - vROps Continuous Availability (CA) requires a Witness Node. The Witness Node acts as a decision maker regarding the availability of vROps.
Remote Collectors - distributed deployments might require Remote Collectors (RCs) that can navigate firewalls, interface with remote data sources, reduce the bandwidth across data centers, or reduce the load on the vROps Analytics Cluster. Remote Collector Nodes only gather objects for the inventory, without storing data or performing analysis. In addition, Remote Collector Nodes might be installed on a different operating system than the rest of the cluster.
It's important to note that Primary Nodes and Replica Nodes are also Data Nodes. The Analytics Cluster is all Primary, Replica, and Data Nodes. The vROps Cluster includes the Analytics Cluster and any Remote Collector Nodes.
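The membership rules above can be sketched as a small lookup. This is only an illustration of the definitions in this post, not anything vROps exposes; the names Role, in_analytics_cluster, and in_vrops_cluster are made up for the example. The Witness Node is left out because the text above does not place it in either group.

```python
from enum import Enum


class Role(Enum):
    """Node roles as listed in this post (hypothetical names)."""
    PRIMARY = "primary"
    REPLICA = "replica"
    DATA = "data"
    REMOTE_COLLECTOR = "remote_collector"


# The Analytics Cluster is all Primary, Replica, and Data Nodes.
ANALYTICS_ROLES = {Role.PRIMARY, Role.REPLICA, Role.DATA}

# The vROps Cluster is the Analytics Cluster plus any Remote Collectors.
VROPS_CLUSTER_ROLES = ANALYTICS_ROLES | {Role.REMOTE_COLLECTOR}


def in_analytics_cluster(role: Role) -> bool:
    """True if a node with this role belongs to the Analytics Cluster."""
    return role in ANALYTICS_ROLES


def in_vrops_cluster(role: Role) -> bool:
    """True if a node with this role belongs to the vROps Cluster."""
    return role in VROPS_CLUSTER_ROLES


if __name__ == "__main__":
    for role in Role:
        print(role.value, in_analytics_cluster(role), in_vrops_cluster(role))
```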
Outside the vROps Cluster, you can also have Cloud Proxies (CPs). Originally, these were the Remote Collectors for vROps Cloud deployments, but they have since been designed to replace RCs entirely. Their sizings can be found here. A Stand-Alone deployment might look something like this.
You can build a vROps Cluster several different ways: Stand-Alone, High Availability (HA), or Continuous Availability (CA).
Let's start with the base option (pictured above). The Stand-Alone options look like this.
Single Primary Node Cluster - in this deployment, your vROps Primary Node will perform all functions: Admin UI, Product UI, REST API, Data Storage, Collection, and Analytics. These are frequently deployed for trials or POCs. Primary Nodes are sized according to how many objects and metrics they will be ingesting; details can be found here.
Primary Node and at least one Data Node (up to 16). Data Nodes can do everything the Primary Node can except serve the Admin UI. They are often used to take workload off the Primary Node. Note that the Primary Node is also a Data Node.
Primary Node and at least one Cloud Proxy (CP), up to 60 CPs. CPs were formerly known as Remote Collector Nodes (RCs) and are used to navigate firewalls, consume data from a remote source, reduce bandwidth across data centers, and more. They only collect metrics; they don't store any data or perform any data analysis. RCs are a part of the vROps Cluster, whereas CPs are not.
Primary Node, at least one Data Node (up to 16), and at least one CP (up to 60).
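As a rough sanity check, the limits quoted in the list above (up to 16 Data Nodes and up to 60 CPs around a single Primary Node) can be expressed in a few lines. This is a sketch under the assumption that those two numbers are the only constraints; the function name check_stand_alone and the constants are hypothetical, and real sizing depends on the object and metric counts covered in the sizing guidance.

```python
from typing import List

# Limits as quoted in this post; treat them as illustrative constants.
MAX_DATA_NODES = 16
MAX_CLOUD_PROXIES = 60


def check_stand_alone(data_nodes: int = 0, cloud_proxies: int = 0) -> List[str]:
    """Return a list of problems for a Stand-Alone layout.

    A single Primary Node is implied. An empty list means the counts
    fit within the limits quoted above.
    """
    problems = []
    if data_nodes < 0 or cloud_proxies < 0:
        problems.append("node counts cannot be negative")
    if data_nodes > MAX_DATA_NODES:
        problems.append(f"too many Data Nodes: {data_nodes} > {MAX_DATA_NODES}")
    if cloud_proxies > MAX_CLOUD_PROXIES:
        problems.append(
            f"too many Cloud Proxies: {cloud_proxies} > {MAX_CLOUD_PROXIES}"
        )
    return problems


if __name__ == "__main__":
    print(check_stand_alone())                      # single Primary Node
    print(check_stand_alone(data_nodes=4))          # Primary + Data Nodes
    print(check_stand_alone(data_nodes=17, cloud_proxies=61))
```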
The Stand-Alone options visually look like this.
There are several best practices when building vROps Clusters. For example, deploy Nodes in the same vSphere cluster in a single data center, and add only one node at a time to a cluster, allowing it to complete before adding another node. For more details on best practices, go here.
Customers will often front their vROps Cluster with a load balancer to avoid any service outages in the event a Data Node is lost. That load balancer can point at the Primary Node or any Data Nodes, as they all serve the UI. If the Primary Node is lost, however, there will be data loss and a Cluster rebuild.
vROps 6.0 introduced HA, giving us some protection from the loss of an Analytics Node (Primary Node, Replica Node or Data Node). To be clear, vROps HA is not a Disaster Recovery (DR) strategy, but it does provide some protection against data loss. Similar to our non-HA Cluster designs, we simply add a Replica Node, giving us the following.
Primary Node and Replica Node
Primary Node, Replica Node, and up to 16 Data Nodes
Primary Node, Replica Node, and up to 60 CPs
Primary Node, Replica Node, up to 16 Data Nodes, and up to 60 CPs
As described here, vROps HA creates a replica of the Primary Node in what is called the Replica Node, and protects the Analytics Cluster against the loss of a Data Node. vROps uses a PostgreSQL Database spread across all Data Nodes (Primary, Replica, and Data Nodes) to store all data, so if we lose the Primary Node, the Replica Node gets promoted to Primary and we run without data loss. If we lose a Data Node, that data is also available on the Primary/Replica Nodes (think RAID5), so there will be no loss of data. If we lose more than one Data Node we will experience data loss.
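The failure behavior described above can be summarized as a small decision function. This is a simplified model of the paragraph, not of vROps internals: the function name ha_outcome and the role strings are invented for the example, and treating any combination of two or more lost Analytics Nodes as data loss is an assumption that extends the "more than one Data Node" statement above.

```python
from typing import Iterable

VALID_ROLES = {"primary", "replica", "data"}


def ha_outcome(lost_roles: Iterable[str]) -> str:
    """Describe the expected result of losing the given Analytics Nodes.

    lost_roles holds one entry per lost node: 'primary', 'replica', or 'data'.
    """
    lost = list(lost_roles)
    for role in lost:
        if role not in VALID_ROLES:
            raise ValueError(f"unknown role: {role!r}")

    if not lost:
        return "healthy"
    if len(lost) > 1:
        # Assumption: HA tolerates the loss of a single node only.
        return "data loss"
    if lost[0] == "primary":
        return "replica promoted, no data loss"
    return "no data loss"


if __name__ == "__main__":
    print(ha_outcome(["primary"]))
    print(ha_outcome(["data"]))
    print(ha_outcome(["data", "data"]))
```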
Best Practices for a vROps HA Cluster deployment can be found here. In the end, your vROps HA Cluster will look something like this.
You could front your vROps HA Cluster with a Load Balancer as before, pointing at your Primary Node, Replica Node, and Data Nodes.
vROps 8.0 introduced Continuous Availability and the concept of Fault Domains. I think of vROps CA as vROps HA with the Replica Node in a different physical location, along with paired Data Nodes and a Witness Node to keep track of everything.
vROps CA protects us against the loss of an entire Fault Domain, i.e., a Datacenter. As described here, with CA, the data stored in the Primary Node and Data Nodes in Fault Domain 1 is continuously synced to the Replica Node and Data Nodes in Fault Domain 2. vROps CA requires at least one Data Node in addition to the Primary Node, and Data Nodes must be paired, i.e., a Data Node in Fault Domain 1 requires a Data Node in Fault Domain 2.
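The pairing rule is easy to get wrong when planning a layout, so here is a minimal check of it. It assumes the Primary Node sits in Fault Domain 1 and the Replica Node in Fault Domain 2, as described above, and only counts the additional Data Nodes on each side; the function name check_ca_pairing is hypothetical.

```python
from typing import List


def check_ca_pairing(fd1_data_nodes: int, fd2_data_nodes: int) -> List[str]:
    """Return a list of problems with a CA Data Node layout.

    fd1_data_nodes: Data Nodes in Fault Domain 1 (next to the Primary Node).
    fd2_data_nodes: Data Nodes in Fault Domain 2 (next to the Replica Node).
    """
    problems = []
    if fd1_data_nodes < 1:
        problems.append(
            "CA needs at least one Data Node in addition to the Primary Node"
        )
    if fd1_data_nodes != fd2_data_nodes:
        problems.append(
            "Data Nodes must be paired: "
            f"{fd1_data_nodes} in Fault Domain 1 vs "
            f"{fd2_data_nodes} in Fault Domain 2"
        )
    return problems


if __name__ == "__main__":
    print(check_ca_pairing(2, 2))  # paired layout
    print(check_ca_pairing(2, 1))  # unpaired Data Node in Fault Domain 1
```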
There is a third node, called the Witness Node, which neither collects nor stores data. Rather, it determines which Fault Domain the vROps Cluster should be running in. Think of it as a traffic manager of sorts, routing traffic based on the health of the vROps Primary Node.
Ideally, you would have three physical locations here, but Fault Domains can be defined as you wish. vROps CA deployments provide you with the most protection available today. Similar to Stand-Alone clusters and HA clusters, customers can front their vROps CA Cluster with a Load Balancer to send users to the active cluster. For more information on vROps and to obtain a trial go here!