When you deploy Azure Stack HCI, you deploy it in the form of a cluster with multiple participating nodes. A cluster is essentially a group of otherwise independent servers (nodes) interconnected to work with each other, providing features like scalability, high availability, and load balancing.
An HCI cluster collects the compute and storage resources available on each individual node into one big pool and draws on them as required. This ties back into the S2D (Storage Spaces Direct) concepts we discussed in the last chapter, since S2D provides the software-defined storage for AzS HCI, so you'll notice some S2D concepts coming up as we talk about clusters. If you're not familiar with S2D yet, I'd recommend reading the previous post first and then revisiting this one.
To create an AzS HCI cluster, you first have to install the AzS HCI OS on each participating node. You can do that either manually or by purchasing validated hardware from an OEM like Dell with the AzS HCI OS preinstalled. Once that's done, you can use Windows Admin Center (WAC) to create the HCI cluster, which uses S2D and, optionally, SDN (Software-Defined Networking).
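If you'd rather script it than click through WAC, the same flow can be done with PowerShell. Here's a minimal sketch using the FailoverClusters cmdlets; the node names and IP address are placeholders for your environment:

```powershell
# Validate the nodes first (always do this before creating a cluster)
Test-Cluster -Node "Node01", "Node02" -Include "Storage Spaces Direct", "Inventory", "Network", "System Configuration"

# Create the cluster without adding storage; S2D will claim the drives itself
New-Cluster -Name "HCICluster01" -Node "Node01", "Node02" -NoStorage -StaticAddress "192.168.1.50"

# Enable Storage Spaces Direct to pool each node's local drives
Enable-ClusterStorageSpacesDirect -CimSession "HCICluster01"
```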
You have two deployment options: a standard cluster with at least 2 nodes in a single site (for redundancy), or a stretched cluster with at least 4 nodes spanning two sites, 2 nodes per site.
However, if you plan to eventually scale beyond 2 nodes, it is wise to start with a minimum of 4 nodes (per site). The reason relates to how S2D works: 2- and 3-node clusters use "mirroring" for resiliency, and only once you have 4 or more servers does S2D give you the option to switch to "parity", which is more storage-efficient. The catch is that the transition isn't an in-place conversion; the mirror must be "broken", meaning you lose the mirrored copies of your data. From 4 nodes onward, adding more nodes to the cluster is very easy. All you have to do is plug in a new node and join it to the domain, and it will be automatically detected and absorbed into the cluster.
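To give you an idea of what that looks like in practice, here's a quick PowerShell sketch of adding a fifth node (names are placeholders); S2D then claims the new node's eligible drives and adds them to the pool on its own:

```powershell
# Re-run validation with the new node included, then join it to the cluster
Test-Cluster -Node "Node01", "Node02", "Node03", "Node04", "Node05" -Include "Storage Spaces Direct", "Inventory"
Add-ClusterNode -Cluster "HCICluster01" -Name "Node05"
```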
Once you've deployed the cluster, the next step is to register it with Azure. Since AzS HCI is an Azure service, it needs to connect to Azure periodically (at least once every 30 days). Remember that it only connects to Azure for billing and syncing metadata; none of your actual organization data leaves your premises. The only thing left to do after that is to validate that the cluster is fit for a production environment, and you're good to go.
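Registration can also be done from PowerShell with the Az.StackHCI module. A minimal sketch, assuming you're signed in to Azure and swapping in your own subscription ID, node name, and region:

```powershell
# Install the module on your management machine, then register the cluster
Install-Module -Name Az.StackHCI
Register-AzStackHCI -SubscriptionId "<your-subscription-id>" -ComputerName "Node01" -Region "EastUS"
```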
Once the cluster is set up, WAC is where you'll be spending most of your time monitoring and managing it. You manage the cluster from a remote computer that has WAC installed and is connected to the cluster, rather than from a host server in the cluster. You can also use PowerShell to manage clusters.
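For example, most of the FailoverClusters cmdlets accept a -Cluster parameter, so you can query and manage the cluster remotely without logging on to a node (cluster and node names below are placeholders):

```powershell
# Check node and resource health from a remote management machine
Get-ClusterNode -Cluster "HCICluster01"
Get-ClusterResource -Cluster "HCICluster01"

# Or open an interactive remote session on a specific node
Enter-PSSession -ComputerName "Node01"
```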
Stretched Clusters
AzS HCI has a built-in DR function called "stretched clusters". As the name suggests, a stretched cluster is deployed spanning two sites, and the data is replicated synchronously or asynchronously so that it remains accessible if one site fails. If a site does fail, all of its workloads are failed over to the other site automatically. These sites can be anywhere: two different rooms within the same building, two different buildings, two cities, or two countries.
Stretched clusters can be set up in one of two configurations: active-active or active-passive. In the active-active configuration, both sites are functional and provide resources to clients. Since data is written at both sites simultaneously, replication happens bi-directionally to keep the data in sync across both sites.
In the active-passive configuration, replication is uni-directional, from the active site to the passive site, which is only used when the active site goes offline.
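Under the hood, the cluster knows which node lives in which site through site fault domains. Here's a hedged sketch of how you might define the sites with PowerShell and, for an active-passive setup, mark one site as preferred (site and node names are placeholders):

```powershell
# Define the two sites and assign each node to one
New-ClusterFaultDomain -Name "Site-A" -FaultDomainType Site -Description "Primary" -Location "Building A"
New-ClusterFaultDomain -Name "Site-B" -FaultDomainType Site -Description "Secondary" -Location "Building B"

Set-ClusterFaultDomain -Name "Node01" -Parent "Site-A"
Set-ClusterFaultDomain -Name "Node02" -Parent "Site-A"
Set-ClusterFaultDomain -Name "Node03" -Parent "Site-B"
Set-ClusterFaultDomain -Name "Node04" -Parent "Site-B"

# For active-passive, prefer the active site for workload placement
(Get-Cluster).PreferredSite = "Site-A"
```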
Cluster Sets
Currently, a cluster requires a minimum of 2 nodes and supports a maximum of 16. For a stretched cluster, you need a minimum of 2 nodes per site (4 total) and a maximum of 8 nodes per site (16 total). However, there is a way to push these limits: combining clusters into "cluster sets" lets you create an HCI deployment of hundreds of servers. Cluster sets have been validated up to 64 servers, but you can go beyond that, as there's no set upper limit.
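Cluster sets are created and managed with PowerShell. A rough sketch based on the documented New-ClusterSet and Add-ClusterSetMember cmdlets; all the cluster, management, and namespace names here are placeholders:

```powershell
# Create the management cluster set, then add existing HCI clusters as members
New-ClusterSet -Name "CSMaster" -NamespaceRoot "SOFS-ClusterSet" -CimSession "SetMgmtCluster"

Add-ClusterSetMember -ClusterName "HCICluster01" -CimSession "CSMaster" -InfraSOFAName "SOFS-HCICluster01"
Add-ClusterSetMember -ClusterName "HCICluster02" -CimSession "CSMaster" -InfraSOFAName "SOFS-HCICluster02"
```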
There are many benefits to using cluster sets, such as increased resiliency, guest VM live migration across clusters, and Azure-like availability sets and fault domains, to name a few.
There are also a few limitations which you should keep in mind before you plan your deployments.
Alright, let’s end this one here. Clusters are at the core of an AzS HCI deployment so I’m sure we’ll be talking about them more in upcoming parts. But until then, hopefully this has given you a reasonable understanding of clusters and why they’re important. As always, to learn more about clusters, head over to the official documentation here.
Next, we will talk a little about the networking concepts in AzS HCI. See you in the next one!