SQL Server High Availability Solutions on Azure VMs

Published by Marco Obinu

As mentioned in a previous post, you can choose between several options when it’s time to implement high availability for your SQL Server solutions hosted on Azure VMs.

I love talking about SQL Server, high availability, and Azure VMs in my sessions, so I prepared a series of templates that let you get hands-on experience with:

  • an Always On Availability Group
  • a SQL Server FCI based on Storage Spaces Direct
  • a SQL Server FCI based on an Azure Premium File Share
  • a SQL Server FCI based on Azure Shared Disks and Distributed Network Names (DNN)

Microsoft recently announced this last option, and it seems to be a real game-changer in this field. You can deploy a top-performance cluster similar to the ones you can create on-premises, avoiding the double-sized storage and the load balancer you typically need with other solutions. Unfortunately, both Azure Shared Disks and the support for DNN are still in preview. They’re full of limitations, but they look very promising.

In the repo, you can also find a script to deploy a single VM based upon my optimized ARM template.

Now, let’s open my SQLIaaSVMPlayground repo and deploy something.

How to use the templates

I prepared a series of PowerShell scripts you can use to deploy the environments. I usually open them in VSCode and execute them step by step via F8; my advice is to do the same. I managed the dependencies between steps, but I never tried to run a whole script in a single shot.

Start by creating the base infrastructure that all HA solutions will share. You can create it using the appropriate PS1 script.

Then, choose the corresponding PS1 script in the root folder to deploy the HA solution you want. 
Inside the script, you can find instructions to deploy VMs and related assets.

Once the VMs are ready, copy the subfolder related to the chosen HA solution to the first VM of your cluster. From that node, you can open ISE or VSCode and execute the PS1 script you copied, again in a step-by-step fashion, following the instructions it contains.

The Azure Shared Disks scenario has some exceptions. Due to the preview, these disks are available only in the West Central US region. To avoid messing with global peering, I duplicated the base infrastructure in that region.

I usually prefer to follow this multi-step approach for my solutions, since it lets me easily reuse my sample scripts as a base for production environments. It’s not a fully automated solution but, with a few mods, I can cover a whole lot of different scenarios. Theoretically, you could create an end-to-end template that deploys all the VMs and applies all the guest-OS configurations, but I prefer to invest time that way only when I need to implement the exact same thing every time.

Deploying the base infrastructure

The base infrastructure contains:

  • a virtual network
  • a domain controller with related assets (availability set, disk, etc.)
  • a storage account that will act as the cloud witness for clusters
  • a proximity placement group that you can optionally assign to the VMs during your experiments.

Base infrastructure diagram

To deploy this infrastructure, run 01-EnvironmentPrepation.ps1. Edit the 00-CommonVariables.ps1 script to customize some of the default values I used.
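
If it helps to see the shape of a run, here’s a minimal sketch, assuming you’ve cloned the repo and work from its root folder; the subscription placeholder is mine, and the real defaults live in 00-CommonVariables.ps1:

    # A minimal sketch, assuming the repo is cloned locally and you run from its root.
    Connect-AzAccount                                    # sign in to Azure
    Set-AzContext -Subscription "<your-subscription>"    # select the target subscription

    . .\00-CommonVariables.ps1                           # dot-source the shared variables
    # Now open 01-EnvironmentPrepation.ps1 in VSCode and execute it block by block with F8.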

As mentioned before, the Azure Shared Disks scenario is identical but isolated in the West Central US region. Adapt and deploy it by using the 00b-CommonVariables-WestCentralUS.ps1 and 01b-EnvironmentPrepation-WestCentralUS.ps1 scripts.

Deploying the High Availability solution

It’s now time to choose your HA flavor and deploy it!

Always On Availability Group

In this scenario, you’re deploying two stand-alone VMs, each with its own SQL Server instance. You then configure the Windows Server Failover Cluster backend that manages the Availability Group and its listener, which you configure in SQL Server. The cluster uses the storage account created in the base infrastructure as a cloud witness, and you can associate the previously created PPG to test its effectiveness.

Since I decided to use Windows Server 2019 as the operating system, you’ll see a DNN used to manage the cluster instance network name. An Azure Load Balancer exposes the clustered IP address assigned to the AG Listener.
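
For context, tying the listener to the internal load balancer boils down to the usual probe-port configuration on the listener’s cluster IP resource. The snippet below is illustrative only, with placeholder names and addresses, not the exact code from the repo:

    # Illustrative only: binds the listener's cluster IP resource to the load balancer's
    # health probe. Resource names, IP, and probe port are placeholders.
    $clusterNetworkName = "Cluster Network 1"
    $ipResourceName     = "AGListener_IP"      # the IP Address resource of the AG listener
    $listenerIP         = "10.0.1.10"          # frontend IP of the internal load balancer
    $probePort          = 59999                # probe port configured on the load balancer

    Get-ClusterResource $ipResourceName | Set-ClusterParameter -Multiple @{
        Address    = $listenerIP
        ProbePort  = $probePort
        SubnetMask = "255.255.255.255"
        Network    = $clusterNetworkName
        EnableDhcp = 0
    }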

Availability Group infrastructure diagram

It’s one of my favorite architectures for mission-critical deployments on Azure.

PROS:
  • There’s no shared storage to mess with, so you can define your preferred storage configuration in terms of disk tiers and caching options. From this perspective, performance is at its best;
  • Suitable for legacy scenarios, from SQL Server 2012 onward.
CONS:
  • It’s quite complex to manage: replication acts at the database level, and you need to keep instance-level objects aligned across the replicas yourself (DBATools is your friend here);
  • The synchronous replication required for HA adds some latency to write activity;
  • It’s expensive: in the vast majority of cases, you need SQL Server Enterprise Edition to implement it, and Basic Availability Groups aren’t as suitable as the full feature for typical scenarios;
  • Databases are replicated, so you need to provision 2x storage capacity.
Let’s do it!

Once the base infrastructure is ready, use the 02-AlwaysOnAG.ps1 script to deploy the VMs and related assets. When specified in the script, copy the contents of the AlwaysOnAG folder to the first cluster node, and execute AVG-GuestClusterConfig.ps1 in a step-by-step fashion. Remember to update the storage account key for the witness!
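
That reminder refers to the cluster quorum: the guest configuration sets the storage account as a cloud witness, which in PowerShell looks roughly like this (account name and key are placeholders):

    # Hedged example of what "update the storage account key" boils down to:
    # configure the WSFC cloud witness against the base-infrastructure storage account.
    Set-ClusterQuorum -CloudWitness `
        -AccountName "mywitnessstorage" `
        -AccessKey   "<storage-account-key>"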

SQL Server FCI with Storage Spaces Direct

In this scenario, you’re deploying a two-node SQL Server Failover Cluster Instance. Volumes provisioned on a Storage Spaces Direct pool act as the required shared storage. The system uses the storage account created in the base infrastructure as a cloud witness. You can associate the previously created PPG to test its effectiveness.
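
For reference, the S2D part of the guest configuration boils down to something like the sketch below; names and sizes are placeholders, and the repo script remains the source of truth:

    # Rough sketch, not the repo's exact script: pool the nodes' data disks,
    # then carve CSV volumes for data and log files.
    Enable-ClusterStorageSpacesDirect -Confirm:$false

    New-Volume -StoragePoolFriendlyName "S2D*" -FriendlyName "SQLData" `
        -FileSystem CSVFS_ReFS -Size 200GB
    New-Volume -StoragePoolFriendlyName "S2D*" -FriendlyName "SQLLog" `
        -FileSystem CSVFS_ReFS -Size 50GB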

Again, a DNN manages the cluster instance network name. An Azure Load Balancer exposes the clustered IP address assigned to the SQL Server FCI.

SQL Server FCI with Storage Spaces Direct diagram

This solution is ideal when you’re aiming for storage performance but want to keep an eye on costs.

PROS:
  • It’s a typical Failover Cluster Instance. The failover acts at the instance level, and you don’t have to deal with keeping separate instances aligned;
  • You can potentially add a lot of disks to obtain high performance (of course, you must size the VM accordingly);
  • You can implement it with SQL Server Standard Edition.
CONS:
  • You need at least two disks on each machine to form the S2D pool;
  • Volumes are mirrored, so you need to provision 2x storage capacity.
  • Adding more disks to the pool can be tricky, depending on the number of columns you used when creating the volumes;
  • All the disks are absorbed into the same pool. You can create separate volumes for data and logs, but they sit on the same set of drives, and you can’t assign specific caching settings;
  • Not for legacy systems: it requires at least Windows Server 2016 and SQL Server 2016.
Let’s do it!

Once the base infrastructure is ready, use the 03-S2DFCI.ps1 script to deploy the VMs and related assets. When specified in the script, copy the contents of the S2DFCI folder to the first cluster node, and execute S2DFCI-GuestClusterConfig.ps1 in a step-by-step fashion. Remember to update the storage account key for the witness!

SQL Server FCI with Azure Premium File Share

In this scenario, you’re deploying a two-node SQL Server Failover Cluster Instance. An Azure Premium File Share acts as the required shared storage, avoiding the storage waste due to replication. SMB shares host your databases. The system uses the storage account created in the base infrastructure as a cloud witness. You can associate the previously created PPG to test its effectiveness.
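
For context, making the file share usable by the nodes follows the usual Azure Files pattern: check SMB connectivity on port 445, then persist the storage account credentials. The snippet below is a hedged, local-only illustration with placeholder names; the guest configuration script handles this differently for remote nodes, as mentioned further down:

    # Illustrative check-and-credential pattern for an Azure Premium File Share;
    # the storage account name and key are placeholders.
    $storageAccount = "mypremiumfiles"

    # Verify the node can reach Azure Files over SMB (TCP 445)
    Test-NetConnection -ComputerName "$storageAccount.file.core.windows.net" -Port 445

    # Persist the storage account credentials so the share stays reachable after reboots
    cmdkey /add:"$storageAccount.file.core.windows.net" `
           /user:"AZURE\$storageAccount" /pass:"<storage-account-key>"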

Again, a DNN manages the cluster instance network name. An Azure Load Balancer exposes the clustered IP address assigned to the SQL Server FCI.

SQL Server FCI with Azure Premium File Share diagram

This solution is ideal when you’re looking for easy day-by-day management.

PROS:
  • It’s a typical Failover Cluster Instance. The failover acts at the instance level, and you don’t have to deal with keeping separate instances aligned;
  • Potentially, you can reserve the minimum space you need to host your databases (no 2x capacity here);
  • You can implement it with SQL Server Standard Edition;
  • Legacy-friendly: available from Windows Server 2012+ with SQL Server 2014+
CONS:
  • To achieve performance, you need to provision more space, since size and performance are directly correlated. This has a significant impact on costs: Premium File Shares are more expensive than Azure Disks. You need to find a balance between size and performance, taking bursting into consideration.
  • For medium to high-performance requirements in the storage compartment, an S2D FCI can be cheaper, even considering the additional storage needed for replication.
Let’s do it!

Once the base infrastructure is ready, use the 04-PFSFCI.ps1 script to deploy the VMs and related assets. When specified in the script, copy the contents of the PFSFCI folder to the first cluster node, and execute PFSFCI-GuestClusterConfig.ps1 in a step-by-step fashion. This last script contains a new approach to executing CMDKEY commands on remote systems: I described it here.

Remember to update the storage account key for the witness!

SQL Server FCI with Azure Shared Disks and DNN

In this scenario, you’re deploying a two-node SQL Server Failover Cluster Instance. Azure Shared Disks, which support SCSI Persistent Reservations, act as the required shared storage. The system uses the storage account created in the base infrastructure as a cloud witness. Your availability set has an associated PPG, since Shared Disks require it.
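
For reference, provisioning a shared managed disk boils down to setting MaxSharesCount when creating it and then attaching it to both nodes. The sketch below uses placeholder resource group, VM, and disk names, not the ones from the repo:

    # Hypothetical names and sizes: provisions a Premium SSD managed disk that can be
    # attached to both FCI nodes (MaxSharesCount 2), then attaches it to each VM.
    $diskConfig = New-AzDiskConfig -Location "westcentralus" -DiskSizeGB 256 `
        -SkuName Premium_LRS -CreateOption Empty -MaxSharesCount 2
    $sharedDisk = New-AzDisk -ResourceGroupName "rg-shdfci" -DiskName "shd-sqldata" -Disk $diskConfig

    foreach ($vmName in "sqlfci01", "sqlfci02") {
        $vm = Get-AzVM -ResourceGroupName "rg-shdfci" -Name $vmName
        $vm = Add-AzVMDataDisk -VM $vm -Name $sharedDisk.Name -CreateOption Attach `
            -ManagedDiskId $sharedDisk.Id -Lun 0
        Update-AzVM -ResourceGroupName "rg-shdfci" -VM $vm
    }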

A DNN manages the cluster instance network name. Additionally, after the SQL FCI setup, I defined a second DNN to control the SQL FCI DNS name, renaming the Virtual Network Name created by default. With DNNs for both the cluster and the SQL FCI, you can avoid creating the Azure Load Balancer, which makes the solution much simpler than the other scenarios.
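
For context, creating that second DNN follows the pattern documented by Microsoft for DNN-based FCIs; the resource name, cluster group, and DNS name below are placeholders:

    # Illustrative DNN creation for the SQL FCI role; names are placeholders.
    Add-ClusterResource -Name "sqlfci-dnn" `
        -ResourceType "Distributed Network Name" -Group "SQL Server (MSSQLSERVER)"

    Get-ClusterResource -Name "sqlfci-dnn" |
        Set-ClusterParameter -Name DnsName -Value "SQLFCIDNN"

    Start-ClusterResource -Name "sqlfci-dnn"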

SQL Server FCI with Azure Shared Disks and Distributed Network Names diagram

This architecture will be the new reference design for high-performance, high-availability solutions that don’t require Always On AG.

I’ll write an article soon about its fundamentals.

PROS:
  • It’s a typical Failover Cluster Instance. The failover acts at the instance level, and you don’t have to deal with keeping separate instances aligned;
  • Cluster nodes share their disks, as in on-premises scenarios;
  • You can differentiate between data and log disks, applying different settings;
  • You can implement it with SQL Server Standard Edition;
CONS:
  • Still in preview, limited to the West Central US region;
  • You have a list of limitations due to the preview; for example, as of now, you can’t use ReadOnly caching on data disks;
  • Documentation is still unclear about how to manage the IP assigned initially to the VNN after you insert DNNs in the solution.
Let’s do it!

As mentioned before, due to the preview availability limited to the West Central US region, you have to deploy a separate base infrastructure to test this scenario. Use the 06-SHDFCI.ps1 script to provision it: its first steps execute both the 00b-CommonVariables-WestCentralUS.ps1 and 01b-EnvironmentPrepation-WestCentralUS.ps1 scripts, and then it provisions the VMs and related assets. When specified in the script, copy the contents of the SHDFCI folder to the first cluster node, and execute SHDFCI-GuestClusterConfig.ps1 in a step-by-step fashion. Remember to update the storage account key for the witness!

Single VM

Last but not least, if you only need to test a stand-alone, optimized, domain-joined SQL VM, let’s deploy it by using the 05-SingleVM.ps1 script.

Of course, remember to provide the base infrastructure before starting with the VM deployment!
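
Under the hood, that script boils down to an ARM deployment of the optimized template; the sketch below uses placeholder file and resource group names, so check the repo for the actual paths used by 05-SingleVM.ps1:

    # Sketch only: template and parameter file paths are placeholders, not the repo's real ones.
    New-AzResourceGroupDeployment -ResourceGroupName "rg-sqliaas" `
        -TemplateFile ".\SingleVM\azuredeploy.json" `
        -TemplateParameterFile ".\SingleVM\azuredeploy.parameters.json"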


Marco Obinu

Curious by nature, a talkative geek who can speak in front of an audience or a camera, in love with technology, especially SQL Server. I want to understand how things work and to solve problems by myself, at work as well as in my many hobbies.

9 Comments

Kiran · 10 September 2020 at 04:48

Thanks for the excellent article! How much does it cost to implement these HA solutions using your scripts on Azure?

    Marco Obinu · 11 September 2020 at 12:53

    Thanks for your feedback, Kiran! I’m usually very conservative about costs in my lab environments: you should be able to test everything on a free Azure account if you keep each machine up for a few hours. The only exception is the “FCI with shared disks” template: I prepared the scripts when this configuration was in preview and available in the West US region, where the VM offering was very restricted. I used the smallest VMs I was able to find, but they’re bigger than the ones used in the other templates.
    Anyway, feel free to scale down the different templates to deploy a cheaper environment.
    Leaving aside the size of the DC, which is already relatively small, you can edit the ARM template parameter files you can find in the folders for each HA solution. You can choose a smaller VM, as well as smaller disks. Please keep in mind that the sizing used in these scripts, as well as any additional scale-down you may introduce, is meant for lab environments: for production use, you need to scale everything up, depending on your needs and following the MS best practices.

Daniel · 9 February 2021 at 05:49

Absolutely clear and complete article that clarifies all the available alternatives for HA. Thanks for taking the time to create this and share it.

Daniel · 9 February 2021 at 22:27

Hi, I have a question.
In a regular on-prem environment, the 2nd virtual machine does not have to be licensed for SQL Server on an FCI cluster, since the cluster is Active-Passive and only one instance is running at the same time. So, the question is…should I create a 2nd Windows-Only VM and manually install SQL Server on it to benefit from this?
Thanks!

    Marco Obinu · 9 February 2021 at 22:40

    Hi Daniel! If you’re using the pay-as-you-go way to license SQL on Azure, you need to license both nodes. Otherwise, if you have SQL licenses covered by Software Assurance, you can leverage the Azure Hybrid Benefit to license only the active node. Both nodes can be deployed starting from the official images on the marketplace (or from my ARM template, which is based upon them), and then you can use the SQL VM resource provider to change the licensing model on the fly.

      Daniel · 9 February 2021 at 22:48

      Wow, I didn’t know I had to license both nodes even in FCI when in Azure.
      Thanks so much for clarifying this!

      Regards,
      Daniel

        Marco Obinu · 9 February 2021 at 22:51

        It’s true only if you decide to pay for the SQL license “embedded” in the VM monthly fee. I usually work with the Azure Hybrid Benefit with my customers, so in terms of licensed nodes you get the same conditions you normally have on-prem (you pay only for the active one).

          Daniel · 9 February 2021 at 22:57

          Well, in my case I don’t have software assurance and I will go for PAYG with the embedded SQL Standard License. So, If my understanding is correct, in this scenario I will have to license *both* nodes for SQL Standard even when only one instance will be active, right?

          Marco Obinu · 9 February 2021 at 23:01

          I think you’re right. But I’m not a licensing expert, so please consult your license provider for an official confirmation. Anyway, if I recall correctly, the free passive node is a benefit of Software Assurance, so you should probably license both nodes on your on-prem cluster too…
