SQL Server on Azure IaaS: HA and cost savings

Published by Marco Obinu on 19 January 202019 January 2020

In the previous article, I gave you an overview of the complexity that arises when you’re sizing a single SQL Server VM in Azure IaaS. Now, it’s time to introduce High Availability and costs reduction topics.

Azure IaaS and HA

Usually, an on-premises production environment has some sort of HA; probably, if you’re evaluating its migration to Azure, it’s reasonable you’re expecting the same level of availability you had on-prem.

We can obtain different levels of availability in our Azure virtual datacenter; normally, a higher level of availability corresponds to higher complexity in the architectural design.

First of all, you should define how much data you can lose in case of troubles (aka the Recovery Point Objectives of your solution) and how much time your system can stay offline before incurring huge money loss (aka the Recovery Time Objectives). This permits you to understand which kind of protection you should implement in your architecture.

On Azure, you have to assign Premium SSD disks to your VM to obtain a defined Service Level Agreement. VMs with Standard Storage, even Standard SSD, doesn’t offer you any kind of SLA.

A VM with Premium SSD disks offers a 99,5% SLA: this means your VM can be reached the 99,5% of the time, but you have no safety at the guest OS or application level. You can always protect the VM and SQL Server with backups but, if you have to recover from a big issue, your RPO and RTO will probably be more similar to a Disaster Recovery scenario than an HA one.

You can increase the availability and the SLA of your service by implementing native SQL Server technologies – like Always On Failover Cluster Instance or Availability Groups – in a solution composed by more than one VM. You can choose to distribute your servers in Availability Set to obtain a 99,9% SLA, or between different Availability Zones. This latter approach increases to SLA to 99,99%, which is a really significant uptime; anyway, it should be used with a grain of salt because it also raises the latency in the replication of the data between your nodes.

Once you defined the SLA required for your implementation, you need to choose the right HA technology. You can choose between what you already use for on-premises deployment – FCI or AG – but they come with some differences from the architectural point of view, due to Azure peculiarities.

If you have SQL Server Enterprise licenses, your best chance is to use Always On AG. With AG, you can obtain a Business Continuity / Disaster Recovery solution – which gives you both HA and DR protection – with a single technological component. They’re ideal also to extend your on-prem database system to the cloud.

Otherwise, if you are low on budget and have only SQL Server Standard licenses in your pocket, you can deploy a system based on Always On Basic AG or Always On FCI.

How to implement the storage for clusters?

All the solutions presented above, including FCI, require dedicated storage on each node. This raises the price of the solution, and it’s one of the main differences between an Azure implementation and an on-premises one. While in an on-prem datacenter you can use shared disks on a remote SAN for clustering, on Azure you need to leverage storage replication – with Storage Spaces Direct – to share data between nodes.

Shared disks for Azure solutions were announced at Ignite 2019; I hope they’ll arrive soon. In the meanwhile, you can look at Azure Premium File Shares to host the shared data of your cluster. This solution reproduces what you can achieve on-prem with a SQL cluster that uses SMB shares to archive its files on a scale-out file server; while it’s fascinating, it should be well evaluated in terms of cost-performance ratio.

What about networking?

Networking is another considerable difference between on-prem and Azure for HA scenarios.

In an on-prem cluster, clustered IP are represented by virtual adapters that can obtain their own IP address. On Azure, IP addresses are assigned to NICs connected to a VM; Azure DHCP is not aware of any virtual adapter inside the VM, and it’s not able to assign an address to it. You can always hardcode a free IP address in the cluster configuration, but:

you’re exposing your system to possible IP conflicts if other resources are deployed to the same subnet;
you may incur in routing issues

A load balancer in front of your cluster is the right way to manage this aspect and can govern all the clustered IPs. Of course, this raises the complexity of your solution, since you have to define load balancing rules, ports and probes, and integrate them with your clustering config.

If you add these aspects to the ones related to sizing we discussed last time, it may appear to you that SQL VMs on Azure are a nightmare. Well, they’re not so easy, but they still have their role in the big picture of cloud architectures.

Now, it’s time to write something about costs.

Are SQL VMs convenient?

Well, I think the right answer is: “It depends!”.

If you’re aiming to an Iaas architecture for a long term solution, without adequate analysis, you’ll probably end with a quite expensive environment. Usually, you should consider IaaS as an intermediate and temporary step of your cloud journey.

IaaS has its own strategic place when you want to extend an on-premises datacenter, or you’re looking for performance or features that are still not available on the PaaS platform. There can also be other scenarios where IaaS makes a lot of sense, but usually, if there’s a PaaS service that can support you, you should aim to it.

Anyway, Azure offers several ways to save some money while using an SQL Server VM.

For a dev/test environment, you can consider to deploy it on a Dev/Test subscription, that permits you significant savings on licensing fees.

The Pay-As-You-Go licensing model is ideal for an environment that can be turned off when not used. With PAYG, you’re paying licensing fees included in the VM fare, that is billed when the machine is turned on. This means that you’re not paying the license when your environment is turned off, and it can be a considerable saving compared to the full licensing fee.

On the other side, for a 24/7 environment that will remain active for at least 12 months, you should consider acquiring a reservation for the compute capacity. This reserve you discounted prices. Also, you can purchase Windows and SQL licenses on-premises and cover them with Software Assurance, to leverage Azure Hybrid Benefit and move them to the cloud. On the long term, AHB has a lot of sense over the PAYG model.

Additionally, the new SQL Server 2019 licensing conditions have terrific innovations for BCDR scenarios: if you’ve covered your licenses with SA, you can now deploy a DR node on Azure without additional licensing costs. With the previous licensing model, the cross-premises node had to be fully licensed. For more information, put an eye on the licensing guide.

So, in the end, is it worth it?

Of course!

An IaaS VM can be the right solution to a problem that PaaS is not able to solve yet – even with all the complexities discussed in these last two articles.

You can deploy a performant and resilient system, but the design phase requires strong expertise, as well as they’re typically needed to architect an on-prem environment. So, it’s usually a good idea to give a call to someone who has some hands-on experience on this platform, or at least to read the official documentation very well.

See you next time to start talking about PaaS!

Stay tuned!