Automate Azure DR with Azure Policy

Today’s blog post is part of the community event Azure Spring Clean. We will discuss the importance of automating operational acceptance criteria, such as Azure Backups and Azure Site Recovery.

Operational Acceptance

Long before the cloud became widespread, companies established a procedure called Operational Acceptance Testing (OAT). During this procedure, we test several aspects of our systems to verify that everything was built as stated in the design. More importantly, we demonstrate that everything works before releasing the system for production use.

If you aren’t familiar with this procedure, here is an example of just a few things that we need to test on a Server:

  • Verify that documentation exists and is up to date
  • Test OS and server operability and stability
  • Test system performance
  • Verify that the server contains the required updates, software, modules, development environment, etc.
  • Failover (depending on configuration)
  • Network configuration and connectivity
  • Verify that monitoring and alerting are configured
  • Verify that backups are configured and that the latest backup can be recovered

This should be enough to give you an idea. There are many more specific settings that we need to test, depending on the environment and purpose of the particular server. Since today’s article is about DR, I will stop at the backups.

In my experience, in larger companies where we have multiple teams and layers of IT infrastructure, and established processes, it is difficult for a person who requested a server to get it before it passes OAT. And that is great.

In smaller organizations, however, this can be a problem. When servers are created through a service desk request, the service desk will at least configure monitoring and DR. But Azure makes creating new servers so simple that managers and application owners often skip the service desk and create these VMs themselves.

How many times did you or someone in your organization create a server to test something that later became a production server? Did it pass OAT? Are you monitoring it? Does it have DR? Did you try that DR? Did you create documentation for it? In every Azure environment I have ever touched, I found at least a few servers that didn’t have any of that.

Luckily in Azure, we can utilize different tools to automate or audit most of this, and one of the greatest tools for that is Azure Policy. With Azure Policy, we can enable monitoring, OS updates, and even DR.

Azure Backups

While there are other backup solutions that we can implement in Azure, using Azure Backups is straightforward, secure, and can be very cost-effective. We can use it to back up our on-premises environment, as well as Azure IaaS VMs, Managed Disks, File Shares, Blobs, SQL Servers, and Azure Databases.

To avoid having production services and resources without functional backups, we can use Azure Policy to check whether a resource is in production and then either audit it, so we are informed that there is no backup, or automatically turn on backup for that resource.

Prepare Azure Backups environment

Before you can start using backups, you need to do a few things:

  • When using Azure Backup for the first time, we must first register the Azure resource provider:
    Register-AzResourceProvider -ProviderNamespace "Microsoft.RecoveryServices"
  • We also need a vault. Depending on what you want to back up, this can be either a Recovery Services vault or a Backup vault:

    [Figure: Azure Vault Types]

    New-AzRecoveryServicesVault -Name $RSVName -ResourceGroupName $RGName -Location $Location
  • The third thing we need is a Protection Policy. We can have multiple Backup Policies for different workload or environment types. This command lists the policies that already exist in the vault for a given workload type:
    Get-AzRecoveryServicesBackupProtectionPolicy -WorkloadType "AzureVM" -VaultId $VaultID

Here are the workload types:

[Figure: Azure Backup Policy Configuration]

And these are the parameters of an Azure Backup Policy:

[Figure: Azure Backup Policy Configuration]

  • We can now manually or automatically associate our workloads with this Backup Policy we created.
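
Put together, the preparation steps above might look like the sketch below. It assumes the Az PowerShell module is installed and that you are already signed in with Connect-AzAccount. The resource group, vault, policy, and VM names are placeholders, not values from this post, and the VM is assumed to be in the same region as the vault. The new policy simply reuses the default schedule and retention objects that the Az cmdlets return for Azure VMs.

```powershell
# Placeholder names (assumptions); replace with your own values.
$RGName     = "rg-backup-demo"
$RSVName    = "rsv-backup-demo"
$Location   = "westeurope"
$PolicyName = "Production-Daily"

# 1. Register the resource provider (only needed once per subscription).
Register-AzResourceProvider -ProviderNamespace "Microsoft.RecoveryServices"

# 2. Create a resource group and a Recovery Services vault in it.
New-AzResourceGroup -Name $RGName -Location $Location -Force | Out-Null
$Vault = New-AzRecoveryServicesVault -Name $RSVName -ResourceGroupName $RGName -Location $Location

# 3. Create a backup policy for Azure VMs from the default schedule and retention objects.
$Schedule  = Get-AzRecoveryServicesBackupSchedulePolicyObject -WorkloadType "AzureVM"
$Retention = Get-AzRecoveryServicesBackupRetentionPolicyObject -WorkloadType "AzureVM"
New-AzRecoveryServicesBackupProtectionPolicy -Name $PolicyName -WorkloadType "AzureVM" `
    -SchedulePolicy $Schedule -RetentionPolicy $Retention -VaultId $Vault.ID | Out-Null

# 4. Manually associate one VM (hypothetical name and resource group) with the new policy.
$Policy = Get-AzRecoveryServicesBackupProtectionPolicy -Name $PolicyName -VaultId $Vault.ID
Enable-AzRecoveryServicesBackupProtection -Policy $Policy -Name "vm-app-01" `
    -ResourceGroupName "rg-app-prod" -VaultId $Vault.ID
```

In practice you would adjust the schedule and retention objects before creating the policy, for example to keep production backups longer than test backups.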

Azure Policy for Azure Backups

Azure Policy can audit and enable Azure Backups based on specific parameters. This works great in our scenario, where we want to ensure that our production Virtual Machines are being backed up.

The first Azure Policy we can use is a built-in policy that audits whether backup is enabled on Azure Virtual Machines. Here is that Policy:

{
  "properties": {
    "displayName": "Azure Backup should be enabled for Virtual Machines",
    "policyType": "BuiltIn",
    "mode": "Indexed",
    "description": "Ensure protection of your Azure Virtual Machines by enabling Azure Backup. Azure Backup is a secure and cost effective data protection solution for Azure.",
    "metadata": {
      "version": "3.0.0",
      "category": "Backup"
    },
    "parameters": {
      "effect": {
        "type": "String",
        "metadata": {
          "displayName": "Effect",
          "description": "Enable or disable the execution of the policy"
        },
        "allowedValues": [
          "AuditIfNotExists",
          "Disabled"
        ],
        "defaultValue": "AuditIfNotExists"
      }
    },
    "policyRule": {
      "if": {
        "allOf": [
          {
            "field": "type",
            "equals": "Microsoft.Compute/virtualMachines"
          },
          {
            "field": "id",
            "notContains": "/resourceGroups/databricks-rg-"
          },
          {
            "field": "Microsoft.Compute/imagePublisher",
            "notEquals": "azureopenshift"
          },
          {
            "field": "Microsoft.Compute/imagePublisher",
            "notEquals": "AzureDatabricks"
          }
        ]
      },
      "then": {
        "effect": "[parameters('effect')]",
        "details": {
          "type": "Microsoft.RecoveryServices/backupprotecteditems"
        }
      }
    }
  },
  "id": "/providers/Microsoft.Authorization/policyDefinitions/013e242c-8828-4970-87b3-ab247555486d",
  "type": "Microsoft.Authorization/policyDefinitions",
  "name": "013e242c-8828-4970-87b3-ab247555486d"
}
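
A definition does nothing until it is assigned to a scope. The following is a minimal sketch of one way to do that, assuming the Az.Resources and Az.PolicyInsights modules and a signed-in session. The assignment name and display name are placeholders I made up, and the subscription scope is an assumption; a management group or resource group scope would work the same way. The GUID is the definition name from the JSON above.

```powershell
# Look up the built-in definition by the name (GUID) shown in the JSON above.
$Definition = Get-AzPolicyDefinition -Name "013e242c-8828-4970-87b3-ab247555486d"

# Assumption: assign at the scope of the current subscription.
$Scope = "/subscriptions/$((Get-AzContext).Subscription.Id)"

# Placeholder assignment name and display name.
New-AzPolicyAssignment -Name "audit-vm-backup" `
    -DisplayName "Audit VMs without Azure Backup" `
    -Scope $Scope -PolicyDefinition $Definition

# Once a compliance scan has run, list the VMs that have no backup.
Get-AzPolicyState -PolicyAssignmentName "audit-vm-backup" -Filter "ComplianceState eq 'NonCompliant'" |
    Select-Object ResourceId, ComplianceState
```

Compliance results are not immediate. A new assignment is evaluated on the next scan, so the last command may return nothing for a while.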

The second very useful Azure Policy enables Azure Backups for Virtual Machines, using an existing Azure Backup Policy and Azure Recovery Services Vault, based on the value of a specific Tag.

In our case, this will be the tag Environment with the value Production. This Policy ensures that Azure Backup is enabled for all production VMs that do not have it enabled yet.

You can find this Azure Policy HERE.
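
Before assigning a policy like this, it can help to see which VMs it would affect. The sketch below is not the policy itself, only a manual check under the same assumption as the text: production VMs carry the tag Environment with the value Production. It lists every tagged VM in the current subscription that Azure Backup does not report as protected.

```powershell
# Assumed tag name and value, matching the example in the text.
$ProductionVMs = Get-AzResource -ResourceType "Microsoft.Compute/virtualMachines" `
    -Tag @{ Environment = "Production" }

foreach ($VM in $ProductionVMs) {
    # Ask Azure Backup whether this VM is protected by any vault.
    $Status = Get-AzRecoveryServicesBackupStatus -Name $VM.Name `
        -ResourceGroupName $VM.ResourceGroupName -Type "AzureVM"

    if (-not $Status.BackedUp) {
        Write-Output "No backup configured: $($VM.ResourceId)"
    }
}
```

Running this once before and once after the policy has been remediated is a simple way to confirm that the tag-based assignment did what you expected.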

Azure Site Recovery

ASR is a full Disaster Recovery solution that allows us to replicate our data almost in real time to a different location and perform a failover if needed. Azure offers various ways to achieve global high availability, and this particular solution is not primarily intended for replicating from one Azure Region to another. That scenario is supported, however, and you can use it that way.

Azure Site Recovery is near and dear to me because it single-handedly saved my client from a forest fire that reached the data center.

Azure Policy for Azure Site Recovery

If you already have your Azure Site Recovery configured and running, you can now audit and enable it at scale with Azure Policy.

The first Policy we can use for this purpose is "Audit virtual machines without disaster recovery configured", which finds any VMs that do not have it enabled. You can find this Azure Policy HERE.

The second Azure Policy is "Configure disaster recovery on virtual machines by enabling replication via Azure Site Recovery". This Azure Policy can be assigned to a Resource Group, and it will automatically include all new Virtual Machines in that Resource Group during creation. For existing VMs, we can always run a Remediation Task to enable it. The Policy Definition is HERE.
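
For existing VMs, a remediation task can be started from PowerShell. This is a sketch only, assuming the Az.PolicyInsights module, a policy that is already assigned to the resource group, and an assignment identity with enough permissions to deploy the replication settings. The resource group name, the remediation name, and the assignment ID are placeholders you need to fill in.

```powershell
# Placeholders (assumptions): the resource group the policy is assigned to,
# and the full resource ID of that policy assignment.
$RGName       = "rg-app-prod"
$AssignmentId = "<resource ID of the policy assignment>"

# Start a remediation task for existing, non-compliant VMs in the resource group.
Start-AzPolicyRemediation -Name "remediate-asr" `
    -PolicyAssignmentId $AssignmentId -ResourceGroupName $RGName

# Check on the task later.
Get-AzPolicyRemediation -Name "remediate-asr" -ResourceGroupName $RGName |
    Select-Object Name, ProvisioningState
```

New VMs created in the resource group after the assignment do not need this step, since the policy handles them during creation.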

OAT Automation

This article focuses only on Azure Backups and Site Recovery. However, we can use Infrastructure as Code, or automation tools such as Azure Policy or PowerShell, to automate more items from our Operational Acceptance Test list. For example, it can be effortless to enable Azure Update Services or Azure Monitoring and remove that from the manual task list for every new service or VM.

Going back to Azure Backups, with some help from PowerShell, we can also automate Azure Backup Restore tests and perform them regularly, so we can confirm that everything is working before we need it. More on that topic in a future article.

Azure Spring Clean 2022

This blog post is part of the third annual Azure Spring Clean, an excellent initiative by Joe Carlyle and Thomas Thornton. The Azure Spring Clean gathers community-created content focused on managing, governing, and cleaning Azure tenants. It is a superb content selection, so make sure you check it out.

Thank you for reading and keeping your cloud clean.

Vukašin Terzić

Updated Mar 15, 2022 2022-03-16T07:29:25+01:00
This post is licensed under CC BY 4.0