The Realities of Azure Update Management

Overview

Azure Update Management has been around for a while, and it does what Microsoft say it does, i.e. it patches servers. But what if you want more than that?

What if you want to know how to deploy the components that support Update Management?

What if you want to know how to monitor the patching state of your VMs?

What if you want to add in some automation to help you?

What if you want to know the realities of using Update Management?

Here are my realities based on my experience with it:

Easy enough: follow the guides. There are plenty that take you through the basics of adding in VMs, patching them up to date, and lording it over your new-found compliance.

Points to note:

When you first enable Update Management on an automation account that links to an existing workspace, EVERY machine reporting into that workspace appears in the Update Management console, despite not having been configured for updates. This is some weird behaviour that does go away, although it could take days. Those machines won’t get patched, but it’s still really annoying having all these VMs in the console when they shouldn’t be. It doesn’t stop you using Update Management in the normal way; it just clutters up the console.

Should you be deploying with Terraform, be aware that the “azurerm_log_analytics_solution” resource needs the Update Management automation account and the workspace to which it reports to be in the same resource group.

I did not deploy a WSUS server as a source for updates; my VMs were getting them directly from Microsoft.

Adding a VM through the console is quick, easy and painless. But it is manual and boring.

If you want to automate this through a runbook, the one Microsoft provide isn’t very good and, when I first tried to use it, didn’t actually work.

You can have a look at Microsoft’s onboarding runbook in their documentation.

It’s also written using AzureRM modules that are more than two years old, so not ideal. It has limited functionality, and I ended up writing my own to fit my requirements. Be aware you may have to as well; I wouldn’t rely on the Microsoft one.

I only required it to patch Azure VMs, though it can do on-prem as well.
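
If you do end up writing your own, the core onboarding step is pushing the Microsoft Monitoring Agent extension to the VM, pointed at the workspace Update Management uses. Here’s a minimal sketch using the current Az modules; the resource names and workspace details are placeholders, and you’ll still need the workspace scope configuration to pick the machine up:

```powershell
# Install the Microsoft Monitoring Agent extension on an Azure VM and point
# it at the Log Analytics workspace that Update Management reports to.
Set-AzVMExtension -ResourceGroupName 'rg-patching' -VMName 'vm-app01' `
    -Location 'uksouth' -Name 'MicrosoftMonitoringAgent' `
    -Publisher 'Microsoft.EnterpriseCloud.Monitoring' `
    -ExtensionType 'MicrosoftMonitoringAgent' -TypeHandlerVersion '1.0' `
    -Settings @{ workspaceId = '<workspaceId>' } `
    -ProtectedSettings @{ workspaceKey = '<workspaceKey>' }
```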

The console and its data are not up to date; things that happen take time to be reflected in the console.

Time from powering on/adding in a VM to appearing in the console — up to three hours!

Time from a new patch released to appearing on the list — up to 15 hours!

Time from patch deployment to results coming through — up to an hour!

Compliance scans run every 12 hours on a Windows VM, though you can restart the agent in an attempt to get it to scan within 15 minutes or so. The time and date of the last compliance scan are at least shown in the console.

These are useful points to know for when trying to test. If you build or power on a VM for testing, be aware you’ll have to wait for it to appear first.
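
If you’re testing and don’t fancy waiting up to 12 hours for a compliance scan, the agent restart mentioned above can be scripted. A minimal sketch using Invoke-AzVMRunCommand, assuming a one-line script file alongside it:

```powershell
# restart-mma.ps1 contains a single line: Restart-Service -Name HealthService
# (HealthService is the Microsoft Monitoring Agent's Windows service).
Invoke-AzVMRunCommand -ResourceGroupName 'rg-patching' -VMName 'vm-test01' `
    -CommandId 'RunPowerShellScript' -ScriptPath './restart-mma.ps1'
```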

If you configure a deployment run, the results may not be as they first appear. I was surprised by the outcome of the deployment below.

The deployment run had within it a single VM, which required a single patch. That patch failed to install yet the VM and the deployment run were triumphantly marked as a success.

Microsoft confirmed to me this is expected behaviour, as the status of the VM and the run depend on whether the run started and ended, not on what actually happened during the run, i.e. did you install all the patches I asked you to?

For me, this was a problem, as I was banking on that data to raise alerts from.

Run update detail

The other issue it causes is that the run history, as shown below, summarises everything and at first glance you’d think there were plenty of successes here, but actually some of them failed to install the required patches and you wouldn’t know unless you went and checked each run.

Run list history

The one failure I do have in the list below relates to an issue with the run itself and is therefore marked as a failure. Ironically, the issue with the run was “MaintenanceWindowExceeded”, which isn’t true because, as you can see, it took 17 minutes from a window of 100 minutes (120 minus the final 20, which are reserved for reboots). It still had 83 minutes left to install patches. Another issue with the data presented to us.

Should you want your Windows VMs to report to more than one Log Analytics workspace (Linux can only report to one), for example a separate workspace for your Update Management data, then consider the points below.

  • You can’t do this through the console. If a VM already reports to another workspace, it’s greyed out and you can’t enable Update Management on it. You can add a second workspace in other ways, such as in the Microsoft Monitoring Agent config on the VM itself or by invoking a local PowerShell command from a runbook using Invoke-AzVMRunCommand (see the sketch after this list). To add the VM to Update Management, though, you need it reporting to the workspace, and then you still need to go and alter the MicrosoftDefaultComputerGroup saved query on the workspace. Once both of these are in place, your VM will eventually show up in Update Management.
  • The PowerShell cmdlet Get-AzVMExtension can be used to check if a VM is reporting to a workspace (perhaps to see if you need to onboard it into UM), but it will only ever return one workspace, due to a bug (https://github.com/Azure/azure-powershell/issues/13815). I couldn’t see a pattern that helped predict which workspace was returned, but it only ever returned one. The order of the workspaces listed in the agent console on the VM had no impact.
  • It doesn’t appear that Terraform or ARM have a way of defining the MMA extension with more than one workspace; it only seems to accept a single input. This makes idempotency difficult given that you can’t specify two workspaces. When Terraform does a plan, it will only retrieve one workspace back (see the point above), which causes deployment issues.
  • Information around management of the agent can be found here: https://docs.microsoft.com/en-us/azure/azure-monitor/agents/agent-manage
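
For reference, the agent-side change from the first bullet boils down to a few lines run locally on the VM (or pushed via Invoke-AzVMRunCommand). A sketch based on the agent management page linked above; the workspace ID and key are placeholders:

```powershell
# Add a second Log Analytics workspace to the Microsoft Monitoring Agent via
# its local COM interface, then reload the agent so it reports to both.
$mma = New-Object -ComObject 'AgentConfigManager.MgmtSvcCfg'
$mma.AddCloudWorkspace('<workspaceId>', '<workspaceKey>')
$mma.ReloadConfiguration()
```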

Update Management orchestrates the installation of updates on your machine, but it doesn’t provide a repository for them. Whatever config your server has for update retrieval is what Update Management will use.

The consequence of this is that identical VMs with different repository settings may show different compliance results, so it’s helpful to have consistency in the source of your updates.

Microsoft do not have an internal endpoint for Windows updates, so your source will likely be the internet unless you have a WSUS server. Either way, the updates themselves come over HTTP, with the metadata over HTTPS.

Microsoft list the URLs required for Windows Update in their documentation.

After checking the Windows Update log on the VM and correlating it with the proxy logs, I also found additional URLs that needed to be allowed. The page below lists the URLs required for the appropriate build of your OS. Those with Windows Update in the description field were required, and once I allowed them through, things started operating much more smoothly.

Windows 10, version 1903, connection endpoints for non-Enterprise editions – Windows Privacy

If the Microsoft Monitoring Agent on your VM wants to talk to Log Analytics, it will have to go over the internet*. This is documented here: https://docs.microsoft.com/en-us/azure/azure-monitor/agents/log-analytics-agent#network-requirements

This is a consideration from a security point of view. Traffic is over TLS, and you can limit the opinsights URLs by prepending the workspace ID to them, e.g. <workspaceId>.ods.opinsights.azure.com and <workspaceId>.oms.opinsights.azure.com (I think you can do the same with the automation one using the automation account ID; the blob storage one seemed random and I couldn’t pin it down).

This may not be a concern but it is worth being aware of.

*You may not need to go over the internet if you leverage an Azure Monitor Private Link Scope (AMPLS). https://docs.microsoft.com/en-us/azure/azure-monitor/logs/private-link-security

Trying to retrofit this might be difficult given the impact it would have, and, aside from all the reworking, the limitations of the service meant it was not an option for me.

Depending on your requirements, this may or may not be something you can crack on with out of the box. My requirements were to extract data into an Event Hub and then dashboard it up or alert from it.

The data that appears in the console about the update runs, and the machines in those update runs, I found was best extracted through the Azure Management API under the Automation section.
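
As a sketch of what that looks like, assuming Invoke-AzRestMethod (from Az.Accounts) and the softwareUpdateConfigurationRuns operation; the subscription, resource group and account variables are placeholders:

```powershell
# List update deployment runs for an automation account via the management API.
$path = "/subscriptions/$subId/resourceGroups/$rg/providers/Microsoft.Automation" +
        "/automationAccounts/$account/softwareUpdateConfigurationRuns" +
        "?api-version=2019-06-01"
$runs = (Invoke-AzRestMethod -Method GET -Path $path).Content | ConvertFrom-Json
# Each run carries its status and timings; per-machine detail comes from the
# equivalent softwareUpdateConfigurationMachineRuns operation.
$runs.value | ForEach-Object { $_.properties } |
    Select-Object status, startTime, endTime, computerCount, failedCount
```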

To get data about the onboarding runbook I enabled diagnostic logs on the automation account.

https://docs.microsoft.com/en-us/azure/automation/automation-manage-send-joblogs-log-analytics#configure-diagnostic-settings
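
A sketch of that configuration, assuming the Az.Monitor Set-AzDiagnosticSetting cmdlet; both resource IDs are placeholders:

```powershell
# Send automation job logs and job streams to a Log Analytics workspace.
$automationAccountId = '/subscriptions/<subId>/resourceGroups/<rg>' +
    '/providers/Microsoft.Automation/automationAccounts/<account>'
$workspaceId = '/subscriptions/<subId>/resourceGroups/<rg>' +
    '/providers/Microsoft.OperationalInsights/workspaces/<workspace>'

Set-AzDiagnosticSetting -ResourceId $automationAccountId `
    -WorkspaceId $workspaceId -Category 'JobLogs','JobStreams' -Enabled $true
```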

To get patch-level data about compliance and what happened on each VM per patch run, I used Azure Monitor to stream data from Log Analytics (the Update and UpdateRunProgress tables) into Event Hub. At the time of writing this feature is in preview, and there are some quirks and caveats with it, but it does give the data I want, i.e. did a patch succeed or fail on a particular server in a particular update run.
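
If, as I believe, this maps to the Log Analytics workspace data export feature, the rule can be created with a REST call. A sketch with placeholder names, exporting both tables to an Event Hub namespace:

```powershell
# Create a data export rule sending the Update and UpdateRunProgress tables
# to an Event Hub namespace (events land in hubs named am-<tablename>).
$body = @{
    properties = @{
        destination = @{ resourceId = '<eventHubNamespaceResourceId>' }
        tableNames  = @('Update', 'UpdateRunProgress')
        enable      = $true
    }
} | ConvertTo-Json -Depth 5

$path = "/subscriptions/$subId/resourceGroups/$rg" +
        "/providers/Microsoft.OperationalInsights/workspaces/$workspace" +
        "/dataExports/updates-to-eventhub?api-version=2020-08-01"
Invoke-AzRestMethod -Method PUT -Path $path -Payload $body
```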

Update Management does allow you to include/exclude KB numbers, so you can control the patches installed. This doesn’t extend to versions, though, so Linux packages will automatically get updated to the latest version unless you version-control them on the box. For most Microsoft updates you won’t have this issue, though Defender AV updates have the same KB number covering multiple versions, so it’s likely your production machine will get a different version than your staging machine.

If you want to patch your production environment with only patches that have been through staging, you’ll have to manage that yourself using the include/exclude option above, as sketched below.
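
A sketch of that, assuming the Az.Automation cmdlets; $prodVmIds and $kbsValidatedInStaging are hypothetical variables holding your production VM resource IDs and the KB numbers you’ve already proven in staging:

```powershell
# Create a one-off schedule, then a Windows update deployment that installs
# only the KBs already validated in the staging environment.
$schedule = New-AzAutomationSchedule -ResourceGroupName 'rg-patching' `
    -AutomationAccountName 'aa-updates' -Name 'prod-validated-kbs' `
    -StartTime (Get-Date).AddDays(1) -OneTime

New-AzAutomationSoftwareUpdateConfiguration -ResourceGroupName 'rg-patching' `
    -AutomationAccountName 'aa-updates' -Schedule $schedule -Windows `
    -AzureVMResourceId $prodVmIds -IncludedKbNumber $kbsValidatedInStaging `
    -Duration (New-TimeSpan -Hours 2)
```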

Any patches installed outside of Update Management will show up in a compliance scan, but you won’t know when they were installed without additional logging; the Log Analytics fields relating to this were blank. You should be able to work out a date range between compliance scans, though.

It is not currently possible to initiate a compliance scan or a patch run on demand from the console.

There is no option to roll back or retry failed patches from the console; this will require manual intervention.

Update Management is worth using: it saves additional infrastructure, is integrated into other Azure services, and it will achieve the goal of getting your VMs patched.

BUT

It is probably only one piece of your patching solution and will need other components around it in order to fulfil your requirements.

There are considerations around deployment and management, which is why I have written this: to give you some information ahead of time.

If you want a much more hands off approach, have a look at Automatic VM Patching from Azure.