Virtualisation | Cloud | Strategy

Azure Patch Management

Patch!

There are certain things in the IT world that, if we’re being frank, are unforgivable. Chief amongst those is failing to patch the operating systems in your IT estate. So what options do we have for Azure Patch Management?

There is a very small set of circumstances where you might be forgiven this cardinal sin.  Perhaps you’re running a legacy application that requires an OS that cannot be patched (HMS Windows XP?). In those circumstances, the expectation would be that you take comprehensive steps to protect the workloads using alternative methods, such as network isolation.  If you are not dealing with a workload in those circumstances then there is really no excuse for not having a patching regime!

Moving workloads into the cloud is not an exception to this rule.  If anything, moving workloads into the cloud means that you should be nailing down these processes and improving compliance with them.  There need to be measures put in place to make sure that workloads are appropriately managed for patching.  These measures might look very different for IaaS, PaaS and SaaS workloads. However, they do need to be in place and documented.

PaaS and SaaS

When we are dealing with PaaS and SaaS workloads in Azure, patch management isn’t something that the end user needs to manage directly.  Due to the nature of those services the patching is undertaken by Microsoft.

What does need to be taken into consideration, especially in PaaS solutions where you have a higher degree of control, is that any critical solutions are deployed across update and fault domains. If at the time of deployment you fail to design your PaaS solution across the required number of update and fault domains, the SLA that you hold with Microsoft is going to be meaningless.  I should explain those terms briefly:

An update domain is Microsoft’s way of ordering resources so that, when the resources within one update domain are being updated, the service as a whole remains online. By default Microsoft works with five update domains (0,1,2,3,4). Update domains are for planned events.

A fault domain is a Microsoft-configured logical grouping of resources that share a common power source and network switch. By default there is a maximum of three fault domains (0,1,2), although this is dependent on the Azure location deployed to. Fault domains are for unplanned events.

Working with fault and update domains, your App Services are going to be deployed following the pattern in the table below.

VM          Fault Domain   Update Domain
AppSrvN1    0              0
AppSrvN2    1              1
AppSrvN3    2              2
AppSrvN4    0              3
AppSrvN5    1              4
AppSrvN6    2              0
AppSrvN7    0              1
AppSrvN8    1              2
AppSrvN9    2              3
AppSrvN10   0              4
AppSrvN11   1              0
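
The pattern in the table is a simple round-robin: each new instance takes the next fault domain and the next update domain, wrapping around when it reaches the end. A minimal Python sketch of that pattern follows. It assumes the default counts quoted above (three fault domains, five update domains), and the function name `assign_domains` is purely illustrative rather than anything Azure exposes.

```python
from typing import List, Tuple

FAULT_DOMAINS = 3   # default maximum quoted above (0, 1, 2)
UPDATE_DOMAINS = 5  # default quoted above (0, 1, 2, 3, 4)


def assign_domains(
    instance_count: int,
    fault_domains: int = FAULT_DOMAINS,
    update_domains: int = UPDATE_DOMAINS,
) -> List[Tuple[str, int, int]]:
    """Return (name, fault domain, update domain) per instance, round-robin."""
    rows = []
    for i in range(instance_count):
        name = f"AppSrvN{i + 1}"
        # The modulo makes each counter wrap back to 0 independently.
        rows.append((name, i % fault_domains, i % update_domains))
    return rows


if __name__ == "__main__":
    for name, fd, ud in assign_domains(11):
        print(f"{name:<11} {fd:<14} {ud}")
```

Running it for 11 instances reproduces the rows of the table, which is a handy way to check how many distinct domains a given instance count will actually span.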

In order to gain the benefits of the Microsoft listed SLAs, the designs of your PaaS services will need to take this pattern into consideration.  The minimum number of fault and update domains that the service must be deployed across to leverage the SLA is two.  Any fewer than that and the outage will be referred back to the user.

IaaS

These configurations also apply when building IaaS services, in the form of availability sets.  This is great if you have a service that you can load balance. However, not all legacy IaaS services will be able to take advantage of this, and this is again something that you should consider when planning your deployments.  Take that legacy app that is pinned to a particular OS: yes, you might technically be able to move the service into an IaaS cloud instance… However, it won’t be covered by Azure’s 99.95% SLA.  The best offer you’ll get is 99.9%, and that’s only if you deploy to premium storage tiers.

“For all Virtual Machines that have two or more instances deployed in the same Availability Set, we guarantee you will have Virtual Machine Connectivity to at least one instance at least 99.95% of the time.

For any Single Instance Virtual Machine using premium storage for all Operating System Disks and Data Disks, we guarantee you will have Virtual Machine Connectivity of at least 99.9%.”

Those SLAs are still pretty good, and I daresay better than most organisations are able to guarantee.  But to get that there are configurations and planning steps that have to be taken.
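
To put those two figures in context, it can help to convert them into permitted downtime. The sketch below is plain arithmetic rather than anything taken from the SLA document itself, and it assumes a 30-day month for simplicity (Microsoft's own measurement period may be defined differently).

```python
def allowed_downtime_minutes(sla_percent: float, days: int = 30) -> float:
    """Minutes of downtime permitted in `days` days at the given SLA percentage."""
    total_minutes = days * 24 * 60
    return total_minutes * (100.0 - sla_percent) / 100.0


if __name__ == "__main__":
    for sla in (99.95, 99.9):
        minutes = allowed_downtime_minutes(sla)
        print(f"{sla}% allows about {minutes:.1f} minutes of downtime per 30 days")
```

Under that assumption the availability set SLA allows roughly 21.6 minutes a month, and the single instance SLA roughly 43.2 minutes.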

I digress. However, it is important to understand these limitations when planning deployments to Azure.

Microsoft Operations Management Suite (OMS)

Grouped together under the OMS banner, Microsoft has released features to assist in the management of your Azure cloud services, including OS patching and update management.  Azure Patch Management!

This was perceived to be a gap as recently as December 2016.  It seemed slightly ridiculous to most of us working in Azure that, when it came to IaaS services, doing patch management properly meant configuring a third-party solution or indeed looking to older Microsoft services such as MBSA and WSUS!

Update Management, now included under OMS, is going to plug this gap and provide patch management services without a trip to the marketplace.

Quick Update Management Overview

If you are like me, you like to see a diagram detailing how a service works before you look to put it into operation. Azure Patch Management, or Update Management, is no exception.  The diagram I’ve included below comes from Microsoft and shows how the solution assesses and applies updates to all connected Windows Servers.

The solution extends to cover Linux virtual machines as well.  The overview of that is effectively the same diagram, replacing the Windows agent with a Linux agent.

Very Quick OMS Configuration

Configuration of the OMS resources is a little bit buried. I don’t really understand why, as the services enabled by OMS are things that Microsoft should be highlighting.

To enable these services we need to create an OMS Workspace. This can be created by browsing to the ‘Log Analytics’ blade and selecting add to create a new OMS Workspace. Complete the details as needed.

 

To connect already deployed VMs to this workspace, browse to the ‘Workspace Data Sources’, select ‘virtual machines’ and then an IaaS VM.  This will present an option to connect the VM to the workspace. All very intuitive, if a little hidden.

It will come as no surprise to find that these monitoring agents can also be deployed via ARM. You need to provide the OMS Workspace ID and OMS Workspace Key, which can be found in the OMS Portal under ‘Settings > Connected Sources’.  The ARM template snippets look like the below for a Windows VM;

And like this for a Linux VM;
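
As a rough illustration of what such extension resources contain, the Python sketch below builds one for each OS and prints it as JSON. The publisher `Microsoft.EnterpriseCloud.Monitoring` and the extension types `MicrosoftMonitoringAgent` and `OmsAgentForLinux` are the names Microsoft uses for these agents. Everything else is an assumption for illustration: the helper name `oms_extension_resource`, the VM names, the `apiVersion`, the `typeHandlerVersion` values and the `workspaceId`/`workspaceKey` template parameter names should all be checked against current documentation before use.

```python
import json

PUBLISHER = "Microsoft.EnterpriseCloud.Monitoring"
AGENT_TYPES = {
    "windows": "MicrosoftMonitoringAgent",
    "linux": "OmsAgentForLinux",
}


def oms_extension_resource(vm_name, os_family, handler_version,
                           api_version="2015-06-15"):
    """Build an ARM VM extension resource that installs the OMS agent.

    api_version and handler_version are illustrative assumptions; the
    workspace ID and key are expected to arrive as template parameters.
    """
    agent_type = AGENT_TYPES[os_family]
    return {
        "type": "Microsoft.Compute/virtualMachines/extensions",
        "name": f"{vm_name}/{agent_type}",
        "apiVersion": api_version,
        "location": "[resourceGroup().location]",
        "properties": {
            "publisher": PUBLISHER,
            "type": agent_type,
            "typeHandlerVersion": handler_version,
            "autoUpgradeMinorVersion": True,
            # The workspace ID is not secret; the key goes in protectedSettings.
            "settings": {"workspaceId": "[parameters('workspaceId')]"},
            "protectedSettings": {"workspaceKey": "[parameters('workspaceKey')]"},
        },
    }


if __name__ == "__main__":
    # Hypothetical VM names and handler versions, for illustration only.
    print(json.dumps(oms_extension_resource("vs-win01", "windows", "1.0"), indent=2))
    print(json.dumps(oms_extension_resource("vs-lin01", "linux", "1.4"), indent=2))
```

The only differences between the two resources are the extension type and its handler version, which is why a single helper covers both.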

Once you’ve created this Workspace, you have a choice: you can either continue to access the services via the Azure Monitor blades or you can browse to the OMS Portal.  That’s made available from the ‘Log Analytics’ blade, or you can browse to it via the URL, which will be ‘https://[workspacename].portal.mms.microsoft.com’.

The portal provides a familiar look and feel for those who are used to the Azure ARM portal. From within there we can configure all manner of management goodies. Certainly too many to dig into in this post, but perhaps something I’ll pick up on later.

To deploy Update Management (which is Azure Patch Management) we can browse the ‘Solutions Gallery’ and select ‘Update Management’;

The description confirms that this is the service that we have been looking for.  However, it requires some additional services and configurations before we can proceed.  From the text we can see it’s going to require Azure Automation which, if you think about what it’s doing and how it works, should be no surprise.

Let’s give the service what it wants.

With Azure Automation accounts in place we can now add the Update Management solution to our OMS portal.  Once that has been configured the Update Management solution will be visible within the OMS portal.

OMS Update Management

When the solution has finished those initial assessments, the link on the OMS portal will change;

As you can see, the single VM I’d configured the agent on is missing a security patch. Fortuitous for the purposes of this write-up, but annoying as I’d deployed the VM from the marketplace not two hours previously!

You might ask where that information is coming from.  This is the first slightly annoying weakness in the solution.  It’s being pulled from the source that the Windows server is set to pull updates from.  As my freshly deployed VM is set to pull updates from Microsoft, it’s being validated against that.  If we want to gain any kind of control over patch approval and baselines we’re going to require a WSUS server…  So there is the first trade-off: if the solution requires a WSUS server for patch approval and baselines, why not just manage patching through an IaaS WSUS instance?

The same is going to be true for Linux servers.  They are going to provide a report based on whatever repo the server is pointed at.

Putting that aside…

Digging into the Update Management solution provides some useful information, in a format that should be accessible to management and service reporters.

We can configure deployments against individual virtual machines or groups.  To configure groups, browse to the search service and use the search to bring up the collection of virtual machines you’d like to group together.  The search needs to return a distinct list of computers.

Keeping it simple, I’ve created a group based upon ‘computer=vs* | distinct computer’.  This will return a list of all distinct computers (virtual machines) whose names begin with ‘vs’.
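
To make the effect of that search concrete, here is a small Python sketch that mimics it against an in-memory list of records. It is only a model of the behaviour, not the OMS search engine: the `Computer` field name, the sample records and the `computer_group` helper are hypothetical, and it assumes the `vs*` wildcard behaves like a case-insensitive shell-style pattern (so it matches names starting with ‘vs’).

```python
from fnmatch import fnmatchcase


def computer_group(records, pattern="vs*"):
    """Return distinct computer names matching the pattern, in first-seen order."""
    seen = set()
    group = []
    for record in records:
        name = record["Computer"]
        key = name.lower()  # compare case-insensitively (an assumption)
        if fnmatchcase(key, pattern.lower()) and key not in seen:
            seen.add(key)
            group.append(name)
    return group


if __name__ == "__main__":
    # Hypothetical log records; several rows can report the same machine.
    sample = [
        {"Computer": "vs-web01"},
        {"Computer": "VS-WEB01"},
        {"Computer": "db-vs01"},
        {"Computer": "vs-app02"},
    ]
    print(computer_group(sample))
```

The de-duplication step is the part that matters for deployments, since a saved group has to resolve to one entry per machine however many records each machine has logged.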

As you can see, the search can be saved as a computer group. This means we can use it as a target for deployment. The deployment schedule configuration should be intuitive enough for me not to cover here.  You can schedule either one-time or recurring deployments.

With the schedule locked in we can sit back and wait for deployment to occur.

Weaknesses

There are still some pretty glaring omissions here in terms of process that I could just ignore…  but I’m not going to.

Whilst Update Management in OMS gives us Azure Patch Management in the roughest sense, the gaps in the process are still big enough to warrant major concern.

  1. There is no patch approval mechanism.
  2. The deployment operation covers the when, not the what.
  3. It requires an agent per VM.
  4. It still needs WSUS to provide approval and baselines.
  5. Removing machines from the solution is laborious.
  6. It requires Windows Server 2012 or higher.

Now for some environments this might be good enough.  I would suggest those environments are going to be few and far between.  I suspect the majority of us will be sticking to WSUS or other third-party management tools, which is really very annoying.

Simon