Anyone who has worked in IT Operations long enough knows that many incidents and service outages happen due to recent application configuration changes. What is more, if that application has some SLA, it is common for incident handling routines to explicitly instruct you to check for such changes as one of the first troubleshooting steps.
Despite Azure providing many options for tracking and analyzing resource configuration changes, those tools are spread across different services and sections of the portal. While some of them can be used out-of-the-box, others, like Change analysis in VM insights, Change Tracking in Azure Automation, or saving Activity logs to Log Analytics workspaces, require additional configuration before use.
Last year, Microsoft announced the availability of multiple Change Analysis features. What makes Change Analysis great is that it uses Azure Resource Graph under the hood, which is supported by an extensive list of SDKs. From a practical perspective, that allows you to pull the data about changes from the graph into many Azure-native monitoring solutions as well as third-party ones. Here I will explore how you can view the history of such configuration changes in Azure Monitor Workbooks.
Pull data about changes in Azure Monitor Workbooks
In Azure Monitor Workbooks, you can use the Azure Resource Graph data source to execute supported KQL queries and display their results in various formats. For example, you can use the following query to pull all changes for the last 14 days scoped to a resource group:
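A minimal sketch of such a query against the `resourcechanges` table, assuming a hypothetical resource group named `my-resource-group`. The projected column names are my own choice for illustration, so adjust them to whatever you want to see in the grid:

```kusto
// Changes recorded in Azure Resource Graph for one resource group.
// 'my-resource-group' is a placeholder; replace it with your own name
// or with a workbook parameter.
resourcechanges
| extend changeTime = todatetime(properties.changeAttributes.timestamp),
         targetResourceId = tostring(properties.targetResourceId),
         changeType = tostring(properties.changeType),
         changedBy = tostring(properties.changeAttributes.changedBy),
         clientType = tostring(properties.changeAttributes.clientType)
| where resourceGroup =~ 'my-resource-group'
| where changeTime > ago(14d)
| project changeTime, targetResourceId, changeType, changedBy, clientType,
          changes = properties.changes
| order by changeTime desc
```

The `changes` column is a dynamic object with the previous and new value of each changed property, which is usually the part you want to inspect first.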
The resulting table might look like the following:
Unfortunately, the Change Analysis data source, which is in preview as of now, scopes its results to a single resource only. That might not be so helpful in cases when a change in another resource causes an alert linked to your target resource. For example, an error in your web app might be caused by a changed or deleted certificate in a Key Vault.
List active alerts in Azure Monitor Workbooks
Obviously, having a list of active alerts in the same workbook alongside the configuration changes would help you correlate the two. Previously, there was a separate data source for pulling the information about Azure Monitor alerts, but now the alert info is available via Azure Resource Graph. The following query will return the list of all active Azure Monitor alerts in the target resource group:
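A sketch of what such a query can look like, assuming the same hypothetical `my-resource-group` placeholder and treating an alert as active when its monitor condition is `Fired`. The projected columns are illustrative only:

```kusto
// Active (fired) Azure Monitor alerts whose target resource lives
// in the given resource group. 'my-resource-group' is a placeholder.
alertsmanagementresources
| where type =~ 'microsoft.alertsmanagement/alerts'
| extend severity = tostring(properties.essentials.severity),
         monitorCondition = tostring(properties.essentials.monitorCondition),
         alertState = tostring(properties.essentials.alertState),
         startTime = todatetime(properties.essentials.startDateTime),
         targetResource = tostring(properties.essentials.targetResource),
         targetResourceGroup = tostring(properties.essentials.targetResourceGroup)
| where targetResourceGroup =~ 'my-resource-group'
| where monitorCondition == 'Fired'
| project name, severity, alertState, startTime, targetResource
| order by startTime desc
```

Filtering on `properties.essentials.targetResourceGroup` rather than on the alert's own resource ID is a design choice in this sketch: it ties the alert to the resource group of the resource it was raised for.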
A sample result of that query:
Another option to get the alert information is to pull it with an Azure Resource Manager query.
Improve troubleshooting experience with Azure Monitor Workbooks
Sometimes you might need to go back to resolved alerts to review and analyze them. You can achieve that by using parameters in Azure Monitor Workbooks.
For example, you can add parameters to display alerts that fired during the last N hours/days and optionally include resolved alerts in your results. After that, you can reference those parameters in your query to make it more dynamic:
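As a rough sketch, assume three hypothetical workbook parameters: a resource group picker named `ResourceGroup`, a time range picker named `TimeRange`, and a drop-down named `IncludeResolved` with the values `yes` and `no`. Workbooks substitute the `{...}` placeholders as text before the query runs, and I am assuming here that `{TimeRange:start}` expands to a valid datetime expression for the Azure Resource Graph data source, so verify that in your own workbook:

```kusto
// Alerts that fired within the selected time range; resolved alerts
// are included only when the IncludeResolved parameter is set to 'yes'.
// ResourceGroup, TimeRange and IncludeResolved are hypothetical names.
alertsmanagementresources
| where type =~ 'microsoft.alertsmanagement/alerts'
| extend severity = tostring(properties.essentials.severity),
         monitorCondition = tostring(properties.essentials.monitorCondition),
         startTime = todatetime(properties.essentials.startDateTime),
         targetResource = tostring(properties.essentials.targetResource),
         targetResourceGroup = tostring(properties.essentials.targetResourceGroup)
| where targetResourceGroup =~ '{ResourceGroup}'
| where startTime >= {TimeRange:start}
| where monitorCondition == 'Fired' or '{IncludeResolved}' == 'yes'
| project name, severity, monitorCondition, startTime, targetResource
| order by startTime desc
```

Keeping `monitorCondition` in the projection lets you tell fired and resolved alerts apart once both show up in the same grid.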
Also, you can improve your user experience by formatting the results with colorful icons and using column formatting:
If you need to analyze changes preceding alerts fired beyond the last 14 days, you will need to schedule exporting those logs to some external store like Log Analytics and modify your workbook to pull the change info from a workspace instead of Azure Resource Graph directly.
From a troubleshooting perspective, it makes sense to look at the changes that happened before an alert fired rather than after. You can take an extra step and make the results in your change history view depend on the alert you select in your alerts view. For that, define a workbook parameter that is populated when you click a row in your alert grid, and use it in a follow-up change history query to keep only the changes that happened before the alert fired:
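A sketch of such a follow-up query, assuming the alert grid exports the start time of the selected alert into a hypothetical parameter named `AlertStartTime` and that a `ResourceGroup` parameter holds the target scope:

```kusto
// Changes that happened before the selected alert fired.
// AlertStartTime and ResourceGroup are hypothetical workbook parameters;
// AlertStartTime is expected to hold the alert's start time as text.
resourcechanges
| extend changeTime = todatetime(properties.changeAttributes.timestamp),
         targetResourceId = tostring(properties.targetResourceId),
         changeType = tostring(properties.changeType),
         changedBy = tostring(properties.changeAttributes.changedBy)
| where resourceGroup =~ '{ResourceGroup}'
| where changeTime < todatetime('{AlertStartTime}')
| project changeTime, targetResourceId, changeType, changedBy,
          changes = properties.changes
| order by changeTime desc
```

Sorting by `changeTime` in descending order puts the changes closest to the alert at the top, which are typically the first suspects.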
The improved change history view:
You can also extend your workbook with information about resource and workload health in the target scope, so you can immediately check if an outage is related to resource availability.
Feel free to check my GitHub repository for the complete sample workbook template for correlating between alerts and configuration changes in Azure.
What is your experience analyzing changes that caused application performance or availability issues? What tools do you use the most to review configuration changes in Azure resources? Please share your thoughts in the comments 👇