Practical aspects of running a CMDB for Azure resources: Fundamentals

Keeping track of your IT resources is not an easy task, especially in enterprise-scale environments with hundreds of business applications, thousands of services, and myriads of dependencies among all of them. The problem gets even worse when speaking of the cloud – the flexibility of cloud services can be a curse to anyone who tries to maintain the infrastructure structured and organized.

As an Azure architect, I’m often involved in projects where customers struggle to manage their cloud services according to their expectations. Here are just a few complaints I hear a lot of times:

Nobody knows what application some cloud resources belong to.
It is unclear who owns what in a cloud environment; responsibilities are blurred or completely undefined.
Auditing and compliance reporting are problematic because the requirements for data security, service availability, and disaster recovery are not mapped to the used cloud services.
The operational cost per application in Azure is unknown, and some workloads can even be obsolete.

Most of that complaints originate from a lack of proper configuration management and missing CMDB for cloud resources, which is a crucial component for infrastructure operations and support. However, running a CMDB for cloud services is a challenging task on its own: the advance of deployment automation and the ability to spin up new environments from coded templates in minutes make CMDB information outdated quite fast.

Some ITSM/CMDB vendors try to compete with that challenge and offer automated solutions that crawl your cloud environments like Azure subscriptions and create corresponding configuration items for the discovered resources. Nevertheless, those configuration items should be enriched with additional information to be of use for access, cost, business continuity management, etc. Otherwise, those items have little value for you.

The idea of getting the data into your CMDB in an automated fashion and then updating/linking it manually to resource owners, cost centers, business applications, and so on doesn’t look sustainable and scalable. So, what can we do with that? Here is where Azure tags can help you with that.

Resource tagging

In their essence, Azure tags are just simple key/value pairs, or, in other words, labels that can be put on your resources. Despite being such a simple concept as adding metadata to your resources, the tags are a really powerful and useful tool. Basically, they can keep all the information needed for your configuration management processes alongside your cloud resources, enabling the most sophisticated automation and integration scenarios. Let’s look at some suggested tags from the perspective of the problem statements I presented in the beginning.

Application

Surprisingly, you don’t need any cloud resources themselves. What you really need is the value it brings to your application or service and how it improves its overall value to the business. When you go to the Azure portal and check resource properties, they tell you nothing about why a specific resource was deployed and whether you still need it or not as business value is not something the machine can decide on.

First of all, it would make sense to establish a connection between your application/service as a logical entity and the infrastructure (cloud) resource it depends on. Although in relatively small and simple environments, those connections can be created and maintained manually, in enterprise-scale infrastructures with hundreds of business applications and thousands of infrastructure resources, you had better invest enough time and resources in setting up an Application Performance Monitoring (APM) solution that can discover resource dependencies and map them to your apps. For example, Azure-native Application Insights can create an application map (aka service map) for your application so that you can visually analyze what infrastructure resources the app actually consists of.

Secondly, as you might know, before the Azure Resource Manager deployment model was introduced, each (classic) resource deployed in Azure existed independently, and there was no native way to group related resources. Fortunately, in the ARM model, the Azure product team introduced a very helpful abstract of resource groups to logically group all resources related to specific applications or services. So, if I were you, I would review your deployment patterns and make good use of resource groups.

Lastly, to improve user experience and provide some anchors for CMDB processes, I suggest adding Application Name and Service Map tags to all your cloud resources. The former should clearly indicate what (business) application a resource belongs to, while the latter should link to the app/service map. Ideally, that link should refer to a dynamic dashboard in the APM solution of your choice. At the minimum, it should point to a documentation page with a diagram explaining your app design. Also, it wouldn’t heart to label Azure resources with a Documentation tag containing a link to your app’s documentation or wiki site.

Ownership and responsibility

The Owner tag mostly represents the high-level primary contact in an organization for all inquiries regarding the labeled resource. It can be an application, system, or service owner, depending on the kind of workload you have in operation. Even if you miss all other information about a resource, you still have somebody to talk to and clarify them. However, in many organizations, such an application owner can be a high-level manager or a non-technical person. In that case, you might look into having a supplementary contact, like a Technical Contact tag, that is responsible for all technical matters regarding the resource.

When choosing the format for those tags, it is best to use values that clearly define your organization’s user account entities. From a practical perspective, a user email address or UPN works best as you don’t have to take an additional step and look for a user email somewhere when you need to contact that person.

The key thing to pay attention to when working with the tags representing contact information is that the tag value is just string data, and it is not connected to the user account in any way. So, to keep that information up to date, you should also set up and possibly automate processes to track any changes in user employment in your organization, like transferring to another position or dismissal. As soon as it happens, you should clarify the new resource contact and update the tags accordingly. Otherwise, you might end up with just a bunch of non-existing user emails and lots of orphan resources.

It is also worth adding additional metadata about the department or Business Unit the resources belong to. It can significantly simplify different audits, inventory, and other management processes. Apart from that, such information can also be useful for ownership disputes when the previous owner is not in charge anymore, and you need to identify a new one.

Security, availability, and recovery

Many organizations operate in regulated industries or have internal security policies with strict data processing requirements. It is common to have data classification and label all data stored and processed within a company in order to be compliant with such security requirements. From the technical point of view, you can employ a Data Profile tag for your resources to classify them by the sensitivity of data stored/processed, e.g., public, confidential, restricted, internal, etc.

You can manually classify a whole application by the most sensitive data profile it operates or is intended to in simple scenarios. For more complex cases, you might want to look for specialized solutions like Azure SQL Data Discovery & Classification and Azure Security Center.

From the operational perspective, there can be different requirements for supporting business applications or services, aka different SLAs. As a best practice, it is common to define a few service classes like Bronze, Silver, and Gold for your business services so that you don’t have to clarify all support-related information for each application/service individually. So, why don’t you put a Service Class tag on your Azure resources? By having such information in resource metadata, you can automatically apply corresponding service-level targets to the incidents affecting your application and handle them according to predefined routines.

Business criticality is another overlooked aspect of running an application or service. In essence, it usually represents the aggregated value for such time-specific objectives as recovery time (RTO) and recovery point (RPO). Basically, having them defined for your criticality levels allows you to put Criticality (aka Disaster Recovery) tags on your cloud resources and implement auditing controls to ensure that applications meet the objectives.

Cost and lifecycle

It is common in enterprise-level organizations to use the hierarchy of cost centers for financial control and accountability. Cloud resources, from a financial perspective, have their operational costs, which are often required to be charged to the respective cost centers. Consequently, it would be helpful to assign a Cost Center tag to each cloud resource you run to be able to filter your cloud consumption data by that tag value.

In addition to the Owner and Technical Contact tags I mentioned, in some cases, you might look for introducing a supplementary Budget Approver tag if application and budget ownership are segregated between different people or units. In that case, if you need to introduce changes that impact cloud resource costs, you know whom to contact for approval.

As cloud resources can generate costs only when they exist, it also might make sense to label them with a Planned Decommission Date tag. For example, it can be the project end date for project-related workloads, while for long-living services, it can be the next inventory checkup date. Of course, such information can be somewhat redundant if you have well-established processes for managing your applications and tracking their cloud resource dependencies. Nevertheless, it is still worth knowing that you should turn off the resources that are not in use anymore to save on them.

What about non-Azure resources like on-premise servers or AWS EC2 instances, you might ask? Well, with the release of Azure Arc, you can now onboard the workloads hosted outside of Azure for configuration management and monitoring by Azure means. So, the same approach for resource tagging can be applied to them too.

In conclusion

In the end, regardless of what CMDB vendors promise you in their whitepapers, you should understand that there is no magic solution for Azure, or any other cloud, inventory. As you might have already noticed from the tag’s explanations, mostly the challenge of running a CMDB for Azure resources has little to do with the cloud itself. Apart from the technical tool, you must have proper configuration management processes and people who can ensure their execution.

Designing such a solution for your cloud environment is not a trivial task because the optimal solution will vary depending on organization maturity, business needs, and regulatory requirements. Certainly, there are some baselines and best practices, as I mentioned previously, but what is of absolute necessity in an enterprise organization, e.g., cost centers and chargebacks, can be overkill in a small company where it is just enough to know your overall subscription cost.

In the next post in this series, I will share some tips regarding the technical aspects of implementing appropriate resource tagging processes.