Practical aspects of running a CMDB for Azure resources: Fundamentals

Keeping track of your IT resources is not easy, especially in enterprise-scale environments with hundreds of business applications, thousands of services, and myriads of dependencies. The problem gets even worse when speaking of the cloud—the flexibility of cloud services can be a curse to anyone who tries to maintain the infrastructure structured and organized.

As an Azure architect, I’m often involved in projects where customers struggle to manage their cloud services according to their expectations. Here are just a few complaints I hear a lot of times:

Nobody knows what application some cloud resources belong to.
It is unclear who owns what in a cloud environment; responsibilities are blurred or undefined.
Auditing and compliance reporting are problematic because the cloud services used do not map to data security, service availability, and disaster recovery requirements.
The operational cost per application in Azure is unknown, and some workloads can be obsolete.

Most of those complaints originate from a lack of proper configuration management and missing CMDB for cloud resources, which is a crucial component for infrastructure operations and support. However, running a CMDB for cloud services is a challenging task on its own: the advance of deployment automation and the ability to spin up new environments from coded templates in minutes make CMDB information outdated relatively fast.

Some ITSM/CMDB vendors try to compete with that challenge by offering automated solutions that crawl your cloud environments, like Azure subscriptions, and create corresponding configuration items for the discovered resources. Nevertheless, those configuration items should be enriched with additional information for access, cost, business continuity management, etc. Otherwise, they have little value for you.

The idea of getting the data into your CMDB in an automated fashion and then updating/linking it manually to resource owners, cost centers, business applications, and so on doesn’t look sustainable and scalable. So, what can we do with that? Here is where Azure tags can help you with that.

Resource tagging

In essence, Azure tags are just simple key/value pairs, or, in other words, labels that can be put on your resources. Despite being such a simple concept as adding metadata to your resources, tags are a really powerful and valuable tool. Basically, they can store all the information needed for your configuration management processes alongside your cloud resources, enabling the most sophisticated automation and integration scenarios. Let’s look at some suggested tags from the perspective of the problem statements I presented initially.

Application

Surprisingly, you don’t need any cloud resources themselves. What you really need is the value they bring to your application or service and how they improve its overall value to the business. When you go to the Azure portal and check resource properties, they tell you nothing about why a specific resource was deployed and whether you still need it or not, as business value is not something the machine can decide on.

First of all, it would make sense to establish a connection between your application/service as a logical entity and the infrastructure (cloud) resource on which it depends. Although in relatively small and straightforward environments, those connections can be created and maintained manually, in enterprise-scale infrastructures with hundreds of business applications and thousands of infrastructure resources, you had better invest enough time and resources in setting up an Application Performance Monitoring (APM) solution that can discover resource dependencies and map them to your apps. For example, Azure-native Application Insights can create an application map (aka service map) for your application so that you can visually analyze what infrastructure resources the app consists of.

Secondly, as you might know, before the Azure Resource Manager deployment model was introduced, each (classic) resource deployed in Azure existed independently, and there was no native way to group related resources. Fortunately, in the ARM model, the Azure product team introduced a helpful abstract of resource groups to group all resources related to specific applications or services logically. So, if I were you, I would review your deployment patterns and make good use of resource groups.

Lastly, to improve user experience and provide some anchors for CMDB processes, I suggest adding Application Name and Service Map tags to all your cloud resources. The former should clearly indicate what (business) application a resource belongs to, while the latter should link to the app/service map. Ideally, that link should refer to a dynamic dashboard in the APM solution of your choice. At the minimum, it should point to a documentation page with a diagram explaining your app design. Also, it wouldn’t hurt to label Azure resources with a Documentation tag containing a link to your app’s documentation or wiki site.

Ownership and responsibility

The Owner tag mainly represents the high-level primary contact in an organization for all inquiries regarding the labeled resource. It can be an application, system, or service owner, depending on the kind of workload you have in operation. Even if you miss all other information about a resource, you still have somebody to talk to and clarify them. However, in many organizations, an application owner can be a high-level manager or a non-technical person. In that case, you might look into having a supplementary contact, like a Technical Contact tag, responsible for all technical matters regarding the resource.

When choosing the format for those tags, it is best to use values that clearly define your organization’s user account entities. From a practical perspective, a user email address or UPN works best, as you don’t have to take another step and look for a user email somewhere when you need to contact that person.

The key thing to pay attention to when working with the tags representing contact information is that the tag value is just string data and is not connected to the user account in any way. So, to keep that information up to date, you should also set up and possibly automate processes to track any changes in user employment in your organization, like transferring to another position or dismissal. You should clarify the new resource contact and update the tags accordingly as soon as it happens. Otherwise, you might end up with just a bunch of non-existing user emails and many orphan resources.

It is also worth adding additional metadata about the department or Business Unit to which the resources belong. That can significantly simplify different audits, inventory, and other management processes. Such information can also be helpful for ownership disputes when the previous owner is no longer in charge, and you need to identify a new one.

Security, availability, and recovery

Many organizations operate in regulated industries or have internal security policies with strict data processing requirements. It is common to have data classified and labeled all data stored and processed within a company to be compliant with such security requirements. From the technical point of view, you can employ a Data Profile tag for your resources to classify them by the sensitivity of data stored/processed, e.g., public, confidential, restricted, internal, etc.

In simple scenarios, you can manually classify a whole application by the most sensitive data profile it operates or is intended to in. You might look for specialized solutions like Azure SQL Data Discovery & Classification and Azure Security Center for more complex cases.

From the operational perspective, there can be different requirements for supporting business applications or services, aka different SLAs. As a best practice, defining a few service classes like Bronze, Silver, and Gold for your business services is standard so that you don’t have to clarify all support-related information for each application/service individually. So, why don’t you put a Service Class tag on your Azure resources? With such information in resource metadata, you can automatically apply corresponding service-level targets to the incidents affecting your application and handle them according to predefined routines.

Business criticality is another overlooked aspect of running an application or service. Essentially, it usually represents the aggregated value for such time-specific objectives as recovery time (RTO) and recovery point (RPO). Having criticality levels defined allows you to put Criticality (aka Disaster Recovery) tags on your cloud resources and implement auditing controls to ensure that applications meet the objectives.

Cost and lifecycle

It is common in enterprise-level organizations to use the hierarchy of cost centers for financial control and accountability. From a financial perspective, cloud resources have operational costs, which are often required to be charged to the respective cost centers. Consequently, it would be helpful to assign a Cost Center tag to each cloud resource you run so that you can filter your cloud consumption data by that tag value.

In addition to the Owner and Technical Contact tags I mentioned, you might consider introducing a supplementary Budget Approver tag if application and budget ownership are segregated between different people or units. In that case, if you need to introduce changes that impact cloud resource costs, you know whom to contact for approval.

As cloud resources can generate costs only when they exist, it also might make sense to label them with a Planned Decommission Date tag. For example, it can be the project end date for project-related workloads, while it can be the next inventory checkup date for long-living services. Of course, such information can be somewhat redundant if you have well-established processes for managing your applications and tracking their cloud resource dependencies. Nevertheless, it is still worth knowing that you should turn off the resources that are not in use anymore to save on them.

What about non-Azure resources like on-premise servers or AWS EC2 instances, you might ask? Well, with the release of Azure Arc, you can now onboard the workloads hosted outside of Azure for configuration management and monitoring by Azure means. So, the same approach for resource tagging can be applied to them too.

In conclusion

Regardless of what CMDB vendors promise you in their whitepapers, you should understand that there is no magic solution for Azure or any other cloud inventory. As you might have noticed from the tag’s explanations, the challenge of running a CMDB for Azure resources mostly has little to do with the cloud itself. Apart from the technical tool, you must have proper configuration management processes and people who can ensure their execution.

Designing such a solution for your cloud environment is not trivial because the optimal solution will vary depending on the organization’s maturity, business needs, and regulatory requirements. Indeed, there are some baselines and best practices, as I mentioned previously, but what is of absolute necessity in an enterprise organization, e.g., cost centers and chargebacks, can be overkill in a small company where it is just enough to know your overall subscription cost.

In the next post in this series, I will share some tips regarding the technical aspects of implementing appropriate resource tagging processes.