
Wednesday, November 8, 2023

Behind the Scenes: Terraform's Deletion and the Mysterious Auto-Restoration of Azure AD Enterprise Apps

Context:

A few weeks ago, an unexpected situation unfolded in a customer's production environment. It all started when a member of their team decided to pull the trigger on the "terraform destroy" command.

Their intention was to remove a specific app registration from Azure AD that they had deployed with a Terraform package. Little did they know that this (untested) action would set off a chain of events that no one could have predicted: the command ended up deleting several Microsoft first-party enterprise applications from the Azure tenancy.

Issue:

This activity left the production environment in chaos. You might be wondering why this matters: those first-party Microsoft enterprise applications are the backbone of many services within the Azure AD tenant. Their sudden deletion caused disruptions throughout the environment, affecting numerous other apps and workloads that rely on them.

Troubleshooting & terraform quirk:

After things calmed down and everyone understood the consequences, the team started looking into what had happened. Our mission was clear: figure out why the "destroy" command had such far-reaching consequences, affecting not only the intended target but also other critical apps in the Azure AD environment.

While the production environment was being brought back to normal by manually restoring the deleted enterprise apps, we decided to re-create the scenario in a safe Microsoft demo tenant. Our experiment reproduced the issue flawlessly, confirming the destructive behavior; however, it still didn't answer the "why".

Later, we stumbled upon an issue reported against Terraform's Azure AD provider that highlights this behavior of the destroy command and appears to be a bug. If you have used the setting "use_existing=true" in your Terraform code, as shown in the example below, to establish the linkage between your Azure AD app and other service principals, the destroy command goes on a rampage, deleting not only your app but also every linked service principal (SPN) it can find, regardless of origin — in this case, even Microsoft first-party enterprise applications such as SharePoint Online, Exchange Online, Intune, and Microsoft Teams.

resource "azuread_service_principal" "sharepoint" {
  application_id = data.azuread_application_published_app_ids.well_known.result.Office365SharePointOnline
  use_existing   = true
}

With that, the "why" was answered.

Auto-restore Puzzle:

While re-creating the scenario in one of our internal demo tenancies, we stumbled upon a surprising observation: some of the enterprise apps we believed were gone for good started reappearing, without any manual restore operations, which left us intrigued.

After running out of guesses, we ended up reaching out to Azure support for answers, and they confirmed the existence of automatic restoration and explained its unique behavior.

When a user accesses services such as Microsoft Teams, Exchange, SharePoint, or OneDrive in the tenancy, Microsoft's underlying services use the corresponding first-party enterprise apps. If any of these apps are detected as missing or deleted, the service automatically restores them.

While this is generally the case for most apps, a few don't follow the rule. The Microsoft Intune API, for example, didn't join in this magical recovery process, and there may well be more apps with similar behavior hiding in Azure AD's depths.

Learnings:

So, what's the moral of the story? The lesson is simple: before running any command in your production environment, especially one whose name sounds a little scary, think twice or thrice.

This post also attempts to demystify the undocumented automatic restoration behavior of Azure AD enterprise apps. Hopefully it helps someone who is equally surprised to see their enterprise apps being restored automatically without any manual intervention.


Friday, September 29, 2023

Using Azure Function's Managed Identity for Service Bus Output Bindings

This short post is intended to share experiences while working on the following scenario:

  • Azure function with managed identity
  • Output bindings configured for service bus queue

Although this may seem straightforward, we ran into some issues getting it to work. The difficulties stemmed from the lack of clear documentation on this topic and from the dependency on the Service Bus extension package version used in the solution.

It's commonly understood that a function app needs to define a connection string for the Service Bus, and this works well when the connection string contains the Service Bus's secret keys. Questions arise, however, when the function app needs to use its managed identity to communicate with the Service Bus.

What's the solution?
In summary, here is the required format for the connection setting, which should be present either in your local.settings.json file or in the application settings of the function app in the Azure portal.
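A minimal sketch of how the setting could look in local.settings.json (the Service Bus namespace is a placeholder):

{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "UseDevelopmentStorage=true",
    "FUNCTIONS_WORKER_RUNTIME": "dotnet",
    "AzureWebJobsServiceBus__fullyQualifiedNamespace": "<your-namespace>.servicebus.windows.net"
  }
}

In the Azure portal, the same key and value go into the function app's application settings.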


Key takeaways:
  • Note the double underscore in the setting's name; the suffix, 'fullyQualifiedNamespace,' signifies that the function app should utilize its managed identity for communication with the service bus. 
  • When you define the setting in the above format, there's no need to specify the connection property in the output binding attribute in your code. By default, the runtime looks for the connection using this name, i.e. "AzureWebJobsServiceBus".
  • If you have the connection name property initialized in your code, the setting's name changes to 'yourconnectionnameusedincode__fullyQualifiedNamespace', as in the sketch below.
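For instance, here is a small sketch of an in-process C# function that uses a custom connection name in its Service Bus output binding (queue and connection names are illustrative):

using Microsoft.AspNetCore.Http;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;

public static class EnqueueMessage
{
    // Because the binding names its connection "MyServiceBusConnection", the app
    // setting must be called "MyServiceBusConnection__fullyQualifiedNamespace"
    [FunctionName("EnqueueMessage")]
    [return: ServiceBus("orders", Connection = "MyServiceBusConnection")]
    public static string Run(
        [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequest req)
    {
        return "queued via managed identity";
    }
}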

For additional reference, you can visit https://learn.microsoft.com/en-us/azure/azure-functions/functions-identity-based-connections-tutorial-2#connect-to-service-bus-in-your-function-app; however, please note that the documentation can be somewhat tricky to follow and implement, which is why this post exists.
 
Hope this helps someone! 

Sunday, March 12, 2023

Managing Azure VMs / Arc-enabled server configuration drifts made easy with Azure Automanage - Part 1

Managing servers across different environments may be complex, especially when dealing with a range of configurations, security rules, and compliance needs. Thankfully, Azure Arc provides a uniform management experience for hybrid and multi-cloud environments, allowing you to scale the deployment, administration, and governance of servers. 

To further simplify configuration management, you can use Desired State Configuration (DSC) packages to define and enforce the desired state of your server infrastructure, and one of the more recent Microsoft Azure offerings, Azure Automanage, can help you do this efficiently.

What is Azure Automanage?

Azure Automanage is a Microsoft service that helps simplify cloud server management. You can use Automanage to automate routine management tasks, such as security updates, backups, and monitoring, for your virtual machines across various subscriptions and locations. This keeps your servers up to date, secure, and optimized for performance without you having to spend a lot of time and effort on manual operations. You can read more about it here - https://learn.microsoft.com/en-us/azure/automanage/overview-about

Note that this blog assumes that you have a basic knowledge about Azure and Desired State Configuration (DSC) in general. If you are not familiar with these technologies, it is recommended that you brush up your skills by going through the official Microsoft documentation at https://learn.microsoft.com/en-us/azure/virtual-machines/extensions/dsc-overview. This will help you better understand the concepts and features discussed in this blog and make the most of the Azure and DSC capabilities.

This first post of the two-part blog series will cover the following:

  • Know your pre-requisites 
  • Creating and compiling the DSC file
  • Generating configuration package artifacts
  • Validating the package locally on the authoring machine and checking compliance status
  • Deployment options for the package

Scenario: To keep things simple and easy to understand, this post will not create a complicated real-world scenario. Instead, it will use a simple DSC script that ensures that the server always maintains a text file on its disk drive as an example. This will be the state to maintain throughout the post. However, it's important to note that there are no restrictions on referring to this concept and extending your implementation by introducing your own logic in your PowerShell scripts or using different DSC providers, such as registry keys.

The overall implementation workflow runs from authoring and compiling the DSC script, through packaging and validating it locally, to deploying the package to the target machines; the following steps explain each stage in detail.
The following example and steps were executed and tested on a Windows operating system; you could use Linux too, but a few steps and commands might vary.

Pre-requisites 
Before proceeding with the sample script and implementing the scenario, it is crucial to configure the machine and confirm that all necessary prerequisites are installed on it. This machine is typically known as an authoring environment.

Here are the key artifacts you need in the authoring environment:
  • Windows or Ubuntu OS
  • PowerShell v7.1.3 or above
  • GuestConfiguration PS module
  • PSDesiredStateConfiguration module

Create and Compile DSC script:
Given this context and scenario, let's examine what the DSC script file could look like.
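Here is a minimal sketch of such a configuration, using the Script resource from the PSDscResources module (the file path, contents, and names are illustrative, and your actual script may differ):

Configuration EnsureTextFile {
    Import-DscResource -ModuleName 'PSDscResources'

    Node localhost {
        # Keep a marker text file present on the disk drive at all times
        Script DemoTextFile {
            TestScript = { Test-Path -Path 'C:\Temp\dsc-demo.txt' }
            SetScript  = {
                New-Item -Path 'C:\Temp' -ItemType Directory -Force | Out-Null
                Set-Content -Path 'C:\Temp\dsc-demo.txt' -Value 'This file is maintained by DSC.'
            }
            GetScript  = { @{ Result = 'Checks for C:\Temp\dsc-demo.txt' } }
        }
    }
}

# Executing the script invokes the configuration and emits the MOF file
EnsureTextFile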


The script above imports the PSDscResources module, which is necessary for the proper compilation and generation of the .mof file as a result of running the DSC script.

I have observed that individuals with limited experience in DSC often become confused after preparing their DSC script and are uncertain about how to compile it to produce the output, which is the .MOF file. 

To compile the DSC script and generate the .MOF file, you can follow these steps: Open the PowerShell console (preferably PS 7 or higher), navigate to the directory where the DSC file is saved on your local authoring environment, and then execute the .ps1 DSC file.
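For example, assuming the sketch above was saved as EnsureTextFile.ps1:

# From a PowerShell 7+ console, in the directory containing the script
.\EnsureTextFile.ps1

# The run creates a folder named after the configuration, containing the
# compiled MOF file, e.g. .\EnsureTextFile\localhost.mof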

What is a MOF file?
The .MOF file generated by compiling DSC (Desired State Configuration) is a text-based file that contains the configuration data and metadata necessary to apply the desired configuration to a target node. The MOF file is consumed by the Local Configuration Manager (LCM) on the target node to ensure that the system is configured to the desired state.

Generate configuration package artifacts:
After generating the .MOF file, the next step is to create the configuration package artifacts from it. This requires running a specific command, and as a result the artifacts are bundled into a .zip file.

You can run the command below in your authoring environment
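Here is a sketch using the New-GuestConfigurationPackage cmdlet from the GuestConfiguration module (the package name and MOF path are illustrative):

# Bundle the compiled MOF and its required modules into a package (.zip)
New-GuestConfigurationPackage `
    -Name 'EnsureTextFile' `
    -Configuration './EnsureTextFile/localhost.mof' `
    -Type 'AuditAndSet' `
    -Force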

Please be advised that there are several command parameters that you should be familiar with; you can refer to the official documentation for a more detailed understanding of them. The most critical parameter, however, is the "Type" parameter, which can accept two values: "Audit" and "AuditAndSet".

The value names themselves suggest the action that the LCM (Local Configuration Manager) will take once the deployment artifacts are produced. If you create the package in "Audit" mode, the LCM will simply report the status of the machine if it deviates from the desired state. On the other hand, creating a package in "AuditAndSet" mode will ensure that the machine is brought back to the desired state defined in the DSC configuration you have created.

The .zip file is produced in the directory your PowerShell console is currently set to. If you are interested in examining the contents of the zip file, you will find the package artifacts described below.


The "modules" directory encompasses all the essential modules needed to execute the DSC configuration on the target machine once LCM is triggered. Additionally, the "metaconfig.json" file specifies the version of your package and the Type, as previously discussed in this post. The presence of the version attribute in this file indicates that you can maintain multiple versions of your packages, and these can be incremental as you continue making changes to your actual DSC configuration PowerShell files.

Validation and compliance check:

After generating the package, the subsequent step involves validating it by running it locally in the authoring environment to ensure that it can perform as expected when deployed to the target machines.

Typically, this is a two-step process where the first step involves checking the compliance of the machine, followed by running the actual remediation.
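With the GuestConfiguration module, these two steps roughly correspond to the following commands (the package path is illustrative):

# Step 1: report whether the local machine currently complies with the package
Get-GuestConfigurationPackageComplianceStatus -Path './EnsureTextFile.zip'

# Step 2: apply the configuration locally (only effective for 'AuditAndSet' packages)
Start-GuestConfigurationPackageRemediation -Path './EnsureTextFile.zip'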

As mentioned earlier, the second command takes into account the Type parameter value present in the metaconfig.json. This implies that if the package is designed solely for auditing the status, the remediation script will not attempt to bring the machine to the desired state. Instead, it will only report it as non-compliant.

Deployment options:

Before deploying the package to your target workloads, there are a few things you should keep in mind. Firstly, the package should be easily accessible during deployment so that the deploying entity can read and access it. Secondly, you should ensure the presence of the guest configuration extension to enable guest configurations on your target Windows VMs. Additionally, make sure that the target servers have managed identities created and associated.

To ensure that the package is accessible, one option is to upload it to a central location, such as Azure Storage. You can choose to store it securely and grant access to it using shared access signatures. In the next part, we will explore how to access it during the deployment steps. Optionally, you can also choose to sign the package with your own certificate so that the target workload can verify it. However, ensure that the certificate is installed on the target server before starting the deployment of the package to it.

Regarding the second point mentioned above, i.e., ensuring that the target workloads (Azure VMs or Arc-enabled servers) have managed identities, a recommended best practice is to use an Azure policy/initiative and assign it to the scope where your workloads are hosted. This ensures that all the prerequisites for the package deployment, such as performing guest assignments, are correctly met.

The initiative I used in my environment contains four policies in total, which together ensure that all the requirements are met before you deploy the package.

Guest Assignment:

After uploading the package zip to the accessible location and assigning the initiative to target workloads, the final step would be to deploy the package. Azure provides a dedicated resource provider, known as the Guest Configuration provider, to assist you in this process through the guest configuration assignment resource. You can read more about it here https://learn.microsoft.com/en-us/rest/api/guestconfiguration/guest-configuration-assignments

You can also access all the guest configuration assignments in the Azure portal through a dedicated blade, i.e. Guest Assignments.

As the Guest Configuration resource is an Azure resource, you can deploy it as an extension to new or existing virtual machines. You can even integrate it into your existing Infrastructure as Code (IaC) repository for deployment via your DevOps process, or deploy it manually. Additionally, it supports both ARM and Bicep modes of deployment.

As an example, here is roughly what the Bicep template for this resource could look like.
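The following is a minimal sketch, assuming an existing VM and a package uploaded to a storage account accessible via a SAS token (names, API versions, and the content hash are placeholders):

param vmName string
param location string = resourceGroup().location

resource vm 'Microsoft.Compute/virtualMachines@2023-03-01' existing = {
  name: vmName
}

// Guest configuration assignment deployed as an extension resource on the VM
resource guestAssignment 'Microsoft.GuestConfiguration/guestConfigurationAssignments@2022-01-25' = {
  name: 'EnsureTextFile'
  scope: vm
  location: location
  properties: {
    guestConfiguration: {
      name: 'EnsureTextFile'
      version: '1.0.0'
      contentUri: 'https://<storage-account>.blob.core.windows.net/packages/EnsureTextFile.zip?<sas-token>'
      contentHash: '<sha256-hash-of-the-zip>'
      assignmentType: 'ApplyAndMonitor'
    }
  }
}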

While deploying the Guest Configuration resource manually or via DevOps can work, it's recommended to use Azure Policies to ensure proper governance in the environment. This ensures that both existing and new workloads are well-managed and monitored against the configurations defined in the DSC file. In the next post, we will discuss this in detail and leverage a custom Azure Policy definition to create the Guest Assignment resource. We will also explore the various configuration options available.

As we bid adieu to this blog post, let's keep the coding flame burning and the learning spirit alive! Stay tuned for part 2, where we shall delve deeper into the exciting world of Azure policies and custom policy definitions.

Monday, February 6, 2023

Building a Terraform template to securely push application telemetry to an App Insights workspace, bypassing local authentication

Azure Application Insights workspace is a cloud-based service for monitoring and analyzing the performance of applications. It provides real-time insights into the application's behavior, such as request and response times, user behavior, and error rates. 

In the past, Azure Application Insights was primarily used programmatically through its web APIs or various SDKs by providing an instrumentation key. This key was required to interact with the platform, extract insights about the application's performance, or query the data stored in it. However, this experience was limited because it lacked native identity authentication, making the instrumentation key difficult to secure. Developers had to take extra precautions to secure and store the key, which added overhead to the development process, and the absence of native identity authentication left the workspace open to potential security breaches and unauthorized access to data.

Recently, Microsoft has made significant changes to Azure Application Insights Workspace to support Azure Active Directory (Azure AD) authentication. This has enabled developers to opt-out of local authentication and use Managed Identities instead. 

By using Managed Identities, telemetry data can be exclusively authenticated using Azure AD, providing a more secure and streamlined way of interacting with the platform. With this change, developers no longer need to worry about managing and storing the instrumentation key securely, as the authentication is handled by Azure AD. This improves the security of the telemetry data and reduces the overhead associated with managing authentication credentials. 

This blog post assumes that the reader has a basic understanding of enabling Azure Active Directory integration for the Azure Application Insights workspace. If not, it is recommended that you read up on it on Microsoft Learn and take a look at the feature prerequisites.

The focus of this blog post is on how to configure Azure AD integration using a Terraform template and validate it using a sample .NET web API that talks to the Application Insights Workspace securely using its managed identity when deployed on an Azure Web App. 

Let's take a look at a Terraform template that is responsible for deploying the resources below; a trimmed sketch follows the list.

  • Resource group.
  • App Service plan.
  • Web app with its system-assigned managed identity.
  • Log Analytics workspace along with an App Insights resource.
  • Role assignment to grant the required permission to the web app's managed identity on the App Insights resource.
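A trimmed sketch of the key resources, assuming the resource group and service plan are defined elsewhere in the template (names, SKUs, and the web app flavor are placeholders):

resource "azurerm_log_analytics_workspace" "law" {
  name                = "law-telemetry-demo"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  sku                 = "PerGB2018"
}

resource "azurerm_application_insights" "appi" {
  name                = "appi-telemetry-demo"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  workspace_id        = azurerm_log_analytics_workspace.law.id
  application_type    = "web"

  # Disable local (instrumentation-key-only) authentication so that only
  # Azure AD identities can publish telemetry
  local_authentication_disabled = true
}

resource "azurerm_linux_web_app" "app" {
  name                = "app-telemetry-demo"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  service_plan_id     = azurerm_service_plan.plan.id
  site_config {}

  identity {
    type = "SystemAssigned"
  }
}

# Allow the web app's managed identity to publish telemetry to App Insights
resource "azurerm_role_assignment" "metrics_publisher" {
  scope                = azurerm_application_insights.appi.id
  role_definition_name = "Monitoring Metrics Publisher"
  principal_id         = azurerm_linux_web_app.app.identity[0].principal_id
}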

There are a few key points to focus on. Firstly, the flag "local_authentication_disabled" must be set to "true" in the Application Insights configuration; this disables local authentication and enables the use of Azure AD for authentication. Secondly, the Azure role "Monitoring Metrics Publisher" is a prerequisite for communication between the telemetry publisher and the Application Insights workspace. This role must be assigned to the managed identity of the web app resource so that it can communicate with the Application Insights resource.

Focusing on these two points will ensure that the Terraform template is set up correctly and the web app is able to communicate with the Application Insights securely using Azure AD authentication.

Now that the Terraform template for configuring Azure AD integration has been discussed, it's time to focus on verifying the setup. The easiest way to do this is to write a sample web API code and deploy it to the Azure Web App resource that was provisioned in the previous step. This will allow us to see if the telemetry data starts flowing to the Application Insights Workspace. 

For this post, a .NET 6 web API project with VS 2022 will be created with minimal code that configures the connectivity between the web app resource and the Application Insights. This project will be deployed to the web app and the telemetry data will be monitored in the Application Insights Workspace to confirm that the integration has been set up correctly.

Here is how the Program.cs of the web API could look (values are hard-coded in it for brevity).
Also note that, in order to integrate AAD-based authentication in your source code, it is important to reference the correct SDK version; for that reason, you may need to install the Application Insights .NET SDK starting with version 2.18-Beta3.
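A minimal sketch of such a Program.cs (the connection string values are placeholders):

using Azure.Identity;
using Microsoft.ApplicationInsights.AspNetCore.Extensions;
using Microsoft.ApplicationInsights.Extensibility;

var builder = WebApplication.CreateBuilder(args);

// Authenticate telemetry ingestion with the web app's system-assigned managed identity
builder.Services.Configure<TelemetryConfiguration>(config =>
{
    config.SetAzureTokenCredential(new ManagedIdentityCredential());
});

// The connection string still identifies the target App Insights resource
builder.Services.AddApplicationInsightsTelemetry(new ApplicationInsightsServiceOptions
{
    ConnectionString = "InstrumentationKey=<instrumentation-key>;IngestionEndpoint=https://<region>.in.applicationinsights.azure.com/"
});

builder.Services.AddControllers();

var app = builder.Build();
app.MapControllers();
app.Run();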

There are two important points to note in the sample code above. Firstly, the "ManagedIdentityCredential" provider is used to perform authentication with the managed identity, allowing the web API to communicate with the Application Insights workspace securely using Azure AD authentication. Secondly, the connection string still contains the instrumentation key and ingestion endpoint.

At this point, it may seem counterintuitive that the instrumentation key is still being used when the goal was to avoid specifying it. However, the instrumentation key is still required to configure the connection between the web API and the Application Insights workspace: it acts as an identifier for the workspace and allows the SDK, together with the "ManagedIdentityCredential" provider, to reach the correct resource.

It is important to note that, since local authentication is disabled in the Application Insights, only Azure AD objects such as managed identities can successfully authenticate to it and the "Monitoring Metrics Publisher" role must be granted to the managed identity in order to allow it to communicate with the Application Insights Workspace.

With this setup in place, you should be ready to start seeing telemetry data from your application in the Application Insights Workspace.

In summary, when local authentication is disabled in the Application Insights workspace, it is essential to use the "Monitoring Metrics Publisher" role in addition to the instrumentation key in order to publish telemetry data. By following this setup, you can ensure that your telemetry data is securely sent to the correct Application Insights workspace, while taking advantage of the enhanced security and ease of use provided by Azure AD authentication and managed identities.