Terraform in Azure Cloud Shell — a few ways of authentication and how they work

Vitaly Belkin
11 min read · Mar 21, 2024
Photo by Erik-Jan Leusink on Unsplash

I’ve recently started dabbling in Terraform, and it just so happens that I have also discovered Azure Cloud Shell, a handy tool you can use straight from your web browser. Together they make it easy to deploy resource groups and resources straight into your subscription (or even create whole subscriptions, given the right permissions), and I’m sure there’s more that I haven’t found yet. And you do it all without the hassle of any additional authentication: Terraform is good to go out of the box as soon as you launch the shell in your browser window!

But how does that work behind the scenes? How does it know who to authenticate as? Are there any limitations? I took a stab at the whole setup, read a bunch of docs and here’s my watered-down take of Terraform authentication in Azure Cloud Shell ¯\_(ツ)_/¯

Part 1: Figuring out how the bicycle works

The default way to authenticate Terraform in Azure Cloud Shell is via the az CLI. You can check your current account and subscription with ‘az account show’. Azure PowerShell is also available (if you’re using the PowerShell terminal rather than Bash), but whatever account that module is signed in with doesn’t matter for now, since Terraform does not touch it.

Now, at what point does Terraform go off and authenticate? Not straight away. ‘terraform init’ only gets your working directory ready. So, it must be doing it when you run ‘terraform plan’, then? Nah, not always. Let’s set TF_LOG_PROVIDER to DEBUG (since TRACE gives us way too much useless info and TF_LOG_CORE does not give us the stuff we need; we can’t set both at once, so use TF_LOG if you want both) and route the logs to a file with TF_LOG_PATH.
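If you’d rather script this than export the variables by hand, here is a minimal Python sketch of the same setup. TF_LOG_PROVIDER and TF_LOG_PATH are the variables mentioned above; the helper names and the terraform.log file name are just my placeholders.

```python
import os
import subprocess


def terraform_log_env(level="DEBUG", log_path="terraform.log", base_env=None):
    """Return a copy of the environment with provider logging switched on."""
    env = dict(os.environ if base_env is None else base_env)
    env["TF_LOG_PROVIDER"] = level  # provider-side logs only
    env["TF_LOG_PATH"] = log_path   # write the logs to a file
    return env


def run_terraform(*args, env=None):
    """Run a Terraform subcommand, e.g. run_terraform("plan")."""
    return subprocess.run(["terraform", *args], env=env, check=False)
```

Something like run_terraform("plan", env=terraform_log_env()) would then leave the provider logs in terraform.log in the working directory.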

Let’s put together a small and simple main.tf file:

terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~>3.0"
    }
    random = {
      source  = "hashicorp/random"
      version = "~>3.0"
    }
  }
}

provider "azurerm" {
  features {}
}

data "azurerm_resource_group" "vt-rg" {
  name = "vt-resource-group"
}

This data block assumes that there is a resource group “vt-resource-group” under the default subscription of the account currently logged in with az CLI. So, when you run ‘terraform plan’, Terraform tries to confirm via a network API that the object exists, and it should match an object in your Azure infrastructure (there are local-only data sources, but we’re not looking at those today).

Okay, let us run ‘terraform init’ and then ‘terraform plan’. The output after the second command should be something like this:

Cool, it found the resource group and confirmed that the main.tf config is already matching what we have in the cloud. But how did it do that? Let’s see some logs:

[DEBUG] provider.terraform-provider-azurerm_v3.96.0_x5: az-cli invocation: az version -o=json: timestamp=2024-03-19T22:08:49.996Z
[DEBUG] provider.terraform-provider-azurerm_v3.96.0_x5: az-cli invocation: az account show -o=json: timestamp=2024-03-19T22:08:50.621Z
[DEBUG] provider.terraform-provider-azurerm_v3.96.0_x5: az-cli invocation: az account get-access-token --scope https://graph.microsoft.com/.default -o=json: timestamp=2024-03-19T22:08:51.398Z
[DEBUG] provider.terraform-provider-azurerm_v3.96.0_x5: Using default subscription ID from Azure CLI: "[REDACTED]": timestamp=2024-03-19T22:08:52.480Z
[DEBUG] provider.terraform-provider-azurerm_v3.96.0_x5: Using client ID from Azure CLI: "[REDACTED]": timestamp=2024-03-19T22:08:52.480Z
[DEBUG] provider.terraform-provider-azurerm_v3.96.0_x5: Generated Provider Correlation Request Id: [REDACTED]: timestamp=2024-03-19T22:08:52.480Z
[DEBUG] provider.terraform-provider-azurerm_v3.96.0_x5: az-cli invocation: az account get-access-token --scope https://management.azure.com/.default -o=json: timestamp=2024-03-19T22:08:52.840Z

Yeah, so it does invoke az CLI and gets an access token to check whether the currently selected subscription has the resource group we mentioned in the config.
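To make the log lines above a bit more tangible, here is a rough Python sketch of the same az CLI calls. It is not the provider’s actual code, just my approximation of what the log shows; the function names are made up, and the "id" and "accessToken" fields are the ones az prints in its JSON output.

```python
import json
import subprocess


def az_json(args, run=subprocess.run):
    """Run an az CLI command with JSON output and return the parsed result."""
    completed = run(
        ["az", *args, "-o", "json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(completed.stdout)


def default_subscription_id(run=subprocess.run):
    # "az account show" reports the currently selected subscription in "id".
    return az_json(["account", "show"], run=run)["id"]


def access_token(scope, run=subprocess.run):
    # Same command as in the log, e.g. scope="https://management.azure.com/.default".
    result = az_json(["account", "get-access-token", "--scope", scope], run=run)
    return result["accessToken"]
```

The run parameter is only there so the functions can be tried without a real az binary; in the Cloud Shell you would call them with the default.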

Naturally, if we use az CLI to switch to a subscription we have limited (or no) access to and run ‘terraform plan’, we get an error:

Okay, fair enough, we lack the permissions under the newly set subscription for Terraform to check if the resource group is really there. What if we keep the “faulty” subscription but remove the data block from the config?

Huh? It compared against our real infra? But we still lack the permissions, as the subscription is unchanged. Let’s look at the logs again:

[INFO]  provider: configuring client automatic mTLS
[DEBUG] provider: starting plugin: path=.terraform/providers/registry.terraform.io/hashicorp/azurerm/3.96.0/linux_amd64/terraform-provider-azurerm_v3.96.0_x5 args=[".terraform/providers/registry.terraform.io/hashicorp/azurerm/3.96.0/linux_amd64/terraform-provider-azurerm_v3.96.0_x5"]
[DEBUG] provider: plugin started: path=.terraform/providers/registry.terraform.io/hashicorp/azurerm/3.96.0/linux_amd64/terraform-provider-azurerm_v3.96.0_x5 pid=14255
[DEBUG] provider: waiting for RPC address: path=.terraform/providers/registry.terraform.io/hashicorp/azurerm/3.96.0/linux_amd64/terraform-provider-azurerm_v3.96.0_x5
[INFO] provider.terraform-provider-azurerm_v3.96.0_x5: configuring server automatic mTLS: timestamp=2024-03-19T23:58:46.489Z
[DEBUG] provider: using plugin: version=5
[DEBUG] provider.terraform-provider-azurerm_v3.96.0_x5: plugin address: address=/tmp/plugin724767084 network=unix timestamp=2024-03-19T23:58:46.903Z
[DEBUG] provider.stdio: received EOF, stopping recv loop: err="rpc error: code = Unavailable desc = error reading from server: EOF"
[DEBUG] provider: plugin process exited: path=.terraform/providers/registry.terraform.io/hashicorp/azurerm/3.96.0/linux_amd64/terraform-provider-azurerm_v3.96.0_x5 pid=14255
[DEBUG] provider: plugin exited
[INFO] provider: configuring client automatic mTLS
[DEBUG] provider: starting plugin: path=.terraform/providers/registry.terraform.io/hashicorp/random/3.6.0/linux_amd64/terraform-provider-random_v3.6.0_x5 args=[".terraform/providers/registry.terraform.io/hashicorp/random/3.6.0/linux_amd64/terraform-provider-random_v3.6.0_x5"]
[DEBUG] provider: plugin started: path=.terraform/providers/registry.terraform.io/hashicorp/random/3.6.0/linux_amd64/terraform-provider-random_v3.6.0_x5 pid=14263
[DEBUG] provider: waiting for RPC address: path=.terraform/providers/registry.terraform.io/hashicorp/random/3.6.0/linux_amd64/terraform-provider-random_v3.6.0_x5
[INFO] provider.terraform-provider-random_v3.6.0_x5: configuring server automatic mTLS: timestamp=2024-03-19T23:58:47.525Z
[DEBUG] provider: using plugin: version=5
[DEBUG] provider.terraform-provider-random_v3.6.0_x5: plugin address: address=/tmp/plugin1893390677 network=unix timestamp=2024-03-19T23:58:47.555Z
[DEBUG] provider.stdio: received EOF, stopping recv loop: err="rpc error: code = Unavailable desc = error reading from server: EOF"
[DEBUG] provider: plugin process exited: path=.terraform/providers/registry.terraform.io/hashicorp/random/3.6.0/linux_amd64/terraform-provider-random_v3.6.0_x5 pid=14263
[DEBUG] provider: plugin exited

That’s the whole log file. And this time it didn’t even use az CLI to get in touch with Azure! Maybe there are some other logs that we’re missing, and it does actually authenticate? For the sake of argument, let’s log out of Azure CLI and see what output we get then:

Hmm, what “infrastructure”? No account to use, but still, it tells us that we’re good to go.

Okay, I know, in this instance it is just a matter of proper command-line communication between the tool and the user, and in reality, there’s nothing really wrong with the output, although it may be just a wee bit misleading in my opinion. If I am interpreting it incorrectly, please, let me know :)

So anyway, we figured out that, out of the box, Terraform resorts to az CLI authentication when used in Azure Cloud Shell. Alternatively, we could supply a Service Principal (which is a risk by itself, as you essentially create a redundant object with permissions) and chuck in credentials either in the .tf config itself or in the environment. But there’s also a third way that lets us skip az CLI authentication altogether: it requires a managed identity set up for Terraform to use. But we don’t have one set up on Azure Cloud Shell… Do we?

Part 2: Slapping a second handlebar onto the bicycle

This blog post gave me an interesting idea. Given its age, I am not sure how much of it is still relevant, but if my train of thought is not wrong, Azure Cloud Shell is essentially a VM; it even has attached storage. As said in the post, a managed identity endpoint is present in the container. Let’s check it:

Okay, we get an access token without executing ‘az account get-access-token’. But who’s it for?
Now, let’s forget about az CLI for the moment, as what follows is outside its scope. You can even ‘az logout’ so that it doesn’t confuse you any more than it should. Obviously, the endpoint is tied to the Azure Cloud Shell instance, and no matter what account the az CLI tool is under (or how many times you decide to az login), the token returned will be for the same Managed Service Identity that was assigned to the Azure Cloud Shell instance. You can even see some of its traces if you run ‘Get-AzContext’ the first time you log in to the Cloud Shell:

This is the account that’s set up there initially. Therefore, the token we receive from localhost:50342 is for that Managed Service Identity which, essentially, is you.
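One way to double-check who a token is for is to peek at its claims. The sketch below assumes the access token is a JWT (three dot-separated base64url parts), which should hold for Azure Resource Manager tokens; it only decodes the payload and does not verify the signature. The function name is mine.

```python
import base64
import json


def jwt_claims(token):
    """Decode the payload of a JWT without verifying its signature."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore the stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))
```

Printing jwt_claims(token) for the token from the endpoint shows, among other things, the aud claim (the audience the token was issued for); which identity-related claims you get depends on the token.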
Another interesting thing I’ve noticed: in the blog post, Edwin mentions that a real Azure VM has an instance metadata service (IMDS) endpoint enabled at 169.254.169.254. If you curl that inside a Cloud Shell, you’ll see that it also works, and it gives you exactly the same token as the initial endpoint we probed. Maybe there have been some changes behind the scenes over the last few years, and Cloud Shell is now much closer to an actual Azure VM instance?
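If you want to compare the two endpoints yourself, a small Python sketch could look like this. The localhost URL and the api-version are copied from the provider’s debug log further down in this post, and the 169.254.169.254 path is the standard IMDS token path; sending the Metadata: true header to the Cloud Shell endpoint as well is my assumption, and the names are made up.

```python
import json
import urllib.parse
import urllib.request

CLOUD_SHELL_ENDPOINT = "http://localhost:50342/metadata/identity/oauth2/token"
IMDS_ENDPOINT = "http://169.254.169.254/metadata/identity/oauth2/token"


def token_request(endpoint, resource, api_version="2018-02-01"):
    """Build (but do not send) a managed identity token request."""
    query = urllib.parse.urlencode({"api-version": api_version, "resource": resource})
    # IMDS rejects requests that lack the "Metadata: true" header.
    return urllib.request.Request(f"{endpoint}?{query}", headers={"Metadata": "true"})


def fetch_token(endpoint, resource, timeout=5):
    """Send the request and return the access_token field of the JSON reply."""
    with urllib.request.urlopen(token_request(endpoint, resource), timeout=timeout) as resp:
        return json.load(resp)["access_token"]
```

Calling fetch_token with CLOUD_SHELL_ENDPOINT and then with IMDS_ENDPOINT for the same resource, e.g. "https://management.azure.com", lets you compare what the two endpoints hand out.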

So anyway, we now know that we have an MSI, and Terraform can be used with MSI. We’ll need to set a few more environment variables so that Terraform understands what we want it to do. ARM_USE_MSI should be set to true, and we also need to point Terraform to the metadata endpoint: we can either give it the MSI_ENDPOINT variable or manually set ARM_MSI_ENDPOINT to the same value.
Moving on, we already know that without any data blocks Terraform only tells us it checks our infrastructure but doesn’t actually do it, so we need to put our data block back in the main.tf file (at this point it should look exactly the same as in the beginning). Let’s also add the line use_msi = true to our azurerm provider block, otherwise Terraform won’t bother checking the environment variables for any additional configuration (alternatively, you could set the endpoint in the config too).
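The environment side of this can be sketched in a few lines of Python. ARM_USE_MSI and ARM_MSI_ENDPOINT are the variables named above; that MSI_ENDPOINT is already present in the shell’s environment is an assumption here, and the helper name is mine.

```python
import os


def msi_env(base_env=None):
    """Return an environment that tells the azurerm provider to use MSI."""
    env = dict(os.environ if base_env is None else base_env)
    env["ARM_USE_MSI"] = "true"
    # Reuse the endpoint advertised via MSI_ENDPOINT, if there is one.
    if "MSI_ENDPOINT" in env:
        env.setdefault("ARM_MSI_ENDPOINT", env["MSI_ENDPOINT"])
    return env
```

The resulting dictionary can be passed as env= to subprocess.run when launching terraform, which keeps your interactive shell untouched.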

Let’s try terraform plan again:

So, now it needs us to explicitly specify the subscription ID. Remember, when we were using az CLI it automatically fetched our default subscription, so it must be doing something differently now. If we look at the log file:

[DEBUG] provider.terraform-provider-azurerm_v3.96.0_x5: Performing GET Request to "http://localhost:50342/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fgraph.microsoft.com": timestamp=2024-03-20T22:03:35.441Z
[DEBUG] provider.terraform-provider-azurerm_v3.96.0_x5: GET http://localhost:50342/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fgraph.microsoft.com: timestamp=2024-03-20T22:03:35.441Z
[DEBUG] provider.terraform-provider-azurerm_v3.96.0_x5: Reading Body from GET "http://localhost:50342/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fgraph.microsoft.com": timestamp=2024-03-20T22:03:35.492Z
[ERROR] provider.terraform-provider-azurerm_v3.96.0_x5: Response contains error diagnostic: tf_rpc=Configure @caller=github.com/hashicorp/terraform-plugin-go@v0.19.0/tfprotov5/internal/diag/diagnostics.go:58 diagnostic_detail="" tf_proto_version=5.4 diagnostic_summary="building account: unable to configure ResourceManagerAccount: subscription ID could not be determined and was not specified" tf_provider_addr=provider tf_req_id=51964990-93d2-590f-8602-7aad44a253e6 @module=sdk.proto diagnostic_severity=ERROR timestamp=2024-03-20T22:03:35.493Z

Right, so this time it does go to the IMDS endpoint and gets a token. But because az CLI is not used, it does not know which subscription to look in for the resource group. Okay, then we specify the subscription ID (in the config or in the environment, it doesn’t matter) and run terraform plan one more time:

And the logs:

[DEBUG] provider.terraform-provider-azurerm_v3.96.0_x5: Performing GET Request to "http://localhost:50342/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fgraph.microsoft.com": timestamp=2024-03-20T22:23:04.991Z
[DEBUG] provider.terraform-provider-azurerm_v3.96.0_x5: GET http://localhost:50342/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fgraph.microsoft.com: timestamp=2024-03-20T22:23:04.991Z
[DEBUG] provider.terraform-provider-azurerm_v3.96.0_x5: Reading Body from GET "http://localhost:50342/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fgraph.microsoft.com": timestamp=2024-03-20T22:23:05.035Z
[DEBUG] provider.terraform-provider-azurerm_v3.96.0_x5: Generated Provider Correlation Request Id: [REDACTED]: timestamp=2024-03-20T22:23:05.035Z
[DEBUG] provider.terraform-provider-azurerm_v3.96.0_x5: Performing GET Request to "http://localhost:50342/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fmanagement.azure.com": timestamp=2024-03-20T22:23:05.360Z
[DEBUG] provider.terraform-provider-azurerm_v3.96.0_x5: GET http://localhost:50342/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fmanagement.azure.com: timestamp=2024-03-20T22:23:05.360Z
[DEBUG] provider.terraform-provider-azurerm_v3.96.0_x5: Reading Body from GET "http://localhost:50342/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fmanagement.azure.com": timestamp=2024-03-20T22:23:05.396Z
[DEBUG] provider.terraform-provider-azurerm_v3.96.0_x5: AzureRM Request: GET /subscriptions/[REDACTED]/providers?api-version=2022-09-01 HTTP/1.1

You may notice that it first requests a token for the Graph API before requesting another one for Azure Resource Manager. To be honest, I am not quite sure why it does that. My hunch is that it has something to do with Microsoft’s OAuth implementation (try curling the IMDS endpoint without specifying the resource parameter to see what I mean), but don’t quote me on that. If you manually run the curl command with the resource parameter set to “graph.microsoft.com”, it still returns an access token, so it’s not an authorization workflow either. IMDS is not intended to be used behind a proxy, so I doubt Burp would work here, and I can’t really envision the implementation, but you can try it and let me know :)
The laziest thing I could think of is spinning up a Python server locally in the Cloud Shell and pointing Terraform to grab a token from there:

import http.server
import json
import logging
import socketserver

PORT = 8910
LOG_FILE = 'server.log'

# Paste the JSON response from the IMDS endpoint here.
data = {<response_from_IMDS_endpoint_goes_in_here>}

logging.basicConfig(filename=LOG_FILE, level=logging.INFO)
logger = logging.getLogger(__name__)


class Handler(http.server.SimpleHTTPRequestHandler):
    def do_GET(self):
        # Log what Terraform asks for, then answer every request
        # with the same canned token response.
        logger.info(f"Request path: {self.path}")
        logger.info(f"Request headers:\n{self.headers}")

        self.send_response(200)
        self.send_header('Content-type', 'application/json')
        self.end_headers()
        self.wfile.write(json.dumps(data).encode())


with socketserver.TCPServer(("", PORT), Handler) as httpd:
    print("Serving at port", PORT)
    httpd.serve_forever()

I didn’t even bother parsing the parameters here. You could go a bit further and try proxying these requests to the actual IMDS endpoint and intercepting the responses to see what you get, but I’ll leave that as an exercise for the reader.
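For what it’s worth, pulling the requested resource out of the path would only take a few lines. This is a side sketch, not something the server above does; the function name is made up.

```python
from urllib.parse import parse_qs, urlsplit


def requested_resource(path):
    """Return the 'resource' query parameter of a request path, or None."""
    values = parse_qs(urlsplit(path).query).get("resource")
    return values[0] if values else None
```

Inside do_GET you could log requested_resource(self.path) to see at a glance which resource each token request is for.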
Naturally, with the static server from above in place, ‘terraform plan’ fails because the second request returns the same token as the first, and the Azure REST API does not like it when Terraform sends its GET request:

[DEBUG] provider.terraform-provider-azurerm_v3.96.0_x5: GET https://management.azure.com/subscriptions/[REDACTED]/providers?api-version=2022-09-01: timestamp=2024-03-21T00:37:47.090Z
[DEBUG] provider.terraform-provider-azurerm_v3.96.0_x5: AzureRM Response for https://management.azure.com/subscriptions/[REDACTED]/providers?api-version=2022-09-01: HTTP/2.0 401 Unauthorized

Logs from our little dumb Python server don’t give us much insight either:

Wrapping up…

I’m not exactly sure I’d be able to say what the point of this blog post was. What did we learn? Well, we scratched the surface of Terraform authentication in Azure Cloud Shell, and we figured out there are a few ways it can reach Microsoft’s services: via az CLI, which handles all authentication through the signed-in account, and via the IMDS endpoint, which essentially serves as a system-assigned managed identity that also happens to be you.
Yeah, we could also go ahead and look at Service Principal authentication, but there are a few ways of doing it, and it might be better to leave that for a separate writeup. From a security standpoint, it does make sense to implement an SP given the right permissions and scope, although we’re still left with the issue of client ID and secret exposure at the very least. However, do not forget that whatever identity you have in your Azure Cloud Shell has your own permissions (and it is actually kind of a Service Principal too).

I’d be glad to hear your thoughts and comments, and maybe there are things I’ve missed or misinterpreted completely — let me know anyways!

P.S.

Try running Connect-AzAccount -identity -AccountId <account name> inside the Cloud Shell, and by account name I mean any account name. If you’re curious about what’s happening behind the scenes, PowerShell has a debugging mode :)
