Permissions Required For Accessing Databricks Workspace with Data Factory Managed Identity
Hello!
Let’s not mince words: managing secrets via Key Vault is useful, but they can be painful to refresh, especially if you’re unable to automate the process of refreshing and updating secret values. Because of this, I like to use Managed Identities where I can. A Managed Identity is an entry in the Enterprise Applications registry in Azure Active Directory that is linked to a trusted Azure resource. This resource could be a VM or, in the case that I am writing about today, a Data Factory. The Data Factory is then granted access to another resource (in this case a Databricks Workspace), and the process of managing the identity is done in the background: no need to create and look after a service principal or an App Registration entry in Azure Active Directory yourself. This means that we do not need to share client IDs and secrets. If you would like to read more on Managed Identities and authentication/authorisation in Azure then go and have a read of this article.
Now, the potentially niggly bit here is granting one resource access to another. The ability to assign roles is not included in the built-in Azure RBAC role of Contributor, which is the one typically granted to users. In the announcement from Microsoft last November, the “how to” demonstrated that for a Data Factory Managed Identity to have access to the Databricks Workspace, you need to assign the Data Factory the Contributor role on the Workspace. Unfortunately, I did not have the means to assign this role.
I really wanted to use a Managed Identity, because then I don’t have to deal with managing Key Vault secrets in a deployment pipeline. Not that that is a bad thing, but I’m just done with having to deal with secret management when there is a better way.
Amazingly, lurking in the comments of the “how to” is a comment stating that you can use the SCIM API to grant the Managed Identity access to the Workspace without having to use the RBAC role. So by using Terraform I can add the Data Factory Managed Identity to the Workspace as an admin. All I’d need is the application ID of the Managed Identity from the Enterprise Applications registry. Neat!
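I did this with Terraform, but to show what the SCIM call underneath looks like, here is a rough Python sketch that only builds the request (it does not send it). Treat the details as assumptions to check against the current Databricks docs: the endpoint path is the preview SCIM ServicePrincipals path as I understand it, the function name and its parameters are made up for illustration, and the admin group ID is something you would look up in your own Workspace first.

```python
import json
import urllib.request


def build_scim_request(workspace_url, token, application_id, display_name, admin_group_id):
    """Build (but do not send) a SCIM request that registers a Managed Identity
    as a service principal in a Databricks Workspace and adds it to a group.

    Hypothetical helper: all names here are illustrative, not from the original setup.
    """
    # Assumed endpoint: the preview SCIM ServicePrincipals API.
    url = workspace_url.rstrip("/") + "/api/2.0/preview/scim/v2/ServicePrincipals"

    payload = {
        "schemas": ["urn:ietf:params:scim:schemas:core:2.0:ServicePrincipal"],
        # Application ID of the Data Factory Managed Identity,
        # taken from the Enterprise Applications registry.
        "applicationId": application_id,
        "displayName": display_name,
        # Group membership; pass the ID of the Workspace's admins group
        # to make the identity a Workspace admin.
        "groups": [{"value": admin_group_id}],
    }

    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": "application/scim+json",
        },
        method="POST",
    )
```

Someone who is already a Workspace admin would then send it with `urllib.request.urlopen(req)`, using their own token. The point is that this happens inside the Workspace, so no Azure RBAC role assignment is involved.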
So now my Data Factory can connect to both a Storage Account and a Databricks Workspace using its Managed Identity, and I don’t have to deal with secrets expiring when I’m deploying my application.