Back when I first started working with Azure Databricks I used bearer tokens to authenticate, both on my dev machine and in Azure DevOps pipelines. It wasn’t especially challenging: add a masked variable containing a never-expiring bearer token to a pipeline and pass it in. Over the past 12 months or so, however, things have matured to the point where we really should use premium workspaces, because on standard-tier workspaces everyone is an admin and that is the only permission level. So if you want to do things properly and secure your workspace, you need to pay a premium, literally.
Anyway, the point about premium workspaces and bearer tokens is that the bearer token being used would have to come from an admin account. That in itself is fine: there is no real issue with using an admin account to deploy, and that holds for most types of deployment irrespective of technology. The better approach is a Service Principal: you create a Service Connection in Azure DevOps with the credentials of the SP, and then use one of the tasks that consume that Service Connection to connect to Azure.
Now I say “one of those tasks” because initially I thought of using an Azure PowerShell task. However, because I need access to the application id, secret and tenant id to authenticate (more on this later), what I really wanted was to avoid configuring extra parameters for authentication: I didn’t want to have to use masked variables in my pipeline. I hoped I would be able to access these values from the Azure context, as the PowerShell script would already be connected to Azure through the Service Connection. That makes the setup far more convenient than the bearer token method.
Sadly, these values cannot be accessed from an Azure PowerShell task. But they can with the Azure CLI task. If you set addSpnToEnvironment to true, this “adds service principal id and key of the Azure endpoint you chose to the script’s execution environment. You can use these variables: $env:servicePrincipalId, $env:servicePrincipalKey and $env:tenantId in your script.” Handy!
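As a quick sanity check, an inline script in the task can confirm the variables are present. A minimal sketch; it prints only whether each value exists, never the values themselves:

```powershell
# Runs inside an Azure CLI task with addSpnToEnvironment set to true.
# Print presence only - never echo the secret itself into the pipeline log.
Write-Host "servicePrincipalId set:  $([bool]$env:servicePrincipalId)"
Write-Host "servicePrincipalKey set: $([bool]$env:servicePrincipalKey)"
Write-Host "tenantId set:            $([bool]$env:tenantId)"
```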
Using azure.databricks.cicd.tools we can pass everything we need to authenticate, except the DatabricksOrgId, straight from those environment variables.
```yaml
- task: AzureCLI@2
  inputs:
    azureSubscription: 'bzzzt Dev Service Principal'
    scriptType: pscore
    scriptLocation: inlineScript
    addSpnToEnvironment: true
    inlineScript: |
      Install-Module azure.databricks.cicd.tools -Force -Scope CurrentUser
      Connect-Databricks -ApplicationId $env:servicePrincipalId -Secret $env:servicePrincipalKey -TenantId $env:tenantId -DatabricksOrgId 1234567 -Region NorthEurope -Verbose
      Get-DatabricksSparkVersions
```
But how does it authenticate in the first place? The Service Principal authentication uses the application id and secret of the SP to authenticate with Azure Active Directory. The response includes an access token, which can then be used to authenticate with the Databricks APIs (think of the Databricks API as a collection of APIs: one for jobs, one for clusters, notebooks, secrets and so on).
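Under the hood, that first step is a client credentials request to Azure Active Directory. A minimal sketch, assuming the well-known Azure Databricks resource id (2ff814a6-3304-4ab8-85cb-cd0e6f879c1d) and the environment variables that addSpnToEnvironment provides:

```powershell
# Hedged sketch: acquire an AAD access token for the Azure Databricks resource
# using the client credentials grant. This is roughly what the tooling does for us.
$databricksResourceId = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"  # well-known Databricks resource id

$body = @{
    grant_type    = "client_credentials"
    client_id     = $env:servicePrincipalId
    client_secret = $env:servicePrincipalKey
    resource      = $databricksResourceId
}

$response = Invoke-RestMethod -Method Post `
    -Uri "https://login.microsoftonline.com/$($env:tenantId)/oauth2/token" `
    -Body $body

$aadToken = $response.access_token   # used as "Authorization: Bearer ..." against the Databricks APIs
```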
However, this access token cannot be used with the Secrets API; only a bearer token will work. There is no explanation of why this is the case, but perhaps because it is an Azure Databricks-backed secret scope, authentication must take place with a bearer token generated by/on the workspace itself. So what you have to do is use the access token to authenticate with the Databricks Token API, which creates a bearer token for the SP that can then be used to authenticate with the Secrets API.
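That exchange can be sketched as a call to the Token API’s token/create endpoint. The workspace URL and token lifetime here are assumptions, and $aadToken is the AAD access token from the previous step:

```powershell
# Hedged sketch: exchange the AAD access token for a workspace-generated bearer token.
$workspaceUrl = "https://northeurope.azuredatabricks.net"   # assumption: your workspace's regional URL

$body = @{
    lifetime_seconds = 3600                       # assumption: one-hour token
    comment          = "SP token for Secrets API"
} | ConvertTo-Json

$response = Invoke-RestMethod -Method Post `
    -Uri "$workspaceUrl/api/2.0/token/create" `
    -Headers @{ Authorization = "Bearer $aadToken" } `
    -ContentType "application/json" `
    -Body $body

$bearerToken = $response.token_value   # use this to authenticate against /api/2.0/secrets/*
```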