Managing Secrets in Azure Databricks
Written by: Chris Sutcliffe
Connecting to external data sources almost always requires authentication. Since hardcoding credentials in your code is bad practice, in this blog we discuss how to secure your credentials, control who can access them, and reference them in your Databricks notebooks.
Databricks offers two types of secret scope: Databricks-backed and Azure Key Vault-backed.
In this blog, we'll begin with the Azure Key Vault-backed scope.
Step 1: Azure Key Vault
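If you already have a key vault with your secrets in it, you can skip straight to Step 2; just note its DNS name and Resource ID first, as described at the end of this step. If not, follow the steps below: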
Log in to the Azure Portal, select Create a resource, then search for Key Vault and click Create.
Select your subscription and resource group, then enter a globally unique Key vault name. Finally, choose your region and pricing tier.
Click Review + create and if it passes validation, click Create.
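The vault name becomes part of the vault's DNS name, which is why it must be globally unique. As we understand Azure's naming rules at the time of writing, a name must be 3 to 24 characters of letters, digits and hyphens, start with a letter, end with a letter or digit, and contain no consecutive hyphens. The minimal Python sketch below checks a candidate against those assumed rules before you submit the form; is_valid_key_vault_name is a hypothetical helper, and the portal's own validation remains the authority.

```python
import re

# Assumed Key Vault naming rules (verify against current Azure docs):
# 3-24 characters, letters/digits/hyphens, must start with a letter,
# must end with a letter or digit, and no consecutive hyphens.
_VAULT_NAME = re.compile(r"[A-Za-z][A-Za-z0-9-]{1,22}[A-Za-z0-9]")


def is_valid_key_vault_name(name: str) -> bool:
    """Return True if `name` satisfies the assumed Key Vault naming rules."""
    return _VAULT_NAME.fullmatch(name) is not None and "--" not in name


if __name__ == "__main__":
    for candidate in ["my-keyvault-01", "1vault", "kv", "my--vault", "vault-"]:
        print(candidate, is_valid_key_vault_name(candidate))
```

Only the first candidate passes: the others start with a digit, are too short, contain a double hyphen, or end with a hyphen.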
The next step is to create secrets for your data source's credentials, one for the username and one for the password. In your new Key Vault, select Secrets and choose Generate/Import.
Enter a name and value, e.g. Name: username, Value: db_reader, then click Create. Repeat this step for the password.
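Secret names are also restricted. Our understanding is that Key Vault secret names may contain only letters, digits and hyphens, up to 127 characters, so a name like db_password would be rejected while db-password is fine. The Python sketch below encodes that assumed rule; is_valid_secret_name is a hypothetical helper, and the name you choose here is the key you will later use in your notebook.

```python
import re

# Assumed Key Vault secret naming rule: 1-127 letters, digits or hyphens.
_SECRET_NAME = re.compile(r"[A-Za-z0-9-]{1,127}")


def is_valid_secret_name(name: str) -> bool:
    """Return True if `name` satisfies the assumed secret naming rule."""
    return _SECRET_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["username", "db-password", "db_password", ""]:
        print(repr(candidate), is_valid_secret_name(candidate))
```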
Next, you need your key vault's DNS name and Resource ID. Choose Properties, copy both values and paste them into Notepad for use in the next step.
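Both values follow predictable formats in the Azure public cloud: the DNS name looks like https://<vault-name>.vault.azure.net/ and the Resource ID like /subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.KeyVault/vaults/<vault-name>. The Python sketch below builds both from their parts, which can help sanity-check what you pasted. The helper names and placeholder IDs are hypothetical, and sovereign clouds use a different DNS suffix.

```python
def key_vault_dns_name(vault_name: str) -> str:
    """DNS name (vault URI) of a Key Vault in the Azure public cloud."""
    return f"https://{vault_name}.vault.azure.net/"


def key_vault_resource_id(subscription_id: str, resource_group: str, vault_name: str) -> str:
    """Azure Resource Manager ID of a Key Vault."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.KeyVault/vaults/{vault_name}"
    )


if __name__ == "__main__":
    # Placeholder values; substitute your own subscription, resource group and vault.
    sub = "00000000-0000-0000-0000-000000000000"
    print(key_vault_dns_name("my-keyvault-01"))
    print(key_vault_resource_id(sub, "my-resource-group", "my-keyvault-01"))
```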
Step 2: Create a Secret Scope
There is currently no link to the secret scope creation page in the Databricks UI, so you have to navigate to it by appending "#secrets/createScope" to your workspace URL, e.g. https://<region>.azuredatabricks.net/?o=XXXXXXXXXXXXXXXX#secrets/createScope
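If you bookmark this page or share it with colleagues, the URL can be derived from whatever workspace URL is in your browser. The Python sketch below drops any existing "#..." fragment and appends the createScope fragment; create_scope_url is a hypothetical helper and the workspace ID is a placeholder.

```python
from urllib.parse import urldefrag


def create_scope_url(workspace_url: str) -> str:
    """Return the secret scope creation page for a Databricks workspace URL."""
    base, _fragment = urldefrag(workspace_url)  # strip any existing "#..." part
    return base + "#secrets/createScope"


if __name__ == "__main__":
    print(create_scope_url("https://westeurope.azuredatabricks.net/?o=1234567890123456#workspace"))
```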
Type in a scope name and choose a Manage Principal from the drop-down list. This determines which users have the MANAGE permission on the scope, which allows them to read and write to it and to change its ACLs.
In the DNS Name and Resource ID fields, paste in the values you copied in Step 1, then click Create.
NOTE: It is currently not possible to manage a secret scope in the portal once it has been created; to manage it, you need to use the Databricks CLI. For setup instructions, see: Installing the Databricks CLI (coming soon).
Step 3: Managing your Secret Scope
Using the Databricks CLI, you can first list the scope's current principals and their permissions with the following command:
databricks secrets list-acls --scope <secret scope name>
This will output something like:
Principal                   Permission
--------------------------  ------------
username@domain.com         MANAGE
To add a principal, use the command below:
databricks secrets put-acl --scope <secret scope name> --principal <username@domain.com> --permission READ
You can then run the list command again to check that the new principal has been added and verify they have the correct access.
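If you manage access for several users, it can help to script the two commands in this step. The Python sketch below builds the same list-acls and put-acl invocations as argument lists for subprocess.run, and parses table output shaped like the example in this step to confirm a principal's permission. It assumes the Databricks CLI is installed and configured and that its table output keeps the Principal/Permission layout shown; all helper names are hypothetical.

```python
import subprocess


def list_acls_cmd(scope: str) -> list[str]:
    """Argument list for `databricks secrets list-acls`."""
    return ["databricks", "secrets", "list-acls", "--scope", scope]


def put_acl_cmd(scope: str, principal: str, permission: str) -> list[str]:
    """Argument list for `databricks secrets put-acl`."""
    if permission not in {"READ", "WRITE", "MANAGE"}:
        raise ValueError(f"unknown permission: {permission}")
    return ["databricks", "secrets", "put-acl", "--scope", scope,
            "--principal", principal, "--permission", permission]


def parse_acls(output: str) -> dict[str, str]:
    """Map principal -> permission from list-acls table output.

    Assumes a header row, a dashed separator row, then one
    "<principal> <permission>" row per ACL.
    """
    acls: dict[str, str] = {}
    in_rows = False
    for line in output.splitlines():
        text = line.strip()
        if not text:
            continue
        if text.startswith("-"):
            in_rows = True  # data rows follow the dashed separator
            continue
        if in_rows:
            parts = text.rsplit(None, 1)  # split off the last column only
            if len(parts) == 2:
                acls[parts[0]] = parts[1]
    return acls


def run(cmd: list[str]) -> str:
    """Run a CLI command and return stdout; raises CalledProcessError on failure."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


# Usage (requires a configured Databricks CLI; names are placeholders):
# run(put_acl_cmd("<secret scope name>", "user@domain.com", "READ"))
# acls = parse_acls(run(list_acls_cmd("<secret scope name>")))
# print(acls.get("user@domain.com"))
```

Passing arguments as a list rather than a single shell string avoids quoting problems with principal names.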
NOTE: The available permission levels are:
MANAGE - Allowed to change ACLs, and read and write to this secret scope.
WRITE - Allowed to read and write to this secret scope.
READ - Allowed to read this secret scope and list what secrets are available.
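These levels are cumulative: each includes everything granted by the levels below it. The short Python sketch below models that ordering with an IntEnum, so a check such as "does this principal have at least WRITE?" becomes a simple comparison. It is our own illustration of the rules listed above, not a Databricks API.

```python
from enum import IntEnum


class ScopePermission(IntEnum):
    """Secret scope permission levels, ordered from least to most privileged."""
    READ = 1    # read secrets and list which secrets are available
    WRITE = 2   # READ, plus write secrets to the scope
    MANAGE = 3  # WRITE, plus change the scope's ACLs


def can_read(level: ScopePermission) -> bool:
    return level >= ScopePermission.READ


def can_write(level: ScopePermission) -> bool:
    return level >= ScopePermission.WRITE


def can_manage_acls(level: ScopePermission) -> bool:
    return level >= ScopePermission.MANAGE


if __name__ == "__main__":
    for level in ScopePermission:
        print(level.name, can_read(level), can_write(level), can_manage_acls(level))
```

A permission string from the CLI output can be converted with ScopePermission["WRITE"].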
Step 4: Referencing the secret in your Databricks notebook
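In your Databricks notebook, you can then read the secrets stored in Azure Key Vault through the secret scope using the Databricks Utilities. For example, username = dbutils.secrets.get(scope = "<secret scope name>", key = "username") retrieves the username secret; do the same with key = "password" for the password. Just replace the scope and key names with your own.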
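A slightly fuller Python sketch of reading both secrets with dbutils.secrets.get is shown below. The calls are wrapped in a hypothetical get_credentials function that takes dbutils as a parameter, so the same code can be tried outside Databricks with the small FakeDbutils stand-in; in a notebook you would pass the built-in dbutils instead. The default key names username and password are assumptions matching the secrets created in Step 1.

```python
def get_credentials(dbutils, scope: str,
                    user_key: str = "username",
                    password_key: str = "password") -> tuple[str, str]:
    """Fetch a username/password pair from a Databricks secret scope.

    The default key names are assumptions matching the Key Vault secrets
    created in Step 1.
    """
    username = dbutils.secrets.get(scope=scope, key=user_key)
    password = dbutils.secrets.get(scope=scope, key=password_key)
    return username, password


class _FakeSecrets:
    """Stand-in for dbutils.secrets, backed by a plain dict (for local testing only)."""
    def __init__(self, store: dict):
        self._store = store

    def get(self, scope: str, key: str) -> str:
        return self._store[(scope, key)]


class FakeDbutils:
    """Minimal stand-in exposing only the .secrets.get method used above."""
    def __init__(self, store: dict):
        self.secrets = _FakeSecrets(store)


# In a Databricks notebook (scope name is a placeholder):
# username, password = get_credentials(dbutils, "<secret scope name>")
```

Databricks redacts secret values if you print them in notebook output, showing [REDACTED] instead, but they are still ordinary strings in your code, so avoid writing them to logs or files.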
These variables can then be used in your external data source connection strings!
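For example, assuming the external source is an Azure SQL Database read over JDBC, the credentials retrieved from the secret scope can be passed as reader options instead of being embedded in code. In the Python sketch below, jdbc_options is a hypothetical helper and the server, database and table names are placeholders; url, dbtable, user and password are Spark's standard JDBC options.

```python
def jdbc_options(server: str, database: str, table: str,
                 username: str, password: str) -> dict[str, str]:
    """Options for Spark's JDBC reader against a SQL Server / Azure SQL database."""
    return {
        "url": f"jdbc:sqlserver://{server}:1433;database={database}",
        "dbtable": table,
        "user": username,      # value read from the secret scope
        "password": password,  # value read from the secret scope
    }


# In a notebook, with username/password read via dbutils.secrets.get(...):
# opts = jdbc_options("<server>.database.windows.net", "<database>", "dbo.<table>",
#                     username, password)
# df = spark.read.format("jdbc").options(**opts).load()
```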
Summary
In this blog, we walked you through storing your credentials safely in Azure Key Vault, controlling access to them with a Databricks secret scope, and using them in your Databricks notebooks.
At BizOne, our consultants are experts in implementing solutions on Azure and Azure Databricks. If you're interested in hearing more about how we can help your organization, contact us below for a free demo!