
Databricks Mounts: How to Connect Azure Storage to Your Workspace

When you start working with real data in Azure Databricks, one of the first challenges you face is getting your data into the environment. Your data typically lives in Azure storage — a Data Lake or Blob Storage account — and your Databricks notebooks need a way to access it. Mounts are the classic solution to this problem. They create a shortcut inside Databricks that points to your external storage, making cloud storage feel like a local directory. This guide walks through everything from the underlying file system concepts to the full step-by-step setup, including creating the Azure resources, securing credentials, and mounting the storage.

What Is DBFS?

DBFS stands for Databricks File System. It is a distributed file system that comes built into every Azure Databricks workspace. When you interact with files in Databricks — whether through notebooks, magic commands like %fs, or the dbutils.fs utilities — you are working through DBFS.

DBFS serves as an abstraction layer. Behind the scenes, it maps to Azure Blob Storage that was automatically provisioned when your workspace was created. This default storage is sometimes called DBFS root storage. When you write a file to /tmp/output.csv in Databricks, that file is actually stored in the Azure Blob Storage account tied to your workspace.

You can explore the DBFS root by running:

```python
dbutils.fs.ls('/')
```

Or using the magic command:

```
%fs
ls /
```

You will see several default directories including /databricks-datasets/ (sample data for learning), /FileStore/ (files accessible via the web UI), /tmp/ (temporary storage), and /user/ (per-user storage).

DBFS root storage is convenient for quick experiments and temporary files, but it has limitations. It is tied to the workspace, every user in the workspace can access it, and it is not the right place for production data. For real projects, your data lives in dedicated Azure storage accounts that your organization controls. That is where mounts come in.

What Are Databricks Mounts?

A mount is a pointer that maps a path inside DBFS to an external storage location. Once you mount an Azure Data Lake Storage container to a path like /mnt/data, every notebook in the workspace can read and write to that storage using the simple path /mnt/data instead of the full Azure storage URL.

Without a mount, accessing a file in Azure Data Lake Storage Gen2 looks like this:

```python
df = spark.read.csv("abfss://container@storageaccount.dfs.core.windows.net/sales/2024/report.csv")
```

With a mount at /mnt/data, the same operation becomes:

```python
df = spark.read.csv("/mnt/data/sales/2024/report.csv")
```

The mounted path is shorter, easier to remember, and — critically — decouples your notebook code from the specific storage account details. If you ever migrate to a different storage account, you update the mount definition once rather than editing every notebook.
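One way to sketch this decoupling is to keep mount paths behind a small lookup, so notebooks reference logical dataset names and only one place knows the physical paths. The names below are illustrative, not part of any Databricks API:

```python
# Hypothetical mapping: only this dict knows the actual mount paths;
# notebooks refer to datasets by logical name.
MOUNT_PATHS = {
    "sales": "/mnt/data/sales",
}

def dataset_path(name, *parts):
    """Build a full DBFS path from a logical dataset name."""
    return "/".join([MOUNT_PATHS[name], *parts])

print(dataset_path("sales", "2024", "report.csv"))
# -> /mnt/data/sales/2024/report.csv
```

If the storage behind `/mnt/data` ever moves, the notebooks calling `dataset_path` do not change at all.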

Mounts persist across cluster restarts and are available to all clusters and all users within the workspace. You set them up once and they remain active until you explicitly unmount them.

The standard convention is to mount storage under the /mnt/ directory in DBFS. You can organize mounts by environment, project, or data layer:

```
/mnt/raw          → raw ingested data
/mnt/curated      → cleaned and transformed data
/mnt/analytics    → business-ready aggregates
```
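If you script the creation of several such mounts, the source URL for each layer follows the same pattern, so it can be built with a small helper. This is a sketch; the account name is a placeholder:

```python
def abfss_source(container, account):
    """Build the abfss:// URL that an ADLS Gen2 mount points at."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/"

# Placeholder account name for illustration.
for layer in ("raw", "curated", "analytics"):
    print(layer, "->", abfss_source(layer, "yourstorageaccount"))
```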

What You Need Before Mounting

To mount Azure Data Lake Storage Gen2 to Databricks securely, you need four things in place: an Azure Data Lake Storage Gen2 account to store your data, a Service Principal that acts as the identity Databricks uses to authenticate, an Azure Key Vault to securely store the Service Principal’s credentials, and a Databricks Secret Scope that links to the Key Vault so your notebooks can retrieve those credentials without exposing them.

The following sections walk through creating each of these components.

Step 1: Create Azure Data Lake Storage Gen2

Azure Data Lake Storage Gen2 (ADLS Gen2) is Microsoft’s recommended storage solution for analytics workloads. It combines the scalability of blob storage with a hierarchical file system namespace, which makes it perform well with big data frameworks like Spark.

To create an ADLS Gen2 account:

Open the Azure Portal and search for “Storage accounts”. Click “Create” and fill in the basics. Choose your subscription and resource group — ideally the same resource group where your Databricks workspace lives for simplicity. Give the storage account a globally unique name (lowercase letters and numbers only, between 3 and 24 characters). Select your region — choose the same region as your Databricks workspace to minimize latency and avoid cross-region data transfer charges.

On the “Advanced” tab, the critical setting is “Enable hierarchical namespace” — this must be checked. This is what makes the storage account a Gen2 Data Lake rather than a regular Blob Storage account. The hierarchical namespace gives you true directory operations and better performance for analytics workloads.

Leave the remaining settings at their defaults for now and create the account.

Once the storage account is created, go to it and create a container. A container is the top-level organizational unit — think of it as a root folder. You might create containers like raw, curated, and analytics, or a single data container depending on your structure.

Navigate to “Containers” in the left sidebar of the storage account, click “+ Container”, give it a name, and set the access level to “Private” (the default). You will control access through the Service Principal, not through public access.

Step 2: Create a Service Principal

A Service Principal is an identity in Azure Active Directory (now called Microsoft Entra ID) that applications use to authenticate. Rather than mounting storage using your personal credentials — which would break when your password changes or your account is disabled — you create a Service Principal specifically for Databricks to use.

In the Azure Portal, navigate to “Microsoft Entra ID” (or search for “App registrations”). Click “New registration”. Give it a descriptive name like databricks-storage-access. Leave the redirect URI blank and register it.

Once registered, you need two pieces of information from this page. The Application (client) ID is essentially the Service Principal’s username. The Directory (tenant) ID identifies your Azure AD tenant. Note both of these down — you will need them later.

Next, create a client secret, which is the Service Principal’s password. Go to “Certificates & secrets” in the left sidebar, click “New client secret”, give it a description like “Databricks mount key”, choose an expiry period, and click “Add”. Copy the secret value immediately — Azure only shows it once. If you navigate away before copying it, you will need to create a new one.

You now have three values: the client ID, the tenant ID, and the client secret.

The final step for the Service Principal is granting it access to your storage account. Go back to your ADLS Gen2 storage account, navigate to “Access Control (IAM)” in the left sidebar, and click “Add role assignment”. Assign the role Storage Blob Data Contributor to your Service Principal. This role allows Databricks to read, write, and delete data in the storage account through this identity.

Search for the Service Principal name you registered, select it, and save the assignment. The permissions may take a minute or two to propagate.

Step 3: Create an Azure Key Vault

You now have a client secret that grants access to your storage account. You should never paste this secret directly into a notebook — it would be visible to anyone who can see the notebook, it would appear in version control, and it would be a security risk. Instead, store it in Azure Key Vault.

Azure Key Vault is a managed service for securely storing secrets, keys, and certificates. Databricks can connect to Key Vault to retrieve secrets at runtime without ever exposing them in your code.

In the Azure Portal, search for “Key vaults” and click “Create”. Select your subscription and resource group, give the vault a name, and choose the same region as your other resources. Under “Access configuration”, select “Vault access policy” as the permission model — this is simpler to set up than Azure RBAC for Key Vault when working with Databricks.

Create the vault, then navigate to it and add your secrets. Go to “Secrets” in the left sidebar and click “Generate/Import”. Create three secrets:

The first secret stores the client ID. Give it a name like databricks-client-id and paste the Application (client) ID as the value.

The second secret stores the tenant ID. Name it databricks-tenant-id and paste the Directory (tenant) ID.

The third secret stores the client secret. Name it databricks-client-secret and paste the client secret value you copied earlier.

Use descriptive, consistent naming for your secrets. You will reference these exact names from Databricks.
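One way to enforce that consistency is to define the secret names once, in a shared cell or module that both your setup notebook and your data notebooks use. These names match the examples in this guide, but any consistent scheme works:

```python
# Single source of truth for the Key Vault secret names that will
# later be passed to dbutils.secrets.get(). Names follow this guide.
SECRET_KEYS = {
    "client_id": "databricks-client-id",
    "tenant_id": "databricks-tenant-id",
    "client_secret": "databricks-client-secret",
}
```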

You also need to note two properties of your Key Vault for the next step: the Vault URI (which looks like https://your-vault-name.vault.azure.net/) and the Resource ID (found under “Properties” in the left sidebar — it looks like /subscriptions/{sub-id}/resourceGroups/{rg}/providers/Microsoft.KeyVault/vaults/{vault-name}).
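If you prefer to assemble the Resource ID rather than copy it from the portal, it follows the fixed pattern shown above. A sketch, with your own subscription, resource group, and vault name substituted in:

```python
def vault_resource_id(subscription_id, resource_group, vault_name):
    """Build a Key Vault Resource ID from its components."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.KeyVault/vaults/{vault_name}"
    )
```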

Step 4: Create a Databricks Secret Scope

A Databricks secret scope is the bridge between your Databricks workspace and Azure Key Vault. Once linked, you can use dbutils.secrets.get() in your notebooks to pull secrets from Key Vault without any credentials appearing in your code.

To create an Azure Key Vault-backed secret scope, navigate to your Databricks workspace URL and append #secrets/createScope to the end. For example:

```
https://adb-xxxxxxxxxxxx.x.azuredatabricks.net/#secrets/createScope
```

This opens a hidden configuration page (it is not accessible through the normal UI navigation). Fill in the following fields:

Scope Name — a descriptive name like databricks-scope. This is what you will reference in your code.

Manage Principal — set this to “All Users” if everyone in the workspace should be able to use the scope, or “Creator” if only you should have access.

DNS Name — paste the Vault URI from your Key Vault (e.g., https://your-vault-name.vault.azure.net/).

Resource ID — paste the full Resource ID from your Key Vault properties.

Click “Create” and the scope is linked.

You can verify it works by running this in a notebook:

```python
dbutils.secrets.listScopes()
```

You should see your new scope in the list. Then confirm you can access the secrets:

```python
dbutils.secrets.list("databricks-scope")
```

This will list the secret keys (but not their values). To retrieve a value:

```python
client_id = dbutils.secrets.get(scope="databricks-scope", key="databricks-client-id")
```

The value will be usable in your code but redacted in any notebook output.

Step 5: Mount ADLS Gen2 Container to Databricks

With all the pieces in place — storage account, Service Principal with access, Key Vault holding the credentials, and a Databricks secret scope linked to Key Vault — you can now create the mount.

Run the following in a Databricks notebook:

```python
# Define the mount configuration
storage_account_name = "yourstorageaccount"
container_name = "raw"
mount_point = "/mnt/raw"

# Retrieve credentials from the secret scope
client_id = dbutils.secrets.get(scope="databricks-scope", key="databricks-client-id")
tenant_id = dbutils.secrets.get(scope="databricks-scope", key="databricks-tenant-id")
client_secret = dbutils.secrets.get(scope="databricks-scope", key="databricks-client-secret")

# Build the OAuth configuration
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": client_id,
    "fs.azure.account.oauth2.client.secret": client_secret,
    "fs.azure.account.oauth2.client.endpoint": f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"
}

# Create the mount
dbutils.fs.mount(
    source = f"abfss://{container_name}@{storage_account_name}.dfs.core.windows.net/",
    mount_point = mount_point,
    extra_configs = configs
)

print(f"Successfully mounted {container_name} at {mount_point}")
```

Once this runs successfully, you can immediately access your storage:

```python
# List files in the mounted container
dbutils.fs.ls("/mnt/raw")
```

```python
# Read data using the mount path
df = spark.read.csv("/mnt/raw/sales/2024/transactions.csv", header=True)
df.display()
```

```sql
%sql
-- SQL queries work with mount paths too
CREATE TABLE IF NOT EXISTS sales_data
USING CSV
OPTIONS (path '/mnt/raw/sales/2024/transactions.csv', header 'true')
```

The mount is now permanent. It survives cluster restarts, workspace restarts, and is available from every notebook and every cluster in the workspace.

Managing Your Mounts

Once mounts are in place, a few commands help you manage them.

List all active mounts:

```python
for mount in dbutils.fs.mounts():
    print(f"{mount.mountPoint} -> {mount.source}")
```

Unmount storage:

```python
dbutils.fs.unmount("/mnt/raw")
```

Remount with updated credentials:

If your Service Principal secret expires or you need to change the configuration, unmount first and then mount again with the new credentials. You cannot update a mount in place.
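That unmount-then-mount sequence can be wrapped in a small helper. This is a sketch, not a Databricks API: it takes `dbutils.fs` as a parameter so the logic is easy to exercise outside a notebook:

```python
def remount(fs, mount_point, source, configs):
    """Unmount the path if it is currently mounted, then mount it
    again with the given (updated) configuration. `fs` is dbutils.fs."""
    if any(m.mountPoint == mount_point for m in fs.mounts()):
        fs.unmount(mount_point)
    fs.mount(source=source, mount_point=mount_point, extra_configs=configs)
```

In a notebook you would call it as `remount(dbutils.fs, "/mnt/raw", source, configs)` after rotating the secret in Key Vault.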

Check if a mount already exists before creating it:

```python
def mount_if_not_exists(mount_point, source, configs):
    existing_mounts = [m.mountPoint for m in dbutils.fs.mounts()]
    if mount_point in existing_mounts:
        print(f"{mount_point} is already mounted")
    else:
        dbutils.fs.mount(source=source, mount_point=mount_point, extra_configs=configs)
        print(f"Successfully mounted {mount_point}")
```

This helper function prevents errors when running setup notebooks multiple times — attempting to mount to a path that is already mounted throws an error.

A Note on Mounts vs Unity Catalog

Mounts have been the standard way to access external storage in Databricks for years, and they remain fully functional. However, Databricks is actively encouraging migration toward Unity Catalog and external locations as the modern approach to storage access. Unity Catalog provides centralized governance, fine-grained access control, and data lineage tracking that mounts do not offer.

If you are learning Databricks today, understanding mounts is still valuable — many existing workspaces and tutorials rely on them, and they are conceptually simpler. But be aware that for new production environments, Unity Catalog with external locations is the recommended path forward. Mounts are unlikely to be deprecated soon, but the investment in new features is going toward the Unity Catalog model.

Wrapping Up

Mounting storage in Databricks is a multi-step process, but each step serves a clear purpose. Azure Data Lake Storage Gen2 holds your data. A Service Principal gives Databricks a secure identity to access it. Azure Key Vault protects the credentials. A Databricks secret scope makes those credentials available to your notebooks without exposing them. And the mount itself creates a clean, simple path that every notebook in your workspace can use.

Once you have set up mounts for your core data containers, reading and writing data becomes as straightforward as referencing a local directory. It is one of those foundational setups that you do once and benefit from every day.