Wednesday, February 24, 2021

Databricks - Connecting to an Azure Storage Account using a SAS key

This post is about setting up a connection from Databricks to an Azure Storage Account using a SAS key.

This method is useful when you need to provide temporary access with fine-grained permissions to a storage account. In this post, I will show how to set up read-only access for a temporary period of time.

Please note: you should try to use credential passthrough where possible, but that functionality requires premium-tier Databricks. This post is a workaround that uses a Shared Access Signature (SAS) instead.


  1. Create a Shared Access Signature key for your storage account:
    Go to your Storage Account and, under "Settings", select "Shared access signature"
    Allowed services: "Blob"
    Allowed resource types: "Container" and "Object"
    Permissions: "Read" and "List"
  2. Click "Generate SAS and connection string"
  3. Copy the "SAS token" value; you will use it in the next step
  4. Python code for the Databricks notebook:

from pyspark.sql import SparkSession

container = "<<containerName>>"
storageAccountName = "<<storageAccountName>>"
# Prefer a secret scope over a hard-coded token:
# sasKey = dbutils.secrets.get(scope="scopeName", key="keyName")
sasKey = "<<sas token from step 3>>"

# Register the SAS token for this container's wasbs:// endpoint
spark.conf.set(f"fs.azure.sas.{container}.{storageAccountName}.blob.core.windows.net", sasKey)

# Optional: enable Arrow for faster conversions to/from pandas
spark.conf.set("spark.sql.execution.arrow.enabled", "true")
spark.conf.set("spark.sql.execution.arrow.fallback.enabled", "true")

inputFilePath = f"wasbs://{container}@{storageAccountName}.blob.core.windows.net/pathToFile/fileName.parquet"
dataframe = spark.read.option("mergeSchema", "true").parquet(inputFilePath)
display(dataframe.describe())  # summary statistics
display(dataframe)
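
A quick way to sanity-check the token before reading any data is to list the container, which is what the "List" permission from step 1 enables. This is a minimal sketch, assuming the same container, storageAccountName and sasKey variables from the notebook above:

# Register the SAS token for the container (same config key as in the notebook above)
spark.conf.set(f"fs.azure.sas.{container}.{storageAccountName}.blob.core.windows.net", sasKey)

# List the container root; prints each blob's path and size in bytes
for fileInfo in dbutils.fs.ls(f"wasbs://{container}@{storageAccountName}.blob.core.windows.net/"):
    print(fileInfo.path, fileInfo.size)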


Comments:

The SAS key doesn't seem to allow you to use the abfs[s]: endpoint from Databricks; you need to use the wasb[s]: endpoint instead. The SAS key also doesn't seem to work with the DFS URL (eg: {storageAccountName}.dfs.core.windows.net). So this isn't ideal, but it's a great way to connect using a temporary SAS key.
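
If you would rather work with a mount point than set the SAS config in every session, the same wasb[s]: endpoint can be mounted with dbutils.fs.mount. This is a sketch under the same assumptions as the notebook above (the /mnt/mydata mount point name is just an example):

# Mount the container using the SAS token; the mount stays in place until it is unmounted
dbutils.fs.mount(
    source = f"wasbs://{container}@{storageAccountName}.blob.core.windows.net",
    mount_point = "/mnt/mydata",
    extra_configs = {f"fs.azure.sas.{container}.{storageAccountName}.blob.core.windows.net": sasKey}
)

# Files in the container are now visible under the mount point
display(dbutils.fs.ls("/mnt/mydata"))

Keep in mind that once the SAS key expires, reads through the mount will start failing, so for genuinely temporary access the session-level spark.conf.set approach above may be the simpler option. To remove the mount, call dbutils.fs.unmount("/mnt/mydata").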
