This post is about setting up a connection from Databricks to an Azure Storage Account using a SAS key.
This method is well suited to granting temporary access with fine-grained permissions to a storage account. In this post, I will show how to set up read-only access for a temporary period of time.
Please note: you should try to use passthrough credentials, but that functionality needs premium-tier Databricks. This post is a workaround that uses a Shared Access Signature.
- Create a Shared Access Signature key for your Storage Account:
  - Go to your Storage Account and, under "Settings", select "Shared access signature"
  - Allowed Services: "Blob"
  - Allowed Resource Types: "Container" and "Object"
  - Permissions: "Read" and "List"
  - Click "Generate SAS and connection string"
- Copy the "SAS Token"; you will need it in the notebook code below (if you prefer to script this step, see the sketch that follows this list).
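If you would rather generate the SAS token from code than click through the portal, the azure-storage-blob Python SDK (v12+) can produce an equivalent account SAS. This is a minimal sketch and not part of the original walkthrough; the package, the eight-hour expiry window and the placeholder account key are my assumptions:

# Sketch: generate an account SAS with read/list access to blob containers and objects,
# matching the portal settings above. Assumes: pip install azure-storage-blob
from datetime import datetime, timedelta, timezone
from azure.storage.blob import AccountSasPermissions, ResourceTypes, generate_account_sas

sas_token = generate_account_sas(
    account_name="<<storageAccountName>>",
    account_key="<<storageAccountKey>>",  # one of the storage account access keys
    resource_types=ResourceTypes(container=True, object=True),
    permission=AccountSasPermissions(read=True, list=True),
    expiry=datetime.now(timezone.utc) + timedelta(hours=8),  # temporary access window
)
print(sas_token)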
- Python code for Databricks Notebook:
# In a Databricks notebook, spark, dbutils and display are already defined.
container = "<<containerName>>"
storageAccountName = "<<storageAccountName>>"
# Rather than pasting the SAS token in plain text, you should read it from a secret:
# sasKey = dbutils.secrets.get(scope="scopeName", key="keyName")
sasKey = "<<SAS token from the steps above>>"

# Register the SAS token for this container with the wasbs connector
spark.conf.set(f"fs.azure.sas.{container}.{storageAccountName}.blob.core.windows.net", sasKey)

# Optional: enable Arrow to speed up conversions to pandas
spark.conf.set("spark.sql.execution.arrow.enabled", "true")
spark.conf.set("spark.sql.execution.arrow.fallback.enabled", "true")

inputfilepath = f"wasbs://{container}@{storageAccountName}.blob.core.windows.net/pathToFile/fileName.parquet"
dataframe = spark.read.option("mergeSchema", "true").parquet(inputfilepath)
dataframe.describe()
display(dataframe)
Comments:
The SAS key doesn't seem to let you use the abfs[s]: endpoint from Databricks; instead you need to use the wasb[s]: endpoint. It also doesn't seem to work with the DFS URL (e.g. {storageAccountName}.dfs.core.windows.net). So this isn't ideal, but it's a great way to connect using a temporary SAS key.
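If you would rather mount the container once instead of setting the Spark config in every notebook, the same SAS key can also be used with dbutils.fs.mount over the wasb[s]: endpoint. A minimal sketch, assuming the same placeholders as above and a hypothetical mount point name:

# Sketch: mount the container using the SAS token (read from a secret scope).
# The mount point "/mnt/mydata" is just an illustration.
container = "<<containerName>>"
storageAccountName = "<<storageAccountName>>"
sasKey = dbutils.secrets.get(scope="scopeName", key="keyName")

dbutils.fs.mount(
    source=f"wasbs://{container}@{storageAccountName}.blob.core.windows.net",
    mount_point="/mnt/mydata",
    extra_configs={
        f"fs.azure.sas.{container}.{storageAccountName}.blob.core.windows.net": sasKey
    },
)

# Files are then readable via the mount point:
# dataframe = spark.read.parquet("/mnt/mydata/pathToFile/fileName.parquet")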