---
page_type: sample
languages:
products:
name: Decrypt blob file sample skill for cognitive search
urlFragment: azure-decryptblob-sample
description: This custom skill downloads and decrypts a file that was encrypted in Azure Blob Storage and returns it back to Azure Cognitive Search to be indexed.
azureDeploy:
---
This custom skill downloads and decrypts a file that was encrypted in Azure Blob Storage and returns it back to Azure Cognitive Search to be processed and indexed. It is meant to be used in combination with the built-in DocumentExtractionSkill so that you can index encrypted files without worrying about them being stored unencrypted at rest. For more details on how to encrypt files in blob storage, see this tutorial.
A full example of this skill is available in the Azure Cognitive Search documentation.
In addition to the common requirements described in the root README.md
file, this function requires *get* access to keys in the Azure Key Vault resource that holds the key that was used to encrypt the files stored in Azure Blob Storage. Grant this access by setting an access policy on the Key Vault with the principal set to the Azure Functions instance that the skill is deployed to.
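As an illustration of that access-policy step, the grant could be scripted with the Azure CLI roughly as follows. The function app, resource group, and vault names here are placeholders for your own deployment, and this assumes the Function App uses a system-assigned managed identity:

```shell
# Look up the system-assigned identity of the Function App (placeholder names).
principalId=$(az functionapp identity show \
  --name my-decrypt-function --resource-group my-rg \
  --query principalId --output tsv)

# Grant that identity "get" permission on keys in the Key Vault.
az keyvault set-policy \
  --name my-key-vault \
  --object-id "$principalId" \
  --key-permissions get
```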
This function doesn't require any application settings.
Sample input:

```json
{
    "values": [
        {
            "recordId": "record1",
            "data": {
                "blobUrl": "http://blobStorage.com/myencryptedfile",
                "sasToken": "?sas=123&otherSasInfo=456"
            }
        }
    ]
}
```
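For local testing, a payload in this shape can be built up and posted to the deployed function. A minimal Python sketch; the endpoint URL, host key, and the commented-out `requests` call are placeholders for your own deployment, not part of this sample:

```python
import json

def build_skill_request(record_id, blob_url, sas_token):
    """Build a request body in the shape the decrypt-blob-file skill expects."""
    return {
        "values": [
            {
                "recordId": record_id,
                "data": {"blobUrl": blob_url, "sasToken": sas_token},
            }
        ]
    }

body = build_skill_request(
    "record1", "http://blobStorage.com/myencryptedfile", "?sas=123&otherSasInfo=456"
)
print(json.dumps(body, indent=2))

# To exercise the deployed skill (hypothetical endpoint and key):
# requests.post(f"{function_url}/api/decrypt-blob-file?code={host_key}", json=body)
```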
Sample output:

```json
{
    "values": [
        {
            "recordId": "record1",
            "data": {
                "decrypted_file_data": {
                    "$type": "file",
                    "data": "<base64 encoded decrypted file data>"
                }
            },
            "errors": null,
            "warnings": null
        }
    ]
}
```
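On the function side, the decrypted bytes come back base64-encoded under the `$type: file` convention shown above. A sketch of how such a response record could be assembled, illustrative only and not this sample's actual implementation:

```python
import base64

def build_skill_response(record_id, decrypted_bytes):
    """Wrap decrypted file bytes in the enriched-record shape Cognitive Search expects."""
    return {
        "values": [
            {
                "recordId": record_id,
                "data": {
                    "decrypted_file_data": {
                        "$type": "file",
                        "data": base64.b64encode(decrypted_bytes).decode("utf-8"),
                    }
                },
                "errors": None,
                "warnings": None,
            }
        ]
    }
```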
In order to use this skill in a cognitive search pipeline, you'll need to add a skill definition to your skillset. Here's a sample skill definition for this example (inputs and outputs should be updated to reflect your particular scenario and skillset environment):
```json
{
    "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
    "name": "decryptblobfile",
    "description": "Downloads and decrypts a file that was encrypted in Azure Blob Storage",
    "uri": "[AzureFunctionEndpointUrl]/api/decrypt-blob-file?code=[AzureFunctionDefaultHostKey]",
    "httpMethod": "POST",
    "timeout": "PT30S",
    "context": "/document",
    "batchSize": 1,
    "inputs": [
        {
            "name": "blobUrl",
            "source": "/document/metadata_storage_path"
        },
        {
            "name": "sasToken",
            "source": "/document/metadata_storage_sas_token"
        }
    ],
    "outputs": [
        {
            "name": "decrypted_file_data",
            "targetName": "decrypted_file_data"
        }
    ]
}
```
It is suggested to follow up this custom skill with a DocumentExtractionSkill that looks like the following:
```json
{
    "@odata.type": "#Microsoft.Skills.Util.DocumentExtractionSkill",
    "parsingMode": "default",
    "dataToExtract": "contentAndMetadata",
    "context": "/document",
    "inputs": [
        {
            "name": "file_data",
            "source": "/document/decrypted_file_data"
        }
    ],
    "outputs": [
        {
            "name": "content",
            "targetName": "extracted_content"
        }
    ]
}
```
It is also suggested to add the configuration parameter `"dataToExtract": "storageMetadata"` to your indexer definition when running an indexer with this skill. This ensures that the indexer does not fail before the skillset is given a chance to execute, and the content and metadata that would normally be extracted with the default `dataToExtract` option, `contentAndMetadata`, will be extracted by the DocumentExtractionSkill instead.
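In an indexer definition, that setting sits under `parameters.configuration`. A minimal fragment, with the indexer name as a placeholder:

```json
{
    "name": "my-indexer",
    "parameters": {
        "configuration": {
            "dataToExtract": "storageMetadata"
        }
    }
}
```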