page_type: sample
languages:
- csharp
products:
- azure
- azure-cognitive-search
- azure-key-vault
- azure-storage
name: Decrypt blob file sample skill for cognitive search
urlFragment: azure-decryptblob-sample
description: This custom skill downloads and decrypts a file that was encrypted in Azure Blob Storage and returns it back to Azure Cognitive Search to be indexed.
azureDeploy: https://raw.githubusercontent.com/Azure-Samples/azure-search-power-skills/main/Utils/DecryptBlobFile/azuredeploy.json

DecryptBlobFile

This custom skill downloads and decrypts a file that was encrypted in Azure Blob Storage and returns it to Azure Cognitive Search to be processed and indexed. It is meant to be used together with the built-in DocumentExtractionSkill, so that you can index encrypted files without having to store them unencrypted at rest. For more details on how to encrypt files in blob storage, see this tutorial.

A full example of this skill is available in the Azure Cognitive Search documentation.

Requirements

In addition to the common requirements described in the root README.md file, this function requires Get permission on keys in the Azure Key Vault resource that holds the key used to encrypt the files stored in Azure Blob Storage. Grant this access by setting an access policy on the Key Vault whose principal is the Azure Function instance that the skill is deployed to.

Settings

This function doesn't require any application settings.

Deployment

Sample Input:

{
    "values": [
        {
            "recordId": "record1",
            "data": { 
                "blobUrl": "http://blobStorage.com/myencryptedfile",
                "sasToken": "?sas=123&otherSasInfo=456"
            }
        }
    ]
}
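
As a rough illustration of the request shape above, the following Python sketch wraps blob URLs and SAS tokens in the `values` envelope. The `build_request` helper is hypothetical and not part of this sample; only the field names `recordId`, `blobUrl`, and `sasToken` are taken from the sample input.

```python
import json


def build_request(records):
    """Wrap (record_id, blob_url, sas_token) tuples in the custom skill envelope.

    Hypothetical helper for illustration; the field names mirror the sample input.
    """
    return {
        "values": [
            {"recordId": record_id, "data": {"blobUrl": blob_url, "sasToken": sas_token}}
            for record_id, blob_url, sas_token in records
        ]
    }


# Placeholder values copied from the sample input above.
payload = build_request(
    [("record1", "http://blobStorage.com/myencryptedfile", "?sas=123&otherSasInfo=456")]
)
print(json.dumps(payload, indent=4))
```

A body like this could be sent with an HTTP POST to the function endpoint when testing the skill outside of an indexer run.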

Sample Output:

{
    "values": [
        {
            "recordId": "record1",
            "data": {
                "decrypted_file_data": {
                    "$type": "file",
                    "data": "<base64 encoded decrypted file data>"
                }
            },
            "errors": null,
            "warnings": null
        }
    ]
}
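
To make the output shape concrete, here is a minimal Python sketch that reads a response like the one above and base64-decodes each `decrypted_file_data` payload. The `extract_decrypted_files` helper is hypothetical, and the sketch assumes that a record with a non-empty `errors` value should be skipped.

```python
import base64


def extract_decrypted_files(response):
    """Return {recordId: decrypted bytes} for every record that carries a file.

    Hypothetical helper; assumes records with a non-empty "errors" value are skipped.
    """
    files = {}
    for record in response.get("values", []):
        if record.get("errors"):
            continue
        file_ref = (record.get("data") or {}).get("decrypted_file_data")
        if file_ref and file_ref.get("$type") == "file":
            files[record["recordId"]] = base64.b64decode(file_ref["data"])
    return files


# Stand-in response: the payload is just the bytes b"hello", base64 encoded.
sample_response = {
    "values": [
        {
            "recordId": "record1",
            "data": {
                "decrypted_file_data": {
                    "$type": "file",
                    "data": base64.b64encode(b"hello").decode("ascii"),
                }
            },
            "errors": None,
            "warnings": None,
        }
    ]
}

decrypted = extract_decrypted_files(sample_response)
```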

Sample Skillset Integration

To use this skill in a cognitive search pipeline, add a skill definition to your skillset. Here is a sample skill definition for this example (update the inputs and outputs to reflect your particular scenario and skillset environment):

{
    "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
    "name": "decryptblobfile", 
    "description": "Downloads and decrypts a file that was encrypted in Azure Blob Storage",
    "uri": "[AzureFunctionEndpointUrl]/api/decrypt-blob-file?code=[AzureFunctionDefaultHostKey]",
    "httpMethod": "POST",
    "timeout": "PT30S",
    "context": "/document",
    "batchSize": 1,
    "inputs": [
        {
            "name": "blobUrl",
            "source": "/document/metadata_storage_path"
        },
        {
            "name": "sasToken",
            "source": "/document/metadata_storage_sas_token"
        }
    ],
    "outputs": [
        {
            "name": "decrypted_file_data",
            "targetName": "decrypted_file_data"
        }
    ]
}
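
The `uri` value in the definition above contains two placeholders. As a sketch of how they might be filled in, the hypothetical `build_skill_uri` helper below joins the function endpoint with the route and percent-encodes the host key. The encoding step is an assumption made for safety, and the endpoint and key shown are made-up placeholders.

```python
from urllib.parse import quote


def build_skill_uri(function_endpoint_url, host_key):
    """Fill in the [AzureFunctionEndpointUrl] and [AzureFunctionDefaultHostKey] placeholders.

    Hypothetical helper; percent-encoding the key is an assumption, not a documented requirement.
    """
    base = function_endpoint_url.rstrip("/")
    return f"{base}/api/decrypt-blob-file?code={quote(host_key, safe='')}"


# Placeholder endpoint and key, not real values.
skill_uri = build_skill_uri("https://example.azurewebsites.net/", "abc==")
print(skill_uri)
```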

We suggest following this custom skill with a DocumentExtractionSkill like the following:

{
    "@odata.type": "#Microsoft.Skills.Util.DocumentExtractionSkill",
    "parsingMode": "default",
    "dataToExtract": "contentAndMetadata",
    "context": "/document",
    "inputs": [
        {
            "name": "file_data",
            "source": "/document/decrypted_file_data"
        }
    ],
    "outputs": [
        {
            "name": "content",
            "targetName": "extracted_content"
        }
    ]
}
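
The two definitions connect through the enrichment path `/document/decrypted_file_data`: the custom skill writes its output `targetName` under its `context`, and the DocumentExtractionSkill reads that path as `file_data`. The Python sketch below checks this wiring on trimmed copies of the two definitions. The `output_paths` helper is hypothetical and only handles the simple `/document` context used here.

```python
def output_paths(skill):
    """Return the enrichment paths a skill writes: context + "/" + targetName."""
    context = skill["context"].rstrip("/")
    return {f"{context}/{output['targetName']}" for output in skill["outputs"]}


# Trimmed copies of the two skill definitions shown above.
decrypt_skill = {
    "context": "/document",
    "outputs": [{"name": "decrypted_file_data", "targetName": "decrypted_file_data"}],
}
extraction_skill = {
    "context": "/document",
    "inputs": [{"name": "file_data", "source": "/document/decrypted_file_data"}],
    "outputs": [{"name": "content", "targetName": "extracted_content"}],
}

available = output_paths(decrypt_skill)
# Inputs of the second skill that the first skill does not produce.
missing = [i["source"] for i in extraction_skill["inputs"] if i["source"] not in available]
```

An empty `missing` list means every input of the extraction skill is produced by the decryption skill. If you rename `targetName` in one definition, the `source` in the other has to change with it.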

We also suggest adding the configuration parameter "dataToExtract": "storageMetadata" to your indexer definition when running an indexer with this skill. With this setting the indexer does not try to extract content from the still-encrypted blob, so it does not fail before the skillset has a chance to execute. The content and metadata that the default dataToExtract option, contentAndMetadata, would normally extract are extracted by the DocumentExtractionSkill instead.
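
As a sketch of where this parameter sits, the Python snippet below builds a minimal indexer definition with `dataToExtract` under `parameters.configuration`. All resource names (`encrypted-blob-indexer`, `encrypted-blob-ds`, `encrypted-blob-index`, `decrypt-skillset`) are hypothetical placeholders, and a real indexer would also need mappings for the extracted content.

```python
import json

# Hypothetical resource names; replace them with the ones in your search service.
indexer_definition = {
    "name": "encrypted-blob-indexer",
    "dataSourceName": "encrypted-blob-ds",
    "targetIndexName": "encrypted-blob-index",
    "skillsetName": "decrypt-skillset",
    "parameters": {
        "configuration": {
            # Only read blob storage metadata; the skillset extracts the content.
            "dataToExtract": "storageMetadata"
        }
    },
}

print(json.dumps(indexer_definition, indent=4))
```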
