This repository has been archived by the owner on Oct 29, 2023. It is now read-only.

page_type: sample
languages: csharp
products: azure, azure-cognitive-search, azure-key-vault, azure-storage
name: Decrypt blob file sample skill for cognitive search
urlFragment: azure-decryptblob-sample
description: This custom skill downloads and decrypts a file that was encrypted in Azure Blob Storage and returns it back to Azure Cognitive Search to be indexed.
azureDeploy:

DecryptBlobFile

This custom skill downloads and decrypts a file that was encrypted in Azure Blob Storage and returns it to Azure Cognitive Search to be processed and indexed. It is meant to be used together with the built-in DocumentExtractionSkill so that you can index encrypted files without having them stored unencrypted at rest. For more details on how to encrypt files in blob storage, see this tutorial.

A full example of this skill is available in the Azure Cognitive Search documentation.

Requirements

In addition to the common requirements described in the root README.md file, this function requires key get access to the Azure Key Vault resource that holds the key used to encrypt the files stored in Azure Blob Storage. Grant this access by setting an access policy on the Key Vault whose principal is the Azure Function instance that the skill is deployed to.

Settings

This function doesn't require any application settings.

Deployment

Deploy to Azure

Sample Input:

{
    "values": [
        {
            "recordId": "record1",
            "data": { 
                "blobUrl": "http://blobStorage.com/myencryptedfile",
                "sasToken": "?sas=123&otherSasInfo=456"
            }
        }
    ]
}

Sample Output:

{
    "values": [
        {
            "recordId": "record1",
            "data": {
                "decrypted_file_data": {
                    "$type": "file",
                    "data": "<base64 encoded decrypted file data>"
                }
            },
            "errors": null,
            "warnings": null
        }
    ]
}
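
The request and response shapes above can be exercised without calling the deployed function. The following is a minimal Python sketch, not part of the sample itself: the helper names `build_request` and `extract_decrypted_files` are hypothetical, and it assumes the response follows the sample output shown above, with the decrypted file returned as base64 text.

```python
import base64


def build_request(record_id: str, blob_url: str, sas_token: str) -> dict:
    """Build a custom-skill request body containing a single record."""
    return {
        "values": [
            {
                "recordId": record_id,
                "data": {"blobUrl": blob_url, "sasToken": sas_token},
            }
        ]
    }


def extract_decrypted_files(response_body: dict) -> dict:
    """Map each recordId to its decrypted bytes.

    Raises ValueError if a record reports errors (null or empty means success).
    """
    files = {}
    for record in response_body["values"]:
        if record.get("errors"):
            raise ValueError(
                f"record {record['recordId']} failed: {record['errors']}"
            )
        file_ref = record["data"]["decrypted_file_data"]
        # The "data" field carries the decrypted file as base64 text.
        files[record["recordId"]] = base64.b64decode(file_ref["data"])
    return files


# Hypothetical usage with the values from the samples above.
request_body = build_request(
    "record1",
    "http://blobStorage.com/myencryptedfile",
    "?sas=123&otherSasInfo=456",
)
```

The parsing helper is useful mainly when testing the function in isolation; inside an indexing pipeline, Azure Cognitive Search consumes the response directly.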

Sample Skillset Integration

To use this skill in a cognitive search pipeline, you'll need to add a skill definition to your skillset. Here's a sample skill definition for this example (update the inputs and outputs to reflect your particular scenario and skillset environment):

{
    "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
    "name": "decryptblobfile", 
    "description": "Downloads and decrypts a file that was encrypted in Azure Blob Storage",
    "uri": "[AzureFunctionEndpointUrl]/api/decrypt-blob-file?code=[AzureFunctionDefaultHostKey]",
    "httpMethod": "POST",
    "timeout": "PT30S",
    "context": "/document",
    "batchSize": 1,
    "inputs": [
        {
            "name": "blobUrl",
            "source": "/document/metadata_storage_path"
        },
        {
            "name": "sasToken",
            "source": "/document/metadata_storage_sas_token"
        }
    ],
    "outputs": [
        {
            "name": "decrypted_file_data",
            "targetName": "decrypted_file_data"
        }
    ]
}
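
The `uri` value in the definition above combines two placeholders, the function endpoint and its host key. As a hedged illustration, the Python sketch below assembles that value; the helper name `build_skill_uri` and the example endpoint are hypothetical. It URL-encodes the key on the assumption that host keys may contain characters such as `/` or `=` that are not safe in a query string.

```python
from urllib.parse import urlencode


def build_skill_uri(function_endpoint: str, host_key: str) -> str:
    """Return the skill URI for the decrypt-blob-file route.

    function_endpoint: base URL of the deployed Azure Function app.
    host_key: the function's default host key, passed as the `code` parameter.
    """
    base = function_endpoint.rstrip("/")  # avoid a double slash before /api
    return f"{base}/api/decrypt-blob-file?{urlencode({'code': host_key})}"


# Hypothetical endpoint and key, for illustration only.
print(build_skill_uri("https://example.azurewebsites.net/", "abc/def=="))
```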

We suggest following this custom skill with a DocumentExtractionSkill like the following:

{
    "@odata.type": "#Microsoft.Skills.Util.DocumentExtractionSkill",
    "parsingMode": "default",
    "dataToExtract": "contentAndMetadata",
    "context": "/document",
    "inputs": [
        {
            "name": "file_data",
            "source": "/document/decrypted_file_data"
        }
    ],
    "outputs": [
        {
            "name": "content",
            "targetName": "extracted_content"
        }
    ]
}

We also suggest adding the configuration parameter "dataToExtract": "storageMetadata" to your indexer definition when running an indexer with this skill. This ensures that the indexer does not fail before the skillset has a chance to execute. The content and metadata that the default dataToExtract option, contentAndMetadata, would normally extract are extracted by the DocumentExtractionSkill instead.
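
To show where that parameter sits, here is a minimal Python sketch that builds an indexer definition as a dictionary and prints it as JSON. The helper name `build_indexer_definition` and the indexer, data source, index, and skillset names are hypothetical placeholders; only the `dataToExtract` setting comes from this README.

```python
import json


def build_indexer_definition(
    name: str,
    data_source_name: str,
    target_index_name: str,
    skillset_name: str,
) -> dict:
    """Build an indexer definition that extracts only storage metadata.

    Content extraction is left to the DocumentExtractionSkill, which runs
    after the decryption skill in the skillset.
    """
    return {
        "name": name,
        "dataSourceName": data_source_name,
        "targetIndexName": target_index_name,
        "skillsetName": skillset_name,
        "parameters": {
            "configuration": {"dataToExtract": "storageMetadata"}
        },
    }


# Hypothetical resource names, for illustration only.
definition = build_indexer_definition(
    "decrypt-indexer", "encrypted-blobs", "decrypted-index", "decrypt-skillset"
)
print(json.dumps(definition, indent=4))
```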