Copy Activity, Parameters, Debug and Publishing
Copy Activity, Parameters, Debug and Publishing: 1
Task 1: Create a pipeline and setup authentication to SmartFoods OAuth2 (Token based) API 1
Task 2: Store the access token in a variable 9
Task 3: Retrieve data from API and store in Blob Storage 11
Duration: 30 minutes
In this exercise you create a pipeline to ingest data from SmartFoods web services. Primary learning objectives:
-
Accessing an OAuth2 API
-
Setting up copy activity
-
Setting up parametrized pipelines
-
Using parametrized datasets in activities
-
Creating, setting and reading variables
-
Using Web Activity
In order to access the SmartFoods API we need to first call the endpoint with username and password as the HTTP POST body to get a time-based OAuth token. This token then can be used in further activities to authenticate to the API and retrieve data. The best way to perform this is to use a “Web” activity within data factory.
The easy way to do this is by adding a web activity to the canvas and hardcoding the credentials in it. but this is a serious security breach and as explained before every secret used in ADF should be stored in AKV.
As a result, we will require to add two more “Web” activities before this web activity to first retrieve the credentials from AKV at runtime and pass to the third (main) Web Activity to authenticate to the API and retrieve the token.
-
Click on the plus sing and click on Pipeline to add a new pipeline
-
Rename the pipeline under “general” tab to SmartFoodCustomerApiToBlob
- From the parameters tab create a pipeline parameter and call it “date”
- From the activities bar under “General” drag a “Web” activity to the canvas
-
Rename the activity to AKVUsername
-
Go to your Azure Key Vault in Azure Portal and from secrets select “SmartFoodsApiUsername” (You have created this secret previously)
- Select the current version
- Copy the “Secret Identifier”
-
Paste it in a code editor or Notepad
-
At the end of the URI add
?api-version=7.0
So, it will look something like this:
https://adf-mehdi-dev-kv.vault.azure.net/secrets/SmartFoodsApiUsername/a35670dbbf19471eac4f8390e3c31882?api-version=7.0
-
Repeat the same steps for “SmartFoodsApiPassword” secret
-
Now back in ADF under the web activity and
-
paste the URI with api version added to it in the URL box
-
Change Authentication to MSI (This indicates we have give this instance of ADF access to out AKV so no other auth is necessary)
-
For resource enter:
-
https://vault.azure.net
-
Go back to General tab and tick “Secure Output”
-
Now click “Debug” to test the activity
- Under output tab click the output button to see the activity output
The output should be in form of:
{ "SecureOutput": "**********" }
This is the effect of setting secure output setting
Try it: Try removing the “secure output” tick and re-run debug and see how the output will differ.
- Repeat the same steps and add “AKVPassword” Web activity.
- Add another “Web activity” to the canvas. Rename it to “SmartFoodsLogin” and attach the success connectors from the AKVUsername and AKVPassword to it as below:
- Setup the web activity as below
URL:
https://smartfoods.azurewebsites.net/api/SmartFoodsOauth
Method: POST
Body:
@json(concat('{"username":"',activity('AKVUsername').output.value,'","password":"',activity('AKVPassword').output.value,'"'))
Note1: you need to click on “Add Dynamic Content” to enter the “Expression Editor” before pasting the value
Note2: Expression explanation: For HTTP POST body we need to compose a JSON document and pass the username and password attribute to it. So we use the “Concat” function to add the static and dynamic parts together to compose a JSON document and then pass it to the “JSON” function to format it correctly as a JSON. The resulting JSON will look like below.
{
“Username”: “<value coming from AKVUsername web activity>”,
“password”: “<value coming from AKVPassword web activity>”
}
-
Under general tab tick the “Secure Output” to make sure the tokens are not getting revealed in ADF logs.
-
Debug you pipeline to make sure all activities are successful.
Now we have the token to access the API we need to store the token in a variable to be passed to our copy activity in order to access the API securely.
- Under variables in the pipeline create a new variable “token”
- Add a “Set Variable” activity to canvas
-
add “Set variable” activity.
-
connect to “SmartFoodsLogin” using “Success”.
-
click on variables.
-
For “Name” select “token”.
-
go to Expression Editor for “Value” and set it to:
@activity('SmartFoodsLogin').output.token
-
Rename the activity to “SetAccessToken”.
-
(Optional) Debug to test your pipeline.
In Azure Data Factory, you can use the “Copy” activity to copy data among data stores located on-premises and in the cloud. Also Copy activity allows us to change the data format through the copy process. For example, here we receive the data in JSON format but store as CSV. After you copy the data, you can use other activities to further transform and analyze it.
- First create a pipeline parameter called “date”
-
Drag a copy activity to canvas and connect to “SetAccessToken” activity
-
Rename it to “SmartFoodsCustomersToBlob”
-
Under source
-
Source dataset: “SmartFoodsCustomerApiJson”
-
For “authCode” parameter go to Expression editor and select the “token” variable
-
-
Request method: Post
-
Request body: click on expression editor and enter
@{json(concat('{"trans_date": "',pipeline().parameters.date,'","dataDomain" : "customers"}'))}
Expression explanation: We are composing a JSON for REST request body with two attributes trans_date which we use the pipeline parameter as value and dataDomain hard coded to “customers”
-
For Sink
-
Sink dataset: “SmartFoodsDelimitedTextBlob”
-
Dataset parameters:
-
Folder: “customers”
-
File: Enter Expression editor and enter
-
-
smartfoods_customers_@{replace(pipeline().parameters.date,'-','')}
- Filetype: “csv”
-
Click “Debug”
-
Provide 2020-02-10 as the value for “date” parameter
-
Go to Azure Storage Explorer -> find the respective storage account, container and directory -> Locate the “smartfoods_customers_20200201.csv” file and open it to make sure the data is copied correctly.