Architecture and Deployment Guide
This project implements a data transformation system built on Microsoft Azure. The following technologies are used:
- Azure Storage Account: persists unstructured raw data files and archives processed output
- Azure Data Factory: orchestrates the data flow and scheduled job execution
- Azure SQL DW/DB: stores structured output for querying
- On-demand HDInsight Cluster: runs the Spark job that cleans and structures the data
- Batch Account: executes custom ADF activities
- Power BI: visualizes the data
Users drop web log files, or set up a system that drops them, into a designated container in the Azure Storage Account provisioned by this deployment at regular intervals. Azure Data Factory picks up these files at the same frequency, cleans and structures the data, and persists the structured output in SQL DB / SQL DW, where it is visualized using Power BI.
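For illustration, the sketch below uploads a web log file into the landing container with the azure-storage-blob Python package. The connection string, container name (weblogs), and file name are placeholders rather than values defined by this deployment; use the storage account and container it provisions.

```python
# pip install azure-storage-blob
from azure.storage.blob import BlobServiceClient

# Connection string of the storage account provisioned by this deployment
# (Portal: Storage Account > Access keys). Placeholder value shown here.
CONNECTION_STRING = "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>;EndpointSuffix=core.windows.net"
CONTAINER_NAME = "weblogs"          # assumed name of the landing container
LOCAL_LOG_FILE = "u_ex190101.log"   # example web log file name

service = BlobServiceClient.from_connection_string(CONNECTION_STRING)
blob = service.get_blob_client(container=CONTAINER_NAME, blob=LOCAL_LOG_FILE)

# Upload the raw log file; ADF picks it up on its next scheduled run.
with open(LOCAL_LOG_FILE, "rb") as data:
    blob.upload_blob(data, overwrite=True)

print(f"Uploaded {LOCAL_LOG_FILE} to container '{CONTAINER_NAME}'")
```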
Create an Azure AD application and obtain its App ID and Authentication Key by following the steps mentioned at this link.
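Optionally, you can confirm that the App ID and Authentication Key work as a service principal before starting the deployment. The sketch below uses the azure-identity Python package; the tenant ID and credential values are placeholders.

```python
# pip install azure-identity
from azure.identity import ClientSecretCredential

TENANT_ID = "<directory-tenant-id>"      # Azure AD tenant of your subscription
CLIENT_ID = "<azure-ad-app-id>"          # App ID obtained above
CLIENT_SECRET = "<authentication-key>"   # Authentication Key obtained above

credential = ClientSecretCredential(TENANT_ID, CLIENT_ID, CLIENT_SECRET)

# Requesting an ARM token succeeds only if the service principal is valid.
token = credential.get_token("https://management.azure.com/.default")
print("Service principal credentials are valid; token expires at", token.expires_on)
```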
- Hit the deployment link to open the Deploy to Azure page (a programmatic alternative using the ARM template is sketched after these steps).
- Select the appropriate Directory and Subscription.
- Select the Create New option for Resource Group and enter the desired resource group name.
- Enter an appropriate user name and password in the Sql User Name and Sql Password fields respectively. Note down these credentials, as they will be used for accessing the SQL Server provisioned by this deployment.
- Enter the Azure AD App ID and Authentication Key obtained in step 1 as Service Principal ID and Service Principal Key respectively.
- Press the Next button at the bottom to start the deployment.
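As an alternative to the Deploy to Azure page, the ARM template can be deployed programmatically. The sketch below uses the azure-mgmt-resource Python package; the resource group name, region, template file name (azuredeploy.json), and parameter names are assumptions for illustration and must be matched to the actual template.

```python
# pip install azure-identity azure-mgmt-resource
import json

from azure.identity import ClientSecretCredential
from azure.mgmt.resource import ResourceManagementClient

SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "thirdeye-rg"      # assumed resource group name
LOCATION = "eastus"                 # assumed region
TEMPLATE_FILE = "azuredeploy.json"  # assumed ARM template file name

credential = ClientSecretCredential("<tenant-id>", "<azure-ad-app-id>", "<authentication-key>")
client = ResourceManagementClient(credential, SUBSCRIPTION_ID)

# Create the resource group if it does not exist yet.
client.resource_groups.create_or_update(RESOURCE_GROUP, {"location": LOCATION})

with open(TEMPLATE_FILE) as f:
    template = json.load(f)

# Parameter names below are illustrative; align them with the template's parameters.
parameters = {
    "sqlUserName": {"value": "<sql-user-name>"},
    "sqlPassword": {"value": "<sql-password>"},
    "servicePrincipalId": {"value": "<azure-ad-app-id>"},
    "servicePrincipalKey": {"value": "<authentication-key>"},
}

poller = client.deployments.begin_create_or_update(
    RESOURCE_GROUP,
    "thirdeye-deployment",
    {"properties": {"mode": "Incremental", "template": template, "parameters": parameters}},
)
print("Deployment state:", poller.result().properties.provisioning_state)
```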
- Visit the Azure Portal and open the resource group provisioned through this deployment.
- Open the Azure Data Factory instance inside the resource group.
- Click the Author and Monitor button on the Azure Data Factory overview blade to open ADF Studio.
- Select the Author tab in ADF Studio.
- Press the Triggers button at the bottom of ADF Studio.
- Press the Edit button on PipelineTrigger.
- On the Edit Trigger pane, set the Start Date and Recurrence settings and check the Activated checkbox.
- Press the Finish button to finalize the edits.
- Press the Publish All button at the top of ADF Studio to activate the data factory (a programmatic way to start the trigger is sketched after these steps).
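If you prefer to activate the trigger from code rather than ADF Studio, the sketch below starts PipelineTrigger with the azure-mgmt-datafactory Python package; the resource group, data factory name, and credential values are placeholders.

```python
# pip install azure-identity azure-mgmt-datafactory
from azure.identity import ClientSecretCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "thirdeye-rg"        # resource group created by the deployment
DATA_FACTORY = "<data-factory-name>"  # ADF instance inside that resource group
TRIGGER_NAME = "PipelineTrigger"

credential = ClientSecretCredential("<tenant-id>", "<azure-ad-app-id>", "<authentication-key>")
adf = DataFactoryManagementClient(credential, SUBSCRIPTION_ID)

# Starting the trigger has the same effect as checking Activated and publishing in ADF Studio.
adf.triggers.begin_start(RESOURCE_GROUP, DATA_FACTORY, TRIGGER_NAME).result()

status = adf.triggers.get(RESOURCE_GROUP, DATA_FACTORY, TRIGGER_NAME)
print("Trigger runtime state:", status.properties.runtime_state)
```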
- Open the Power BI Desktop file ThirdEye - ARM Template-Pattern 1 Insights.pbix from this repository on your local machine.
- Hit the Home > Edit Queries button on the menu bar.
- In the Query Editor window, select Home > Data source settings from the menu.
- Select the data source listed in the Data source settings window and press the Change Source button at the bottom of the window.
- In the SQL Server database window, enter the name of the SQL server in the Azure resource group provisioned by this deployment, then press the OK button at the bottom.
- Select the data source again in the Data source settings window and press the Edit Permissions button at the bottom of the window.
- In the Edit Permissions window, press the Edit button.
- In the popup window that follows, ensure the Database tab is selected and enter the SQL server credentials supplied while initiating this deployment. Press the Save button to save the credentials (these values can be verified with the connectivity check sketched after these steps).
- Close all popup windows and press the Close & Apply button on the Query Editor menu bar to finalize the change in the database connection.
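As referenced above, the following sketch checks the SQL server name and credentials with the pyodbc Python package before they are saved in Power BI; the server, database, and driver values are placeholders to replace with those from your deployment.

```python
# pip install pyodbc  (requires the Microsoft ODBC Driver for SQL Server)
import pyodbc

SERVER = "<sql-server-name>.database.windows.net"  # SQL server from the resource group
DATABASE = "<database-name>"                       # database holding the structured output
USERNAME = "<sql-user-name>"                       # value supplied during deployment
PASSWORD = "<sql-password>"

conn_str = (
    "DRIVER={ODBC Driver 18 for SQL Server};"
    f"SERVER={SERVER};DATABASE={DATABASE};UID={USERNAME};PWD={PASSWORD};"
    "Encrypt=yes;TrustServerCertificate=no;"
)

with pyodbc.connect(conn_str) as conn:
    cursor = conn.cursor()
    # List user tables to confirm the structured output is reachable.
    cursor.execute("SELECT name FROM sys.tables")
    for row in cursor.fetchall():
        print(row.name)
```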
Now have a look at the visuals in the Power BI file.
The Power BI visuals are described in the ThirdEye - Web Log Analytics - Power BI Visuals.PPTX file, present in this repository.