Home
This project processes Apache-compliant web log files to surface insights hidden in web log data. It cleans and structures the log data, then visualizes it.
The project implements a data transformation system built on Microsoft Azure technologies. The following technologies are used:
- Azure Storage Account: Persists unstructured raw data files and archives processed output
- Azure Data Factory: Data orchestration and scheduled job execution
- Azure SQL DW/DB: Structured output storage and querying
- On-demand HDInsight cluster: Runs the Spark job that cleans and structures the data
- Batch Account: Executes custom ADF activities
- Power BI: Data visualization
Users drop web log files, or build a system that drops them, into a particular container in the Azure Storage Account provisioned by this deployment, at regular intervals. In the backend, Azure Data Factory picks up these files at the same frequency and runs operations to clean and structure the data. The structured data is persisted in SQL DB / SQL DW and then visualized using Power BI.
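To make the "clean and structure" step concrete, here is a minimal Python sketch of turning one raw log line into a structured record. It is not the project's Spark job. It assumes the logs use the Apache Common Log Format, and the names `CLF_PATTERN` and `parse_clf_line` are hypothetical, chosen only for illustration.

```python
import re
from datetime import datetime

# Assumed layout (Apache Common Log Format):
#   host ident authuser [timestamp] "request" status bytes
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)


def parse_clf_line(line):
    """Return a dict of structured fields, or None if the line is malformed."""
    match = CLF_PATTERN.match(line.strip())
    if match is None:
        return None  # cleaning step: drop lines that do not fit the format

    try:
        timestamp = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    except ValueError:
        return None  # unparseable timestamp is treated as a malformed line

    # The request is normally "METHOD path PROTOCOL"; pad if parts are missing.
    parts = match.group("request").split()
    method, path, protocol = (parts + [None, None, None])[:3]

    user = match.group("user")
    size = match.group("size")
    return {
        "host": match.group("host"),
        "user": None if user == "-" else user,
        "timestamp": timestamp,
        "method": method,
        "path": path,
        "protocol": protocol,
        "status": int(match.group("status")),
        "size": 0 if size == "-" else int(size),  # "-" means no body was sent
    }


sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
record = parse_clf_line(sample)
```

Each resulting dict maps naturally onto one row of a structured table, which is the kind of output that would be persisted in SQL DB / SQL DW. Logs in the Combined Log Format (with referrer and user agent) would need two additional quoted fields in the pattern.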
Create an Azure AD application and obtain the App ID and authentication key by following the steps mentioned at this link.