Event Driven Architecture with Databricks

image

image

Entire Architecture

image

Event driven push notifications

image

Near Realtime Analytics capability

image

Event driven push notifications - Eventhub read/write with Databricks

To simulate multiple apps submitting events with GPS coordinates, we created a mock notebook that generates simulated data and pushes it to an eventhub. The main stream watches that eventhub and consumes events as they become available. It checks whether the GPS coordinates are near a given store and, if so, adds a promotion and writes the result to a second eventhub. Downstream consumers can then listen to those events and push a notification back to the app, so that end users receive promotions tailored to their location.

  • Mock GPS data from the app and write it to the eventhub consumer group dev-readfrom, in the mockgpsdata notebook
  • Ingest GPS data from dev-readfrom, transform it, and write it back to the eventhub dev-writeto, in the eventdrivenstreaming notebook

Mock GPS data from multiple app users:

image
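As a rough illustration of what the mock data step could look like, here is a minimal sketch that builds simulated GPS events as JSON strings. The field names (`event_id`, `user_id`, `latitude`, `longitude`, `timestamp`), the helper names, and the default centre point are all assumptions made for this example and are not taken from the mockgpsdata notebook.

```python
import json
import random
import uuid
from datetime import datetime, timezone


def make_mock_gps_event(rng, centre_lat=51.5, centre_lon=-0.12, spread_deg=0.05):
    """Build one simulated app event (hypothetical schema, arbitrary centre point)."""
    return {
        "event_id": str(uuid.UUID(int=rng.getrandbits(128))),
        "user_id": f"user-{rng.randint(1, 100)}",
        # Scatter the position uniformly within +/- spread_deg of the centre.
        "latitude": round(centre_lat + rng.uniform(-spread_deg, spread_deg), 6),
        "longitude": round(centre_lon + rng.uniform(-spread_deg, spread_deg), 6),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


def make_batch(n, seed=None):
    """Return n events serialised as JSON strings, ready to be sent as message bodies."""
    rng = random.Random(seed)
    return [json.dumps(make_mock_gps_event(rng)) for _ in range(n)]


if __name__ == "__main__":
    for body in make_batch(3, seed=42):
        print(body)
```

Each string in the batch would then be sent as the body of one eventhub message by whichever client the notebook uses.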

Consume those events and, when the GPS coordinates are in the proximity of a store that has a promotion running, write them back to the WriteTo eventhub:

image
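The proximity check could be sketched as follows. This README does not say how distance is computed, so the haversine formula, the 0.5 km default radius, and the store fields (`store_id`, `latitude`, `longitude`, `promotion`) are assumptions for illustration only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def attach_promotion(event, stores, radius_km=0.5):
    """Return a copy of the event enriched with the nearest in-range store's
    promotion, or None when no store lies within radius_km."""
    best = None
    for store in stores:
        d = haversine_km(event["latitude"], event["longitude"],
                         store["latitude"], store["longitude"])
        if d <= radius_km and (best is None or d < best[0]):
            best = (d, store)
    if best is None:
        return None
    d, store = best
    return {**event,
            "store_id": store["store_id"],
            "promotion": store["promotion"],
            "distance_km": round(d, 3)}
```

Only events for which `attach_promotion` returns a value would be written to the WriteTo eventhub; the rest are dropped.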

Downstream apps can then listen to the promotion eventhub (WriteTo) and forward push notifications. For example, a logic app could act as a consumer of the eventhub (this was left out of scope for this PoC, but we show how it could be done).

image

Near Realtime Analytics capability

Events are captured from the eventhubs into the bronze layer in ADLS, where they are stored in Avro format. The first step is to extract the body of each record to get the raw data. We then enrich the data by joining it with the store and promotions datasets, which gives a richer dataset for analysis, and write the result to Silver.

image
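The bronze-to-silver step can be pictured with the plain-Python sketch below. Event Hubs Capture stores each message payload in a binary `Body` field of the Avro record; everything else here (a JSON payload, joining on `store_id`, the `name` field, and the function names) is an assumption for illustration. In the notebooks this would more likely be expressed as Spark DataFrame operations.

```python
import json


def extract_body(capture_record):
    """Decode the Body field of a captured record into a dict.
    Assumes the payload is UTF-8 encoded JSON."""
    body = capture_record["Body"]
    if isinstance(body, (bytes, bytearray)):
        body = body.decode("utf-8")
    return json.loads(body)


def enrich(event, stores_by_id, promotions_by_store):
    """Join an event onto store and promotion lookups (hypothetical key: store_id).
    Returns None when the store is unknown, mimicking an inner join."""
    store = stores_by_id.get(event.get("store_id"))
    if store is None:
        return None
    return {**event,
            "store_name": store["name"],
            "promotion": promotions_by_store.get(event["store_id"])}


def bronze_to_silver(capture_records, stores_by_id, promotions_by_store):
    """Extract, enrich, and keep only the rows that matched a store."""
    rows = (enrich(extract_body(r), stores_by_id, promotions_by_store)
            for r in capture_records)
    return [row for row in rows if row is not None]
```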

To provide an analytics layer, we aggregate the Silver data into information that our users would be interested in. We use Spark streaming so that the tables are updated in near real time and give users up-to-date information.

image
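The README does not list which aggregates are produced, so the sketch below picks one plausible example: the number of promotion events per store in fixed one-minute windows. The row fields (`store_id`, `timestamp`) and the window length are assumptions, and the plain-Python grouping only illustrates the idea of a tumbling-window aggregate, not the Spark streaming code itself.

```python
from collections import Counter
from datetime import datetime


def window_start(ts_iso, window_seconds=60):
    """Map an ISO-8601 timestamp (with UTC offset) to the epoch second
    at which its tumbling window begins."""
    epoch = int(datetime.fromisoformat(ts_iso).timestamp())
    return epoch - epoch % window_seconds


def aggregate(silver_rows, window_seconds=60):
    """Count rows per (store_id, window start)."""
    counts = Counter()
    for row in silver_rows:
        key = (row["store_id"], window_start(row["timestamp"], window_seconds))
        counts[key] += 1
    return counts
```

A streaming engine would keep these counts up to date as new Silver rows arrive, rather than recomputing them from scratch.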

When the analysis layer (Gold) is ready, analysts can query the results.

image


How to set up the project

Clone repo

  git clone https://github.com/magrathj/event-driven-hackathon.git

cd into directory

  cd event-driven-hackathon

Run workspace inside of dev container

Run the workspace inside a remote container (defined in the .devcontainer folder) so that Terraform is already installed for you.

You will need the remote containers extension for VS Code: https://code.visualstudio.com/docs/remote/containers

image

Deploy Azure environment (run locally)

Create service principal

image

image

cd into devops/environments/dev (from the repo root)

  cd devops/environments/dev
  terraform init
  terraform plan

Create containers in your lake

image

Create eventhubs

image

Turn on capture events

image

Set up the connection between the Databricks secret scope and Key Vault

image image

Create the following secrets

image

Mount the workspace

Run the following notebook to mount your ADLS into the workspace

  /EventDrivenPromotions/mountlake

image
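The contents of the mountlake notebook are not shown here. As a sketch of what such a mount typically needs, the helper below assembles the arguments for `dbutils.fs.mount`, assuming an ADLS Gen2 account accessed through the service principal with OAuth. The function name, parameter names, secret scope, and secret keys are hypothetical.

```python
def build_mount_args(container, storage_account, client_id, client_secret,
                     tenant_id, mount_point):
    """Assemble keyword arguments for dbutils.fs.mount
    (assumes ADLS Gen2 with service principal OAuth)."""
    configs = {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": client_id,
        "fs.azure.account.oauth2.client.secret": client_secret,
        "fs.azure.account.oauth2.client.endpoint":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }
    return {
        "source": f"abfss://{container}@{storage_account}.dfs.core.windows.net/",
        "mount_point": mount_point,
        "extra_configs": configs,
    }


# Inside a Databricks notebook (scope and key names are placeholders):
#   secret = dbutils.secrets.get(scope="my-scope", key="client-secret")
#   dbutils.fs.mount(**build_mount_args("bronze", "mylake", client_id,
#                                       secret, tenant_id, "/mnt/bronze"))
```

Reading the client secret from the Key Vault-backed secret scope keeps credentials out of the notebook source.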

Create a logic app

image

References:

Read/Write to eventhub https://docs.databricks.com/spark/latest/structured-streaming/streaming-event-hubs.html

Mock data to eventhub and read back https://docs.microsoft.com/en-gb/azure/databricks/scenarios/databricks-stream-from-eventhubs

Connect keyvault to Azure Databricks https://docs.microsoft.com/en-us/azure/databricks/security/secrets/secret-scopes

Run inside of dev container https://code.visualstudio.com/docs/remote/containers

Read in Avro https://caiomsouza.medium.com/processing-event-hubs-capture-files-avro-format-using-spark-azure-databricks-save-to-parquet-95259001d85f