I am an ML engineer working for a tech-savvy taxi ride company with a fleet of thousands of cars. The organization wants to start making ride times more consistent and to understand longer journeys in order to improve customer experience, increase retention and business returns.
I was employed to create an anomaly detection service to find rides with unusual ride times. These were the requirements given by the CEO:
- Rides should be clustered based on ride distance and time, and anomalies/outliers identified.
- Speed was not to be used, as analysts would like to understand long-distance rides.
- The analysis should be carried out on a daily schedule.
- The data for inference should be consumed from the company's data lake.
- The results should be made available for consumption by other company systems.
After much analysis, this was proposed to the client:
- Algorithm type: outlier detection, specifically Density-Based Spatial Clustering of Applications with Noise (DBSCAN).
- Features used: ride time and distance.
- System output destination: an S3 bucket on AWS. The prediction data is exported as JavaScript Object Notation (JSON).
- Batch frequency: daily.
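To make the proposal concrete, here is a minimal sketch of the clustering step using scikit-learn's DBSCAN. The field names (ride_id, ride_dist, ride_time), the toy data, and the eps and min_samples values are assumptions chosen for illustration, not the settings of the real service. Features are standardized first so that distance and time contribute on a comparable scale, and rides that DBSCAN labels as noise (-1) are treated as anomalies.

```python
import json

import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler


def find_anomalous_rides(rides, eps=0.5, min_samples=3):
    """Return the rides that DBSCAN labels as noise (cluster label -1).

    eps and min_samples are illustrative defaults, not tuned values.
    """
    # Only distance and time are used as features; speed is deliberately left out.
    features = np.array(
        [[r["ride_dist"], r["ride_time"]] for r in rides], dtype=float
    )
    scaled = StandardScaler().fit_transform(features)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(scaled)
    return [r for r, label in zip(rides, labels) if label == -1]


# Toy data (hypothetical): ride time grows with distance,
# plus one 5 km ride that took two hours.
rides = [
    {"ride_id": f"r{2 * d + k}", "ride_dist": float(d), "ride_time": 3.0 * d + k}
    for d in range(1, 11)
    for k in (0, 1)
]
rides.append({"ride_id": "slow", "ride_dist": 5.0, "ride_time": 120.0})

anomalies = find_anomalous_rides(rides)

# The flagged rides serialized as JSON, ready to be written to the output bucket.
payload = json.dumps(anomalies)
```

In a daily batch, the rides list would come from the data lake and payload would be written to the S3 bucket; both steps are left out of this sketch.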
Here is a pictorial view of the solution
- Select Create environment on the Amazon Managed Workflows for Apache Airflow (MWAA) landing page.
- You will then be provided with a screen asking for the details of your new Airflow environment.
- For MWAA to run, it needs to be able to access code defining the DAG and any associated requirements or plugins files. The system then asks for an AWS S3 bucket where these pieces of code and configuration reside. In this example, we create a bucket called mleip-airflow-example that will contain these pieces:
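As a rough illustration of what ends up in that bucket, the sketch below maps local project files to S3 locations. The layout it assumes (DAG files under a dags/ prefix, with requirements.txt and plugins.zip at the bucket root) is a common convention rather than a requirement, since MWAA lets you point at other paths. The function name plan_uploads and the example file names are hypothetical.

```python
from pathlib import PurePosixPath

BUCKET = "mleip-airflow-example"  # bucket name used in this walkthrough


def plan_uploads(local_files, bucket=BUCKET):
    """Map local project files to the S3 URIs MWAA would read them from.

    Assumed layout: DAG .py files under dags/, requirements.txt and
    plugins.zip at the bucket root. Anything else is skipped.
    """
    plan = {}
    for name in local_files:
        path = PurePosixPath(name)
        if path.suffix == ".py":
            key = f"dags/{path.name}"
        elif path.name in ("requirements.txt", "plugins.zip"):
            key = path.name
        else:
            continue  # not something the Airflow environment needs
        plan[name] = f"s3://{bucket}/{key}"
    return plan


# Hypothetical project files.
upload_plan = plan_uploads(
    ["pipelines/detect_anomalies.py", "requirements.txt", "README.md"]
)
```

The resulting mapping can then be handed to whatever upload tool you prefer, such as the AWS console, the AWS CLI, or boto3.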
- We then have to define the configuration of the network that the managed instance of Airflow will use.
This can get a bit confusing if you are new to networking, so it might be good to read around the topics of subnets, IP addresses, and VPCs. Creating a new MWAA VPC is the easiest approach for getting started in terms of networking here, but your organization will have networking specialists who can help you use the appropriate settings for your situation.
We will go with this simplest route and click Create MWAA VPC, which opens a new window where we can quickly spin up a new VPC and network setup based on a standard stack definition provided by AWS.
- We are then taken to a page where we are asked for more details on networking:
- Next, we have to define the Environment class that we want to spin up. Currently, there are three options. Here, we use the smallest, but you can choose the environment that best suits your needs (always ask the bill payer's permission!).
- Now, if desired, we confirm some optional configuration parameters (or leave these blank, as done here) and confirm that we are happy for AWS to create and use a new execution role.
- The next page will supply you with a final summary before allowing you to create your MWAA environment. Once you do this, you will be able to see your newly created environment in the MWAA service.
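The console steps above can also be expressed as a single API request, which is handy if you later want to script the setup. The sketch below builds the arguments for boto3's MWAA create_environment call. The environment name, role ARN, subnet IDs, and security group ID are placeholders, not real resources, and the dags path assumes DAG files live under a dags/ prefix in the bucket. Unlike the console, the API does not create an execution role for you, so one must already exist.

```python
def build_mwaa_request(name, bucket, execution_role_arn, subnet_ids,
                       security_group_ids, environment_class="mw1.small"):
    """Build keyword arguments for the MWAA create_environment API call."""
    if len(subnet_ids) != 2:
        # MWAA expects two private subnets.
        raise ValueError("MWAA expects exactly two subnet IDs")
    return {
        "Name": name,
        "SourceBucketArn": f"arn:aws:s3:::{bucket}",
        "DagS3Path": "dags",  # assumed prefix for DAG files in the bucket
        "ExecutionRoleArn": execution_role_arn,
        "NetworkConfiguration": {
            "SubnetIds": list(subnet_ids),
            "SecurityGroupIds": list(security_group_ids),
        },
        "EnvironmentClass": environment_class,  # smallest class, as in the text
    }


def create_environment(request):
    """Send the request to AWS. Needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so building the request needs no AWS SDK

    return boto3.client("mwaa").create_environment(**request)


# Placeholder values for illustration only.
request = build_mwaa_request(
    name="mleip-airflow-env",
    bucket="mleip-airflow-example",
    execution_role_arn="arn:aws:iam::123456789012:role/example-mwaa-role",
    subnet_ids=["subnet-aaaa1111", "subnet-bbbb2222"],
    security_group_ids=["sg-cccc3333"],
)
```

Calling create_environment(request) would start provisioning and incur cost, so it is deliberately not invoked here.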