shannon-sbip/datamine

Datamine

Datamine uses Magic Links to distribute datasets to pre-approved entities.

Check it out here!

Table of Contents


  1. System Design
  2. Development

System Design


DDD Strategic Design

Strategic design formalizes the language shared between stakeholders. The domain is described using three categories: (1) Events; (2) Objects; (3) Transactions.

Events represent the past and act as the source of truth; they are stored in databases. Objects are models that represent the current state of the domain and are derived from the events that occur over time. Transactions work with objects within the domain to generate events, which in turn change the objects of the domain.

The domain of Datamine is as follows:

system-design

| Events | Objects | Transactions |
| --- | --- | --- |
| UserEvent | User | UpdateUser |
| DownloadEvent | - | DownloadDataset |
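
To make the Events/Objects relationship concrete, here is a minimal sketch of deriving a User object from stored events. The field names are borrowed from the CSV columns described later in this README; `createdAt`, `downloadCount`, and the `deriveUser` helper are assumptions for illustration, not the project's actual schema.

```typescript
// Hypothetical event shapes, assumed for illustration only.
interface UserEvent {
  email: string;
  name: string;
  isActive: boolean;
  maxDownloadCount: number;
  createdAt: number; // assumed: Unix timestamp in milliseconds
}

interface DownloadEvent {
  email: string;
  createdAt: number;
}

// The derived object: current state computed from past events.
interface User {
  email: string;
  name: string;
  isActive: boolean;
  maxDownloadCount: number;
  downloadCount: number;
}

// The latest UserEvent for an email wins; the download count is the
// number of DownloadEvents recorded for that email.
function deriveUser(
  email: string,
  userEvents: UserEvent[],
  downloadEvents: DownloadEvent[],
): User | undefined {
  let latest: UserEvent | undefined;
  for (const e of userEvents) {
    if (e.email === email && (!latest || e.createdAt >= latest.createdAt)) {
      latest = e;
    }
  }
  if (!latest) return undefined;
  const downloadCount = downloadEvents.filter((d) => d.email === email).length;
  return {
    email: latest.email,
    name: latest.name,
    isActive: latest.isActive,
    maxDownloadCount: latest.maxDownloadCount,
    downloadCount,
  };
}
```

In this sketch an UpdateUser transaction would append a new UserEvent rather than modify a row in place, so the event history stays intact.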

The datasets are uploaded out-of-band by system administrators; uploading is not within the scope of this project.

DDD Tactical Design

tactical-design

The Entity-Relationship diagram above is drawn from the events and objects identified in the Strategic Design stage.

API Endpoints

POST /api/v1/login/
POST /api/v1/user/ 
POST /api/v1/user/update
POST /api/v1/dataset/

The endpoints are derived from the needs of the personas identified in the Strategic Design stage. The WebApp focuses on server-side rendering of user content, while exposing the API endpoints that the client side requires during user interaction.
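
As a rough illustration of how client-side code might address these endpoints, the sketch below builds the arguments for a `fetch()` call without sending anything. The paths are the ones listed above; the JSON payload shape (an `email` field for login) and the `buildPost` helper are assumptions, since the request contract is not documented here.

```typescript
// Endpoint paths as listed in this README.
const ENDPOINTS = {
  login: "/api/v1/login/",
  user: "/api/v1/user/",
  userUpdate: "/api/v1/user/update",
  dataset: "/api/v1/dataset/",
} as const;

type EndpointName = keyof typeof ENDPOINTS;

interface PostRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Build the URL and options for a JSON POST. The payload shape is an
// assumption for illustration, not the project's documented contract.
function buildPost(baseUrl: string, endpoint: EndpointName, payload: unknown): PostRequest {
  return {
    url: baseUrl.replace(/\/+$/, "") + ENDPOINTS[endpoint],
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(payload),
    },
  };
}

// Usage (hypothetical payload): const { url, init } = buildPost(
//   "http://localhost:3000", "login", { email: "admin@example.com" });
// followed by: await fetch(url, init);
```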

Capacity Estimation (Data Storage)

  • Int: 4 bytes
  • Char: 2 bytes * size
  • Bool: 1 byte
  • UserEvent: (191 size * 2 bytes * 4) + (4 bytes * 4) + 2 bytes = 1546 bytes
  • DownloadEvent: (191 size * 2 bytes * 2) + 4 bytes = 768 bytes

Assuming 100 user updates per day, and counting one DownloadEvent alongside each UserEvent, the database growth rate can be calculated as follows:

  • 100 * (1546 + 768) bytes = 231.4 KB / day

The PlanetScale database service provides 5 GB of storage in its free tier:

  • 5,000,000 KB / 231.4 KB = ~21,600 days = ~59 years of sustained usage
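
The arithmetic above can be reproduced with a short script. This is a sketch under the same assumptions as the estimate (191-character columns at 2 bytes per character, decimal units for KB and GB); the column counts per event are read off the formulas above rather than from the actual schema.

```typescript
const INT_BYTES = 4;
const BOOL_BYTES = 1;
const CHAR_BYTES = 2;
const VARCHAR_SIZE = 191;

// Size of one row given its number of char, int, and bool columns.
function rowBytes(chars: number, ints: number, bools: number): number {
  return chars * VARCHAR_SIZE * CHAR_BYTES + ints * INT_BYTES + bools * BOOL_BYTES;
}

const userEventBytes = rowBytes(4, 4, 2);     // 1546
const downloadEventBytes = rowBytes(2, 1, 0); // 768

// 100 updates per day, one UserEvent plus one DownloadEvent each.
const dailyBytes = 100 * (userEventBytes + downloadEventBytes); // 231400

const freeTierBytes = 5e9; // 5 GB in decimal units
const daysUntilFull = freeTierBytes / dailyBytes;
const yearsUntilFull = daysUntilFull / 365;

console.log(`${Math.floor(daysUntilFull)} days, about ${Math.floor(yearsUntilFull)} years`);
```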

Cost Estimation

Assumptions:

  • Developer team size: 1 pax
  • 100 daily active users
  • 100 user updates per day
  • 1 download per day
  • 1 compressed dataset of size 36 GB that never changes
  • 100 emails sent per day
  • 1 GB of monthly email data sent

Vercel Monthly Cost

| Type | Cost (per month) |
| --- | --- |
| Hobby | 0.00 USD |
| Pro | 20.00 USD per team member (max 10) |

PlanetScale Monthly Cost

| Type | Cost (per month) |
| --- | --- |
| Free | 0.00 USD |
| Scaler | 29.00 USD |

Amazon S3 Monthly Cost

| Type | Calculation | Cost (per month) |
| --- | --- | --- |
| Storage | 36 GB * 0.025 USD per GB | 0.90 USD |
| PUT request | 1 PUT request * 0.000005 USD per request | 0.00 USD |
| GET request | (1 download per day) * (30 days) * (0.0000004 USD per request) | 0.000012 USD |
| Data Transfer | (36 GB) * (30 days) * 0.12 USD per GB | 129.60 USD |
| Total | 0.90 USD + 129.60 USD | 130.50 USD |

Amazon SES Monthly Cost

| Type | Calculation | Cost (per month) |
| --- | --- | --- |
| No. of Emails | 100 emails per day * 30 days * 0.0001 USD per email | 0.30 USD |
| Data sent | 1 GB per month * 0.12 USD per GB | 0.12 USD |
| Total | 0.30 USD + 0.12 USD | 0.42 USD |
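
The S3 and SES figures can be recomputed from the stated assumptions with the sketch below. The unit prices are copied from the tables in this section and may not match current AWS pricing. Note that 36 GB * 0.025 USD evaluates to 0.90 USD, so the computed S3 total comes to 130.50 USD.

```typescript
const DAYS_PER_MONTH = 30;
const DATASET_GB = 36;
const DOWNLOADS_PER_DAY = 1;
const EMAILS_PER_DAY = 100;
const EMAIL_DATA_GB_PER_MONTH = 1;

// Format a USD amount to two decimal places.
function usd(amount: number): string {
  return amount.toFixed(2);
}

// Amazon S3 (unit prices as listed in this README)
const s3Storage = DATASET_GB * 0.025;
const s3Put = 1 * 0.000005;
const s3Get = DOWNLOADS_PER_DAY * DAYS_PER_MONTH * 0.0000004;
const s3Transfer = DATASET_GB * DOWNLOADS_PER_DAY * DAYS_PER_MONTH * 0.12;
const s3Total = s3Storage + s3Put + s3Get + s3Transfer;

// Amazon SES (unit prices as listed in this README)
const sesEmails = EMAILS_PER_DAY * DAYS_PER_MONTH * 0.0001;
const sesData = EMAIL_DATA_GB_PER_MONTH * 0.12;
const sesTotal = sesEmails + sesData;

console.log(`S3:  ${usd(s3Total)} USD per month`);  // 130.50
console.log(`SES: ${usd(sesTotal)} USD per month`); // 0.42
```

Data transfer dominates the bill, so the number of downloads per day is the assumption most worth revisiting.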

System Architecture

system-architecture

Data Scientist Flow

A1: User visits the WebApp hosted on Vercel's Content Distribution Network.

A2: User logs in by entering their email address.

A3: System verifies that the email is valid by referencing the database, and generates a Magic Link.

A4: Upon successful validation, the system sends the user an email containing the Magic Link to the WebApp.

A5: User receives the Magic Link in their mailbox and uses it to sign in to the WebApp.

A6: Upon clicking on a valid Magic Link, the user will be greeted with their profile information and a download dataset button.

A7: User triggers the dataset download.

A8: System validates the download request.

A9: System generates a short-lived pre-signed S3 URL.

A10: System generates a new DownloadEvent, updating the download count.

A11: System forwards the pre-signed S3 URL to the user.

A12: User downloads the dataset using the given URL.
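
Step A8 could be sketched as a pure validation function. The README does not spell out the actual rules, so the checks below (active flag, validity window, download quota) are assumptions inferred from the CSV columns described under Product Walkthrough; `UserState` and `validateDownload` are hypothetical names.

```typescript
// Hypothetical view of a user at download time.
interface UserState {
  isActive: boolean;
  maxDownloadCount: number;
  validFrom: number;     // Unix timestamp in milliseconds
  validTo: number;       // Unix timestamp in milliseconds
  downloadCount: number; // number of DownloadEvents recorded so far
}

type Decision = { ok: true } | { ok: false; reason: string };

// Decide whether a download may proceed at time `now` (milliseconds).
function validateDownload(user: UserState, now: number): Decision {
  if (!user.isActive) {
    return { ok: false, reason: "user is inactive" };
  }
  if (now < user.validFrom || now > user.validTo) {
    return { ok: false, reason: "outside validity window" };
  }
  if (user.downloadCount >= user.maxDownloadCount) {
    return { ok: false, reason: "download quota exhausted" };
  }
  return { ok: true };
}
```

Only when the decision is `ok` would the system go on to generate the pre-signed URL (A9) and record the DownloadEvent (A10). Those steps depend on the AWS SDK and the database, and are left out of this sketch.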

Admin Flow

B1: User visits the WebApp hosted on Vercel's Content Distribution Network.

B2: User logs in by entering their email address.

B3: System verifies if the email is valid by referencing the database, and generates a Magic Link.

B4: Upon successful validation, the system sends the user an email containing the Magic Link to the WebApp.

B5: User receives the Magic Link in their mailbox and uses it to sign in to the WebApp.

B6: Upon clicking on a valid Magic Link, the user will be directed to the admin panel of the WebApp.

B7: User uploads a CSV file containing the list of users who can download the dataset.
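
Both flows sign in through a Magic Link (A3 to A5, B3 to B5). This README does not say how the links are generated, so the following is only one possible approach: an HMAC-signed token carrying the email and an expiry time, built with Node's `crypto` module. The token format and the names `createToken` and `verifyToken` are assumptions, not the project's implementation.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Create a token of the form "<payload>.<signature>", both base64url.
function createToken(email: string, expiresAt: number, secret: string): string {
  const payload = Buffer.from(JSON.stringify({ email, expiresAt })).toString("base64url");
  const signature = createHmac("sha256", secret).update(payload).digest("base64url");
  return `${payload}.${signature}`;
}

// Return the email if the signature is valid and the token has not
// expired at time `now` (milliseconds); otherwise return null.
function verifyToken(token: string, secret: string, now: number): string | null {
  const [payload, signature] = token.split(".");
  if (!payload || !signature) return null;

  const expected = createHmac("sha256", secret).update(payload).digest();
  const given = Buffer.from(signature, "base64url");
  // timingSafeEqual throws on length mismatch, so check lengths first.
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) {
    return null;
  }

  const { email, expiresAt } = JSON.parse(Buffer.from(payload, "base64url").toString("utf8"));
  if (typeof email !== "string" || typeof expiresAt !== "number") return null;
  return now <= expiresAt ? email : null;
}
```

The token would then be embedded as a query parameter in the emailed URL. A signed token keeps the link stateless; storing a random one-time token in the database is an alternative that also allows revoking a link after first use.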

Product Walkthrough


walkthrough

1: User visits the WebApp and enters their email address.

2: User checks their inbox for the Magic Link.

3: User uses the Magic Link to log in to the WebApp and is greeted with their profile page, where they can choose to download the dataset. Admins can navigate to the Admin page to manage the users.

4: The Manage Users page shows the table of all currently active users. Admins can choose to update or add new users using the upload option. The format of the CSV file is as follows:

| email | name | affilation | maxDownloadCount | validFrom | validTo | isActive | isAdmin |
| --- | --- | --- | --- | --- | --- | --- | --- |
| string | string | string | number | number | number | true / false | true / false |

Note that the validFrom and validTo fields are unix timestamps in milliseconds.
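
A row in this format could be parsed and validated along the following lines. This is a simplified sketch: it assumes comma-separated values with no quoted fields or embedded commas, and it keeps the column names exactly as listed above. `parseUserRow` is a hypothetical helper, not the project's parser.

```typescript
interface UserRow {
  email: string;
  name: string;
  affilation: string; // spelled as in the CSV header
  maxDownloadCount: number;
  validFrom: number; // Unix timestamp in milliseconds
  validTo: number;   // Unix timestamp in milliseconds
  isActive: boolean;
  isAdmin: boolean;
}

const COLUMN_COUNT = 8;

function parseNumber(value: string, column: string): number {
  const n = Number(value);
  if (value === "" || !Number.isFinite(n)) {
    throw new Error(`${column}: expected a number, got "${value}"`);
  }
  return n;
}

function parseBool(value: string, column: string): boolean {
  if (value === "true") return true;
  if (value === "false") return false;
  throw new Error(`${column}: expected true or false, got "${value}"`);
}

// Naive split on commas; quoted fields are not supported in this sketch.
function parseUserRow(line: string): UserRow {
  const cells = line.split(",").map((c) => c.trim());
  if (cells.length !== COLUMN_COUNT) {
    throw new Error(`expected ${COLUMN_COUNT} columns, got ${cells.length}`);
  }
  const [email, name, affilation, max, from, to, active, admin] = cells;
  return {
    email,
    name,
    affilation,
    maxDownloadCount: parseNumber(max, "maxDownloadCount"),
    validFrom: parseNumber(from, "validFrom"),
    validTo: parseNumber(to, "validTo"),
    isActive: parseBool(active, "isActive"),
    isAdmin: parseBool(admin, "isAdmin"),
  };
}
```

Rejecting malformed rows before any UserEvent is written keeps a bad upload from leaving the user list partially updated.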

View a video of the walkthrough here!

Development


Quick Start

Ensure that the latest stable versions of Node, Python 3 with pip, and Docker are installed.

[ Optional ] Install LocalStack using pip.

[ Optional ] Install the following extensions/plugins in your IDE: ESLint, GitLens, Prisma, Tailwind CSS IntelliSense, CSS Modules

Clone the repository into your chosen directory and run the following commands:

// Install dependencies and setup the project.

yarn

// Start LocalStack in the background in a separate terminal
// Note that you may have to add the binary to your path. (~/.local/bin/localstack)
// LocalStack is not mandatory for local development, but is needed for simulating the downloading of dataset.

yarn localstack:start

// Start a local MySQL instance using Docker.

yarn db:start

// Start the development server. There will be some setup scripts executed before the dev server starts.
// App is usually hosted on http://localhost:3000

yarn dev

// Run all the tests in the project
// These tests should be run often as you develop on it, to catch bugs early.

yarn test

Predefined users are loaded into the database; you may inspect the data file at /src/tests/data/userEvent.json. You may choose any of the valid users defined there to log in to the application locally. For instance, you may use admin@example.com.

As the application interacts with external Amazon services, those functions are either stubbed out or depend on LocalStack when developing locally. Important information needed for development can be found in the console where yarn dev was run. For instance, the login flow prints the Magic Link to the console so that you can use it locally.

In case you wish to reset the local database, you may restart the dev server by running yarn dev again.

More commands can be found in the package.json file.
