tinyurl

A tinyurl clone service. A classic system design interview question:

How would you design TinyURL?

Visit demo to see the project in action.

Quick Start

Clone the project and run it locally.

# Setup project
./bin/setup.sh

# Run locally
./bin/local.sh

Features

  • Environments: local tests, local deploy, staging deploy, production deploy
  • Unit and integration tests
  • Bootstrap front-end
  • Deployable "serverless" app

TODO

  • Flask app configuration management
  • Python linter
  • More doc strings
  • Continuous Integration (Circle CI or Travis CI)
  • Internationalization (i18n)
  • Lambda cold starts
  • Choose at least 2 subnets for Lambda to run your functions in high availability mode
  • Automatic API docs
  • Disable push to master and require all changes via pull request
  • Analytics dashboard
  • Viral alerts
  • Tracking tags: email, blog, etc.
  • More tests
    • Flask app error handlers
    • Tests for Redis return types
    • Staging integration tests
  • Bugs
    • Duplicate logs in CloudWatch Logs, Lambda appears to modify root / flask logger

References

Design decision

I use FaaS (AWS Lambda) to host the application for a few good reasons. Since this is a relatively small project that is seldom used, Lambda is very cost effective. Hosting the application on EC2, whether through an ASG (Auto Scaling Group) or ECS (Elastic Container Service), means keeping at least one instance ready at all times regardless of traffic, and running an EC2 instance 24 hours a day costs money. Lambda, by contrast, does not require any permanently provisioned machines (although it does have cold starts) and is very scalable too.

Database

URLs can go viral, which means traffic will not be distributed uniformly across unique URLs. Assume an 80-20 rule: 80% of the traffic is generated by 20% of the URLs. This application is read heavy (redirect from TinyURL) and will have significantly fewer writes (create TinyURL).

DynamoDB vs ElastiCache: Redis

DynamoDB can be highly available for the right price, and highly scalable with the right design. However, it is not suitable for TinyURL: the partition key should uniquely identify the URL, so a viral URL concentrates its reads on a single partition, which will inevitably reach its provisioned throughput. Once this capacity is reached, DynamoDB can no longer service requests for that URL without manual intervention. Since a viral event might happen at any point in the day, reacting to read traffic after the fact is not acceptable.

Redis supports hset and hget, which do not suffer from hot partitions under heavy reads and consistently perform at O(1) time complexity. Redis can be persisted, clustered, and scaled as storage needs increase. With the right monitoring solution, capacity planning allows Redis to be scaled proactively as more storage is required.

tl;dr:

  • DynamoDB is highly available for the right price, but does not scale well under heavy read loads on a hot partition.
  • Redis does well under heavy reads, and can be scaled proactively.

Implementation

In the following examples, please note that a table is used as an abstraction to help illustrate how data is persisted. In reality there are two dictionary-like associations: short-to-long and long-to-short. The following technique is then used to determine the short ID.

A unique sequential number is tracked simply by querying the size of a hash (see hset), which has time complexity O(1). In this case, the number is the cardinality of the short-ID-to-long-URL mapping. The long-to-short mapping also helps de-duplicate records if a URL is submitted more than once.
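The bookkeeping described above can be sketched with two plain Python dictionaries standing in for the two Redis hashes. This is a minimal illustration, not the project's code: the class name `UrlStore` and its method are hypothetical, and the sequence number is returned as-is rather than converted to a short ID.

```python
class UrlStore:
    """In-memory stand-in for the two associations (hypothetical names)."""

    def __init__(self):
        self.short_to_long = {}  # sequence number -> long URL
        self.long_to_short = {}  # long URL -> sequence number

    def add(self, long_url):
        # De-duplicate: a URL that was already submitted keeps its number.
        if long_url in self.long_to_short:
            return self.long_to_short[long_url]
        # The next number is the current cardinality of short-to-long;
        # len() on a dict is O(1), like asking a Redis hash for its size.
        next_id = len(self.short_to_long)
        self.short_to_long[next_id] = long_url
        self.long_to_short[long_url] = next_id
        return next_id


store = UrlStore()
first = store.add("http://example.com")
second = store.add("http://example.org")
again = store.add("http://example.com")  # duplicate submission
print(first, second, again)  # 0 1 0
```

Using the cardinality as the next number only works if entries are never deleted, which is an assumption of this sketch.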

When a new association is to be written, there is a potential race condition among Lambda functions. To resolve this, a watch is set up for conditional execution of a transaction. If the cardinality of the hash changes before the transaction completes, it means another Lambda function has successfully created an association with the same short ID, and the write is re-attempted. In this situation, the URLs submitted by the two Lambda functions can be either identical or distinct, but both were competing for the same short ID.
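The retry loop can be illustrated without a Redis server. The sketch below models the watch-then-commit behaviour in memory: the commit is refused if the watched cardinality changed since it was read. All names here (`TinyStore`, `commit_if_unchanged`, `create`, the `interfere` hook) are hypothetical and exist only to make the race reproducible; a real deployment would rely on Redis transactions instead.

```python
class WatchError(Exception):
    """Raised when the watched value changed before the commit."""


class TinyStore:
    def __init__(self):
        self.short_to_long = {}

    def size(self):
        return len(self.short_to_long)

    def commit_if_unchanged(self, watched_size, short_id, long_url):
        # Mimics a watched transaction: refuse the write if the
        # cardinality changed since it was read.
        if self.size() != watched_size:
            raise WatchError()
        self.short_to_long[short_id] = long_url


def create(store, long_url, interfere=None, max_attempts=5):
    for _ in range(max_attempts):
        watched = store.size()  # "watch" the cardinality
        short_id = watched      # candidate ID for this attempt
        if interfere is not None:
            # Demo hook: lets a competing writer sneak in exactly once.
            hook, interfere = interfere, None
            hook()
        try:
            store.commit_if_unchanged(watched, short_id, long_url)
            return short_id
        except WatchError:
            continue  # another writer took this ID; retry with a fresh one
    raise RuntimeError("too many conflicting writers")


store = TinyStore()


def rival():
    # A competing writer claims the next ID between our read and our write.
    store.commit_if_unchanged(store.size(), store.size(), "http://example.org")


won = create(store, "http://example.com", interfere=rival)
print(won, store.short_to_long)
```

The first attempt loses ID 0 to the rival writer, so the second attempt re-reads the cardinality and succeeds with ID 1.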

Example: Long to short

Given a URL, insert it into the table, assuming the ID is automatically assigned by the database.

| id  | long                                          | short |
| --- | --------------------------------------------- | ----- |
| 125 | 'https://www.youtube.com/watch?v=dQw4w9WgXcQ' | NULL  |

Get the id (125), an auto-incremented unique identifier. Convert the id into a base-62 string ('cb'), which will be the short ID of the long-form URL. Then update the row at that id with the short ID. This can be done in the same transaction: insert, then update.

| id  | long                                          | short |
| --- | --------------------------------------------- | ----- |
| 125 | 'https://www.youtube.com/watch?v=dQw4w9WgXcQ' | 'cb'  |
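The conversion from id to short ID can be sketched as follows. The alphabet order (a-z, then A-Z, then 0-9) is an assumption, chosen because it is the ordering under which 125 maps to 'cb' as in the example; the function name `encode_base62` is hypothetical.

```python
import string

# Assumed alphabet order: a-z, A-Z, 0-9 (so that 125 -> 'cb').
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits


def encode_base62(number):
    """Convert a non-negative integer into a base-62 string."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number > 0:
        number, remainder = divmod(number, 62)
        digits.append(ALPHABET[remainder])
    # Remainders come out least-significant first, so reverse them.
    return "".join(reversed(digits))


print(encode_base62(125))  # cb
```

Here 125 = 2 × 62 + 1, so the digits are 2 ('c') and 1 ('b').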

Example: Short to long

Convert the short base-62 string ('cb') into a base-10 integer, which is used to look up the entry. Select from the table given the id (125), and return the long-form URL.

| id  | long                                          | short |
| --- | --------------------------------------------- | ----- |
| 125 | 'https://www.youtube.com/watch?v=dQw4w9WgXcQ' | 'cb'  |
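The reverse conversion can be sketched in the same way. As before, the alphabet order (a-z, A-Z, 0-9) is an assumption inferred from the 'cb' example, and `decode_base62` is a hypothetical name.

```python
import string

# Assumed alphabet order: a-z, A-Z, 0-9 (so that 'cb' -> 125).
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits


def decode_base62(short_id):
    """Convert a base-62 string back into a base-10 integer."""
    number = 0
    for char in short_id:
        # str.index raises ValueError for characters outside the alphabet.
        number = number * 62 + ALPHABET.index(char)
    return number


print(decode_base62("cb"))  # 125
```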

API

Set Endpoint

# Local development: `./bin/local.sh`
export TINYURL_ENDPOINT=http://localhost:5000

# OR, in the cloud: `./bin/provision.sh --stage staging`
export TINYURL_ENDPOINT=https://tinyurl-staging.7okyo.com

Make TinyURL

curl \
    --write-out '%{http_code}\n' \
    --request POST "${TINYURL_ENDPOINT}/api" \
    --header 'Content-Type: application/json' \
    --data '{"url": "http://example.com"}'

Search TinyURL

With Long URL

curl \
    --write-out '%{http_code}\n' \
    --request GET "${TINYURL_ENDPOINT}/api?url=http://example.com"
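A long URL that carries its own query string (`?`, `&`, `=`) becomes ambiguous when pasted raw into the `url` parameter. The sketch below percent-encodes it with the Python standard library before building the search URL. It assumes the service decodes query parameters in the usual way; the helper name `search_url` is hypothetical.

```python
from urllib.parse import urlencode


def search_url(endpoint, **params):
    """Build a search URL with percent-encoded query parameters."""
    return f"{endpoint}/api?{urlencode(params)}"


by_long = search_url("http://localhost:5000", url="http://example.com/?q=a&b=c")
by_short = search_url("http://localhost:5000", id="a")
print(by_long)
print(by_short)
```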

With Short ID

curl \
    --write-out '%{http_code}\n' \
    --request GET "${TINYURL_ENDPOINT}/api?id=a"

Redirect from TinyURL

curl \
    --write-out '%{http_code}\n' \
    --request GET "${TINYURL_ENDPOINT}/a"

Commands

| Command              | Wrapper for       | Description                                |
| -------------------- | ----------------- | ------------------------------------------ |
| ./bin/setup.sh       | N/A               | Set up project; run this before all others |
| ./bin/test.sh        | pytest            | Run tests                                  |
| ./bin/local.sh       | serverless wsgi   | Run locally                                |
| ./bin/provision.sh   | serverless deploy | Provision cloud                            |
| ./bin/deprovision.sh | serverless remove | De-provision cloud                         |
| ./bin/logs.sh        | serverless logs   | Get logs from cloud                        |

Arguments

Arguments for pytest and serverless are passed through to the underlying CLI tools. For example, to deploy to production, run ./bin/provision.sh --stage production, since ./bin/provision.sh is a wrapper for serverless deploy.

Troubleshooting

AWS DNS is unable to resolve the S3 path for the deploy. To continue developing, try switching the provider region.

Serverless: Recoverable error occurred (Inaccessible host: *.s3.amazonaws.com'. This service may not be available in the us-east-1' region.), sleeping for 5 seconds. Try 4 of 4


Lambda log collection is not supported in ca-central-1.

ServerlessError: No existing streams for the function