At a high level, my program works like this:
- It launches a server at `localhost:7777` that redirects you to a login page
- Once you log in, the program receives an OAuth token it uses to start scanning your emails (see the sketch after this list)
- We log each message to the console as we scan it, along with any public file URLs we find. At the end, we list all the unique public file URLs we found, along with some basic stats (# of emails scanned, # of successful scans, # of failed scans).
- Well, we don't actually log to the console. If you run `yarn start`, we pipe the console output to a frontail server, so you can follow the output in the browser.
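For reference, here is a minimal sketch of that auth flow, assuming Express and the official `googleapis` client. The environment-variable credentials and the `scanEmails` stub are stand-ins, not the project's actual code.

```typescript
import express from "express";
import { google } from "googleapis";

// Assumed setup: client ID/secret come from the environment;
// the real app may load them differently.
const oauth2Client = new google.auth.OAuth2(
  process.env.GOOGLE_CLIENT_ID,
  process.env.GOOGLE_CLIENT_SECRET,
  "http://localhost:7777/oauth2callback"
);

// Stand-in for the program's real scanning entry point.
async function scanEmails(auth: typeof oauth2Client): Promise<void> {
  // ... scan the inbox using the authenticated client ...
}

const app = express();

// Redirect the visitor to Google's login/consent page.
app.get("/", (_req, res) => {
  const authUrl = oauth2Client.generateAuthUrl({
    access_type: "offline",
    scope: ["https://www.googleapis.com/auth/gmail.readonly"],
  });
  res.redirect(authUrl);
});

// Google redirects back here with a code we exchange for tokens.
app.get("/oauth2callback", async (req, res) => {
  const { tokens } = await oauth2Client.getToken(req.query.code as string);
  oauth2Client.setCredentials(tokens);
  res.send("Logged in. Scanning your inbox; watch the log output.");
  await scanEmails(oauth2Client);
});

app.listen(7777);
```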
Email scanning in particular works like this:
- Get the IDs for the emails in your inbox, using Gmail's Node SDK
- For each ID, get the message associated with that ID
- Then, for each message, parse its MIME content tree to find the message's `text/plain` and `text/html` parts, as well as any plaintext file attachments
- Extract and concatenate all the text from these parts/attachments
- Use the `get-urls` package to get all the URLs from the text
- Narrow down the URLs to those that look like Google Drive or Dropbox file URLs
- Make a request to each of these URLs to determine which file URLs are public, based on the response status code (see the sketch after this list)
- Combine our URLs from all messages, de-duplicate, and return the resulting list
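To make steps 5 through 7 concrete, here is a rough sketch of the URL extraction and public-file check. The helper names, the filtering regex, and the use of Node 18's global `fetch` are my assumptions; the real helpers may differ.

```typescript
import getUrls from "get-urls";

// Step 5: pull every URL out of the concatenated message text.
function extractUrls(text: string): string[] {
  return [...getUrls(text)];
}

// Step 6: keep only URLs that look like Google Drive or Dropbox file links
// (an assumed heuristic, not necessarily the project's exact patterns).
function filterFileUrls(urls: string[]): string[] {
  return urls.filter((url) =>
    /https:\/\/(drive\.google\.com|www\.dropbox\.com)\//.test(url)
  );
}

// Step 7: treat a file URL as public if requesting it succeeds
// (assumes Node 18+ for the global fetch).
async function filterPublicUrls(urls: string[]): Promise<string[]> {
  const checks = await Promise.all(
    urls.map(async (url) => {
      try {
        const res = await fetch(url, { redirect: "follow" });
        return res.status === 200 ? url : null;
      } catch {
        return null; // network errors count as not public
      }
    })
  );
  return checks.filter((url): url is string => url !== null);
}
```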
The code is organized as follows:
- `src/index.ts` - creates the Express server for auth and includes the main function for scanning emails
- `src/message.ts` - gets message IDs and retrieves messages, for steps 1 and 2 above
- `src/lib/extract_urls.ts` - exposes `getUrlsFromMessage`, which, given a message, does steps 3 to 5 above
- `src/lib/file_url.ts` - exposes `getFileUrls`, which does step 6 above
- `src/lib/public_file_url.ts` - exposes `getPublicUrls`, which does step 7 above
- `src/lib/unique_urls.ts` - exposes `getUniqueUrls`, which de-duplicates a list of URLs for step 8 above
The module dependency graph was generated by dependency-cruiser; the sketch below shows roughly how these modules fit together.
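This is a simplified guess at the main loop in `src/index.ts`. The `getMessageIds` and `getMessage` names, and all of the signatures, are inferred from the module descriptions above rather than taken from the actual code.

```typescript
import { getMessageIds, getMessage } from "./message"; // assumed export names
import { getUrlsFromMessage } from "./lib/extract_urls";
import { getFileUrls } from "./lib/file_url";
import { getPublicUrls } from "./lib/public_file_url";
import { getUniqueUrls } from "./lib/unique_urls";
import type { OAuth2Client } from "google-auth-library";

async function scanEmails(auth: OAuth2Client): Promise<string[]> {
  const ids = await getMessageIds(auth);                  // step 1
  const allPublicUrls: string[] = [];
  let succeeded = 0;
  let failed = 0;

  for (const id of ids) {
    try {
      const message = await getMessage(auth, id);         // step 2
      const urls = await getUrlsFromMessage(message);     // steps 3-5
      const fileUrls = getFileUrls(urls);                 // step 6
      const publicUrls = await getPublicUrls(fileUrls);   // step 7
      console.log(id, publicUrls);                        // piped to frontail via yarn start
      allPublicUrls.push(...publicUrls);
      succeeded++;
    } catch (err) {
      console.error(`Failed to scan message ${id}`, err);
      failed++;
    }
  }

  console.log({ scanned: ids.length, succeeded, failed }); // basic stats
  return getUniqueUrls(allPublicUrls);                     // step 8
}
```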