At a high level, my program works like this:
- It launches a server at `localhost:7777` that redirects you to a login page
- Once you log in, the program receives an OAuth token it uses to start scanning your emails (see the sketch after this list)
- We log each message to the console as we scan it, along with any public file URLs we find. At the end, we list all the unique public file URLs we found, along with some basic stats (# of emails scanned, # of successful scans, # of failed scans).
- Well, we don't actually log to the console. If you run `yarn start`, we pipe the console output to a frontail server, so you can follow the output in the browser.
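For reference, here is a minimal sketch of that auth flow, assuming Express and the official `googleapis` client. The environment-variable credentials and the `scanEmails` stub are stand-ins, not the project's actual code.

```typescript
import express from "express";
import { google } from "googleapis";

// Assumed setup: client ID/secret come from the environment;
// the real app may load them differently.
const oauth2Client = new google.auth.OAuth2(
  process.env.GOOGLE_CLIENT_ID,
  process.env.GOOGLE_CLIENT_SECRET,
  "http://localhost:7777/oauth2callback"
);

// Stand-in for the program's real scanning entry point.
async function scanEmails(auth: typeof oauth2Client): Promise<void> {
  // ... scan the inbox using the authenticated client ...
}

const app = express();

// Redirect the visitor to Google's login/consent page.
app.get("/", (_req, res) => {
  const authUrl = oauth2Client.generateAuthUrl({
    access_type: "offline",
    scope: ["https://www.googleapis.com/auth/gmail.readonly"],
  });
  res.redirect(authUrl);
});

// Google redirects back here with a code we exchange for tokens.
app.get("/oauth2callback", async (req, res) => {
  const { tokens } = await oauth2Client.getToken(req.query.code as string);
  oauth2Client.setCredentials(tokens);
  res.send("Logged in. Scanning your inbox; watch the log output.");
  await scanEmails(oauth2Client);
});

app.listen(7777);
```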
Email scanning in particular works like this:
- Get the IDs for the emails in your inbox, using Gmail's Node SDK
- For each ID, get the message associated with that ID
- Then, for each message, parse its MIME content tree to find the message's `text/plain` and `text/html` parts, as well as any plaintext file attachments
- Extract and concatenate all the text from these parts/attachments
- Use the `get-urls` package to get all the URLs from the text
- Narrow down the URLs to those that look like Google Drive or Dropbox file URLs
- Make a request to each of these URLs to determine which file URLs are public, based on the response status code (see the sketch after this list)
- Combine our URLs from all messages, de-duplicate, and return the resulting list
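To make steps 5 through 7 concrete, here is a rough sketch of the URL extraction and public-file check. The helper names, the filtering regex, and the use of Node 18's global `fetch` are my assumptions; the real helpers may differ.

```typescript
import getUrls from "get-urls";

// Step 5: pull every URL out of the concatenated message text.
function extractUrls(text: string): string[] {
  return [...getUrls(text)];
}

// Step 6: keep only URLs that look like Google Drive or Dropbox file links
// (an assumed heuristic, not necessarily the project's exact patterns).
function filterFileUrls(urls: string[]): string[] {
  return urls.filter((url) =>
    /https:\/\/(drive\.google\.com|www\.dropbox\.com)\//.test(url)
  );
}

// Step 7: treat a file URL as public if requesting it succeeds
// (assumes Node 18+ for the global fetch).
async function filterPublicUrls(urls: string[]): Promise<string[]> {
  const checks = await Promise.all(
    urls.map(async (url) => {
      try {
        const res = await fetch(url, { redirect: "follow" });
        return res.status === 200 ? url : null;
      } catch {
        return null; // network errors count as not public
      }
    })
  );
  return checks.filter((url): url is string => url !== null);
}
```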
The code is organized as follows:
- `src/index.ts` - creates the Express server for auth and includes the main function for scanning emails
- `src/message.ts` - gets message IDs and retrieves messages, for steps 1 and 2 above
- `src/lib/extract_urls.ts` - exposes `getUrlsFromMessage`, which, given a message, does steps 3 to 5 above
- `src/lib/file_url.ts` - exposes `getFileUrls`, which does step 6 above
- `src/lib/public_file_url.ts` - exposes `getPublicUrls`, which does step 7 above
- `src/lib/unique_urls.ts` - exposes `getUniqueUrls`, which de-duplicates a list of URLs for step 8 above
The module dependency graph was generated by dependency-cruiser; the sketch below shows roughly how these modules fit together.
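This is a simplified guess at the main loop in `src/index.ts`. The `getMessageIds` and `getMessage` names, and all of the signatures, are inferred from the module descriptions above rather than taken from the actual code.

```typescript
import { getMessageIds, getMessage } from "./message"; // assumed export names
import { getUrlsFromMessage } from "./lib/extract_urls";
import { getFileUrls } from "./lib/file_url";
import { getPublicUrls } from "./lib/public_file_url";
import { getUniqueUrls } from "./lib/unique_urls";
import type { OAuth2Client } from "google-auth-library";

async function scanEmails(auth: OAuth2Client): Promise<string[]> {
  const ids = await getMessageIds(auth);                  // step 1
  const allPublicUrls: string[] = [];
  let succeeded = 0;
  let failed = 0;

  for (const id of ids) {
    try {
      const message = await getMessage(auth, id);         // step 2
      const urls = await getUrlsFromMessage(message);     // steps 3-5
      const fileUrls = getFileUrls(urls);                 // step 6
      const publicUrls = await getPublicUrls(fileUrls);   // step 7
      console.log(id, publicUrls);                        // piped to frontail via yarn start
      allPublicUrls.push(...publicUrls);
      succeeded++;
    } catch (err) {
      console.error(`Failed to scan message ${id}`, err);
      failed++;
    }
  }

  console.log({ scanned: ids.length, succeeded, failed }); // basic stats
  return getUniqueUrls(allPublicUrls);                     // step 8
}
```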