The source code in this repository is a quick implementation that automates data retrieval from a complex web application.
This repository is built mainly around Node.js and has been tested with v14.17.0 LTS.
Clone the repository, enter the directory it was cloned into, and install the dependencies with npm install.
# - HEADLESS={true, false}
#   true runs without showing the browser, false shows the browser
# - DEBUG=laevitas-scaper*
#   logs the results of the individual steps
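As a rough illustration of how a flag like HEADLESS could be read inside a Node script, the sketch below maps the environment variable to a launch-options object. The helper name `launchOptionsFromEnv` is hypothetical and is not taken from this repository. The only assumption about Puppeteer is that `puppeteer.launch` accepts an options object with a `headless` field.

```javascript
// Hypothetical helper (not from this repository): derive browser launch
// options from the HEADLESS environment variable.
function launchOptionsFromEnv(env = process.env) {
  // Assumption: anything other than the literal string "false" means headless,
  // so an unset variable behaves like HEADLESS=true.
  const raw = env.HEADLESS === undefined ? 'true' : String(env.HEADLESS);
  return { headless: raw.trim().toLowerCase() !== 'false' };
}

// The resulting object could be handed to puppeteer.launch(options).
console.log(launchOptionsFromEnv({ HEADLESS: 'false' })); // { headless: false }
```

Defaulting to headless when the variable is unset matches the first example in the run list, which sets no flags and runs without debugging output.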
Different ways in which this program can be run:
1. get eth expiries without debugging or logging
node index.js eth
2. manual call to show browser and log everything for btc
HEADLESS=false DEBUG=laevitas-scaper* node index.js btc
3. same as the above but without showing browser
npm run verbose
4. same as 2 but shorthand
npm run debug
5. same as 1 but for btc
npm run start
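The commands above pass the currency as the first command-line argument. A minimal sketch of how such an argument might be read and validated follows. The function `currencyFromArgv` and the `SUPPORTED` list are hypothetical names, and rejecting a missing argument is an assumption. The actual index.js may behave differently.

```javascript
// Hypothetical sketch (not from this repository): read the currency argument
// passed as `node index.js <currency>`.
const SUPPORTED = ['btc', 'eth']; // assumption: only the currencies shown in the examples

function currencyFromArgv(argv = process.argv) {
  // argv[0] is the node binary, argv[1] is the script path,
  // argv[2] is the first user-supplied argument.
  const arg = (argv[2] || '').toLowerCase();
  if (!SUPPORTED.includes(arg)) {
    throw new Error(`Expected one of ${SUPPORTED.join(', ')}, got "${arg}"`);
  }
  return arg;
}

console.log(currencyFromArgv(['node', 'index.js', 'eth'])); // eth
```

Failing early on an unknown argument keeps a mistyped currency from starting a browser session that cannot return useful data.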
The repository has been optimised to run on Heroku. The steps below set it up to run as a one-time process that sends an email with the download link for the scraped data. This process can be run on demand or at a set interval. All steps and execution will be described below. [WIP]
Some resources that are useful to have a look at:
- discord.js: npm discord package
- aws-sdk: npm aws-sdk package
- heroku scheduler: heroku scheduler add-on
- promise-pool: npm promise-pool package
- lodash: most epic toolbelt for JavaScript coding
- puppeteer: headless Chrome browser automation