reference-archive

Extract reference URLs and DOIs from an IEEE-style reference list and download the references to the filesystem.

Features

- downloads webpages as a single file in MHTML format
  - including client-rendered content
  - uses puppeteer-extra-plugin-stealth to evade anti-bot measures
- downloads PDF files and other file types from URLs
- downloads papers with a DOI from Sci-Hub
  - if you make use of this feature, consider donating to Sci-Hub
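
The split between "save as MHTML" and "download as a file" could be decided from the response's Content-Type header. The sketch below is an assumption about how such a decision might look, not the logic in lib.js; `chooseSaveStrategy`, the `EXTENSIONS` table, and the `.bin` fallback are hypothetical names and choices made for illustration.

```javascript
// Hypothetical lookup table: a few MIME types mapped to file extensions.
const EXTENSIONS = {
  "application/pdf": ".pdf",
  "text/plain": ".txt",
  "application/zip": ".zip",
};

// Hypothetical helper (not from lib.js): decides whether a URL should be
// captured as a single-file web page or downloaded directly.
function chooseSaveStrategy(contentTypeHeader) {
  // Drop parameters such as "; charset=utf-8" and normalize the case.
  const mimeType = (contentTypeHeader || "").split(";")[0].trim().toLowerCase();
  if (mimeType === "text/html" || mimeType === "application/xhtml+xml") {
    return { strategy: "mhtml", extension: ".mhtml" };
  }
  // Anything else is treated as a plain download; unknown types get ".bin".
  return { strategy: "download", extension: EXTENSIONS[mimeType] ?? ".bin" };
}
```

For the `mhtml` branch, one option in a Chromium-based browser is the Chrome DevTools Protocol command `Page.captureSnapshot` with `format: "mhtml"`; whether this project uses that command is not stated here.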

Usage

This software requires Node.js version 18 or newer.

Before first use, install the dependencies:

npm install  # or pnpm install, or yarn install, etc.

Use from the command line

1. Put your references in a plain-text file, e.g. references.txt
2. Create a target directory, e.g. ./archive
3. Run:

node index.js references.txt ./archive
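
The command takes two positional arguments: the references file and the target directory. As a minimal sketch of how a script could validate them, assuming nothing about how index.js actually does it, the hypothetical helper `parseCliArgs` below rejects missing or extra arguments.

```javascript
// Hypothetical helper (not taken from index.js): validates the two
// positional arguments used in the command above.
function parseCliArgs(argv) {
  // argv[0] is the node binary and argv[1] the script path, so the
  // user-supplied arguments start at index 2.
  const [referencesFile, targetDirectory, ...rest] = argv.slice(2);
  if (!referencesFile || !targetDirectory || rest.length > 0) {
    throw new Error("Usage: node index.js <references-file> <target-directory>");
  }
  return { referencesFile, targetDirectory };
}
```

In a script it would be called as `parseCliArgs(process.argv)`.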

Use as a library

import { extractAndSaveAllURLs } from "./lib.js";

await extractAndSaveAllURLs(
    '[1] First reference. [Online]. Available: https://jfhr.de/reference-archive/example.pdf (Accessed: 2023-07-23)\n' +
    '[2] Second reference. [Online]. Available: https://jfhr.de/reference-archive/example.html (Accessed: 2023-07-23)\n' +
    '[3] Third reference. [Online]. Available: https://jfhr.de/reference-archive/cr.html (Accessed: 2023-07-23)\n' +
    '[4] S. DeRisi, R. Kennison and N. Twyman, The What and Whys of DOIs. doi: 10.1371/journal.pbio.0000057\n' +
    '[5] N. Paskin, "Digital Object Identifier (DOI) System", Encyclopedia of Library and Information Sciences (3rd ed.)\n',
    './archive/'
);

Example

Say you have the following reference list:

[1] First reference. [Online]. Available: https://jfhr.de/reference-archive/example.pdf (Accessed: 2023-07-23)
[2] Second reference. [Online]. Available: https://jfhr.de/reference-archive/example.html (Accessed: 2023-07-23)
[3] Third reference. [Online]. Available: https://jfhr.de/reference-archive/cr.html (Accessed: 2023-07-23)
[4] S. DeRisi, R. Kennison and N. Twyman, The What and Whys of DOIs. doi: 10.1371/journal.pbio.0000057
[5] N. Paskin, "Digital Object Identifier (DOI) System", Encyclopedia of Library and Information Sciences (3rd ed.)

reference-archive would download the following files to your filesystem:

1.pdf    # PDF file from URL
2.mhtml  # Single file web page from URL
3.mhtml  # Single file web page from URL, including client-rendered content
4.pdf    # PDF file with DOI from sci-hub
         # no 5.* - reference 5 has no URL and no DOI
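
The mapping from reference numbers to downloads can be sketched with two regular expressions. This is an illustration under stated assumptions, not the parser in lib.js: `extractReferences` is a hypothetical name, each reference is assumed to sit on one line starting with `[n]`, a URL is assumed to take precedence over a DOI, and DOIs are assumed to be introduced by `doi:`.

```javascript
// Hypothetical helper (not from lib.js): finds the first URL or DOI in each
// numbered entry of an IEEE-style reference list.
function extractReferences(text) {
  const results = [];
  for (const line of text.split(/\r?\n/)) {
    const numberMatch = line.match(/^\s*\[(\d+)\]/);
    if (!numberMatch) continue; // skip lines that do not start with "[n]"
    const number = Number(numberMatch[1]);

    const urlMatch = line.match(/https?:\/\/[^\s)>\]]+/);
    const doiMatch = line.match(/\bdoi:\s*(10\.\d{4,9}\/[^\s"<>]+)/i);
    // Strip sentence punctuation that may trail the identifier.
    const trim = (value) => value.replace(/[.,;]+$/, "");

    if (urlMatch) {
      results.push({ number, type: "url", value: trim(urlMatch[0]) });
    } else if (doiMatch) {
      results.push({ number, type: "doi", value: trim(doiMatch[1]) });
    }
    // Entries with neither a URL nor a DOI produce no result.
  }
  return results;
}
```

Applied to the list above, this sketch yields URL entries for references 1 to 3, a DOI entry for reference 4, and nothing for reference 5, which matches the file listing.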

Test

To run tests:

node --test
