
Add automated way to fetch a local snapshot of domains #47

Open
mrchrisadams opened this issue Nov 22, 2021 · 5 comments
@mrchrisadams (Member)

We present daily snapshots of the green domains dataset here, using Datasette:

https://datasets.thegreenwebfoundation.org/

We also have an endpoint for fetching the compressed dumps:

https://api.thegreenwebfoundation.org/admin/green-urls

Having an npm script to fetch these would make it easy to perform fast local lookups instead of hammering the API all the time.
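As a sketch of what such a script could start with: pick the newest dump link out of the green-urls listing page. The filename pattern matched below is an assumption; the real listing at https://api.thegreenwebfoundation.org/admin/green-urls should be inspected and the regex adjusted.

```javascript
// Sketch: find the newest compressed dump linked from the listing page.
// The `green_urls...db.gz` href pattern is an assumption about the listing.
function latestDumpUrl(listingHtml, baseUrl = "https://api.thegreenwebfoundation.org") {
  const matches = [...listingHtml.matchAll(/href="([^"]*green_urls[^"]*\.db\.gz)"/g)]
    .map((m) => m[1]);
  if (matches.length === 0) return null;
  // Filenames embed an ISO date, so a lexicographic sort surfaces the newest.
  matches.sort();
  const latest = matches[matches.length - 1];
  return latest.startsWith("http") ? latest : baseUrl + latest;
}
```

This could then be wired up as an npm script (e.g. a hypothetical `"fetch-snapshot": "node scripts/fetch-snapshot.js"` entry) that downloads and unpacks the returned URL.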

@fershad (Contributor) commented Mar 8, 2023

@mrchrisadams was this ever implemented, or should I put it on the backlog in our roadmap?

@mrchrisadams (Member, Author)

Hi @fershad, I think sitespeed use this. They have a daily script that uses something like wget to fetch the latest file from the list below, then converts it to a JSON file, which they load and store in memory for fast lookups:

https://api.thegreenwebfoundation.org/admin/green-urls

You can see some of the logic here:

https://github.com/thegreenwebfoundation/co2.js/blob/main/src/hosting-json.node.js#L22

TBH, now that localStorage is implemented natively in a fair few browsers as well as runtimes like Deno, it might be worth revisiting this to natively support local lookups, as there are speed benefits as well as privacy ones.

Also, our snapshots have a "last modified" column, and because the most popular URLs tend to be the most recently updated ones, we could use that column to make different-sized snapshots based on the required use case. This would be ideal for the browser extension.

https://developer.mozilla.org/en-US/docs/Web/API/Web_Storage_API
https://deno.land/manual@v1.31.1/runtime/web_storage_api
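As a rough sketch of how those two ideas could fit together, the snippet below trims a snapshot to recently modified rows and loads it into a Web Storage-style store. The row shape (`url`, `modified`) is an assumed simplification of the snapshot schema, not the actual column layout.

```javascript
// Sketch: trim a snapshot to recently-modified rows, then load it into any
// Web Storage-style store ({getItem, setItem}) such as localStorage in
// browsers or Deno. The row shape here is an assumption about the schema.
function trimSnapshot(rows, sinceIso) {
  return rows.filter((row) => row.modified >= sinceIso);
}

function loadIntoStorage(rows, storage, key = "greendomains") {
  storage.setItem(key, JSON.stringify(rows.map((row) => row.url)));
}

function isGreen(domain, storage, key = "greendomains") {
  const raw = storage.getItem(key);
  if (!raw) return false;
  return JSON.parse(raw).includes(domain);
}
```

In a browser or Deno, the global `localStorage` object could be passed in as `storage` directly.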

@soulgalore (Contributor)

Hi! No, actually we do not update it automatically; I manually update it when I remember, so if we could have a built-in way to do that, it would be great :)

@tackaberry

Hi @mrchrisadams and @fershad. I can pick this one up.

Would it be accurate to say that both a Node (e.g. run as an npm script, save to disk) and a browser (e.g. download into localStorage) approach are needed?

@tackaberry

I have part of this done already; it's the Node side of things. I'm not 100% clear on the use case, but we could work through it together.

https://github.com/tackaberry/co2.js/blob/bt/fetch_greenurls_db/data/functions/fetch_greenurls_db.js

Running this as `node data/functions/fetch_greenurls_db.js` downloads the db and creates a local JSON file, or `node data/functions/fetch_greenurls_db.js 2021-01-01` does the same thing for the next version after a given date.

It creates files like:

/co2.js$ ls -la domains*
-rwxrwxrwx 1 b b 335863808 May  9 13:30 domains.db
-rwxrwxrwx 1 b b 101046280 May  9 13:30 domains.db.gz
-rwxrwxrwx 1 b b    183437 May  9 13:30 domains.json

This wouldn't work in a browser, but we could adapt it.
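For the browser side, one adaptation could be to fetch the generated JSON once and keep a Set in memory for O(1) lookups. This assumes `domains.json` is an array of domain strings; the real file's shape should be checked first.

```javascript
// Sketch of a browser-side adaptation: fetch the generated JSON once and
// hold a Set in memory for fast membership checks. Assumes domains.json
// is an array of domain strings (an assumption to verify).
async function loadGreenDomains(url = "/domains.json") {
  const res = await fetch(url);
  const domains = await res.json();
  return new Set(domains);
}

function checkDomain(domain, greenSet) {
  return greenSet.has(domain);
}
```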
