API and UI for bulk loading data into Datasette from a URL
Install this plugin in the same environment as Datasette.
datasette install datasette-load
This plugin does not require configuration - by default it downloads files to the system temp directory and swaps them into the current working directory once they have been verified as valid SQLite.
The plugin provides two optional settings to control which directories are used here:
plugins:
  datasette-load:
    staging_directory: /tmp
    database_directory: /home/location
staging_directory is used for the initial download. Files will be deleted from here if the download fails.
If the download succeeds (and the database integrity check passes) the file will be moved into the database_directory folder. This defaults to the directory in which the Datasette application was started if you do not otherwise configure it.
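The stage-then-move flow described above can be illustrated with a small sketch. This is not the plugin's own code: the function names `is_valid_sqlite` and `promote`, the use of `PRAGMA integrity_check` as the verification step, and the `{name}.db` target filename are all assumptions made for illustration.

```python
import shutil
import sqlite3
from pathlib import Path


def is_valid_sqlite(path: Path) -> bool:
    """Return True if SQLite can open the file and its integrity check passes."""
    # Open read-only so a missing file is never created as a side effect.
    uri = path.resolve().as_uri() + "?mode=ro"
    try:
        conn = sqlite3.connect(uri, uri=True)
        try:
            rows = conn.execute("PRAGMA integrity_check").fetchall()
        finally:
            conn.close()
    except sqlite3.DatabaseError:
        # Raised for files that are missing, unreadable or not SQLite at all.
        return False
    return rows == [("ok",)]


def promote(staged: Path, database_directory: Path, name: str) -> Path:
    """Move a verified download out of staging, or delete it if it is invalid."""
    if not is_valid_sqlite(staged):
        staged.unlink(missing_ok=True)
        raise ValueError(f"{staged.name} is not a valid SQLite database")
    # Hypothetical naming scheme: the database name plus a .db extension.
    target = database_directory / f"{name}.db"
    shutil.move(str(staged), str(target))
    return target
```

Keeping the download in a separate staging directory means a partial or corrupt file never appears in the directory that Datasette serves databases from.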
Users and API tokens with the datasette-load permission can visit /-/load where they can provide a URL to a SQLite database file and the name it should use within Datasette to trigger a download of that SQLite database.
You can assign that permission to the root user by starting Datasette like this:
datasette -s permissions.datasette-load.id root --root
Or with the following configuration in the datasette -c datasette.yaml file:
permissions:
  datasette-load:
    id: root
API tokens with that permission can use this API:
POST /-/load
{"url": "https://s3.amazonaws.com/til.simonwillison.net/tils.db", "name": "tils"}
This tells Datasette to download the SQLite database from the given URL and use it to create (or replace) the /tils database in the Datasette instance.
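As a rough sketch, the POST request could be sent from Python using only the standard library. The helper names `build_load_request` and `start_load`, the `http://localhost:8001` base URL and the `dstok_example` token are hypothetical, and the example assumes the instance accepts API tokens in an `Authorization: Bearer` header.

```python
import json
import urllib.request


def build_load_request(
    base_url: str, token: str, db_url: str, name: str
) -> urllib.request.Request:
    """Build the POST /-/load request without sending it."""
    body = json.dumps({"url": db_url, "name": name}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/-/load",
        data=body,
        method="POST",
        headers={
            # Assumption: the token is passed as a bearer token.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def start_load(request: urllib.request.Request) -> dict:
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    request = build_load_request(
        "http://localhost:8001",  # hypothetical Datasette instance
        "dstok_example",  # hypothetical API token
        "https://s3.amazonaws.com/til.simonwillison.net/tils.db",
        "tils",
    )
    print(start_load(request))
```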
That API endpoint returns:
{
  "id": "1D2A2328-199E-4D4D-AF3B-967131ADB795",
  "url": "https://s3.amazonaws.com/til.simonwillison.net/tils.db",
  "name": "tils",
  "done": false,
  "error": null,
  "todo_bytes": 20250624,
  "done_bytes": 0,
  "status_url": "https://blah.datasette/-/load/status/1D2A2328-199E-4D4D-AF3B-967131ADB795"
}
The status_url can be polled for completion. It will return the same JSON format.
When the download has finished the API will return "done": true and either "error": null if it worked or "error": "error description" if something went wrong.
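A polling loop over the status URL might look like the following sketch. The names `fetch_json` and `wait_for_load`, the two-second interval and the poll limit are illustrative choices, not part of the plugin. The default `fetch_json` sends no credentials, so if the status endpoint requires a token, pass in a `fetch` function that adds the appropriate header.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_json(url: str) -> dict:
    """Fetch a URL and decode the JSON body (no authentication is sent)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def wait_for_load(
    status_url: str,
    fetch: Callable[[str], dict] = fetch_json,
    interval: float = 2.0,
    max_polls: int = 300,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll status_url until "done" is true, then return the final status."""
    for _ in range(max_polls):
        status = fetch(status_url)
        if status["done"]:
            if status["error"] is not None:
                # The download finished but failed; surface the description.
                raise RuntimeError(status["error"])
            return status
        sleep(interval)
    raise TimeoutError(f"load did not finish after {max_polls} polls")
```

The `fetch` and `sleep` parameters are injectable so the loop can be exercised without a network connection or real delays.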
To set up this plugin locally, first check out the code. Then create a new virtual environment:
cd datasette-load
python -m venv venv
source venv/bin/activate
Now install the dependencies and test dependencies:
pip install -e '.[test]'
To run the tests:
python -m pytest