Merge pull request #1721 from CentreForDigitalHumanities/feature/docu…
…ment-frontend-settings update + expand documentation
Showing 17 changed files with 362 additions and 116 deletions.
# Downloads

I-analyzer offers several types of downloads to users. This document gives a high-level overview of the types of downloads that exist and where they are implemented.

## Downloading search results

We distinguish between two types of downloads: *direct* downloads and *scheduled* downloads.

For the user, a direct download means their browser starts downloading the file immediately. With a scheduled download, the user receives an [email](./Email.md) when their download is complete. Scheduled downloads are only available if the user is signed in.

I-analyzer automatically chooses which type of download to use, based on the number of documents. The cutoff point is configured in the [frontend environment](./Frontend-environment-settings.md#directdownloadlimit).
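The choice described above can be illustrated with a minimal sketch. This is not the actual implementation (the real check lives in the frontend); the function name, the parameter names, and the assumption that the limit is inclusive are all hypothetical. Sign-in requirements are not modelled.

```python
def choose_download_type(n_documents: int, direct_download_limit: int) -> str:
    """Return 'direct' or 'scheduled' for a result set of the given size."""
    # Assumption: a result set exactly at the limit is still downloaded directly.
    if n_documents <= direct_download_limit:
        return "direct"
    return "scheduled"


print(choose_download_type(500, direct_download_limit=1000))
print(choose_download_type(5000, direct_download_limit=1000))
```

Keeping the decision in one small function like this would make the cutoff easy to test independently of the download code itself.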

### Direct downloads

Direct downloads are executed synchronously. There is an API endpoint to request the download, which returns the requested file.

### Scheduled downloads

Scheduled downloads are run with [Celery](./Celery.md).

The server queries Elasticsearch to fetch matching documents. This is done in batches of 10,000 documents using the [scroll API](https://elasticsearch-py.readthedocs.io/en/v8.15.1/api/elasticsearch.html#elasticsearch.client.Elasticsearch.scroll).

Documents are written per batch to a CSV file in the server file system ([configured with `CSV_FILES_PATH`](./Django-project-settings.md#csv_files_path)). This means the server does not need to store the complete results in memory.
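The batch-wise writing can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: `fetch_batches` is a hypothetical stand-in for scrolling through Elasticsearch results, and `write_batches`, the field names, and the temporary file path are invented for the example. The point is that each batch is written as soon as it arrives, so only one batch needs to be in memory at a time.

```python
import csv
import os
import tempfile
from typing import Dict, Iterable, Iterator, List


def fetch_batches(
    documents: List[Dict[str, str]], size: int = 10_000
) -> Iterator[List[Dict[str, str]]]:
    """Hypothetical stand-in for scrolling through search results in batches."""
    for start in range(0, len(documents), size):
        yield documents[start:start + size]


def write_batches(
    batches: Iterable[List[Dict[str, str]]], path: str, fieldnames: List[str]
) -> int:
    """Write each batch to the CSV as it arrives; return the number of rows written."""
    written = 0
    with open(path, "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
        writer.writeheader()
        for batch in batches:
            writer.writerows(batch)  # only the current batch is needed here
            written += len(batch)
    return written


# Small demonstration with 25 fake documents and a batch size of 10.
documents = [{"id": str(i), "content": f"document {i}"} for i in range(25)]
path = os.path.join(tempfile.mkdtemp(), "results.csv")
n_rows = write_batches(fetch_batches(documents, size=10), path, ["id", "content"])
print(n_rows)
```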

When the CSV file is complete, the user receives an email.

When the user downloads the complete file, they can choose additional options; currently, the only option is the file encoding. (We offer UTF-16 encoding for compatibility with Microsoft Excel.)

Converting the file encoding takes less time than fetching the data, so it is handled at this point rather than in the initial processing. It also means the user can request a different encoding without redoing the download.

When the user requests the download, the backend either streams the file as-is or, if the encoding needs to be changed, saves a *converted* CSV file and streams that.
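A minimal sketch of this "convert only if needed" step, using only the Python standard library. The function name, the naming of the converted file, and the assumption that the stored file is UTF-8 are hypothetical and only serve the illustration.

```python
import os
import tempfile


def converted_csv(path: str, encoding: str, stored_encoding: str = "utf-8") -> str:
    """Return a path to the CSV in the requested encoding, converting only if needed."""
    if encoding == stored_encoding:
        return path  # nothing to do: the stored file can be streamed as-is
    root, extension = os.path.splitext(path)
    converted_path = f"{root}_{encoding}{extension}"
    # newline="" leaves the CSV line endings untouched when reading and writing
    with open(path, "r", encoding=stored_encoding, newline="") as source, \
            open(converted_path, "w", encoding=encoding, newline="") as target:
        for line in source:  # copy line by line instead of loading the whole file
            target.write(line)
    return converted_path


# Small demonstration: store a UTF-8 file, then request a UTF-16 copy.
source_path = os.path.join(tempfile.mkdtemp(), "results.csv")
with open(source_path, "w", encoding="utf-8", newline="") as csv_file:
    csv_file.write("id,content\r\n1,café\r\n")

utf16_path = converted_csv(source_path, "utf-16")
print(utf16_path != source_path)
```

Because the conversion reads from the stored file, requesting another encoding later does not require fetching the documents again.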

## Downloading visualisation results

### Downloading image files

When a user views a visualisation, they can always choose between a graphical view and a table.

With the graphical view, the user can download the graph as a PNG file. We use the `html-to-image` library to render the image from the page. The [VisualizationComponent](../frontend/src/app/visualization/visualization.component.ts) contains a method to select the HTML element that should be rendered, based on the type of visualisation.

### Downloading table data

The table view can be downloaded as a CSV file. This file is generated by the frontend, using the data it already has available.

### Downloading full data

Some visualisations base their results on a sample of documents to limit computation time, but offer the user an option to download statistics for the full data.

This applies to the term frequency visualisation and the ngram visualisation.

For these downloads, a request is sent to the backend and handled asynchronously, similar to a scheduled download. When the user downloads the file, they can choose the encoding and also pick between long and wide format.
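To illustrate the difference between the two formats: in long format, each row holds a single value; in wide format, each row holds all values for one index entry. The sketch below reshapes long rows into wide rows. The column names (`year`, `term`, `frequency`) and the helper `long_to_wide` are assumptions made for this example and may not match the columns in the actual download.

```python
from typing import Dict, List


def long_to_wide(
    rows: List[Dict[str, object]], index: str, column: str, value: str
) -> List[Dict[str, object]]:
    """Reshape long rows into one wide row per distinct value of `index`."""
    wide: Dict[object, Dict[str, object]] = {}
    for row in rows:
        key = row[index]
        # Create the wide row on first sight of this index value
        wide_row = wide.setdefault(key, {index: key})
        # The value of `column` becomes a column name in the wide row
        wide_row[str(row[column])] = row[value]
    return list(wide.values())


long_rows = [
    {"year": "1900", "term": "tree", "frequency": 3},
    {"year": "1900", "term": "leaf", "frequency": 1},
    {"year": "1901", "term": "tree", "frequency": 5},
    {"year": "1901", "term": "leaf", "frequency": 2},
]
wide_rows = long_to_wide(long_rows, index="year", column="term", value="frequency")
for wide_row in wide_rows:
    print(wide_row)
```

Long format tends to suit further processing in statistical software, while wide format is often easier to read in a spreadsheet.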

## Downloads in the database

The [`Download` model](../backend/download/models.py) is used to keep track of a user's downloads.

The table includes all search results downloads, and full data downloads for visualisations. It does not include other visualisation downloads, as those are generated in the frontend.

## Download limit

Each user account has a download limit. By default, this is 10,000 documents. You can change this in the admin site to allow individual users to download more documents.

Use this with caution on production servers. Note that the server may also have request timeouts that effectively prevent users from downloading large files, even if they are allowed to generate them.