Dataset management
The website content is read from the database; it is not stored in static HTML files. The database is populated from three spreadsheet files:
- DatasetInformation.xlsx
- DatasetFiles.xlsx
- DatasetMetaInformation.xlsx
All these files are located in the directory assets/datasets.
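Before loading the data, it can help to confirm that all three spreadsheets are present. The following is a minimal sketch, not part of the project: the helper name `missing_spreadsheets` is hypothetical, while the directory and file names are the ones listed above.

```python
from pathlib import Path

# File names as listed above; the helper itself is a hypothetical illustration.
EXPECTED_SPREADSHEETS = (
    "DatasetInformation.xlsx",
    "DatasetFiles.xlsx",
    "DatasetMetaInformation.xlsx",
)


def missing_spreadsheets(base_dir="assets/datasets"):
    """Return the expected spreadsheet names that are absent from base_dir."""
    base = Path(base_dir)
    return [name for name in EXPECTED_SPREADSHEETS if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_spreadsheets()
    if missing:
        print("Missing spreadsheets:", ", ".join(missing))
    else:
        print("All spreadsheets found.")
```

Run from the repository root, the script reports any file that is missing from assets/datasets.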
In the following sections, we will explain in detail what the columns in these files mean.
DatasetInformation.xlsx: descriptive information of the datasets.
ID: search-engine-optimized (human- and machine-readable) dataset ID that uniquely identifies a dataset.
Collector: the entity collecting the dataset, generally either a partner organization or a Georgia Tech department.
Title: the title of the dataset that appears on the website.
DateFrom: the start date of the period in which the data was collected. Each dataset is labeled with its collection period.
DateTo: the end date of the collection period. If DateFrom and DateTo match, the date is displayed only once.
Description: the description of the dataset that appears on the website.
ImageCaption: the explanatory caption that appears when we click on a dataset image.
ImageFileName: search engine optimized file name of the accompanying image/visualization for the dataset. All images are stored in the folder assets/media/dataset-visualizations
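The DateFrom/DateTo rule can be illustrated with a short sketch. This is not the website's code: the function name `collection_period_label` is hypothetical, and the ISO date format and the word "to" as separator are assumptions, since the actual display format is not specified here.

```python
from datetime import date


def collection_period_label(date_from, date_to):
    """Build a collection-period label from DateFrom and DateTo.

    When both dates match, the date is shown once instead of twice.
    The ISO format and the "to" separator are assumptions.
    """
    if date_from == date_to:
        return date_from.isoformat()
    return f"{date_from.isoformat()} to {date_to.isoformat()}"


if __name__ == "__main__":
    print(collection_period_label(date(2019, 1, 1), date(2019, 12, 31)))
    print(collection_period_label(date(2020, 5, 4), date(2020, 5, 4)))
```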
DatasetFiles.xlsx: file(s) of the datasets. Some datasets may have multiple files associated with them; for example, they may have shapefiles as well as relational (table) data. These files can be downloaded by site visitors.
ID: dataset ID (the same as defined in the DatasetInformation).
FileName: the name of the dataset file.
Format: the format of the dataset, which appears as tags on the website.
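Because one dataset can have several rows in this file, the rows are naturally grouped by ID. The sketch below shows one way to do that, assuming each row has already been read into a dictionary keyed by the column names above. The function name `files_by_dataset` and the sample values are made up for illustration.

```python
from collections import defaultdict


def files_by_dataset(rows):
    """Group (FileName, Format) pairs by dataset ID, preserving row order."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["ID"]].append((row["FileName"], row["Format"]))
    return dict(grouped)


if __name__ == "__main__":
    # Made-up sample rows; real rows come from DatasetFiles.xlsx.
    sample_rows = [
        {"ID": "example-dataset", "FileName": "example.zip", "Format": "shapefile"},
        {"ID": "example-dataset", "FileName": "example.csv", "Format": "csv"},
        {"ID": "other-dataset", "FileName": "other.csv", "Format": "csv"},
    ]
    for dataset_id, files in files_by_dataset(sample_rows).items():
        print(dataset_id, files)
```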
DatasetMetaInformation.xlsx: column descriptions of the datasets. This is what is displayed on the Meta Information tab.
DatasetId: dataset ID (the same as defined in the DatasetInformation).
Feature: column name. It does not have to match the actual column in the dataset file, since it is used only for display. Not all columns have to be listed.
Description: description of the column.
Comment: if the column contains a special term, an explanatory comment can be attached to it. The comment appears as a question mark that displays the comment text on hover.
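Since the ID column of DatasetFiles and the DatasetId column of DatasetMetaInformation both refer to IDs defined in DatasetInformation, a consistency check across the three files can catch typos early. The following is a minimal sketch under the assumption that each spreadsheet has been read into a list of dictionaries keyed by column name. The function name `unknown_dataset_ids` is hypothetical, and whether the import command performs such a check is not stated here.

```python
def unknown_dataset_ids(info_rows, file_rows, meta_rows):
    """Return referenced dataset IDs that are not defined in DatasetInformation."""
    known = {row["ID"] for row in info_rows}
    referenced = {row["ID"] for row in file_rows}
    referenced |= {row["DatasetId"] for row in meta_rows}
    return sorted(referenced - known)


if __name__ == "__main__":
    # Made-up sample rows for illustration only.
    info = [{"ID": "example-dataset"}]
    files = [{"ID": "example-dataset"}, {"ID": "exmaple-dataset"}]
    meta = [{"DatasetId": "example-dataset"}]
    print(unknown_dataset_ids(info, files, meta))
```

An empty result means every file and meta-information row points at a dataset that exists.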
Running the command python manage.py import_data loads the data into the database. It executes the code in restapi/management/commands/import_data.py.
To update the website after editing the spreadsheets, re-run the import data command. The command deletes all data and re-imports everything from the spreadsheets. This approach should be revisited when the data grows larger.
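The delete-and-reimport pattern can be sketched as follows. This is an illustration only and not the code in import_data.py: it uses Python's built-in sqlite3 module as a stand-in for the project's database layer, and the table name `dataset`, its columns, and the function name `reimport_all` are hypothetical. Wrapping both steps in one transaction means a failed import leaves the previous data in place.

```python
import sqlite3


def reimport_all(conn, datasets):
    """Delete every row, then insert the given (id, title) pairs atomically."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("DELETE FROM dataset")
        conn.executemany(
            "INSERT INTO dataset (id, title) VALUES (?, ?)", datasets
        )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dataset (id TEXT PRIMARY KEY, title TEXT)")
    reimport_all(conn, [("a", "First"), ("b", "Second")])
    # A second run replaces the earlier contents entirely.
    reimport_all(conn, [("a", "First, edited")])
    print(conn.execute("SELECT id, title FROM dataset ORDER BY id").fetchall())
```

The cost of this pattern grows with the total amount of data rather than with the size of the change, which is why it becomes less attractive as the spreadsheets grow.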