
Dataset management

tonnpa edited this page Mar 19, 2018 · 3 revisions

How do datasets appear on the website?

Dataset content on the website is read from the database; it is not stored in static HTML files. The database is populated from three spreadsheet files:

  1. DatasetInformation.xlsx
  2. DatasetFiles.xlsx
  3. DatasetMetaInformation.xlsx

All three files are located in the directory assets/datasets.
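As a quick pre-flight check before importing, the sketch below verifies that the three spreadsheets exist in assets/datasets. The helper name `missing_spreadsheets` is hypothetical and not part of the project; it only relies on Python's standard `pathlib`.

```python
from pathlib import Path

# File names and folder as listed on this page.
SPREADSHEETS = (
    "DatasetInformation.xlsx",
    "DatasetFiles.xlsx",
    "DatasetMetaInformation.xlsx",
)


def missing_spreadsheets(project_root: Path) -> list[str]:
    """Return the expected spreadsheets not found under assets/datasets.

    Hypothetical helper for illustration; not part of the project's code.
    """
    dataset_dir = project_root / "assets" / "datasets"
    return [name for name in SPREADSHEETS if not (dataset_dir / name).is_file()]
```

An empty list means all three files are in place.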

In the following sections, we will explain in detail what the columns in these files mean.

Dataset Information

Descriptive information about each dataset.

ID: search-engine-optimized (human- and machine-readable) dataset ID that uniquely identifies a dataset.
Collector: the entity that collected the dataset, generally either a partner organization or a Georgia Tech department.
Title: the title of the dataset that appears on the website.
DateFrom: the start date of the period during which the data was collected. Each dataset displays this period as a date label.
DateTo: the end date of the collection period. If DateFrom and DateTo are the same, the date is displayed only once.
Description: the description of the dataset that appears on the website.
ImageCaption: the explanatory caption that appears when a visitor clicks on a dataset image.
ImageFileName: search-engine-optimized file name of the image/visualization accompanying the dataset. All images are stored in the folder assets/media/dataset-visualizations.
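A minimal sketch of one DatasetInformation row as a Python dataclass, including the DateFrom/DateTo display rule described above. It assumes both date columns hold calendar dates and that the label uses an ISO "YYYY-MM-DD - YYYY-MM-DD" format; the class and method names are hypothetical, and the site's actual label format may differ.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DatasetInformation:
    """One row of DatasetInformation.xlsx (hypothetical; fields mirror the columns)."""
    id: str
    collector: str
    title: str
    date_from: date  # assumption: stored as a calendar date
    date_to: date
    description: str
    image_caption: str
    image_file_name: str

    def date_label(self) -> str:
        """Collection-period label; a single date when DateFrom equals DateTo.

        The ISO-based format is an assumption, not the site's confirmed format.
        """
        if self.date_from == self.date_to:
            return self.date_from.isoformat()
        return f"{self.date_from.isoformat()} - {self.date_to.isoformat()}"

    def image_path(self) -> str:
        # All visualizations live in this folder, per this page.
        return f"assets/media/dataset-visualizations/{self.image_file_name}"
```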

Dataset Files

Files belonging to each dataset. Some datasets have multiple files, for example shapefiles as well as relational (tabular) data. Site visitors can download these files.

ID: dataset ID (the same ID as defined in DatasetInformation.xlsx).
FileName: the name of the dataset file.
Format: the format of the file, which appears as a tag on the website.
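Because a dataset can have several rows in DatasetFiles, they have to be grouped by ID before they are shown. The sketch below uses hypothetical names (`DatasetFile`, `files_by_dataset`, `format_tags`) and assumes that repeated formats are shown as a single tag; it illustrates the idea and is not the project's code.

```python
from collections import defaultdict
from typing import NamedTuple


class DatasetFile(NamedTuple):
    """One row of DatasetFiles.xlsx (hypothetical representation)."""
    id: str         # dataset ID from DatasetInformation
    file_name: str
    format: str     # e.g. "CSV" or "Shapefile"; rendered as a tag


def files_by_dataset(rows):
    """Group file rows by dataset ID, since a dataset may have several files."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row.id].append(row)
    return dict(grouped)


def format_tags(files):
    """Distinct formats of one dataset's files, in first-seen order (assumption)."""
    return list(dict.fromkeys(f.format for f in files))
```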

Dataset Meta Information

Column descriptions of the datasets. This is what is displayed on the Meta Information tab.

DatasetId: dataset ID (the same ID as defined in DatasetInformation.xlsx).
Feature: column name. It does not have to match the actual column name in the dataset file; it is used for display only. Not every column has to be listed.
Description: description of the column.
Comment: if the column contains a special term, an explanatory comment can be attached to it. The comment appears as a question-mark icon; hovering over the icon displays the comment text.
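A hypothetical sketch of two things a Meta Information import could do: flag DatasetId values that have no matching row in DatasetInformation, and attach the optional Comment as hover text. The names and the plain-text "[?: ...]" rendering are illustrative stand-ins for the real import code and page template.

```python
from typing import NamedTuple, Optional


class MetaRow(NamedTuple):
    """One row of DatasetMetaInformation.xlsx (hypothetical representation)."""
    dataset_id: str
    feature: str                   # display name; need not match the file's column
    description: str
    comment: Optional[str] = None  # optional hover text behind a "?" icon


def unknown_dataset_ids(meta_rows, known_ids):
    """Return DatasetId values that do not appear in DatasetInformation."""
    return sorted({row.dataset_id for row in meta_rows} - set(known_ids))


def render_feature(row):
    """Plain-text stand-in for one entry on the Meta Information tab."""
    text = f"{row.feature}: {row.description}"
    return f"{text} [?: {row.comment}]" if row.comment else text
```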

How are datasets loaded into the database?

Running the command python manage.py import_data loads the spreadsheet data into the database. This Django management command is defined in restapi/management/commands/import_data.py.

How do I add, modify, or remove datasets?

Edit the spreadsheets, then re-run the import_data command. The command deletes all existing data and re-imports everything from the spreadsheets. This full reload should be replaced with a more efficient approach once the data grows larger.
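The current delete-and-reimport behavior, and one possible incremental alternative, can be sketched with Python's standard `sqlite3` module. This assumes a hypothetical single `dataset` table with only an ID and a title; the real command works through the project's Django models. The upsert syntax requires SQLite 3.24 or newer.

```python
import sqlite3

# Hypothetical single-table schema; the real project uses Django models.
SCHEMA = "CREATE TABLE dataset (id TEXT PRIMARY KEY, title TEXT NOT NULL)"


def full_reload(conn, rows):
    """Current strategy: delete everything, then insert every spreadsheet row."""
    with conn:  # one transaction: a failed import rolls back to the old data
        conn.execute("DELETE FROM dataset")
        conn.executemany("INSERT INTO dataset (id, title) VALUES (?, ?)", rows)


def incremental_sync(conn, rows):
    """Possible alternative: upsert every row, delete only IDs that disappeared."""
    new_ids = {row[0] for row in rows}
    with conn:
        conn.executemany(
            "INSERT INTO dataset (id, title) VALUES (?, ?) "
            "ON CONFLICT(id) DO UPDATE SET title = excluded.title",
            rows,
        )
        existing = {r[0] for r in conn.execute("SELECT id FROM dataset")}
        stale = existing - new_ids
        conn.executemany("DELETE FROM dataset WHERE id = ?", [(i,) for i in stale])
```

Wrapping each strategy in a single transaction means a failed import leaves the previous data intact instead of a half-empty site; the incremental version updates existing rows in place and deletes only IDs that no longer appear in the spreadsheet.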