This project was developed under a previous phase of the Yale Digital Humanities Lab. Now a part of Yale Library’s Computational Methods and Data department, the Lab no longer includes this project in its scope of work. As such, it will receive no further updates.
Trails builds interactive plots of massive datasets. You can use it to visualize large text, image, audio, video or other collections 💫
# install the app
pip install git+https://github.com/YaleDHLab/trails.git
# download a sample JSON dataset
wget https://lab-data-collections.s3.amazonaws.com/wiki-people.json
# process the wiki-people.json file using the "abstract" field for vectorization
trails --input "wiki-people.json" --text "abstract" --label "name"
Here we instruct trails to parse wiki-people.json. We also indicate that the "abstract" attribute in the JSON should be vectorized, and that the "name" attribute in the JSON should be used as the label for each observation.
Wikipedia People
Oslo Photographic Collection
Harvard Art Museum Collection
Trails can process text files, image files, JSON files, or other filetypes that have already been vectorized.
Text Inputs
To process a text collection with Trails, provide the paths to your text files:
trails --inputs "texts/*.txt"
Image Inputs
To process an image collection with Trails, provide the paths to your image files:
trails --inputs "images/*.jpg"
JSON Inputs
To process a collection of JSON files with Trails, provide the path to your JSON file(s), then indicate which fields should be used as each item's label and text:
trails --input "wiki-people.json" --text "abstract" --label "name"
Custom Vectors
If each object in your collection has already been vectorized, format your inputs as JSON, include the vectors in those JSON files, and specify the field that contains the vector when invoking Trails:
trails --input "birdsong.json" --vector "vec"
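As a rough illustration of preparing such a file, the sketch below writes a JSON array in which every object carries its vector under a "vec" field. The top-level array layout, the "label" field name, and the helper name write_vector_json are assumptions made for this example; check them against a file that Trails is known to accept (such as wiki-people.json) before relying on them.

```python
import json


def write_vector_json(items, path, vector_field="vec"):
    """Write (label, vector) pairs as a JSON array of objects.

    Hypothetical helper: the array-of-objects layout and the "label"
    field name are assumptions, not a documented Trails schema.
    """
    records = []
    for label, vector in items:
        # Store each vector as a plain list of floats so it serializes cleanly
        records.append({"label": label, vector_field: [float(v) for v in vector]})
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f)
    return records
```

The resulting file could then be passed to the command above, with --vector naming whichever field holds the vectors.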
Custom Positions
If each object in your collection already has a 2D position, add an x column and a y column to your metadata and specify those columns when invoking Trails:
trails --input "birdsong.json" -x "longitude" -y "latitude"
Custom Metadata
If you have metadata associated with your objects (e.g. you have a collection of text files and a CSV or JSON file with associated metadata), make sure your metadata has filename as its first column (for CSV metadata) or as an attribute (for JSON metadata). Then you can provide your metadata to the data pipeline as follows:
trails --inputs "images/*.jpg" --metadata "image_metadata.json"
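For CSV metadata, the "filename first" requirement can be met by generating the file from the image folder itself. The following is a minimal sketch: write_metadata_csv and the extra_columns argument are hypothetical names, and it assumes the filename column should hold the bare file name rather than a full path.

```python
import csv
from pathlib import Path


def write_metadata_csv(image_dir, csv_path, extra_columns=None):
    """Write one CSV row per .jpg file, with `filename` as the first column.

    `extra_columns` maps a column name to a function that takes a Path and
    returns that column's value (illustrative only, not a Trails feature).
    """
    extra_columns = extra_columns or {}
    paths = sorted(Path(image_dir).glob("*.jpg"))
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        # Header row: filename must come first
        writer.writerow(["filename", *extra_columns])
        for p in paths:
            # Assumption: the bare file name is what identifies each object
            writer.writerow([p.name, *(fn(p) for fn in extra_columns.values())])
    return len(paths)
```

For example, write_metadata_csv("images", "image_metadata.csv", {"size_bytes": lambda p: p.stat().st_size}) would produce a two-column file that can be supplied through the --metadata flag.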
Limit Dataset Size
To artificially limit the size of a dataset, use the --limit flag to only process a small subset of your collection:
trails --inputs "images/*.jpg" --limit 100
Multiple Plots
If you want to create multiple plots in the same directory, use the --output_folder flag to specify the directory in which the current outputs will be written:
trails --inputs "images/*.jpg" --output_folder "catplot"
Trails uses three pieces of data to create interactive displays:
- Objects: Objects are the individual items in your dataset (e.g. a text file, or an image). Each point in the scatterplot corresponds to one object. When a user hovers or clicks on a point, we display the corresponding object. After Trails runs, each object is represented by a single JSON file in ./output/data/objects/. Those files are named 0.json through n-1.json, where n is the number of objects to be displayed. When it's time to display an object, we populate ./output/preview.html with the data from the object's JSON file. If a user clicks the object preview, we populate ./output/tooltip.html with the data from the object's JSON file.
- Positions: The position of each object is contained in ./output/data/positions.json.gz. The ith object's position is contained at index position i in this file.
- Colors: The color of each point in the scatterplot is contained in ./output/data/colors.json.gz. The ith object's color is contained at index position i in this file.
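The layout described above can be read back with a few lines of Python, which is handy when checking what Trails wrote. This is a minimal sketch, assuming positions.json.gz and colors.json.gz each decompress to a JSON array indexed by object number; read_gzipped_json and load_point are hypothetical helper names, not part of Trails.

```python
import gzip
import json
from pathlib import Path


def read_gzipped_json(path):
    """Decompress a .json.gz file and parse its contents."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)


def load_point(output_dir, i):
    """Collect the object, position, and color for the i-th point."""
    data_dir = Path(output_dir) / "data"
    # Assumption: both files hold arrays where index i belongs to object i
    positions = read_gzipped_json(data_dir / "positions.json.gz")
    colors = read_gzipped_json(data_dir / "colors.json.gz")
    # Each object lives in its own file, named by its index
    with open(data_dir / "objects" / f"{i}.json", encoding="utf-8") as f:
        obj = json.load(f)
    return {"object": obj, "position": positions[i], "color": colors[i]}
```

Calling load_point("./output", 0) after a Trails run would return the first object's data alongside its position and color, provided the assumptions above hold.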
To customize the Trails UI, there are three files you may want to modify:
- ./output/custom.css: Your custom CSS can go in this file, and these styles will override the default styles.
- ./output/preview.html: To change the way objects look when being previewed, change the HTML template in this file. This HTML is a Lodash template.
- ./output/tooltip.html: To change the way an object looks when a user clicks on the corresponding preview, change the HTML template in this file. This HTML is a Lodash template.