wallace-thrasher

codename for a website project involving the works of longmont potion castle - you can call me stretchie

Overview

this website allows for searching through subtitles and speakers within the longmont potion castle discography.

this website can currently be viewed at stretchie.delivery and via GitHub Pages

Features

its basic feature is that albums and tracks have their own pages, with each track page containing the subtitles for that track. its smart feature is that all of this data is indexed so that search becomes possible. its neat feature is that the lpc usb collection can be uploaded into the site so that tracks can be played easily, and playback can jump into a track at the point where a particular subtitle line is spoken.

Backstory

some time ago, i wanted to answer one question - how many calls does alex trebek show up in throughout the discography of lpc?

there are great resources like talkin' whipapedia out there that have detailed info about albums, tracks, their subtitles, and more. however, their data isn't structured in a formal way and therefore isn't indexable in a way that can answer my original question. given that i've been programming since i was in elementary school, i knew i could create something that would tell me, and i wanted it to be something that i could share within the niche community of lpc.

Components

this website is built with the static site generator jekyll. whisper-webui analyzes the audio tracks and outputs subtitles (what is spoken) with speaker diarization (determining who says what), which are then transformed into json files. each json file containing a track's speaker and subtitle data must be manually reviewed and corrected as needed. as changes are made, jekyll build recreates the site's pages and combines all JSON data into one single JSON data file (combined_data.json).

because the website is static, no server-side processing occurs (other than serving files) - the search functions run locally within the browser.

Converting Tracks to Subtitles

i am using whisper-webui (deployed via pinokio) to analyze the .mp3 files using speech-to-text with speaker diarization (who says what) to output subtitle files (.srt)

Converting Subtitles to JSON

i am using this python tool to convert the subtitle files to json. it also outputs a metadata.json file and a metadata.yml file in accordance with what this project needs.
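as a rough illustration of this conversion step (not the project's actual tool), here is a minimal python sketch that turns .srt text into entries shaped like the track subtitle json. the function name srt_to_entries and the "Speaker|text" layout of a diarized cue are assumptions made for this example; the real output of whisper-webui may label speakers differently.

```python
import json
import re

# Matches one SRT cue: an index line, a timing line, then the text lines
# up to the next blank line (or the end of the input).
CUE_PATTERN = re.compile(
    r"(\d+)\s*\n"
    r"(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})\s*\n"
    r"(.*?)(?=\n\s*\n|\Z)",
    re.DOTALL,
)


def srt_to_entries(srt_text, speaker_separator="|"):
    """Convert SRT text into a list of dicts shaped like the track subtitle JSON.

    Assumption: a diarized cue's text looks like "Speaker|spoken words".
    Cues without the separator get an empty speaker.
    """
    entries = []
    normalized = srt_text.replace("\r\n", "\n").strip()
    for match in CUE_PATTERN.finditer(normalized):
        index, start, end, body = match.groups()
        # Join multi-line cue text into a single line.
        text = " ".join(line.strip() for line in body.splitlines() if line.strip())
        speaker = ""
        if speaker_separator in text:
            speaker, text = (part.strip() for part in text.split(speaker_separator, 1))
        entries.append({
            "Index": int(index),
            "Start Time": start,
            "End Time": end,
            "Speaker": speaker,
            "Text": text,
        })
    return entries


if __name__ == "__main__":
    sample = (
        "1\n00:00:02,140 --> 00:00:02,920\nWoman 1|Betty Boop Diner.\n\n"
        "2\n00:00:04,008 --> 00:00:08,449\nLPC|Hi, can I please get a take-up or a pick-up?\n"
    )
    print(json.dumps(srt_to_entries(sample), indent=4))
```

the speaker names produced by diarization would still need the manual review described under Components.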

JSON Structure for Albums and Tracks

the main JSON data file resides at /assets/data.json

{
  "Albums": [
    { "Album": "Longmont Potion Castle",
      "Album_Slug": "longmont-potion-castle",
      "Album_Picture": "LPC_1.jpg",
      "Year": 1988,
      "Tracks": [
        {
          "Track_Title": "Longmont Theme 1",
          "Track_Number": 1,
          "Track_JSONPath": "longmont-theme-1.json",
          "Track_Slug": "longmont-theme-1",
          "Aliases": "Wallace Thrasher",
          "Establishments": "UPS",
          "Speakers_Adjusted": "false",
          "Subtitles_Adjusted": "false",
          "USB_Filename": "longmont-theme-1.mp3",
          "Whisper_Model": "distil-whisper/distil-large-v3"
        }
      ]
    }
  ]
}

it is possible that some keys are not present in all tracks, but the necessary ones of Track_Title, Track_Number, Track_JSONPath, and Track_Slug are listed for each track.
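a small python sketch of how the four necessary keys could be checked across the file. the function name find_tracks_missing_keys is hypothetical and is not part of this project; the key names come from the structure shown above.

```python
import json

# The keys that are listed for every track.
REQUIRED_TRACK_KEYS = ("Track_Title", "Track_Number", "Track_JSONPath", "Track_Slug")


def find_tracks_missing_keys(data):
    """Return (album title, track label, missing keys) for each incomplete track."""
    problems = []
    for album in data.get("Albums", []):
        for position, track in enumerate(album.get("Tracks", []), start=1):
            missing = [key for key in REQUIRED_TRACK_KEYS if key not in track]
            if missing:
                # Fall back to the track's position when the title itself is missing.
                label = track.get("Track_Title", f"track #{position}")
                problems.append((album.get("Album", "?"), label, missing))
    return problems


if __name__ == "__main__":
    example = json.loads("""
    {
      "Albums": [
        { "Album": "Longmont Potion Castle",
          "Tracks": [
            { "Track_Title": "Longmont Theme 1",
              "Track_Number": 1,
              "Track_JSONPath": "longmont-theme-1.json" }
          ]
        }
      ]
    }
    """)
    for album, track, missing in find_tracks_missing_keys(example):
        print(f"{album} / {track}: missing {', '.join(missing)}")
```

optional keys such as Aliases or Establishments are deliberately ignored, since they are not present in all tracks.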

JSON Structure for Track Subtitles

the JSON data for each track resides in a folder named after the respective album's slug, within the /assets/json folder

[
    {
        "Index": 1,
        "Start Time": "00:00:02,140",
        "End Time": "00:00:02,920",
        "Speaker": "Woman 1",
        "Text": "Betty Boop Diner."
    },
    {
        "Index": 2,
        "Start Time": "00:00:04,008",
        "End Time": "00:00:08,449",
        "Speaker": "LPC",
        "Text": "Hi, can I please get a take-up or a pick-up?"
    }
]
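the "Start Time" and "End Time" values use the subtitle timestamp form HH:MM:SS,mmm. to jump into a track at a given line, a timestamp like this has to become a number of seconds at some point. the site's own player code is not shown here, so the following is only a python sketch of that conversion, with timestamp_to_seconds being a hypothetical helper name.

```python
def timestamp_to_seconds(timestamp):
    """Convert an SRT-style "HH:MM:SS,mmm" timestamp to seconds as a float."""
    clock, millis = timestamp.split(",")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return hours * 3600 + minutes * 60 + seconds + int(millis) / 1000


if __name__ == "__main__":
    entry = {
        "Index": 2,
        "Start Time": "00:00:04,008",
        "End Time": "00:00:08,449",
        "Speaker": "LPC",
        "Text": "Hi, can I please get a take-up or a pick-up?",
    }
    start = timestamp_to_seconds(entry["Start Time"])
    end = timestamp_to_seconds(entry["End Time"])
    print(f"line {entry['Index']} starts at {start} seconds and lasts {end - start:.3f} seconds")
```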

Under The Hood

when the search pages are accessed, the single combined JSON data file (/assets/json/combined_data.json) is retrieved from the server, then lunr indexes the data so that it becomes searchable. lunr currently indexes two categories - speakers and subtitles.

the keys of USB_Directory and USB_Filename refer to the respective directory and filename of the mp3 that resides on an "LPC Ultimate Session Bundle" usb drive, which is occasionally available for sale via lpc's website. these two values are used to play audio if the files from the usb collection have been uploaded.
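to illustrate what a speaker index makes possible, here is a plain python sketch that answers the question from the backstory. it is not lunr and not the site's code. the input shape (a mapping of track slug to that track's subtitle entries) and the function names are assumptions for this example, since the layout of the combined file is not described here.

```python
from collections import defaultdict


def build_speaker_index(tracks):
    """Map each lowercased speaker name to the set of track slugs they appear in.

    `tracks` is a hypothetical mapping of track slug -> list of subtitle entries,
    where each entry has a "Speaker" key as in the track subtitle JSON.
    """
    index = defaultdict(set)
    for slug, entries in tracks.items():
        for entry in entries:
            speaker = entry.get("Speaker", "").strip().lower()
            if speaker:
                index[speaker].add(slug)
    return index


def tracks_with_speaker(index, name):
    """Return the sorted track slugs in which the named speaker appears."""
    return sorted(index.get(name.strip().lower(), set()))


if __name__ == "__main__":
    # Made-up sample data, only to show the shape of the lookup.
    sample_tracks = {
        "track-a": [{"Speaker": "LPC"}, {"Speaker": "Alex Trebek"}],
        "track-b": [{"Speaker": "LPC"}, {"Speaker": "Woman 1"}],
    }
    index = build_speaker_index(sample_tracks)
    matches = tracks_with_speaker(index, "alex trebek")
    print(f"alex trebek shows up in {len(matches)} track(s): {matches}")
```

the index is built once and then each lookup is a single dictionary access, which is the same general reason for indexing the data in the browser before searching it.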

Building

to install the project's dependencies, ensure Ruby is installed, then install its necessary gems by running: bundle install; bundle update;

to build, run this command from the jekyll directory: JEKYLL_ENV=development bundle exec jekyll build

to build and start a local web server, run this command from the jekyll directory: JEKYLL_ENV=development bundle exec jekyll serve

when deploying to production, JEKYLL_ENV must be set to production. the development environment tends to display more of the information within data.json than the production environment does.

Deployment

commits to the main branch trigger two github actions:

run jekyll build --baseurl "/" to generate the site on the "netlify" branch
run jekyll build --baseurl "/wallace-thrasher" to generate the site on the "gh-pages" branch

the commit to "netlify" is then pulled by netlify to redeploy its copy of the site. the commit to "gh-pages" is then used by github pages

How to Contribute

if you've read this far and have an interest in contributing to this project - it is welcomed and appreciated!

please refer to CONTRIBUTING.md

To-Do's

the to-do list has been moved to TODO.md

Licensing

this project is licensed under the GPLv3, and this license applies to all past versions and branches of the project.

Technical Details

here are various badges related to this project's code and its deployments

-- GitHub Action to publish to GitHub Pages

-- GitHub Action to ready the project for Netlify

-- deployment status to Netlify

-- when last committed to GitHub

-- deployed source code size

-- source code repository size
