Parse Traveller books into other formats.
By books, we mean any piece of writing related to Traveller or a similar RPG system (e.g. rulebooks, content books, roll tables, etc.).
Why do we need to parse books?
Most books are copyrighted, so distributing their content without explicit permission is illegal. It's pretty safe to assume that some publishers won't allow free distribution of their content.
NB: This project only contains descriptions of books, not the content within. To get the content, you need to purchase the original books.
As a show of goodwill, we want to explicitly ask publishers if they are okay with this script supporting parsing their content.
Here's a list of publishers who have said that this is fine:
- Stellagama Publishing
- Mongoose Publishing
Feel free to open an issue if you are a publisher and interested in this.
Some publishers might allow free distribution of their content. This is definitely an avenue to look into.
Stellagama Publishing specifically has shown interest in this.
We distinctly split the application into 2 steps:
- parsing books to Traveller objects
- outputting something from Traveller objects (e.g. their JSON representation)
The intention is that the Traveller objects in JSON format can be used by any other application.
First, we convert the content within books into a specific format: Traveller objects.
Traveller objects are defined as different (Traveller-themed) sub-models, using the `type` field to differentiate them:
- `characteristic`: A characteristic of a character (e.g. strength).
- `item`: An item (e.g. sword, phone, book).
- `skill`: A skill (e.g. athletics, pilot, science).
- `character`: A character.
Each sub-model has its own fields. Sub-models may have further subtypes with additional fields. These are intended to be usable with any edition of Traveller. (But completely untested so Your Mileage May Vary.)
Run `traveller-book-parser schema TravObject` for the full JSON schema.
See `traveller_book_parser/traveller_models` for the actual models.
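As a rough illustration of how a `type` field can select a sub-model, here is a minimal sketch using standard-library dataclasses. It is not the project's actual model code: the `name` field, the `MODELS_BY_TYPE` mapping, and the `trav_object_from_dict` helper are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import Any


# Hypothetical sub-models: the real ones live in
# traveller_book_parser/traveller_models and have their own fields.
@dataclass
class Characteristic:
    name: str
    type: str = "characteristic"


@dataclass
class Item:
    name: str
    type: str = "item"


@dataclass
class Skill:
    name: str
    type: str = "skill"


@dataclass
class Character:
    name: str
    type: str = "character"


# Maps the value of the "type" field to the sub-model that handles it.
MODELS_BY_TYPE = {
    "characteristic": Characteristic,
    "item": Item,
    "skill": Skill,
    "character": Character,
}


def trav_object_from_dict(data: dict[str, Any]):
    """Build the sub-model selected by the "type" field of data."""
    model = MODELS_BY_TYPE.get(data.get("type"))
    if model is None:
        raise ValueError(f"Unknown Traveller object type: {data.get('type')!r}")
    # Pass every field except "type"; the dataclass default fills it in.
    fields = {key: value for key, value in data.items() if key != "type"}
    return model(**fields)


sword = trav_object_from_dict({"type": "item", "name": "Sword"})
```

An unknown `type` raises a `ValueError` in this sketch, so malformed input fails loudly instead of silently producing the wrong sub-model.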
The code that runs is identical for all books. This makes it easier to add new books.
To account for differences between books, there are 'book description' files.
These are JSON files describing the book (see the `book_descriptions` folder for examples).
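To illustrate the idea of one shared code path driven by per-book data, here is a small sketch. The description format shown is invented for this example: the `title`, `tables`, `page`, and `object_type` fields are assumptions and may not match the real files in `book_descriptions` at all.

```python
import json

# Hypothetical description; the real files in book_descriptions may use
# entirely different field names.
EXAMPLE_DESCRIPTION = """
{
  "title": "Example Book",
  "tables": [
    {"page": 12, "object_type": "item"},
    {"page": 40, "object_type": "skill"}
  ]
}
"""


def plan_extraction(description: dict) -> list[str]:
    """Return one human-readable step per table listed in the description."""
    return [
        f"{description['title']}: read page {table['page']} as {table['object_type']}"
        for table in description["tables"]
    ]


steps = plan_extraction(json.loads(EXAMPLE_DESCRIPTION))
```

The function itself knows nothing about any particular book. Supporting a new book in this sketch means writing a new description, not new code.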
After parsing the books, we can output the content in various forms. Currently, we only support outputting the direct JSON of all the "Traveller objects" parsed (as described above).
If you have any suggestions or want to help, feel free to get in touch!
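As a sketch of how another application might consume the JSON output, the snippet below groups parsed objects by their `type` field. It assumes the output is a JSON array of objects that each carry a `type` field; the `name` field in the sample data and the `group_by_type` helper are hypothetical.

```python
import json
from collections import defaultdict


def group_by_type(json_text: str) -> dict[str, list[dict]]:
    """Group Traveller objects from a JSON array by their "type" field."""
    grouped = defaultdict(list)
    for obj in json.loads(json_text):
        grouped[obj["type"]].append(obj)
    return dict(grouped)


# Hypothetical sample output, not taken from a real book.
sample = """
[
  {"type": "item", "name": "Sword"},
  {"type": "skill", "name": "Pilot"},
  {"type": "item", "name": "Phone"}
]
"""

objects_by_type = group_by_type(sample)
```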
- A dependency used by Tabula to extract tables from PDFs.
- pdftohtml (version 4.x) from XpdfReader
  - This is used to convert PDFs to HTML, which is then parsed further.
  - Installing pdftohtml:
    - It's available in package managers under the name `xpdf-tools` (e.g. in Scoop).
    - It is pre-packaged with some Linux distributions (e.g. Ubuntu).
    - You can download it here (under "Download the Xpdf command line tools").
  - Note: If `pdftohtml` is not globally installed, you can set the `PDF_TO_HTML_EXECUTABLE` env var to the location of the executable.
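The lookup order described in the note above (an explicit `PDF_TO_HTML_EXECUTABLE` setting first, then a globally installed `pdftohtml`) could be sketched as follows. This is an illustration only, not the project's actual resolution logic; `find_pdftohtml` is a hypothetical helper.

```python
import os
import shutil


def find_pdftohtml(env=None):
    """Return a path to pdftohtml, or None if it cannot be found.

    Hypothetical helper: prefers PDF_TO_HTML_EXECUTABLE when it is set,
    otherwise falls back to searching PATH.
    """
    env = os.environ if env is None else env
    configured = env.get("PDF_TO_HTML_EXECUTABLE")
    if configured:
        return configured
    # shutil.which returns None when the executable is not on PATH.
    return shutil.which("pdftohtml")


executable = find_pdftohtml()
```

Passing `env` explicitly makes the helper easy to exercise without touching the real environment.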
Note: The code is tested on Windows 11, but it should work fine on Linux and possibly macOS.
- Clone this repository.
- Install dependencies using poetry: `poetry install`
- Run the CLI to see available commands: `poetry run traveller-book-parser`
  - You can also run `poetry shell` to start a new sub-shell, and then run the CLI with `traveller-book-parser`.
There is a `cli.ps1` PowerShell script that does everything above (passing any arguments to the CLI).
The script can be configured using environment variables.
(You can create a `.env` file in the root directory to set these as well.)
See `traveller_book_parser/settings/settings.py` for a list of all settings.
You can also run:
traveller-book-parser schema Settings
This will dump the JSON schema of the `Settings` model (by default to `/data/output/schema/Settings.json`).
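Once the schema has been dumped, a quick way to see which settings exist is to list its top-level property names. The sketch below assumes the dumped file is a standard JSON Schema document with a top-level `properties` object; `list_schema_properties` is a hypothetical helper, and the path you pass depends on where the schema was written on your machine.

```python
import json
from pathlib import Path


def list_schema_properties(schema_path) -> list[str]:
    """Return the sorted top-level property names of a JSON Schema file."""
    schema = json.loads(Path(schema_path).read_text(encoding="utf-8"))
    # A schema without "properties" yields an empty list.
    return sorted(schema.get("properties", {}))
```

For example, `list_schema_properties("Settings.json")` would return the names of the configurable settings, provided the file exists at that location.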
This project is open to contributions. Feel free to open an issue or pull request.
Install just to run utility commands.
There are more `just` commands available. Take a look at the `justfile` for all commands.
To run linters, run:
just lint
To run tests, run:
just test
To run tests and update snapshots, run:
just test_update