llm-documentcloud

LLM integrations for DocumentCloud

Installation

Install this plugin in the same environment as LLM.

llm install llm-documentcloud

Set your DocumentCloud credentials as environment variables (for example, in your shell profile):

export DC_USERNAME=""
export DC_PASSWORD=""

Usage

Use the dc: fragment to load documents hosted on DocumentCloud.

# run a basic prompt
llm -f dc:71072 'Summarize this document'

# extract tabular data
llm -f dc:25507045 'Extract the tables in this document as CSV'

Documents can be fetched by ID alone, by ID and slug, or by full URL. The following are equivalent:

llm -f dc:25507045 'Extract the tables in this document as CSV'
llm -f dc:25507045-20250118-ufc-intuit-dome-athlete-pay-and-weights-c-amico 'Extract the tables in this document as CSV'
llm -f dc:https://www.documentcloud.org/documents/25507045-20250118-ufc-intuit-dome-athlete-pay-and-weights-c-amico/ 'Extract the tables in this document as CSV'
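All three forms resolve to the same numeric document ID. Below is a minimal sketch of how that lookup could work using only the Python standard library. `parse_document_id` is a hypothetical helper, not the plugin's actual code. It assumes the ID is always the leading run of digits in the slug, and that in a full URL the slug is the path segment after `/documents/`.

```python
import re
from urllib.parse import urlparse

# Hypothetical helper, sketched for illustration; not the plugin's real parser.
# Assumption: the document ID is the run of digits at the start of the slug,
# e.g. "25507045" in "25507045-20250118-ufc-...".
_LEADING_ID = re.compile(r"^(\d+)")


def parse_document_id(argument: str) -> int:
    """Return the numeric DocumentCloud ID from an ID, an ID-slug, or a full URL."""
    if argument.startswith(("http://", "https://")):
        # Full URL: use the path segment that follows "/documents/".
        # Query strings (?mode=...) and fragments (#document/p2) are not part of the path.
        segments = [s for s in urlparse(argument).path.split("/") if s]
        try:
            argument = segments[segments.index("documents") + 1]
        except (ValueError, IndexError):
            raise ValueError(f"Not a DocumentCloud document URL: {argument}") from None
    match = _LEADING_ID.match(argument)
    if match is None:
        raise ValueError(f"No document ID found in: {argument}")
    return int(match.group(1))


for arg in [
    "25507045",
    "25507045-20250118-ufc-intuit-dome-athlete-pay-and-weights-c-amico",
    "https://www.documentcloud.org/documents/25507045-20250118-ufc-intuit-dome-athlete-pay-and-weights-c-amico/",
]:
    print(parse_document_id(arg))  # prints 25507045 each time
```

Matching only the leading digits means the slug text after the ID never affects which document is fetched.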

In each case, a DocumentCloud API client will fetch the document's full text and store it as a fragment for llm.

Using file attachments instead of text

DocumentCloud stores each document in three forms: the original PDF, its extracted text, and an image of each page. You can pass any of these to llm using the mode query parameter:

# use the original PDF as an attachment
llm -f 'dc:https://www.documentcloud.org/documents/25507045-20250118-ufc-intuit-dome-athlete-pay-and-weights-c-amico/?mode=pdf'

# use each page image as an attachment
llm -f 'dc:https://www.documentcloud.org/documents/25507045-20250118-ufc-intuit-dome-athlete-pay-and-weights-c-amico/?mode=images'

# equivalent: "grid" is the mode name used by the DocumentCloud frontend
llm -f 'dc:https://www.documentcloud.org/documents/25507045-20250118-ufc-intuit-dome-athlete-pay-and-weights-c-amico/?mode=grid'

# these are all equivalent and use the document's extracted full text
llm -f dc:https://www.documentcloud.org/documents/25507045-20250118-ufc-intuit-dome-athlete-pay-and-weights-c-amico/
llm -f 'dc:https://www.documentcloud.org/documents/25507045-20250118-ufc-intuit-dome-athlete-pay-and-weights-c-amico/?mode=document'
llm -f 'dc:https://www.documentcloud.org/documents/25507045-20250118-ufc-intuit-dome-athlete-pay-and-weights-c-amico/?mode=text'
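The mode parameter selects one of three representations. "grid" is an alias for "images", and "document" and "text" (or no mode at all) both mean the full text. Below is a minimal sketch of that normalization step with the Python standard library. The `MODE_ALIASES` table, the canonical name "text", and raising `ValueError` for an unrecognized mode are assumptions for illustration, not taken from the plugin's source.

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical alias table built from the modes described above;
# the canonical names on the right are assumptions.
MODE_ALIASES = {
    "pdf": "pdf",
    "images": "images",
    "grid": "images",  # DocumentCloud frontend name for the page-image view
    "document": "text",
    "text": "text",
}


def parse_mode(url: str) -> str:
    """Return the normalized mode for a DocumentCloud URL (full text by default)."""
    query = parse_qs(urlparse(url).query)
    raw = query.get("mode", ["text"])[0]
    try:
        return MODE_ALIASES[raw]
    except KeyError:
        # Assumption: unknown modes are rejected rather than silently ignored.
        raise ValueError(f"Unknown mode: {raw!r}") from None


base = "https://www.documentcloud.org/documents/25507045-20250118-ufc-intuit-dome-athlete-pay-and-weights-c-amico/"
print(parse_mode(base))                 # text
print(parse_mode(base + "?mode=grid"))  # images
print(parse_mode(base + "?mode=pdf"))   # pdf
```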

Getting specific pages

Sometimes you only want one page. DocumentCloud URLs can link to a specific page, and the plugin accepts those URLs too:

# extract text, but only for page 2
llm -f 'dc:https://www.documentcloud.org/documents/25507045-20250118-ufc-intuit-dome-athlete-pay-and-weights-c-amico/?mode=document#document/p2'

Page numbers are 1-indexed. You can also fetch a single page image:

# attach the image for page 2
llm -f 'dc:https://www.documentcloud.org/documents/25507045-20250118-ufc-intuit-dome-athlete-pay-and-weights-c-amico/?mode=images#document/p2'

There is no way to get a single page out of a PDF, so passing mode=pdf sets the page to None and any page number in the URL is ignored.
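The page number lives in the URL fragment (`#document/p2`), which is separate from the `?mode=` query string. Below is a minimal sketch of reading it with the Python standard library. It follows the rule above by returning None for mode=pdf. `parse_page` is a hypothetical helper, and rejecting `p0` is an assumption, not documented plugin behavior.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Hypothetical helper, sketched for illustration; not the plugin's real code.
# Assumes page links use a fragment of the form "document/p<number>".
_PAGE_FRAGMENT = re.compile(r"^document/p(\d+)$")


def parse_page(url: str, mode: str) -> Optional[int]:
    """Return the 1-indexed page from the URL fragment, or None.

    Returns None when the URL has no page fragment, and always for
    mode "pdf", since a single page cannot be taken out of the PDF.
    """
    if mode == "pdf":
        return None
    match = _PAGE_FRAGMENT.match(urlparse(url).fragment)
    if match is None:
        return None
    page = int(match.group(1))
    if page < 1:
        # Assumption: p0 is invalid because pages are 1-indexed.
        raise ValueError("Page numbers are 1-indexed")
    return page


url = "https://www.documentcloud.org/documents/25507045-20250118-ufc-intuit-dome-athlete-pay-and-weights-c-amico/?mode=images#document/p2"
print(parse_page(url, "images"))  # 2
print(parse_page(url, "pdf"))     # None
```

Because pages are 1-indexed, code that holds page texts in a zero-based Python list would read page n as `pages[n - 1]`.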

Development

To set up this plugin locally, first check out the code. Then create a virtual environment and install the dependencies using uv:

cd llm-documentcloud
uv sync

To also install the test dependencies, include the test extra:

uv sync --extra test

To run the tests:

uv run pytest

eyeseast/llm-documentcloud

Folders and files

Latest commit

History

Repository files navigation

llm-documentcloud

Installation

Usage

Using file attachments instead of text

Getting specific pages

Development

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 2

Uh oh!

Languages

Packages