Commit dd17870

chore: update READMEs for dspy example, update imports, etc. (#1366)
1 parent 902e5c9 commit dd17870

4 files changed, +6 -4 lines changed

README.md

Lines changed: 1 addition & 0 deletions
@@ -204,6 +204,7 @@ It defines an index flow like this:
 | [Patient intake form extraction](examples/patient_intake_extraction) | Use LLM to extract structured data from patient intake forms with different formats |
 | [HackerNews Trending Topics](examples/hn_trending_topics) | Extract trending topics from HackerNews threads and comments, using *CocoIndex Custom Source* and LLM |
 | [Patient Intake Form Extraction with BAML](examples/patient_intake_extraction_baml) | Extract structured data from patient intake forms using BAML |
+| [Patient Intake Form Extraction with DSPy](examples/patient_intake_extraction_dspy) | Extract structured data from patient intake forms using DSPy |
 
 More coming and stay tuned 👀!
 

examples/README.md

Lines changed: 1 addition & 0 deletions
@@ -30,6 +30,7 @@ Check out our [examples documentation](https://cocoindex.io/docs/examples) for m
 
 - 🏥 [**patient_intake_extraction**](./patient_intake_extraction) - Extract structured data from patient intake forms (PDF, Docx) using LLM
 - 🏥 [**patient_intake_extraction_baml**](./patient_intake_extraction_baml) - Extract structured data from patient intake PDFs using BAML
+- 🏥 [**patient_intake_extraction_dspy**](./patient_intake_extraction_dspy) - Extract structured data from patient intake PDFs using DSPy
 - 📖 [**manuals_llm_extraction**](./manuals_llm_extraction) - Extract structured information from PDF manuals using Ollama
 - 📄 [**paper_metadata**](./paper_metadata) - Extract metadata (title, authors, abstract) from research papers in PDF
 - 📝 [**meeting_notes_graph**](./meeting_notes_graph) - Extract structured meeting info from Google Drive and build a knowledge graph

examples/patient_intake_extraction_dspy/main.py

Lines changed: 3 additions & 3 deletions
@@ -2,7 +2,7 @@
 
 import dspy
 from pydantic import BaseModel, Field
-import fitz  # PyMuPDF
+import pymupdf
 
 import cocoindex
 
@@ -106,12 +106,12 @@ def extract_patient(pdf_content: bytes) -> Patient:
     """Extract patient information from PDF content."""
 
     # Convert PDF pages to DSPy Image objects
-    pdf_doc = fitz.open(stream=pdf_content, filetype="pdf")
+    pdf_doc = pymupdf.open(stream=pdf_content, filetype="pdf")
 
     form_images = []
     for page in pdf_doc:
         # Render page to pixmap (image) at 2x resolution for better quality
-        pix = page.get_pixmap(matrix=fitz.Matrix(2, 2))
+        pix = page.get_pixmap(matrix=pymupdf.Matrix(2, 2))
         # Convert to PNG bytes
         img_bytes = pix.tobytes("png")
         # Create DSPy Image from bytes
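
Reviewer note: this hunk replaces the legacy `fitz` module name with `pymupdf`. For environments that might still carry an older PyMuPDF, the lookup could be sketched roughly as below. This is an illustration only and not part of the commit; the helper names `resolve_pymupdf_module_name` and `load_pymupdf` are hypothetical, and the sketch uses only the standard library to probe which name is importable.

```python
import importlib
import importlib.util
from types import ModuleType
from typing import Optional


def resolve_pymupdf_module_name() -> Optional[str]:
    """Return the first importable PyMuPDF module name, or None if neither is found."""
    # Prefer the newer "pymupdf" name; fall back to the legacy "fitz" name.
    for name in ("pymupdf", "fitz"):
        if importlib.util.find_spec(name) is not None:
            return name
    return None


def load_pymupdf() -> ModuleType:
    """Import PyMuPDF under whichever name is available (hypothetical helper)."""
    name = resolve_pymupdf_module_name()
    if name is None:
        raise ImportError("PyMuPDF is not installed (tried 'pymupdf' and 'fitz')")
    return importlib.import_module(name)
```

Since the commit also raises the minimum PyMuPDF version in `pyproject.toml`, the example itself can import `pymupdf` directly; a fallback like this would only matter for code that has to run against older installs.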

examples/patient_intake_extraction_dspy/pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ dependencies = [
     "cocoindex>=0.3.9",
     "dspy-ai>=3.0.4",
     "pydantic>=2.0.0",
-    "pymupdf>=1.24.0",
+    "pymupdf>=1.26.5",
 ]
 
 [tool.setuptools]
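
Reviewer note: the bump from `pymupdf>=1.24.0` to `pymupdf>=1.26.5` is a lower-bound change, so installs at 1.26.5 or newer satisfy it and older ones do not. A minimal sketch of that comparison follows, assuming plain dotted numeric versions; the helper names `version_tuple` and `meets_minimum` are hypothetical, and real tooling would normally rely on the installer or the `packaging` library rather than hand-rolled parsing.

```python
import re
from typing import Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn '1.26.5' into (1, 26, 5), reading only the leading digits of each part."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first component with no leading digits
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare versions numerically, so '1.26.10' counts as newer than '1.26.5'."""
    return version_tuple(installed) >= version_tuple(minimum)


if __name__ == "__main__":
    for candidate in ("1.24.0", "1.26.5", "1.26.10"):
        print(candidate, meets_minimum(candidate, "1.26.5"))
```

Comparing integer tuples rather than raw strings is the point of the sketch: a string comparison would wrongly rank "1.26.10" below "1.26.5". Pre-release suffixes are ignored here, which is a deliberate simplification.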
