Skip to content

Commit

Permalink
Update README.md
Browse files Browse the repository at this point in the history
  • Loading branch information
diptanu authored Jun 18, 2024
1 parent fb29bd2 commit 882ece5
Showing 1 changed file with 34 additions and 34 deletions.
68 changes: 34 additions & 34 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -85,6 +85,40 @@ context = client.search_index(name="sportsknowledgebase.minilml6.embedding", que

> The method wait_for_extraction blocks the client until Indexify runs the extraction on the ingested content. In production applications you will most likely won't block your application, and let extraction be asynchronous.
### PDF Extraction and Retrieval
This example shows how to create a pipeline that extracts from PDF documents.
More information here - https://docs.getindexify.ai/usecases/pdf_extraction/

#### Create an Extraction Graph
```python
from indexify import IndexifyClient, ExtractionGraph
import requests
client = IndexifyClient()

extraction_graph_spec = """
name: 'pdfqa'
extraction_policies:
- extractor: 'tensorlake/pdfextractor'
name: 'docextractor'
"""

extraction_graph = ExtractionGraph.from_yaml(extraction_graph_spec)
client.create_extraction_graph(extraction_graph)
```

#### Upload a Document
```python
with open("sample.pdf", 'wb') as file:
file.write((requests.get("https://extractor-files.diptanu-6d5.workers.dev/scientific-paper-example.pdf")).content)
content_id = client.upload_file("pdfqa", "sample.pdf")
```

#### Get Text, Image and Tables
```python
client.wait_for_extraction(content_id)
print(client.get_extracted_content(content_id, "pdfqa", "docextractor"))
```


### Podcast Summarization and Embedding
This example shows how to transcribe audio, and create a pipeline that embeds the transcription
Expand Down Expand Up @@ -182,40 +216,6 @@ client.get_extracted_content(content_id, "imageknowledgebase", "object_detection
print(client.sql_query("select * from imageknowledgebase where object_name='person';"))
```

### PDF Extraction and Retrieval
This example shows how to create a pipeline that extracts from PDF documents.
More information here - https://docs.getindexify.ai/usecases/pdf_extraction/

#### Create an Extraction Graph
```python
from indexify import IndexifyClient, ExtractionGraph
import requests
client = IndexifyClient()

extraction_graph_spec = """
name: 'pdfqa'
extraction_policies:
- extractor: 'tensorlake/pdfextractor'
name: 'docextractor'
"""

extraction_graph = ExtractionGraph.from_yaml(extraction_graph_spec)
client.create_extraction_graph(extraction_graph)
```

#### Upload a Document
```python
with open("sample.pdf", 'wb') as file:
file.write((requests.get("https://extractor-files.diptanu-6d5.workers.dev/scientific-paper-example.pdf")).content)
content_id = client.upload_file("pdfqa", "sample.pdf")
```

#### Get Text, Image and Tables
```python
client.wait_for_extraction(content_id)
print(client.get_extracted_content(content_id, "pdfqa", "docextractor"))
```

### LLM Framework Integration
Indexify can work with any LLM framework, or with your applications directly. We have an example of a Langchain application [here](https://docs.getindexify.ai/integrations/langchain/python_langchain/) and DSPy [here](https://docs.getindexify.ai/integrations/dspy/python_dspy/).

Expand Down

0 comments on commit 882ece5

Please sign in to comment.