Langchain lib proprietary file extensions #8

@johnvanderton

Description

The RAG service currently extracts documents with TextLoader, which can only parse .txt files. Extend it to support the other file formats available in LangChain, listed below:

| Loader | Package / Import | Handles | Notes |
| --- | --- | --- | --- |
| TextLoader | langchain/document_loaders/fs/text | .txt, raw text | Simplest loader |
| PDFLoader | langchain/document_loaders/fs/pdf | .pdf | Uses pdf-parse under the hood |
| DocxLoader | langchain/document_loaders/fs/docx | .docx | Uses mammoth to extract text |
| CSVLoader | langchain/document_loaders/fs/csv | .csv | Reads CSV to text or structured docs |
| JSONLoader | langchain/document_loaders/fs/json | .json | Converts JSON keys/values to documents |
| EPubLoader | langchain/document_loaders/fs/epub | .epub | Extracts EPUB chapters and text |
| DirectoryLoader | langchain/document_loaders/fs/directory | A folder | Recursively loads supported files |
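
One possible starting point is a small lookup that maps a file extension to the loader listed for it. The sketch below is only an illustration: the loader names and import paths are copied from the list in this issue, while `LoaderInfo`, `LOADERS_BY_EXTENSION`, `extensionOf`, and `resolveLoader` are hypothetical names that do not exist in the RAG service or in LangChain. It deliberately avoids importing LangChain so it runs on its own.

```typescript
// Hypothetical registry: loader names and import paths come from this issue;
// the types and helper names are assumptions made for illustration only.
interface LoaderInfo {
  loader: string;
  importPath: string;
}

const LOADERS_BY_EXTENSION: Record<string, LoaderInfo> = {
  ".txt": { loader: "TextLoader", importPath: "langchain/document_loaders/fs/text" },
  ".pdf": { loader: "PDFLoader", importPath: "langchain/document_loaders/fs/pdf" },
  ".docx": { loader: "DocxLoader", importPath: "langchain/document_loaders/fs/docx" },
  ".csv": { loader: "CSVLoader", importPath: "langchain/document_loaders/fs/csv" },
  ".json": { loader: "JSONLoader", importPath: "langchain/document_loaders/fs/json" },
  ".epub": { loader: "EPubLoader", importPath: "langchain/document_loaders/fs/epub" },
};

// Returns the lowercase extension including the dot, or "" when there is none.
// Dotfiles such as ".env" are treated as having no extension.
function extensionOf(filePath: string): string {
  const name = filePath.split(/[\\/]/).pop() ?? "";
  const dot = name.lastIndexOf(".");
  return dot > 0 ? name.slice(dot).toLowerCase() : "";
}

// Looks up the loader for a path; undefined means the format is not supported.
function resolveLoader(filePath: string): LoaderInfo | undefined {
  return LOADERS_BY_EXTENSION[extensionOf(filePath)];
}
```

In the service itself, the same extension keys could presumably be passed to DirectoryLoader as a map from extension to a factory that constructs the matching loader, so unsupported files are skipped in one place rather than in each caller.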

Labels: enhancement (New feature or request)