Support a document loader using the MarkItDown library #28958
Closed
Simon-Stone started this conversation in Ideas
Replies: 2 comments
I've started working on PR #28960 but I'm running into the challenge that Is there any precedent for how to deal with this? Or am I just SOL until the powers that be decide to drop support for 3.9?
As I learned when the PR was closed, an integration like this should live in its own package. That's neat, because the package can then declare different dependencies than the LangChain core packages. So that solves it!
Feature request
Implement a document loader class that uses the MarkItDown library to convert files into Markdown. From the MarkItDown README:
Motivation
Formatting and layout of documents can convey important context to the human reader. Text-based markup languages like Markdown help preserve this information in the text-only representation passed to Large Language Models. The MarkItDown library is a versatile tool for converting different kinds of files into a Markdown representation.
Proposal
I propose a simple MarkitdownLoader class derived from BaseLoader, using markitdown.MarkItDown under the hood (composition pattern). I will try my hand at this and offer a PR soon.
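A rough sketch of what the proposed loader could look like, to make the composition idea concrete. This is not the code from the PR. The constructor signature, the optional `converter` argument, and the `"source"` metadata key are assumptions made for illustration. It relies on `MarkItDown.convert()` returning an object with a `text_content` attribute. The fallback stand-ins for `BaseLoader` and `Document` are only there so the sketch runs without LangChain installed.

```python
from dataclasses import dataclass, field
from typing import Any, Iterator, Optional

try:
    from langchain_core.document_loaders import BaseLoader
    from langchain_core.documents import Document
except ImportError:
    # Illustration-only stand-ins so the sketch runs without LangChain installed.
    @dataclass
    class Document:
        page_content: str
        metadata: dict = field(default_factory=dict)

    class BaseLoader:
        def lazy_load(self):
            raise NotImplementedError

        def load(self):
            return list(self.lazy_load())


class MarkitdownLoader(BaseLoader):
    """Load a single file as one Markdown Document (hypothetical sketch)."""

    def __init__(self, file_path: str, converter: Optional[Any] = None) -> None:
        self.file_path = str(file_path)
        if converter is None:
            # Imported lazily so markitdown is only required when actually used.
            from markitdown import MarkItDown

            converter = MarkItDown()
        # Composition: the loader delegates conversion to the wrapped converter.
        self._converter = converter

    def lazy_load(self) -> Iterator[Document]:
        result = self._converter.convert(self.file_path)
        yield Document(
            page_content=result.text_content,
            metadata={"source": self.file_path},  # assumed metadata key
        )
```

With `markitdown` installed, usage would be along the lines of `MarkitdownLoader("report.docx").load()`, where the file name is a placeholder. Accepting an already configured converter keeps the loader easy to test and leaves MarkItDown's own options to the caller.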