langchain-pull-md is a Python package that extends LangChain with a document loader that fetches Markdown from URLs through the pull.md service. Because the service returns fully rendered Markdown, the loader also works for pages built with JavaScript frameworks such as React, Angular, and Vue.js.
- Convert URLs directly to Markdown, including pages rendered with JavaScript frameworks.
- Fetch Markdown without consuming local server resources: rendering is handled by the external pull.md service.
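Because rendering happens on the pull.md side, the client only has to build a request for the target page. The following is a minimal sketch that assumes the service accepts the target URL appended to https://pull.md/. This README does not document the package's actual request format, and pull_md_url is a hypothetical helper, not part of the package:

```python
from urllib.parse import quote

# Assumed endpoint format: the target URL is appended to the pull.md base URL.
PULL_MD_BASE = "https://pull.md/"


def pull_md_url(target: str) -> str:
    """Build the request URL asking pull.md to render `target` (assumed format)."""
    # Keep URL punctuation readable; percent-encode other unsafe characters (e.g. spaces).
    return PULL_MD_BASE + quote(target, safe=":/?&=#%")


print(pull_md_url("http://example.com"))
```

No browser or JavaScript engine runs locally. The heavy lifting is a single HTTP request to the service.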
To install the package, use:
pip install langchain-pull-md
Here’s how you can use PullMdLoader from langchain-pull-md:
from langchain_pull_md import PullMdLoader
# Initialize using a URL
loader = PullMdLoader(url="http://example.com")
documents = loader.load()
print(documents)
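This README does not describe PullMdLoader's internals. As a rough mental model, the following is a hypothetical, standard-library-only sketch of a loader with the same url= / load() shape. LangChain loaders expose load() and lazy_load(), and LangChain documents carry page_content and metadata. Everything else here is an assumption for illustration: MarkdownDocument stands in for LangChain's Document, the fetch parameter and the "source" metadata key are invented, and the endpoint format is assumed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List
from urllib.request import urlopen


@dataclass
class MarkdownDocument:
    """Stand-in for LangChain's Document (page_content + metadata)."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def http_fetch(url: str) -> str:
    """Download a URL over HTTP and decode the body as UTF-8."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


class SketchMarkdownLoader:
    """Hypothetical loader mirroring the url= / load() interface shown above."""

    def __init__(self, url: str, fetch: Callable[[str], str] = http_fetch) -> None:
        self.url = url
        self.fetch = fetch  # injectable so the loader can run offline in tests

    def lazy_load(self) -> Iterator[MarkdownDocument]:
        # Assumed endpoint format: pull.md renders the page and returns Markdown.
        markdown = self.fetch("https://pull.md/" + self.url)
        yield MarkdownDocument(page_content=markdown, metadata={"source": self.url})

    def load(self) -> List[MarkdownDocument]:
        # Eagerly collect everything lazy_load() yields.
        return list(self.lazy_load())


# Offline usage: a fake fetch function replaces the real HTTP call.
loader = SketchMarkdownLoader("http://example.com", fetch=lambda u: "# Example Domain")
docs = loader.load()
print(docs[0].page_content, docs[0].metadata)
```

Injecting the fetch function keeps the network call at the edge, so the Document-building logic can be exercised without contacting pull.md.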
Parameter | Type | Default | Description |
---|---|---|---|
url | str | None | The URL to fetch and convert to Markdown. |
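The table lists None as the default for url, but the loader has nothing to fetch without one. This README does not say how the package handles a missing or malformed URL. If you want to fail early on the caller's side, a hypothetical check (require_http_url is not part of the package) might look like this:

```python
from typing import Optional
from urllib.parse import urlparse


def require_http_url(url: Optional[str]) -> str:
    """Reject a missing or non-HTTP(S) url before it is handed to a loader."""
    if url is None:
        raise ValueError("url is required; the loader has nothing to fetch")
    parsed = urlparse(url)
    # An absolute web URL needs an http/https scheme and a host.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"expected an absolute http(s) URL, got {url!r}")
    return url


print(require_http_url("http://example.com"))
```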
To run the tests:
- Clone the repository:
  git clone https://github.com/chigwell/langchain-pull-md
  cd langchain-pull-md
- Install the development dependencies:
  pip install -r requirements.txt
- Run the tests:
  pytest tests/test_markdown_loader.py
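This README does not describe what tests/test_markdown_loader.py contains. One common way to keep such tests independent of the live pull.md service is to patch the HTTP call. The following sketch does this for a hypothetical fetch_markdown helper built on urllib, using an assumed endpoint format. If the real package uses a different HTTP client, the patch target changes accordingly.

```python
import unittest
import urllib.request
from unittest import mock


def fetch_markdown(target: str) -> str:
    """Hypothetical helper: ask pull.md (assumed URL format) for a page's Markdown."""
    with urllib.request.urlopen("https://pull.md/" + target, timeout=30) as response:
        return response.read().decode("utf-8")


class FetchMarkdownTest(unittest.TestCase):
    def test_returns_service_body_without_network(self) -> None:
        # Patch urlopen where fetch_markdown looks it up, so no real request is sent.
        with mock.patch("urllib.request.urlopen") as fake_urlopen:
            response = fake_urlopen.return_value.__enter__.return_value
            response.read.return_value = b"# Example"
            self.assertEqual(fetch_markdown("http://example.com"), "# Example")
            fake_urlopen.assert_called_once_with(
                "https://pull.md/http://example.com", timeout=30
            )
```

pytest collects unittest.TestCase classes, so a file like this (saved under a test_*.py name) runs with the same pytest command as above.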
Contributions are welcome! If you have ideas for new features or spot a bug, feel free to:
- Open an issue on GitHub.
- Submit a pull request.
This project is licensed under the Apache 2.0 License. See the LICENSE file for details.