isaacus-dev / semchunk

A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks.

python nlp text splitting chunking text-chunking text-splitting semantic-chunking isaacus

Updated Aug 13, 2025
Python

lazyFrogLOL / llmdocparser

A package for parsing PDFs and analyzing their content using LLMs.

nlp ocr chunking document-analysis pdf-parser pdfparser rag llm text-chunking

Updated Aug 6, 2024
Python

jparkerweb / semantic-chunking

🍱 semantic-chunking ⇢ semantically create chunks from large document for passing to LLM workflows

vector embeddings chunking text-splitter llm text-chunking text-splitting semantic-chunking

Updated Oct 22, 2025
JavaScript

drittich / SemanticSlicer

🧠✂️ SemanticSlicer — A smart text chunker for LLM-ready documents.

ai embeddings openai gpt chunking chunker gpt-4 azure-openai llm chatgpt chat-gpt langchain text-chunking

Updated Sep 26, 2025
C#

GregorBiswanger / SemanticChunker.NET

Embedding-driven, context-aware text chunking for Semantic Kernel and RAG workflows in .NET

library ai csharp dotnet chunking slm embedding rag llm semantic-kernel semantickernel text-chunking semanticchunker

Updated Jul 27, 2025
C#

ChenTaHung / HTML-Text-Parser

Extracts text from documents and prepares it for processing by large language models (LLMs). It stores and uses text style information to identify and segment content based on potential headers and titles.

python data-processing text-parsing large-language-models llms text-chunking

Updated Nov 17, 2024
HTML

smart-models / Sentences-Chunker

Cutting-edge tool designed to intelligently segment text documents into optimally-sized chunks

nlp docker-compose gpu-acceleration document-processing rag fastapi text-chunking

Updated Sep 30, 2025
Python

philnash / chunkers

An exploration of text splitting and chunking in JavaScript

text-splitter llamaindex langchain-js text-chunking text-splitting

Updated Nov 21, 2024
TypeScript

betcorg / llm-text-splitter

A lightweight TypeScript text splitter for RAG applications

chatbots rag text-splitter text-chunking

Updated Mar 9, 2025
TypeScript

Vivet-Software / Vivet.AI

A service-oriented .NET library for AI with interchangeable orchestrations and vector stores.

chat ai knowledge memory azure inference openai summarization embedding huggingface llm metadata-retrieval ollama amazon-bedrock text-chunking google-gemini context-deduplication

Updated Oct 20, 2025
C#

ushakiranmai / text_summarization

This text summarization tool uses machine learning models to create concise, meaningful summaries of lengthy texts. Built with Hugging Face Transformers and Gradio, it handles various input lengths and is suited to summarizing articles, reports, and more.

web-development file-handling text-summarization gradio-interface text-chunking model-handling output-formats python-libraries-and-tools

Updated Jan 23, 2025
Python

Besthope-Official / predoc

Document preprocessing service for RAG (Retrieval-Augmented Generation)

api microservice yolo pdf-parser text-embedding document-parser rag text-chunking

Updated Oct 22, 2025
Python

samay-jain / Retrieval-Augmented-Generation-RAG-simple-program

A lightweight, modular Retrieval-Augmented Generation (RAG) system built with Streamlit, FAISS, and LLMs like OpenAI and Ollama. Upload documents, embed them, and ask intelligent questions with real-time context-aware responses.

embeddings openai nomic chroma faiss python-nlp rag vector-search streamlit gpt4 langchain ollama text-chunking llama3 llm-app simple-rag document-question-answering pdf-nlp qa-application

Updated Jun 26, 2025
Python

adityapathak-cubastion / cubastion-hr-chatbot

Cubastion's HR chatbot answers queries based on the latest HR documents published by Cubastion's HR team, so an employee can resolve a query without combing through the documents themselves. Developed with Python, sentence-transformers, Pinecone, llama3.2, and Streamlit.

python text-generation text-extraction cosine-similarity pinecone huggingface streamlit text-embeddings sentence-transformers prompt-engineering text-chunking llama3

Updated Jan 30, 2025
Python

adityapathakk / cubastion-hr-chatbot

Cubastion's HR chatbot answers queries based on the latest HR documents published by Cubastion's HR team, so an employee can resolve a query without combing through the documents themselves. Developed with Python, sentence-transformers, Pinecone, llama3.2, and Streamlit.

python text-generation text-extraction cosine-similarity pinecone huggingface streamlit text-embeddings sentence-transformers prompt-engineering text-chunking llama3

Updated Jan 30, 2025
Python

andrewschenck / ragl

Vector Storage and Retrieval for RAG

python redis information-retrieval semantic-search nlp-machine-learning rag vector-search llm retrieval-augmented-generation text-chunking

Updated Oct 6, 2025
Python

DavidShableski / llm-pdf-analyzer

Self-hosted RAG application for PDF question-answering using LangChain, ChromaDB, and Ollama. Features Flask web interface, vector embeddings, automated chunking, and local LLM inference. Includes CI/CD pipeline with automated testing.

Updated Sep 11, 2025
Python

mohsinraza2999 / Legal-Advisor-using-gpt-neo-1.3B

This project aims to build an AI-powered Legal Advisor that leverages natural language processing and vector search technology to provide users with legal guidance based on authoritative legal texts.

embeddings tokenization similarity-search huggingface vector-database prompt-engineering llms langchain retrieval-augmented-generation llm-pipeline text-chunking

Updated Aug 8, 2025
Jupyter Notebook

text-chunking

Here are 18 public repositories matching this topic...

isaacus-dev / semchunk

lazyFrogLOL / llmdocparser

jparkerweb / semantic-chunking

drittich / SemanticSlicer

GregorBiswanger / SemanticChunker.NET

ChenTaHung / HTML-Text-Parser

smart-models / Sentences-Chunker

philnash / chunkers

betcorg / llm-text-splitter

Vivet-Software / Vivet.AI

ushakiranmai / text_summarization

Besthope-Official / predoc

samay-jain / Retrieval-Augmented-Generation-RAG-simple-program

adityapathak-cubastion / cubastion-hr-chatbot

adityapathakk / cubastion-hr-chatbot

andrewschenck / ragl

DavidShableski / llm-pdf-analyzer

mohsinraza2999 / Legal-Advisor-using-gpt-neo-1.3B
