[Feature]: RAG from PDF File #1087

Open
areytechai opened this issue Feb 6, 2025 · 3 comments
Comments

@areytechai

Background & Description

Is there a way to use the RAG methodology with a PDF or DOCX file read from the filesystem?

API & Usage

No response

How to implement

No response

@zsogitbe
Contributor

zsogitbe commented Feb 8, 2025

Yes, you need a library that can extract text from PDF and DOCX files (there are free libraries available), and then you can use the standard procedure for Retrieval-Augmented Generation (RAG) in LLamaSharp.

@areytechai
Author

@zsogitbe An example of the procedure, please.

@zsogitbe
Contributor

zsogitbe commented Feb 8, 2025

  1. Extract text from a PDF using a free C# tool.
  2. Chunk the extracted text for RAG.
  3. Save the text chunks into Kernel Memory with LLamaSharp.
  4. Perform searches within the memory using LLamaSharp.
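
As a rough illustration of steps 2 and 4 only, here is a minimal, library-agnostic sketch in Python. It does not use LLamaSharp or Kernel Memory: the function names `chunk_text` and `search` are hypothetical, the word-window sizes are arbitrary, and the bag-of-words cosine score is a simple stand-in for the embedding-based search a real RAG setup would use. PDF text extraction (step 1) is assumed to have already produced a plain string.

```python
import math
import re
from collections import Counter


def chunk_text(text, max_words=50, overlap=10):
    """Split text into overlapping windows of at most max_words words."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def _vector(text):
    """Lowercased term counts; a toy substitute for an embedding vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(chunks, query, top_k=3):
    """Return up to top_k chunks that share terms with the query, best first."""
    q = _vector(query)
    scored = sorted(
        ((_cosine(_vector(c), q), c) for c in chunks),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return [c for score, c in scored[:top_k] if score > 0]
```

The overlap between windows is there so that a sentence cut at a chunk boundary still appears whole in at least one chunk. In an actual pipeline the retrieved chunks would be placed into the prompt sent to the model.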

The quality of the results will depend on how efficiently and cleverly these steps are executed (it is not as easy as many novice AI 'experts' think).

I am willing to provide an example in exchange for payment (willing to give a quote for this based on your requirements). Otherwise, please look at the basic examples in the LLamaSharp repository.
