This repository has been archived by the owner on Nov 5, 2024. It is now read-only.

Bedrock Rag is missing ppt, pptx text splitting support with knowledgebase queries #819

Closed
2 tasks
converseKarl opened this issue Sep 14, 2024 · 6 comments
Labels
feature-request New feature or request p2 service-api This issue pertains to the AWS API

Comments

@converseKarl

Describe the feature

Current format support lags behind other systems at the moment:

format: "pdf" || "csv" || "doc" || "docx" || "xls" || "xlsx" || "html" || "txt" || "md", // required

For RAG you also need ppt and pptx (PowerPoint text splitting).

To be 100% complete you would also need mp4, mp3, YouTube URL, YouTube channel, and JSON (someone implied JSON is already supported, but I have not seen it).
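To make the gap concrete, here is a minimal sketch of a pre-flight check against the format list quoted above. `SUPPORTED_FORMATS` and `detectFormat` are hypothetical names made up for illustration, not part of the SDK; the only thing taken from the API is the list of format strings.

```typescript
// Format strings copied from the `format` union quoted in this issue.
const SUPPORTED_FORMATS = [
  "pdf", "csv", "doc", "docx", "xls", "xlsx", "html", "txt", "md",
] as const;

type SupportedFormat = (typeof SUPPORTED_FORMATS)[number];

// Hypothetical helper: maps a file name to a supported format,
// or returns undefined when the extension is missing or not in the list.
function detectFormat(fileName: string): SupportedFormat | undefined {
  const dot = fileName.lastIndexOf(".");
  if (dot < 0) return undefined;
  const ext = fileName.slice(dot + 1).toLowerCase();
  return (SUPPORTED_FORMATS as readonly string[]).includes(ext)
    ? (ext as SupportedFormat)
    : undefined;
}
```

With this check, `report.PDF` resolves to `"pdf"`, while `deck.pptx` and `deck.ppt` come back as `undefined`, which is the case this request is about.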

Use Case

You have everything else except PowerPoint: Word, Excel, txt, csv, and html are supported, but PowerPoint is not.

A lot of information lives in PowerPoint files: company info, results, and numerous training presentations. Making them available to RAG covers a substantial set of use cases.

Proposed Solution

Implement text extraction for embedding from PowerPoint files (as you already do with PDFs). If you're using LangChain in the background, it's a five-minute job to add PPT/PPTX conversion as a loader type, but I don't know your underlying implementation.
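As a rough illustration of what the extraction step involves (not a claim about how Bedrock or LangChain do it): a .pptx file is a zip archive, and the visible text of each slide sits in `<a:t>` runs grouped into `<a:p>` paragraphs inside the slide XML. The sketch below assumes the slide XML has already been unzipped into a string, and uses a regex rather than a real XML parser, so treat it as a sketch only. `extractSlideText` and `decodeEntities` are hypothetical helper names.

```typescript
// Decode the five predefined XML entities. "&amp;" is handled last so that
// an escaped "&amp;lt;" becomes "&lt;" rather than "<".
function decodeEntities(s: string): string {
  return s
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&");
}

// Hypothetical helper: collects the text runs of one slide's XML,
// emitting one output line per non-empty paragraph.
function extractSlideText(slideXml: string): string {
  // Matches either a text run (group 1 = its text) or a paragraph end.
  const tokenRe = /<a:t>([^<]*)<\/a:t>|<\/a:p>/g;
  const paragraphs: string[] = [];
  let current = "";
  let m: RegExpExecArray | null;
  while ((m = tokenRe.exec(slideXml)) !== null) {
    if (m[1] !== undefined) {
      current += m[1]; // text run: append to the current paragraph
    } else {
      // paragraph end: flush the collected text if there is any
      if (current.trim() !== "") paragraphs.push(decodeEntities(current));
      current = "";
    }
  }
  // Flush trailing text in case the input was cut before a paragraph end.
  if (current.trim() !== "") paragraphs.push(decodeEntities(current));
  return paragraphs.join("\n");
}
```

Until ppt and pptx are accepted directly, text extracted this way could be submitted under one of the formats that is already supported, such as "txt".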

Other Information

No response

Acknowledgements

  • I may be able to implement this feature request
  • This feature might incur a breaking change

SDK version used

3.651.1

Environment details (OS name and version, etc.)

Linux Debian, Node.js / EC2, or even AWS Lambda directly

@converseKarl converseKarl added feature-request New feature or request needs-triage labels Sep 14, 2024
@Gab1988

Gab1988 commented Sep 17, 2024

This is a big gap in the functionality - it is quite a simple addition. Anthropic supports it already so at least Claude models should be able to support pptx very easily
Thanks!

@zshzbh zshzbh self-assigned this Sep 17, 2024
@converseKarl
Author

> This is a big gap in the functionality - it is quite a simple addition. Anthropic supports it already so at least Claude models should be able to support pptx very easily
> Thanks!

Indeed, but text splitting happens before any LLM query, so any model, including Titan, should be able to use it easily, same as with PDFs.
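To illustrate why splitting is model-independent, here is a minimal sketch of a fixed-size character splitter with overlap. This is a generic technique picked for illustration, not Bedrock's actual chunking strategy, which I don't know; `splitText`, `chunkSize`, and `overlap` are hypothetical names.

```typescript
// Hypothetical helper: cuts text into windows of at most `chunkSize`
// characters, where each window repeats the last `overlap` characters
// of the previous one so context is not lost at chunk boundaries.
function splitText(text: string, chunkSize: number, overlap: number): string[] {
  if (chunkSize <= 0) throw new RangeError("chunkSize must be positive");
  if (overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("overlap must be in [0, chunkSize)");
  }
  const chunks: string[] = [];
  const step = chunkSize - overlap; // how far each window advances
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a window has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

The chunks are plain strings, so they can be embedded and indexed the same way whatever the source format was and whichever model answers the query later.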

@zshzbh zshzbh added p2 service-api This issue pertains to the AWS API and removed needs-triage labels Sep 18, 2024
@zshzbh

zshzbh commented Sep 18, 2024

Thanks for the feedback!
AWS SDKs use the exported models from the service API, but the AWS service API doesn't support ppt and pptx yet:
https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ContentBlock.html
https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_DocumentBlock.html

I think this is a good addition to the Bedrock service API. I will move this issue to a cross-SDK issue and open a feature request with the service team!

Thanks!
Maggie

@zshzbh zshzbh added the response-requested This issue requires a response to continue label Sep 18, 2024
@zshzbh zshzbh transferred this issue from aws/aws-sdk-js-v3 Sep 18, 2024
@zshzbh zshzbh removed the response-requested This issue requires a response to continue label Sep 23, 2024
@zshzbh

zshzbh commented Sep 23, 2024

Added Product Feature Request, title - "Add ppt, pptx text splitting support in Bedrock Rag knowledge base query"

@zshzbh

zshzbh commented Sep 25, 2024

Thanks again for the feature request. The Bedrock team is continuing to track this in their backlog for consideration. We're going to close this on our end, as the service team would need to take the next steps here. Please refer to the blog or CHANGELOG for updates, or feel free to reach out through support if you have a support plan.

Thanks!
Maggie

@zshzbh zshzbh closed this as completed Sep 25, 2024

This issue is now closed.

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.
