Object Detection with Caption #3

Open
hungphongtrn opened this issue Jun 20, 2024 · 2 comments

Comments

@hungphongtrn

First, thank you all for open sourcing this fantastic work.

Is object detection with captions feasible with this model, and if so, how can I use it?

Thank you in advance!

@roman-bachmann
Collaborator

Hi @hungphongtrn,

Thanks for your interest!
I'm not sure I fully understand the question, but 4M can both take as input and produce as output captions and bounding boxes, alongside other modalities. Note that the captions and bounding boxes are not aligned: the captions are image-level, while the bounding boxes are labeled with COCO classes.
Please see our Jupyter notebooks for examples.
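
To make the distinction between image-level captions and class-labeled boxes concrete, here is a minimal sketch in plain Python. It does not use the 4M API: the `Box` and `Sample` types, the normalized coordinate layout, and the `boxes_for_label` helper are all hypothetical names introduced only for illustration. The point it tries to show is that a box carries a class name rather than a free-form phrase, so a text query can only select boxes by matching that class name.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Box:
    # Hypothetical layout: normalized corner coordinates in [0, 1].
    xmin: float
    ymin: float
    xmax: float
    ymax: float
    label: str  # a COCO class name such as "dog", not a free-form phrase


@dataclass
class Sample:
    caption: str  # describes the whole image, not any single box
    boxes: List[Box] = field(default_factory=list)


def boxes_for_label(sample: Sample, label: str) -> List[Box]:
    """Return the boxes whose class name matches `label` (case-insensitive)."""
    wanted = label.strip().lower()
    return [b for b in sample.boxes if b.label.lower() == wanted]


# Made-up example data, for illustration only.
sample = Sample(
    caption="A dog chasing a frisbee in a park",
    boxes=[
        Box(0.10, 0.40, 0.45, 0.90, "dog"),
        Box(0.55, 0.20, 0.70, 0.30, "frisbee"),
    ],
)

dog_boxes = boxes_for_label(sample, "Dog")
park_boxes = boxes_for_label(sample, "park")  # mentioned in the caption only
```

In this sketch, "park" appears in the caption but has no box, because nothing links words in the caption to individual boxes.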

Best, Roman

@hungphongtrn
Author

hungphongtrn commented Jul 1, 2024

Hi @roman-bachmann,

Sorry for not being clear. My question is: given some labels as "captions", can 4M detect the bounding boxes that correspond to those "captions"?

Thank you,
Phong
