Object Detection with Caption #3

hungphongtrn · 2024-06-20T06:45:45Z

First, thank you all for open sourcing this fantastic work.

I want to ask whether object detection with captions is feasible with this model and, if so, how I can use it.

Thank you in advance!

roman-bachmann · 2024-06-22T09:23:15Z

Thanks for your interest!
I'm not sure I fully understand the question, but 4M is able to both input and output captions and bounding boxes, alongside other modalities. Note that the captions and bounding boxes are not aligned: the captions are at the image level, while the bounding boxes are labeled with COCO classes.
Please see our Jupyter notebooks for examples.

Best, Roman

hungphongtrn · 2024-07-01T06:51:15Z

Hi @roman-bachmann,

Sorry for not being clear. My question is: given some labels as "captions", can 4M detect the bounding boxes based on the provided "captions"?

Thank you,
Phong
