
Run baseline captioning against one of the datasets identified in Unified-IO #7

Open
danielnapierski opened this issue Feb 17, 2023 · 2 comments
Comments

@danielnapierski

The unified-io ISI saga-cluster demo does more than baseline captioning; it also does object detection.

The task here is to write a script (or otherwise implement a feature), included in the Docker build, that allows the container to run captioning only. The output would need to be recorded/logged.

Object Detection branch:

https://github.com/isi-vista/unified-io-inference/blob/object-detection/Dockerfile

https://github.com/isi-vista/unified-io-inference/blob/object-detection/run.py

The object-detection branch has a Dockerfile whose entrypoint is run.py. To run captioning only, a caption.py script could be written and passed as an argument to the docker run command so that it executes instead of the default entrypoint (run.py). caption.py would load the model and send only the captioning prompt for each image listed in a file; a sketch is shown below.
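As a rough illustration, caption.py could look like the following sketch. The load_model and caption_image helpers and the docker invocation in the docstring are placeholders, not the actual Unified-IO API; they would be replaced with whatever run.py uses to load the model and issue the captioning prompt.

```python
"""caption.py -- sketch of a captioning-only entry point.

Hypothetical invocation, overriding the image's default entrypoint (run.py):
  docker run --gpus=1 --entrypoint python unified-io-inference \
      caption.py /input/images.txt /output/captions.tsv
"""
import csv
import sys

from PIL import Image


def load_model():
    # Placeholder: load Unified-IO the same way run.py does.
    raise NotImplementedError


def caption_image(model, image):
    # Placeholder: send only the captioning prompt for a single image.
    raise NotImplementedError


def main(image_list_path, output_tsv_path):
    model = load_model()
    with open(image_list_path) as f, open(output_tsv_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t")
        writer.writerow(["image_path", "caption"])
        for line in f:
            path = line.strip()
            if not path:
                continue
            image = Image.open(path).convert("RGB")
            writer.writerow([path, caption_image(model, image)])


if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])
```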

I explored the visual_genome dataset and found that its associated Python tools work in a Python 3.6 environment but not in the more current 3.9. This is a challenge for the existing Docker image, which has only a single environment; it can be addressed by adding another environment.

I also explored the VizWiz tools and started a script to read the JSON caption annotations, but have not yet completed it.
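For reference, reading the VizWiz caption annotations might look roughly like this, assuming they follow a COCO-style layout (an "images" list plus an "annotations" list keyed by image_id); the file name and field names should be checked against the actual download.

```python
import json
from collections import defaultdict

# Assumed COCO-style layout: data["images"] has id/file_name,
# data["annotations"] has image_id/caption.
with open("annotations/train.json") as f:
    data = json.load(f)

captions_by_image = defaultdict(list)
for ann in data["annotations"]:
    captions_by_image[ann["image_id"]].append(ann["caption"])

# Print the reference captions for the first few images.
for img in data["images"][:5]:
    print(img["file_name"], captions_by_image[img["id"]])
```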

@danielnapierski
Author

@elizlee is working on downloading all of VG; it should be complete by Tuesday.
The CC12M download is in progress: we have almost 4M of the 12M images, and it should finish this weekend.
VizWiz is downloaded.

I am running captioning on a portion of VG now and expect to have those results by EoD today. On Tuesday we can begin running captioning on the entirety of VG if it is available; otherwise we can start running captioning against the CC12M dataset, which requires some changes to the caption script to accommodate the different format. We can also begin running against the VizWiz dataset at any time; as with the others, that requires some dev work to handle its format.

@danielnapierski
Author

I wrote caption.py for the CC12M dataset, then built the Docker image and ran this command on sagalg12 to caption 500 images from CC12M using Unified-IO:

docker run -it --gpus=1 \
  -e WEBDATASET_FILE=/input/00000.tar \
  -v /nas/gaia02/data/paper2023/cc12m/images:/input \
  -v /nas/gaia02/users/napiersk/github/feb-14/unified-io-inference/output:/output \
  -e SAMPLE_COUNT=500 \
  unified-io-inference

The result is a TSV file with three columns ('ID', 'CC12M_Caption', 'Unified_IO_Caption'):
/nas/gaia02/data/paper2023/results/draft/cc12m-unified-io-output-500-20230220.tsv
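For anyone reproducing this, the core loop is roughly the sketch below: iterate the webdataset shard named by WEBDATASET_FILE, stop after SAMPLE_COUNT samples, and write one TSV row per image. The sample key names ("jpg", "txt"), the output path, and the caption_unified_io placeholder are assumptions rather than the script as written; check the shard contents and run.py for the real calls.

```python
import csv
import io
import os

import webdataset as wds  # pip install webdataset
from PIL import Image


def caption_unified_io(image):
    # Placeholder for the Unified-IO captioning call used in run.py.
    return ""


shard = os.environ.get("WEBDATASET_FILE", "/input/00000.tar")
sample_count = int(os.environ.get("SAMPLE_COUNT", "500"))

with open("/output/cc12m-unified-io-output.tsv", "w", newline="") as out:
    writer = csv.writer(out, delimiter="\t")
    writer.writerow(["ID", "CC12M_Caption", "Unified_IO_Caption"])
    # Assumed keys: "jpg" holds the image bytes, "txt" holds the CC12M caption.
    for i, sample in enumerate(wds.WebDataset(shard)):
        if i >= sample_count:
            break
        image = Image.open(io.BytesIO(sample["jpg"])).convert("RGB")
        writer.writerow([sample["__key__"],
                         sample["txt"].decode("utf-8"),
                         caption_unified_io(image)])
```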
