Reverse-Prompting

Generating prompts from images. It can help in labeling dataset for Generative models like stable diffusion. It can also be used to help creators find and mimic inspirations from the creation of other artists.

img_2_txt Models

LLaVA GPT-4 BLIP-2 OpenFlamingo

MiniGPT4

How usually the finetuning of LLaVA happens?

Pretraining (2 hours)
Extract projector features
Visual instruction tuning (1 hour)

*authors have directly provided projector features. So we can directly jump to Visual instruction tuning…

Data Source

https://www.kaggle.com/datasets/ldmtwo/midjourney-250k-csv

Data Formats

detail_23k.json

{
    "id": "000000442786",
    "image": "000000442786.jpg",
    "conversations": [
      {
        "from": "human",
        "value": "What do you see happening in this image?\n<image>"
      },
      {
        "from": "gpt",
        "value": "The scene depicts a lively plaza area with several people walking and enjoying their time. A man is standing in the plaza with his legs crossed, holding a kite in his hand. The kite has multiple sections attached to it, spread out in various directions as if ready for flight.\n\nNumerous people are scattered throughout the plaza, walking and interacting with others. Some of these individuals are carrying handbags, and others have backpacks. The image captures the casual, social atmosphere of a bustling plaza on a nice day."
      }
    ]
}

conversation_58k.json

{
    "id": "000000033471",
    "image": "000000033471.jpg",
    "conversations": [
      {
        "from": "human",
        "value": "What are the colors of the bus in the image?\n<image>"
      },
      {
        "from": "gpt",
        "value": "The bus in the image is white and red."
      },
      {
        "from": "human",
        "value": "What feature can be seen on the back of the bus?"
      },
      {
        "from": "gpt",
        "value": "The back of the bus features an advertisement."
      },
      {
        "from": "human",
        "value": "Is the bus driving down the street or pulled off to the side?"
      },
      {
        "from": "gpt",
        "value": "The bus is driving down the street, which is crowded with people and other vehicles."
      }
    ]
  }

complex_reasoning_77k.json

{
    "id": "000000215677",
    "image": "000000215677.jpg",
    "conversations": [
      {
        "from": "human",
        "value": "<image>\nWhat skill set might someone need to perform such a frisbee trick?"
      },
      {
        "from": "gpt",
        "value": "To perform the frisbee trick shown in the image, where the man is passing a frisbee between or underneath his legs, a person would need a combination of skills. These skills include good hand-eye coordination, agility, balance, flexibility, and dexterity. Additionally, the ability to throw and catch the frisbee accurately while maintaining control of bodily movements would also be essential. To perfect the trick, practicing these skills and building up muscle memory through repetition would be beneficial."
      }
    ]
  }

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
LICENSE		LICENSE
MiniGPT4 fine tuning.ipynb		MiniGPT4 fine tuning.ipynb
MiniGPT4.png		MiniGPT4.png
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Reverse-Prompting

img_2_txt Models

MiniGPT4

How usually the finetuning of LLaVA happens?

Data Source

Data Formats

About

Uh oh!

Releases

Packages

Languages

License

jaydeep-work/Reverse-Prompting

Folders and files

Latest commit

History

Repository files navigation

Reverse-Prompting

img_2_txt Models

MiniGPT4

How usually the finetuning of LLaVA happens?

Data Source

Data Formats

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages