20,000 Image & Video caption data of human action is a dataset containing 20,000 images and 10,000 videos of various human behaviors, captured in different seasons and from different shooting angles, in both indoor and outdoor scenes. The captions are written in English and mainly describe the gender, age, clothing, behavior, and body movements of the people shown.
For more details, please refer to the link: https://www.nexdata.ai/datasets/llm/1289?source=Github
Data size: 10,000 images, 1,000 videos
Race distribution: Caucasian, Black
Gender distribution: male, female
Age distribution: from teenagers to the elderly, mainly young and middle-aged people
Collection environment: indoor and outdoor scenes
Data diversity: different age groups, collection environments, seasons, shooting angles, and human behaviors
Data format: images are .jpg, videos are .mp4, captions are .txt
Description language: English, Chinese
Caption length: in principle 30 to 60 words, usually 3 to 5 sentences
Main description content: gender, age, clothing, behavior description, body movements
Accuracy: at least 97% of images are correctly labeled
License: Commercial License
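
As a rough illustration of how the file formats and caption length stated above might be checked after download, here is a minimal Python sketch. It rests on assumptions that the description does not confirm: that each caption .txt file shares its file stem with the .jpg or .mp4 it describes, and that all files sit somewhere under one root folder. The function names are hypothetical and are not part of any Nexdata tooling. The word count splits on whitespace, so it is only meaningful for the English captions.

```python
from pathlib import Path

# Media formats listed in the dataset description.
MEDIA_SUFFIXES = {".jpg", ".mp4"}


def pair_captions(root):
    """Return (media_path, caption_path) pairs found under root.

    Assumption (not confirmed by the dataset description): a caption
    shares its media file's stem, e.g. 0001.jpg -> 0001.txt.
    """
    pairs = []
    for media in sorted(Path(root).rglob("*")):
        if media.suffix.lower() in MEDIA_SUFFIXES:
            caption = media.with_suffix(".txt")
            if caption.exists():
                pairs.append((media, caption))
    return pairs


def caption_word_count(text):
    """Count whitespace-separated words (suitable for English captions only)."""
    return len(text.split())


def within_stated_length(text, low=30, high=60):
    """Check a caption against the stated 30 to 60 word guideline."""
    return low <= caption_word_count(text) <= high
```

Calling `pair_captions` on the extracted dataset folder and then applying `within_stated_length` to each caption's text would flag captions that fall outside the stated guideline. Since the description gives the range only "in principle", some captions outside it are to be expected.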