Add keypoint-detection task to Hub #870


Merged — 5 commits, merged on Sep 2, 2024
12 changes: 12 additions & 0 deletions packages/tasks/src/pipelines.ts
@@ -656,6 +656,18 @@ export const PIPELINE_DATA = {
name: "Video-Text-to-Text",
modality: "multimodal",
color: "blue",
hideInDatasets: false,
},
"keypoint-detection": {
name: "Keypoint Detection",
subtasks: [
{
type: "pose-estimation",
name: "Pose Estimation",
},
],
modality: "cv",
color: "red",
hideInDatasets: true,
Contributor:

Any datasets for this on the Hub?


Contributor Author:

just set to false

},
other: {
2 changes: 2 additions & 0 deletions packages/tasks/src/tasks/index.ts
Original file line number Diff line number Diff line change
@@ -126,6 +126,7 @@ export const TASKS_MODEL_LIBRARIES: Record<PipelineType, ModelLibraryKey[]> = {
"image-to-image": ["diffusers", "transformers", "transformers.js"],
"image-to-text": ["transformers", "transformers.js"],
"image-to-video": ["diffusers"],
"keypoint-detection": ["transformers"],
Contributor:

What does this mean? Cause there's no keypoint detection pipeline in Transformers yet

Contributor Author (@merveenoyan, Aug 28, 2024):

this is not about pipelines; it's to show which libraries support this task, e.g. on the task pages
(screenshot attached, 2024-08-28)

we currently have SuperPoint and will soon have ViTPose in transformers so we do support it

Member:

> we currently have SuperPoint and will soon have ViTPose

I think we should tag those models then before the PR is merged

Contributor Author:

I have opened PRs to 30-40 models and some are merged already I think, see

also, the Sapiens release will add a lot of keypoint models soon (it was one unified model repo; I sent the authors a script to automate the separation)

(screenshot attached, 2024-08-29)


Member (@pcuenca, Aug 29, 2024):

> I have opened PRs to 30-40 models and some are merged

Yes, I see 4 models now 👍

"video-classification": ["transformers"],
"mask-generation": ["transformers"],
"multiple-choice": ["transformers"],
@@ -205,6 +206,7 @@ export const TASKS_DATA: Record<PipelineType, TaskData | undefined> = {
"image-text-to-text": getData("image-text-to-text", imageTextToText),
"image-to-text": getData("image-to-text", imageToText),
"image-to-video": undefined,
"keypoint-detection": getData("keypoint-detection", placeholder),
"mask-generation": getData("mask-generation", maskGeneration),
"multiple-choice": undefined,
"object-detection": getData("object-detection", objectDetection),
59 changes: 59 additions & 0 deletions packages/tasks/src/tasks/keypoint-detection/about.md
@@ -0,0 +1,59 @@
## Task Variants

### Pose Estimation

Pose estimation is the process of determining the position and orientation of an object (often the human body, via its skeletal keypoints) or a camera in 3D space. It is a fundamental computer vision task, widely used in applications such as robotics, augmented reality, and 3D reconstruction.

## Use Cases for Keypoint Detection

### Facial Landmark Estimation

Keypoint detection models can be used to estimate the position of facial landmarks. Facial landmarks are points on the face such as the corners of the mouth, the outer corners of the eyes, and the tip of the nose. These landmarks can be used for a variety of applications, such as facial expression recognition, 3D face reconstruction, and cinematic animation.
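A common downstream computation on facial landmarks is the eye aspect ratio (EAR), used for blink and drowsiness detection. The sketch below assumes six eye landmarks in the conventional order (outer corner, two upper-lid points, inner corner, two lower-lid points); the coordinates are hypothetical, not output from any particular model.

```python
import numpy as np

def eye_aspect_ratio(eye):
    """Eye aspect ratio (EAR) from six eye landmarks.

    Expected order: outer corner, two upper-lid points,
    inner corner, two lower-lid points.
    """
    p = np.asarray(eye, dtype=float)
    # sum of the two vertical lid distances
    vertical = np.linalg.norm(p[1] - p[5]) + np.linalg.norm(p[2] - p[4])
    # horizontal corner-to-corner distance
    horizontal = np.linalg.norm(p[0] - p[3])
    return float(vertical / (2.0 * horizontal))

# hypothetical landmarks for an open and a nearly closed eye
open_eye = [(0, 0), (2, -1.5), (4, -1.5), (6, 0), (4, 1.5), (2, 1.5)]
closed_eye = [(0, 0), (2, -0.2), (4, -0.2), (6, 0), (4, 0.2), (2, 0.2)]

print(round(eye_aspect_ratio(open_eye), 2))    # 0.5
print(round(eye_aspect_ratio(closed_eye), 2))  # 0.07
```

The EAR stays roughly constant while the eye is open and drops sharply toward zero when it closes, so thresholding it over time is a simple blink detector.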

### Fitness Tracking

Keypoint detection models can be used to track the movement of the human body, e.g. the positions of the joints in 3D space. This enables applications such as fitness tracking, sports analysis, and virtual reality.
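The joint-tracking idea above can be sketched with a small geometry helper: given three pose keypoints, the joint angle is the angle between the two limb vectors meeting at the middle point. The hip/knee/ankle pixel coordinates below are hypothetical, standing in for the output of a pose estimation model.

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle at point b (in degrees) formed by points a and c.

    Each point is an (x, y) keypoint, e.g. hip, knee, ankle.
    """
    a, b, c = np.asarray(a, float), np.asarray(b, float), np.asarray(c, float)
    v1, v2 = a - b, c - b  # the two limb vectors meeting at the joint
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    # clip guards against floating-point values just outside [-1, 1]
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

# hypothetical hip, knee, and ankle keypoints (pixel coordinates)
hip, knee, ankle = (320, 200), (330, 300), (325, 400)
print(round(joint_angle(hip, knee, ankle), 1))  # ~171 degrees: a nearly straight leg
```

Tracking such angles frame by frame is the basis of rep counting and form analysis in fitness applications.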

## Inference Code

Below you can find an example of how to use a keypoint detection model and how to visualize the results.

```python
from transformers import AutoImageProcessor, SuperPointForKeypointDetection
import torch
import matplotlib.pyplot as plt
from PIL import Image
import requests

url_image = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url_image, stream=True).raw)

# initialize the model and processor
processor = AutoImageProcessor.from_pretrained("magic-leap-community/superpoint")
model = SuperPointForKeypointDetection.from_pretrained("magic-leap-community/superpoint")

# infer
inputs = processor(image, return_tensors="pt").to(model.device, model.dtype)
outputs = model(**inputs)

# visualize the output; SuperPoint returns keypoints as relative (x, y)
# coordinates, so scale them back to pixel coordinates for plotting
image_width, image_height = image.size
image_mask = outputs.mask.squeeze().bool()

image_scores = outputs.scores.squeeze()[image_mask]
image_keypoints = outputs.keypoints.squeeze()[image_mask]
keypoints = image_keypoints.detach().numpy()
keypoints[:, 0] *= image_width
keypoints[:, 1] *= image_height
scores = image_scores.detach().numpy()

plt.axis('off')
plt.imshow(image)
plt.scatter(
keypoints[:, 0],
keypoints[:, 1],
s=scores * 100,
c='cyan',
alpha=0.4
)
plt.show()
```
46 changes: 46 additions & 0 deletions packages/tasks/src/tasks/keypoint-detection/data.ts
@@ -0,0 +1,46 @@
import type { TaskDataCustom } from "..";

const taskData: TaskDataCustom = {
datasets: [
{
description: "A dataset of hand keypoints with over 500k examples.",
id: "Vincent-luo/hagrid-mediapipe-hands",
},
],
demo: {
inputs: [
{
filename: "keypoint-detection-input.png",
type: "img",
},
],
outputs: [
{
filename: "keypoint-detection-output.png",
type: "img",
},
],
},
metrics: [],
models: [
{
description: "A robust keypoint detection model.",
id: "magic-leap-community/superpoint",
},
{
description: "Strong keypoint detection model used to detect human pose.",
id: "qualcomm/MediaPipe-Pose-Estimation",
},
],
spaces: [
{
description: "An application that detects hand keypoints in real-time.",
id: "datasciencedojo/Hand-Keypoint-Detection-Realtime",
},
],
summary: "Keypoint detection is the task of identifying meaningful distinctive points or features in an image.",
widgetModels: [],
youtubeId: "",
};

export default taskData;
@@ -0,0 +1 @@
<svg xmlns="http://www.w3.org/2000/svg" width="1em" height="1em" viewBox="0 0 32 32" {...$$props}><path fill="currentColor" d="m28.316 13.949l-.632-1.898L17 15.612V4h-2v11.612L4.316 12.051l-.632 1.898l10.684 3.561L7.2 27.066l1.6 1.201l7.2-9.6l7.2 9.6l1.6-1.201l-7.168-9.556z"/></svg>
@@ -43,6 +43,7 @@
import IconImageTo3D from "../Icons/IconImageTo3D.svelte";
import IconImageFeatureExtraction from "../Icons/IconImageFeatureExtraction.svelte";
import IconVideoTextToText from "../Icons/IconVideoTextToText.svelte";
import IconKeypointDetection from "../Icons/IconKeypointDetection.svelte";
import type { WidgetType } from "@huggingface/tasks";

export let classNames = "";
@@ -96,6 +97,7 @@
"image-to-3d": IconImageTo3D,
"image-feature-extraction": IconImageFeatureExtraction,
"video-text-to-text": IconVideoTextToText,
"keypoint-detection": IconKeypointDetection,
};

$: iconComponent =