VirtualPainter is a computer vision application that allows users to draw on their screen using hand gestures captured by a webcam. The project consists of two main modules:

- `app.py` - Main application with the virtual painting interface
- `detectlib.py` - Detection library with hand, face, and pose detection capabilities
The main application creates a virtual painting interface where users can draw using hand gestures.
| Variable | Type | Default | Description |
|---|---|---|---|
| `cam_width` | int | 1280 | Camera capture width in pixels |
| `cam_height` | int | 720 | Camera capture height in pixels |
| `brush_color` | tuple | (0, 0, 255) | Current brush color in BGR format (red by default) |
| `brush_thickness` | int | 20 | Thickness of the drawing brush |
| `eraser_thickness` | int | 150 | Thickness of the eraser |
- Real-time hand tracking using MediaPipe
- Gesture-based drawing with index finger
- Color selection using two-finger gesture on header
- Eraser functionality with black color selection
- Canvas overlay system for persistent drawing
- Selection Mode: Two fingers up (index + middle)
  - Navigate the header to select colors/tools
  - Color options: Red, Green, Blue, Magenta, Black (eraser)
- Drawing Mode: Index finger up only
  - Draw on the canvas with the selected color and thickness
  - Lines are drawn automatically between successive finger positions
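The two-mode logic above can be sketched as a small dispatch function that maps the 5-element list returned by `fingers_up()` to an interaction mode (`choose_mode` is a hypothetical helper for illustration, not part of the library):

```python
def choose_mode(fingers):
    """Map a fingers_up()-style list [thumb, index, middle, ring, pinky]
    (1 = up, 0 = down) to the painter's interaction mode."""
    if fingers[1] and fingers[2]:
        return "selection"   # index + middle up: pick a color/tool from the header
    if fingers[1] and not fingers[2]:
        return "drawing"     # index only: draw on the canvas
    return "idle"            # anything else: lift the brush

print(choose_mode([0, 1, 1, 0, 0]))  # selection
print(choose_mode([0, 1, 0, 0, 0]))  # drawing
print(choose_mode([0, 0, 0, 0, 0]))  # idle
```

The main loop in app.py applies the same finger checks inline; factoring them into one function keeps the mode rules in a single place.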
```python
WRIST = 0
THUMB_CMC = 1
THUMB_MCP = 2
THUMB_IP = 3
THUMB_TIP = 4
INDEX_FINGER_MCP = 5
INDEX_FINGER_PIP = 6
INDEX_FINGER_DIP = 7
INDEX_FINGER_TIP = 8
MIDDLE_FINGER_MCP = 9
MIDDLE_FINGER_PIP = 10
MIDDLE_FINGER_DIP = 11
MIDDLE_FINGER_TIP = 12
RING_FINGER_MCP = 13
RING_FINGER_PIP = 14
RING_FINGER_DIP = 15
RING_FINGER_TIP = 16
PINKY_FINGER_MCP = 17
PINKY_FINGER_PIP = 18
PINKY_FINGER_DIP = 19
PINKY_FINGER_TIP = 20
```

```python
HandDetector(static_image_mode=False, max_num_hands=2, model_complexity=1,
             min_detection_confidence=0.5, min_tracking_confidence=0.5)
```

Parameters:

- `static_image_mode` (bool): Whether to treat input as static images
- `max_num_hands` (int): Maximum number of hands to detect
- `model_complexity` (int): Model complexity (0, 1, or 2)
- `min_detection_confidence` (float): Minimum confidence for hand detection
- `min_tracking_confidence` (float): Minimum confidence for hand tracking
```python
detect_hands(img, show_hlabel=True, draw_landmarks=False, draw_rect=True,
             angle_thickness=5, angle_length=20, draw_color=(0, 255, 0), draw_thickness=1)
```
Detects hands in the input image and optionally draws bounding boxes and landmarks.
Parameters:

- `img` (numpy.ndarray): Input image
- `show_hlabel` (bool): Show hand labels
- `draw_landmarks` (bool): Draw hand landmarks
- `draw_rect` (bool): Draw bounding rectangles
- `angle_thickness` (int): Thickness of corner angles
- `angle_length` (int): Length of corner angles
- `draw_color` (tuple): Color for drawing (BGR format)
- `draw_thickness` (int): Thickness of drawn lines
Returns:
numpy.ndarray: Image with drawn detections
Returns the number of detected hands.
Returns:
int: Number of detected hands (0 if none detected)
```python
find_position(img, hand_index=0, draw=True, draw_color=(255, 0, 0), draw_thickness=15, draw_size=15)
```
Finds and returns landmark positions for a specific hand.
Parameters:

- `img` (numpy.ndarray): Input image
- `hand_index` (int): Index of the hand to analyze
- `draw` (bool): Whether to draw landmarks on the image
- `draw_color` (tuple): Color for drawing landmarks
- `draw_thickness` (int): Thickness of drawn circles
- `draw_size` (int): Size of drawn circles
Returns:
list: List of landmarks in format [landmark_id, x, y]
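Since each entry has the form `[landmark_id, x, y]`, the returned list can be indexed with the landmark constants, or converted to a dictionary for lookups. A small illustrative helper (not part of the library; the sample coordinates below are fabricated):

```python
INDEX_FINGER_TIP = 8  # landmark constant, as defined in detectlib

def landmarks_to_dict(landmarks):
    """Convert find_position() output ([landmark_id, x, y] entries)
    into a {landmark_id: (x, y)} mapping."""
    return {lm_id: (x, y) for lm_id, x, y in landmarks}

# Example with fabricated pixel coordinates:
sample = [[0, 640, 600], [8, 512, 300]]
points = landmarks_to_dict(sample)
print(points[INDEX_FINGER_TIP])  # (512, 300)
```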
Returns the count of detected hands (alternative to hands_count()).
Returns:
`int` or `None`: Number of detected hands
Draws a circle at a specific landmark position.
Parameters:

- `img` (numpy.ndarray): Input image
- `landmark_index` (int): Index of the landmark to mark
- `hand_index` (int): Index of the hand
- `color` (tuple): Circle color (BGR format)
- `thickness` (int): Circle thickness
- `size` (int): Circle size
Determines which fingers are extended upward.
Returns:
list: List of 5 values [thumb, index, middle, ring, pinky]
  - 1 = finger is up
  - 0 = finger is down
```python
FaceDetector(min_detection_confidence=0.5, model_selection=0)
```

Parameters:

- `min_detection_confidence` (float): Minimum confidence for face detection
- `model_selection` (int): Model selection (0 for close-range, 1 for full-range)
Detects faces in the input image.
Parameters:

- `img` (numpy.ndarray): Input image
- `draw` (bool): Whether to draw detection results
Returns:

tuple: (processed_image, bounding_boxes_list)
  - `processed_image` (numpy.ndarray): Image with drawn detections
  - `bounding_boxes_list` (list): List of [detection_id, bbox, confidence_score]
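Because each entry in `bounding_boxes_list` has the form `[detection_id, bbox, confidence_score]`, results can be post-filtered by confidence with a plain list comprehension (illustrative helper; the threshold and sample detections below are made up):

```python
def filter_faces(faces, min_confidence=0.8):
    """Keep only detections whose confidence_score meets the threshold.
    Each entry follows detect_faces(): [detection_id, bbox, confidence_score]."""
    return [f for f in faces if f[2] >= min_confidence]

# Fabricated sample detections with (x, y, width, height) bboxes:
faces = [
    [0, (100, 80, 120, 120), 0.95],
    [1, (400, 90, 110, 115), 0.62],
]
print(filter_faces(faces))  # only the 0.95-confidence detection remains
```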
```python
fancy_draw(img, bbox, draw_color=(0, 255, 0), rect_thickness=1, angle_size=30, angle_thickness=5)
```

Static method.
Draws a stylized bounding box with corner angles.
Parameters:

- `img` (numpy.ndarray): Input image
- `bbox` (tuple): Bounding box coordinates (x, y, width, height)
- `draw_color` (tuple): Drawing color (BGR format)
- `rect_thickness` (int): Rectangle line thickness
- `angle_size` (int): Size of corner angles
- `angle_thickness` (int): Thickness of corner angles
Returns:
numpy.ndarray: Image with drawn bounding box
```python
PoseDetector(static_image_mode=False, model_complexity=1, smooth_landmarks=True,
             enable_segmentation=False, smooth_segmentation=True,
             min_detection_confidence=0.5, min_tracking_confidence=0.5)
```

Parameters:

- `static_image_mode` (bool): Whether to treat input as static images
- `model_complexity` (int): Model complexity (0, 1, or 2)
- `smooth_landmarks` (bool): Whether to smooth landmarks
- `enable_segmentation` (bool): Whether to enable pose segmentation
- `smooth_segmentation` (bool): Whether to smooth segmentation
- `min_detection_confidence` (float): Minimum confidence for pose detection
- `min_tracking_confidence` (float): Minimum confidence for pose tracking
Detects pose landmarks in the input image.
Parameters:

- `img` (numpy.ndarray): Input image
- `draw` (bool): Whether to draw pose landmarks and connections
Returns:
numpy.ndarray: Image with drawn pose landmarks
Virtual painter example (simplified main loop):

```python
import cv2
import numpy as np
import detectlib as dlib

# Initialize camera and detector
cap = cv2.VideoCapture(0)
cap.set(3, 1280)  # Width
cap.set(4, 720)   # Height
detector = dlib.HandDetector(min_detection_confidence=0.5)
canvas = np.zeros((720, 1280, 3), np.uint8)

# Drawing parameters
brush_color = (0, 0, 255)  # Red
brush_thickness = 5
prev_x, prev_y = 0, 0

while True:
    ret, img = cap.read()
    img = cv2.flip(img, 1)

    # Detect hands
    img = detector.detect_hands(img, draw_color=brush_color)
    landmarks = detector.find_position(img, draw=False)

    if len(landmarks) != 0:
        # Get finger positions
        x1, y1 = landmarks[dlib.INDEX_FINGER_TIP][1:]
        x2, y2 = landmarks[dlib.MIDDLE_FINGER_TIP][1:]

        # Check finger states
        fingers = detector.fingers_up()

        # Drawing mode (index finger only)
        if fingers[1] and not fingers[2]:
            if prev_x == 0 and prev_y == 0:
                prev_x, prev_y = x1, y1
            cv2.line(canvas, (prev_x, prev_y), (x1, y1), brush_color, brush_thickness)
            prev_x, prev_y = x1, y1
        else:
            prev_x, prev_y = 0, 0

    # Combine camera feed with canvas
    img = cv2.bitwise_or(img, canvas)
    cv2.imshow('Virtual Painter', img)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```

Multi-modal detection example:

```python
import cv2
import detectlib as dlib

# Initialize all detectors
hand_detector = dlib.HandDetector()
face_detector = dlib.FaceDetector()
pose_detector = dlib.PoseDetector()

cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if ret:
        # Detect hands, faces, and pose
        frame = hand_detector.detect_hands(frame, draw_color=(0, 255, 0))
        frame, faces = face_detector.detect_faces(frame)
        frame = pose_detector.detect_landmarks(frame)

        # Display counts
        hand_count = hand_detector.hands_count()
        face_count = len(faces)
        cv2.putText(frame, f'Hands: {hand_count}', (10, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)
        cv2.putText(frame, f'Faces: {face_count}', (10, 70),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)

        cv2.imshow('Multi-Modal Detection', frame)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```

Gesture recognition example:

```python
import cv2
import detectlib as dlib

detector = dlib.HandDetector()
cap = cv2.VideoCapture(0)

def recognize_gesture(fingers):
    """Recognize hand gestures based on finger positions"""
    if fingers == [0, 1, 0, 0, 0]:
        return "Pointing"
    elif fingers == [0, 1, 1, 0, 0]:
        return "Peace Sign"
    elif fingers == [1, 1, 1, 1, 1]:
        return "Open Hand"
    elif fingers == [0, 0, 0, 0, 0]:
        return "Fist"
    elif fingers == [1, 0, 0, 0, 0]:
        return "Thumbs Up"
    else:
        return "Unknown"

while True:
    ret, frame = cap.read()
    if ret:
        frame = detector.detect_hands(frame)
        landmarks = detector.find_position(frame, draw=False)

        if len(landmarks) != 0:
            fingers = detector.fingers_up()
            gesture = recognize_gesture(fingers)
            cv2.putText(frame, f'Gesture: {gesture}', (10, 50),
                        cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)

        cv2.imshow('Gesture Recognition', frame)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```

Hand detection example:

```python
import cv2
import detectlib as dlib

detector = dlib.HandDetector(min_detection_confidence=0.7)
cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if ret:
        frame = detector.detect_hands(frame, draw_rect=True, draw_color=(0, 255, 0))
        landmarks = detector.find_position(frame, draw=False)

        if len(landmarks) != 0:
            # Get index finger tip position
            x, y = landmarks[dlib.INDEX_FINGER_TIP][1:]
            print(f"Index finger tip at: ({x}, {y})")

            # Draw circle at index finger tip
            detector.draw_circle(frame, dlib.INDEX_FINGER_TIP, color=(0, 0, 255))

        cv2.imshow('Hand Detection', frame)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```

Face detection example:

```python
import cv2
import detectlib as dlib

face_detector = dlib.FaceDetector(min_detection_confidence=0.7)
cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if ret:
        frame, faces = face_detector.detect_faces(frame)
        print(f"Detected {len(faces)} faces")

        cv2.imshow('Face Detection', frame)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```

Pose detection example:

```python
import cv2
import detectlib as dlib

pose_detector = dlib.PoseDetector(min_detection_confidence=0.6)
cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if ret:
        frame = pose_detector.detect_landmarks(frame, draw=True)

        cv2.imshow('Pose Detection', frame)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```

The project requires the following Python packages:

- `opencv-python` (cv2)
- `mediapipe`
- `numpy`
- `requests` (for app.py)

```
pip install opencv-python mediapipe numpy requests
```

- All color values use BGR format (Blue, Green, Red)
- Coordinate system origin (0,0) is at the top-left corner
- Hand landmarks are numbered 0-20 following MediaPipe convention
- The application is optimized for real-time performance with webcam input
- For best results, ensure good lighting and clear hand visibility
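Since OpenCV expects BGR while most other tooling uses RGB, a color tuple can be converted by reversing its channel order (a trivial illustrative helper, not part of the library):

```python
def rgb_to_bgr(color):
    """Reverse an (R, G, B) tuple into OpenCV's (B, G, R) order."""
    r, g, b = color
    return (b, g, r)

print(rgb_to_bgr((255, 0, 0)))  # (0, 0, 255): red in BGR, matching the default brush_color
```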