Commit fb16943
Benteng Ma (MBT), tiago, and jws-1 authored
Updated attribute detection; solved an issue related to the bodypix update. (#213)
* testtesttest
* Test
* Test
* Message Change
* fixed message writing and reading
* fixed image crop
* Updated the feature extraction node and changed the messages.
* fixed that torso and head frames were inverted.
* Changed colour format from BGR to RGB within the detection process.
* keep the saved model
* keep the model file
* keep the saved model
* Cancel saving the images (but still cannot use cv2.imshow)
* Runnable demo
* added the hair colour distribution matching method
* retrained model is very robust so changed the threshold
* Moving the head to meet the person.
* xyz axis readable.
* (Hopefully) Runnable with 3d input.
* Speak normally.
* Try to move the head.
* ah
* At least the head moves, not looking at me though.
* Cleaned the file to have only one model appear.
* Replace the old model with the new one.
* correct the lost module.
* info update
* fixed issues in the service
* fixed a stupid typo
* runnable version for full demo
* Recover the state machine for demo.
* Added a simple loop to refresh the frame taken, should work fine.
* Cleaned some commented code.
* removed loading the pretrained parameters
* removed load pretrained parameter.
* renamed torch_module into feature_extractor (Recompile needed!!!)
* renamed torch_module into feature_extractor
* renamed lasr_vision_torch to lasr_vision_feature_extraction
* removed some unused code comments
* removed colour estimation module
* removed colour_estimation dependence
* cleaned unused comments
* cleaned comments
* renamed torch_module to feature_extractor
* removed unused import, launch file and functions.
* reset <arg name="whisper_device_param" default="9" />
* Remade to achieve easier bodypix model loading
* added a break in the loop
* I don't really understand why this is in my branch, please ignore this commit when merging.
* Replace string return with JSON string return.
* pcl function removed because it appeared repetitively.
* replaced the model and predictor initialization, put "__main__"
* Merged model classes file into predictor's file
* Merged helper functions into predictor's file.
* Deleted feature extractor module
* Cleaned load model method, restart to use downloaded model.
* Removed unused files and cleaned the files.
* Cleaned useless comments and refilled package description.
* Removed useless colour messages.
* Brought aruco service package back.
* Removed useless keys from state machine.
* changed log messages.
* Fixed a stupid naming issue of feature extractor.
* Update common/helpers/numpy2message/package.xml (Co-authored-by: Jared Swift <jared.swift@kcl.ac.uk>)
* Update common/helpers/numpy2message/package.xml (Co-authored-by: Jared Swift <jared.swift@kcl.ac.uk>)
* Update common/vision/lasr_vision_feature_extraction/package.xml (Co-authored-by: Jared Swift <jared.swift@kcl.ac.uk>)
* Canceled the default neck coordinates and left it to be a failure.
* Canceled the loop of getting mixed images, and renamed the keys.
* renamed the function.
* Update skills/src/lasr_skills/vision/get_image.py (Co-authored-by: Jared Swift <jared.swift@kcl.ac.uk>)
* Update skills/src/lasr_skills/vision/get_image.py (Co-authored-by: Jared Swift <jared.swift@kcl.ac.uk>)
* Update skills/src/lasr_skills/vision/image_msg_to_cv2.py (Co-authored-by: Jared Swift <jared.swift@kcl.ac.uk>)
* Update the new names in init.
* removed a print and renamed the imports.
* committed updates
* update from upstream
* Renamed a parameter in sm
* Fixed merged xml files
* Added new model structure.
* Added cloth detection and classification in.
* Changed 'usb_cam' to 'camera'
* too many values to unpack
* there is a stubborn threshold mismatch that has not been fixed yet.
* updated predictor working
* Reformatted files
* Reformatted with black :)
* maybe revert: fix file permissions.
* maybe revert: fix file permissions (2).
* maybe revert: fix file permissions (3).
* Correct typo and remove debug logs: corrected the typo in the function name 'load_cloth_classidifer_model' to 'load_cloth_classifier_model', and removed two logging statements from describe_people.py that were used for debugging purposes.
* removed redundant comment :D
* call userdata people with "userdata.people" not ["people"]?
* Added people to userdata at an early stage to avoid a key error.

---------

Co-authored-by: Benteng Ma <benteng.ma@kcl.ac.uk>
Co-authored-by: tiago <example@example.com>
Co-authored-by: Jared Swift <jared.swift@kcl.ac.uk>
Co-authored-by: Jared Swift <j.w.swift@outlook.com>
1 parent 07a6fa4 commit fb16943

File tree

9 files changed: +393 −36 lines changed

.gitignore

Lines changed: 4 additions & 1 deletion
@@ -137,4 +137,7 @@ legacy/choosing_wait_position/src/choosing_wait_position/final_lift_key_point/mo
 
 # Python extension setup files
 .pylintrc
-mypy.ini
+mypy.ini
+
+# Pycharm extension setup files
+.idea/*

common/helpers/navigation_helpers/package.xml

Lines changed: 1 addition & 1 deletion
@@ -56,4 +56,4 @@
     <!-- Other tools can request additional information be placed here -->
 
   </export>
-</package>
+</package>

common/helpers/numpy2message/package.xml

Lines changed: 1 addition & 1 deletion
@@ -56,4 +56,4 @@
     <!-- Other tools can request additional information be placed here -->
 
   </export>
-</package>
+</package>

common/vision/lasr_vision_feature_extraction/nodes/service

Lines changed: 9 additions & 4 deletions
@@ -1,5 +1,5 @@
 from lasr_vision_msgs.srv import TorchFaceFeatureDetectionDescription, TorchFaceFeatureDetectionDescriptionRequest, TorchFaceFeatureDetectionDescriptionResponse
-from lasr_vision_feature_extraction.categories_and_attributes import CategoriesAndAttributes, CelebAMaskHQCategoriesAndAttributes
+from lasr_vision_feature_extraction.categories_and_attributes import CategoriesAndAttributes, CelebAMaskHQCategoriesAndAttributes, DeepFashion2GeneralizedCategoriesAndAttributes
 
 from cv2_img import msg_to_cv2_img
 from numpy2message import message2numpy
@@ -22,16 +22,21 @@ def detect(request: TorchFaceFeatureDetectionDescriptionRequest) -> TorchFaceFea
     head_mask = message2numpy(head_mask_data, head_mask_shape, head_mask_dtype)
     head_frame = lasr_vision_feature_extraction.extract_mask_region(full_frame, head_mask.astype(np.uint8), expand_x=0.4, expand_y=0.5)
     torso_frame = lasr_vision_feature_extraction.extract_mask_region(full_frame, torso_mask.astype(np.uint8), expand_x=0.2, expand_y=0.0)
-    rst_str = lasr_vision_feature_extraction.predict_frame(head_frame, torso_frame, full_frame, head_mask, torso_mask, predictor=predictor)
+    rst_str = lasr_vision_feature_extraction.predict_frame(
+        head_frame, torso_frame, full_frame, head_mask, torso_mask, head_predictor=head_predictor, cloth_predictor=cloth_predictor,
+    )
     response = TorchFaceFeatureDetectionDescriptionResponse()
     response.description = rst_str
     return response
 
 
 if __name__ == '__main__':
     # predictor will be global when inited, thus will be used within the function above.
-    model = lasr_vision_feature_extraction.load_face_classifier_model()
-    predictor = lasr_vision_feature_extraction.Predictor(model, torch.device('cpu'), CelebAMaskHQCategoriesAndAttributes)
+    head_model = lasr_vision_feature_extraction.load_face_classifier_model()
+    head_predictor = lasr_vision_feature_extraction.Predictor(head_model, torch.device('cpu'), CelebAMaskHQCategoriesAndAttributes)
+    cloth_model = lasr_vision_feature_extraction.load_cloth_classifier_model()
+    cloth_model.return_bbox = False  # unify returns
+    cloth_predictor = lasr_vision_feature_extraction.Predictor(cloth_model, torch.device('cpu'), DeepFashion2GeneralizedCategoriesAndAttributes)
     rospy.init_node('torch_service')
    rospy.Service('/torch/detect/face_features', TorchFaceFeatureDetectionDescription, detect)
     rospy.loginfo('Torch service started')
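
For reference, a minimal client-side sketch of calling the updated service. The service name, srv type, and the JSON-string response field come from the diff above; populating the request (the RGB image plus the numpy2message-packed head/torso masks) is left as a comment because the exact field names of the .srv definition are not shown in this commit:

```python
#!/usr/bin/env python3
# Hypothetical client for the updated service; the request fields are not
# shown in this diff, so they are left unpopulated here.
import rospy
from lasr_vision_msgs.srv import (
    TorchFaceFeatureDetectionDescription,
    TorchFaceFeatureDetectionDescriptionRequest,
)

rospy.init_node("feature_extraction_client")
rospy.wait_for_service("/torch/detect/face_features")
detect = rospy.ServiceProxy(
    "/torch/detect/face_features", TorchFaceFeatureDetectionDescription
)

req = TorchFaceFeatureDetectionDescriptionRequest()
# ... fill in the image and the numpy2message-packed head/torso masks
#     according to the srv definition ...
resp = detect(req)
rospy.loginfo(resp.description)  # JSON string: {"attributes": ..., "description": ...}
```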

common/vision/lasr_vision_feature_extraction/src/lasr_vision_feature_extraction/__init__.py

Lines changed: 213 additions & 17 deletions
@@ -1,21 +1,24 @@
+import json
+from os import path
+
+import cv2
+import numpy as np
+import rospkg
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+import torchvision.models as models
 from lasr_vision_feature_extraction.categories_and_attributes import (
     CategoriesAndAttributes,
     CelebAMaskHQCategoriesAndAttributes,
+    DeepFashion2GeneralizedCategoriesAndAttributes,
 )
 from lasr_vision_feature_extraction.image_with_masks_and_attributes import (
     ImageWithMasksAndAttributes,
     ImageOfPerson,
+    ImageOfCloth,
 )
 
-import numpy as np
-import cv2
-import torch
-import rospkg
-from os import path
-import torch.nn as nn
-import torch.nn.functional as F
-import torchvision.models as models
-
 
 def X2conv(in_channels, out_channels, inner_channels=None):
     inner_channels = out_channels // 2 if inner_channels is None else inner_channels
@@ -173,6 +176,163 @@ def unfreeze_segment_model(self):
         self.segment_model.train()
 
 
+class SegmentPredictor(nn.Module):
+    def __init__(self, num_masks, num_labels, in_channels=3, sigmoid=True):
+        super(SegmentPredictor, self).__init__()
+        self.sigmoid = sigmoid
+        self.resnet = models.resnet18(pretrained=False)
+
+        # Adapt ResNet to handle different input channel sizes
+        if in_channels != 3:
+            self.resnet.conv1 = nn.Conv2d(
+                in_channels, 64, kernel_size=7, stride=2, padding=3, bias=False
+            )
+
+        # Encoder layers
+        self.encoder1 = nn.Sequential(
+            self.resnet.conv1, self.resnet.bn1, self.resnet.relu
+        )
+        self.encoder2 = self.resnet.layer1
+        self.encoder3 = self.resnet.layer2
+        self.encoder4 = self.resnet.layer3
+        self.encoder5 = self.resnet.layer4
+
+        # Decoder layers
+        # resnet18/34
+        self.up1 = Decoder(512, 256, 256)
+        self.up2 = Decoder(256, 128, 128)
+        self.up3 = Decoder(128, 64, 64)
+        self.up4 = Decoder(64, 64, 64)
+
+        # resnet50/101/152
+        # self.up1 = Decoder(2048, 1024, 1024)
+        # self.up2 = Decoder(1024, 512, 512)
+        # self.up3 = Decoder(512, 256, 256)
+        # self.up4 = Decoder(256, 64, 64)
+
+        # Segmentation head
+        self.final_conv = nn.Conv2d(64, num_masks, kernel_size=1)
+
+        # Classification head
+        self.global_pool = nn.AdaptiveAvgPool2d((1, 1))
+        self.predictor_cnn_extension = nn.Sequential(
+            nn.Conv2d(512, 2048, kernel_size=3, padding=1),  # resnet18/34
+            # nn.Conv2d(2048, 2048, kernel_size=3, padding=1),
+            nn.LeakyReLU(negative_slope=0.01),
+            nn.Conv2d(2048, 2048, kernel_size=3, padding=1),
+            nn.LeakyReLU(negative_slope=0.01),
+        )
+        self.classifier = nn.Sequential(
+            nn.Linear(2048, 256),  # resnet50/101/152
+            nn.LeakyReLU(negative_slope=0.01),
+            nn.Dropout(p=0.5),
+            nn.Linear(256, 256),
+            nn.LeakyReLU(negative_slope=0.01),
+            nn.Dropout(p=0.5),
+            nn.Linear(256, num_labels),
+        )
+
+    def forward(self, x):
+        x1 = self.encoder1(x)
+        x2 = self.encoder2(x1)
+        x3 = self.encoder3(x2)
+        x4 = self.encoder4(x3)
+        x5 = self.encoder5(x4)
+
+        x = self.up1(x4, x5)
+        x = self.up2(x3, x)
+        x = self.up3(x2, x)
+        x = self.up4(x1, x)
+        x = F.interpolate(
+            x, size=(x.size(2) * 2, x.size(3) * 2), mode="bilinear", align_corners=True
+        )
+
+        mask = self.final_conv(x)
+
+        # Predicting the labels using features from the last encoder output
+        x_cls = self.predictor_cnn_extension(x5)
+        x_cls = self.global_pool(
+            x_cls
+        )  # Use the feature map from the last encoder layer
+        x_cls = x_cls.view(x_cls.size(0), -1)
+        labels = self.classifier(x_cls)
+
+        if self.sigmoid:
+            mask = torch.sigmoid(mask)
+            labels = torch.sigmoid(labels)
+
+        return mask, labels
+
+
+class SegmentPredictorBbox(SegmentPredictor):
+    def __init__(
+        self,
+        num_masks,
+        num_labels,
+        num_bbox_classes,
+        in_channels=3,
+        sigmoid=True,
+        return_bbox=True,
+    ):
+        self.return_bbox = return_bbox
+        super(SegmentPredictorBbox, self).__init__(
+            num_masks, num_labels, in_channels, sigmoid
+        )
+        self.num_bbox_classes = num_bbox_classes
+        self.bbox_cnn_extension = nn.Sequential(
+            nn.Conv2d(512, 2048, kernel_size=3, padding=1),  # resnet18/34
+            # nn.Conv2d(2048, 2048, kernel_size=3, padding=1),
+            nn.LeakyReLU(negative_slope=0.01),
+            nn.Conv2d(2048, 2048, kernel_size=3, padding=1),
+            nn.LeakyReLU(negative_slope=0.01),
+        )
+        self.bbox_generator = nn.Sequential(
+            nn.Linear(2048, 256),
+            nn.LeakyReLU(negative_slope=0.01),
+            nn.Linear(256, 256),
+            nn.LeakyReLU(negative_slope=0.01),
+            nn.Linear(256, num_bbox_classes * 4),
+        )
+
+    def forward(self, x):
+        x1 = self.encoder1(x)
+        x2 = self.encoder2(x1)
+        x3 = self.encoder3(x2)
+        x4 = self.encoder4(x3)
+        x5 = self.encoder5(x4)
+
+        x = self.up1(x4, x5)
+        x = self.up2(x3, x)
+        x = self.up3(x2, x)
+        x = self.up4(x1, x)
+        x = F.interpolate(
+            x, size=(x.size(2) * 2, x.size(3) * 2), mode="bilinear", align_corners=True
+        )
+
+        mask = self.final_conv(x)
+
+        # Predicting the labels using features from the last encoder output
+        x_cls = self.predictor_cnn_extension(x5)
+        x_cls = self.global_pool(
+            x_cls
+        )  # Use the feature map from the last encoder layer
+        x_cls = x_cls.view(x_cls.size(0), -1)
+        labels = self.classifier(x_cls)
+        x_bbox = self.bbox_cnn_extension(x5)
+        x_bbox = self.global_pool(x_bbox)
+        x_bbox = x_bbox.view(x_bbox.size(0), -1)
+        bboxes = self.bbox_generator(x_bbox).view(-1, self.num_bbox_classes, 4)
+
+        # no sigmoid for bboxes.
+        if self.sigmoid:
+            mask = torch.sigmoid(mask)
+            labels = torch.sigmoid(labels)
+
+        if self.return_bbox:
+            return mask, labels, bboxes
+        return mask, labels
+
+
 class Predictor:
     def __init__(
         self,
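
A quick shape check for the two new heads, as a hedged sketch: it assumes the package is importable outside ROS, and that the pre-existing `Decoder` blocks each upsample 2x and merge the skip connection, so the mask comes back at the input resolution after the final 2x interpolation. The channel counts (18 here) are arbitrary example values, not the DeepFashion2 numbers:

```python
import torch
from lasr_vision_feature_extraction import SegmentPredictorBbox

x = torch.randn(1, 3, 128, 128)  # H and W divisible by 16 for the ResNet encoder

model = SegmentPredictorBbox(num_masks=18, num_labels=18, num_bbox_classes=4)
model.eval()
with torch.no_grad():
    mask, labels, bboxes = model(x)

print(mask.shape)    # expected: torch.Size([1, 18, 128, 128]) -- per-pixel masks
print(labels.shape)  # expected: torch.Size([1, 18])           -- sigmoid attribute scores
print(bboxes.shape)  # expected: torch.Size([1, 4, 4])         -- 4 raw values per bbox
                     # class (the box parameterisation is not specified in this diff)

# With return_bbox = False the model matches SegmentPredictor's (mask, labels)
# signature -- this is how the service node above unifies the two predictors.
model.return_bbox = False
with torch.no_grad():
    mask, labels = model(x)
```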
@@ -215,9 +375,6 @@ def predict(self, rgb_image: np.ndarray) -> ImageWithMasksAndAttributes:
         mask_list = [pred_masks[i, :, :] for i in range(pred_masks.shape[0])]
         pred_classes = pred_classes.detach().squeeze(0).numpy()
         class_list = [pred_classes[i].item() for i in range(pred_classes.shape[0])]
-        # print(rgb_image)
-        print(mean_val)
-        print(pred_classes)
         mask_dict = {}
         for i, mask in enumerate(mask_list):
             mask_dict[self.categories_and_attributes.mask_categories[i]] = mask
@@ -253,7 +410,26 @@ def load_face_classifier_model():
         model,
         None,
         path=path.join(
-            r.get_path("lasr_vision_feature_extraction"), "models", "model.pth"
+            r.get_path("lasr_vision_feature_extraction"), "models", "face_model.pth"
+        ),
+        cpu_only=True,
+    )
+    return model
+
+
+def load_cloth_classifier_model():
+    num_classes = len(DeepFashion2GeneralizedCategoriesAndAttributes.attributes)
+    model = SegmentPredictorBbox(
+        num_masks=num_classes + 4, num_labels=num_classes + 4, num_bbox_classes=4
+    )
+    model.eval()
+
+    r = rospkg.RosPack()
+    model, _, _, _ = load_torch_model(
+        model,
+        None,
+        path=path.join(
+            r.get_path("lasr_vision_feature_extraction"), "models", "cloth_model.pth"
         ),
         cpu_only=True,
     )
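
Note the rename implied by this hunk: the face weights previously shipped as `models/model.pth` must now exist as `models/face_model.pth`, alongside the new `models/cloth_model.pth`. A small sanity check of the expected layout, runnable only inside a ROS workspace (paths taken from the diff):

```python
# Confirm both weight files are where the loaders expect them.
from os import path
import rospkg

pkg = rospkg.RosPack().get_path("lasr_vision_feature_extraction")
for name in ("face_model.pth", "cloth_model.pth"):
    p = path.join(pkg, "models", name)
    print(p, "found" if path.isfile(p) else "MISSING")
```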
@@ -312,7 +488,13 @@ def extract_mask_region(frame, mask, expand_x=0.5, expand_y=0.5):
 
 
 def predict_frame(
-    head_frame, torso_frame, full_frame, head_mask, torso_mask, predictor
+    head_frame,
+    torso_frame,
+    full_frame,
+    head_mask,
+    torso_mask,
+    head_predictor,
+    cloth_predictor,
 ):
     full_frame = cv2.cvtColor(full_frame, cv2.COLOR_BGR2RGB)
     head_frame = cv2.cvtColor(head_frame, cv2.COLOR_BGR2RGB)
@@ -321,9 +503,21 @@
     head_frame = pad_image_to_even_dims(head_frame)
     torso_frame = pad_image_to_even_dims(torso_frame)
 
-    rst = ImageOfPerson.from_parent_instance(predictor.predict(head_frame))
+    rst_person = ImageOfPerson.from_parent_instance(
+        head_predictor.predict(head_frame)
+    ).describe()
+    rst_cloth = ImageOfCloth.from_parent_instance(
+        cloth_predictor.predict(torso_frame)
+    ).describe()
+
+    result = {
+        "attributes": {**rst_person["attributes"], **rst_cloth["attributes"]},
+        "description": rst_person["description"] + rst_cloth["description"],
+    }
+
+    result = json.dumps(result, indent=4)
 
-    return rst.describe()
+    return result
 
 
 def load_torch_model(model, optimizer, path="model.pth", cpu_only=False):
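
`predict_frame` now returns a single JSON string covering both predictors instead of a person-only description. A toy illustration of the merge semantics with made-up attribute keys (real keys come from the respective CategoriesAndAttributes classes); note that with `{**a, **b}`, a key present in both dicts takes the cloth value:

```python
import json

# Hypothetical describe() outputs, for illustration only.
rst_person = {"attributes": {"hair_colour": "brown"}, "description": "Short brown hair. "}
rst_cloth = {"attributes": {"t_shirt": True}, "description": "Wearing a t-shirt."}

result = {
    "attributes": {**rst_person["attributes"], **rst_cloth["attributes"]},
    "description": rst_person["description"] + rst_cloth["description"],
}
print(json.dumps(result, indent=4))
# {
#     "attributes": {
#         "hair_colour": "brown",
#         "t_shirt": true
#     },
#     "description": "Short brown hair. Wearing a t-shirt."
# }
```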
@@ -354,7 +548,9 @@ def binary_erosion_dilation(
 
     # Check if the length of thresholds matches the number of channels
     if len(thresholds) != tensor.size(1):
-        raise ValueError("Length of thresholds must match the number of channels")
+        # the error should be here, just removed for now since there's some other bug I haven't fixed.
+        # raise ValueError(f"Length of thresholds {len(thresholds)} must match the number of channels {tensor.size(1)}")
+        thresholds = [0.5 for _ in range(tensor.size(1))]
 
     # Binary thresholding
     for i, threshold in enumerate(thresholds):
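
The workaround in this last hunk silently falls back to uniform 0.5 thresholds instead of raising, which the commit message flags as temporary. A minimal illustration of the new behaviour (tensor dimensions are arbitrary):

```python
import torch

tensor = torch.rand(1, 5, 64, 64)  # 5 channels
thresholds = [0.4, 0.6]            # mismatched: only 2 thresholds for 5 channels

# Fallback from the diff: replace the list instead of raising ValueError.
if len(thresholds) != tensor.size(1):
    thresholds = [0.5 for _ in range(tensor.size(1))]

print(thresholds)  # [0.5, 0.5, 0.5, 0.5, 0.5]
```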
