Conversion error: Data rank or shape of sizes input is required to be static #46

Open
leets0429 opened this issue Oct 25, 2022 · 28 comments

@leets0429

I have a model that was created using Azure Machine Learning Studio.

When I try to convert the ONNX model, I receive the following error. What does it mean, and how should I correct it?

Screenshot from 2022-10-25 11-10-54
Screenshot from 2022-10-25 11-10-48
Screenshot from 2022-10-25 11-10-42

@Erol444
Member

Erol444 commented Oct 25, 2022

Hi @leets0429 ,
Could you share the ONNX model as well, so we can try to convert it?
Thanks, Erik

@Erol444
Member

Erol444 commented Oct 25, 2022

@tersekmatija could someone from your team look into it? I think it would also be valuable to have a tutorial on how to convert and deploy Azure Machine Learning Studio models to an OAK camera.
Thanks, Erik

@tersekmatija
Contributor

@Erol444 I think the problem with general support for models from Azure ML is that not all models produced by the drag-and-drop tools they provide are likely to be supported. Customers can also train their own custom models. The model from @leets0429 seems to be such a case: either a custom one or one provided by Azure ML. It seems to contain some operations that make the shape dynamic, which is not supported.

[ ERROR ]  Check '((sizes_shape.is_static() || data_shape.rank().is_static()))' failed at frontends/onnx/frontend/src/op/resize.cpp:155:
While validating ONNX node '<Node(Resize): Resize_2327>':
 Data rank or shape of sizes input is required to be static.

Is it possible to change the model and retrain it inside Azure ML, @leets0429? Or can you get more information about the model itself: what do the predictions look like, is there any metadata available, etc.?
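To make the quoted check more concrete, here is a minimal Python sketch that paraphrases its logic. The shape representation is an assumption for illustration: a shape is `None` when even the rank is unknown, and a dimension is an integer when known or a string/`None` when symbolic. The helper names (`is_static`, `rank_is_static`, `resize_inputs_ok`) are hypothetical and are not OpenVINO or ONNX functions.

```python
from typing import Optional, Sequence, Union

Dim = Union[int, str, None]
Shape = Optional[Sequence[Dim]]


def is_static(shape: Shape) -> bool:
    """True only if the rank is known and every dimension is a concrete int."""
    if shape is None:  # rank itself is unknown
        return False
    return all(isinstance(d, int) and not isinstance(d, bool) for d in shape)


def rank_is_static(shape: Shape) -> bool:
    """True if the number of dimensions is known, even if some sizes are not."""
    return shape is not None


def resize_inputs_ok(data_shape: Shape, sizes_shape: Shape) -> bool:
    """Paraphrase of the quoted condition: sizes shape static OR data rank static."""
    return is_static(sizes_shape) or rank_is_static(data_shape)


# Dynamic height/width but known rank: the check passes.
print(resize_inputs_ok(["N", 3, "H", "W"], None))
# Neither the data rank nor the sizes shape is known: the check fails.
print(resize_inputs_ok(None, None))
```

The second call corresponds to the situation reported in the error message, where the converter cannot determine either property for the Resize node.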

@leets0429
Author

@tersekmatija It is our custom model. The model is a maskrcnn_resnet50_fpn that is used to recognize sugar beet, and the outputs are bounding boxes, labels, scores (confidence level), and masks. It worked when validation was done in Azure ML Studio. I think Azure uses PyTorch by default, but the model can be exported in ONNX format.

Nevertheless, I'll ask my colleague to retrain the model with a fixed orientation. The training was done using both landscape and portrait images.

@tersekmatija
Contributor

@leets0429
Hey, yeah, I'm afraid Mask R-CNN will not be easy to export. I'd suggest you take a look at this comment for instructions. Note that Mask R-CNN might run slowly on OAK, as it's quite heavy computationally. If you do not require instance segmentation, I'd recommend an object detection model (like YoloV6n), or a modified version of YOLOP if semantic segmentation is enough.

Best, Matija

@leets0429
Author

@tersekmatija

I've created a new model (YoloV5) with static images, yet the problem persists. How should I resolve it? Do I need to convert it manually instead of using the blob converter? Do you have any resources on such topics?

I've attached the model here:
https://ailandgmbh-my.sharepoint.com/:u:/g/personal/lee_a-i_land/EQoVtF7IzYxPhNfgxPnYj30Bexo5kRoOlP8IsT91_nPsmQ?e=3AHbcK

Screenshot from 2022-10-31 07-54-22

@tersekmatija
Contributor

@leets0429 So, it seems that the generated ONNX does not have a fixed batch size. This can be solved by specifying the flag -b=1, or explicitly with --input_shape=[1,3,640,640], in the "Model optimizer params" field under advanced settings. Note that you should clear mean_values and specify --scale=255 and --reverse_input_channels. YoloV5 expects RGB images with values in [0,1], while OAK by default outputs BGR with values in [0,255]. More info on mean values here.
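As a rough illustration of what --reverse_input_channels and --scale=255 accomplish, the plain-Python sketch below applies the equivalent arithmetic to a single pixel. The function name `preprocess_pixel` is hypothetical; with these flags the converter builds the steps into the model itself, so no such code needs to run on the host.

```python
def preprocess_pixel(pixel, scale=255.0):
    """Reverse the channel order and divide each channel by `scale`.

    `pixel` is a 3-tuple in camera order (B, G, R) with values in [0, 255].
    The result is in reversed order (R, G, B) with values in [0, 1].
    """
    b, g, r = pixel
    return (r / scale, g / scale, b / scale)


# A pixel with full blue, half green, no red, as the camera would deliver it.
pixel_from_camera = (255, 128, 0)
print(preprocess_pixel(pixel_from_camera))
```

Clearing mean_values matters for the same reason: a non-zero mean would be subtracted before the division, shifting the values out of the range the network was trained on.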

Your export will still fail, though, because of the ScatterND operation in the ONNX graph. This happens when you modify slices directly, like

x[..., 2:4] = torch.exp(x[..., 2:4]).

Currently, the only solution is to modify the code directly. See this issue for an example.

For Yolo models, however, we offer NMS and post-processing directly on the device. If you can download the best.pt weights saved during model training, I'd recommend using our tools. Simply upload the best.pt weights, define the input shape, and choose the Yolo version. Then open the link in the left panel for instructions on how to run your model.

@leets0429
Author

@tersekmatija Correct me if I'm wrong: in order to remove the ScatterND, I have to change the code during the conversion from PyTorch to ONNX?

@tersekmatija
Contributor

@leets0429 Yes, it's likely the post-processing code in the Detect layer. I'd still suggest using tools.luxonis.com for YoloV5-V7.

@leets0429
Author

@tersekmatija I've tried using your Yolo tool, but it reported an error while loading the model.

Here's my .pt file:
https://ailandgmbh-my.sharepoint.com/:u:/g/personal/lee_a-i_land/EWIZIq-b5wtCjJVKerb6FDwBjTagBFX0JX2xHpzGc_nVPw?e=mn4D3T

Screenshot from 2022-11-02 07-50-17

@tersekmatija
Contributor

Hey @leets0429, thanks. We will take a look at this. In the meantime, can you tell me whether the official YoloV5 repository was used to produce this .pt file?

@leets0429
Author

@tersekmatija I have no information on the Yolo repo. The model was trained using Azure Automated ML.

Screenshot from 2022-11-02 13-47-24

@tersekmatija
Contributor

Hm, thanks. It could be that they are using some other source code inside their tool rather than the official repo. We will investigate this with the .pt file.

Can you please also send me the ONNX model that you can export with their tool? @leets0429

@leets0429
Author

The onnx file can be found here #46 (comment)

@leets0429
Author

@tersekmatija any updates on the Yolo tool issue?

@HonzaCuhel
Contributor

@leets0429 Could you please share some example images and the expected outputs with us, so that we can test the model?

@tersekmatija
Contributor

@leets0429 The issue with the model that you've shared with us is that it is a custom implementation of YoloV5, which seems to have slightly different postprocessing. We are looking into that. I am not sure standard support in tools.luxonis.com will be possible, but if we can match their postprocessing with ours, we can share an example of how you can export it yourself.

Besides an example of an input, do you maybe have access to the source code?

For the best plug-and-play experience, I'd recommend training your model with the official Ultralytics YoloV5 GitHub repository. I'm not sure how feasible this is with Azure ML.

@leets0429
Author

In the "input" folder, you will find the required images. Right now, I am only able to deploy the ONNX model from Azure locally. You can find it in the onnx_local_deploy folder. It is not the same model, but it also uses YoloV5 and was trained through Azure. The output should be bounding boxes with labels.

https://ailandgmbh-my.sharepoint.com/:f:/g/personal/lee_a-i_land/Ej_49ewtOQJMpgX73Y5Wp04BlZXGjIxMYcehJ2JLzNGAxw?e=kjyJvK

I tried deploying the PyTorch model from Azure on my laptop, but the parameter names are not configured correctly. I think this is the same issue you are facing now. I've contacted support and hope to get a solution from them in a few days.

I can find the source code, but it uses Azure services that require certain credentials, so I don't think you can use it.

Regarding the ScatterND problem, shouldn't your Blob Converter handle all kinds of ONNX layers? As far as I understand, ONNX is supposed to be a standardized format that eases deployment. Right now, I could manually convert .pt to ONNX to avoid the ScatterND issue, but in the long term it seems troublesome to deal with such exceptions every time.

@tersekmatija
Contributor

Thanks, @leets0429, @HonzaCuhel will give it a look and give you some updates on the export process when we have them.

The reason I am asking for the source code is to determine whether their YoloV5 version uses some different post-processing, including during training (for example, some constant subtraction). This would help us properly export the model and test it to make sure it works. We've modified the ONNX model to match what our DepthAI API expects, but are seeing some false positive detections.

Regarding ScatterND: blobconverter uses the OpenVINO Model Optimizer and compile tool in the background, which seem to support it, but they don't support ScatterNDUpdate. So even if ONNX, which is supposed to be a general format, supports some layer, it doesn't mean that all such layers are supported in OpenVINO and consequently on OAK-D. You can see the supported layers and their limitations here.
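Before attempting a conversion, it can help to list which operation types a graph contains and compare them against the layers known to be unsupported. The sketch below shows the idea in plain Python. The function name `find_unsupported`, the sample operation list, and the contents of `UNSUPPORTED` are assumptions for illustration; the authoritative list is the supported-layers page referenced above.

```python
from collections import Counter


def find_unsupported(op_types, unsupported):
    """Count how often each unsupported operation type occurs in `op_types`."""
    counts = Counter(op_types)
    return {op: n for op, n in counts.items() if op in unsupported}


# With the `onnx` package installed, the list could be collected like this:
#   model = onnx.load("model.onnx")
#   op_types = [node.op_type for node in model.graph.node]
# Here a hand-written sample list is used so the sketch runs on its own.
op_types = ["Conv", "Sigmoid", "Resize", "ScatterND", "Concat", "ScatterND"]

# Assumption for illustration only; consult the official supported-layers list.
UNSUPPORTED = {"ScatterND"}

print(find_unsupported(op_types, UNSUPPORTED))
```

An empty result does not guarantee a successful conversion, since supported layers can still have limitations on shapes or attributes.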

As mentioned, this happens because the box "slices" are updated in the postprocessing code in YoloV5. The latest version seems to avoid that by concatenating the box locations and sizes, which was not the case in older versions. If you modify the postprocessing code to match that, you should be able to export it without a problem, at least up to the NMS layer, which could cause some trouble as well. Note that if you succeed, the NeuralNetwork node does all of the post-processing. If you train your model with the official YoloV5 repository and export it using tools.luxonis.com, post-processing is not done in the NeuralNetwork, but still completely on device. The benefit is that it enables you to directly use our YoloDetectionNetwork, which you can combine with SpatialLocationCalculator, for example.
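The difference between updating slices and concatenating can be shown without any framework. The sketch below uses plain Python lists for a single prediction row laid out as [x, y, w, h, conf]; this layout and both function names are assumptions for illustration, not the actual YoloV5 code. In PyTorch, the first variant corresponds to slice assignment such as x[..., 2:4] = torch.exp(x[..., 2:4]), and the second to assembling the pieces with torch.cat.

```python
import math


def decode_inplace(row):
    """Slice-assignment style: overwrite elements 2..3 of a copy of the row."""
    out = list(row)
    out[2:4] = [math.exp(v) for v in out[2:4]]
    return out


def decode_concat(row):
    """Concatenation style: compute each part separately, then join them."""
    xy = row[0:2]                              # box centre, unchanged
    wh = [math.exp(v) for v in row[2:4]]       # box size, transformed
    rest = row[4:]                             # remaining values, unchanged
    return xy + wh + rest


row = [0.5, 0.5, 0.0, 1.0, 0.9]
print(decode_inplace(row) == decode_concat(row))
```

Both functions produce the same values; only the way the result is assembled differs, and that difference is what determines which operations end up in an exported graph.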

@leets0429
Author

@tersekmatija @HonzaCuhel I just found out that the model I provided was not well trained, so false positive detections are expected. Could you provide me with the modified model and explain how you modified it, so that in future, when I have a better model, I can do it myself? Thanks

@HonzaCuhel
Contributor

Thank you for the info, and I apologize for the long delay in my answer, @leets0429. Here is a Colab notebook that takes care of the conversion. I tried to export both models, and the second one (which detects salads and weed) works, but because the model is rather heavy, the performance was around 6 FPS when I tested it on OAK-1. So for better performance, I'd recommend using a lighter version of YoloV5 if possible.

@leets0429
Author

@HonzaCuhel Hi thank you very much for the script. Could you send me the converted blob file so that I can test it on my OAK right away?

@HonzaCuhel
Contributor

@leets0429, no problem! Yeah, absolutely. Here are the converted files.

@leets0429
Author

@HonzaCuhel I'm not sure whether I deployed the model on the OAK correctly. This is my result after increasing the threshold to 0.8; otherwise I receive an error about too much metadata.

result

I suspect I am interpreting the result incorrectly, hence the weird bounding boxes. I see you have attached a JSON file too. Is that relevant to interpreting the result? If so, how should I use it in the OAK pipeline?

I faced a similar issue when I tried to use the ONNX model on my laptop, but it was resolved once I interpreted the result correctly. Hence, I suspect what I am seeing now has the same cause.

I've attached my OAK pipeline script (oak_stream_custom_nn_noros.py) and also the local deployment onnx script (onnx_local_deploy_example/examples)

https://ailandgmbh-my.sharepoint.com/:u:/g/personal/lee_a-i_land/EfmqfteIqf1JogVsR6BQZCoBZ2vH29TVbn4GpvnsK9g63Q?e=wHEAnY

@tersekmatija
Contributor

@leets0429

I haven't checked the export script, but the processing you are doing in the Python script is indeed incorrect. You are using a MobileNetDetectionNetwork, which we use for SSDLite-based detectors. You should replace it with YoloDetectionNetwork, similarly to here (L66). Note that for Yolo you need to take the items from the JSON and set them as parameters of the detection network, like here (L82-L90).

You can give it a quick try using main_api.py from here by simply calling:

python3 main_api.py -m path/to/model.blob -c path/to/config.json
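For orientation, the following sketch shows one way the items from the JSON could be read before they are set on the detection network. The key names and the example values in `EXAMPLE_CONFIG` are assumptions modelled on the kind of JSON the export tools produce; check them against the file you actually received. The function name `load_yolo_params` is hypothetical. The DepthAI setter calls are shown only as comments so that the sketch runs without a device or the depthai package.

```python
import json

# Placeholder config; all values here are illustrative, not from the real model.
EXAMPLE_CONFIG = """
{
  "nn_config": {
    "input_size": "640x640",
    "NN_specific_metadata": {
      "classes": 2,
      "coordinates": 4,
      "anchors": [10, 13, 16, 30, 33, 23],
      "anchor_masks": {"side80": [0, 1, 2]},
      "iou_threshold": 0.5,
      "confidence_threshold": 0.5
    }
  },
  "mappings": {"labels": ["salad", "weed"]}
}
"""


def load_yolo_params(config_text):
    """Extract the values a Yolo detection node needs from the JSON text."""
    config = json.loads(config_text)
    nn_config = config.get("nn_config", {})
    meta = nn_config.get("NN_specific_metadata", {})
    width, height = (int(v) for v in nn_config["input_size"].split("x"))
    return {
        "input_size": (width, height),
        "classes": meta.get("classes"),
        "coordinates": meta.get("coordinates"),
        "anchors": meta.get("anchors", []),
        "anchor_masks": meta.get("anchor_masks", {}),
        "iou_threshold": meta.get("iou_threshold"),
        "confidence_threshold": meta.get("confidence_threshold"),
        "labels": config.get("mappings", {}).get("labels", []),
    }


params = load_yolo_params(EXAMPLE_CONFIG)
print(params["input_size"], params["classes"], params["labels"])

# With depthai installed, the values would then be applied roughly like this:
#   nn = pipeline.create(dai.node.YoloDetectionNetwork)
#   nn.setNumClasses(params["classes"])
#   nn.setCoordinateSize(params["coordinates"])
#   nn.setAnchors(params["anchors"])
#   nn.setAnchorMasks(params["anchor_masks"])
#   nn.setIouThreshold(params["iou_threshold"])
#   nn.setConfidenceThreshold(params["confidence_threshold"])
```

Passing the same file with -c to main_api.py, as in the command above, should make this manual step unnecessary; the sketch is only meant to show which values matter.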

@leets0429
Author

leets0429 commented Nov 15, 2022

@tersekmatija The model is working now, but the FPS is quite low (around 5 FPS). What are the possible ways to increase it? We are using the device on a moving vehicle, so FPS is quite important for us.

Another problem is how to change the exposure of the camera with this script:
https://github.com/luxonis/depthai-experiments/tree/master/gen2-yolo/device-decoding

I've tried using updateColorCamConfig(), but I received the following error.
Screenshot from 2022-11-15 08-10-50

So far I can't find any documentation about the PipelineManager on your website. Here is my script.
oak_stream_custom_no_ros.zip

@tersekmatija
Contributor

@leets0429
FPS depends on quite a few factors. One of the main ones is the input shape. I don't recall whether the ONNX that you get from Azure ML already has defined input shapes. If it does, you need to check whether there are some settings that let you reduce the input shape. If not, you could try changing it in the mo command as well. CC @HonzaCuhel for additional help here.
We've also observed that the model is on the heavy side: it is a larger YoloV5 model, which means it has more parameters and operations. You would need to check whether Azure ML supports different YoloV5 versions; look for YoloV5n or YoloV5s (nano or small). Those usually run faster. If not, I'd recommend using our tutorial to train your model. You can choose a lighter version of YoloV5, which will run faster on the camera, and you should be able to export it with tools.luxonis.com.
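To get a feeling for how much the input shape matters, the sketch below compares pixel counts for two input sizes. It rests on a simplifying assumption: the compute of a convolutional network grows roughly in proportion to the number of input pixels. The 640x640 size comes from the --input_shape mentioned earlier in this thread, 416x416 is an arbitrary smaller example, and `relative_cost` is a hypothetical helper name. The actual FPS change on the device will differ and needs to be measured.

```python
def relative_cost(new_size, old_size):
    """Ratio of input pixel counts, used as a rough proxy for compute."""
    new_w, new_h = new_size
    old_w, old_h = old_size
    return (new_w * new_h) / (old_w * old_h)


ratio = relative_cost((416, 416), (640, 640))
print(f"A 416x416 input has about {ratio:.0%} of the pixels of a 640x640 input")
```

A smaller input also reduces the apparent size of objects in the image, so detection quality for small objects should be checked after any such change.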

As far as the script goes, main.py is based on an old depthai-sdk (CC @daniilpastukhov on updating it). In the meantime, I'd recommend using main_api.py and referring here for exposure controls.
