quantized binary classifier fails validation #539

Open
lkaneda opened this issue Dec 28, 2022 · 9 comments
Comments


lkaneda commented Dec 28, 2022

While debugging my own quantized models created with TIM-VX, I came across an issue where binary classifiers fail the validation step. This happened across several different models even though everything was set up properly, and the same models worked fine with a non-TIM-VX implementation.

I am able to reproduce the issue seen in my models with the lenet example provided in this repo by doing the following:

  1. Run the example - note that it works.
  2. Modify the following shapes (change the 10s to 3s) so they become:
    tim::vx::ShapeType fc4_weight_shape({500, 3});
    tim::vx::ShapeType fc4_bias_shape({3});
    tim::vx::ShapeType output_shape({3, 1});
  3. Save and run the example - note that it works.
  4. Repeat step 2, but change the 3s to 2s.
  5. Save and run the example - note that it does not pass the validation step (graph->Compile()). A distilled, standalone sketch of this failing configuration follows this list.
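For reference, here is a distilled, standalone sketch of the failing configuration: a quantized fully-connected layer (500 -> num_classes) followed by a softmax, compiled on its own. The quantization scales and zero points are placeholders I picked for illustration, not the values from the lenet example, and the structure assumes the public TIM-VX C++ API (Context/Graph/TensorSpec, tim::vx::ops::FullyConnected, tim::vx::ops::Softmax) as used in the samples:

    #include <cstdint>
    #include <iostream>
    #include <vector>

    #include "tim/vx/context.h"
    #include "tim/vx/graph.h"
    #include "tim/vx/ops/fullyconnected.h"
    #include "tim/vx/ops/softmax.h"
    #include "tim/vx/tensor.h"

    int main() {
      // 10 and 3 classes compile for me; 2 classes fails graph->Compile().
      const uint32_t num_classes = 2;

      auto context = tim::vx::Context::Create();
      auto graph = context->CreateGraph();

      // Shapes follow the tail of the lenet example: 500 features -> num_classes.
      tim::vx::ShapeType input_shape({500, 1});
      tim::vx::ShapeType weight_shape({500, num_classes});
      tim::vx::ShapeType bias_shape({num_classes});
      tim::vx::ShapeType output_shape({num_classes, 1});

      // Placeholder asymmetric-uint8 quantization parameters (illustrative only;
      // bias scale = input scale * weight scale).
      tim::vx::Quantization input_quant(tim::vx::QuantType::ASYMMETRIC, 0.02f, 128);
      tim::vx::Quantization weight_quant(tim::vx::QuantType::ASYMMETRIC, 0.005f, 128);
      tim::vx::Quantization bias_quant(tim::vx::QuantType::ASYMMETRIC, 0.0001f, 0);
      tim::vx::Quantization fc_out_quant(tim::vx::QuantType::ASYMMETRIC, 0.0625f, 80);
      tim::vx::Quantization output_quant(tim::vx::QuantType::ASYMMETRIC, 1.0f / 256, 0);

      tim::vx::TensorSpec input_spec(tim::vx::DataType::UINT8, input_shape,
                                     tim::vx::TensorAttribute::INPUT, input_quant);
      tim::vx::TensorSpec weight_spec(tim::vx::DataType::UINT8, weight_shape,
                                      tim::vx::TensorAttribute::CONSTANT, weight_quant);
      tim::vx::TensorSpec bias_spec(tim::vx::DataType::INT32, bias_shape,
                                    tim::vx::TensorAttribute::CONSTANT, bias_quant);
      tim::vx::TensorSpec fc_out_spec(tim::vx::DataType::UINT8, output_shape,
                                      tim::vx::TensorAttribute::TRANSIENT, fc_out_quant);
      tim::vx::TensorSpec output_spec(tim::vx::DataType::UINT8, output_shape,
                                      tim::vx::TensorAttribute::OUTPUT, output_quant);

      // Dummy constant data; the failure is at validation, not at inference.
      std::vector<uint8_t> weight_data(500 * num_classes, 128);
      std::vector<int32_t> bias_data(num_classes, 0);

      auto input = graph->CreateTensor(input_spec);
      auto weight = graph->CreateTensor(weight_spec, weight_data.data());
      auto bias = graph->CreateTensor(bias_spec, bias_data.data());
      auto fc_out = graph->CreateTensor(fc_out_spec);
      auto output = graph->CreateTensor(output_spec);

      // Fully connected 500 -> num_classes, then softmax over axis 0.
      auto fc = graph->CreateOperation<tim::vx::ops::FullyConnected>(0, num_classes);
      (*fc).BindInputs({input, weight, bias}).BindOutputs({fc_out});

      auto softmax = graph->CreateOperation<tim::vx::ops::Softmax>(1.0f, 0);
      (*softmax).BindInputs({fc_out}).BindOutputs({output});

      if (!graph->Compile()) {
        std::cout << "Compile (validation) failed for " << num_classes
                  << " classes" << std::endl;
        return -1;
      }
      std::cout << "Compile OK for " << num_classes << " classes" << std::endl;
      return 0;
    }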

I have tested this with models with varying numbers of classes, and the problem only shows up with 2-class classifiers. I noted a similar comment in issue 167, but it looks like that issue was resolved without discussing this part further.

The output error codes I get are:
D [operator():134]vsi_nn_SetupGraph Returned 0
D [operator():140]vsi_nn_VerifyGraph Returned -1

Some notes:

  1. If I remove the softmax layer, the model passes validation.
  2. If I leave all layers in but run a non-quantized version of the model, the model passes validation (see the float-spec sketch after this list).
  3. If I leave the network as is (all layers in and quantized), the model does not pass validation.
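For comparison with note 2, here is a sketch of what the non-quantized variant of the tensor specs looks like in my distilled repro above, assuming the float path simply uses FLOAT32 specs with no Quantization attached (the weight and bias data become float as well):

    // Drop-in replacements for the UINT8/INT32 specs in the sketch above.
    // Per note 2, the non-quantized version of the model passes validation.
    tim::vx::TensorSpec input_spec(tim::vx::DataType::FLOAT32, input_shape,
                                   tim::vx::TensorAttribute::INPUT);
    tim::vx::TensorSpec weight_spec(tim::vx::DataType::FLOAT32, weight_shape,
                                    tim::vx::TensorAttribute::CONSTANT);
    tim::vx::TensorSpec bias_spec(tim::vx::DataType::FLOAT32, bias_shape,
                                  tim::vx::TensorAttribute::CONSTANT);
    tim::vx::TensorSpec fc_out_spec(tim::vx::DataType::FLOAT32, output_shape,
                                    tim::vx::TensorAttribute::TRANSIENT);
    tim::vx::TensorSpec output_spec(tim::vx::DataType::FLOAT32, output_shape,
                                    tim::vx::TensorAttribute::OUTPUT);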
@sunshinemyson
Contributor

@lkaneda

Sorry for the late reply. I cannot reproduce this by following your suggestion on the latest TIM-VX code. Could you share logs with the following environment variables set:

VSI_NN_LOG_LEVEL=5
VIV_VX_DEBUG_LEVEL=1
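(These are normally just exported in the shell before running the test. As a sketch, assuming a POSIX environment, they can also be set from inside the test binary itself, as long as that happens before the TIM-VX context and driver are initialized:)

    #include <cstdlib>

    // Sketch: enable the same debug output programmatically (POSIX setenv),
    // called before any TIM-VX / driver initialization.
    static void EnableVsiDebugLogs() {
      setenv("VSI_NN_LOG_LEVEL", "5", /*overwrite=*/1);   // verbose ovxlib logging
      setenv("VIV_VX_DEBUG_LEVEL", "1", /*overwrite=*/1); // driver debug output
    }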

Thanks


lkaneda commented Jan 9, 2023

@sunshinemyson

Here are the logs with those variables enabled. I've included the output for each of the configurations in my original description (10 classes, 3 classes, and 2 classes); 10 and 3 work, 2 does not.
Failure_Example_2_Class.txt
Working_Example_3_Class.txt
Working_Example_10_Class.txt

If it helps, we're using the TIM-VX release that comes with the Yocto build 5.10.72 (hardknott).

Also, thanks for sharing the debug environment variables with us - we had the first one but didn't know about the second one!

@sunshinemyson
Contributor

@lkaneda,

Thanks for the logs. I want to confirm two things:

  1. Did you use the latest src/tim/vx/internal in your test, or did you use the standalone libovxlib.so library?
  2. Which version of the driver libraries (libOpenVX.so) are you using?

Is it possible to try your test case with our default SDK in the prebuilt directory? It is a simulator.


lkaneda commented Jan 11, 2023

@sunshinemyson

  1. We used the standalone libovxlib.so library, version 1.1.0 I believe. We also built TIM-VX from source and recompiled it as an .so, but it was not from the latest source; it was what our Yocto recipe pulls from, which is branch 5.10.72-2.2.0.
  2. The libOpenVX.so version is 1.3.0, I believe.

I'm getting these numbers from the filenames installed on the board, for example "./lib/libOpenVX.so.1.3.0". If this is not the right way to check, please point me toward how best to find this information.

Not sure if this is helpful, but looking through the commits between the branch we pull from and the current branch, this commit looked like it might be relevant.

@sunshinemyson
Contributor

@lkaneda,

libovxlib.so 1.1.0 is quite out of date, and the source version you take from the NXP fork is also about a year old.

Are you working with the NXP i.MX 8M Plus? If yes, I can ask the NXP team for help.

Thanks


lkaneda commented Jan 12, 2023

@sunshinemyson

Yes, we are working with an NXP i.MX 8M Plus.

Also, at this point it is not simple for us to update to the latest version without risking breaking other parts of our pipeline. However, if there is a patch that fixes the problem, we can look into applying and testing it.

Thank you again for your help!

@sunshinemyson
Contributor

@lkaneda,

Thanks, I've forwarded your issue to them.

@robert-kalmar
Contributor

Hi @lkaneda,
we checked the issue. The fix requires updating both TIM-VX and the GPU driver.
Please post your case in the NXP Community (https://community.nxp.com/). If you have already done so, please share the link so we can match it with this conversation. There we can provide support with an incremental upgrade to avoid a complete migration to a new Linux release.

Thanks,


BralSLA commented Jun 21, 2023

Hey @robert-kalmar,

Apologies for the radio silence on this ticket. I'll be taking over the work in progress.
I've gone ahead and submitted a ticket with NXP to update those driver files. Let me know if you're able to view it; it may not be accessible because it's part of our support project. Are you able to point me to a commit hash that I can use to get the affected files?

Thanks!
