[tmva] Missing dependency or clean up in TMVA test/tutorials #16553

@pcanal

Check duplicate issues.

  • Checked for duplicates

Description

On a large node (127 cores, 128 GB), I ran:

  1. ctest -j 32
  2. ctest --rerun-failed
  3. ctest -j 32

After step 1, many tests fail due to lack of resources (running out of threads, see #16552):

347:PyMVA-Keras-Classification
348:PyMVA-Keras-Regression
349:PyMVA-Keras-Multiclass
350:gtest-tmva-pymva-test-TestRModelParserKeras
984:tutorial-tmva-TMVA_SOFIE_GNN_Application
985:tutorial-tmva-TMVA_SOFIE_Keras
986:tutorial-tmva-TMVA_SOFIE_Keras_HiggsModel
988:tutorial-tmva-TMVA_SOFIE_RDataFrame
990:tutorial-tmva-TMVA_SOFIE_RSofieReader
1238:tutorial-tmva-RBatchGenerator_PyTorch-py
1239:tutorial-tmva-RBatchGenerator_TensorFlow-py
1246:tutorial-tmva-TMVA_SOFIE_Models-py
1247:tutorial-tmva-TMVA_SOFIE_RDataFrame-py
1252:tutorial-tmva-keras-GenerateModel-py
1253:tutorial-tmva-keras-MulticlassKeras-py

However, in step 2, several tests still failed (even though resources were no longer an issue):

350:gtest-tmva-pymva-test-TestRModelParserKeras
984:tutorial-tmva-TMVA_SOFIE_GNN_Application
986:tutorial-tmva-TMVA_SOFIE_Keras_HiggsModel
988:tutorial-tmva-TMVA_SOFIE_RDataFrame
990:tutorial-tmva-TMVA_SOFIE_RSofieReader
1247:tutorial-tmva-TMVA_SOFIE_RDataFrame-py

The errors listed there included:

IncrementalExecutor::executeFunction: symbol 'saxpy_' unresolved while linking [cling interface function]!
IncrementalExecutor::executeFunction: symbol 'sgemm_' unresolved while linking [cling interface function]!
tutorials/tmva/TMVA_SOFIE_RDataFrame.C:29:10: fatal error: 'Higgs_trained_model.hxx' file not found
/tutorials/tmva/TMVA_SOFIE_GNN_Application.C:10:10: fatal error: 'encoder.hxx' file not found

From this I conclude that those tests (in particular tutorials/tmva/TMVA_SOFIE_RDataFrame.C and tutorials/tmva/TMVA_SOFIE_GNN_Application.C) are missing a declared dependency on tests that failed in the first run.
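The comparison behind this conclusion can be sketched with set operations on the two failure lists above. This is only an illustration in Python: the helper name `parse_failed` and the variable names are hypothetical, and the test numbers are dropped so that only test names are compared.

```python
def parse_failed(listing):
    """Turn 'NUMBER:test-name' lines (as pasted above) into a set of test names."""
    names = set()
    for line in listing.splitlines():
        line = line.strip()
        if line:
            # Keep only the part after the first colon; the number is ignored.
            names.add(line.split(":", 1)[1].strip())
    return names

RUN1 = """
47:PyMVA-Keras-Classification
348:PyMVA-Keras-Regression
349:PyMVA-Keras-Multiclass
350:gtest-tmva-pymva-test-TestRModelParserKeras
984:tutorial-tmva-TMVA_SOFIE_GNN_Application
985:tutorial-tmva-TMVA_SOFIE_Keras
986:tutorial-tmva-TMVA_SOFIE_Keras_HiggsModel
988:tutorial-tmva-TMVA_SOFIE_RDataFrame
990:tutorial-tmva-TMVA_SOFIE_RSofieReader
1238:tutorial-tmva-RBatchGenerator_PyTorch-py
1239:tutorial-tmva-RBatchGenerator_TensorFlow-py
1246:tutorial-tmva-TMVA_SOFIE_Models-py
1247:tutorial-tmva-TMVA_SOFIE_RDataFrame-py
1252:tutorial-tmva-keras-GenerateModel-py
1253:tutorial-tmva-keras-MulticlassKeras-py
"""

RUN2 = """
50:gtest-tmva-pymva-test-TestRModelParserKeras
984:tutorial-tmva-TMVA_SOFIE_GNN_Application
986:tutorial-tmva-TMVA_SOFIE_Keras_HiggsModel
988:tutorial-tmva-TMVA_SOFIE_RDataFrame
990:tutorial-tmva-TMVA_SOFIE_RSofieReader
1247:tutorial-tmva-TMVA_SOFIE_RDataFrame-py
"""

run1 = parse_failed(RUN1)
run2 = parse_failed(RUN2)

persistent = run1 & run2   # failed in both runs: candidates for a missing dependency
recovered = run1 - run2    # failed only under load: likely resource related

for name in sorted(persistent):
    print("still failing:", name)
```

With the lists above, six tests fail in both runs and nine recover once resources are available.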

Note that tutorial-tmva-TMVA_SOFIE_Keras_HiggsModel and tutorial-tmva-TMVA_SOFIE_RDataFrame-py do need TMVA_Higgs_Classification.C to run first (it says so in the output! :) ).

tutorial-tmva-TMVA_SOFIE_RSofieReader is asking for Higgs_trained_model.h5.

gtest-tmva-pymva-test-TestRModelParserKeras is missing the symbol sgemm_ (see below).
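To see which of the generated files named in the errors and notes above are absent after a failed run, something like the following could be used. It is a minimal sketch: the function name `missing_generated_files` is hypothetical, and the directory to search (wherever the tutorials write their output in a given build) is an assumption that has to be supplied by the caller.

```python
from pathlib import Path

# File names taken from the error messages and notes in this report.
EXPECTED = (
    "Higgs_trained_model.hxx",
    "encoder.hxx",
    "Higgs_trained_model.h5",
)

def missing_generated_files(workdir, expected=EXPECTED):
    """Return the expected file names that are not found anywhere under workdir."""
    root = Path(workdir)
    # rglob searches recursively; any() stops at the first match.
    return [name for name in expected if not any(root.rglob(name))]
```

Calling `missing_generated_files(".")` from the directory where the tutorials ran lists the files that the producing tests never wrote.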

However, when rerunning the full suite (step 3), where this time there were somehow no resource-related failures, I still got several failures:

346:gtest-tmva-pymva-test-TestRModelParserPyTorch
350:gtest-tmva-pymva-test-TestRModelParserKeras
984:tutorial-tmva-TMVA_SOFIE_GNN_Application
988:tutorial-tmva-TMVA_SOFIE_RDataFrame
990:tutorial-tmva-TMVA_SOFIE_RSofieReader

all due to:

IncrementalExecutor::executeFunction: symbol 'sgemm_' unresolved while linking [cling interface function]!

or to both:

IncrementalExecutor::executeFunction: symbol 'saxpy_' unresolved while linking [cling interface function]!
IncrementalExecutor::executeFunction: symbol 'sgemm_' unresolved while linking [cling interface function]!

This may be due either to a badly formed result left over from the failing run (1) or to an external package that does not have the correct version.
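One way to narrow this down is to check whether a BLAS shared library visible on the system exports the two Fortran symbols at all. The sketch below uses Python's `ctypes`; the function name `blas_symbol_report` and the candidate library names (`openblas`, `blas`) are assumptions, and the BLAS that ROOT was actually built against may be a different one. A positive result therefore does not prove that cling can resolve the symbol; it only shows that a provider exists on the machine.

```python
import ctypes
import ctypes.util

def blas_symbol_report(symbols=("sgemm_", "saxpy_"),
                       candidates=("openblas", "blas")):
    """Map each BLAS library found to {symbol: exported?}.

    The candidate names are guesses; libraries that cannot be found
    or loaded are skipped silently.
    """
    report = {}
    for name in candidates:
        path = ctypes.util.find_library(name)
        if path is None:
            continue
        try:
            lib = ctypes.CDLL(path)
        except OSError:
            continue
        # Attribute lookup on a CDLL raises AttributeError for a missing
        # symbol, so hasattr() reports whether the symbol is exported.
        report[path] = {sym: hasattr(lib, sym) for sym in symbols}
    return report

if __name__ == "__main__":
    found = blas_symbol_report()
    if not found:
        print("no candidate BLAS library found")
    for path, syms in found.items():
        print(path, syms)
```

An empty report suggests that no BLAS library is discoverable under those names, which would point at the environment rather than at leftovers from the failed run.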

Reproducer

ctest -j 32 # and get lots of out of resource failures
ctest --rerun-failed
ctest -j 32

ROOT version

master

Installation method

hand build

Operating system

Alma9

Additional context

jupyter-pcanal-rootdevel:quick-devel pcanal$ bin/root-config --features
cxx17 asimage builtin_clang builtin_cling builtin_gtest builtin_llvm builtin_lz4 builtin_lzma builtin_nlohmannjson builtin_openui5 builtin_tbb builtin_vdt builtin_xxhash builtin_zlib builtin_zstd clad dataframe davix gdml http imt pyroot roofit root7 rpath runtime_cxxmodules shared sqlite ssl tmva tmva-pymva tpython spectrum vdt x11 xml xrootd
