Find the data (synthesized models, intermediate results) of this project here:
This is a project for synthesizing large Simulink models for stress testing, scaling, or giant fuzzing. In its standard settings, you can create new models that are syntactically valid and
- giant sized (>100k Blocks, >15k Subsystems), the strategy that produces these is GIANT,
- deep (50-100 levels deep), the strategy is DEPTH,
- bushy and dense (shallower, but with many, many Subsystems per level), the strategy is WIDTH,
- have a Subsystem tree isomorphic to that of a given model (from the SLNET set), the strategy is AST_MODEL,
- are randomly synthesized, the strategy is RANDOM.

Each model is synthesized using a set of subsystems of a corpus of models, like the SLNET-corpus, as building blocks. Suitable subsystems are puzzled together to synthesize (huge) models.
You can also create new models with TITLE that are compilable or simulatable. These models are much smaller and currently not all of the synthesis attempts will result in a compilable/simulatable model.
You can use the synthetic models to test the scalability of Simulink, or your Simulink tool, or to fuzz them.
This package comes with 4,600 synthesized models of various shapes and sizes. 600 of them are very large.
- In the directory 0 you can find syntactically valid models from our strategies. The largest model is 0/WIDTH/model70.slx; it has more than 4 million model elements and is 500MB in size.
- In the directory 1 you can find models of which some are compilable and simulatable. The largest compilable model is 1/GIANT/model77.slx. The largest simulatable model is 1/GIANT/model2.slx.
You can look up various properties of the models in either modellist_synthed.csv or X/STRATEGY/synth_report.csv where X=0 or X=1 and STRATEGY is one of our strategy names.
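As a quick way to look at these files outside MATLAB, the following Python sketch loads one of the CSV reports and returns its header and rows. It assumes only that the files are ordinary comma-separated text with a header row. The column names are not documented here, so the code discovers them rather than hard-coding any. The function name load_report is ours, for illustration only.

```python
import csv
from pathlib import Path


def load_report(csv_path):
    """Read a CSV report and return (column_names, rows).

    Assumption: the file is comma-separated and starts with a header row.
    Each row is returned as a dict keyed by column name.
    """
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        columns = list(reader.fieldnames or [])
    return columns, rows


# Example use (path relative to the package root):
# columns, rows = load_report("0/WIDTH/synth_report.csv")
# print(columns, len(rows))
```

Printing the discovered columns first is a reasonable starting point before filtering or sorting on any particular property.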
⚠ This is not a normal replication package. ⚠ Our approach inherently fuzzes Simulink in several of its operations: loading, simulating, copying, saving, closing. All this with a diverse and challenging corpus such as SLNET. If you follow our instructions below, you will most likely experience program errors, hard crashes of MATLAB/Simulink, or even system-critical memory leaks! So caution is advised. With that in mind, most of our scripts are designed to pick up their work where they hard crashed the last time. So usually one of the following, in escalating order, should skip over the bug and continue the work: restarting the script; restarting MATLAB and then the script; or restarting your PC, then MATLAB, then the script. We reported a number of bugs to MathWorks, so you may not experience as many issues as we did if you use the most current MATLAB/Simulink version.
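The restart-the-script advice can be automated from outside MATLAB. Below is a minimal Python sketch, not part of the package, that reruns a command until it exits with status 0 or a retry limit is reached. It relies on the scripts resuming from saved progress, as described above. The helper name run_until_done, the retry limit, and the example command line are our assumptions. Whether a given hard crash produces a nonzero exit code may depend on your platform.

```python
import subprocess
import time


def run_until_done(command, max_attempts=10, pause_seconds=0.0, runner=subprocess.run):
    """Rerun `command` until it exits with code 0.

    Returns the number of attempts used, or raises RuntimeError after
    `max_attempts` failures. `runner` is injectable so the loop can be
    exercised without MATLAB installed.
    """
    for attempt in range(1, max_attempts + 1):
        result = runner(command)
        if result.returncode == 0:
            return attempt
        # A nonzero code is treated as a crash; the script is expected to
        # resume from its saved progress on the next attempt.
        print(f"attempt {attempt} exited with code {result.returncode}; retrying")
        time.sleep(pause_seconds)
    raise RuntimeError(f"gave up after {max_attempts} attempts: {command}")
```

For example, `run_until_done(["matlab", "-batch", "clean_models"])` would retry the cleaning step, assuming `matlab` is on your PATH (the `-batch` option exists in R2019a and later) and the working directory contains the script. A wrapper like this cannot help with crashes that require a reboot.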
We used the model collection SLNET for TITLE. Have the model collection unzipped at some location models_path into directories (names of directories are numbers), like this:
SLNET
|---SLNET_GitHub
| |---100042416
| |---100381142
| |---...
|---SLNET_MATLABCentral
  |---10335
  |---10439
  |---11027
  |---...
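
Before starting the pipeline, it can help to confirm that the unzipped collection matches this layout. The Python sketch below is our own illustration: it counts the numerically named project directories under each of the two subfolders shown above. The function name count_project_dirs is hypothetical and not part of the package.

```python
from pathlib import Path

# Subfolder names taken from the directory tree above.
EXPECTED_SUBFOLDERS = ("SLNET_GitHub", "SLNET_MATLABCentral")


def count_project_dirs(models_path):
    """Return {subfolder: number of numerically named directories}.

    Raises FileNotFoundError if an expected subfolder is missing.
    """
    root = Path(models_path)
    counts = {}
    for name in EXPECTED_SUBFOLDERS:
        sub = root / name
        if not sub.is_dir():
            raise FileNotFoundError(f"missing directory: {sub}")
        counts[name] = sum(
            1 for entry in sub.iterdir() if entry.is_dir() and entry.name.isdigit()
        )
    return counts
```

A count of zero for either subfolder suggests the archive was unzipped one level too deep or too shallow.
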
As MATLAB sometimes hard crashes, you may have to restart steps 2 and 3 a couple of times until they run through completely. Progress is saved, even across crashes.
1. In system_constants.m, state where your SLNET directory is located (the models_path location from earlier), where you want the useful models stored (tame_models_path), where you want the output of the synthesizer stored (project_dir), and optionally where your copy of project_dir is located (synthed_models_path). We used synthed_models_path when we restarted the synthesis a couple of times, so as not to ruin prior runs.
2. Next, run clean_models.m to clean the models from your models_path, e.g., all Callbacks are removed, so that they 'behave' later. Models that misbehave during this cleaning process are filtered out. The cleaned models are put into tame_models_path.
3. Next, run gather_models.m. The gather_models.m script will scan all cleaned models for their suitability for typed/untyped interface checks, i.e., models that are loadable/compilable/runnable. A file modellist.csv will be created in project_dir in this step.
4. To build the database of Subsystems and their dictionaries of meta-information, use mine.m. In project_dir/0 (for statically correct models) and project_dir/1 (for compilable models) the files interface2subs.json and name2subinfo_complete.json will be created. These are the dictionaries used in the synthesis. The 0 directory holds all models' subsystem information, while the 1 directory holds the subsystem information of the compilable and runnable models only.
5. Finally, run the synthesize.m script to generate the synthetic models. In synthesize.m you can choose which typed/untyped equivalence (line 8) and synthesis strategies (line 9) you want to use or leave out. Change the modes according to the list of modes in line 7. The synthesize.m script will create models and a report at project_dir/<0,1>/<mode>/synth_report.csv.
If your system has:
- more than 120GB of RAM, you can run synthesize.m as is
- between 70GB and 120GB of RAM, you should probably synthesize for only one mode combination at a time (lines 8 and 10 in synthesize.m should be modified like for needs_to_be_compilable = X:X and for mode = Y:Y) and then restart MATLAB before attempting the next
- less than 70GB of RAM, you should change line 40 in synthesize.m to be if ~loaded && ~dry && dry to disable the preloading of the SLNET set. You also have to uncomment the lines 229-231 in ModelMutator.m and lines 95-97 in GSubTree.m.
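
The memory thresholds above can be written down as a small decision helper. This Python sketch only restates the list for illustration. The function name and the returned strings are ours, and the handling of the exact boundary values (70GB and 120GB) is an assumption, since the text does not say which side they fall on.

```python
def synthesis_advice(ram_gb):
    """Map installed RAM (in GB) to the recommendation from the list above.

    Assumption: exactly 70 GB and exactly 120 GB are treated as part of
    the middle tier.
    """
    if ram_gb > 120:
        return "run synthesize.m as is"
    if ram_gb >= 70:
        return "one mode combination per MATLAB session"
    return "disable SLNET preloading"
```
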
We used the current settings to produce the results given in our paper. Our scripts ran for days, though. You probably want to change the number of total models that are scanned in line 32 of clean_models.m to something like 1000 or 3000 (this works well for statically correct models). Also consider changing various constants in Helper.m's synth_profile function: reduce time_outs, limit maximum depths, or lower the desired model count per strategy.
We added these functions to our MATLAB path, to suppress warning and error dialogues, that otherwise will spam your screen and make work near impossible:
cat MATLAB/warndlg.m
function h = warndlg(varargin)
    disp('[Suppressed warndlg]');
    if nargout > 0
        h = []; % return empty if output is expected
    end

cat errordlg.m
function h = errordlg(varargin)
    disp('[Suppressed errordlg]');
    if nargout > 0
        h = []; % return empty if output is expected
    end
- checkcompilabilitysimulability.m is used for finding the last two columns of Table III
- scalability.m creates the raw data for Table V
- the Python scripts in plots create Figures 5, 6, 7, 8, and Table V
- raw data in modellist.csv is used for Table I
- data for Table II is in the output of script mine.m
- the first six rows of Table III are output by synthesize.m