Produce Raw Results

Scripts to produce the raw results that can be used to create plots and tables. For both datasets (CASMI 2016 and EA (Massbank)) there are three scripts:

| Script | MS2 Base Scorer | Note | Reference in the Paper |
| --- | --- | --- | --- |
| eval__MetFrag22 | MetFrag 2.2 | Evaluation of the metabolite identification performance | Section 4.3.1, Table 3 |
| eval__TFG | Ours (Bach et al., 2018) | Evaluation of the metabolite identification performance | Section 4.3.1, Table 3 |
| | | Inspection of the parameters of our framework, e.g. margin type or number of spanning trees | Section 4.2.* |
| | | Inspection of the hyper-parameter estimation | Section 4.2.2 |
| eval__TFG__missing_MS2 | Ours | Evaluation of the score integration framework for missing tandem mass spectra (MS2) | Section 4.4 |

Re-run Experiments

Here we describe how the experiments can be re-run, using eval__TFG.py for the EA (Massbank) dataset as an example. Assuming you have installed the msmsrt_scorer package and, if needed, activated the virtual environment, you can run the evaluation script as follows:

python EA_Massbank/eval__TFG.py \
      --mode=EVALUATION_MODE \
      --D_value_grid 0.001 0.005 0.01 0.05 0.1 0.15 0.25 0.35 0.5 \
      --make_order_prob=EDGE_POTENTIAL_FUNCTION \
      --order_prob_k_grid platt \
      --margin_type=MARGIN_TYPE \
      --n_random_trees=NUMBER_OF_RANDOM_TREES_FOR_APPROXIMATION \
      --n_samples=NUMBER_OF_RANDOM_TEST_TRAINING_SETS \
      --ion_mode=IONIZATION_MODE \
      --max_n_ms2=NUMBER_OF_MS2_FOR_TEST \
      --database_fn=SCORE_DB_FN \
      --base_odir=BASE_OUTPUT_DIRECTORY

Detailed description of selected parameters

A description of all parameters can be found in the __main__ block of the eval__* script files. Selected parameters are explained below:

--mode

| EVALUATION_MODE [1, 2] | Description |
| --- | --- |
| application | Evaluation of the performance on the test sets in the application setting |
| development | Performance evaluation on the training and test sets for each hyper-parameter grid value |
| missing_ms2 | Performance evaluation for the missing MS2 experiment |

--D_value_grid

Grid used to search for the best retention order weight (see Sections 2.2.4 and 3.4). We use [0.001, 0.005, 0.01, 0.05, 0.1, 0.15, 0.25, 0.35, 0.5] in our experiments.
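To illustrate the role of this weight, here is a minimal, hypothetical sketch (not the package code), assuming D linearly trades off the MS2 score of a candidate against its retention-order score:

```python
# Hypothetical illustration, not the msmsrt_scorer implementation: a weight D in (0, 1)
# trading off the MS2 score against the retention-order score of a molecular candidate.
# D = 0 would use only the MS2 information; larger D puts more weight on retention order.
def combined_score(ms2_score: float, rt_order_score: float, D: float) -> float:
    return (1.0 - D) * ms2_score + D * rt_order_score

# Grid used in the experiments (see above)
D_GRID = [0.001, 0.005, 0.01, 0.05, 0.1, 0.15, 0.25, 0.35, 0.5]
for D in D_GRID:
    print(D, combined_score(ms2_score=0.7, rt_order_score=0.4, D=D))
```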

--order_prob_k_grid

Grid used to search for the best sigmoid slope parameter when using EDGE_POTENTIAL_FUNCTION=sigmoid or EDGE_POTENTIAL_FUNCTION=hinge_sigmoid (see Sections 2.2.3 and 3.4). As grid we use [0.25, 0.5, 0.75, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 7.0, 10.0] for the Hinge-Sigmoid. Our experiments show that for the Sigmoid we can use Platt's method ("platt") to determine the optimal value of k (see Section 4.2.2).
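As a rough illustration of what the slope parameter does, the sketch below uses a plain sigmoid; it is a simplified stand-in (not the package's edge potential functions) and assumes delta is the difference of the predicted retention-order preference scores of two candidates:

```python
import math

# Simplified stand-in for a sigmoid edge potential (not the package code): the slope k
# controls how quickly the order probability saturates as a function of 'delta', here
# assumed to be the difference of the predicted retention-order preference scores.
def sigmoid_order_prob(delta: float, k: float) -> float:
    return 1.0 / (1.0 + math.exp(-k * delta))

for k in [0.25, 1.0, 5.0, 10.0]:
    # Larger k makes the transition sharper, i.e. closer to a hard step.
    print(k, round(sigmoid_order_prob(0.3, k), 3))
```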

--ion_mode and --max_n_ms2

These two parameters control which ionization mode is evaluated (positive or negative) and how many MS features are used to calculate the test accuracy. The following settings are available (see Section 3.1):

| Dataset | IONIZATION_MODE | NUMBER_OF_MS2_FOR_TEST |
| --- | --- | --- |
| CASMI (2016) | positive | 75 |
| CASMI (2016) | negative | 50 |
| EA (Massbank) | positive | 100 |
| EA (Massbank) | negative | 65 |

--n_samples

Number of (training, test)-set samples. In our experiments we use NUMBER_OF_RANDOM_TEST_TRAINING_SETS=50 (CASMI, EA (negative)) and NUMBER_OF_RANDOM_TEST_TRAINING_SETS=100 (EA (positive)).

--database_fn

Path to the SQLite score database (SCORE_DB_FN in the command above).

--base_odir

Path to the base output directory storing the raw results. Sub-directories will be created to separate the results of different parameter settings.

Example: EA (Massbank) positive, Results for Table 3

The following command can be used to reproduce the EA (Massbank) positive results in Table 3. Note that, to speed up the calculations, this command uses a reduced D-value grid, only 4 random spanning trees, and only 3 samples. To get exactly the results reported in the paper, you need to set:

--mode application
--D_value_grid 0.001 0.005 0.01 0.05 0.1 0.15 0.25 0.35 0.5
--n_random_trees 32
--n_samples 100

However, running the simplified setting lets you verify that the scripts work in your configuration.

python EA_Massbank/eval__TFG.py \
      --mode=debug_application \
      --D_value_grid 0.01 0.1 0.25 \
      --make_order_prob=sigmoid \
      --order_prob_k_grid platt \
      --margin_type=max \
      --n_random_trees=4 \
      --n_samples=3 \
      --ion_mode=negative \
      --max_n_ms2=65 \
      --database_fn=!!!_YOUR_SCORE_DB_FN_!!! \
      --base_odir=../../results/YOUR_RESULTS_GO_HERE/EA_Massbank/

The results will be stored in:

../../results/YOUR_RESULTS_GO_HERE/EA_Massbank/
      └── debug_application
          └── tree_method=random__n_trees=4__make_order_prob=sigmoid__param_selection_measure=topk_auc__norm_scores=none__mtype=max
              └── ion_mode=negative__participant=MetFrag_2.4.5__8afe4a14__max_n_cand=inf__sort_candidates_by_ms2_score=0
                  └── trainset=MEOH_AND_CASMI_JOINT__keep_test=0__est=ranksvm__mol_rep=substructure_count

You will find:

| File | Description |
| --- | --- |
| measures.csv | Training set performance measures for each (D, k) grid value, used to select the best parameters |
| opt_params.csv | Selected (D, k) for each sample |
| topk_casmi__max_n_ms2=VALUE__sample_id=VALUE.pkl.gz | Top-k accuracies for the baseline (Only MS) and after the score integration (MS + RT) |

You can load the results using the load_results function. To reproduce figures and tables of the paper, please take a look here.
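If you prefer to inspect the raw files directly, the sketch below shows one possible way to do so with pandas and pickle; the directory path is only an example taken from the tree above and has to be adapted to your run, and the repository's load_results function remains the recommended entry point.

```python
import glob
import gzip
import pickle

import pandas as pd

# Example path from the tree above; adjust it to the output directory of your own run.
RESULT_DIR = "../../results/YOUR_RESULTS_GO_HERE/EA_Massbank/debug_application"

# Each leaf directory (three levels below RESULT_DIR) holds the raw result files.
for leaf in glob.glob(RESULT_DIR + "/*/*/*"):
    measures = pd.read_csv(leaf + "/measures.csv")      # training-set performance per (D, k)
    opt_params = pd.read_csv(leaf + "/opt_params.csv")  # selected (D, k) per sample
    print(leaf)
    print(opt_params.head())

    # Top-k accuracies ("Only MS" baseline vs. "MS + RT") are stored as gzipped pickles.
    for fn in glob.glob(leaf + "/topk_casmi__*.pkl.gz"):
        with gzip.open(fn, "rb") as f:
            topk = pickle.load(f)
        print(fn, type(topk))
```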