BEDS Pipeline

Introduction

BERT Error Detection for STPA (BEDS) is a machine-learning pipeline that assists the system analyst in performing the first step of the System-Theoretic Process Analysis (STPA) hazard analysis technique. BEDS was trained using the BERT language model and specializes in detecting writing errors in sentences that do not follow the guidelines in the STPA Handbook.

The pipeline has four steps:

(Optional) The first step takes an unlabeled sentence and classifies it as a Loss, Hazard, or Constraint;
The second step checks whether a sentence is valid or invalid based on the examples given in the STPA Handbook;
The third step identifies the category of fault present in the invalid sentences found in the previous step;
The fourth step uses a sentence similarity model to suggest corrections for the invalid sentences, drawn from a list of verified sentences.
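The control flow of the four steps can be sketched as below. This is a minimal illustration, not the code from the notebooks: `run_pipeline`, `PipelineResult`, and every `toy_*` function are hypothetical names, the toy rules stand in for the fine-tuned BERT classifiers, and a plain word-overlap score stands in for the sentence similarity model. The example sentences and the fault name are made up.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class PipelineResult:
    sentence: str
    label: str
    valid: bool
    fault: Optional[str] = None       # set only for invalid sentences
    suggestion: Optional[str] = None  # set only for invalid sentences


def run_pipeline(
    sentence: str,
    label: Optional[str],
    classify_label: Callable[[str], str],
    is_valid: Callable[[str, str], bool],
    classify_fault: Callable[[str, str], str],
    suggest: Callable[[str, Sequence[str]], str],
    verified: Sequence[str],
) -> PipelineResult:
    # Step 1 (optional): classify only when the input is unlabeled.
    if label is None:
        label = classify_label(sentence)
    # Step 2: valid sentences leave the pipeline here.
    if is_valid(sentence, label):
        return PipelineResult(sentence, label, True)
    # Step 3: categorize the fault of the invalid sentence.
    fault = classify_fault(sentence, label)
    # Step 4: suggest the closest verified sentence as a correction.
    return PipelineResult(sentence, label, False, fault, suggest(sentence, verified))


# --- Toy stand-ins (assumptions for illustration only) ---
def toy_classify_label(sentence: str) -> str:
    return "constraint" if "must" in sentence.lower() else "hazard"


def toy_is_valid(sentence: str, label: str) -> bool:
    # Toy rule: a constraint counts as valid only if it contains "must".
    return label != "constraint" or "must" in sentence.lower()


def toy_classify_fault(sentence: str, label: str) -> str:
    return "placeholder-fault-category"


def word_overlap_suggest(sentence: str, verified: Sequence[str]) -> str:
    # Jaccard similarity on lowercase word sets; NOT a learned model.
    words = set(sentence.lower().split())

    def score(candidate: str) -> float:
        other = set(candidate.lower().split())
        union = words | other
        return len(words & other) / len(union) if union else 0.0

    return max(verified, key=score)


verified = [
    "Aircraft must maintain minimum separation from other aircraft",
    "Vehicle must not exceed safe speed limits",
]
steps = (toy_classify_label, toy_is_valid, toy_classify_fault, word_overlap_suggest)

# Labeled input: step 1 is skipped, steps 2 to 4 run.
result = run_pipeline("Aircraft separation", "constraint", *steps, verified)
print(result)
```

Passing the four steps as callables keeps the flow independent of the models behind them, so each toy function could be swapped for a real classifier without changing `run_pipeline`.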

Python Code

Two Python notebooks are available in this repository:

BEDS_Pipeline_Fine_tuning_and_Evaluation is the code used to manipulate the dataset and train all the ML models of the pipeline;
BEDS_Pipeline_Execution_example is a functional example of the pipeline.

How to Use

To experiment with BEDS, you should use BEDS_Pipeline_Execution_example.ipynb:

Uncomment and install the required libraries;
Prepare your input based on the examples given in this repository ("input_example labeled.csv" or "input_example unlabeled.csv");
Choose the input type: "labeled" or "unlabeled";
Run all notebook cells sequentially.

Dataset

This dataset contains sentences generated and used during the first step of the System-Theoretic Process Analysis (STPA) hazard analysis technique, called "defining the purpose of the analysis". In this step, three safety aspects of the system are defined:

Losses involve something of value to stakeholders, such as human life, equipment, or mission, whose loss is unacceptable;
System-Level Hazards are system states or conditions that, together with a set of worst-case environmental conditions, will lead to a loss;
System-Level Constraints are the system conditions or behaviors that need to be satisfied to prevent hazards.

Dataset Creation

This dataset was created by extracting sentences found in presentations from the Annual MIT STAMP Workshop. The presentations are from 2012 to 2023.

How to Use

The dataset is a ".csv" file. In Python, the pandas library is recommended for loading it:

import pandas as pd
df = pd.read_csv(r'/[PATH]/stpa-dataset.csv')

Dataset Columns

This dataset contains 9 columns that organize the collected data.

"sentence": The extracted sentence from the presentation;
"label": The corresponding label of the sentence (Loss, Hazard, or Constraint);
"validity": Indicates whether the sentence is valid or invalid;
"faults": Indicates the type of fault in invalid sentences;
"domain": Domain of the presentation;
"year": Year of the presentation;
"title": Title of the presentation;
"url": URL of the presentation;
"slide": The number of the slide from which the sentence was extracted.
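As a rough illustration of how these columns can be queried with pandas, the sketch below builds a tiny in-memory frame with the same nine column names and filters it. The rows are made up and are not taken from the dataset. The stored values "valid"/"invalid", the lowercase label names, the fault name, and the URL are assumptions; check the actual values in stpa-dataset.csv before relying on them.

```python
import pandas as pd

# Made-up rows that only mimic the column layout described above.
df = pd.DataFrame(
    {
        "sentence": [
            "Loss of life or injury to people",
            "Aircraft separation",
            "Vehicle must not exceed safe speed limits",
        ],
        "label": ["loss", "hazard", "constraint"],       # assumed spelling
        "validity": ["valid", "invalid", "valid"],       # assumed values
        "faults": [None, "placeholder-fault", None],     # hypothetical fault name
        "domain": ["aviation", "aviation", "automotive"],
        "year": [2020, 2020, 2021],
        "title": ["Example talk A", "Example talk A", "Example talk B"],
        "url": ["https://example.org/placeholder"] * 3,
        "slide": [4, 5, 7],
    }
)

# Keep only the sentences marked as invalid, with their fault category.
invalid = df.loc[df["validity"] == "invalid", ["sentence", "label", "faults"]]
print(invalid)

# Count sentences per label and validity combination.
summary = df.groupby(["label", "validity"]).size()
print(summary)
```

The same two queries should work on the frame returned by `pd.read_csv`, provided the column values match the assumptions above.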

Class Distribution

The sentences were extracted from slides that explicitly state the sentence type (for example, a table listing the system losses and hazards), so the label of each sentence comes directly from the slide. However, because the presentations contain different numbers of examples of each type, the dataset is unbalanced.

Class	Sentences
loss	291
hazard	424
constraint	369
Total	1084
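To make the imbalance concrete, the sketch below computes each class's share of the dataset from the counts in the table, along with inverse-frequency class weights. The weighting formula, total / (number of classes × class count), is the common "balanced" heuristic (the one scikit-learn documents for `class_weight="balanced"`). It is shown here as one possible way to compensate for the imbalance, not as the method used to train BEDS.

```python
# Class counts taken from the table above.
counts = {"loss": 291, "hazard": 424, "constraint": 369}

total = sum(counts.values())
n_classes = len(counts)

# Fraction of the dataset that belongs to each class.
shares = {name: n / total for name, n in counts.items()}

# "Balanced" heuristic: rarer classes receive weights above 1.
weights = {name: total / (n_classes * n) for name, n in counts.items()}

for name in counts:
    print(f"{name:<10} share={shares[name]:.3f} weight={weights[name]:.3f}")
```

With these counts, "loss" is the rarest class and therefore receives the largest weight, while "hazard" receives the smallest.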

About the Author

This repository was created by Andrey Toshiro Okamura, a graduate student in Computing and Communication Systems at the School of Technology of the State University of Campinas (UNICAMP).
