- Introduction
- Process Mining Pipeline: Workflow
- Stack & Installation/Configuration Requirements
- Maintainers
- Backlog (Shared Doc Link)
- Troubleshooting
- This is a Proof of Concept (PoC) for distributed Process Mining (PM) data pipelines. It relies on the theoretical concepts of:
- graphs:
  - Directly-Follows Graph (DFG)
  - Directed Acyclic Graph (DAG)
- database:
  - Neo4j graph database (via the database .jar connector)
- data analysis:
  - distributed analytical engine, i.e., Apache Spark
    - interface: PySpark
  - PM4Py Process Mining algorithms (Python-based package)
- distributed analytical engine, i.e., Apache Spark
- log files, i.e., .csv files
- Apache Spark via PySpark interface
  - data partitioning (by case ID) and time windowing (by timestamps)
  - Directly-Follows Graph schema, i.e., predecessor and successor nodes
  - WRITE queries to a Neo4j graph database
  - format conversion for algorithmic analysis, i.e., parent/child nodes and their frequency
  - derivation of Petri net objects, which are stored in dataframes
- PM4Py analysis by applying - in parallel to DFGs - Process Mining algorithms, i.e.,
  - α-miner
  - heuristic miner
  - inductive miner
- plotting of evaluation metrics as comparative charts from the dataframes, which are obtained by converting Resilient Distributed Datasets (RDDs) to Pandas
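The partitioning, windowing, and format-conversion steps of the workflow can be illustrated with a minimal, plain-Python sketch. It is not the pipeline's PySpark code: the sample events, the column order (case ID, activity, timestamp), and all function names are assumptions made for illustration only.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical event log rows: (case_id, activity, ISO timestamp).
EVENTS = [
    ("c1", "register", "2021-01-01T09:00:00"),
    ("c1", "check", "2021-01-01T09:30:00"),
    ("c1", "approve", "2021-01-01T10:15:00"),
    ("c2", "register", "2021-01-01T09:05:00"),
    ("c2", "check", "2021-01-01T11:00:00"),
    ("c2", "reject", "2021-01-01T11:20:00"),
]


def window_events(events, start, end):
    """Keep events whose timestamp falls in the half-open window [start, end)."""
    lo, hi = datetime.fromisoformat(start), datetime.fromisoformat(end)
    return [e for e in events if lo <= datetime.fromisoformat(e[2]) < hi]


def partition_by_case(events):
    """Group events by case ID and order each trace by timestamp."""
    cases = defaultdict(list)
    for case_id, activity, ts in events:
        cases[case_id].append((datetime.fromisoformat(ts), activity))
    return {cid: [act for _, act in sorted(evts)] for cid, evts in cases.items()}


def directly_follows(traces):
    """Count how often one activity is immediately followed by another."""
    dfg = Counter()
    for trace in traces.values():
        dfg.update(zip(trace, trace[1:]))
    return dfg


def to_parent_child_rows(dfg):
    """Flatten the DFG into sorted (parent, child, frequency) rows."""
    return sorted((parent, child, freq) for (parent, child), freq in dfg.items())


if __name__ == "__main__":
    rows = to_parent_child_rows(directly_follows(partition_by_case(EVENTS)))
    for row in rows:
        print(row)
```

In a distributed setting, the grouping by case ID would presumably map to a Spark partitioning or window operation, and the (parent, child, frequency) rows would be the input for the Neo4j WRITE queries and the PM4Py miners; the sketch only shows the shape of the data at each step.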
- PySpark version 2.4.6
- Java (JVM) version 8
- Scala version 2.13
name: pyspark-vs2
channels:
  - defaults
dependencies:
  - pip=20.2.4
  - python=3.7.9
  - pip:
    - matplotlib==3.3.4
    - numpy==1.20.0
    - pandas==1.2.1
    - pm4py==2.2.5
    - py4j==0.10.9.1
    - pydotplus==2.0.2
    - pyspark==2.4.7
    - python-graphviz==0.8.4
NOTE: if a package is not available in the required version, try installing it with pip
or from conda-forge: $ conda install -c conda-forge [pkg_name]
- Prof. Dr. Ahmed Awad: ahmed.awad@ut.ee
- Belal Mohammad: B.Mohammed@nu.edu.eg
- Fabiano Spiga: fabiano.spiga@ut.ee