Enron Machine Learning Analysis

This project analyzes Enron persons of interest (POIs) using machine learning algorithms. It was created for Udacity's Data Analyst Nanodegree.

Access the final report here: https://sbsousa.github.io/EnronML

Project Description

Per Udacity, the goal of this project is to "play detective and put your machine learning skills to use by building an algorithm to identify Enron Employees who may have committed fraud based on the public Enron financial and email dataset."

Approach

First, I performed an Exploratory Data Analysis to understand the data. Next, I cleaned the data and removed outliers. Then, I used SelectKBest to select the most informative features for the machine learning algorithms. Finally, I trained Naive Bayes and Decision Tree classifiers on the selected features.
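The feature selection and classification steps described above can be sketched roughly as follows. This is a minimal illustration, not the code in poi_id.py: the synthetic data, the choice of k=3, the f_classif scoring function, and the helper name build_pipelines are all assumptions made for the example.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier


def build_pipelines(k=3, random_state=42):
    """Return one SelectKBest + classifier pipeline per model (hypothetical helper)."""
    return {
        "naive_bayes": Pipeline([
            ("select", SelectKBest(score_func=f_classif, k=k)),
            ("clf", GaussianNB()),
        ]),
        "decision_tree": Pipeline([
            ("select", SelectKBest(score_func=f_classif, k=k)),
            ("clf", DecisionTreeClassifier(random_state=random_state)),
        ]),
    }


# Synthetic stand-in for the Enron features (assumption: NOT the real dataset).
rng = np.random.default_rng(0)
n_samples, n_features = 120, 8
X = rng.normal(size=(n_samples, n_features))
# The label depends only on the first two columns, so those carry the signal.
y = (X[:, 0] + 0.5 * X[:, 1] > 1.0).astype(int)

pipelines = build_pipelines(k=3)
for name, pipe in pipelines.items():
    pipe.fit(X, y)
    # Indices of the k features kept by SelectKBest for this pipeline.
    selected = pipe.named_steps["select"].get_support(indices=True)
    print(name, "selected feature indices:", selected.tolist())
```

Wrapping SelectKBest and the classifier in a single Pipeline means the feature selection is refit on each training split during cross-validation, which avoids leaking information from the held-out data.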

The project was written in Python in a Jupyter Notebook. It uses several Python packages, including scikit-learn, Pandas, NumPy, Matplotlib, and Seaborn. The final Jupyter report is provided in HTML format.

Instructions

I modified the Udacity files to work with Python 3.9 and current packages. The files may not run unless you recreate my environment from the packages listed in requirements.txt.

poi_id.py: creates the pickle (pkl) files
tester.py: validates the selected machine learning algorithms against the pkl files and returns metrics (Accuracy, Precision, Recall, and F1)
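For reference, the four metrics named above follow from the counts of true and false positives and negatives. The sketch below uses the standard definitions and made-up labels. It is not the code in tester.py, and the function name classification_metrics is hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels (1 = POI)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # POIs correctly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-POIs correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-POIs wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # POIs missed

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels for illustration only (assumption: not project results).
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Because POIs are a small minority of the people in the dataset, accuracy alone can look high for a classifier that flags no one, which is why precision, recall, and F1 are reported alongside it.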

License

This project is publicly available for educational purposes. Please acknowledge this source if you use it.

Sources

The Python scripts were provided by Udacity:

https://www.udacity.com/course/data-analyst-nanodegree--nd002

Udacity code that I modified is commented.

Additional sources are acknowledged in the code and report.

