Enron Scandal: The Fall of a Wall Street Darling
Played the role of detective and put my machine learning skills to use by building an algorithm that identifies Enron employees who may have committed fraud, based on the public Enron financial and email dataset.
In 2000, Enron was one of the largest companies in the United States. By 2002, it had collapsed into bankruptcy due to widespread corporate fraud. In the resulting federal investigation, a significant amount of typically confidential information entered the public record, including tens of thousands of emails and detailed financial data for top executives. This project builds a person of interest (POI) identifier from the financial and email data made public as a result of the Enron scandal. The data is combined with a hand-generated list of persons of interest in the fraud case, that is, individuals who were indicted, reached a settlement or plea deal with the government, or testified in exchange for prosecution immunity.
- Deal with an imperfect, real-world dataset (class imbalance problem)
- Validate a machine learning result using test data (K-fold cross validation, SelectKBest)
- Evaluate a machine learning result using quantitative metrics (Accuracy, Precision, Recall)
- Create, select and transform features (sklearn.preprocessing)
- Compare the performance of a few machine learning algorithms (Naive Bayes, SVM, DecisionTree)
- Tune machine learning algorithms for maximum performance
- Communicate your machine learning algorithm results clearly
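
The class imbalance and metric points in the list above can be illustrated with a minimal sketch. It is not the project's actual evaluation code: the labels are hypothetical toy values (not real Enron data), and the helper names `confusion_counts` and `evaluate` are made up for this example. It shows why accuracy alone can mislead when POIs are rare, and why precision and recall are reported alongside it.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = POI)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def evaluate(y_true, y_pred):
    """Return accuracy, precision and recall; undefined ratios fall back to 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Hypothetical toy labels: 2 POIs among 10 people (illustrative only).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# A "classifier" that never flags anyone still looks accurate on imbalanced data.
always_non_poi = [0] * 10

# A classifier that finds one POI, misses one, and raises one false alarm.
some_classifier = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

baseline = evaluate(y_true, always_non_poi)
candidate = evaluate(y_true, some_classifier)

print("baseline: ", baseline)
print("candidate:", candidate)
```

Both predictions reach the same accuracy here, but only the second one has non-zero precision and recall, which is the distinction that matters when the positive class is small. In a scikit-learn workflow the same quantities are available from `sklearn.metrics` (`accuracy_score`, `precision_score`, `recall_score`).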