Data Exploration and Mining

Author: Eliza Fury

Overview

This project focuses on applying data mining techniques to analyze a dataset and extract meaningful insights. It includes both descriptive and predictive mining tasks to explore patterns, relationships, and trends in the data.

Setup

1. Prerequisites

Ensure you have the following installed:

- Python (version 3.7 or higher)
- pip (Python package manager)

2. Environment Setup

It is recommended to set up a virtual environment for this project. Use the following commands:

# Create a virtual environment
python -m venv env

# Activate the virtual environment
# On Windows:
env\Scripts\activate
# On macOS/Linux:
source env/bin/activate

3. Install Required Libraries

Run the following command to install the necessary Python packages:

pip install pandas matplotlib prefixspan mlxtend
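
To confirm the install worked, a quick check like the one below can help. It is only a sketch: the helper name `missing_packages` is made up for this example, and it assumes each package's import name matches its pip name, which holds for the four packages listed above.

```python
import importlib.util

# Import names of the packages installed in the step above.
REQUIRED = ["pandas", "matplotlib", "prefixspan", "mlxtend"]


def missing_packages(names):
    """Return the names that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```

Run it inside the activated virtual environment so that it checks the same interpreter pip installed into.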

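Project Structure

The repository contains the following files:

- .jupyter_ystore.db
- Assignment2A.ipynb
- Eliza Fury Data Mining Report.pdf
- Python code for Data Exploration.pdf
- README.md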

Objectives

The main goals of the project are:

- To clean and preprocess the dataset to ensure data quality.
- To perform descriptive mining by identifying associations and clustering patterns within the dataset.
- To conduct predictive mining using methods such as decision trees, regression models, and neural networks.
- To evaluate the results and provide actionable recommendations based on the analysis.
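
As an illustration of the first objective, here is a minimal cleaning sketch with pandas. The toy data, the column names, and the `clean` helper are all hypothetical; the project's actual dataset and cleaning rules are in the notebook and may differ.

```python
import pandas as pd

# Hypothetical toy data: column names and values are illustrative only.
raw = pd.DataFrame(
    {
        "customer_id": ["1", "2", "2", "3", "4"],
        "age": ["34", "41", "41", None, "29"],
        "spend": ["120.5", "80", "80", "n/a", "45.25"],
    }
)


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicates, convert types, and handle missing values."""
    out = df.drop_duplicates().copy()

    # Convert text columns to numeric types; unparseable values become NaN.
    out["customer_id"] = out["customer_id"].astype(int)
    out["age"] = pd.to_numeric(out["age"], errors="coerce")
    out["spend"] = pd.to_numeric(out["spend"], errors="coerce")

    # One possible policy: impute missing ages with the median,
    # and drop rows where the spend value is unusable.
    out["age"] = out["age"].fillna(out["age"].median())
    out = out.dropna(subset=["spend"])

    return out.reset_index(drop=True)


cleaned = clean(raw)
print(cleaned.dtypes)
print(cleaned)
```

Whether to impute or drop a missing value depends on the column and on the analysis that follows, so treat the policy above as one option rather than a rule.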

Key Components

The project will cover the following:

- Data Cleaning and Preparation
  - Handling missing values and duplicates
  - Converting data types
  - Ensuring dataset readiness for analysis
- Descriptive Mining
  - Association analysis to uncover relationships between items
  - Clustering to identify natural groupings in the data
- Predictive Mining
  - Building and evaluating decision tree models
  - Creating regression models to predict outcomes
  - Implementing neural networks for advanced predictions
- Evaluation and Recommendations
  - Comparing models based on metrics such as accuracy and ROC curves
  - Summarizing key findings
  - Providing recommendations supported by analysis
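
To make the evaluation metrics mentioned above concrete, the following dependency-free sketch computes accuracy, the points of a ROC curve, and the area under it. The labels and scores are made-up values, and the function names are illustrative. The project's own evaluation code is not shown here, and in practice a library implementation would normally be used instead.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) pairs,
    sweeping the decision threshold from the highest score down."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up labels (1 = positive class) and model scores for illustration.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

y_pred = [1 if s >= 0.5 else 0 for s in scores]
print("accuracy:", accuracy(y_true, y_pred))
print("AUC:", auc(roc_points(y_true, scores)))
```

Accuracy depends on the chosen threshold (0.5 here), while the ROC curve summarizes behavior across all thresholds, which is why comparing models on both can be useful.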

Tools and Libraries

The project utilizes Python and key data science libraries, including:

- pandas: Data manipulation and analysis
- matplotlib: Data visualization
- mlxtend: Tools for association analysis
- prefixspan: Sequential pattern mining
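
For readers new to association analysis, the sketch below computes by hand the three measures such an analysis typically reports: support, confidence, and lift. It deliberately avoids any library so the arithmetic is visible; mlxtend provides its own routines for this, and the transactions and function names here are invented for illustration.

```python
# Made-up shopping baskets for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def rule_metrics(antecedent, consequent, transactions):
    """Support, confidence, and lift for the rule antecedent -> consequent."""
    both = support(set(antecedent) | set(consequent), transactions)
    ante = support(antecedent, transactions)
    cons = support(consequent, transactions)
    confidence = both / ante
    return {
        "support": both,
        "confidence": confidence,
        "lift": confidence / cons,
    }


metrics = rule_metrics({"bread"}, {"milk"}, transactions)
print(metrics)
```

A lift below 1, as in this toy example, suggests the two items appear together less often than they would if they were independent.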

Deliverables

The final deliverables for this project include:

- A comprehensive report summarizing the methods, results, and recommendations.
- Visualizations and insights derived from the analysis.
- Supporting files such as datasets and model outputs.
