
This is an end-to-end machine learning project that develops a classification model to label a given customer profile as either a safe or a not-safe credit risk. The final model is a CatBoost classifier, and the application is deployed on AWS.


Project Name

Predict Bank Credit Risk using South German Credit Data

Motivation

Most of a bank's revenue comes from providing credit loans, so a lending bank must be able to reduce the risk of non-performing loans. This risk can be minimized by studying patterns in existing lending data. One technique for doing this is data mining: classification makes it possible to uncover hidden information in large data sets.

The goal of this project is to build a model that predicts whether a person, described by the attributes of the dataset, is a safe (1) or not-safe (0) credit risk.

Dataset Link

https://archive.ics.uci.edu/dataset/522/south+german+credit

Overview

This project predicts credit risk using a machine learning model. It includes data ingestion, transformation, model training, and prediction. The project is implemented in Python and Flask, with a Cassandra database for data storage.

Table of Contents

  1. Project Structure
  2. Installation
  3. Usage
  4. Workflow
  5. Customization
  6. Contributing
  7. License

Project Structure

├── .github/workflows
│   ├── main.yaml                  # Deployment workflow
├── artifacts                      # Saved pipeline outputs
├── Documents                      # Documentation
├── src/mlproject
│   ├── components                 # Core components
│   │   ├── data_ingestion.py
│   │   ├── data_transformation.py
│   │   ├── model_training.py
│   ├── pipelines/                 # Orchestration of the components
│   │   ├── prediction_pipeline.py
│   │   ├── training_pipeline.py
│   ├── exception.py               # Custom exception handling
│   ├── logger.py                  # Logging configuration
│   ├── utils.py                   # Shared helper functions
├── templates/                     # HTML templates for Flask app
│   ├── history.html
│   ├── index.html
│   ├── train.html
│   ├── test.html
├── app.py                         # Main Flask application
├── Dockerfile                     # Docker configuration
├── LICENSE                        # License
├── README.md                      # Project documentation
├── requirements.txt               # Required Python libraries
├── setup.py                       # Packaging configuration
├── template.py                    # Script that scaffolds the project files/folders

Installation

  1. Clone the repository:

git clone https://github.com/kunalshelke90/Predict-Bank-Credit-Risk-using-South-German-Credit-Data.git

cd Predict-Bank-Credit-Risk-using-South-German-Credit-Data
  2. Create a virtual environment and install dependencies:

conda create -p myenv python=3.9 -y

conda activate myenv

pip install -r requirements.txt
  3. Set up environment variables:

Create a .env file in the root directory and add the necessary environment variables:


CASSANDRA_USER="clientid"
CASSANDRA_PASSWORD="secret"
CASSANDRA_KEYSPACE="keyspace_name"
CASSANDRA_SECURE_BUNDLE="your_data_base_name.zip"

DAGSHUB_REPO_OWNER="owner_name"
DAGSHUB_REPO_NAME="repo_name"
DAGSHUB_MLFLOW="True"
MLFLOW_REGISTRY_URI="https://dagshub.com/repo_owner/repo_name.mlflow"

AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_REGION=
AWS_ECR_LOGIN_URI=
ECR_REPOSITORY_NAME=
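
If you want to see how these values could be consumed, here is a minimal sketch (assuming python-dotenv and cassandra-driver are installed; the actual connection helper in src/mlproject/utils.py may differ):

import os
from dotenv import load_dotenv
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

load_dotenv()  # reads the .env file from the project root

# Hypothetical helper: open a session against the keyspace using the
# secure connect bundle referenced in .env.
auth = PlainTextAuthProvider(
    username=os.getenv("CASSANDRA_USER"),
    password=os.getenv("CASSANDRA_PASSWORD"),
)
cluster = Cluster(
    cloud={"secure_connect_bundle": os.getenv("CASSANDRA_SECURE_BUNDLE")},
    auth_provider=auth,
)
session = cluster.connect(os.getenv("CASSANDRA_KEYSPACE"))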

Usage

Running the Flask Application

  1. Start the Flask application:

python app.py
  2. Access the application:

Open your web browser and go to http://localhost:8080 (or http://127.0.0.1:8080) to interact with the application.

Using Docker

  1. Build the Docker image:

docker build -t predict-bank .
  2. Run the Docker container:

docker run -p 5000:5000 predict-bank
  3. Access the application:

Open your web browser and go to http://localhost:5000

Workflow

  1. Data Ingestion:

Data is loaded from the Cassandra database and stored in the artifacts/ folder. During this step the column names are renamed to English and the target column is moved to the last position of the table. The data is then split into train.csv and test.csv, and the entire raw data set is saved as raw.csv.
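
A condensed sketch of what this step does (the table name, column mapping, and split ratio below are assumptions for illustration; the real logic lives in data_ingestion.py and utils.py):

import os
import pandas as pd
from sklearn.model_selection import train_test_split

def ingest(session, table="credit_data"):
    # Pull all rows from Cassandra into a DataFrame.
    rows = session.execute(f"SELECT * FROM {table}")
    df = pd.DataFrame(list(rows))

    # Rename German column names to English and move the target to the end.
    df = df.rename(columns={"laufkont": "status", "kredit": "credit_risk"})
    df["credit_risk"] = df.pop("credit_risk")

    # Save the raw data plus an 80/20 train/test split.
    os.makedirs("artifacts", exist_ok=True)
    df.to_csv("artifacts/raw.csv", index=False)
    train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)
    train_df.to_csv("artifacts/train.csv", index=False)
    test_df.to_csv("artifacts/test.csv", index=False)
    return train_df, test_df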

  2. Data Transformation:

The ingested data is transformed with a RobustScaler (to keep every feature in the same range and handle outliers) and a SimpleImputer (to deal with missing values); the train and test data are converted to arrays, and the fitted preprocessor is saved as artifacts/preprocessor.pkl.
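
A minimal sketch of that preprocessing object (the target column name is an assumption; the real pipeline in data_transformation.py may include more steps):

import pickle
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import RobustScaler

train_df = pd.read_csv("artifacts/train.csv")
X_train = train_df.drop(columns=["credit_risk"])  # "credit_risk" assumed as target name

preprocessor = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="most_frequent")),  # fill missing values
    ("scaler", RobustScaler()),                            # scale features, damp outliers
])

X_train_arr = preprocessor.fit_transform(X_train)  # train data as an array

with open("artifacts/preprocessor.pkl", "wb") as f:
    pickle.dump(preprocessor, f)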

  3. Model Training:

The transformed data is used to train machine learning models. The best-performing model is selected based on accuracy and other metrics and saved as model.pkl under artifacts/model.
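
A sketch of how the training step could look (the hyperparameters here are illustrative, not the project's tuned values):

import os
import pickle
import pandas as pd
from catboost import CatBoostClassifier
from sklearn.metrics import accuracy_score

train = pd.read_csv("artifacts/train.csv")
test = pd.read_csv("artifacts/test.csv")
X_train, y_train = train.drop(columns=["credit_risk"]), train["credit_risk"]
X_test, y_test = test.drop(columns=["credit_risk"]), test["credit_risk"]

with open("artifacts/preprocessor.pkl", "rb") as f:
    preprocessor = pickle.load(f)

model = CatBoostClassifier(iterations=500, depth=6, learning_rate=0.1, verbose=0)
model.fit(preprocessor.transform(X_train), y_train)
print("test accuracy:", accuracy_score(y_test, model.predict(preprocessor.transform(X_test))))

os.makedirs("artifacts/model", exist_ok=True)
with open("artifacts/model/model.pkl", "wb") as f:
    pickle.dump(model, f)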

  4. Prediction:

The trained model is used to make predictions on new data inputs provided through the test.html page.
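
A sketch of the scoring path behind that page (paths mirror the artifacts written during training; the real prediction_pipeline.py may differ):

import pickle
import pandas as pd

def predict(input_df: pd.DataFrame) -> list:
    """Score new rows with the saved preprocessor and CatBoost model."""
    with open("artifacts/preprocessor.pkl", "rb") as f:
        preprocessor = pickle.load(f)
    with open("artifacts/model/model.pkl", "rb") as f:
        model = pickle.load(f)
    return model.predict(preprocessor.transform(input_df)).tolist()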

Customization

  1. Data Ingestion:

Customize data_ingestion.py in the src/mlproject/components folder to suit your data source and schema. You can modify the connection settings for your Cassandra database and adjust the data-loading logic in src/mlproject/utils.py.

  2. Data Transformation:

Modify data_transformation.py in the src/mlproject/components folder to apply different scaling methods, feature engineering techniques, or transformations according to your dataset's needs.
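
For example, swapping the RobustScaler for a StandardScaler is a small change to the pipeline (names below are illustrative):

from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

preprocessor = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),  # replaces RobustScaler
])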

  3. Model Training:

Customize model_training.py in the src/mlproject/components folder to experiment with different models, hyperparameters, and evaluation metrics. You can also integrate other ML libraries like TensorFlow or PyTorch.
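
One possible pattern is to score several candidate models and keep the best one (a sketch under the same assumed file and column names, not the project's exact loop):

import pandas as pd
from catboost import CatBoostClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

train = pd.read_csv("artifacts/train.csv")
test = pd.read_csv("artifacts/test.csv")
X_train, y_train = train.drop(columns=["credit_risk"]), train["credit_risk"]
X_test, y_test = test.drop(columns=["credit_risk"]), test["credit_risk"]

candidates = {
    "catboost": CatBoostClassifier(verbose=0),
    "random_forest": RandomForestClassifier(n_estimators=300),
    "logistic_regression": LogisticRegression(max_iter=1000),
}

scores = {}
for name, clf in candidates.items():
    clf.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, clf.predict(X_test))

best = max(scores, key=scores.get)
print("best model:", best, scores[best])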

  4. Web Interface:

Modify the HTML templates in the templates/ folder to match your preferred UI design. You can add or remove input fields, change styles, and customize the prediction output format.

Contributing

Contributions are welcome! Please fork this repository and submit a pull request for any improvements or bug fixes.

License

This project is licensed under the MIT License. See the LICENSE file for more details.
