This Jupyter Notebook demonstrates the classification of handwritten digits using a Logistic Regression model. It uses the digits dataset bundled with scikit-learn, which contains 1,797 grayscale images (8x8 pixels each) of the handwritten digits 0 through 9.
The notebook follows these main steps:
- Loading the Data: The digits dataset is loaded from scikit-learn.
- Displaying the Images and Labels: A few sample images from the dataset are displayed alongside their corresponding labels.
- Splitting Data into Training and Test Sets: The dataset is divided into training and test sets.
- Training the Model: A Logistic Regression model is trained on the training data.
- Testing the Model: The trained model predicts labels for the test data.
- Measuring Model Performance: The model's accuracy is calculated.
- Confusion Matrix: A confusion matrix is generated to visualize the model's performance.
To run this notebook, you need Python and Jupyter Notebook installed on your machine, along with the required Python packages. You can install them using pip:

    pip install numpy matplotlib seaborn scikit-learn
- Load the Data: The dataset is loaded using the load_digits() function from scikit-learn.

      from sklearn.datasets import load_digits

      digits = load_digits()
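To get a feel for what load_digits() returns, a minimal sketch along these lines prints the array shapes. It is only an illustration of the returned object; the shape comments reflect the dataset bundled with scikit-learn.

```python
from sklearn.datasets import load_digits

digits = load_digits()

# data: one flattened 64-value row per image
print(digits.data.shape)    # (1797, 64)
# images: the same pixels kept as 8x8 arrays
print(digits.images.shape)  # (1797, 8, 8)
# target: one integer label (0-9) per image
print(digits.target.shape)  # (1797,)
print(sorted(set(digits.target.tolist())))
```

The notebook works with digits.data (the flattened form), which is why the images are reshaped to 8x8 before being displayed.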
- Display the Images: Display the first five images and their corresponding labels using Matplotlib.

      import numpy as np
      import matplotlib.pyplot as plt

      plt.figure(figsize=(18, 3))
      for index, (image, label) in enumerate(zip(digits.data[0:5], digits.target[0:5])):
          plt.subplot(1, 5, index + 1)
          # Each sample is a flat 64-value vector; reshape it to 8x8 for display
          plt.imshow(np.reshape(image, (8, 8)), cmap=plt.cm.gray)
          plt.title('%i\n' % label, fontsize=20)
- Split the Data: Split the data into training and test sets.

      from sklearn.model_selection import train_test_split

      x_train, x_test, y_train, y_test = train_test_split(
          digits.data, digits.target, test_size=0.2, random_state=0
      )
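As a quick sanity check on the split, the sketch below prints the resulting shapes. It assumes the same test_size=0.2 and random_state=0 as the call above; with 1,797 samples, that leaves 360 rows in the test set.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=0
)

# test_size=0.2 holds out about 20% of the 1,797 samples
print(x_train.shape, x_test.shape)  # (1437, 64) (360, 64)

# Features and labels stay aligned row for row
print(len(x_train) == len(y_train), len(x_test) == len(y_test))
```

Fixing random_state makes the split reproducible, so repeated runs evaluate the model on the same 360 images.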
- Train the Model: Train a Logistic Regression model.

      from sklearn.linear_model import LogisticRegression

      logRegr = LogisticRegression(solver='saga', max_iter=2000)
      logRegr.fit(x_train, y_train)
- Test the Model: Use the trained model to make predictions on the test set.

      predictions = logRegr.predict(x_test)
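To see what predict() produces, a self-contained sketch such as the following repeats the earlier steps and compares the predictions with the true labels. The name manual_accuracy is introduced here for illustration only; it is not part of the notebook.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

digits = load_digits()
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=0
)
logRegr = LogisticRegression(solver='saga', max_iter=2000)
logRegr.fit(x_train, y_train)

predictions = logRegr.predict(x_test)

# One predicted label per test row
print(predictions.shape)

# First ten predictions next to the first ten true labels
print(predictions[:10])
print(y_test[:10])

# Fraction of matching labels (illustrative name, not from the notebook)
manual_accuracy = np.mean(predictions == y_test)
print(manual_accuracy)
```

The fraction of matching labels computed by hand here is the same quantity that logRegr.score(x_test, y_test) returns for a classifier.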
- Evaluate the Model: Calculate the model's accuracy and display the confusion matrix.

      from sklearn import metrics
      import seaborn as sns

      score = logRegr.score(x_test, y_test)
      cm = metrics.confusion_matrix(y_test, predictions)

      plt.figure(figsize=(9, 9))
      # Cells hold integer counts, so annotate them as integers
      sns.heatmap(cm, annot=True, fmt="d", linewidths=.5, square=True, cmap='Pastel1')
      plt.ylabel('Actual Value')
      plt.xlabel('Predicted Value')
      plt.title('Accuracy Score: {0}'.format(score), size=15)
      plt.show()
- Accuracy: The Logistic Regression model achieved an accuracy of approximately 96.39% on the test set.
- Confusion Matrix: The confusion matrix provides a detailed breakdown of the model's performance across different digit classes.
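To illustrate how a confusion matrix breaks performance down by class, here is a minimal sketch using a small made-up 3-class matrix. The numbers are hypothetical and are not the notebook's results; the same arithmetic would apply to the 10x10 matrix returned by metrics.confusion_matrix, where rows are actual classes and columns are predicted classes.

```python
import numpy as np

# Hypothetical 3-class confusion matrix (rows: actual, columns: predicted)
cm = np.array([
    [5, 0, 1],
    [0, 4, 0],
    [1, 1, 3],
])

# Accuracy: correct predictions (the diagonal) over all predictions
accuracy = np.trace(cm) / cm.sum()

# Recall per class: diagonal over row totals (all samples that truly belong to the class)
recall = np.diag(cm) / cm.sum(axis=1)

# Precision per class: diagonal over column totals (all samples predicted as the class)
precision = np.diag(cm) / cm.sum(axis=0)

print(accuracy)   # 0.8
print(recall)
print(precision)
```

Off-diagonal cells show which digits get mistaken for which, something a single accuracy figure cannot reveal.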
- The scikit-learn library for providing the digits dataset and machine learning tools.
- The Matplotlib and Seaborn libraries for visualization support.