A fun little project in which I use Convolutional Neural Networks for a classification task: classifying images as either steak or pizza.

AaditS99/CNN-for-image-classification

Introduction:

Convolutional Neural Networks (CNNs) have revolutionized the field of computer vision and are widely used for image classification tasks. In this blog post, we will explore how to use CNNs to classify images as either pizza or steak. We will cover the fundamental principles of CNNs, data preparation, model architecture, training, evaluation, and even explore the use of transfer learning for improved performance. CNNs are a type of deep neural network specifically designed to analyze visual data. They are effective for image classification tasks due to their ability to automatically learn hierarchical features from raw pixel data. CNNs consist of multiple layers, including convolutional layers, activation functions (such as ReLU), pooling layers, and fully connected layers. These layers work together to extract relevant features from images and make predictions.
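To make the core operations concrete, here is a minimal NumPy sketch of a single-channel convolution, a ReLU, and 2x2 max pooling applied to a tiny 6x6 image that contains a vertical bar. The image, the edge-detecting kernel, and the helper names are illustrative assumptions. Keras layers do the same arithmetic, but with learned kernels across many channels and filters.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over a single-channel `image` (stride 1, no padding).
    Like Keras Conv2D, this is a cross-correlation: the kernel is not flipped."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Map negative values to zero, keep positive values unchanged."""
    return np.maximum(x, 0)

def max_pool_2x2(x):
    """2x2 max pooling with stride 2; an odd trailing row/column is dropped."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    x = x[:h, :w]
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# A 6x6 image with a bright vertical bar in columns 2 and 3.
image = np.zeros((6, 6))
image[:, 2:4] = 1.0

# A vertical-edge detector: responds positively to dark-to-bright transitions.
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

feature_map = conv2d_valid(image, kernel)  # 4x4; each row is [3, 3, -3, -3]
activated = relu(feature_map)              # negatives (bright-to-dark edge) become 0
pooled = max_pool_2x2(activated)           # 2x2; pooled == [[3, 0], [3, 0]]
print(pooled)
```

The convolution finds both edges of the bar, ReLU keeps only the dark-to-bright one, and pooling halves the spatial size while keeping the strongest response in each 2x2 window.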

Data Preparation:

We imported the data from Google Drive; it consists of images of either pizza or steak. Before training our CNN model, we need to prepare and preprocess the image data. This involves resizing the images to a uniform size, normalizing pixel values to a range between 0 and 1, and keeping separate training and test sets so that the model's performance is assessed on images it has not seen during training. We use TensorFlow and Keras to streamline this process.

To prepare the images, we use the Keras ImageDataGenerator utility, which can perform real-time data augmentation and preprocessing on an image dataset. Here we use it only to rescale the values: setting the rescale parameter to 1/255 maps every pixel value from the range 0 to 255 onto the range 0 to 1. Since all pixels already share the same 0 to 255 scale, the main benefit is keeping the input magnitudes small, which tends to make gradient-based training more stable and lets it converge faster. More generally, ImageDataGenerator simplifies the preprocessing pipeline by automating common tasks such as rescaling and augmentation, so we can focus on building and training the CNN.

We also specify the directory paths for the training data and the test data. The flow_from_directory method loads the images and infers their labels from the directory structure (one subdirectory per class). In the last code chunk we create two generators, "train_data" and "test_data", with batch_size=32 (the images are generated in batches of 32), resizing every image to 224x224 pixels (the target_size argument), and with class_mode set to "binary" since this is a binary classification problem.
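The sketch below shows, in plain NumPy, the arithmetic behind two of these settings: rescale=1/255 multiplies every pixel by 1/255, and flow_from_directory numbers the classes in alphanumeric order of the subdirectory names. The random image and the folder names "pizza" and "steak" are assumptions for illustration; the batch count uses the 1,500 training images mentioned at the end of this post.

```python
import math
import numpy as np

# Hypothetical stand-in for one decoded RGB image: uint8 values in [0, 255].
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)

# rescale=1/255: every pixel value is multiplied by 1/255, so [0, 255] maps onto [0, 1].
rescaled = image.astype(np.float32) * (1.0 / 255)

# flow_from_directory infers one class per subdirectory and assigns label
# indices in alphanumeric order of the folder names (names assumed here).
subdirs = ["steak", "pizza"]
class_indices = {name: idx for idx, name in enumerate(sorted(subdirs))}

# With batch_size=32, 1,500 training images give 47 batches per epoch;
# the last batch holds the remaining images.
steps_per_epoch = math.ceil(1500 / 32)
last_batch = 1500 - (steps_per_epoch - 1) * 32

print(class_indices, steps_per_epoch, last_batch)
```

Under that assumed folder naming, pizza is label 0 and steak is label 1, so a sigmoid output close to 1 would mean "steak".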

Model Architecture:

The architecture of our CNN model consists of several convolutional layers, each followed by a ReLU activation to introduce non-linearity, pooling layers to reduce the spatial dimensions, and fully connected layers for classification. The convolutional and pooling layers extract features, and the fully connected layers use those features to classify the image, which lets the model learn complex patterns in the data. I have included an illustration of a CNN below to give readers a better idea of how a CNN works.

The CNN architecture that I constructed classifies images into two categories: pizza and steak. It begins with an input layer specifying the shape of the input data: images of 224x224 pixels with 3 color channels (RGB). This is followed by three convolutional layers, which apply convolution operations to extract features from the input images. They use 40, 20, and 10 filters respectively, each with a 3x3 kernel. Each convolutional layer is followed by a Rectified Linear Unit (ReLU) activation, which introduces non-linearity by mapping negative values to zero, and then by a max pooling layer, which reduces the spatial dimensions of the feature maps while preserving the strongest responses. A dropout layer with a rate of 0.1 follows each max pooling layer: during training, dropout randomly sets 10% of its input units to zero, which reduces reliance on any individual neuron and helps prevent overfitting.

After the last dropout layer, a flatten layer converts the 3D feature maps into a 1D vector, preparing the data for the fully connected layers. The architecture ends with two fully connected layers: a hidden layer with 60 neurons that combines the extracted features, and an output layer with a single neuron and a sigmoid activation for binary classification. The sigmoid output is the predicted probability that the image belongs to the positive class, i.e. whichever class flow_from_directory assigned label 1 (class indices follow the alphanumeric order of the class subdirectory names). Overall, this architecture combines convolutional layers for feature extraction, pooling layers for dimension reduction, dropout layers for regularization, and fully connected layers for classification, with the aim of learning features that discriminate between the two classes.
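As a worked check of the architecture described above, this plain-Python sketch traces the feature-map shapes and counts the trainable parameters. It assumes the Keras defaults for settings the text does not state: stride-1 convolutions with padding="valid", and max pooling with a 2x2 pool and stride 2. Dropout and Flatten layers add no parameters.

```python
def conv_out(size, kernel=3):
    """Output size of a stride-1 'valid' convolution."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Output size of max pooling with pool size and stride `pool`."""
    return size // pool

h = w = 224      # input height and width
channels = 3     # RGB
total_params = 0

for filters in (40, 20, 10):
    params = 3 * 3 * channels * filters + filters  # 3x3 weights per input channel + one bias per filter
    total_params += params
    h, w = pool_out(conv_out(h)), pool_out(conv_out(w))
    channels = filters
    print(f"Conv2D({filters}) + MaxPool -> {h}x{w}x{channels}, {params} params")

flat = h * w * channels            # length of the flattened vector
dense_params = flat * 60 + 60      # Dense(60): weights + biases
out_params = 60 * 1 + 1            # Dense(1, sigmoid): weights + bias
total_params += dense_params + out_params

print(f"Flatten -> {flat}, Dense(60) {dense_params} params, output {out_params} params")
print(f"Total trainable parameters: {total_params}")
```

Under these assumptions the model has 415,871 trainable parameters, more than 97% of them in the Dense(60) layer, because the flattened 26x26x10 feature map feeds 6,760 values into it.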

Training the Model:

Training a CNN involves selecting an appropriate loss function, optimizer, and other hyperparameters for the classification task, and callbacks such as early stopping can make training more efficient. The model is trained using the Adam optimizer and the binary cross-entropy loss function, which are common choices for binary classification tasks like this one. The Adam optimizer combines per-parameter adaptive learning rates with momentum, making it well suited to training deep neural networks efficiently. Binary cross-entropy measures the difference between the predicted probabilities and the true 0/1 labels, and it penalizes confident wrong predictions heavily.

Additionally, an early stopping callback monitors the training process. It is configured to monitor the training loss (monitor="loss") and halts training if the loss does not improve for 10 consecutive epochs (patience=10). This saves computational resources by stopping training once further optimization is unlikely to help. Because it monitors the training loss rather than the validation loss, however, it stops training when the fit to the training data plateaus; it does not detect overfitting directly. Monitoring "val_loss" instead would tie the stopping decision to held-out data, which is the usual way to use early stopping as a guard against overfitting.
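The plain-Python sketch below spells out the two ingredients discussed above: the binary cross-entropy formula, and the patience rule of an early stopping callback configured like EarlyStopping(monitor="loss", patience=10). It is a simplified illustration of the documented behaviour, not the Keras source; the function names and the example loss history are hypothetical.

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean of -[y*log(p) + (1 - y)*log(1 - p)] over all examples."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip so log() never sees 0
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def early_stopping_epoch(losses, patience=10, min_delta=0.0):
    """Return the 0-based epoch after which training would stop.

    A loss counts as an improvement only if it beats the best value so far
    by more than min_delta; after `patience` epochs in a row without
    improvement, training stops.
    """
    best = math.inf
    wait = 0
    for epoch, loss in enumerate(losses):
        if loss < best - min_delta:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(losses) - 1  # patience never ran out

# An uninformative prediction of 0.5 costs log(2) per example.
print(binary_cross_entropy([1, 0], [0.5, 0.5]))

# Loss improves for 3 epochs, then stalls at 0.75: training stops after epoch 12.
history = [1.0, 0.8, 0.7] + [0.75] * 15
print(early_stopping_epoch(history, patience=10))
```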

Evaluating the Model:

Once our model is trained, we need to evaluate its performance on unseen data. Metrics such as accuracy, precision, recall, and F1-score assess the model's effectiveness, and a confusion matrix shows how well the model is performing on each class and where it needs improvement. In this case, we are using accuracy as our metric of choice. The accuracy score that we got was 0.5 (50%), and, as you can see in the confusion matrix below, the model predicts every observation as pizza; since the test set contains equal numbers of pizza and steak images, this gives exactly 50% accuracy. During training, by contrast, the model reported a validation accuracy of 0.8 (80%).
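To see why predicting a single class gives exactly 50% here, the following NumPy sketch builds the confusion matrix for a balanced test set of 500 images (250 per class) in which every image is predicted as pizza. The label encoding (pizza = 0, steak = 1) assumes flow_from_directory's alphanumeric class ordering with folders named pizza and steak.

```python
import numpy as np

y_true = np.array([0] * 250 + [1] * 250)  # 0 = pizza, 1 = steak (assumed encoding)
y_pred = np.zeros_like(y_true)             # a model that answers "pizza" every time

def confusion_matrix_2x2(y_true, y_pred):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((2, 2), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

cm = confusion_matrix_2x2(y_true, y_pred)
accuracy = np.trace(cm) / cm.sum()

# Treat steak (label 1) as the positive class.
tp, fn, fp = cm[1, 1], cm[1, 0], cm[0, 1]
recall = tp / (tp + fn)
# Precision is undefined when nothing is predicted positive; report 0 here.
precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0

print(cm)
print(f"accuracy={accuracy:.2f} recall={recall:.2f} precision={precision:.2f}")
```

Accuracy alone makes this degenerate model look half right; a steak recall of 0 exposes that it never predicts steak at all.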

This suggests that the model is not learning meaningful patterns from the data, and the gap between the 80% validation accuracy and the all-pizza test predictions is itself worth investigating. Possible reasons include a model that is not complex enough, or extracted features that do not distinguish well between the classes. Class imbalance is another common cause, but it does not apply here: both the training data and the test data are split equally between the two classes.

Transfer Learning:

In addition to training a CNN from scratch, we also explore transfer learning. Transfer learning reuses models (such as VGG16 or ResNet) that have been pre-trained on large datasets like ImageNet; the ImageNet subset these models are commonly trained on has 1,000 classes. Fine-tuning such a model on our pizza vs. steak dataset gives us a head start compared with training from scratch, and we can compare its performance to our custom CNN. Here we use the pre-trained MobileNet model. MobileNet is a lightweight convolutional neural network architecture developed by Google for mobile and embedded devices with limited computational resources; it reduces model size and computational cost while maintaining good performance. It achieves this with depthwise separable convolutions, which split a standard convolution into a per-channel spatial (depthwise) convolution followed by a 1x1 (pointwise) convolution that mixes the channels, significantly reducing the number of parameters and computations. We set include_top=False so that MobileNet's original ImageNet classification layers are dropped and we can add our own output layer suited to our problem.
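The parameter savings from depthwise separable convolutions can be checked with a little arithmetic. For a k x k kernel mapping c_in input channels to c_out output channels (biases ignored), a standard convolution needs k*k*c_in*c_out weights, while the depthwise plus pointwise pair needs k*k*c_in + c_in*c_out. The channel count of 256 below is an illustrative choice, not a specific MobileNet layer.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights of a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    """Weights of a depthwise separable convolution (biases ignored)."""
    depthwise = k * k * c_in   # one k x k spatial filter per input channel
    pointwise = c_in * c_out   # 1x1 convolution that mixes channels
    return depthwise + pointwise

k, c_in, c_out = 3, 256, 256   # illustrative layer sizes
standard = standard_conv_params(k, c_in, c_out)
separable = separable_conv_params(k, c_in, c_out)
ratio = separable / standard

print(standard, separable, round(ratio, 3))
```

Here the standard layer needs 589,824 weights and the separable version 67,840. In general the ratio is 1/c_out + 1/k^2, so with 3x3 kernels the separable version always needs a little under 9 times fewer weights.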

We add an output layer with a "sigmoid" activation since we have a binary classification problem. This model reaches a validation accuracy of 0.9840 (98.4%), but the accuracy on the test predictions is still 50%: the model again classifies every image as pizza.
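The post does not show how the predicted probabilities were converted into class labels, so the following is only one possible cause worth checking. A model ending in a single sigmoid unit outputs an array of shape (n, 1); applying np.argmax along axis 1 to it always returns 0, which under alphanumeric folder ordering would mean "pizza" for every image regardless of what the model learned. Thresholding the probability at 0.5 gives the intended labels. Separately, flow_from_directory shuffles by default, so a test generator used for prediction should be created with shuffle=False for its predictions to line up with its .classes labels. The probabilities below are made up for illustration.

```python
import numpy as np

# Made-up sigmoid outputs for four images, shape (n, 1) like a one-unit sigmoid layer.
probs = np.array([[0.91], [0.12], [0.67], [0.30]])

# Pitfall: argmax over an axis of length 1 always returns index 0.
labels_argmax = np.argmax(probs, axis=1)

# Intended: threshold the single probability at 0.5 (1 = the class mapped to index 1).
labels_threshold = (probs[:, 0] > 0.5).astype(int)

print(labels_argmax)     # [0 0 0 0]
print(labels_threshold)  # [1 0 1 0]
```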

Keep in mind that the models had only 1,500 images to train on and 500 images to test on, which may not be enough to get good results.

Conclusion:

In this blog post, we have covered the whole process of using CNNs to classify images as pizza or steak: the fundamentals of CNNs, data preparation, model architecture, training, evaluation, and transfer learning. By following these steps and experimenting with different architectures and techniques, readers can apply CNNs to their own image classification projects.
