CNN - EMNIST Balanced Dataset (using plaidML and Metal)

oceallaigh-p/cnn_emnist_plaidML

Convolutional Neural Network - EMNIST Balanced Dataset (using plaidML and Metal)


Description

This is a simple convolutional neural network (CNN) trained on the EMNIST Balanced dataset. Its purpose is to test the performance of an environment built with plaidML and Metal. PlaidML is a software framework that lets Keras run its computations on a GPU through OpenCL (or Metal on macOS) instead of CUDA.[1]
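As a rough sketch of how Keras is usually pointed at plaidML: the backend is selected through the `KERAS_BACKEND` environment variable, which has to be set before `keras` is imported. The snippet assumes the `plaidml-keras` package is installed and that a GPU device has already been chosen with `plaidml-setup`; the Keras imports are left commented out so the snippet runs without either.

```python
import os

# Point Keras at the plaidML backend. This has to happen before
# `keras` is imported anywhere in the process.
os.environ["KERAS_BACKEND"] = "plaidml.keras.backend"

# With plaidml-keras installed and a device selected via `plaidml-setup`,
# the usual Keras imports would follow, for example:
# import keras
# from keras.models import Sequential
```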

Early stopping halts training once the model's performance on the validation dataset stops improving. This guards against both overfitting (training for too many epochs) and underfitting (training for too few).
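The stopping rule can be illustrated in plain Python. This is a minimal sketch of patience-based early stopping, not the repository's training code: the function name, the `patience` and `min_delta` values, and the loss values are all made up for illustration. In Keras the same behaviour is normally provided by the `keras.callbacks.EarlyStopping` callback.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the 1-based epoch at which training would stop.

    Training stops once the validation loss has failed to improve by more
    than `min_delta` for `patience` consecutive epochs. If that never
    happens, the number of the last epoch is returned.
    """
    best = float("inf")
    stale = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return len(val_losses)


# Hypothetical validation losses, one per epoch.
losses = [0.90, 0.70, 0.60, 0.61, 0.62, 0.63, 0.50]
stop_epoch = early_stopping_epoch(losses, patience=3)
print(stop_epoch)
```

With these made-up losses, training stops before the final (better) epoch is ever reached, which is the usual trade-off when choosing a patience value.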

CNN Architecture

  1. Convolutional (Conv2D)
  2. Pooling (MaxPooling)
  3. Convolutional (Conv2D)
  4. Pooling (MaxPooling)
  5. Flattening
  6. Dense (ReLU)
  7. Dropout
  8. Dense (SoftMax)
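To get a feel for how a 28 x 28 image flows through this stack, the sketch below traces output shapes and counts trainable parameters layer by layer. The filter counts, kernel size, pool size and dense width are assumptions chosen for illustration; the README does not state them. Only the 28 x 28 single-channel input and the 47 output classes come from the dataset description.

```python
def conv2d_out(size, kernel, stride=1):
    """Output side length of an unpadded ('valid') convolution."""
    return (size - kernel) // stride + 1


def pool_out(size, pool):
    """Output side length of non-overlapping max pooling."""
    return size // pool


# Assumed hyperparameters (hypothetical, not taken from the repository).
FILTERS = (32, 64)
KERNEL = 3
POOL = 2
DENSE_UNITS = 128
NUM_CLASSES = 47  # EMNIST Balanced

side, channels = 28, 1  # 28 x 28 grayscale input
params = 0
shapes = []

for filters in FILTERS:
    # Each filter has KERNEL*KERNEL*channels weights plus one bias.
    params += (KERNEL * KERNEL * channels + 1) * filters
    side, channels = conv2d_out(side, KERNEL), filters
    shapes.append(("Conv2D", (side, side, channels)))
    side = pool_out(side, POOL)
    shapes.append(("MaxPooling", (side, side, channels)))

flat = side * side * channels
shapes.append(("Flatten", (flat,)))

params += (flat + 1) * DENSE_UNITS
shapes.append(("Dense (ReLU)", (DENSE_UNITS,)))
shapes.append(("Dropout", (DENSE_UNITS,)))  # no trainable parameters

params += (DENSE_UNITS + 1) * NUM_CLASSES
shapes.append(("Dense (SoftMax)", (NUM_CLASSES,)))

for name, shape in shapes:
    print(f"{name:16s} {shape}")
print("trainable parameters:", params)
```

Under these assumptions most of the parameters sit in the first dense layer, which is typical for small CNNs of this shape.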

Hardware

  • iMac Pro
    • 10-core Intel Xeon Processor
    • 128 GB RAM (2666 MHz DDR4)
    • Radeon Pro Vega 56 8 GB

Software Environment

  • PyCharm for Anaconda
    • Conda virtual environment
      • Python 3.7
      • plaidML
      • See environment.yml for package list

Dataset

The EMNIST dataset is a set of handwritten letters and digits derived from the NIST Special Database 19. The images were converted to a 28 x 28 pixel format and a dataset structure that directly matches the MNIST dataset. Further information on the dataset contents and the conversion process can be found in the paper available at https://arxiv.org/abs/1702.05373v1.

Format

The dataset provides six different splits, and each split is available in two formats:

  1. Binary (see emnistsourcefiles.zip)
  2. CSV (combined labels and images)
    • Each row is a separate image
    • 785 columns
    • First column = class_label (see mappings.txt for class label definitions)
    • Each column after represents one pixel value (784 total for a 28 x 28 image)
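The CSV layout described above can be read with the standard library alone. The following is a minimal sketch that splits one row into its class label and a 28 x 28 grid of pixel values. The `parse_row` helper and the synthetic row are hypothetical, and row-major pixel order is an assumption; the orientation of the stored images is not stated here.

```python
import csv
import io

IMAGE_SIDE = 28
PIXELS = IMAGE_SIDE * IMAGE_SIDE  # 784 pixel columns


def parse_row(row):
    """Split one CSV row into (class_label, 28 x 28 nested list of pixels)."""
    if len(row) != PIXELS + 1:
        raise ValueError(f"expected {PIXELS + 1} columns, got {len(row)}")
    label = int(row[0])
    pixels = [int(value) for value in row[1:]]
    # Assumes pixels are stored row by row (row-major order).
    image = [
        pixels[r * IMAGE_SIDE:(r + 1) * IMAGE_SIDE]
        for r in range(IMAGE_SIDE)
    ]
    return label, image


# Synthetic stand-in for one line of a split's CSV file.
fake_row = ",".join(["5"] + [str(i % 256) for i in range(PIXELS)])
reader = csv.reader(io.StringIO(fake_row))
label, image = parse_row(next(reader))
print(label, len(image), len(image[0]))
```

For a real file, the `io.StringIO` object would be replaced by an open file handle for the split's CSV.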

EMNIST Balanced Dataset

The EMNIST Balanced dataset is meant to address the class imbalance in the ByClass and ByMerge datasets. It is derived from the ByMerge dataset, which reduces misclassification errors caused by confusing uppercase and lowercase letters, and it has an equal number of samples per class. Its authors intend it to be the most broadly applicable of the splits.

  • train: 112,800
  • test: 18,800
  • total: 131,600
  • classes: 47 (balanced)
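A quick arithmetic check of the figures above shows what "balanced" means in practice: both the train and test counts divide evenly across the 47 classes. The chance-level accuracy is included only as a reference point for judging model results.

```python
TRAIN, TEST, CLASSES = 112_800, 18_800, 47

total = TRAIN + TEST
train_per_class, train_remainder = divmod(TRAIN, CLASSES)
test_per_class, test_remainder = divmod(TEST, CLASSES)

# Accuracy of a classifier that guesses uniformly at random.
chance_accuracy = 1 / CLASSES

print("total samples:", total)
print("train per class:", train_per_class, "remainder:", train_remainder)
print("test per class:", test_per_class, "remainder:", test_remainder)
print(f"chance accuracy: {chance_accuracy:.4f}")
```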

Footnotes

1 GPU-Accelerated Machine Learning on MacOS
