jayantee12/Jet-Image-classification-MScThesis

Classification and pre-processing of images of particle collisions using Machine Learning.

Jet Image Classification:

This project served as my MSc thesis at the Indian Institute of Technology, Delhi, under the supervision of Professor Abhishek M. Iyer, Department of Physics. The objective was to utilize ML techniques to analyze particle collisions at the Large Hadron Collider and detect anomalies.

Data:

The project works on the classification of simulated image data known as Jet images, which represent the collimated sprays of particles produced in particle collisions. The task was to employ several Machine Learning classification techniques to categorize the Jet images into two classes: Signal and Background events.

An example of a Jet image is given below:

[Figure: example Jet image]

Algorithm 1: Principal Component Analysis (PCA)

The first ML algorithm employed was PCA, a dimensionality reduction technique, following the Eigenfaces method of facial recognition. Each image is projected onto a small set of principal components, and images are then compared by their proximity in this reduced feature space. The accuracy achieved on the test set was 98.04%.
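As an illustration of this Eigenfaces-style pipeline, the sketch below projects flattened jet images onto principal components and classifies by nearest neighbour in that space. It is a minimal sketch, not the thesis code; the array names (X_train, y_train, X_test) and the component count are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

def pca_classify(X_train, y_train, X_test, n_components=50):
    # X_* are hypothetical arrays of flattened jet images; y_train holds
    # 0 (Background) / 1 (Signal) labels. n_components=50 is an assumption.
    pca = PCA(n_components=n_components)
    Z_train = pca.fit_transform(X_train)       # learn the "eigen-jet" basis and project
    Z_test = pca.transform(X_test)             # project test images into the same space
    clf = KNeighborsClassifier(n_neighbors=1)  # compare by proximity in reduced space
    clf.fit(Z_train, y_train)
    return clf.predict(Z_test)
```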

Algorithm 2: Convolutional Neural Networks (CNNs)

Moving on to CNNs, we applied deep learning to the image analysis. The CNN architecture we modeled is shown in the figure below:

[Figure: CNN architecture]
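The exact architecture is the one shown in the figure; purely as a hedged sketch (the layer counts and sizes here are assumptions, not the thesis model), a small Keras binary jet-image classifier could be built along these lines:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_jet_cnn(input_shape=(369, 369, 3)):
    # Illustrative Signal-vs-Background classifier; the actual thesis
    # architecture (see figure above) may differ in depth and filter sizes.
    model = keras.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(16, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(1, activation="sigmoid"),  # probability of the Signal class
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```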

We trained the algorithm on two different datasets:

  1. Dataset 1: This was a balanced dataset of 100 images of each class, resulting in a total of 200 images of dimensions (480, 640, 3). The accuracy achieved by the CNN model on the test set after training for 40 epochs was 77.4%. The ROC curve is shown below.

[Figure: ROC curve for Dataset 1]

  2. Dataset 2: This was also a balanced dataset, with 200 images of each class for a total of 400 images of dimensions (369, 369, 3). The accuracy achieved by the CNN model on the test set after training for 38 epochs was 92.5%. The ROC curve is shown below; a sketch of how these ROC curves can be computed follows the figure.

[Figure: ROC curve for Dataset 2]
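A minimal sketch of how such ROC curves can be produced from a trained model's test-set scores, assuming hypothetical arrays y_test (true labels) and y_score (predicted Signal probabilities):

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

def plot_roc(y_test, y_score):
    # y_test: true 0/1 labels; y_score: e.g. model.predict(X_test).ravel()
    fpr, tpr, _ = roc_curve(y_test, y_score)
    plt.plot(fpr, tpr, label=f"AUC = {auc(fpr, tpr):.3f}")
    plt.plot([0, 1], [0, 1], linestyle="--", color="grey")  # random-guess baseline
    plt.xlabel("False positive rate")
    plt.ylabel("True positive rate")
    plt.legend()
    plt.show()
```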

Image Preprocessing:

To further increase the classification accuracy, the project pivoted to better pre-processing of the images. A methodology for reconstructing images from the recorded data was used, formulated as an optimization problem and solved with the Gradient Descent algorithm.

This method was implemented using two Python libraries: Numpy and Autograd. The results were compared with another preprocessing algorithm, Wiener Filter Recovery. The mean Signal and Background images after applying these pre-processing methods are shown below:

[Figure: mean Signal and Background images after pre-processing]
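A minimal sketch of an Autograd-based gradient-descent reconstruction, assuming a simple least-squares objective ||A x - b||^2 with a hypothetical linear forward operator A and recorded data b; the actual objective and forward model used in the thesis may differ:

```python
import autograd.numpy as np
from autograd import grad

def reconstruct(A, b, n_iters=500, lr=1e-2):
    # A: assumed forward operator mapping the (flattened) true image to the
    # recorded data b. The objective and step size here are illustrative only.
    def loss(x):
        residual = np.dot(A, x) - b
        return np.sum(residual ** 2)

    loss_grad = grad(loss)         # Autograd supplies the gradient automatically
    x = np.zeros(A.shape[1])       # start from a blank (flattened) image
    for _ in range(n_iters):
        x = x - lr * loss_grad(x)  # plain gradient-descent update
    return x
```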

To quantify the separation of the classes after pre-processing, the Bhattacharyya distance was calculated between the mean Signal and mean Background images (a sketch of the calculation is given after this list). The results were:

  1. Wiener: 0.279
  2. Autograd: 0.332
  3. Numpy: 0.151
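The Bhattacharyya distance can be computed by treating each normalised mean image as a discrete probability distribution; a minimal sketch (array names assumed):

```python
import numpy as np

def bhattacharyya_distance(mean_signal, mean_background, eps=1e-12):
    # Normalise each mean image so its pixels sum to 1, i.e. treat it as a
    # discrete distribution over pixel positions.
    p = mean_signal.ravel() / mean_signal.sum()
    q = mean_background.ravel() / mean_background.sum()
    bc = np.sum(np.sqrt(p * q))   # Bhattacharyya coefficient, in [0, 1]
    return -np.log(bc + eps)      # larger distance means better class separation
```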

To further test the effectiveness of the pre-processing, the preprocessed images were run through the CNN to measure the change in accuracy. The unprocessed images gave an accuracy of 41.2%, while the processed images gave the following accuracies:

  1. Autograd: 76.2%
  2. Numpy: 62.5%

Therefore, it was concluded that the Autograd-based preprocessing was the most useful for classification.

Large Hadron Collider Dataset:

At the culmination of this project, the algorithms were applied to real Large Hadron Collider data available on the CERN website: the Photon data and Electron data. The two classes in this dataset correspond to Electrons and Photons, real-life counterparts of the Signal and Background events. The dataset contains 40,000 images of dimensions (32, 32, 2) each, with the two channels representing Energy and Time. An example image from this dataset is given below:

[Figure: Energy and Time channels of an example image]
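As an illustration only, the sketch below loads one event and displays its two channels. It assumes the downloaded files are HDF5 with the image array stored under a key named "X", which may not match the actual files on the CERN website.

```python
import h5py
import matplotlib.pyplot as plt

def show_event(path, index=0):
    # Assumed layout: dataset "X" of shape (N, 32, 32, 2); adjust the key
    # name to match the actual Photon/Electron files.
    with h5py.File(path, "r") as f:
        img = f["X"][index]

    _, axes = plt.subplots(1, 2, figsize=(8, 4))
    axes[0].imshow(img[:, :, 0])
    axes[0].set_title("Energy")
    axes[1].imshow(img[:, :, 1])
    axes[1].set_title("Time")
    plt.show()
```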

The Jupyter Notebook for this portion of the project shows three CNN models developed for this dataset. The model which gave the best accuracy had the following architecture:

[Figure: best-performing CNN architecture for the Electron/Photon dataset]
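The exact architecture is the one in the figure; as a rough placeholder, the earlier illustrative build_jet_cnn sketch could be reused with the two-channel input shape:

```python
# Hypothetical reuse of the earlier sketch; the notebook's actual models differ.
model_ep = build_jet_cnn(input_shape=(32, 32, 2))
model_ep.summary()
```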

The accuracy achieved was 67.3%.

References:

  1. De Oliveira, L., Kagan, M., Mackey, L., Nachman, B., and Schwartzman, A., "Jet-Images – Deep Learning Edition" (2015)
  2. Turk, M. and Pentland, A., "Eigenfaces for Face Recognition" (1991)
  3. Large Hadron Collider, CERN
  4. Raychaudhuri, S., "Collider Physics Lecture I", TIFR
  5. Khare, K., "Fourier Optics and Computational Imaging"
  6. Griffiths, D., "Introduction to Elementary Particles", 3rd Edition
