Written using Python 2.7. Packages are listed in requirements.txt.
Download the images using this link into the ./images folder.
Experiment ID (exp_id) refers to one of two experiments:
exp_id=4 --> 312 classes, 1950 images (at least 4 images per class)
exp_id=8 --> 64 classes, 512 images (exactly 8 images per class)
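The two experiment sets differ only in how classes are selected by image count. A minimal sketch of such a selection is shown here; the function name select_classes and the rule for exp_id=8 (truncating larger classes to 8 images rather than keeping only classes with exactly 8) are assumptions for illustration, not the code used in this repository.

```python
from collections import defaultdict


def select_classes(labels, exp_id):
    """Group image indices by class and keep classes with enough images.

    Hypothetical rule (an assumption, not taken from the repository):
    exp_id=4 keeps every class with at least 4 images, all images kept;
    exp_id=8 keeps every class with at least 8 images, truncated to 8.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    selected = {}
    for label, indices in by_class.items():
        if len(indices) < exp_id:
            continue  # too few images for this experiment set
        selected[label] = indices[:8] if exp_id == 8 else indices
    return selected
```

For example, with labels = ["a"] * 9 + ["b"] * 4 + ["c"] * 3, experiment 4 keeps classes "a" and "b", while experiment 8 keeps only "a", with 8 of its images.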
- To run the experiments on all 10 splits, change the range of the for loops inside the functions below.
- To try the other experiment set, set exp_id = 8 in the classification scripts (they default to experiment 4). There is no argument parser.
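Since the scripts have no argument parser, both settings are edited by hand. If you prefer to pass them on the command line, a parser could be added along these lines. This is only a sketch: the flag names --exp-id and --splits, the helper names, and the 1-based split numbering are assumptions and do not exist in the current scripts.

```python
import argparse


def parse_args(argv=None):
    """Parse hypothetical command-line options for a classification script."""
    parser = argparse.ArgumentParser(description="Run a classification experiment")
    parser.add_argument("--exp-id", type=int, choices=[4, 8], default=4,
                        help="experiment set to use (4 or 8)")
    parser.add_argument("--splits", type=int, default=1,
                        help="number of splits to run, from 1 to 10")
    args = parser.parse_args(argv)
    if not 1 <= args.splits <= 10:
        parser.error("--splits must be between 1 and 10")
    return args


def split_range(args):
    """Return the split numbers to loop over (assumed to start at 1)."""
    return range(1, args.splits + 1)


# Explicit argument list used here so the example does not read sys.argv.
example = parse_args(["--exp-id", "8", "--splits", "10"])
example_splits = list(split_range(example))
```

The for loops in the classification functions would then iterate over split_range(args) instead of a hard-coded range.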
Eigen-flukes - Run classify_eigen.py without any modification to get the test set accuracy for experiment 4, split 1. Each split takes about a minute.
SIFT experiment - Run classify_sift.py without any modification to get the test set accuracy for experiment 4, split 1. Each split takes about a minute for exp_id=8 and about 10 minutes for exp_id=4.
opencv_funcs: Contains the OpenCV functions that I used during my experiments. The first_n_good_matches() function is used by the SIFT detector.
th_segmentation: I used this to experiment with thresholding methods. It did not work out well.
eigenfluke: Contains the functions that I used for eigenfluke classification, such as PCA transformation, reconstruction and symmetry correction.
my_utils: Contains basic functions that are used for object storage and accuracy calculations.
db_utils: Contains the functions that I used to discard some of the data and to extract file info from the image directory. It also contains the function that I used to split the train and test indices.
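To make the index splitting mentioned for db_utils concrete, here is a minimal sketch of a per-class split into train, validation and test indices. It is not the function from db_utils; the name split_indices, the per-class counts and the fixed seed are all assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def split_indices(labels, val_per_class=1, test_per_class=1, seed=0):
    """Split image indices per class into train, validation and test lists.

    Hypothetical scheme: within each class the indices are shuffled, the
    first `test_per_class` go to test, the next `val_per_class` go to
    validation, and the remainder goes to train.
    """
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for label in sorted(by_class):
        indices = by_class[label][:]
        rng.shuffle(indices)
        test.extend(indices[:test_per_class])
        val.extend(indices[test_per_class:test_per_class + val_per_class])
        train.extend(indices[test_per_class + val_per_class:])
    return sorted(train), sorted(val), sorted(test)
```

Calling the function with a different seed for each of the 10 splits would give 10 reproducible splits, each with every class represented in all three subsets.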
/images --> contains the dataset
/obj --> contains objects that are saved and used later on, such as accuracy records stored as pandas DataFrames
/obj/pca_obj --> PCA weights are saved here to avoid recomputing PCA when it already exists. This folder can grow very large, so saving PCA is disabled in the current settings. To reuse PCA across repeated experiments, enable saving in the PCA_Classification initialization in classify_eigen.py
/obj/split_ind --> contains the train, validation and test split indices for the two experiments.
/data_cleaning --> contains the scripts that I used for data cleaning and for other, unrelated preprocessing. These scripts are not meant to be re-run; I include them only for reference.
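The caching behaviour described for /obj/pca_obj (reuse a saved result if it exists, otherwise compute it and optionally save it) can be sketched as follows. This is an illustration only: load_or_compute and its save flag are hypothetical names and do not reflect the actual PCA_Classification interface.

```python
import os
import pickle


def load_or_compute(path, compute, save=False):
    """Return a cached object from `path`, or compute it if no cache exists.

    `compute` is any zero-argument function producing the object (for
    example, PCA weights). With save=False nothing is written to disk,
    which mirrors the current default of not saving PCA.
    """
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)

    result = compute()
    if save:
        folder = os.path.dirname(path)
        if folder and not os.path.isdir(folder):
            os.makedirs(folder)
        with open(path, "wb") as f:
            # Protocol 2 is readable from both Python 2.7 and Python 3.
            pickle.dump(result, f, protocol=2)
    return result
```

Keeping save off by default avoids filling the folder with one file per split, at the cost of recomputing on every run.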
train_info(exp_id): a pandas DataFrame that stores the image path, Id and various other information
uniq_ids(exp_id): an object that stores the unique ID names that belong to an experiment set