Introduction

This repository contains my dissertation, which I presented as my final year project.

The paper examines how datasets are affected by the values of complexity metrics, and how techniques that try to mitigate the effect of some of those metrics change the evaluation results.

Pre-requisites

Python 3 is required to run the experiments in this repository. Either Anaconda or the Python provided by the official repositories can be used, as long as it is version 3 or later. On some operating systems, trying to replicate the experiments may end in an error. If that happens, the Python version used by the python command can be changed on Linux with:

sudo update-alternatives --config python
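Before changing anything, it can help to confirm which interpreter the python command actually runs. The following is a minimal sketch, assuming the only requirement is the "version 3 or later" rule stated above. The is_supported helper is a hypothetical name used for illustration and is not part of the repository.

```python
import sys


def is_supported(version_info=sys.version_info, minimum=(3, 0)):
    """Return True when the given interpreter version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


# Report the interpreter that is executing this script.
print("Running Python", ".".join(str(part) for part in sys.version_info[:3]))
print("Supported:", is_supported())
```

Saving this to a file and running it with the same python command used for the experiments shows whether that command resolves to a Python 3 interpreter.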

R must also be installed before running the project. In addition, the following R packages are required to replicate the experiments:

- ECoL - dataset complexity metrics package.
  - ECoL GitHub
  - ECoL Documentation
- Rserve - server that responds to requests made to R.

Once these requirements are met, the R server can be started by running the following commands in R:

library(Rserve) # import the library
run.Rserve() # start the server. Or simply Rserve()
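Once the server is running, it may be useful to check from Python that it is reachable before launching the experiments. The sketch below uses only the standard library. It assumes that Rserve listens on localhost at its default port 6311 and that it greets clients with an identification string starting with "Rsrv"; both should be adjusted if the server is configured differently. The rserve_is_up function is a hypothetical helper, not part of the repository.

```python
import socket


def rserve_is_up(host="localhost", port=6311, timeout=2.0):
    """Return True if a server answering like Rserve listens on host:port.

    host and port are assumptions (Rserve defaults); change them if the
    server was started with a different configuration.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.settimeout(timeout)
            banner = conn.recv(32)  # Rserve sends a short ID string on connect
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
    return banner.startswith(b"Rsrv")
```

Calling rserve_is_up() returns False when nothing is listening, which usually means the R commands above have not been run yet or the server uses another port.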

Next, download the project so that the remaining Python packages can be installed. The project can be downloaded from this repo.

As with R, some Python packages are required to run the project. They are listed in the requirements.txt file. To install them automatically, run the following (execute all commands from the top-level folder of the repository):

pip install -r code/requirements.txt

pip install -r may fail to install some of the packages. If so, install the failing packages manually:

pip install <package_name>
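To find out which packages still need a manual install, the requirements file can be compared against what is installed in the current environment. This is a rough sketch, assuming Python 3.8 or later (for importlib.metadata) and a requirements file made of simple "name" or "name==version" lines. The requirement_name and missing_packages helpers are hypothetical names, not part of the repository.

```python
import re
from importlib import metadata


def requirement_name(line):
    """Extract the package name from one requirements.txt line, or None."""
    line = line.split("#", 1)[0].strip()  # drop trailing comments
    if not line or line.startswith("-"):
        return None  # blank line, comment, or pip option such as "-r other.txt"
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
    return match.group(0) if match else None


def missing_packages(lines):
    """Return the requirement names that are not installed in this environment."""
    missing = []
    for line in lines:
        name = requirement_name(line)
        if name is None:
            continue
        try:
            metadata.version(name)  # raises if the distribution is absent
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

For example, passing the lines of code/requirements.txt (read with open from the top-level folder of the repository) to missing_packages returns the names to install one by one with the pip command above. The check only looks at names, not at version constraints.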

Execution

The experiments should now be replicable. Their code is in the code folder. To run them, execute:

python code/metrics_comparison.py
python code/metrics_kfold.py
python code/metrics_kfold_undersampling.py
python code/metrics_kfold_oversampling.py
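The four commands can also be launched one after another from a small driver script. The sketch below assumes it is run from the top-level folder of the repository and that each experiment is independent of the others. The experiment_commands and run_all helpers are hypothetical names, not part of the repository.

```python
import subprocess
import sys

# The four experiment scripts listed above, relative to the repository root.
EXPERIMENTS = [
    "code/metrics_comparison.py",
    "code/metrics_kfold.py",
    "code/metrics_kfold_undersampling.py",
    "code/metrics_kfold_oversampling.py",
]


def experiment_commands(python=sys.executable):
    """Build one command (as an argument list) per experiment script."""
    return [[python, script] for script in EXPERIMENTS]


def run_all(python=sys.executable):
    """Run every experiment in order, stopping at the first failure."""
    for command in experiment_commands(python):
        print("Running:", " ".join(command))
        # check=True raises CalledProcessError if the script exits with an error.
        subprocess.run(command, check=True)
```

Calling run_all() uses the same interpreter that runs the driver, which avoids accidentally mixing Python versions between experiments.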

Each of the previous commands executes one experiment.

As a final remark, the class in r_connect.py is the client that connects to the R server (Rserve). It sends the requests to the ECoL package to obtain the complexity metrics.

The class in data.py standardizes the dataset input (parsing the data) and provides some other metrics from the sklearn package.
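To illustrate the client and server split described above, here is a minimal sketch of how a Python client could ask Rserve for ECoL's complexity measures. It is not the repository's r_connect.py. It assumes the third-party pyRserve package as the client library, Rserve on localhost at its default port 6311, and R's built-in iris dataset as example data; the function names are hypothetical.

```python
def complexity_expression(formula, data_name):
    """Build the R expression that asks ECoL for the complexity measures."""
    return "ECoL::complexity({}, {})".format(formula, data_name)


def fetch_complexity(formula="Species ~ .", data_name="iris",
                     host="localhost", port=6311):
    """Evaluate the ECoL call on a running Rserve and return the result.

    Assumptions: pyRserve is installed, Rserve is running on host:port,
    and `data_name` names a data frame available in the R session.
    """
    import pyRserve  # imported lazily so the helper above works without it

    conn = pyRserve.connect(host=host, port=port)
    try:
        return conn.eval(complexity_expression(formula, data_name))
    finally:
        conn.close()  # always release the connection to the server
```

Keeping the R expression in its own function makes it easy to inspect what is sent to the server without needing a live connection.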


PabloAceG/ComputingProject
