PracticalNCD

Code used to generate the results of the Data Mining and Knowledge Discovery (DMKD) journal paper "A Practical Approach to Novel Class Discovery in Tabular Data".

🔍 Overview

This Python library provides a set of tools for the machine learning problem of Novel Class Discovery (NCD).

In this library, you will find the following tools illustrated through Jupyter Notebooks:

A hyperparameter optimization procedure tailored to transfer the results from the known classes to the novel classes.
An estimation of the number of clusters by applying clustering quality metrics in the latent space of NCD methods.
Two unsupervised clustering algorithms modified to utilize the data available in the NCD setting.
A novel method called PBN (for Projection-Based NCD).
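
To make the second tool above more concrete, here is a minimal, dependency-free sketch of estimating the number of clusters with a clustering quality metric. It is not the code from the notebooks: the helper names (`kmeans`, `silhouette`, `estimate_k`), the choice of the silhouette coefficient, and the toy 2-D "latent" points are all assumptions made for illustration. In practice the points would be the latent representations produced by an NCD method, and scikit-learn's `KMeans` and `silhouette_score` would be the usual choice.

```python
import math
import random


def kmeans(points, k, n_iter=50):
    """Lloyd's algorithm with deterministic farthest-first initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = []
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the previous center if a cluster ends up empty.
            new_centers.append(
                tuple(sum(x) / len(members) for x in zip(*members)) if members else centers[j]
            )
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient (higher means better-separated clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton clusters contribute 0 by convention
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        total += (b - a) / max(a, b)
    return total / len(points)


def estimate_k(latent, k_candidates):
    """Return the candidate k with the highest silhouette, plus all scores."""
    scores = {k: silhouette(latent, kmeans(latent, k)) for k in k_candidates}
    return max(scores, key=scores.get), scores


# Toy stand-in for a latent space: three well-separated 2-D blobs (assumption).
rng = random.Random(0)
blob_centers = [(0.0, 0.0), (10.0, 10.0), (20.0, 0.0)]
latent = [
    (cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
    for cx, cy in blob_centers
    for _ in range(30)
]
best_k, scores = estimate_k(latent, range(2, 7))
print(best_k)
```

The sketch scores each candidate k separately and keeps the best one, so the silhouette can be swapped for another internal clustering metric without changing the rest.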

🐍 Setting up the Python environment

Option 1 - With Anaconda:

# Create the virtual environment and install the packages with conda
conda env create --file environment.yml --prefix ./venvpracticalncd

# Activate the virtual environment (Windows path shown; on Linux, use ./venvpracticalncd)
conda activate .\venvpracticalncd

# Add package missing from conda repositories
pip install iteration-utilities==0.11.0

Option 2 - Without Anaconda:

Prerequisite: Python 3.10.9 installed as the default Python 3.10 version.

# Create the empty virtual environment (py is the Windows launcher; on Linux, use python3.10 instead of py -3.10)
py -3.10 -m venv venvpracticalncd

# Activate the virtual environment
# On Windows:
.\venvpracticalncd\Scripts\activate
# On Linux:
source venvpracticalncd/bin/activate
  
# Install the needed packages
pip install -r requirements.txt

# Finish by installing PyTorch separately (CUDA 11.3 build)
pip install torch==1.12.1 --index-url https://download.pytorch.org/whl/cu113

Finishing touches

# Add the virtual environment as a jupyter kernel
ipython kernel install --name "venvpracticalncd" --user

# Check if torch supports GPU (you need CUDA 11 installed)
python -c "import torch; print(torch.cuda.is_available())"
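
For a slightly more informative check than the one-liner above, the following sketch reports the installed torch version, the CUDA version it was built against, and the detected GPU. The helper name `describe_torch_environment` is hypothetical and not part of this repository; it only uses standard PyTorch attributes and returns a message instead of failing when torch is missing.

```python
import importlib.util


def describe_torch_environment():
    """Return a one-line summary of the PyTorch install and its GPU support."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"
    import torch

    if torch.cuda.is_available():
        return (
            f"torch {torch.__version__} (CUDA build {torch.version.cuda}), "
            f"GPU: {torch.cuda.get_device_name(0)}"
        )
    return f"torch {torch.__version__}, no usable GPU detected"


summary = describe_torch_environment()
print(summary)
```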

💻 Usage

Three notebooks are available:

Full_notebook.ipynb lets you train and evaluate the models when the number of clusters k is known in advance.
Full_notebook_with_k_estimation.ipynb does the same, but estimates the number of clusters k instead of assuming it is known.
Perf_wrt_n_unknown_classes.ipynb evaluates the performance of all the models as the number of novel classes increases. It was used to generate Figure C1 of Appendix C.

📊 Datasets

The datasets are downloaded automatically from https://archive.ics.uci.edu/ on the first execution.
If the download fails, try disabling any proxy.

Note that the data splits for some datasets are random, so the results can differ from those reported in the paper.

The most affected datasets are:

LetterRecognition
USCensus1990
multiple_feature
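
The effect of a random split can be illustrated with a small sketch. The function `seeded_split` below is hypothetical and is not the splitting code used in the notebooks; it only shows that fixing the seed of the shuffle makes a split reproducible from one run to the next, while an unseeded shuffle does not.

```python
import random


def seeded_split(n_samples, test_fraction=0.3, seed=0):
    """Shuffle sample indices with a fixed seed and cut them into train/test."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


train_a, test_a = seeded_split(100, seed=42)
train_b, test_b = seeded_split(100, seed=42)
print(test_a == test_b)  # True: same seed, same split
```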

📜 Citation

If you find this work useful, please cite it as follows:

@article{tr2024practical,
   title = {A Practical Approach to Novel Class Discovery in Tabular Data},
   author = {Troisemaine, Colin and Reiffers{-}Masson, Alexandre and Gosselin, St{\'{e}}phane and Lemaire, Vincent and Vaton, Sandrine},
   journal = {Data Mining and Knowledge Discovery},
   year = {2024},
   month = {May},
   day = {31},
   issn = {1573-756X},
   doi = {10.1007/s10618-024-01025-y}
}

⚖️ License

This code is released under the MIT license. See the LICENSE.txt file for more information.
