ncdisrup-ai/ProjectTypeClassification_TransformerFineTune

ProjectTypeClassification Finetuning DistilBert

Classify project types by fine-tuning a transformer model (DistilBERT).

🤔 What is this?

Description: What types of IT projects and service contracts have been carried out in the Portuguese Public Administration? Beyond contract values, what kinds of projects have been developed, and by whom? Who are the competitors, and which types of services do they focus on: strategic digital transformation consulting, project management/PMO, requirements gathering/analysis or architecture, project implementation, support/maintenance, change management, or SaaS?

This project involves the research and design of Artificial Intelligence algorithms: we fine-tune a transformer model (DistilBERT) for a multiclass text classification problem. Given a project description, the model assigns it to one of the project categories listed in the Data section.

📚 Data

The project data (used both to train the model and to apply it) is in the data dir.

We use project descriptions from the Portuguese Public Administration contract aggregator dataset, available at the base.gov repository.

Please bear in mind that this data has already been cleaned and processed: ContratosAP_v5.2_TrainPred.xlsx.

The DadosTreinoVal dataframe has 751 rows of data for training and testing. Each row has the following fields:

- Objeto do Contrato: Contract Object
		 
- Contrato (Tipo): Type of Contract
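The 751 rows have to be divided into training and test portions, and with few examples per category it helps to keep every category represented in both. The following is a minimal sketch of a stratified split, not the notebook's actual code: the 80/20 ratio, the seed, the `stratified_split` helper, and the toy rows are assumptions for illustration; only the two column names come from the dataset description above.

```python
import random
from collections import defaultdict

TEXT_COL = "Objeto do Contrato"   # contract object (the project description)
LABEL_COL = "Contrato (Tipo)"     # type of contract (the class id)


def stratified_split(rows, val_fraction=0.2, seed=42):
    """Split rows into (train, val) so each label keeps roughly the same share."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[LABEL_COL]].append(row)

    rng = random.Random(seed)
    train, val = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        # Hold out at least one example per class when the class has 2+ rows.
        n_val = max(1, round(len(group) * val_fraction)) if len(group) > 1 else 0
        val.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, val


# Toy rows standing in for the real spreadsheet (made-up descriptions).
rows = [{TEXT_COL: f"project description {i}", LABEL_COL: i % 2} for i in range(10)]
train_rows, val_rows = stratified_split(rows)
print(len(train_rows), len(val_rows))
```

With the real data, `rows` could be built from the DadosTreinoVal dataframe (for example with `df.to_dict("records")` in pandas).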

The Type of Contract field is the label used to classify IT project descriptions from the Portuguese Public Administration into the following categories:

 - Digital Transformation (0)
	 
 - Project Management/PMO (1)
	 
 - Requirements Definition/Analysis/Architecture (2)
	 
 - Implementation (3)
	 
 - Support/Maintenance (4)
	 
 - Change Management (5)
	 
 - Licenses (6)
	 
 - SaaS (7)
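The category list above can be written down as a pair of lookup tables, which is a convenient way to turn model outputs back into readable names. This is a small sketch; the names `ID2LABEL`, `LABEL2ID`, and `label_name` are illustrative choices and are not taken from the notebook.

```python
# Class ids and names exactly as listed in this README.
ID2LABEL = {
    0: "Digital Transformation",
    1: "Project Management/PMO",
    2: "Requirements Definition/Analysis/Architecture",
    3: "Implementation",
    4: "Support/Maintenance",
    5: "Change Management",
    6: "Licenses",
    7: "SaaS",
}

# Reverse mapping, from category name to class id.
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def label_name(class_id):
    """Return the category name for a predicted class id."""
    return ID2LABEL[int(class_id)]


print(label_name(4))
```

Hugging Face sequence classification configs accept `id2label` and `label2id` dictionaries of this shape, so the same tables could be passed along when loading the model.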

🚀 Quick Install

Because of the GPU power required, I advise using Colab with ClassifyContratType_TransformersFineTunEx_v2_colab.ipynb (in the classification dir).

Copy ContratosAP_v5.2_TrainPred.xlsx (in the data dir) to sample_data.

Run the notebook.

📖 Documentation

Please see the notebook (.ipynb) for a description of this project.

🚀 Results (Applied AI vs Transformers)

Because we have little data (751 projects) to fine-tune across the 8 project categories, we found that DistilBERT focuses on the categories with more data, so we only managed to obtain about 50% accuracy. For comparison, a linear Support Vector Classifier (SVC, the Applied AI approach) trained on the same data reaches 84% accuracy.
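One way to check whether a model is favouring the larger categories is to look at recall per class rather than overall accuracy alone. The sketch below is illustrative only: the `per_class_recall` helper and the toy label lists are made up for the example and are not results from this project.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in sorted(totals)}


# Made-up labels: class 4 dominates and the model predicts it almost everywhere.
y_true = [4, 4, 4, 4, 3, 3, 6, 0]
y_pred = [4, 4, 4, 4, 4, 3, 4, 4]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall = per_class_recall(y_true, y_pred)

print(f"accuracy: {accuracy:.3f}")
for label, value in recall.items():
    print(f"class {label}: recall {value:.2f}")
```

In this toy case the overall accuracy looks moderate while the two smallest classes have zero recall, which is the pattern one would expect from a model that concentrates on the categories with more data.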
