Trait Prediction from Dutch Company Names

This repository contains the full codebase and outputs for the thesis project:
"Trait Prediction from Dutch Company Names using Surface and Semantic Features."
We investigate whether letter-level patterns and semantic embeddings can predict how people perceive company names on four traits:

Femininity, Evilness, Trustworthiness, Smartness

📌 Overview

We trained and evaluated models using two algorithms:

ElasticNet (for interpretable linear baselines)
Feedforward Neural Networks (FFNNs) (for non-linear modeling)
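A minimal sketch of the train/validation/test evaluation for the two model families, assuming a numeric feature matrix `X` and one trait score `y` per name. The data is synthetic, the 70/15/15 split and all hyperparameters are illustrative assumptions, and scikit-learn's `MLPRegressor` stands in for the repository's FFNN, which may be built differently (for example in PyTorch) in `main.ipynb`. `evaluate` is a hypothetical helper.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def evaluate(model, X, y, seed=0):
    """Fit on a 70/15/15 split (assumed ratios) and return R² per split."""
    # Carve out 30% first, then halve it into validation and test.
    # The fixed seed gives every model the same split.
    X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.3, random_state=seed)
    X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=seed)
    model.fit(X_train, y_train)
    return {
        "train_r2": r2_score(y_train, model.predict(X_train)),
        "val_r2": r2_score(y_val, model.predict(X_val)),
        "test_r2": r2_score(y_test, model.predict(X_test)),
    }


# Synthetic stand-in data: 500 "names" with 26 count-like features.
rng = np.random.default_rng(0)
X = rng.poisson(1.0, size=(500, 26)).astype(float)
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(0, 0.5, size=500)

models = {
    # Hypothetical hyperparameters; the thesis tunes its own.
    "elasticnet": make_pipeline(StandardScaler(), ElasticNet(alpha=0.01, l1_ratio=0.5)),
    "ffnn": make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=2000, early_stopping=True, random_state=0),
    ),
}
results = {name: evaluate(model, X, y) for name, model in models.items()}
for name, scores in results.items():
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

Reporting R² on all three splits makes overfitting visible: a large gap between `train_r2` and `test_r2` is more likely for the non-linear model than for the regularized linear baseline.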

We tested four feature sets:

🟦 Unigrams (letter counts)
🟩 Bigrams (letter pairs)
🟨 RobBERT (Dutch semantic embeddings)
🟥 Combined (all features together)
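The two surface feature sets can be sketched as follows, assuming names are lowercased and only the letters a-z are counted. How the repository actually treats digits, spaces, accented letters such as "ë", or pairs that span a word boundary is not stated here, so `extract_features` and its fixed vocabularies are hypothetical. Unigrams count single letters, bigrams count adjacent letter pairs, and a combined vector is a concatenation; the RobBERT embedding would be appended the same way but is left out because it requires the pretrained model.

```python
from collections import Counter
from itertools import product
from string import ascii_lowercase

# Fixed vocabularies so every name maps to vectors of the same length.
UNIGRAMS = list(ascii_lowercase)                                  # 26 features
BIGRAMS = [a + b for a, b in product(ascii_lowercase, repeat=2)]  # 676 features


def letters_only(name: str) -> str:
    """Lowercase and keep only a-z (assumption: everything else is dropped)."""
    return "".join(ch for ch in name.lower() if ch in ascii_lowercase)


def unigram_counts(name: str) -> list[int]:
    counts = Counter(letters_only(name))
    return [counts[u] for u in UNIGRAMS]


def bigram_counts(name: str) -> list[int]:
    s = letters_only(name)
    # Because spaces were removed, pairs can span word boundaries ("de vries" -> "ev").
    counts = Counter(s[i:i + 2] for i in range(len(s) - 1))
    return [counts[b] for b in BIGRAMS]


def extract_features(name: str) -> list[int]:
    """Hypothetical combined surface vector: unigram counts, then bigram counts."""
    return unigram_counts(name) + bigram_counts(name)


vec = extract_features("Bakkerij")
print(len(vec))  # 702
```

Because the vocabularies are fixed, the same functions can be applied unchanged to other kinds of names, such as product names.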

📂 Folder Structure

| File/Folder | Description |
|---|---|
| `main.ipynb` | Full pipeline for all experiments and models |
| `models/` | Trained models, metrics, feature importances, and plots |

⚠️ Some preprocessing steps are not part of the notebook, but all final modeling data is included.

📊 What You Can Explore

✔️ Compare model performance across all four traits
🔍 See ElasticNet feature importances per trait (letters, bigrams, or embeddings)
🧪 View charts not shown in the thesis
🔁 Adapt the code to predict traits in other domains (e.g., product names)
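One simple way to read ElasticNet importances is to rank coefficients by absolute value after standardizing the features so their magnitudes are comparable. This is an assumption about how such importances can be derived, not a description of how the plots in `models/` were made; `top_features` is a hypothetical helper.

```python
import numpy as np


def top_features(coefs, feature_names, k=5):
    """Return up to k (name, coefficient) pairs with the largest |coefficient|.

    ElasticNet's L1 penalty drives many coefficients to exactly zero; those are
    skipped. Positive values raise the predicted trait score, negative lower it.
    """
    coefs = np.asarray(coefs, dtype=float)
    order = np.argsort(-np.abs(coefs), kind="stable")
    return [(feature_names[i], float(coefs[i])) for i in order[:k] if coefs[i] != 0.0]


names = ["a", "e", "k", "x"]
print(top_features([0.1, -0.8, 0.0, 0.3], names, k=3))
# [('e', -0.8), ('x', 0.3), ('a', 0.1)]
```

With a model built as `make_pipeline(StandardScaler(), ElasticNet(...))`, the fitted coefficients are available as `model.named_steps["elasticnet"].coef_`, in the same order as the feature columns.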

🧪 Trained Models

We trained ElasticNet regressors and Feedforward Neural Networks (FFNNs) on each of the four feature sets: Unigram, Bigram, RobBERT, and Combined.

Each model’s results (train/val/test R², feature count, etc.) are stored in:

`models/{model_name}/ffnn_metrics.csv`
`models/{model_name}/all_models_metrics.csv`
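A small sketch for comparing results across runs, assuming each `models/{model_name}/` folder holds one or both of the metrics CSV files listed above. The column layout inside those files is not documented here, so the code assumes nothing about it beyond tagging every row with its source folder and file; `load_all_metrics` is a hypothetical helper.

```python
from pathlib import Path

import pandas as pd

METRIC_FILES = ("ffnn_metrics.csv", "all_models_metrics.csv")


def load_all_metrics(models_dir="models"):
    """Stack every metrics CSV found in models/*/ into one DataFrame."""
    frames = []
    for model_dir in sorted(Path(models_dir).iterdir()):
        if not model_dir.is_dir():
            continue  # skip loose files such as plots at the top level
        for filename in METRIC_FILES:
            path = model_dir / filename
            if path.exists():
                df = pd.read_csv(path)
                # Record where each row came from, since columns may differ per file.
                df.insert(0, "source_dir", model_dir.name)
                df.insert(1, "source_file", filename)
                frames.append(df)
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()


if __name__ == "__main__":
    print(load_all_metrics("models").head())
```

`pd.concat` aligns columns by name, so files with different columns still combine; missing values become `NaN`.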


All test plots and coefficient visualizations, including some not shown in the thesis, are in the `models/` folder.

⚙️ Setup

Tested on Windows 10, with Python 3.13.2.

Install dependencies using:

pip install -r requirements.txt

Core packages:

scikit-learn==1.4.2

torch==2.7.0, transformers==4.51.3

pandas, numpy, matplotlib, joblib
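Since results can shift between library versions, one quick check that an environment matches the pinned core packages uses `importlib.metadata` from the standard library. The pins are copied from the list above, `requirements.txt` remains the authoritative list, and `check_pins` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Pinned versions from the core package list above.
PINS = {"scikit-learn": "1.4.2", "torch": "2.7.0", "transformers": "4.51.3"}


def check_pins(pins, get_version=version):
    """Return {package: (expected, installed_or_None)} for every mismatch.

    Exact string comparison: local build tags such as '2.7.0+cpu' count as mismatches.
    """
    mismatches = {}
    for package, expected in pins.items():
        try:
            installed = get_version(package)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[package] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    problems = check_pins(PINS)
    if not problems:
        print("All pinned core packages match.")
    for package, (expected, installed) in problems.items():
        print(f"{package}: expected {expected}, found {installed or 'not installed'}")
```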

🔬 Research Context

This work explores how linguistic cues in company names, from individual letters to semantic embeddings, shape trait perception. All results are explained in detail in the accompanying thesis (`thesis.pdf`).

📜 Citation

If this repository helps your research, please cite the thesis or link back to this repository (`hiwafeizi/thesis`).
