This project trains a Multinomial Naive Bayes model to predict a person's gender from their first name. The dataset was downloaded from data.gov as a zip file containing 142 `txt` files, one for every year from 1880 to 2021.
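Multinomial Naive Bayes works on counts of discrete features, so a name first has to be turned into countable tokens. The sketch below assumes character 2- and 3-grams as those features. This README does not describe the repository's actual feature extraction, which may differ; for example, it may use scikit-learn's `CountVectorizer` and `MultinomialNB`. The sketch is written from scratch with Laplace smoothing so it runs on the standard library alone. `char_ngrams` and `CharNGramNB` are hypothetical names.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(name, n_min=2, n_max=3):
    """Split a name into character n-grams, with ^ and $ marking its start and end."""
    padded = f"^{name.lower()}$"
    return [
        padded[i : i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


class CharNGramNB:
    """Hypothetical Multinomial Naive Bayes over character n-gram counts."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing constant

    def fit(self, names, labels):
        self.class_counts = Counter(labels)         # names per class, used for priors
        self.feature_counts = defaultdict(Counter)  # n-gram counts per class
        self.vocab = set()
        for name, label in zip(names, labels):
            grams = char_ngrams(name)
            self.feature_counts[label].update(grams)
            self.vocab.update(grams)
        self.totals = {c: sum(self.feature_counts[c].values()) for c in self.class_counts}
        return self

    def predict_proba(self, name):
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        log_scores = {}
        for c, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior P(class)
            denom = self.totals[c] + self.alpha * vocab_size
            for g in char_ngrams(name):
                if g in self.vocab:  # n-grams never seen in training are ignored
                    score += math.log((self.feature_counts[c][g] + self.alpha) / denom)
            log_scores[c] = score
        # Normalise in log space (log-sum-exp) to avoid floating-point underflow.
        top = max(log_scores.values())
        weights = {c: math.exp(s - top) for c, s in log_scores.items()}
        total = sum(weights.values())
        return {c: w / total for c, w in weights.items()}

    def predict(self, name):
        proba = self.predict_proba(name)
        return max(proba, key=proba.get)
```

For example, after `CharNGramNB().fit(["Anna", "Maria", "Sophia", "John", "Mark", "Peter"], ["F", "F", "F", "M", "M", "M"])`, the name "Julia" is scored as `"F"`. Its known n-grams `ia`, `a$` and `ia$` occur only in the female training names.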
### Instructions
- Clone this repository:
  ```
  git clone https://github.com/taeefnajib/predict-gender-from-first-name
  ```
- Download the zip file from data.gov, unzip it, and place the resulting `names` folder in the working directory.
- Install all the dependencies:
  ```
  pip install -r requirements.txt
  ```
- `data.py` prepares a `csv` file from all the `txt` files and pre-processes the dataset. You don't need to run it from the command line.
- `train.py` builds a model and trains it on the dataset. The repository already contains the files `data.csv` and `model.pkl`; if you remove them and run `train.py`, it will recreate both files.
- `test.py` uses `argparse` to let users predict genders from first names in the command line. Use `--name` or `-n` followed by the name you want to predict the gender for. Example:
  ```
  python test.py --name Josh
  ```
- If you want to use `FastAPI` instead, start the server with:
  ```
  uvicorn main:app --reload
  ```
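This starts a local server at 127.0.0.1 on uvicorn's default port, 8000 (pass `--port` to choose a different one). The Swagger UI interface is then available at `http://127.0.0.1:8000/docs`, FastAPI's default docs path. If you enter a first name as a string, it returns a dictionary containing Gender and Probability.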
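The `data.py` step, which merges the yearly `txt` files into one `csv`, can be sketched as follows. This is a minimal sketch under stated assumptions, not the repository's `data.py`. It assumes the data.gov files are the Social Security baby-name files named `yobYYYY.txt`, each line of the form `Name,Sex,Count`, and it simply sums counts per (name, sex) pair across all years. The function name `build_dataset` and the output columns `name`, `gender`, `count` are hypothetical, and the real pre-processing may differ.

```python
import csv
from collections import Counter
from pathlib import Path


def build_dataset(names_dir="names", out_csv="data.csv"):
    """Hypothetical helper: merge every yob*.txt file into a single CSV.

    Assumes each line looks like "Mary,F,7065" (name, sex, count) and sums
    the counts for each (name, sex) pair across all years.
    Returns the number of distinct (name, sex) rows written.
    """
    totals = Counter()
    for path in sorted(Path(names_dir).glob("yob*.txt")):
        with path.open(newline="", encoding="utf-8") as f:
            for row in csv.reader(f):
                if not row:  # skip blank lines
                    continue
                name, sex, count = row
                totals[(name, sex)] += int(count)

    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "gender", "count"])
        for (name, sex), count in sorted(totals.items()):
            writer.writerow([name, sex, count])
    return len(totals)


if __name__ == "__main__":
    print(build_dataset())
```

Names that appear under both sexes keep one row per sex here. A training step could then weight rows by `count` or keep only the majority label per name, which is a design choice this sketch leaves open.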
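The `test.py` command-line usage above (`--name` / `-n`) can be sketched with `argparse` as shown here. This is a minimal sketch, not the repository's `test.py`. It assumes `model.pkl` holds a scikit-learn-style estimator whose `predict` and `predict_proba` accept a list of raw name strings. The `--model` option and the helpers `build_parser`, `predict` and `main` are hypothetical. The returned dictionary mirrors the Gender and Probability output described for the FastAPI endpoint.

```python
import argparse
import pickle


def build_parser():
    parser = argparse.ArgumentParser(description="Predict gender from a first name.")
    # Short and long forms, matching the -n / --name usage in this README.
    parser.add_argument("-n", "--name", required=True, help="first name to classify")
    # Hypothetical option: path to the pickled model produced by train.py.
    parser.add_argument("--model", default="model.pkl", help="path to the pickled model")
    return parser


def predict(model, name):
    """Return the predicted gender and its probability for one name.

    Assumes a scikit-learn-style estimator that accepts raw strings.
    """
    gender = model.predict([name])[0]
    probability = float(max(model.predict_proba([name])[0]))
    return {"Gender": gender, "Probability": probability}


def main(argv=None):
    args = build_parser().parse_args(argv)
    with open(args.model, "rb") as f:
        model = pickle.load(f)
    print(predict(model, args.name))


if __name__ == "__main__":
    main()
```

Keeping `predict` separate from argument parsing lets the same function back both the command-line script and a web endpoint.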