This is my personal code stash, where I explore some of my favorite topics, including:
- Categorical feature embeddings and encodings
- Model explainability methods and their limitations
- Testing processes to detect and mitigate bias in machine learning models
See also vcarey-circlestar
Are you a Towards Data Science reader? Get the code used for my articles here:
Pretty pictures illustrate how stochastic regularization impacts entity embeddings in neural network models.
Read the article at TDS or Medium.
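If you just want the gist without reading the article, here is a minimal sketch (not the repo's actual code) of an entity embedding with dropout applied to the embedding vectors, which is the kind of stochastic regularization the pictures illustrate. Layer sizes and names are made up.

```python
# Minimal sketch: one categorical feature mapped through an entity embedding,
# with dropout ("stochastic regularization") applied to the embedding vectors.
# Sizes and names are illustrative, not taken from the repo.
from tensorflow.keras import layers, Model

n_categories = 50   # number of distinct codes in the categorical feature
embedding_dim = 8   # size of the learned embedding vector

cat_in = layers.Input(shape=(1,), dtype="int32", name="category")
num_in = layers.Input(shape=(3,), name="numeric_features")

emb = layers.Embedding(input_dim=n_categories, output_dim=embedding_dim)(cat_in)
emb = layers.Flatten()(emb)
emb = layers.Dropout(0.5)(emb)   # randomly zero embedding components during training

x = layers.Concatenate()([emb, num_in])
x = layers.Dense(16, activation="relu")(x)
out = layers.Dense(1, activation="sigmoid")(x)

model = Model([cat_in, num_in], out)
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
```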
Exploring how messing with the training data can improve model accuracy and robustness when using entity embeddings in neural network models.
Read the article at TDS or Medium.
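One flavor of "messing with the data" is randomly swapping a fraction of the category codes for a reserved "unknown" code during training, so the embedding learns a sensible fallback for rare or unseen codes. This sketch illustrates that general idea, not necessarily the article's exact scheme.

```python
# Generic sketch of one way to "mess with" the training data: randomly replace
# a fraction of the category codes with a reserved "unknown" index each epoch,
# so the embedding learns a sensible fallback for rare or unseen codes.
# This illustrates the general idea, not necessarily the article's exact scheme.
import numpy as np

UNKNOWN_CODE = 0   # assume index 0 is reserved for "unknown"

def perturb_codes(codes, replace_frac=0.05, rng=None):
    """Return a copy of `codes` with roughly replace_frac of entries set to UNKNOWN_CODE."""
    rng = rng or np.random.default_rng()
    codes = np.asarray(codes).copy()
    mask = rng.random(codes.shape) < replace_frac
    codes[mask] = UNKNOWN_CODE
    return codes

# Example: re-perturb the codes before each call to model.fit(...)
train_codes = np.array([3, 17, 5, 42, 3, 9])
print(perturb_codes(train_codes, replace_frac=0.3, rng=np.random.default_rng(0)))
```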
A comparison of several methods of encoding categorical features for XGBoost machine learning models. Over-engineered features can cause problems when codes are missing or unseen. The results suggest that simpler methods, in conjunction with data manipulation, may work better for changing business environments or code sets that update frequently.
Read the article at TDS or Medium.
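To give a feel for the unseen-code problem, here is a toy sketch (made-up data, not the repo's code) contrasting one-hot encoding, which degrades gracefully, with a simple target encoding that needs an explicit fallback:

```python
# Toy sketch of the unseen-code problem (made-up data): one-hot encoding can
# simply "ignore" codes it never saw, while a basic target (mean) encoding
# needs an explicit fallback or it produces NaNs for new codes.
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

train = pd.DataFrame({"code": ["A", "A", "B", "C"], "y": [1, 0, 1, 1]})
test = pd.DataFrame({"code": ["B", "D"]})   # "D" never appeared in training

# One-hot: an unseen code just becomes an all-zeros row.
ohe = OneHotEncoder(handle_unknown="ignore")
ohe.fit(train[["code"]])
print(ohe.transform(test[["code"]]).toarray())

# Target encoding: map each code to its mean outcome, falling back to the
# global mean for codes that never appeared in training.
means = train.groupby("code")["y"].mean()
test["code_te"] = test["code"].map(means).fillna(train["y"].mean())
print(test)
```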
In which I test a proposed method for encoding categorical features for machine learning models (e.g., XGBoost and neural networks) and discover tradeoffs between performance and robustness that suggest alternative ways to approach these features.
Read the article at TDS or Medium.
A deep dive into why two popular model explainability methods, SHAP and ALE, show opposite results on a public dataset. This leads me to go on about how model "explainers" don't actually produce "explanations"; instead, these methods should be viewed as diagnostic tests that must be interpreted thoughtfully.
Read the article at TDS or Medium.
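For readers who want to poke at this themselves, here is a small sketch of running both "diagnostic tests" on the same synthetic model (not the article's dataset): SHAP via the shap package, plus a hand-rolled first-order ALE curve for one feature. A real analysis would use a dedicated ALE library.

```python
# Sketch: run two "diagnostic tests" on the same synthetic model and compare.
# SHAP comes from the shap package; the ALE curve is hand-rolled here so the
# mechanics are visible (a real analysis would use a dedicated ALE library).
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X[:, 0] * X[:, 1] + X[:, 2] + rng.normal(scale=0.1, size=500)
model = GradientBoostingRegressor().fit(X, y)

# SHAP: per-row, per-feature attributions for a tree model.
shap_values = shap.TreeExplainer(model).shap_values(X)

def ale_1d(model, X, j, n_bins=10):
    """First-order ALE for feature j: accumulate average local prediction
    differences across quantile bins of the feature."""
    edges = np.quantile(X[:, j], np.linspace(0, 1, n_bins + 1))
    effects = []
    for k in range(n_bins):
        if k == 0:
            in_bin = X[:, j] <= edges[1]
        else:
            in_bin = (X[:, j] > edges[k]) & (X[:, j] <= edges[k + 1])
        if not in_bin.any():
            effects.append(0.0)
            continue
        lo, hi = X[in_bin].copy(), X[in_bin].copy()
        lo[:, j], hi[:, j] = edges[k], edges[k + 1]
        effects.append(np.mean(model.predict(hi) - model.predict(lo)))
    ale = np.cumsum(effects)
    # Center with a simple mean over bins (quantile bins are roughly equal-sized;
    # libraries weight the centering by exact bin counts).
    return edges[1:], ale - ale.mean()

grid, ale_curve = ale_1d(model, X, j=0)
print("mean |SHAP| for feature 0:", np.abs(shap_values[:, 0]).mean())
print("ALE curve for feature 0:", np.round(ale_curve, 3))
```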
A demonstration that common model fairness metrics can differ by group when the groups have different age profiles, even if everything else is the same. I argue that metrics alone aren't enough to assess fairness; you also have to understand the reasons underlying the outcomes.
Read the article at TDS or Medium.
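Here is a toy simulation in the same spirit, with all distributions and thresholds invented: two groups share the same outcome-vs-age relationship and face the same age-based decision rule, yet the standard group metrics come out different simply because the age profiles differ.

```python
# Toy simulation: identical outcome-vs-age relationship and identical decision
# rule in both groups, but different age distributions, so the usual group
# fairness metrics differ anyway. All numbers are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

def group_metrics(mean_age, threshold=45):
    age = rng.normal(mean_age, 8, n)
    p = 1 / (1 + np.exp(-(age - 45) / 5))   # same P(outcome | age) for every group
    outcome = rng.random(n) < p
    predicted = age > threshold              # same age-based decision rule for every group
    pos_rate = predicted.mean()              # selection rate
    fpr = predicted[~outcome].mean()         # false positive rate
    tpr = predicted[outcome].mean()          # true positive rate
    return pos_rate, fpr, tpr

for name, mean_age in [("Group A (younger)", 35), ("Group B (older)", 50)]:
    pos_rate, fpr, tpr = group_metrics(mean_age)
    print(f"{name}: selection rate={pos_rate:.2f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
```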
Part of a series presented at the ROC Data Science Meetup. The article argues that it's important to be extra careful about model bias and fairness when key information is missing from your models; relying on a bunch of weak predictors in its place increases the risk of bias.
Read the article at TDS or Medium.
Part of a talk presented at the ROC Data Science Meetup. Explores some ways to mitigate feature bias that leads to unfair outcomes in a machine learning model.
Read the article at TDS or Medium.
Part of a talk presented at the ROC Data Science Meetup. Argues that we can't assume that machine learning models will "automatically" incorporate interactions to compensate for feature bias.
Read the article at TDS or Medium.
Part of a talk presented at the ROC Data Science Meetup. Demonstrates that the same fairness metric results can be seen whether a model is basing its decision on a sensitive feature, or on a legitimate predictor correlated with that feature.
Read the article at TDS or Medium.
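A toy version of that demonstration, with invented numbers: one decision rule adds a score bump directly for the sensitive group, the other thresholds a "legitimate" predictor whose distribution is shifted by group; the selection-rate gap looks about the same either way.

```python
# Toy illustration: two decision rules, one using the sensitive feature directly,
# one using a correlated "legitimate" predictor, yield similar selection-rate gaps.
# All numbers are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
sensitive = rng.random(n) < 0.5                  # group membership flag

# A legitimate predictor whose distribution happens to be shifted by group.
legit = rng.normal(0, 1, n) + 0.8 * sensitive

# Rule 1 scores directly on the sensitive feature (plus independent noise);
# Rule 2 scores on the correlated legitimate predictor.
pred_direct = (0.8 * sensitive + rng.normal(0, 1, n)) > 0.4
pred_proxy = legit > 0.4

for name, pred in [("uses sensitive feature", pred_direct), ("uses correlated predictor", pred_proxy)]:
    gap = pred[sensitive].mean() - pred[~sensitive].mean()   # demographic parity gap
    print(f"{name}: selection-rate gap = {gap:.2f}")
```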
A few other miscellaneous repositories are also here.
Code for a talk presented at the ROC Data Science Meetup. After my kid was born, we tracked his sleep data for ~2 years. He was a tough baby, so I read all the official parenting guides, which described what we "should" expect from "normal" children. The data gave me an opportunity to quantify my child's abnormality, but I also got similar data sets from other parents and found that no baby was really normal. Is this selection bias, or are these expert guidelines mostly nonsense? Who knows, but it was an interesting personal data project. I now have even more data from another child and may revisit this topic in the future...
I ran into bugs while training and deploying a machine learning model at scale with Azure. The best way to get support was to write short programs that reproduced the issues. This is the code for those repros; luckily, most of the issues have since been fixed.