vla6/README.md

Valerie Carey's Personal Repository

This is my personal code stash, where I explore some of my favorite topics, including:

  • Categorical feature embeddings and encodings
  • Model explainability methods and their limitations
  • Testing processes to detect and mitigate bias in machine learning models

See also vcarey-circlestar

Towards Data Science

Are you a Towards Data Science reader? Get the code used for my articles here:

Topics: Python / Tensorflow / tSNE / kMeans / SHAP / Entity Embeddings / Data Visualization

Pretty pictures illustrate how stochastic regularization impacts entity embeddings in neural network models.

Read the article at TDS or Medium.

Topics: Python / Tensorflow / Stochastic Regularization / Data Generators / Entity Embeddings

Explores how deliberately perturbing training data can increase model accuracy and robustness when using entity embeddings in neural network models.

Read the article at TDS or Medium.
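To illustrate the idea (this is a minimal sketch, not the repository's actual code), a training-data generator might randomly replace categorical codes with an "unknown" token, so the model also learns a usable embedding for codes it has never seen:

```python
import random

UNKNOWN = "<UNK>"

def noisy_batches(rows, p_unknown=0.1, seed=0):
    """Yield training rows, randomly masking the categorical code.

    With probability p_unknown, a row's category is replaced by the
    special <UNK> token; at inference time, unseen codes map to the
    same token and get a sensibly trained embedding.
    """
    rng = random.Random(seed)
    for category, features, label in rows:
        if rng.random() < p_unknown:
            category = UNKNOWN
        yield category, features, label

rows = [("A", [1.0], 0), ("B", [2.0], 1), ("C", [3.0], 0)] * 100
masked = [cat for cat, _, _ in noisy_batches(rows, p_unknown=0.2)]
print(masked.count(UNKNOWN))  # roughly 20% of the 300 rows
```

In a real pipeline this logic would live inside the Keras data generator feeding the embedding layer; the names here are hypothetical.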

Topics: Python / XGBoost / Target Encoding / Categorical Feature Encoding

A comparison of several methods of encoding categorical features for XGBoost machine learning models. Over-engineering of features can be problematic for missing or unseen codes. Results suggest that simpler methods, in conjunction with data manipulation, may work better for changing business environments or code sets that update frequently.

Read the article at TDS or Medium.
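For readers unfamiliar with the technique, a minimal smoothed target encoder (a simplified sketch, not the repository's implementation) might look like this:

```python
from collections import defaultdict

def fit_target_encoding(categories, targets, smoothing=10.0):
    """Map each category to a smoothed mean of the target.

    Rare codes are shrunk toward the global mean, and unseen codes
    fall back to it entirely -- the kind of simple handling that can
    be more robust than heavier feature engineering when code sets
    change frequently.
    """
    global_mean = sum(targets) / len(targets)
    sums, counts = defaultdict(float), defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    encoding = {
        cat: (sums[cat] + smoothing * global_mean) / (counts[cat] + smoothing)
        for cat in counts
    }
    return lambda cat: encoding.get(cat, global_mean)

encode = fit_target_encoding(["a", "a", "b", "b", "b"], [1, 1, 0, 0, 1])
print(round(encode("a"), 3))   # shrunk toward the global mean of 0.6
print(round(encode("zzz"), 3)) # unseen code -> 0.6
```

The encoded column would then be passed to XGBoost as an ordinary numeric feature.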

Topics: Python / XGBoost / Neural Networks / Target Encoding / Deep Graph Infomax

In which I test a proposed method for encoding categorical features for machine learning models, e.g. XGBoost and neural networks, and discover tradeoffs between performance and robustness, which suggest alternative ways to approach these features.

Read the article at TDS or Medium.

Topics: Python / XGBoost / Model Explainability / SHAP / ALE

A deep dive into why two popular model explainability methods, SHAP and ALE, show opposite results on a public dataset. This leads me to argue that model "explainers" don't produce "explanations"; instead, these methods should be viewed as diagnostic tests that must be interpreted thoughtfully.

Read the article at TDS or Medium.
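A bare-bones first-order ALE computation can be sketched in pure Python (an illustrative toy, not the article's code; real analyses would use a dedicated explainability package):

```python
def ale_1d(predict, X, feature, n_bins=5):
    """Accumulated Local Effects for one feature of a tabular model.

    Within each bin of the feature's values, average the change in
    prediction when the feature moves from the bin's lower edge to its
    upper edge (other features held fixed), then accumulate the local
    effects and center them to average zero.
    """
    values = sorted(row[feature] for row in X)
    # Bin edges at empirical quantiles of the feature
    edges = [values[int(i * (len(values) - 1) / n_bins)] for i in range(n_bins + 1)]
    effects = []
    for lo, hi in zip(edges, edges[1:]):
        in_bin = [row for row in X if lo <= row[feature] <= hi]
        if not in_bin:
            effects.append(0.0)
            continue
        diffs = [predict(dict(r, **{feature: hi})) - predict(dict(r, **{feature: lo}))
                 for r in in_bin]
        effects.append(sum(diffs) / len(diffs))
    ale = [0.0]
    for e in effects:
        ale.append(ale[-1] + e)
    mean = sum(ale) / len(ale)
    return edges, [a - mean for a in ale]

# Sanity check on a linear model: the ALE along x1 should have slope 2
X = [{"x1": i / 10, "x2": (i * 7 % 10) / 10} for i in range(11)]
edges, ale = ale_1d(lambda r: 2 * r["x1"] + 5 * r["x2"], X, "x1", n_bins=5)
```

Unlike partial dependence, ALE only evaluates the model at feature values near those actually observed in each bin, which is part of why it can disagree with SHAP on correlated features.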

Topics: Python / Logistic Regression / Fairness Metrics

A demonstration that common model fairness metrics can differ by group when the groups have different age profiles, even if everything else is the same. I argue that metrics alone are not sufficient for assessing fairness; it's also necessary to understand the reasons underlying the outcomes.

Read the article at TDS or Medium.
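The kind of per-group comparison involved can be sketched in a few lines (hypothetical data, not the article's), computing two common fairness quantities:

```python
def group_rates(y_true, y_pred, groups):
    """Selection rate and false positive rate per group.

    Gaps in these rates across groups underlie common fairness
    metrics (demographic parity, equalized-odds components) -- but a
    gap alone doesn't tell you *why* it exists.
    """
    out = {}
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        neg = [i for i in idx if y_true[i] == 0]
        out[g] = {
            "selection_rate": sum(y_pred[i] for i in idx) / len(idx),
            "false_positive_rate": (
                sum(y_pred[i] for i in neg) / len(neg) if neg else float("nan")
            ),
        }
    return out

y_true = [0, 0, 1, 1, 0, 0, 1, 1]
y_pred = [0, 1, 1, 1, 0, 0, 0, 1]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
rates = group_rates(y_true, y_pred, groups)
```

Here group A has both a higher selection rate and a higher false positive rate; the article's point is that such gaps can arise from legitimate covariates (like age) rather than from unfair treatment.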

Topics: R / XGBoost / Random Forest / Fairness Metrics / SHAP

Part of a series presented at the ROC Data Science Meetup. The article argues that it's important to be especially careful about model bias and fairness when key information is missing from your models. Relying on many weak predictors increases bias risk.

Read the article at TDS or Medium.

Topics: R / XGBoost / Random Forest / Fairness Metrics / SHAP / Feature Bias

Part of a talk presented at the ROC Data Science Meetup. Explores some ways to mitigate feature bias that leads to unfair outcomes in a machine learning model.

Read the article at TDS or Medium.

Topics: R / XGBoost / Random Forest / Fairness Metrics / SHAP / Feature Bias

Part of a talk presented at the ROC Data Science Meetup. Argues that we can't assume that machine learning models will "automatically" incorporate interactions to compensate for feature bias.

Read the article at TDS or Medium.

Topics: R / XGBoost / Fairness Metrics / SHAP

Part of a talk presented at the ROC Data Science Meetup. Demonstrates that the same fairness metric results can be seen whether a model is basing its decision on a sensitive feature, or on a legitimate predictor correlated with that feature.

Read the article at TDS or Medium.

Other

A few other miscellaneous repositories are also here.

Topics: R / Personal Data / Baby Sleep / Data Visualization / ANOVA / Autocorrelation

Code for a talk presented at the ROC Data Science Meetup. After my kid was born, we tracked his sleep data for ~2 years. He was a tough baby, so I read all the official parenting guides, which described what we "should" expect from "normal" children. The data gave me an opportunity to quantify my child's abnormality, but I also obtained similar data sets from other parents and found that no baby was really normal. Is this selection bias, or are these expert guidelines mostly nonsense? Who knows, but it was an interesting personal data project. I now have even more data from another child and may revisit this topic in the future...

Topics: Azure / Hyperdrive / Parquet / ParallelRunStep

I ran into bugs while training and deploying a machine learning model at scale with Azure. The best way to get support was to write short programs reproducing the issues. This repository contains that code; luckily, most of the issues have since been fixed.

Popular repositories

  1. Blog_gnn_naics (Jupyter Notebook, 4 stars)

    Exploring categorical features with various encodings and models

  2. Blog_naics_nn (Jupyter Notebook, 2 stars)

    Investigations of stochastic regularization for entity embeddings in neural network models, including visualizations of embeddings

  3. Stereotyping_ROCDS (R, 1 star)

    Examines fairness metrics for models, including gender stereotyping versus group differences due to appropriate predictors. Also explores feature bias mitigation

  4. Blog_interactions (Jupyter Notebook, 1 star)

    Comparisons of methods used to measure model interactions

  5. Baby_Sleep (R, 1 star)

    Infant sleep tracking, visualizations, and analytics

  6. Azure_notes (Jupyter Notebook, 3 stars)

    Files related to Azure questions and issues