diff --git a/README.md b/README.md
index fc99055..aed82c1 100644
--- a/README.md
+++ b/README.md
@@ -270,7 +270,7 @@ We're excited to see so many articles popping up on data ethics! The short list
 
 To make the ideas contained in the checklist more concrete, we've compiled [examples](http://deon.drivendata.org/examples/) of times when things have gone wrong. They're paired with the checklist questions to help illuminate where in the process ethics discussions may have helped provide a course correction.
 
-We welcome contributions! Follow [these instructions](https://github.com/drivendataorg/deon/wiki/Add-a-new-item-to-the-examples-table) to add an example.
+We welcome contributions! Follow [these instructions](https://github.com/drivendataorg/deon/blob/main/CONTRIBUTING.md) to add an example.
 
 ## Related tools
 
diff --git a/deon/assets/examples_of_ethical_issues.yml b/deon/assets/examples_of_ethical_issues.yml
index 653a248..a9fad18 100644
--- a/deon/assets/examples_of_ethical_issues.yml
+++ b/deon/assets/examples_of_ethical_issues.yml
@@ -4,6 +4,8 @@
     url: https://techcrunch.com/2018/09/27/yes-facebook-is-using-your-2fa-phone-number-to-target-you-with-ads/
   - text: African-American men were enrolled in the Tuskegee Study on the progression of syphilis without being told the true purpose of the study or that treatment for syphilis was being withheld.
     url: https://en.wikipedia.org/wiki/Tuskegee_syphilis_experiment
+  - text: OpenAI's ChatGPT memorized and regurgitated entire poems without checking for copyright permissions.
+    url: https://news.cornell.edu/stories/2024/01/chatgpt-memorizes-and-spits-out-entire-poems
 - line_id: A.2
   links:
   - text: StreetBump, a smartphone app to passively detect potholes, may fail to direct public resources to areas where smartphone penetration is lower, such as lower income areas or areas with a larger elderly population.
@@ -22,8 +24,6 @@
     url: https://www.bloomberg.com/graphics/2016-amazon-same-day/
   - text: Facial recognition software is significantly worse at identifying people with darker skin.
     url: https://www.theregister.co.uk/2018/02/13/facial_recognition_software_is_better_at_white_men_than_black_women/
-  - text: -- Related academic study.
-    url: http://proceedings.mlr.press/v81/buolamwini18a.html
 - line_id: B.1
   links:
   - text: Personal and financial data for more than 146 million people was stolen in Equifax data breach.
@@ -46,14 +46,8 @@
     url: https://www.theverge.com/2014/9/25/6844021/apple-promised-an-expansive-health-app-so-why-cant-i-track
 - line_id: C.2
   links:
-  - text: A widely used commercial algorithm in the healthcare industry underestimates the care needs of black patients, assigning them lower risk scores compared to equivalently sick white patients.
-    url: https://www.nature.com/articles/d41586-019-03228-6
-  - text: -- Related academic study.
-    url: https://science.sciencemag.org/content/366/6464/447
   - text: word2vec, trained on Google News corpus, reinforces gender stereotypes.
     url: https://www.technologyreview.com/s/602025/how-vector-space-mathematics-reveals-the-hidden-sexism-in-language/
-  - text: -- Related academic study.
-    url: https://arxiv.org/abs/1607.06520
   - text: Women are more likely to be shown lower-paying jobs than men in Google ads.
     url: https://www.theguardian.com/technology/2015/jul/08/women-less-likely-ads-high-paid-jobs-google-study
 - line_id: C.3
@@ -72,6 +66,8 @@
     url: https://www.bbc.com/news/magazine-22223190
 - line_id: D.1
   links:
+  - text: In hypothetical trials, language models assign the death penalty more frequently to defendants who use African American dialects.
+    url: https://arxiv.org/abs/2403.00742
   - text: Variables used to predict child abuse and neglect are direct measurements of poverty, unfairly targeting low-income families for child welfare scrutiny.
     url: https://www.wired.com/story/excerpt-from-automating-inequality/
   - text: Amazon scraps AI recruiting tool that showed bias against women.
@@ -98,12 +94,16 @@
     url: https://www.technologyreview.com/s/510646/racism-is-poisoning-online-ad-delivery-says-harvard-professor/
   - text: -- Related academic study.
     url: https://arxiv.org/abs/1301.6822
+  - text: OpenAI's GPT models show racial bias in ranking job applications based on candidate names.
+    url: https://www.bloomberg.com/graphics/2024-openai-gpt-hiring-racial-discrimination/
 - line_id: D.3
   links:
   - text: Facebook seeks to optimize "time well spent", prioritizing interaction over popularity.
     url: https://www.wired.com/story/facebook-tweaks-newsfeed-to-favor-content-from-friends-family/
   - text: YouTube's search autofill suggests pedophiliac phrases due to high viewership of related videos.
     url: https://gizmodo.com/youtubes-creepy-kid-problem-was-worse-than-we-thought-1820763240
+  - text: A widely used commercial algorithm in the healthcare industry underestimates the care needs of black patients because it optimizes for spending as a proxy for need, introducing racial bias due to unequal access to care.
+    url: https://www.science.org/doi/10.1126/science.aax2342
 - line_id: D.4
   links:
   - text: Patients with pneumonia with a history of asthma are usually admitted to the intensive care unit as they have a high risk of dying from pneumonia. Given the success of the intensive care, neural networks predicted asthmatics had a low risk of dying and could therefore be sent home. Without explanatory models to identify this issue, patients may have been sent home to die.
@@ -128,9 +128,11 @@
   links:
   - text: Google "fixes" racist algorithm by removing gorillas from image-labeling technology.
     url: https://www.theverge.com/2018/1/12/16882408/google-racist-gorillas-photo-recognition-algorithm-ai
-- line_id: E.4
-  links:
   - text: Microsoft's Twitter chatbot Tay quickly becomes racist.
     url: https://www.theguardian.com/technology/2016/mar/24/microsoft-scrambles-limit-pr-damage-over-abusive-ai-bot-tay
+- line_id: E.4
+  links:
+  - text: Generative AI can be exploited to create convincing scams like "virtual kidnapping".
+    url: https://www.trendmicro.com/vinfo/us/security/news/cybercrime-and-digital-threats/how-cybercriminals-can-perform-virtual-kidnapping-scams-using-ai-voice-cloning-tools-and-chatgpt
   - text: Deepfakes—realistic but fake videos generated with AI—span the gamut from celebrity porn to presidential statements.
     url: http://theweek.com/articles/777592/rise-deepfakes
diff --git a/docs/docs/examples.md b/docs/docs/examples.md
index 0cd51fa..779c881 100644
--- a/docs/docs/examples.md
+++ b/docs/docs/examples.md
@@ -7,28 +7,28 @@ To make the ideas contained in the checklist more concrete, we've compiled examp
 Checklist Question | Examples of Ethical Issues
 --- | ---
 | **Data Collection**
-**A.1 Informed consent**: If there are human subjects, have they given informed consent, where subjects affirmatively opt-in and have a clear understanding of the data uses to which they consent? | 
+**A.1 Informed consent**: If there are human subjects, have they given informed consent, where subjects affirmatively opt-in and have a clear understanding of the data uses to which they consent? |
 **A.2 Collection bias**: Have we considered sources of bias that could be introduced during data collection and survey design and taken steps to mitigate those? |
 **A.3 Limit PII exposure**: Have we considered ways to minimize exposure of personally identifiable information (PII) for example through anonymization or not collecting information that isn't relevant for analysis? |
-**A.4 Downstream bias mitigation**: Have we considered ways to enable testing downstream results for biased outcomes (e.g., collecting data on protected group status like race or gender)? | 
+**A.4 Downstream bias mitigation**: Have we considered ways to enable testing downstream results for biased outcomes (e.g., collecting data on protected group status like race or gender)? |
 | **Data Storage**
 **B.1 Data security**: Do we have a plan to protect and secure data (e.g., encryption at rest and in transit, access controls on internal users and third parties, access logs, and up-to-date software)? |
 **B.2 Right to be forgotten**: Do we have a mechanism through which an individual can request their personal information be removed? |
 **B.3 Data retention plan**: Is there a schedule or plan to delete the data after it is no longer needed? |
 | **Analysis**
 **C.1 Missing perspectives**: Have we sought to address blindspots in the analysis through engagement with relevant stakeholders (e.g., checking assumptions and discussing implications with affected communities and subject matter experts)? |
-**C.2 Dataset bias**: Have we examined the data for possible sources of bias and taken steps to mitigate or address these biases (e.g., stereotype perpetuation, confirmation bias, imbalanced classes, or omitted confounding variables)? | 
+**C.2 Dataset bias**: Have we examined the data for possible sources of bias and taken steps to mitigate or address these biases (e.g., stereotype perpetuation, confirmation bias, imbalanced classes, or omitted confounding variables)? |
 **C.3 Honest representation**: Are our visualizations, summary statistics, and reports designed to honestly represent the underlying data? |
 **C.4 Privacy in analysis**: Have we ensured that data with PII are not used or displayed unless necessary for the analysis? |
 **C.5 Auditability**: Is the process of generating the analysis well documented and reproducible if we discover issues in the future? |
 | **Modeling**
-**D.1 Proxy discrimination**: Have we ensured that the model does not rely on variables or proxies for variables that are unfairly discriminatory? | 
-**D.2 Fairness across groups**: Have we tested model results for fairness with respect to different affected groups (e.g., tested for disparate error rates)? | 
-**D.3 Metric selection**: Have we considered the effects of optimizing for our defined metrics and considered additional metrics? | 
+**D.1 Proxy discrimination**: Have we ensured that the model does not rely on variables or proxies for variables that are unfairly discriminatory? |
+**D.2 Fairness across groups**: Have we tested model results for fairness with respect to different affected groups (e.g., tested for disparate error rates)? |
+**D.3 Metric selection**: Have we considered the effects of optimizing for our defined metrics and considered additional metrics? |
 **D.4 Explainability**: Can we explain in understandable terms a decision the model made in cases where a justification is needed? |
 **D.5 Communicate bias**: Have we communicated the shortcomings, limitations, and biases of the model to relevant stakeholders in ways that can be generally understood? |
 | **Deployment**
 **E.1 Monitoring and evaluation**: How are we planning to monitor the model and its impacts after it is deployed (e.g., performance monitoring, regular audit of sample predictions, human review of high-stakes decisions, reviewing downstream impacts of errors or low-confidence decisions, testing for concept drift)? |
 **E.2 Redress**: Have we discussed with our organization a plan for response if users are harmed by the results (e.g., how does the data science team evaluate these cases and update analysis and models to prevent future harm)? |
-**E.3 Roll back**: Is there a way to turn off or roll back the model in production if necessary? | 
+**E.3 Roll back**: Is there a way to turn off or roll back the model in production if necessary? |
-**E.4 Unintended use**: Have we taken steps to identify and prevent unintended uses and abuse of the model and do we have a plan to monitor these once the model is deployed? | 
+**E.4 Unintended use**: Have we taken steps to identify and prevent unintended uses and abuse of the model and do we have a plan to monitor these once the model is deployed? |
diff --git a/docs/docs/index.md b/docs/docs/index.md
index 1d9dedf..5c15871 100644
--- a/docs/docs/index.md
+++ b/docs/docs/index.md
@@ -263,7 +263,7 @@ We're excited to see so many articles popping up on data ethics! The short list
 
 To make the ideas contained in the checklist more concrete, we've compiled [examples](http://deon.drivendata.org/examples/) of times when things have gone wrong. They're paired with the checklist questions to help illuminate where in the process ethics discussions may have helped provide a course correction.
 
-We welcome contributions! Follow [these instructions](https://github.com/drivendataorg/deon/wiki/Add-a-new-item-to-the-examples-table) to add an example.
+We welcome contributions! Follow [these instructions](https://github.com/drivendataorg/deon/blob/main/CONTRIBUTING.md) to add an example.
 
 ## Related tools
 
diff --git a/docs/md_templates/_common_body.tpl b/docs/md_templates/_common_body.tpl
index b107c78..c46a089 100644
--- a/docs/md_templates/_common_body.tpl
+++ b/docs/md_templates/_common_body.tpl
@@ -199,7 +199,7 @@ We're excited to see so many articles popping up on data ethics! The short list
 
 To make the ideas contained in the checklist more concrete, we've compiled [examples](http://deon.drivendata.org/examples/) of times when things have gone wrong. They're paired with the checklist questions to help illuminate where in the process ethics discussions may have helped provide a course correction.
 
-We welcome contributions! Follow [these instructions](https://github.com/drivendataorg/deon/wiki/Add-a-new-item-to-the-examples-table) to add an example.
+We welcome contributions! Follow [these instructions](https://github.com/drivendataorg/deon/blob/main/CONTRIBUTING.md) to add an example.
 
 ## Related tools
 
diff --git a/docs/md_templates/_common_body_pt-BR.tpl b/docs/md_templates/_common_body_pt-BR.tpl
index 3fee1fe..a0f246b 100644
--- a/docs/md_templates/_common_body_pt-BR.tpl
+++ b/docs/md_templates/_common_body_pt-BR.tpl
@@ -175,7 +175,7 @@ Estamos entusiasmados de ver tantos artigos surgindo sobre ética de dados! A cu
 
 Para tornar as ideias contidas na checklist mais concretas, compilamos [exemplos](http://deon.drivendata.org/examples/) de situações em que as coisas deram errado. Estão vinculadas a questões da checklist para ajudar a iluminar onde no processo as discussões éticas poderiam ter ajudado a criar uma correção no curso.
 
-Nós aceitamos contribuições! Siga [estas instruções](https://github.com/drivendataorg/deon/wiki/Add-a-new-item-to-the-examples-table) para acrescentar um exemplo.
+Nós aceitamos contribuições! Siga [estas instruções](https://github.com/drivendataorg/deon/blob/main/CONTRIBUTING.md) para acrescentar um exemplo.
 
 ## Ferramentas relacionadas
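
Note for contributors: as the hunks above show, each entry in `deon/assets/examples_of_ethical_issues.yml` keys a list of links to one checklist question via its `line_id`, and the docs table is regenerated from that file. Below is a minimal sketch of a new entry following that structure; the summary text and URL are hypothetical placeholders, and CONTRIBUTING.md remains the authoritative guide.

```yaml
# Sketch of one entry in deon/assets/examples_of_ethical_issues.yml,
# mirroring the structure of the existing entries in the diff above.
- line_id: E.4    # checklist question this example illustrates
  links:
  - text: One-sentence description of what went wrong.  # placeholder summary
    url: https://example.com/reporting-on-the-incident  # placeholder source link
```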