Contingency Tables
Contingency tables show the observed counts of all possible combinations of outcomes in a discrete, bivariate sample. They are the most common analysis tool for such samples, and arise often in clinical trials and classifiers.
Suppose we have developed a test for some underlying condition. The result of the test is either P or N. The subject either has the underlying condition (True) or does not (False). Here is the data:
Test \ Actual | True | False |
---|---|---|
P | 35 | 65 |
N | 4 | 896 |
You can build the corresponding contingency table in code very easily:
using System;
using Meta.Numerics.Statistics;
ContingencyTable<string, bool> contingency = new ContingencyTable<string, bool>(
new string[] { "P", "N" }, new bool[] { true, false }
);
contingency["P", true] = 35;
contingency["P", false] = 65;
contingency["N", true] = 4;
contingency["N", false] = 896;
Pass the constructor two lists containing the distinct values that will serve as the row and column labels, then set each count through the indexer.
If you have the data columns, just give them to the Bivariate.Crosstabs method and it will compute your contingency table. For example:
using System.Collections.Generic;
IReadOnlyList<string> x = new string[] { "N", "P", "N", "N", "P", "N", "N", "N", "P" };
IReadOnlyList<bool> y = new bool[] { false, false, false, true, true, false, false, false, true };
ContingencyTable<string, bool> contingencyFromLists = Bivariate.Crosstabs(x, y);
Notice that both the constructor and the Crosstabs method take lists of objects, so don't confuse them. The constructor takes lists of distinct values that form the row and column labels, and initializes all counts to zero. The Crosstabs method takes (usually much longer) lists of paired measurements (which typically include repeats). It extracts the row and column labels and computes the counts.
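To make the distinction concrete, here is a minimal sketch of the kind of pair counting that a cross-tabulation performs. It uses only the .NET base class library. `CrosstabSketch` and `CountPairs` are hypothetical names invented for this illustration and are not part of Meta.Numerics; the library's own implementation may well differ.

```csharp
using System;
using System.Collections.Generic;

public static class CrosstabSketch {

    // Hypothetical helper, not part of Meta.Numerics: counts how often each
    // (row value, column value) pair occurs in two lists of paired measurements.
    public static Dictionary<(R, C), int> CountPairs<R, C>(
        IReadOnlyList<R> rowValues, IReadOnlyList<C> columnValues
    ) {
        if (rowValues.Count != columnValues.Count) {
            throw new ArgumentException("The two lists must have the same length.");
        }
        var counts = new Dictionary<(R, C), int>();
        for (int i = 0; i < rowValues.Count; i++) {
            var key = (rowValues[i], columnValues[i]);
            // TryGetValue leaves current at 0 when the pair has not been seen yet.
            counts.TryGetValue(key, out int current);
            counts[key] = current + 1;
        }
        return counts;
    }
}
```

Calling `CrosstabSketch.CountPairs<string, bool>(x, y)` on the nine paired values above yields four distinct pairs whose counts sum to 9, with the pair ("N", false) occurring 5 times.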
Row, column, and overall totals are just as easy to obtain:
foreach (string row in contingency.Rows) {
    Console.WriteLine($"Total count of {row}: {contingency.RowTotal(row)}");
}
foreach (bool column in contingency.Columns) {
    Console.WriteLine($"Total count of {column}: {contingency.ColumnTotal(column)}");
}
Console.WriteLine($"Total counts: {contingency.Total}");
Notice that, because our ContingencyTable type is generic with row and column type parameters, its methods accept typed arguments to refer to row and column labels. This makes code clearer and less error-prone.
using Meta.Numerics;
foreach (string row in contingency.Rows) {
    UncertainValue probability = contingency.ProbabilityOfRow(row);
    Console.WriteLine($"Estimated probability of {row}: {probability}");
}
foreach (bool column in contingency.Columns) {
    UncertainValue probability = contingency.ProbabilityOfColumn(column);
    Console.WriteLine($"Estimated probability of {column}: {probability}");
}
Notice that you are not just given best estimates, but also error bars. (The best estimates are just what you would expect: the fraction of the total count represented by each row and column total. The computation of the error bars is more complicated, but Meta.Numerics handles it for you.)
UncertainValue sensitivity = contingency.ProbabilityOfRowConditionalOnColumn("P", true);
Console.WriteLine($"Chance of P result given true condition: {sensitivity}");
UncertainValue precision = contingency.ProbabilityOfColumnConditionalOnRow(true, "P");
Console.WriteLine($"Chance of true condition given P result: {precision}");
Notice that our example exhibits a very common characteristic of tests for rare conditions: even though the test is sensitive (i.e. has a low chance of giving the wrong result for a given condition), it is not precise (i.e. given a P result, there is nonetheless a high chance that the condition is not actually present). The confusion of these two conditional probabilities is infamously common, and is usually called the Prosecutor's fallacy. At the price of a couple of long method names, our API clearly distinguishes between them.
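The difference between the two conditional probabilities comes down to which total you divide by. The sketch below works out the point estimates by hand from the example counts. `ConditionalSketch` and its members are hypothetical names for this illustration, not part of Meta.Numerics, and the sketch deliberately ignores the error bars that the library's UncertainValue results carry.

```csharp
using System;

public static class ConditionalSketch {

    // Counts from the example table: rows are test results, columns are actual conditions.
    private const double PTrue = 35.0;
    private const double PFalse = 65.0;
    private const double NTrue = 4.0;

    // Chance of a P result given a true condition: divide by the True column total.
    public static double Sensitivity() => PTrue / (PTrue + NTrue);

    // Chance of a true condition given a P result: divide by the P row total.
    public static double Precision() => PTrue / (PTrue + PFalse);
}
```

For the example data, the sensitivity is 35/39, or about 0.90, while the precision is only 35/100 = 0.35.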
The canonical way to test for an association between discrete variables is the Pearson chi squared test, and Meta.Numerics can do that for you:
TestResult pearson = contingency.PearsonChiSquaredTest();
Console.WriteLine($"Pearson χ² = {pearson.Statistic.Value} has P = {pearson.Probability}.");
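For readers who want to see what the statistic measures, here is a minimal sketch of the textbook Pearson chi-squared computation: compare each observed count with the count expected if rows and columns were independent. `PearsonSketch` and `ChiSquared` are hypothetical names for this illustration, not part of Meta.Numerics. The sketch applies no continuity correction, and whether the library's result matches it exactly is an assumption.

```csharp
using System;

public static class PearsonSketch {

    // Hypothetical helper, not part of Meta.Numerics: computes the textbook
    // Pearson chi-squared statistic for a table of observed counts.
    public static double ChiSquared(double[,] observed) {
        int rows = observed.GetLength(0);
        int columns = observed.GetLength(1);

        // Accumulate the row totals, column totals, and grand total.
        double[] rowTotals = new double[rows];
        double[] columnTotals = new double[columns];
        double total = 0.0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < columns; j++) {
                rowTotals[i] += observed[i, j];
                columnTotals[j] += observed[i, j];
                total += observed[i, j];
            }
        }

        // Sum (observed - expected)^2 / expected over all cells, where the
        // expected count under independence is rowTotal * columnTotal / total.
        double chiSquared = 0.0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < columns; j++) {
                double expected = rowTotals[i] * columnTotals[j] / total;
                double difference = observed[i, j] - expected;
                chiSquared += difference * difference / expected;
            }
        }
        return chiSquared;
    }
}
```

For the example table, `PearsonSketch.ChiSquared(new double[,] { { 35, 65 }, { 4, 896 } })` comes to about 286.7 with one degree of freedom, far out in the tail of the chi-squared distribution.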
If some cell entries are small, as is the case with our example data, the assumptions of Pearson's test will not be well-fulfilled, and it is better to use Fisher's exact test. For 2X2 tables like this one, Meta.Numerics can do that for you too:
TestResult fisher = contingency.Binary.FisherExactTest();
Console.WriteLine($"Fisher exact test has P = {fisher.Probability}.");
For any half-decent classifier (or any decent treatment in a clinical trial), there will be a statistically significant association between row and column values, so the P-values will be tiny.
The odds ratio, or its log, is the usual way to quantify the degree of association for a 2X2 table. Meta.Numerics can compute it for you:
UncertainValue logOddsRatio = contingency.Binary.LogOddsRatio;
Console.WriteLine($"log(r) = {logOddsRatio}");
Notice that we give you an estimate with uncertainty. For any table with a statistically significant association between rows and columns, the error bars should exclude 0 for the log odds ratio.
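As a rough cross-check, the log odds ratio of a 2X2 table can be worked out by hand. The sketch below uses the textbook large-sample standard error, the square root of the sum of the reciprocal cell counts. `OddsRatioSketch` and its methods are hypothetical names for this illustration, not part of Meta.Numerics, and it is an assumption that the library's uncertainty is computed in a comparable way.

```csharp
using System;

public static class OddsRatioSketch {

    // For a 2X2 table with first row (a, b) and second row (c, d),
    // the odds ratio is (a * d) / (b * c).
    public static double LogOddsRatio(double a, double b, double c, double d) {
        return Math.Log((a * d) / (b * c));
    }

    // Textbook large-sample standard error of the log odds ratio.
    // Assumption: this is only an approximation and needs all four counts to be nonzero.
    public static double StandardError(double a, double b, double c, double d) {
        return Math.Sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d);
    }
}
```

For the example counts (35, 65, 4, 896), this gives a log odds ratio of about 4.79 with a standard error of about 0.54, so zero lies many standard errors away from the estimate.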
Tables larger than 2X2 are no problem. You can construct and manipulate them in exactly the same ways. The only difference is that the Binary property, which is used to access the API surface specific to 2X2 tables, is not available. If you try to access the Binary property on a non-binary table, you will get an InvalidOperationException.