
Merge pull request #29 from Mattehub/main
adding TC, DTC, Sinfo, link to class
EtienneCmb authored Nov 23, 2023
2 parents e7d4196 + b1a2d95 commit 9206e9f
Showing 3 changed files with 93 additions and 16 deletions.
2 changes: 1 addition & 1 deletion docs/overview/index.rst
Expand Up @@ -4,7 +4,7 @@ Overview
This section contains references and mathematical background about information theory and HOI

.. toctree::
:maxdepth: 4

ovw_theory
ovw_refs
Expand Down
77 changes: 62 additions & 15 deletions docs/overview/ovw_theory.rst
Expand Up @@ -9,7 +9,7 @@ Core information theoretic measures
In this section, we delve into some fundamental information-theoretic measures, such as Shannon entropy and mutual information (MI), and their applications in the study of pairwise interactions. Besides playing a crucial role in various fields such as data science and machine learning, these measures, as we will see in the following parts, serve as the building blocks for quantifying information and interactions between variables at higher orders.

Measuring Entropy
*****************

Shannon entropy is a fundamental concept in IT, representing the amount of uncertainty or disorder in a random variable :cite:`shannon1948mathematical`. Its standard definition for a discrete random variable :math:`X`, with probability mass function :math:`P(X)`, is given by:

Expand All @@ -29,7 +29,7 @@ A more complicated and common scenario is the one of continuous variables. To es
Note that all the functions mentioned in the following parts are based on the computation of entropies, hence we advise care in the choice of the estimator.
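
As an illustration, the sketch below computes a plug-in estimate of the Shannon entropy of a discrete sample directly from its definition, :math:`H(X) = -\sum_{x} P(x) \log_2 P(x)`. The ``entropy_bits`` helper is purely illustrative: it only covers the discrete case and does not replace the continuous-data estimators discussed above.

.. code-block:: python

    import numpy as np

    def entropy_bits(x):
        """Plug-in Shannon entropy (in bits) of a discrete sample."""
        _, counts = np.unique(x, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    x = np.random.randint(0, 4, size=10_000)  # roughly uniform over 4 symbols
    print(entropy_bits(x))                    # close to log2(4) = 2 bits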

Measuring Mutual Information (MI)
*********************************

One of the most widely used functions in the study of pairwise interactions is the mutual information (MI), which quantifies the statistical dependence or information shared between two random variables :cite:`shannon1948mathematical, watanabe1960information`. It is defined mathematically using the concept of entropies. For two random variables :math:`X` and :math:`Y`, MI is given by:

Expand All @@ -43,37 +43,84 @@ Where:
:math:`H(X,Y)` is the joint entropy of :math:`X` and :math:`Y`.
MI between two variables quantifies how much knowing one variable reduces the uncertainty about the other, and measures the interdependency between the two variables. If they are independent, we have :math:`H(X,Y)=H(X)+H(Y)`, hence :math:`MI(X,Y)=0`. Since the MI can be reduced to a signed sum of entropies, the problem of estimating MI from continuous data comes back to the problem, discussed above, of estimating entropies. An estimator that has been recently developed, and that presents interesting properties when computing the MI, is the Gaussian copula estimator :cite:`ince2017statistical`. This estimator is based on the statistical theory of copulas and is proven to provide a lower bound to the real value of MI; this is one of its main advantages: when computing MI, the Gaussian copula estimator avoids false positives. Pay attention to the fact that it is mainly suited to investigating monotonic relationships between two variables.
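
As a minimal illustration of the entropy decomposition above, the sketch below estimates the MI between two discrete variables from plug-in entropies; the ``entropy_bits`` helper is illustrative only, and for continuous data one would instead rely on a dedicated estimator such as the Gaussian copula one.

.. code-block:: python

    import numpy as np

    def entropy_bits(*cols):
        """Plug-in joint Shannon entropy (in bits) of discrete samples."""
        joint = np.stack(cols, axis=1)
        _, counts = np.unique(joint, axis=0, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    x = np.random.randint(0, 2, size=10_000)
    y = x ^ (np.random.rand(10_000) < 0.1)   # y is x with ~10% of the bits flipped
    mi = entropy_bits(x) + entropy_bits(y) - entropy_bits(x, y)
    print(mi)                                # close to 0.53 bits for this noisy copy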

From pairwise to higher-order interactions
++++++++++++++++++++++++++++++++++++++++++

The information-theoretic metrics involved in this work are all based, in principle, on the concepts of Shannon entropy and mutual information. Given a set of variables, a common approach to investigate their interactions is to compare the entropy and the information of the joint probability distribution of the whole set with those of its subsets. This can be done in many different ways, unveiling different aspects of HOIs :cite:`timme2014synergy, varley2023information`. The metrics implemented in the toolbox can be divided into two main categories: one group measures the interaction behaviour prevailing within a set of variables (network behaviour), while the other focuses on the relationship between a set of source variables and a target one (network encoding). In the following parts we go through all the metrics implemented in the toolbox, providing some insights into their theoretical foundations and possible interpretations.

O-information
Network behaviour
*****************

Total correlation
-----------------

Total correlation, :class:`hoi.metrics.TC`, is the oldest extension of mutual information to
an arbitrary number of variables :cite:`watanabe1960information, studeny1998multiinformation`. It is defined as:

.. math::

    TC(X^{n}) = \sum_{j=1}^{n} H(X_{j}) - H(X^{n})

The total correlation quantifies the strength of the collective constraints ruling the system; it is sensitive to information shared between single variables and can be associated with redundancy.
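
For intuition, the sketch below evaluates this definition on a toy discrete system in which one variable is a copy of another; the shared bit appears directly in the total correlation (the ``entropy_bits`` helper is an illustrative plug-in estimator for discrete data only).

.. code-block:: python

    import numpy as np

    def entropy_bits(*cols):
        """Plug-in joint Shannon entropy (in bits) of discrete samples."""
        joint = np.stack(cols, axis=1)
        _, counts = np.unique(joint, axis=0, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    n_samples = 10_000
    x1 = np.random.randint(0, 2, n_samples)
    x2 = x1.copy()                            # fully redundant copy of x1
    x3 = np.random.randint(0, 2, n_samples)   # independent variable
    tc = sum(entropy_bits(x) for x in (x1, x2, x3)) - entropy_bits(x1, x2, x3)
    print(tc)                                 # ~1 bit: the constraint shared by x1 and x2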


Dual Total correlation
----------------------

Dual total correlation, :class:`hoi.metrics.DTC`, is another extension of mutual information to
an arbitrary number of variables, also known as binding information or excess entropy :cite:`sun1975linear`. It quantifies the part of the joint entropy that is shared by at least two variables, in the following way:

.. math::

    DTC(X^{n}) &= H(X^{n}) - \sum_{j=1}^{n} H(X_j|X_{-j}^{n}) \\
               &= \sum_{j=1}^{n} H(X_{-j}^{n}) - (n-1)H(X^{n})

where :math:`H(X_j|X_{-j}^{n})` is the part of the entropy of :math:`X_j` that is not shared with any other variable. This measure is higher in systems in which lower-order constraints prevail.
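
The two expressions above are algebraically equivalent; the sketch below checks this numerically on the same kind of toy discrete system, again with an illustrative plug-in entropy estimator.

.. code-block:: python

    import numpy as np

    def entropy_bits(*cols):
        """Plug-in joint Shannon entropy (in bits) of discrete samples."""
        joint = np.stack(cols, axis=1)
        _, counts = np.unique(joint, axis=0, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    xs = [np.random.randint(0, 2, 10_000) for _ in range(3)]
    xs[1] = xs[0].copy()                      # make the second variable a copy of the first
    n = len(xs)
    h_joint = entropy_bits(*xs)
    h_minus = [entropy_bits(*(xs[:j] + xs[j + 1:])) for j in range(n)]  # H(X_{-j})
    # H(X_j | X_{-j}) = H(X^n) - H(X_{-j})
    dtc_1 = h_joint - sum(h_joint - h for h in h_minus)
    dtc_2 = sum(h_minus) - (n - 1) * h_joint
    print(dtc_1, dtc_2)                       # the two expressions coincide (~1 bit here)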

S-information
-------------

The S-information (also called exogenous information), :class:`hoi.metrics.Sinfo`, is defined
as the sum of the total correlation (TC) and the dual total correlation (DTC)
:cite:`james2011anatomy`:

.. math::

    \Sigma(X^{n}) &= TC(X^{n}) + DTC(X^{n}) \\
                  &= -nH(X^{n}) + \sum_{j=1}^{n} \left[ H(X_{j}) + H(X_{-j}^{n}) \right]

It is sensitive to both redundancy and synergy, quantifying the total amount of constraints ruling the system under study.
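
Since the S-information is simply the sum of the two previous quantities, it can be checked with the same illustrative plug-in entropies; the sketch below does so on a purely synergistic (XOR) triplet.

.. code-block:: python

    import numpy as np

    def entropy_bits(*cols):
        """Plug-in joint Shannon entropy (in bits) of discrete samples."""
        joint = np.stack(cols, axis=1)
        _, counts = np.unique(joint, axis=0, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    xs = [np.random.randint(0, 2, 10_000) for _ in range(3)]
    xs[1] = (xs[0] + xs[2]) % 2               # x2 = XOR(x1, x3): a synergistic triplet
    n, h_joint = len(xs), entropy_bits(*xs)
    h_single = [entropy_bits(x) for x in xs]
    h_minus = [entropy_bits(*(xs[:j] + xs[j + 1:])) for j in range(n)]
    tc = sum(h_single) - h_joint
    dtc = sum(h_minus) - (n - 1) * h_joint
    s_info = -n * h_joint + sum(hs + hm for hs, hm in zip(h_single, h_minus))
    print(np.isclose(s_info, tc + dtc))       # True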

O-information
-------------

One prominent metric that has emerged in the pursuit of higher-order understanding is the O-information, :class:`hoi.metrics.Oinfo`. Introduced by Rosas et al. in 2019 :cite:`rosas2019oinfo`, the O-information elegantly addresses the challenge of quantifying higher-order dependencies by extending the concept of mutual information. Given a multiplet of :math:`n` variables, :math:`X^n = \{ X_1, X_2, \ldots, X_n \}`, its formal definition is the following:

.. math::

    \Omega(X^n) = (n-2)H(X^n) + \sum_{i=1}^n \left[ H(X_i) - H(X_{-i}^n) \right]

Where :math:`X_{-i}^n` is the set of all the variables in :math:`X^n` apart from :math:`X_i`. The O-information can also be written as the difference between the total correlation and the dual total correlation, and it reflects the balance between higher-order and lower-order constraints among the set of variables of interest. It has been shown to be a proxy of the difference between redundancy and synergy: when the O-information of a set of variables is positive it indicates redundancy; when it is negative, synergy. In particular, when working with big data sets its interpretation can become complicated, since redundant and synergistic interactions may coexist within the same multiplet and only their net balance is reflected.
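
A usage sketch of the class linked above is given below. The call pattern (a constructor taking an array of shape ``(n_samples, n_features)`` and a ``fit`` method scanning multiplet sizes) follows the estimator-style API of the toolbox, but argument names and defaults should be checked against the class documentation.

.. code-block:: python

    import numpy as np
    from hoi.metrics import Oinfo

    x = np.random.rand(200, 7)     # 200 samples, 7 variables
    model = Oinfo(x)
    # one O-information value per multiplet of size 3 to 5,
    # estimated with Gaussian-copula entropies
    o_info = model.fit(minsize=3, maxsize=5, method="gc")
    # positive values: redundancy-dominated multiplets; negative values: synergy-dominated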

Topological information
-----------------------

The topological information :math:`I_k`, :class:`hoi.metrics.InfoTopo`, is a generalization of mutual information to higher orders, introduced to test uniformity and dependence in the data :cite:`baudot2019infotopo`. Its formal definition is the following:

.. math::

    I_{k}(X_{1}; \ldots; X_{k}) = \sum_{i=1}^{k} (-1)^{i - 1} \sum_{I \subset [k]; card(I) = i} H_{i}(X_{I})

Note that :math:`I_2(X,Y) = MI(X,Y)` and that :math:`I_3(X,Y,Z) = \Omega(X,Y,Z)`. As with the O-information, this function can be interpreted in terms of redundancy and synergy: when it is positive, it indicates that the system is dominated by redundancy; when it is negative, by synergy.
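
The alternating sum above can be evaluated by brute force for small :math:`k`; the sketch below does so with illustrative plug-in discrete entropies and checks that :math:`I_2` indeed reduces to the MI.

.. code-block:: python

    import numpy as np
    from itertools import combinations

    def entropy_bits(*cols):
        """Plug-in joint Shannon entropy (in bits) of discrete samples."""
        joint = np.stack(cols, axis=1)
        _, counts = np.unique(joint, axis=0, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    def info_topo(*xs):
        """I_k(X_1; ...; X_k) as the alternating sum of subset entropies."""
        k = len(xs)
        return sum(
            (-1) ** (i - 1)
            * sum(entropy_bits(*(xs[j] for j in idx)) for idx in combinations(range(k), i))
            for i in range(1, k + 1)
        )

    x = np.random.randint(0, 2, 10_000)
    y = x ^ (np.random.rand(10_000) < 0.1)
    mi = entropy_bits(x) + entropy_bits(y) - entropy_bits(x, y)
    print(np.isclose(info_topo(x, y), mi))    # I_2 coincides with the MI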

Network encoding
****************

Gradient of O-information
-------------------------

The O-information gradient, :class:`hoi.metrics.GradientOinfo`, has been developed to study the contribution of one variable, or a set of variables, to the O-information of the whole system :cite:`scagliarini2023gradients`. In this work, we propose to use this metric to investigate the relationship between multiplets of source variables and a target variable. Following the definition of the O-information gradient of order 1, we have:

.. math::
Expand All @@ -84,7 +131,7 @@ This metric does not focus on the O-information of a group of variables, instead
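
A usage sketch is given below, under the assumption that the network-encoding classes accept the target variable as a second argument and otherwise follow the same estimator-style call pattern as the other metrics; the exact signature should be checked against the class documentation.

.. code-block:: python

    import numpy as np
    from hoi.metrics import GradientOinfo

    x = np.random.rand(200, 7)    # source variables, shape (n_samples, n_features)
    y = np.random.rand(200)       # target variable
    model = GradientOinfo(x, y)
    # one value per multiplet of sources, quantifying its contribution
    # to the O-information once the target is included
    grad = model.fit(minsize=2, maxsize=4, method="gc")
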
Redundancy-Synergy index (RSI)
------------------------------

Another metric, proposed by Gal Chechik et al. in 2001 :cite:`chechik2001group`, is the Redundancy-Synergy Index, :class:`hoi.metrics.RSI`, developed as an extension of mutual information and aiming to characterize the statistical interdependencies between a group of variables :math:`X^n` and a target variable :math:`Y` in terms of redundancy and synergy. It is computed as:

.. math::
Expand All @@ -95,13 +142,13 @@ The RSI is designed to measure directly whether the sum of the information provi
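
To build intuition, the sketch below compares the information carried jointly by a pair of sources about a target with the sum of their individual contributions, for a purely synergistic (XOR) target. The sign convention used here (joint MI minus the sum of individual MIs, so that positive values indicate synergy) and the ``mi_bits`` helper are illustrative assumptions; refer to the class documentation for the exact definition implemented in the toolbox.

.. code-block:: python

    import numpy as np

    def entropy_bits(*cols):
        """Plug-in joint Shannon entropy (in bits) of discrete samples."""
        joint = np.stack(cols, axis=1)
        _, counts = np.unique(joint, axis=0, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    def mi_bits(sources, y):
        """MI between a tuple of discrete sources (treated jointly) and a target."""
        return entropy_bits(*sources) + entropy_bits(y) - entropy_bits(*sources, y)

    x1, x2 = np.random.randint(0, 2, (2, 10_000))
    y = x1 ^ x2                               # target determined only by the pair
    rsi = mi_bits((x1, x2), y) - (mi_bits((x1,), y) + mi_bits((x2,), y))
    print(rsi)                                # ~+1 bit: purely synergistic encoding
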
Synergy and redundancy (MMI)
----------------------------

Within the broad research field of IT, a growing body of literature has been produced in the last 20 years about the fascinating concepts of synergy and redundancy. These concepts are well defined in the framework of Partial Information Decomposition, which aims to distinguish the different “types” of information that a set of sources conveys about a target variable. In this framework, the synergy between a set of variables refers to the presence of relationships between the target and the whole group that cannot be seen when the single parts are considered separately. Redundancy instead refers to another phenomenon, in which variables contain copies of the same information about the target. Different definitions of these two concepts have been provided in recent years; in our work we report the simple case of the Minimum Mutual Information (MMI) :cite:`barrett2015exploration`, in which the redundancy, :class:`hoi.metrics.RedundancyMMI`, between a set of :math:`n` variables :math:`X^n = \{ X_1, \ldots, X_n\}` and a target :math:`Y` is defined as:

.. math::

    redundancy(Y, X^{n}) = \min_{1 \leq i \leq n} I \left( Y, X_i \right)
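
With the definition above, the MMI redundancy is simply the smallest of the individual source-target MIs; a minimal sketch with an illustrative plug-in estimator for discrete data:

.. code-block:: python

    import numpy as np

    def entropy_bits(*cols):
        """Plug-in joint Shannon entropy (in bits) of discrete samples."""
        joint = np.stack(cols, axis=1)
        _, counts = np.unique(joint, axis=0, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    def mi_bits(x, y):
        return entropy_bits(x) + entropy_bits(y) - entropy_bits(x, y)

    x1 = np.random.randint(0, 2, 10_000)
    x2 = x1.copy()                            # both sources carry the same information
    y = x1.copy()                             # ... about the target
    red_mmi = min(mi_bits(x, y) for x in (x1, x2))
    print(red_mmi)                            # ~1 bit of fully redundant information
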
When the redundancy is computed in this way, the definition of synergy, :class:`hoi.metrics.SynergyMMI`, follows:

.. math::
Expand Down
30 changes: 30 additions & 0 deletions docs/refs.bib
Expand Up @@ -144,6 +144,36 @@ @article{battiston2021physics
publisher={Nature Publishing Group UK London}
}

@article{james2011anatomy,
title={Anatomy of a bit: Information in a time series observation},
author={James, Ryan G and Ellison, Christopher J and Crutchfield, James P},
journal={Chaos: An Interdisciplinary Journal of Nonlinear Science},
volume={21},
number={3},
year={2011},
publisher={AIP Publishing}
}

@article{sun1975linear,
title={Linear dependence structure of the entropy space},
author={Han, Te Sun},
journal={Information and Control},
volume={29},
number={4},
pages={337--368},
year={1975},
publisher={Elsevier}
}

@incollection{studeny1998multiinformation,
title={The multiinformation function as a tool for measuring stochastic dependence},
author={Studen{\`y}, Milan and Vejnarov{\'a}, Jirina},
booktitle={Learning in Graphical Models},
pages={261--297},
year={1998},
publisher={Springer}
}

@article{battiston2020networks,
title={Networks beyond pairwise interactions: Structure and dynamics},
author={Battiston, Federico and Cencetti, Giulia and Iacopini, Iacopo and Latora, Vito and Lucas, Maxime and Patania, Alice and Young, Jean-Gabriel and Petri, Giovanni},
Expand Down
