diff --git a/docs/overview/index.rst b/docs/overview/index.rst
index 2f2408df..f3b40c7c 100644
--- a/docs/overview/index.rst
+++ b/docs/overview/index.rst
@@ -4,7 +4,7 @@ Overview
 This section contains References and Mathematical background about information-theory and HOI
 
 .. toctree::
-   :maxdepth: 3
+   :maxdepth: 4
 
    ovw_theory
    ovw_refs
diff --git a/docs/overview/ovw_theory.rst b/docs/overview/ovw_theory.rst
index d44a3d56..6d6574c2 100644
--- a/docs/overview/ovw_theory.rst
+++ b/docs/overview/ovw_theory.rst
@@ -9,7 +9,7 @@ Core information theoretic measures
 In this section, we delve into some fundamental information theoretic measures, such as Shannon Entropy and Mutual Information (MI), and their applications in the study of pairwise interactions. Besides the fact that these measures play a crucial role in various fields such as data science and machine learning, as we will see in the following parts, they serve as the building blocks for quantifying information and interactions between variables at higher-orders.
 
 Measuring Entropy
------------------
+*****************
 
 Shannon entropy is a fundamental concept in IT, representing the amount of uncertainty or disorder in a random variable :cite:`shannon1948mathematical`. Its standard definition for a discrete random variable :math:`X`, with probability mass function :math:`P(X)`, is given by:
@@ -29,7 +29,7 @@ A more complicated and common scenario is the one of continuous variables. To es
 Note that all the functions mentioned in the following part are based on the computation of entropies, hence we advise care in the choice of the estimator to use.
 
 Measuring Mutual Information (MI)
----------------------------------
+*********************************
 
 One of the most used functions in the study of pairwise interaction is the Mutual Information (MI) that quantifies the statistical dependence or information shared between two random variables :cite:`shannon1948mathematical, watanabe1960information`. It is defined mathematically using the concept of entropies. For two random variables X and Y, MI is given by:
@@ -43,26 +43,70 @@ Where: :math:`H(X,Y)` is the joint entropy of :math:`X` and :math:`Y`.
 MI between two variables quantifies how much knowing one variable reduces the uncertainty about the other and measures the interdependency between the two variables. If they are independent, we have :math:`H(X,Y)=H(X)+H(Y)`, hence :math:`MI(X,Y)=0`. Since the MI can be reduced to a signed sum of entropies, the problem of how to estimate MI from continuous data can be recast as the problem, discussed above, of how to estimate entropies. An estimator that has been recently developed and presents interesting properties when computing the MI is the Gaussian Copula estimator :cite:`ince2017statistical`. This estimator is based on the statistical theory of copulas and is proven to provide a lower bound to the real value of MI; this is one of its main advantages: when computing MI, the Gaussian copula estimator avoids false positives. Pay attention to the fact that this can mainly be used to investigate relationships between two variables that are monotonic.
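+
+As an illustration of how these quantities relate in practice, the following minimal sketch computes plug-in entropies of two discrete variables and the mutual information as :math:`MI(X,Y) = H(X) + H(Y) - H(X,Y)`. It uses plain NumPy with a simple counting estimator rather than the estimators implemented in the toolbox, and all names are illustrative:
+
+.. code-block:: python
+
+    import numpy as np
+
+    def entropy_d(data):
+        """Plug-in Shannon entropy (bits) of the rows of a (n_samples, n_vars) discrete array."""
+        _, counts = np.unique(data, axis=0, return_counts=True)
+        p = counts / counts.sum()
+        return -np.sum(p * np.log2(p))
+
+    rng = np.random.default_rng(0)
+    x = rng.integers(0, 4, 10_000)              # discrete variable X
+    y = (x + rng.integers(0, 2, 10_000)) % 4    # Y depends (noisily) on X
+
+    h_x, h_y = entropy_d(x[:, None]), entropy_d(y[:, None])
+    h_xy = entropy_d(np.stack([x, y], axis=1))
+    print(f"H(X)={h_x:.2f}  H(Y)={h_y:.2f}  MI(X,Y)={h_x + h_y - h_xy:.2f} bits")
+
+For continuous variables the same decomposition holds, but the entropies have to be estimated with one of the dedicated estimators discussed above, such as the Gaussian Copula estimator.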
 
-From pairwise to higher-order interactions 
-++++++++++++++++++++++++++++++++++++++++++
+From pairwise to higher-order interactions
+++++++++++++++++++++++++++++++++++++++++++
 
-The information theoretic metrics involved in this work are all based in principle on the concept of Shannon entropy and mutual information. Given a set of variables, a common approach to investigate their interaction is by comparing the entropy and the information of the joint probability distribution of the whole set with the entropy and information of different subsets. This can be done in many different ways, unveiling different aspects of HOIs :cite:`timme2014synergy, varley2023information`. The metrics implemented in the toolbox can be divided in two main categories: a group of metrics focus on the relationship between a set of source variables and a target one and another group measures the interactions within a set of variables. In the following part we are going through all the metrics that have been developed in the toolbox, providing some insights about their theoretical foundation and possible interpretations.
+The information theoretic metrics involved in this work are all based in principle on the concepts of Shannon entropy and mutual information. Given a set of variables, a common approach to investigate their interaction is to compare the entropy and the information of the joint probability distribution of the whole set with the entropy and information of different subsets. This can be done in many different ways, unveiling different aspects of HOIs :cite:`timme2014synergy, varley2023information`. The metrics implemented in the toolbox can be divided into two main categories: one group of metrics measures the interaction behaviour prevailing within a set of variables (network behaviour), while another group focuses on the relationship between a set of source variables and a target one (network encoding). In the following parts we go through all the metrics implemented in the toolbox, providing some insights about their theoretical foundation and possible interpretations.
 
-O-information
+Network behaviour
+*****************
+
+Total correlation
+-----------------
+
+Total correlation, :class:`hoi.metrics.TC`, is the oldest extension of mutual information to an arbitrary number of variables :cite:`watanabe1960information, studeny1998multiinformation`. It is defined as:
+
+.. math::
+
+    TC(X^{n}) = \sum_{j=1}^{n} H(X_{j}) - H(X^{n})
+
+The total correlation quantifies the strength of the collective constraints ruling the system; it is sensitive to information shared between single variables and can be associated with redundancy.
+
+
+Dual Total correlation
+----------------------
+
+Dual total correlation, :class:`hoi.metrics.DTC`, is another extension of mutual information to an arbitrary number of variables, also known as binding information or excess entropy :cite:`sun1975linear`. It quantifies the part of the joint entropy that is shared by at least two variables in the following way:
+
+.. math::
+
+    DTC(X^{n}) &= H(X^{n}) - \sum_{j=1}^{n} H(X_j|X_{-j}^{n}) \\
+               &= \sum_{j=1}^{n} H(X_{-j}^{n}) - (n-1)H(X^{n})
+
+where :math:`H(X_j|X_{-j}^{n})` is the part of the entropy of :math:`X_j` that is not shared with any other variable. This measure is higher in systems in which lower-order constraints prevail.
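+
+As a concrete illustration, both quantities can be computed directly from the entropies defined above. The following minimal sketch uses plain NumPy with a plug-in (counting) entropy estimator rather than the estimators shipped with the toolbox, and all variable names are illustrative:
+
+.. code-block:: python
+
+    import numpy as np
+
+    def entropy_d(data):
+        """Plug-in Shannon entropy (bits) of the rows of a (n_samples, n_vars) discrete array."""
+        _, counts = np.unique(data, axis=0, return_counts=True)
+        p = counts / counts.sum()
+        return -np.sum(p * np.log2(p))
+
+    rng = np.random.default_rng(0)
+    s = rng.integers(0, 4, 20_000)                 # common source
+    noise = rng.integers(0, 2, (20_000, 3))
+    x = (s[:, None] + noise) % 4                   # three noisy copies of the source
+
+    n = x.shape[1]
+    h_joint = entropy_d(x)
+    h_single = [entropy_d(x[:, [j]]) for j in range(n)]
+    h_rest = [entropy_d(np.delete(x, j, axis=1)) for j in range(n)]
+
+    tc = sum(h_single) - h_joint                   # TC  = sum_j H(X_j) - H(X^n)
+    dtc = sum(h_rest) - (n - 1) * h_joint          # DTC = sum_j H(X_{-j}^n) - (n-1) H(X^n)
+    print(f"TC = {tc:.2f} bits, DTC = {dtc:.2f} bits")
+
+Because the three variables share a common source, both quantities are positive; for independent variables they would both vanish.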
+
+S-information
 -------------
 
-One prominent metric that has emerged in the pursuit of higher-order understanding is the O-information. Introduced by Rosas in 2019 :cite:`rosas2019oinfo`, O-information elegantly addresses the challenge of quantifying higher-order dependencies by extending the concept of mutual information. Given a multiplet of :math:`n` variables, :math:`X^n = \{ X_0, X_1, …, X_n \}`, its formal definition is the following:
+The S-information (also called exogenous information), :class:`hoi.metrics.Sinfo`, is defined as the sum of the total correlation (TC) and the dual total correlation (DTC) :cite:`james2011anatomy`:
 
 .. math::
 
-    \Omega(X^n)=(n-2)H(X^n)+\sum_{i=1}^n \left[ H(X_i) - H(X_{-i}^n) \right]
+    \Sigma(X^{n}) &= TC(X^{n}) + DTC(X^{n}) \\
+                  &= -nH(X^{n}) + \sum_{j=1}^{n} \left[ H(X_{j}) + H(X_{-j}^{n}) \right]
 
-Where :math:`X_{-i}` is the set of all the variables in :math:`X^n` apart from :math:`X_i`. The O-information can be written also as the difference between the total correlation and the dual total correlation and reflects the balance between higher-order and lower-order constraints among the set of variables of interest. It is shown to be a proxy of the difference between redundancy and synergy: when the O-information of a set of variables is positive this indicates redundancy, when it is negative, synergy. In particular when working with big data sets it can become complicated
+It is sensitive to both redundancy and synergy, quantifying the total amount of constraints ruling the system under study.
+
+O-information
+-------------
+
+One prominent metric that has emerged in the pursuit of higher-order understanding is the O-information, :class:`hoi.metrics.Oinfo`. Introduced by Rosas et al. in 2019 :cite:`rosas2019oinfo`, the O-information elegantly addresses the challenge of quantifying higher-order dependencies by extending the concept of mutual information. Given a multiplet of :math:`n` variables, :math:`X^n = \{ X_1, X_2, …, X_n \}`, its formal definition is the following:
+
+.. math::
+
+    \Omega(X^n) = (n-2)H(X^n)+\sum_{i=1}^n \left[ H(X_i) - H(X_{-i}^n) \right]
+
+Where :math:`X_{-i}^n` is the set of all the variables in :math:`X^n` apart from :math:`X_i`. The O-information can also be written as the difference between the total correlation and the dual total correlation, :math:`\Omega(X^n) = TC(X^n) - DTC(X^n)`, and it reflects the balance between higher-order and lower-order constraints among the set of variables of interest. It has been shown to be a proxy for the difference between redundancy and synergy: when the O-information of a set of variables is positive the set is dominated by redundant interactions, when it is negative, by synergistic ones. In particular, when working with big data sets its interpretation can become complicated, since the O-information only captures this balance and redundancy and synergy can coexist within the same multiplet.
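+
+To make the sign convention concrete, the following minimal sketch (plain NumPy with a plug-in entropy estimator, not the toolbox estimators; names are illustrative) evaluates :math:`\Omega` for a purely redundant triplet and for a purely synergistic (XOR) triplet:
+
+.. code-block:: python
+
+    import numpy as np
+
+    def entropy_d(data):
+        """Plug-in Shannon entropy (bits) of the rows of a (n_samples, n_vars) discrete array."""
+        _, counts = np.unique(data, axis=0, return_counts=True)
+        p = counts / counts.sum()
+        return -np.sum(p * np.log2(p))
+
+    def o_information(x):
+        """O-information: (n - 2) H(X^n) + sum_i [H(X_i) - H(X_{-i}^n)]."""
+        n = x.shape[1]
+        return (n - 2) * entropy_d(x) + sum(
+            entropy_d(x[:, [i]]) - entropy_d(np.delete(x, i, axis=1)) for i in range(n)
+        )
+
+    rng = np.random.default_rng(0)
+    s = rng.integers(0, 2, 20_000)
+
+    # redundant triplet: three copies of the same bit -> positive O-information (~ +1 bit)
+    redundant = np.stack([s, s, s], axis=1)
+
+    # synergistic triplet: two independent bits and their XOR -> negative O-information (~ -1 bit)
+    a, b = rng.integers(0, 2, (2, 20_000))
+    synergistic = np.stack([a, b, a ^ b], axis=1)
+
+    print("redundant  :", round(o_information(redundant), 2))
+    print("synergistic:", round(o_information(synergistic), 2))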
 
 Topological information
 -----------------------
 
-The topological information, a generalization of the mutual information to higher-order, :math:`I_k` has been introduced and presented to test uniformity and dependence in the data :cite:`baudot2019infotopo`. Its formal definition is the following:
+The topological information :math:`I_k`, :class:`hoi.metrics.InfoTopo`, is a generalization of mutual information to higher orders, introduced to test uniformity and dependence in the data :cite:`baudot2019infotopo`. Its formal definition is the following:
 
 .. math::
 
@@ -70,10 +114,13 @@ The topological information, a generalization of the mutual information to highe
 
 Note that :math:`I_2(X,Y) = MI(X,Y)` and that :math:`I_3(X,Y,Z)=\Omega(X,Y,Z)`. As for the O-information, this function can be interpreted in terms of redundancy and synergy: when it is positive it indicates that the system is dominated by redundancy, when it is negative, by synergy.
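+
+The two identities above can be checked numerically. The sketch below uses plain NumPy and assumes the usual inclusion-exclusion form of :math:`I_k` (an alternating sum of the entropies of all non-empty subsets); it is only meant as an illustration and the helper names are not part of the toolbox:
+
+.. code-block:: python
+
+    import numpy as np
+    from itertools import combinations
+
+    def entropy_d(data):
+        """Plug-in Shannon entropy (bits) of the rows of a (n_samples, n_vars) discrete array."""
+        _, counts = np.unique(data, axis=0, return_counts=True)
+        p = counts / counts.sum()
+        return -np.sum(p * np.log2(p))
+
+    def info_topo(x):
+        """Assumed inclusion-exclusion form of I_k: alternating sum of subset entropies."""
+        k = x.shape[1]
+        return sum(
+            (-1) ** (len(sub) + 1) * entropy_d(x[:, list(sub)])
+            for i in range(1, k + 1)
+            for sub in combinations(range(k), i)
+        )
+
+    rng = np.random.default_rng(0)
+    a, b = rng.integers(0, 2, (2, 20_000))
+    y = np.where(rng.random(20_000) < 0.9, a, 1 - a)     # y is a noisy copy of a
+
+    pair = np.stack([a, y], axis=1)
+    mi = entropy_d(pair[:, [0]]) + entropy_d(pair[:, [1]]) - entropy_d(pair)
+    print("I_2:", round(info_topo(pair), 3), " MI:", round(mi, 3))   # identical values
+
+    xor = np.stack([a, b, a ^ b], axis=1)
+    print("I_3:", round(info_topo(xor), 3))   # ~ -1 bit for the synergistic XOR triplet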
+
+Network encoding
+****************
+
 Gradient of O-information
 -------------------------
 
-The O-information gradient has been developed to study the contribution of one or a set of variables to the O-information of the whole system :cite:`scagliarini2023gradients`. In this work we proposed to use this metric to investigate the relationship between multiplets of source variables and a target variable. Following the definition of the O-information gradient of order 1 we have:
+The O-information gradient, :class:`hoi.metrics.GradientOinfo`, has been developed to study the contribution of one or a set of variables to the O-information of the whole system :cite:`scagliarini2023gradients`. In this work we propose to use this metric to investigate the relationship between multiplets of source variables and a target variable. Following the definition of the O-information gradient of order 1 we have:
 
 .. math::
 
@@ -84,7 +131,7 @@ This metric does not focus on the O-information of a group of variables, instead
 
 Redundancy-Synergy index (RSI)
 ------------------------------
 
-Another metric, proposed by Gal Chichek et al in 2001 :cite:`chechik2001group`, is the Redundancy-Synergy index, developed as an extension of mutual information, aiming to characterize the statistical interdependencies between a group of variables :math:`X^n` and a target variable :math:`Y`, in terms of redundancy and synergy, it is computed as:
+Another metric, proposed by Gal Chechik et al. in 2001 :cite:`chechik2001group`, is the Redundancy-Synergy index, :class:`hoi.metrics.RSI`, developed as an extension of mutual information that aims to characterize the statistical interdependencies between a group of variables :math:`X^n` and a target variable :math:`Y` in terms of redundancy and synergy. It is computed as:
 
 .. math::
 
@@ -95,13 +142,13 @@ The RSI is designed to measure directly whether the sum of the information provi
 
 Synergy and redundancy (MMI)
 ----------------------------
 
-Within the broad research field of IT a growing body of literature has been produced in the last 20 years about the fascinating concepts of synergy and redundancy. These concepts are well defined in the framework of Partial Information Decomposition, which aims to distinguish different “types” of information that a set of sources convey about a target variable. In this framework, the synergy between a set of variables refers to the presence of relationships between the target and the whole group that cannot be seen when considering separately the single parts. Redundancy instead refers to another phenomena, in which variables contain copies of the same information about the target. Different definition have been provided in the last years about these two concepts, in our work we are going to report the simple case of the Minimum Mutual Information (MMI), proposed by Barrett in :cite:`barrett2015exploration`, in which the redundancy between a set of :math:`n` variables :math:`X^n = \{ X_1, \ldots, X_n\}` and a target :math:`Y` is defined as:
+Within the broad research field of IT, a growing body of literature has been produced over the last 20 years about the fascinating concepts of synergy and redundancy. These concepts are well defined in the framework of Partial Information Decomposition, which aims to distinguish the different “types” of information that a set of sources conveys about a target variable. In this framework, the synergy between a set of variables refers to the presence of relationships between the target and the whole group that cannot be seen when the single parts are considered separately. Redundancy instead refers to another phenomenon, in which variables carry copies of the same information about the target. Different definitions of these two concepts have been provided in recent years; in our work we report the simple case of the Minimum Mutual Information (MMI) :cite:`barrett2015exploration`, in which the redundancy, :class:`hoi.metrics.RedundancyMMI`, between a set of :math:`n` variables :math:`X^n = \{ X_1, \ldots, X_n\}` and a target :math:`Y` is defined as:
 
 .. math::
 
    redundancy (Y, X^n) = min_{i