Pretty-print of LinearModelAnalysis
mbaudin47 committed Aug 22, 2023
1 parent 1d861ea commit a1f37df
Showing 7 changed files with 466 additions and 95 deletions.
6 changes: 6 additions & 0 deletions python/doc/bibliography.rst
@@ -25,6 +25,7 @@ Bibliography
*OpenTURNS: An Industrial Software for Uncertainty Quantification in Simulation.*
In: Ghanem R., Higdon D., Owhadi H. (eds) Handbook of Uncertainty Quantification. Springer
`pdf <https://arxiv.org/pdf/1501.05242>`__
.. [baron2014] Baron, M. (2014). *Probability and statistics for computer scientists*. CRC Press.
.. [beirlant2004] Beirlant J., Goegebeur Y., Teugels J., Segers J.,
*Statistics of extremes: theory and applications*, Wiley, 2004
.. [bhattacharyya1997] Bhattacharyya G.K., and R.A. Johnson, *Statistical
@@ -39,6 +40,8 @@ Bibliography
.. [burnham2002] Burnham, K.P., and Anderson, D.R. *Model Selection and
Multimodel Inference: A Practical Information Theoretic Approach*, Springer,
2002.
.. [bingham2010] Bingham, N. H., & Fry, J. M. (2010).
*Regression: Linear models in statistics*. Springer.
.. [cambou2017] Mathieu Cambou, Marius Hofert, Christiane Lemieux, *Quasi-Random numbers for copula models*, Stat. Comp., 2017, 27(5), 1307-1329.
`pdf <https://arxiv.org/pdf/1508.03483.pdf>`__
.. [caniou2012] Caniou, Y. *Global sensitivity analysis for nested and
@@ -87,6 +90,7 @@ Bibliography
`pdf <https://tel.archives-ouvertes.fr/tel-00697026v2/document>`__
.. [fang2006] K-T. Fang, R. Li, and A. Sudjianto. *Design and modeling for
computer experiments.* Chapman & Hall CRC, 2006.
.. [faraway2014] Faraway, J. J. (2014). *Linear models with R*. Second edition, CRC Press.
.. [freedman1981] David Freedman, Persi Diaconis, *On the histogram as a density
estimator: L2 theory*, December 1981, Probability Theory and Related Fields.
57 (4): 453–476.
@@ -293,6 +297,8 @@ Bibliography
.. [ScottStewart2011] W. F. Scott & B. Stewart.
*Tables for the Lilliefors and Modified Cramer–von Mises Tests of Normality*,
Communications in Statistics - Theory and Methods. Volume 40, 2011 - Issue 4. Pages 726-730.
.. [sen1990] Sen, A., & Srivastava, M. (1990). *Regression analysis: theory, methods, and applications*.
Springer.
.. [simard2011] Simard, R. & L'Ecuyer, P. *Computing the Two-Sided Kolmogorov-
Smirnov Distribution.* Journal of Statistical Software, 2011, 39(11), 1-18.
`pdf <https://www.jstatsoft.org/article/view/v039i11/v39i11.pdf>`__
@@ -1,11 +1,43 @@
"""
Create a linear model
=====================
In this example we create a surrogate model using a linear model
approximation with the :class:`~openturns.LinearModelAlgorithm` class.
We show how the :class:`~openturns.LinearModelAnalysis` class
can be used to produce the statistical analysis of the least squares
regression model.
"""
# %%
# Introduction
# ~~~~~~~~~~~~
#
# The following 2-dimensional function is used in this example:
#
# .. math::
#
# \model(x,y) = 3 + 2x - y
#
# for any :math:`x, y \in \Rset`.
#
# Notice that this model is linear:
#
# .. math::
#
# \model(x,y) = \beta_1 + \beta_2 x + \beta_3 y
#
# where :math:`\beta_1 = 3`, :math:`\beta_2 = 2` and :math:`\beta_3 = -1`.
#
# We consider noisy observations of the output:
#
# .. math::
#
# Y = \model(x,y) + \epsilon
#
# where :math:`\epsilon \sim \cN(0, \sigma^2)` and :math:`\sigma > 0`
# is the standard deviation.
# In our example, we use :math:`\sigma = 0.1`.
#
# Finally, we use :math:`n = 1000` independent observations of the model.
#

# %%
@@ -14,73 +46,96 @@

# %%
# Generation of the data set
# ~~~~~~~~~~~~~~~~~~~~~~~~~~
#
# We first generate the data and add noise to the output observations:

# %%
ot.RandomGenerator.SetSeed(0)
distribution = ot.Normal(2)
distribution.setDescription(["x", "y"])
sampleSize = 1000
func = ot.SymbolicFunction(["x", "y"], ["3 + 2 * x - y"])
input_sample = distribution.getSample(sampleSize)
epsilon = ot.Normal(0, 0.1).getSample(sampleSize)
output_sample = func(input_sample) + epsilon
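
# %%
# As a quick sanity check, we can look at the empirical mean of the noisy
# output: it should be close to 3, which is the expectation of
# :math:`3 + 2x - y` when :math:`x` and :math:`y` are independent standard
# normal variables.
print(output_sample.computeMean())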

# %%
# Linear regression
# ~~~~~~~~~~~~~~~~~
#
# Let us run the linear model algorithm using the
# :class:`~openturns.LinearModelAlgorithm` class and get the associated
# results:

# %%
algo = ot.LinearModelAlgorithm(input_sample, output_sample)
result = algo.getResult()

# %%
# Residuals analysis
# ~~~~~~~~~~~~~~~~~~
#
# We can now analyse the residuals of the regression on the training data.
# For clarity, only the first 5 residual values are shown.

# %%
residuals = result.getSampleResiduals()
residuals[:5]

# %%
# Alternatively, the `standardized` or `studentized` residuals can be used:

# %%
stdresiduals = result.getStandardizedResiduals()
stdresiduals[:5]
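
# %%
# As a hedged illustration (the classical convention, which should match the
# values above, and assuming the `getLeverages()` accessor of the
# :class:`~openturns.LinearModelResult` class), each raw residual
# :math:`\epsilon_i` is rescaled by :math:`\hat{\sigma} \sqrt{1 - h_i}`,
# where :math:`h_i` is the leverage of observation :math:`i` and
# :math:`\hat{\sigma}^2` is the unbiased estimate of the residual variance:
import math

n = residuals.getSize()
p = 3  # number of coefficients: intercept, x and y
rss = sum(residuals[i, 0] ** 2 for i in range(n))
sigmaHat = math.sqrt(rss / (n - p))
h = result.getLeverages()
manual = [residuals[i, 0] / (sigmaHat * math.sqrt(1.0 - h[i])) for i in range(5)]
print(manual)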

# %%
# Similarly, we can also obtain the underlying distribution characterizing
# the residuals:

# %%
result.getNoiseDistribution()
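
# %%
# The estimated standard deviation of the noise can be compared with the
# exact value :math:`\sigma = 0.1` used to generate the data (a simple
# check; we assume the noise distribution is returned as a
# :class:`~openturns.Normal`):
print(result.getNoiseDistribution().getStandardDeviation())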


# %%
# Analysis of the results
# ~~~~~~~~~~~~~~~~~~~~~~~
#
# In order to post-process the linear regression results, the
# :class:`~openturns.LinearModelAnalysis` class can be used:

# %%
analysis = ot.LinearModelAnalysis(result)
analysis

# %%
# The results seem to indicate that the linear model is satisfactory.
#
# - The basis uses the three functions :math:`1` (i.e. the intercept),
#   :math:`x` and :math:`y`.
# - The table of coefficients tests whether each single coefficient is zero.
#   The probability of observing a large value of the T statistic is close to
#   zero for all coefficients.
#   Hence, we can reject the hypothesis that any coefficient is zero.
#   In other words, all the coefficients are significantly nonzero.
# - The `R-Squared` value is close to 1.
#   Furthermore, the adjusted value, which takes into account the size of
#   the data set and the number of coefficients, is similar to `R-Squared`.
#   Most of the variance is explained by the linear model.
# - The F-test checks whether all the coefficients are simultaneously zero.
#   The `Fisher-Snedecor` p-value is lower than 1%, so we reject this
#   hypothesis: the linear model is significant.
# - The normality test checks that the residuals have a normal distribution.
#   The normality assumption can be rejected if the p-value is close to zero
#   (say, lower than 0.05).
#   Here, all the p-values are larger than 0.05: the normality assumption
#   is not rejected.
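
# %%
# A hedged sketch of how these quantities might be retrieved
# programmatically; the accessors below are assumed to be provided by
# :class:`~openturns.LinearModelAnalysis`:
print(analysis.getCoefficientsTScores())  # T statistic of each coefficient
print(analysis.getFisherPValue())  # p-value of the F-test
print(analysis.getNormalityTestResultAndersonDarling())  # one normality test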

# %%
# Graphical analyses
# ~~~~~~~~~~~~~~~~~~
#
# Let us compare model and fitted values:

@@ -89,7 +144,8 @@
view = viewer.View(graph)

# %%
# The previous figure seems to indicate that the linearity hypothesis
# is reasonable.

# %%
# Residuals can be plotted against the fitted values.
@@ -107,9 +163,11 @@
view = viewer.View(graph)

# %%
# In this case, the two distributions are very close: there is no obvious
# outlier.
#
# Cook's distance measures the impact of every individual data point on the
# linear regression, and can be plotted as follows:

# %%
graph = analysis.drawCookDistance()
@@ -118,27 +176,32 @@
# %%
# This graph shows the indices of the points with disproportionate influence.
#
# One of the components of the computation of Cook's distance at a given
# point is that point's *leverage*.
# High-leverage points are far from their closest neighbors, so the fitted
# linear regression model must pass close to them.
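
# %%
# As a hedged cross-check (assuming the `getLeverages()` accessor used
# above and the classical formula), Cook's distance of observation
# :math:`i` can be written in terms of the leverage :math:`h_i` and the
# standardized residual :math:`r_i` as
# :math:`D_i = r_i^2 h_i / (p (1 - h_i))`, where :math:`p` is the number
# of coefficients:
leverages = result.getLeverages()
p = 3  # number of coefficients: intercept, x and y
cookManual = [
    stdresiduals[i, 0] ** 2 * leverages[i] / (p * (1.0 - leverages[i]))
    for i in range(5)
]
print(cookManual)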

# %%
graph = analysis.drawResidualsVsLeverages()
view = viewer.View(graph)

# %%
# In this case, there seems to be no obvious influential outlier characterized
# by large leverage and residual values.
#
# Similarly, we can also plot Cook's distances as a function of the sample
# leverages:

# %%
graph = analysis.drawCookVsLeverages()
view = viewer.View(graph)

# %%
# Finally, we give the intervals for each estimated coefficient (95% confidence
# interval):

# %%
alpha = 0.95
interval = analysis.getCoefficientsConfidenceInterval(alpha)
print("confidence intervals with level=%1.2f : " % (alpha))
print("confidence intervals with level=%1.2f: " % (alpha))
print("%s" % (interval))