Skip to content

Multivariate Gaussian Subspatial Regression R package

License

Notifications You must be signed in to change notification settings

victorvicpal/MGSR

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

45 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Multivariate Gaussian Subspatial Regression

This repository contains the required functions to apply the MGSR.


Table of Contents

  1. Summary
  2. Algorithm
  3. Installation
  4. Reference
  5. Example
  6. How to cite

Summary

Our proposal is a mixture between Factorial Techniques and Gaussian Processes.

Factorial techniques (Biplot, PCA, CA ...) are useful to represent our observations in terms of unobserved variables called factors. All these techniques provide a set of coordinates linked to the observations, which display information of our analyzed variables. These kind of procedures are merely descriptive and have a low predictive power.

On the other hand, Gaussian Processes are statistical methods where observations occur in a continuous domain (mainly time or space). Furthermore, variables have a multivariate normal distribution. Gaussian Processes use similarity between points to predict the value in an unobserved point.

The MGSR applies a Gaussian Process Regression to the created subspatial of the Factorial Technique. We make use of this subspace to simulate a continuous domain that permit the application of Gaussian Processes, such as Cokriging.

Algorithm

  1. Factorial Technique
  2. Cross-variogram
  3. Linear Model of Coregionalization
  4. Subspatial Grid
  5. Cokriging

Note:

  • Cross-variograms usually follows a "Power Distribution" due to the small scale of Factorial Techniques.
  • Unlike geostatistical analyses, we don't have a real field where boundaries restrict our study. On the other hand, this aspect is more positive than negative because we can portray a more simple grid without losing information.

Installation

install.packages('devtools')
library(devtools)
install_github("victorvicpal/MGSR")
library(MGSR)

Reference

DOI_software DOI_article1

Example

Iris Data

data(iris)

Data Exploration

summary(iris)
apply(iris[which(iris$Species=='versicolor'),1:4],2,function(x,y) plot(density(x))) #density function

"Versicolor" Train/subset

Versicolor <- iris[which(iris$Species=='versicolor'),-5]

ind_subTrain <- sample(50,10)

subTrain <- Versicolor[ind_subTrain,]
Train <- Versicolor[-ind_subTrain,]

Data Standardization

means_train <- apply(Train,2,mean)
sd_train <- apply(Train,2,sd)
Train_st <- Train

for (i in 1:length(Train[1,]))
{Train_st[,i] <- (Train[,i]-means_train[i])/sd_train[i]}

Principal Component Analysis

PC_train <- princomp(Train_st)
biplot(PC_train)

CrossVariogram

CV_train <- crossvariogram(as.data.frame(PC_train$scores[,1:2]),as.data.frame(Train_st),10)
plot.crossvariogram(CV_train)

lmc fitting

Tip: Range value may vary. Check different values within the "Power" function.

RES_train <- lmc(CV_train,'Pow',1.7)
plot.crossvariogram(CV_train,RES_train)

Grid

xygrid <- GRID_MGSR(as.data.frame(PC_train$scores[,1:2]),0.05)

Cokriging

Z_train_st <- cokrig(RES_train,xygrid)
Z_train <- Z_train_st

for (i in 1:length(Train[1,]))
{Z_train[,i+2] <- Z_train_st[,i+2]*sd_train[i]+means_train[i]}

Predicting Subset values

ind_pred <- apply(dist2(subTrain[,1:4],Z_train[,3:6]),1,which.min)
residuales <- subTrain[,1:4]-Z_train[ind_pred,3:6]

par(mfrow=c(2,2))
apply(residuales,2,function(x) plot(density(x)))

for (i in 1:4)
{
  qqnorm(residuales[,i])
  qqline(residuales[,i])
}

Try to fit a model with virginica and setosa species.

Citation

@book{mgsr,
      author    = "Vicente-Palacios, V",
      publisher = "victorvicpal/MGSR",
      year      = "2016",
      doi       = "(http://doi.org/10.5281/zenodo.264102)",
      title     = "Multivariate Gaussian Subspatial Regression"}