
Statistical (spatial) analysis #9

Open
BScheliga opened this issue Mar 18, 2021 · 7 comments

@BScheliga
Collaborator

BScheliga commented Mar 18, 2021

I am just going to make a start here:

Based on our current aims (see #8):

We want to know if there is a relationship between the Social Distancing Score for Grampian (SDS-G) and COVID-related variables (CRV) (see #8 for details).

The SDS-G and CRV data are spatially linked through data zones. Hence, I would suggest a correlation analysis in the first instance.
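As a rough illustration of that first step, a correlation analysis across data zones could look like the sketch below. It is written in Python with NumPy; the variable names and all values are hypothetical, not project data. It computes both Pearson and Spearman (rank) correlations; whether Spearman is the better default for skewed rates is a judgment call. Note that a plain correlation treats every data zone as an independent observation.

```python
import numpy as np

# Hypothetical per-data-zone values (made up, not project data). Position k
# in every array refers to the same data zone.
sds_g = np.array([0.42, 0.55, 0.31, 0.68, 0.50, 0.47])
testing_demand = np.array([120.0, 150.0, 90.0, 200.0, 140.0, 130.0])
pcr_positive = np.array([8.0, 11.0, 5.0, 15.0, 9.0, 10.0])


def ranks(x):
    """Ranks 0..n-1 of x (assumes no ties, which holds for the toy data)."""
    r = np.empty(len(x))
    r[np.argsort(x)] = np.arange(len(x))
    return r


# Rows/columns of each matrix: SDS-G, testing demand, PCR positives.
data = np.vstack([sds_g, testing_demand, pcr_positive])
pearson = np.corrcoef(data)
# Spearman correlation = Pearson correlation of the ranks.
spearman = np.corrcoef(np.vstack([ranks(v) for v in data]))

print(np.round(pearson, 2))
print(np.round(spearman, 2))
```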

I don't yet see the point of the spatial autocorrelation analysis that, I believe, was suggested in the meeting. To my understanding, it would only describe the spatial relationship between the locations, their respective values and the neighbouring values of a single variable. Spatial autocorrelation would answer questions such as: is the SDS-G value in data zone Z high because the neighbouring data zones have high SDS-G values as well?

[Image: spatial-autocorrelation]
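The question described here compares each zone's value with its neighbours' values. A minimal sketch, assuming a hand-written neighbour list for hypothetical zones "Z", "A", "B" and "C" with made-up SDS-G values, computes the spatial lag, i.e. the average value of each zone's neighbours. Plotting a variable against its spatial lag is the usual visual check behind Moran's I.

```python
# Hypothetical neighbour lists (which data zones share a border) and SDS-G values.
neighbours = {
    "Z": ["A", "B", "C"],
    "A": ["Z", "B"],
    "B": ["Z", "A", "C"],
    "C": ["Z", "B"],
}
sds_g = {"Z": 0.70, "A": 0.65, "B": 0.72, "C": 0.68}


def spatial_lag(values, nbrs):
    """Average value of each zone's neighbours (equal weight per neighbour)."""
    return {zone: sum(values[n] for n in ns) / len(ns) for zone, ns in nbrs.items()}


lag = spatial_lag(sds_g, neighbours)
for zone in sorted(sds_g):
    print(f"{zone}: own value {sds_g[zone]:.2f}, neighbour average {lag[zone]:.3f}")
```

Here zone Z's own value can be read directly against the average of A, B and C, which is the comparison the question above is asking about.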

@will-ball
Contributor

I clicked the wrong button! Oops

@Zeiou

Zeiou commented Mar 19, 2021

Because once you plot the data on a map, the data are no longer 2D but 3D: each value now has a latitude and longitude attached. If you analysed the data without spatial information, a correlation matrix would be fine. But since you add this location information to the data, the correlation matrix is not enough anymore.

For example, you can analyse the relationship between x1, ..., xn with a correlation matrix alone. However, when mapping, you assign each of x1, ..., xn a location in the real world, so the data now contain spatial information. You can certainly go ahead without a spatial statistical analysis, but it will waste information and there is no point in plotting the map.

@BScheliga
Collaborator Author

@Zeiou maybe it makes sense to discuss specific methods. Do you already have a method in mind?

> But since you add this location information to the data, the correlation matrix is not enough anymore.

&

> but it will waste information and there is no point in plotting the map.

The location information isn't really "wasted", though, right? For the correlation matrix, you need the location information of the data to build the matrix in the first place, as it specifies which values belong to which location.
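To make that point concrete: the data zone code is what pairs an SDS-G value with the matching COVID-related value before any correlation is computed. A minimal sketch, assuming two hypothetical lookup tables keyed by made-up data zone codes:

```python
# Hypothetical tables keyed by data zone code (codes and values are made up).
sds_by_zone = {"DZ001": 0.42, "DZ002": 0.55, "DZ003": 0.31}
cases_by_zone = {"DZ002": 11, "DZ001": 8, "DZ004": 3}

# Inner join on the data zone code: a pair is formed only for zones present in
# both tables, so each (SDS-G, cases) pair refers to the same location
# regardless of the row order in the source files.
shared = sorted(sds_by_zone.keys() & cases_by_zone.keys())
pairs = [(zone, sds_by_zone[zone], cases_by_zone[zone]) for zone in shared]
print(pairs)
```

The join uses the location only as an identifier; it does not use which zones are next to each other, which is the extra information the spatial methods in this thread draw on.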

@Zeiou

Zeiou commented Mar 22, 2021

@BScheliga I have learned and done this analysis before, and my master's project is also on spatial statistics. I may not be an expert in it, but I think I do have some experience. We cannot just go ahead with the analysis without checking for spatial autocorrelation, because with location the data are 3D, not 2D. This needs to be checked with Moran's I test. If we only use the standard (non-spatial) approach, people can question what we have done, and that would weaken any results of the project. That's why I say it is safer.
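Moran's I test is commonly run as a permutation test: shuffle the values over the zones many times and see how often the shuffled Moran's I is as large as the observed one. In R this is what spdep's `moran.mc()` does (`moran.test()` gives the analytical version); the sketch below shows the same idea in Python, assuming a hypothetical strip of six zones where each borders its immediate neighbours.

```python
import numpy as np


def morans_i(x, w):
    """Moran's I of values x given an n x n spatial weights matrix w (zero diagonal)."""
    z = x - x.mean()
    return (len(x) / w.sum()) * (z @ w @ z) / (z @ z)


def moran_permutation_test(x, w, n_perm=999, seed=0):
    """Pseudo p-value for positive spatial autocorrelation via random relabelling."""
    rng = np.random.default_rng(seed)
    observed = morans_i(x, w)
    # Shuffling x over the zones breaks any spatial pattern (the null hypothesis).
    simulated = np.array([morans_i(rng.permutation(x), w) for _ in range(n_perm)])
    p_value = (np.sum(simulated >= observed) + 1) / (n_perm + 1)
    return observed, p_value


# Hypothetical example: 6 zones in a row, each bordering its immediate neighbours.
n = 6
w = np.zeros((n, n))
for i in range(n - 1):
    w[i, i + 1] = w[i + 1, i] = 1.0
x = np.array([1.0, 1.2, 1.1, 3.0, 3.2, 3.1])  # two clusters of similar values

observed, p_value = moran_permutation_test(x, w)
print(f"Moran's I = {observed:.3f}, pseudo p-value = {p_value:.3f}")
```

With only six zones the number of distinct rearrangements is small, so the pseudo p-value is coarse; real data zone layouts give the test much more to work with.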

The spatial statistics approach:
After mapping the response and predictors, if the map shows clusters, it is helpful to check whether the residuals of the model have spatial autocorrelation using Moran's I statistic. This is because an important unmeasured covariate that is itself spatially autocorrelated induces spatial autocorrelation in the response, which then shows up in the residuals. The most common remedy is to add a set of spatially autocorrelated random effects to the linear predictor, as part of a Bayesian hierarchical model. The random effects are represented with a conditional autoregressive (CAR) prior, which induces spatial autocorrelation through the neighbourhood structure of the areal units (Lee 2013).
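For reference, one common form of this model is written out below, using the Leroux CAR prior as described in Lee (2013) for the CARBayes package; whether this particular prior suits our data is an assumption. Here $K$ is the number of areal units (data zones), $f$ is the data distribution (e.g. Poisson for counts, with log link $g$), $\mathbf{x}_k$ holds the covariates for zone $k$ (e.g. SDS-G), $O_k$ is an optional offset (e.g. log expected counts), $\psi_k$ is the spatial random effect, $w_{kj} = 1$ if zones $k$ and $j$ share a border and $0$ otherwise, $\rho \in [0, 1]$ controls the strength of spatial dependence, and $\tau^2$ is a variance parameter.

```latex
\begin{aligned}
Y_k &\sim f(y_k \mid \mu_k), \qquad k = 1, \dots, K,\\
g(\mu_k) &= \mathbf{x}_k^\top \boldsymbol{\beta} + O_k + \psi_k,\\
\psi_k \mid \boldsymbol{\psi}_{-k} &\sim \mathrm{N}\!\left(
  \frac{\rho \sum_{j} w_{kj}\,\psi_j}{\rho \sum_{j} w_{kj} + 1 - \rho},\;
  \frac{\tau^2}{\rho \sum_{j} w_{kj} + 1 - \rho}\right).
\end{aligned}
```

With $\rho = 0$ the random effects are independent with variance $\tau^2$; with $\rho = 1$ each $\psi_k$ is centred on the average of its neighbours (the intrinsic CAR model).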

Moran's I:
https://en.wikipedia.org/wiki/Moran%27s_I
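For quick reference, the statistic is given below. Here $x_i$ is the value in zone $i$ (e.g. SDS-G, or a model residual), $\bar{x}$ its mean over the $N$ zones, and $w_{ij}$ a spatial weight, e.g. $1$ if zones $i$ and $j$ share a border and $0$ otherwise, with $w_{ii} = 0$. Under the null hypothesis of no spatial autocorrelation its expected value is $-1/(N-1)$; values clearly above that point to positive autocorrelation (similar values cluster together), values below to negative autocorrelation.

```latex
I = \frac{N}{W}\,
    \frac{\sum_{i=1}^{N}\sum_{j=1}^{N} w_{ij}\,(x_i - \bar{x})(x_j - \bar{x})}
         {\sum_{i=1}^{N} (x_i - \bar{x})^2},
\qquad
W = \sum_{i=1}^{N}\sum_{j=1}^{N} w_{ij},
\qquad
\mathbb{E}[I] = -\frac{1}{N-1}.
```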

If there is no spatial autocorrelation, we can go ahead with the regular model. But looking at the maps, you can see there are clusters; the values are not randomly distributed.

@BScheliga
Collaborator Author

> I may not be an expert in it

@Zeiou no worries, I am not an expert either, especially regarding stats and its terminology; hence my asking. I sometimes get caught off guard by the term “predictor”, and I must apologise: I believe I did not quite understand your first reply. But I think I am slowly getting on the same page.
Let's see if I am there. The rough plan is:

  1. Look at the relationship between SDS-G and COVID-19 testing demand, PCR-positive incidence, case fatality rates and certified COVID-19 mortality (see Research Questions and aims #8, V2, for details), maybe using a correlation matrix, and see how the relationship can be described (e.g. linear model, generalised additive model) [1]
  2. Calculate the residuals of the models
  3. Test the residuals of the models describing the relationships for spatial autocorrelation (Moran's I)
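Steps 1 to 3 can be sketched for the linear-model case as follows (a GAM would replace the fit in step 1). The 2 x 4 grid layout of eight data zones and all values are hypothetical; the point is only the order of operations: fit, take residuals, compute Moran's I on the residuals.

```python
import numpy as np


def morans_i(x, w):
    """Moran's I of values x given an n x n spatial weights matrix w (zero diagonal)."""
    z = x - x.mean()
    return (len(x) / w.sum()) * (z @ w @ z) / (z @ z)


# Hypothetical layout: 8 data zones on a 2 x 4 grid; zones sharing an edge are neighbours.
rows, cols = 2, 4
n = rows * cols
w = np.zeros((n, n))
for r in range(rows):
    for c in range(cols):
        i = r * cols + c
        if c + 1 < cols:  # neighbour to the right
            w[i, i + 1] = w[i + 1, i] = 1.0
        if r + 1 < rows:  # neighbour below
            w[i, i + cols] = w[i + cols, i] = 1.0

# Hypothetical SDS-G values and one COVID-related outcome per zone.
sds_g = np.array([0.40, 0.45, 0.50, 0.55, 0.42, 0.47, 0.52, 0.58])
outcome = np.array([10.0, 11.5, 12.0, 14.0, 10.5, 11.0, 12.5, 15.0])

# Step 1: describe the relationship with a linear model, outcome ~ 1 + sds_g.
X = np.column_stack([np.ones(n), sds_g])
beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)

# Step 2: residuals = observed minus fitted values.
residuals = outcome - X @ beta

# Step 3: Moran's I on the residuals (compare with E[I] = -1 / (n - 1)).
print("intercept, slope:", np.round(beta, 3))
print("Moran's I of residuals:", round(morans_i(residuals, w), 3))
```

If the residual Moran's I is clearly positive (a permutation test gives a p-value), that is the signal for moving to a model with spatially structured random effects.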

Do you reckon a leave-one-out cross-validation (LOOCV) [2] of the models describing the relationships makes sense? I think it would help us understand how much the SDS-G values drive the relationship.
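One way to read "how much SDS-G drives the relationship" (an interpretation, not something agreed in the thread) is to compare the LOOCV prediction error of a model with SDS-G against an intercept-only model. A minimal sketch with hypothetical values for eight data zones, using ordinary least squares; neighbouring zones are not independent, so plain LOOCV may be optimistic when the data are spatially autocorrelated.

```python
import numpy as np


def loocv_mse(X, y):
    """Leave-one-out cross-validated mean squared prediction error for least squares."""
    n = len(y)
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i  # drop zone i, fit on the rest
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        errors[i] = y[i] - X[i] @ beta  # predict the held-out zone
    return np.mean(errors ** 2)


# Hypothetical SDS-G values and one outcome for 8 data zones.
sds_g = np.array([0.40, 0.45, 0.50, 0.55, 0.42, 0.47, 0.52, 0.58])
y = np.array([10.0, 11.5, 12.0, 14.0, 10.5, 11.0, 12.5, 15.0])
ones = np.ones(len(y))

with_sds = loocv_mse(np.column_stack([ones, sds_g]), y)
intercept_only = loocv_mse(ones[:, None], y)
print(f"LOOCV MSE with SDS-G: {with_sds:.3f}, intercept only: {intercept_only:.3f}")
```

A clearly lower error with SDS-G suggests it carries predictive information; the size of the gap is one rough measure of how much.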

> the areal units (Lee 2013)

Could you link the Lee 2013 publication or give the full details?

References:
[1] http://environmentalcomputing.net/intro-to-gams/ (accessed 23/03/2021)
[2] Arlot and Celisse 2010

@Zeiou

Zeiou commented Mar 23, 2021

@BScheliga Sorry, I may not have explained things very well. I think these two links are not about areal unit modelling. I have the lab material from the spatial statistics course. Maybe I can share my screen and explain how to build the model in one of the lab meetings? I also have the code, so it will be very handy to use.

For the Lee 2013 paper: https://www.jstatsoft.org/article/view/v055i13

The R code requires the CARBayes package.
The R documentation: https://cran.r-project.org/web/packages/CARBayes/CARBayes.pdf
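CARBayes model functions take the neighbourhood structure as a K x K matrix argument `W` (K = number of areal units), typically binary, with an entry of 1 when two zones share a border. In R this is often built from a shapefile with spdep's `poly2nb()` and `nb2mat()`; the Python sketch below only illustrates the structure and the basic checks, assuming a hypothetical list of shared borders between made-up data zone codes.

```python
import numpy as np


def neighbour_matrix(zones, borders):
    """Binary, symmetric K x K matrix: w[i, j] = 1 if zones i and j share a border."""
    index = {z: k for k, z in enumerate(zones)}
    w = np.zeros((len(zones), len(zones)), dtype=int)
    for a, b in borders:
        i, j = index[a], index[b]
        if i != j:  # keep the diagonal at zero
            w[i, j] = w[j, i] = 1
    return w


# Hypothetical data zone codes and shared borders (not real Grampian zones).
zones = ["DZ001", "DZ002", "DZ003", "DZ004", "DZ005"]
borders = [("DZ001", "DZ002"), ("DZ002", "DZ003"), ("DZ001", "DZ003"), ("DZ003", "DZ004")]

W = neighbour_matrix(zones, borders)
assert (W == W.T).all() and (np.diag(W) == 0).all()

# Zones without any neighbour (e.g. islands) need attention, since a CAR prior
# smooths each zone towards its neighbours and such a zone has none.
isolated = [z for z, row_sum in zip(zones, W.sum(axis=1)) if row_sum == 0]
print(W)
print("zones without neighbours:", isolated)  # DZ005 here
```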

@Zeiou

Zeiou commented Mar 23, 2021

@BScheliga Please see the Covid-19-model-Scotland repository on this GitHub page: https://github.com/duncanplee

Professor Duncan Lee taught the spatial statistics class I took at the University of Glasgow last year. He is also the author of this R package and an expert in spatial statistics. From his GitHub page, the Covid-19-model-Scotland repository uses areal unit modelling.

And I think the lab material he used in the spatial course for areal unit modelling is: https://github.com/duncanplee/Spatio-temporal-modelling-tutorials/blob/master/Pneumonia%20mortality%20example.R

@dblana mentioned this issue Mar 29, 2021
3 participants