Figure out how to get COLLAPSE_SMALLEST() to work consistently for egocentric data. #57

Open
krivit opened this issue Jan 30, 2021 · 0 comments

krivit (Member) commented Jan 30, 2021

While sorting out statnet/ergm#202, it turned out that even after ergm.ego's vertex attribute extraction and term defaults were updated for consistency with ergm, COLLAPSE_SMALLEST() can still produce results that disagree with the full-network ones, in particular when the frequencies of a factor's categories differ between egos and alters.

For example,

```r
set.seed(0)
library(ergm.ego)
#> ergm: version 3.11.0-6010, created on 2021-01-30
#> ergm.ego: version 0.6.0-569, created on 2021-01-30
library(ergm)
library(magrittr)

n <- 100
e <- 150
ds <- c(10,15,5,20)

y <- network.initialize(n, directed=FALSE)
y %v% "a" <- sample(1:3+6,n,replace=TRUE)
aM <- matrix(FALSE, 3, 3)
aM[1,1] <- aM[1,3] <- TRUE
y %v% "b" <- sample(letters[1:4],n,replace=TRUE)
y %v% "c" <- sample(runif(10),n,replace=TRUE)
y %v% "d" <- runif(n)
y <- san(y~edges+degree(0:3), target.stats=c(e,ds))
y.e <- as.egodata(y)


f <- ~ nodefactor(COLLAPSE_SMALLEST("b",2, "x")) + mm(a~(~b) %>% COLLAPSE_SMALLEST(2,"x"), levels2=TRUE)

f.y <- statnet.common::nonsimp_update.formula(f, y~.)
environment(f.y) <- globalenv()
f.y.e <- statnet.common::nonsimp_update.formula(f, y.e~.)
environment(f.y.e) <- globalenv()

(f.y.s <- summary(f.y))
#> nodefactor.b.d nodefactor.b.x    mm[a=7,b=a]    mm[a=8,b=a]    mm[a=9,b=a] 
#>             67            163             20             25             25 
#>    mm[a=7,b=d]    mm[a=8,b=d]    mm[a=9,b=d]    mm[a=7,b=x]    mm[a=8,b=x] 
#>             21             24             22             48             68 
#>    mm[a=9,b=x] 
#>             47
(f.y.e.s <- summary(f.y.e))
#> Warning: In unknown function: 'COLLAPSE_SMALLEST()' may behave unpredictably
#> with egocentric data and is not recommended at this time.

#> Warning: In unknown function: 'COLLAPSE_SMALLEST()' may behave unpredictably
#> with egocentric data and is not recommended at this time.

#> Warning: In unknown function: 'COLLAPSE_SMALLEST()' may behave unpredictably
#> with egocentric data and is not recommended at this time.

#> Warning: In unknown function: 'COLLAPSE_SMALLEST()' may behave unpredictably
#> with egocentric data and is not recommended at this time.
#> nodefactor.b.b nodefactor.b.c nodefactor.b.d nodefactor.b.x    mm[a=7,b=a] 
#>           44.0           37.5           33.5          150.0           10.0 
#>    mm[a=8,b=a]    mm[a=9,b=a]    mm[a=7,b=b]    mm[a=8,b=b]    mm[a=9,b=b] 
#>           12.5           12.5           12.0           19.5           12.5 
#>    mm[a=7,b=c]    mm[a=8,b=c]    mm[a=9,b=c]    mm[a=7,b=d]    mm[a=8,b=d] 
#>           12.0           14.5           11.0           10.5           12.0 
#>    mm[a=9,b=d]    mm[a=7,b=x]    mm[a=8,b=x]    mm[a=9,b=x] 
#>           11.0           44.5           58.5           47.0
stopifnot(all.equal(f.y.s,f.y.e.s))
#> Error: f.y.s and f.y.e.s are not equal:
#>   Names: 11 string mismatches
#>   Numeric: lengths (11, 19) differ
```

Created on 2021-01-30 by the reprex package (v1.0.0)

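This happens because the smallest levels of factor "b" are determined separately among the egos and among the alters. When the level frequencies differ, each side collapses a different set of levels into "x", and pooling the two yields the union of the levels retained on either side, which is why the egocentric summary has 19 statistics instead of 11.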
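To make the mechanism concrete, here is a minimal base-R sketch that does not touch ergm or ergm.ego at all. `collapse_smallest()` is a hypothetical stand-in for COLLAPSE_SMALLEST(), not its actual implementation, and the ego and alter frequencies are made up purely for illustration.

```r
# Hypothetical stand-in for COLLAPSE_SMALLEST(); NOT ergm's implementation.
# Merges the `n` least frequent levels of `x` into a single level `into`.
collapse_smallest <- function(x, n, into) {
  freq <- sort(table(x))                # levels in ascending order of frequency
  smallest <- names(freq)[seq_len(n)]   # the n rarest levels
  ifelse(x %in% smallest, into, x)
}

# Made-up attribute values: egos and alters have different level frequencies.
egos   <- rep(c("a", "b", "c", "d"), times = c(30, 10, 15, 25))
alters <- rep(c("a", "b", "c", "d"), times = c(12, 40, 18, 35))

# Collapsing each side on its own picks a different pair of rare levels.
ego_levels   <- sort(unique(collapse_smallest(egos, 2, "x")))
alter_levels <- sort(unique(collapse_smallest(alters, 2, "x")))

ego_levels                             # "a" "d" "x": b and c collapsed
alter_levels                           # "b" "d" "x": a and c collapsed
sort(union(ego_levels, alter_levels))  # "a" "b" "d" "x": pooled level set
```

Each side on its own keeps three levels, but the pooled set has four, analogous to how the egocentric summary above reports levels of "b" that the full-network summary does not.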
