The package provides functions to assess the calibration of a given
vector of predictions for binary classes on decentralized data. It
builds on the [DataSHIELD](https://www.datashield.org/) infrastructure
for distributed computing and provides the calculation of the Brier
score as well as calibration curves.
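
As a point of reference, the Brier score for binary outcomes is the mean squared difference between the predicted probabilities and the observed 0/1 labels. The following is a minimal local sketch in plain R, not the package's server-side implementation. The helper names `brier_score` and `site_summary` are hypothetical, and pooling per-site sums is only one plausible way to combine results across servers.

```r
# Brier score on a single data set: mean squared error of the probabilities.
brier_score = function(truth, prob) {
  stopifnot(length(truth) == length(prob), all(truth %in% c(0, 1)))
  mean((prob - truth)^2)
}

# Hypothetical per-site summary: only aggregated values leave a site.
site_summary = function(truth, prob) {
  c(sse = sum((prob - truth)^2), n = length(truth))
}

truth = c(1, 0, 1, 1, 0)
prob  = c(0.9, 0.2, 0.6, 0.7, 0.4)

# All data in one place.
bs_local = brier_score(truth, prob)

# The same data split across two (simulated) sites, then pooled.
s1 = site_summary(truth[1:3], prob[1:3])
s2 = site_summary(truth[4:5], prob[4:5])
bs_pooled = unname((s1["sse"] + s2["sse"]) / (s1["n"] + s2["n"]))

print(c(local = bs_local, pooled = bs_pooled))
```

Both values agree (0.092 for this toy data), because the Brier score is a simple average and can be rebuilt from per-site sums and counts.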
To calculate the Brier score or calibration curves, prediction values
must be available on the server. For instructions on how to push models
and compute predictions, see the package dsPredictBase. Note that
DataSHIELD uses the option `datashield.privacyLevel` to specify the
minimum number of values an aggregate must be based on before it may be
shared. Instead of setting the option, we retrieve the privacy level
directly from the `DESCRIPTION` file each time a function calls for it.
The privacy level is set to 5 by default.
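
The idea of a privacy level can be illustrated with a short local sketch. Everything here is an assumption made for illustration: the `Options` field layout of the example file, the package name `examplePackage`, and the helper `safe_mean` are hypothetical and are not taken from the package's actual code or `DESCRIPTION` file.

```r
# Hypothetical DESCRIPTION-like file, written to a temporary location.
desc_file = tempfile()
writeLines(c(
  "Package: examplePackage",
  "Options: datashield.privacyLevel=5"
), desc_file)

# Read the field and extract the integer privacy level from it.
opts = read.dcf(desc_file, fields = "Options")[1, 1]
privacy_level = as.integer(sub(".*datashield\\.privacyLevel=([0-9]+).*", "\\1", opts))

# Hypothetical helper: release an aggregate only if it is based on
# at least `min_n` values, otherwise return NA.
safe_mean = function(x, min_n = privacy_level) {
  if (length(x) < min_n) return(NA_real_)
  mean(x)
}

too_few = safe_mean(c(0.2, 0.4, 0.9))            # 3 values: suppressed
enough  = safe_mean(c(0.2, 0.4, 0.9, 0.5, 0.5))  # 5 values: released
print(c(too_few = too_few, enough = enough))
```

With a privacy level of 5, the first call returns `NA` and the second returns the mean 0.5.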
At the moment, there is no CRAN version available. Install the development version from GitHub:
remotes::install_github("difuture-lmu/dsCalibration")
The aggregate methods must be registered in the OPAL administration. These methods are:
brierScore
calibrationCurve
These methods are registered automatically when publishing the package
on OPAL (see `DESCRIPTION`).
Note that the package needs to be installed on both the server and the analyst's machine.
library(DSI)
library(DSOpal)
library(DSLite)
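library(dsBaseClient)
### Assumption: pushObject() and predictModel() used below come from dsPredictBase
library(dsPredictBase)
library(dsCalibration)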
builder = DSI::newDSLoginBuilder()
builder$append(
server = "ibe",
url = "******",
user = "***",
password = "******",
table = "ProVal.KUM"
)
logindata = builder$build()
connections = DSI::datashield.login(logins = logindata, assign = TRUE, symbol = "D",
opts = list(ssl_verifyhost = 0, ssl_verifypeer = 0))
### List the symbols available on the servers:
DSI::datashield.symbols(connections)
### Test data with same structure as data on test server:
dat = read.csv("data/test-kum.csv")
### Model we want to upload:
mod = glm(gender ~ age + height, family = "binomial", data = dat)
### Upload model to DataSHIELD server
pushObject(connections, mod)
predictModel(connections, mod, "pred", "D", predict_fun = "predict(mod, newdata = D, type = 'response')")
DSI::datashield.symbols(connections)
### Calculate Brier score:
dsBrierScore(connections, "D$gender", "pred")
### Calculate and plot calibration curve:
cc = dsCalibrationCurve(connections, "D$gender", "pred", 10, 3)
plotCalibrationCurve(cc)
DSI::datashield.logout(conns = connections, save = FALSE)
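
To make the idea of a calibration curve concrete, here is a minimal local sketch in plain R. It bins the predicted probabilities into equal-width intervals and compares the mean prediction with the observed frequency of the positive class in each bin. The function `calibration_curve` and its parameter `nbins` are hypothetical, and the sketch does not reproduce the package's server-side computation or its privacy checks.

```r
# Hypothetical helper: equal-width binning of predicted probabilities.
calibration_curve = function(truth, prob, nbins = 10) {
  stopifnot(length(truth) == length(prob), all(truth %in% c(0, 1)))
  breaks = seq(0, 1, length.out = nbins + 1)
  bin = cut(prob, breaks = breaks, include.lowest = TRUE)
  data.frame(
    bin       = levels(bin),
    n         = as.vector(table(bin)),              # observations per bin
    mean_pred = as.vector(tapply(prob, bin, mean)),  # average prediction
    obs_freq  = as.vector(tapply(truth, bin, mean))  # share of positives
  )
}

truth = c(0, 0, 1, 0, 1, 1)
prob  = c(0.1, 0.2, 0.3, 0.6, 0.7, 0.9)

cc_local = calibration_curve(truth, prob, nbins = 2)
print(cc_local)
```

For a well-calibrated model, `mean_pred` and `obs_freq` are close in every bin, so plotting one against the other gives points near the diagonal. Empty bins yield `NA` in this sketch.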