Analysis of Predictive inference with jackknife+
We gathered data from the California Cooperative Oceanic Fisheries Investigations (CalCOFI), available from the CalCOFI website. The downloaded CSV files (for the bottle and cast data) should go in data/raw/CalCOFI_Database_194903-202001_csv_22Sep2021/ to work with our scripts. From here, we joined and processed the data, keeping only observations with no missing values in our response variable (salinity, denoted Salnty) or in any of the 20 predictor variables: Distance, Bottom_D, Wind_Spd, Depthm, T_degC, O2ml_L, STheta, O2Sat, Oxy_µmol/Kg, ChlorA, Phaeop, PO4uM, SiO3uM, NO2uM, NO3uM, NH3uM, DarkAs, MeanAs, R_DYNHT, and R_Nuts. Please see the CalCOFI website for a codebook explaining each feature.
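The join-and-filter step described above can be sketched roughly as follows. This is a minimal illustration rather than our actual processing script: the function name `build_complete_cases`, the join key `Cst_Cnt`, and the toy data frames are assumptions made for the example, and the real bottle and cast CSV files would be read with `pandas.read_csv` in place of the toy frames.

```python
import pandas as pd

RESPONSE = "Salnty"
PREDICTORS = [
    "Distance", "Bottom_D", "Wind_Spd", "Depthm", "T_degC", "O2ml_L",
    "STheta", "O2Sat", "Oxy_µmol/Kg", "ChlorA", "Phaeop", "PO4uM",
    "SiO3uM", "NO2uM", "NO3uM", "NH3uM", "DarkAs", "MeanAs",
    "R_DYNHT", "R_Nuts",
]


def build_complete_cases(bottle, cast, key="Cst_Cnt"):
    """Join bottle rows to their cast and keep fully observed rows.

    `key` is an assumed name for the column shared by both tables.
    """
    # Take from the cast table only the predictors the bottle table lacks,
    # so the merge does not create duplicated, suffixed columns.
    cast_cols = [key] + [
        c for c in PREDICTORS if c in cast.columns and c not in bottle.columns
    ]
    merged = bottle.merge(cast[cast_cols], on=key, how="inner")
    # Keep the response and predictors, dropping any row with a missing value.
    return merged[[RESPONSE] + PREDICTORS].dropna().reset_index(drop=True)


# Toy stand-ins for the real tables (hypothetical values, not CalCOFI data).
cast = pd.DataFrame({
    "Cst_Cnt": [1, 2],
    "Distance": [10.0, 20.0],
    "Bottom_D": [100.0, None],   # missing value: bottles of cast 2 are dropped
    "Wind_Spd": [5.0, 6.0],
})
bottle_data = {"Cst_Cnt": [1, 1, 2], RESPONSE: [33.4, None, 33.9]}
bottle_data.update(
    {c: [1.0, 1.0, 1.0] for c in PREDICTORS if c not in cast.columns}
)
bottle = pd.DataFrame(bottle_data)

complete = build_complete_cases(bottle, cast)
print(len(complete), "complete observation(s)")
```

An inner join is used here so that bottle rows without a matching cast are discarded before the missing-value filter is applied; whether that matches the original scripts exactly is an assumption.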
After processing, we were left with 6,102 complete observations. Similar to Barber et al., we used a training set of 200 observations, with the rest as our test set. We wanted to see how the jackknife+ would perform with a smaller number of predictors. Beyond this, we aimed to further test the generalizability of the method by constructing two different models: a LASSO (with a hyperparameter value identical to the one proposed for the ridge regression simulations) and a boosting regressor (both used the default arguments in the model object from scikit-learn). Upon running these trials, we noticed that performance remained similar across the models and interval types: the jackknife+ slightly outperformed the jackknife and met the coverage rate.
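For readers unfamiliar with the method, the jackknife+ interval can be sketched as below. This is an illustrative implementation under stated assumptions, not our experiment code: the function name `jackknife_plus` is hypothetical, the data are synthetic rather than CalCOFI, and `Lasso()` is left at scikit-learn's default penalty as a placeholder for the value used in the trials. Inputs are assumed to be NumPy arrays.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import Lasso


def jackknife_plus(model, X_train, y_train, X_test, alpha=0.1):
    """Jackknife+ prediction intervals for each row of X_test.

    `model` is any unfitted scikit-learn style regressor; it is cloned
    and refitted once per left-out training point.
    """
    n, m = len(y_train), len(X_test)
    lower_vals = np.empty((n, m))
    upper_vals = np.empty((n, m))
    for i in range(n):
        keep = np.arange(n) != i
        fitted = clone(model).fit(X_train[keep], y_train[keep])
        # Leave-one-out absolute residual at the held-out training point.
        resid = abs(y_train[i] - fitted.predict(X_train[i:i + 1])[0])
        preds = fitted.predict(X_test)
        lower_vals[i] = preds - resid
        upper_vals[i] = preds + resid
    # Order statistics: floor(alpha (n+1))-th smallest lower value and
    # ceil((1 - alpha)(n+1))-th smallest upper value.
    k_lo = int(np.floor(alpha * (n + 1)))
    k_hi = int(np.ceil((1 - alpha) * (n + 1)))
    lower_sorted = np.sort(lower_vals, axis=0)
    upper_sorted = np.sort(upper_vals, axis=0)
    # Fall back to infinite endpoints when n is too small for the quantile.
    lower = lower_sorted[k_lo - 1] if k_lo >= 1 else np.full(m, -np.inf)
    upper = upper_sorted[k_hi - 1] if k_hi <= n else np.full(m, np.inf)
    return lower, upper


# Synthetic demonstration with 200 training points, as in our setup.
rng = np.random.default_rng(0)
n_train, n_test, p = 200, 500, 5
X = rng.normal(size=(n_train + n_test, p))
beta = np.array([2.0, -1.0, 0.0, 0.0, 0.5])
y = X @ beta + rng.normal(size=n_train + n_test)
X_train, y_train = X[:n_train], y[:n_train]
X_test, y_test = X[n_train:], y[n_train:]

lower, upper = jackknife_plus(Lasso(), X_train, y_train, X_test, alpha=0.1)
coverage = np.mean((lower <= y_test) & (y_test <= upper))
print(f"empirical coverage: {coverage:.3f}")
```

Any other regressor with `fit` and `predict` methods, for example scikit-learn's `GradientBoostingRegressor`, could be passed in place of `Lasso()`, at the cost of one refit per training point. The ordinary jackknife differs in that it centers a single interval on the prediction of the model fitted to all training points, using a quantile of the same leave-one-out residuals.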