Four waves of SARS-CoV-2 in Bogotá: a comparative statistical analysis

ntorresd/covid19-waves-bogota

Four waves of SARS-CoV-2 in Bogotá: a detailed retrospective statistical comparison

In this repository we characterized and compared, using statistical tools, the first four consecutive epidemiological waves in Bogotá, Colombia, which occurred between March 2020 and April 2022. We used the report of confirmed cases from the District Health Secretary of Bogotá and the genomic surveillance data published by the Global Initiative on Sharing All Influenza Data (GISAID). We focused mainly on the estimation of:

  1. The instantaneous reproduction number R(t).
  2. The transmissibility advantage between variants.
  3. The delay times for onset-to-hospitalisation, onset-to-ICU, onset-to-death, hospital stay, and ICU stay.
  4. The characterization of severe outcomes using severity ratios: Hospitalisation/ICU Case Rate (H/ICU-CR), Case Fatality Ratio (CFR) and Hospitalisation/ICU Fatality Rate (H/ICU-FR) per age group and wave; and the percentages of hospitalisations, ICU admissions and deaths per age group and wave.

Data sources

  1. Report of confirmed cases from the District Health Secretary of Bogotá (Private database - last update: 2022-08-02)

  2. Genomic surveillance data published by the Global Initiative on Sharing All Influenza Data (GISAID) (Public database - last update: 2022-08-02)

Methods

  1. Reproduction number R(t): we estimated the time-varying instantaneous reproduction number R(t) using the R package EpiEstim.
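As an illustration of what this kind of estimate involves, the following is a minimal Python sketch of a sliding-window renewal-equation estimator with a Gamma prior. It is not EpiEstim and not the code in `rt.R`; the function name, the window length, the prior values and the serial-interval input are all assumptions made for the example.

```python
import numpy as np


def estimate_rt(incidence, si_pmf, window=7, prior_shape=1.0, prior_scale=5.0):
    """Posterior mean of R(t) over sliding windows (illustrative sketch).

    incidence: daily case counts (hypothetical input).
    si_pmf: serial-interval probabilities for lags of 1, 2, ... days.
    """
    incidence = np.asarray(incidence, dtype=float)
    si_pmf = np.asarray(si_pmf, dtype=float)
    n = len(incidence)

    # Total infectiousness: Lambda_t = sum_s I_{t-s} * w_s
    infectiousness = np.zeros(n)
    for t in range(1, n):
        lags = np.arange(1, min(t, len(si_pmf)) + 1)
        infectiousness[t] = np.sum(incidence[t - lags] * si_pmf[lags - 1])

    rt = np.full(n, np.nan)
    for t in range(window, n):
        cases = incidence[t - window + 1 : t + 1].sum()
        pressure = infectiousness[t - window + 1 : t + 1].sum()
        # Gamma posterior mean: (a + sum of cases) / (1/b + sum of Lambda)
        rt[t] = (prior_shape + cases) / (1.0 / prior_scale + pressure)
    return rt


# Hypothetical example: flat incidence should give R(t) close to 1.
flat = np.full(60, 100.0)
serial_interval = np.array([0.1, 0.3, 0.3, 0.2, 0.1])
print(estimate_rt(flat, serial_interval)[-1])
```

The first `window` entries are left as NaN because there is not enough history to fill a full window.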

  2. Transmissibility advantage: we evaluated the transmissibility advantage using a multinomial logistic regression with a single explanatory variable $t$ given by:

$$f(v,t)=\alpha + \beta_{v,0}t$$

In the previous expression $\alpha$ is the intercept of the model and $\beta_{v,0}$ is the variant-specific parameter for the time covariate, which was computed with respect to a reference (or pivot) variant. In order to compute the transmissibility advantage of $v$ with respect to $0$ from $\beta_{v,0}$ we used the following expression:

$$ T_{v,0} = \exp\left(\frac{\beta_{v,0}}{7} \, g_0\right) $$

Where $g_0$ is the generation time of the pivot variant. Notice that we divided $\beta_{v,0}$ by 7 to convert the time scale of the coefficients to daily.

With these coefficients we computed the relative transmissibility advantage between two variants $w$ and $v$ as:

$$ T_{w,v} = \frac{T_{w,0}}{T_{v,0}}$$

The multinomial regressions were run in Stan using the PyStan library for Python.
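The conversion from regression coefficients to transmissibility advantages described above can be sketched in a few lines of Python. The function names and the numeric values are hypothetical and only illustrate the two formulas; they are not taken from the repository's scripts or results.

```python
import math


def transmissibility_advantage(beta, generation_time_days):
    """T_{v,0} = exp(beta_{v,0} / 7 * g_0), with g_0 in days."""
    return math.exp(beta / 7.0 * generation_time_days)


def relative_advantage(beta_w, beta_v, generation_time_days):
    """T_{w,v} = T_{w,0} / T_{v,0}, both relative to the same pivot variant."""
    t_w = transmissibility_advantage(beta_w, generation_time_days)
    t_v = transmissibility_advantage(beta_v, generation_time_days)
    return t_w / t_v


# Hypothetical coefficients and generation time, for illustration only.
print(transmissibility_advantage(0.7, 5.0))
print(relative_advantage(0.7, 0.35, 5.0))
```

Because both advantages share the same pivot generation time, the ratio reduces to $\exp((\beta_{w,0}-\beta_{v,0})\, g_0 / 7)$.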

  3. Probability distributions of delay times: we used a Bayesian hierarchical model adapted from this repository. We fitted initial parameters at the district level and then sampled the parameters for each wave as follows:

$$q_{i,j} \sim N(q_{i, Bog},\sigma_i)$$

where $i=1,2,3,\dots,n$ runs over the $n$ parameters of the PDF, $j=1,2,3,4$ indexes the wave, $q_{i, Bog}$ is the value of the $i$-th parameter of the PDF estimated for Bogotá, and $\sigma_i \sim N^+(0,1)$ is the standard deviation, which is assumed to follow a normal distribution truncated to positive values. We used a Hamiltonian Monte Carlo (HMC) algorithm implemented in Stan, with four chains of 2000 iterations each (1000 for warm-up and 1000 for sampling). These models were also run using PyStan.
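To make the hierarchical structure concrete, the sketch below simulates it forward with NumPy: it draws each $\sigma_i$ from a half-normal distribution and then draws one value of each parameter per wave around the district-level value. This is only a forward simulation of the prior structure, not the Stan model or the HMC fit; the function name and the example parameter values are assumptions.

```python
import numpy as np


def sample_wave_parameters(q_bogota, n_waves=4, seed=0):
    """Draw sigma_i ~ N+(0, 1) and q_{i,j} ~ N(q_{i,Bog}, sigma_i)."""
    rng = np.random.default_rng(seed)
    q_bogota = np.asarray(q_bogota, dtype=float)

    # A standard normal truncated to positive values is a half-normal,
    # which can be drawn as the absolute value of a standard normal.
    sigma = np.abs(rng.standard_normal(q_bogota.shape))

    # One row per PDF parameter i, one column per wave j.
    q_waves = rng.normal(
        loc=q_bogota[:, None],
        scale=sigma[:, None],
        size=(len(q_bogota), n_waves),
    )
    return sigma, q_waves


# Hypothetical district-level parameters of a two-parameter distribution.
sigma, q_waves = sample_wave_parameters([2.0, 0.5])
print(sigma.shape, q_waves.shape)
```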

  4. Severe outcomes: all the results of this section were calculated for subpopulations $(i,g)$ defined by the waves $i \in$ { $1,2,3,4$ } and the age groups $g \in$ { $all, 0-9, 10-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-79, 80+$ }.

For the H/ICU-CR we used the following formula:

$$XCR_{i,g} =\frac{X_{i,g}}{C_{i,g}}$$

Where $X_{i,g} \in$ { $H_{i,g}, ICU_{i,g}$ } is the cumulative number of hospitalised patients or the cumulative number of patients admitted to ICU, and $C_{i,g}$ is the cumulative number of cases per subpopulation.

For the CFR and H/ICU-FR we used:

$$XFR_{i,g} =\frac{D|X_{i,g}}{X_{i,g}}$$

In this case $X_{i,g} \in$ { $C_{i,g}, H_{i,g}, ICU_{i,g}$ } and $D|X_{i,g}$ is the cumulative number of deaths among the individuals that belong to the population $X_{i,g}$.

For the percentages we used:

$$\%Y_{i,g} = 100 \times \frac{Y_{i,g}}{Y_i}$$

Where $Y_{i,g} \in$ { $H_{i,g}, ICU_{i,g}, D_{i,g}$ } is the number of cases for each outcome per wave and age group and $Y_i \in$ { $H_i, ICU_i, D_i$ } is the total number of cases for each outcome per wave.

In all cases we estimated 95% confidence intervals using binomial proportions.
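As a sketch of how one of these ratios and its interval could be computed, the example below uses a Wilson score interval. The text above does not say which binomial interval was used, so the Wilson method is an assumption, as are the function name and the counts in the example; this is not the code in `proportions.py`.

```python
import math


def proportion_with_ci(successes, total, z=1.959964):
    """Return (estimate, lower, upper) using a Wilson score interval.

    z = 1.959964 corresponds to a two-sided 95% confidence level.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1.0 + z**2 / total
    centre = (p + z**2 / (2.0 * total)) / denom
    half_width = (
        z * math.sqrt(p * (1.0 - p) / total + z**2 / (4.0 * total**2)) / denom
    )
    return p, centre - half_width, centre + half_width


# Hypothetical counts: 50 deaths among 1000 cases in one (wave, age group).
cfr, lower, upper = proportion_with_ci(50, 1000)
print(f"CFR = {100 * cfr:.1f}% ({100 * lower:.1f}%, {100 * upper:.1f}%)")
```

The same function applies to any of the ratios above by changing the numerator and denominator, for example hospitalisations over cases for the HCR.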

Repository description

All the folders, except plots and tables, follow the same structure: a scripts subfolder that contains the code needed to run the models and the analysis, and an outputs subfolder that contains the results.

  1. The folder epidemiological_distributions contains the following scripts:
  2. The folder genomics contains the following scripts:
  3. The folder rt contains the following script:
    • rt.R: R script to estimate the reproduction number.
  4. The folder severe_outcomes contains the following scripts:
    • percentages.py: Python script to calculate the percentages.
    • proportions.py: Python script to calculate the binomial proportions.
    • rates.py: Python script to calculate the CFR, HCR, ICU-CR, HFR and ICU-FR.
    • utilities_severity.py: Python script with tools used for the severity analysis.
  5. The folder waves contains the following scripts:
    • roots_confirmed_cases.py: Python script to find the roots of the epidemic curve using Gaussian smoothing and interpolation.
    • process_waves.py: Python script to process the waves after visual inspection of the roots.
    • utilities_waves.py: Python script with tools used for determining the waves.
  6. The folder tables contains the following scripts:

These scripts import and call functions from the utilities of each section and use the results contained in the corresponding outputs subfolder.

  7. The folder plots contains the following scripts:

Config file

This YAML file contains information about the models, the paths used in the scripts and the roots selected for the waves. It is loaded at the beginning of every script.

Authors and contributors

davidsantiagoquevedo, ntorresd, cwhittaker1000, zmcucunuba.
