Table of Contents
1. Overview
2. Data and Variables
3. Methodology
   - Bayesian Poisson Modeling
   - Model Assumptions and Fit
6. Conclusion
This project investigates the relationship between education level and the number of children for men over 30 years old. It aims to determine if there is a significant difference in the average number of children between men with and without a bachelor's degree.
The dataset includes observations of men over 30, detailing:
- The number of children each man has.
- Whether he possesses a bachelor's degree.
- $\theta_A$: Average number of children for men with a bachelor's degree.
- $\theta_B$: Average number of children for men without a bachelor's degree.
Because the sample is limited and the number of children is a right-skewed count (shaped by external social, economic, and biological factors), we model the data with a Poisson distribution, which is well suited to count data. A weakly informative Gamma prior is chosen because it is conjugate to the Poisson likelihood, so the posterior is available in closed form.
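The model is defined as follows:
$$Y_A \mid \theta_A \sim \text{Poisson}(\theta_A), \qquad Y_B \mid \theta_B \sim \text{Poisson}(\theta_B)$$
where
- $Y_A$: Number of children for men with a degree
- $Y_B$: Number of children for men without a degree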
The prior for each $\theta$ follows a Gamma(a, b) distribution:
$$\theta_A, \theta_B \sim \text{Gamma}(a, b), \qquad a = 2,\ b = 1$$
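With a Gamma prior and Poisson counts, the posterior can be written down directly. The short derivation below is a sketch in notation assumed here rather than taken from the original: $n$ is the number of men in a group and $y_1, \dots, y_n$ are their observed counts.

$$
p(\theta \mid y_1, \dots, y_n) \;\propto\; \underbrace{\theta^{a-1} e^{-b\theta}}_{\text{Gamma}(a,\,b)\ \text{prior}} \times \underbrace{\prod_{i=1}^{n} \theta^{y_i} e^{-\theta}}_{\text{Poisson likelihood}} \;=\; \theta^{\,a + \sum_{i} y_i - 1}\, e^{-(b + n)\theta}
$$

$$
\theta \mid y_1, \dots, y_n \sim \text{Gamma}\!\left(a + \sum_{i=1}^{n} y_i,\; b + n\right)
$$

With $a = 2$ and $b = 1$, these are the shape and rate arguments `2 + sum(Y)` and `1 + length(Y)` passed to `rgamma` in the sampling code.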
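Using Monte Carlo sampling, 5,000 samples of $\theta_A$ and $\theta_B$ are drawn from their Gamma posteriors, and the difference $\theta_B - \theta_A$ is computed for each draw: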
theta_samples_a <- rgamma(5000, 2 + sum(Y_a), 1 + length(Y_a))
theta_samples_b <- rgamma(5000, 2 + sum(Y_b), 1 + length(Y_b))
theta_diff_samples <- theta_samples_b - theta_samples_a
quantile(theta_diff_samples, probs = c(0.025, 0.5, 0.975)) # 95% credible interval
A 95% credible interval for the difference in average number of children between groups ($\theta_B - \theta_A$) is:
$$(0.15, 0.74)$$
Since this interval does not include zero, we conclude that men without a bachelor's degree tend to have more children than those with a degree. However, the effect size is modest.
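The interval calculation can be reproduced end to end with a minimal sketch like the one below. The counts in `Y_a` and `Y_b` are synthetic stand-ins (the study's data are not reproduced here), so the printed numbers will not match the interval reported above. The summary `prob_b_greater` is an addition for illustration and is not part of the original analysis.

```r
set.seed(42)

# Hypothetical stand-in data: NOT the study's observations
Y_a <- rpois(50, lambda = 1.0)   # men with a bachelor's degree
Y_b <- rpois(200, lambda = 1.4)  # men without a bachelor's degree

# Gamma(2, 1) prior and number of posterior draws
a <- 2
b <- 1
S <- 5000

# Conjugate Gamma posteriors: shape = a + sum(y), rate = b + n
theta_samples_a <- rgamma(S, shape = a + sum(Y_a), rate = b + length(Y_a))
theta_samples_b <- rgamma(S, shape = a + sum(Y_b), rate = b + length(Y_b))

# Posterior draws of the difference theta_B - theta_A
theta_diff_samples <- theta_samples_b - theta_samples_a

# 95% credible interval with the posterior median
ci <- quantile(theta_diff_samples, probs = c(0.025, 0.5, 0.975))

# Share of draws in which the no-degree group has the higher mean
prob_b_greater <- mean(theta_diff_samples > 0)

print(ci)
print(prob_b_greater)
```

Reporting the share of positive draws alongside the interval gives a direct reading of how strongly the posterior favors $\theta_B > \theta_A$.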
To assess the robustness of our findings, we perform a sensitivity analysis by varying the Gamma prior parameters:
- Hyperparameters:
  - $a_{\theta} = 2$, $b_{\theta} = 1$
  - Number of posterior samples: $S = 5000$
  - Gamma prior values: $ab_{\gamma} \in \{8, 16, 32, 64, 128\}$
- The analysis indicates that since the prior beliefs
A key assumption of the Poisson model is that the mean and variance of the data should be approximately equal. Overdispersion occurs when the variance significantly exceeds the mean, which may indicate the need for an alternative model (e.g., negative binomial regression).
We compute the dispersion statistic:
dispersion_stat <- sum(residuals_pearson^2) / model$df.residual
Results:
- For men with a degree: Dispersion = 1.28
- For men without a degree: Dispersion = 1.37
Since both values are slightly greater than 1, we detect mild overdispersion, meaning that factors beyond education level (such as socioeconomic status, cultural norms, or relationship status) may also influence family size. Future research could explore alternative models, such as negative binomial regression, to account for this additional variation. However, the mild overdispersion is not enough to discredit our model or any of its findings.
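The original does not show how `model` and `residuals_pearson` were obtained. One plausible setup, sketched here as an assumption, is an intercept-only Poisson GLM fitted separately to each group. The vector `y` below is synthetic stand-in data, not the study's counts.

```r
set.seed(1)

# Hypothetical stand-in counts for one group; replace with the observed data
y <- rpois(200, lambda = 1.5)

# Intercept-only Poisson GLM: a single common mean for the group (assumed setup)
model <- glm(y ~ 1, family = poisson)

# Pearson residuals: (observed - fitted) / sqrt(fitted)
residuals_pearson <- residuals(model, type = "pearson")

# Dispersion statistic: values near 1 are consistent with the Poisson assumption
dispersion_stat <- sum(residuals_pearson^2) / model$df.residual

# For an intercept-only model this equals the sample variance-to-mean ratio
variance_to_mean <- var(y) / mean(y)

print(c(dispersion = dispersion_stat, var_over_mean = variance_to_mean))
```

Under this setup the dispersion statistic is the variance-to-mean ratio of the counts, which ties it directly to the Poisson assumption that the mean and variance are equal.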
To further check whether the model fits the data, we run a posterior predictive check: we simulate Monte Carlo samples of a test statistic $t$, defined as the sample mean divided by the sample standard deviation.
Histograms of the simulated statistics for both groups are plotted alongside the observed statistic (blue line) to check that the distribution of simulated values aligns with the observed data. Since the observed statistic falls close to the mode of the simulated distribution, the model appears to reproduce this feature of the data well. The simulation for group A is:
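# Prior hyperparameters (assumed to match the Gamma(2, 1) prior used earlier)
a <- 2   # prior shape
a2 <- 1  # prior rate
# Data summaries for men with a degree
sum_a <- sum(Y_a)
n_a <- length(Y_a)
# Run the Monte Carlo simulation (1000 samples)
t_mc <- numeric(1000)
for (s in 1:1000) {
  # Sample theta from its Gamma posterior
  theta1 <- rgamma(1, a + sum_a, a2 + n_a)
  # Generate a random sample of 10 from a Poisson distribution with parameter theta1
  y1_mc <- rpois(10, theta1)
  # Calculate the test statistic (mean/sd) for the sample
  t_mc[s] <- mean(y1_mc) / sd(y1_mc)
}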
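The comparison between simulated and observed statistics described above can be sketched as follows. This is an illustration under stated assumptions: `Y_a` is synthetic stand-in data, each replicate has the same size as the observed group (rather than the fixed size of 10 used in the loop above), and `p_upper` is a summary added here that does not appear in the original.

```r
set.seed(7)

# Hypothetical stand-in data for one group; NOT the study's observations
Y_a <- rpois(60, lambda = 1.2)

# Gamma(2, 1) prior and data summaries
a <- 2
b <- 1
n_a <- length(Y_a)
sum_a <- sum(Y_a)

# Observed test statistic: sample mean divided by sample standard deviation
t_obs <- mean(Y_a) / sd(Y_a)

# Posterior predictive replicates of the same statistic
S <- 1000
t_mc <- numeric(S)
for (s in seq_len(S)) {
  theta <- rgamma(1, shape = a + sum_a, rate = b + n_a)  # posterior draw
  y_rep <- rpois(n_a, theta)                             # replicated dataset
  t_mc[s] <- mean(y_rep) / sd(y_rep)
}

# Drop non-finite values, which occur when a replicate has zero variance
t_mc <- t_mc[is.finite(t_mc)]

# Share of replicates with a statistic at or above the observed one
p_upper <- mean(t_mc >= t_obs)

# Histogram with the observed statistic marked in blue (interactive sessions only)
if (interactive()) {
  hist(t_mc, breaks = 30, main = "Posterior predictive check", xlab = "mean / sd")
  abline(v = t_obs, col = "blue", lwd = 2)
}

print(p_upper)
```

A value of `p_upper` far from both 0 and 1 indicates that the observed statistic sits well inside the simulated distribution, which is the numerical counterpart of the blue line falling near the bulk of the histogram.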
This study finds that education level has a modest association with the number of children men have, with those without a bachelor's degree tending to have slightly more children on average. However, the effect is not as strong as commonly assumed, and mild overdispersion suggests that additional factors contribute to family size differences. This finding challenges the strength, though not the direction, of the stereotype that less educated individuals tend to have more children. Further research incorporating socioeconomic and cultural variables may provide a more comprehensive understanding of the relationship between education and fertility.