Traditionally, the variance of a survey estimator is computed from an analytic variance formula.
Standard errors are derived from these variances to construct confidence intervals, typically 95%.
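As a minimal sketch of this formula-based route, assuming a simple random sample and a normal approximation (the vector `y` is made-up illustration data, not taken from any survey):

```r
# Hypothetical sample values, used only for illustration
y <- c(5, 10, 15, 20, 25, 30, 35, 40, 45, 50)

n   <- length(y)
est <- mean(y)            # point estimate
se  <- sqrt(var(y) / n)   # standard error from the SRS variance formula s^2 / n
z   <- qnorm(0.975)       # normal critical value for a 95% interval

ci <- c(lower = est - z * se, upper = est + z * se)
ci
```

With clustering, stratification, or unequal weights, `var(y) / n` is no longer the right variance, which is the difficulty the next section describes.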
Challenges with Traditional Methods
Bias in variance estimates can occur under complex survey conditions.
Difficulties arise in deriving closed-form expressions for variance in surveys involving clustering, stratification, and weighting.
Why Bootstrap?
Advantages of Bootstrap
Bootstrap methods do not require an analytic variance formula, which avoids the bias that an ill-suited formula can introduce.
These methods use resampling techniques to estimate the distribution of sample statistics directly.
Bootstrap Approach
Involves drawing repeated samples from the survey data with replacement.
Each resample is used to calculate estimates, building an empirical distribution of the estimator.
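The resampling idea can be sketched in base R as follows. This is an illustrative sketch only: the data vector, the seed, and the number of resamples `B` are arbitrary assumptions, and the observations are treated as independent, which a real clustered or stratified survey would not allow.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# Hypothetical data, used only for illustration
y <- c(12, 15, 9, 22, 30, 18, 25, 11, 27, 20)
B <- 2000    # number of resamples (an arbitrary choice)

# Draw B resamples with replacement and compute the statistic on each one
boot_stats <- replicate(B, mean(sample(y, size = length(y), replace = TRUE)))

boot_se <- sd(boot_stats)                         # bootstrap standard error
boot_ci <- quantile(boot_stats, c(0.025, 0.975))  # percentile 95% interval

boot_se
boot_ci
```

The vector `boot_stats` is the empirical distribution of the estimator; the standard error and the interval are both read directly from it rather than from a formula.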
Implementing Bootstrap in R
Setting Up in R
Utilize the survey package to create a survey design object.
Convert the design object to a bootstrap design for resampling.
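A sketch of these two steps, assuming the `apiclus1` cluster sample that ships with the survey package (the output later in these notes refers to `api00`, but the exact dataset and design used there are not stated, so this choice is an assumption):

```r
library(survey)

data(api)  # example data shipped with the survey package

# One-stage cluster sample: school districts (dnum) are the sampled clusters
dclus1 <- svydesign(id = ~dnum, weights = ~pw, fpc = ~fpc, data = apiclus1)

# Convert to a replicate-weights design that uses bootstrap replicates
set.seed(123)  # the replicate weights are random
bclus1 <- as.svrepdesign(dclus1, type = "bootstrap", replicates = 100)

svymean(~api00, dclus1)  # standard error from linearization
svymean(~api00, bclus1)  # standard error from the bootstrap replicates
```

Both calls give the same point estimate, since the sampling weights are unchanged; only the standard error is computed differently.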
Bootstrap Computation
Generate bootstrap replicates of the statistic of interest.
Analyze the variability and stability of these estimates to gauge their reliability.
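One possible way to obtain and inspect the replicates, again assuming the `apiclus1` data from the survey package and an arbitrary choice of 200 replicates:

```r
library(survey)

data(api)
dclus1 <- svydesign(id = ~dnum, weights = ~pw, fpc = ~fpc, data = apiclus1)

set.seed(123)
bclus1 <- as.svrepdesign(dclus1, type = "bootstrap", replicates = 200)

# Keep the per-replicate estimates alongside the point estimate
est  <- svymean(~api00, bclus1, return.replicates = TRUE)
reps <- as.numeric(est$replicates)

summary(reps)                    # spread of the replicate estimates
sd(reps)                         # simple variability summary across replicates
quantile(reps, c(0.025, 0.975))  # percentile-style 95% interval
```

The value of `sd(reps)` may differ slightly from the standard error that `svymean` reports, because the package applies a design-specific scaling to the replicates when it computes the variance.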
# Create a vector of size 10
obs <- c(5, 10, 15, 20, 25, 30, 35, 40, 45, 50)

# Define the number of bootstrap samples
n_bootstraps <- 50

# Generate bootstrap samples and compute their means
set.seed(123) # For reproducibility
bootstrap_means <- replicate(n_bootstraps, {
  sample_data <- sample(obs, replace = TRUE)
  mean(sample_data)
})

# Calculate the original mean
original_mean <- mean(obs)
original_mean
[1] 27.5
bootstrap_means # collection of bootstrapped means
Elements of the list replicate_estimates, each holding a mean and standard error for api00:
[[1]]
        mean     SE
api00 634.38 24.291
[[2]]
        mean     SE
api00 642.17 31.605
[[3]]
       mean     SE
api00 637.4 24.491
[[4]]
        mean     SE
api00 638.52 20.391
[[5]]
        mean     SE
api00 638.14 24.211
[[6]]
        mean     SE
api00 647.31 23.668
# Convert the list of estimates to a vector
library(magrittr) # provides the %>% pipe
estimate_values <- sapply(replicate_estimates, function(est) coef(est))
estimate_values %>% head(100)
# Calculate mean and SE
bootstrap_mean <- mean(estimate_values)
bootstrap_se <- sd(estimate_values)

# Plotting the bootstrap estimates
library(ggplot2)
ggplot(data = data.frame(estimate_values), aes(x = estimate_values)) +
  geom_dotplot(binwidth = 0.8, stackdir = "up", dotsize = 0.5, fill = "maroon", col = "gold") +
  # after_stat(scaled) replaces the deprecated ..scaled.. notation
  geom_density(aes(y = after_stat(scaled)), fill = "lightblue", alpha = 0.2, adjust = 1) +
  geom_vline(xintercept = bootstrap_mean, color = "blue", linetype = "dashed") +
  annotate("text", x = bootstrap_mean + 5, y = 0.8,
           label = sprintf("Mean = %.2f\nSE = %.2f", bootstrap_mean, bootstrap_se),
           color = "blue") +
  xlab("Bootstrap Estimates") +
  ggtitle("Distribution of API Score Estimates")
The American Housing Survey tracks housing characteristics in the U.S., including ownership costs and house values across 47 metropolitan statistical areas (MSAs). The 2002 survey sampled 13 MSAs, providing data on typical monthly ownership costs and house values for 2002 and 1994.
Mean Value 2002: Simulate a bootstrap confidence interval for the mean typical house value in 2002.
Median Value 2002: Simulate a bootstrap confidence interval for the median typical house value in 2002.
Ratio of Mean Values 2002 to 1994: Simulate a bootstrap confidence interval for the ratio of mean house values from 2002 compared to 1994.
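One way to approach the three exercises is sketched below. The survey values are not reproduced in these notes, so the data frame `msa` is filled with simulated placeholder numbers; the column names `value2002` and `value1994` and the helper `boot_ci` are hypothetical. The sketch uses a simple percentile bootstrap and ignores the finite population correction for sampling 13 of 47 MSAs.

```r
set.seed(2002)  # arbitrary seed, for reproducibility only

# PLACEHOLDER data: simulated values standing in for the 13 sampled MSAs.
# These numbers are not real; replace them with the actual survey values.
msa <- data.frame(
  value2002 = round(runif(13, 90000, 300000)),
  value1994 = round(runif(13, 60000, 200000))
)

# Percentile bootstrap interval for any statistic computed from the rows.
# Whole rows (MSAs) are resampled, so each 2002 value stays paired with its 1994 value.
boot_ci <- function(data, stat, B = 5000, level = 0.95) {
  n <- nrow(data)
  reps <- replicate(B, stat(data[sample.int(n, n, replace = TRUE), , drop = FALSE]))
  alpha <- 1 - level
  quantile(reps, c(alpha / 2, 1 - alpha / 2))
}

ci_mean   <- boot_ci(msa, function(d) mean(d$value2002))
ci_median <- boot_ci(msa, function(d) median(d$value2002))
ci_ratio  <- boot_ci(msa, function(d) mean(d$value2002) / mean(d$value1994))

ci_mean
ci_median
ci_ratio
```

Resampling rows rather than the two columns separately matters for the ratio: the numerator and denominator come from the same MSAs, and the bootstrap should preserve that dependence.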