Day 13

MATH 313: Survey Design and Sampling

Bastola

Recap: Stratified Sampling

  • Objective: Leverage stratified random sampling to estimate population characteristics more accurately.
  • Approach: Divide the population into non-overlapping, internally homogeneous groups (strata) and sample randomly within each to reduce sampling error.
  • Application: Widely used in surveys and research where population subgroups are to be measured with greater precision.

Stratification Explained

  • Strata: \(L\) non-overlapping groups that together make up the population, each representing a key sub-population.
  • Sizes:
    • Total population: \(N = \sum_{i=1}^L N_i\)
    • Each stratum: \(N_i\) (sub-population size), \(n_i\) (sample size)
  • Metrics:
    • Sample mean for stratum \(i\): \(\bar{Y}_i\)
    • Total for stratum \(i\): \(\tau_i\)
    • Population total: \(\tau = \sum_{i=1}^L \tau_i\)

Estimation Methodology


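  • Population Mean: \(\bar{y}_{st} = \frac{1}{N} \sum_{i=1}^L N_i \bar{Y}_i\) estimates the population mean as a weighted average of the stratum sample means, with weights \(N_i/N\) proportional to stratum size.
  • Population Total: \(\hat{\tau}_{st} = N \cdot \bar{y}_{st} = \sum_{i=1}^L N_i \bar{Y}_i\) estimates the population total by scaling the estimated mean up by the population size.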
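The two estimators above can be sketched in R. This is a minimal illustration, assuming the stratum sizes and sample means reported for Example 1 later in these notes; the helper name `strat_estimates` is hypothetical and not part of the course materials.

```r
# Hypothetical helper: stratified estimates of the population mean and total.
# N    : stratum population sizes N_i
# ybar : stratum sample means
strat_estimates <- function(N, ybar) {
  stopifnot(length(N) == length(ybar))
  total <- sum(N * ybar)            # tau-hat_st = sum of N_i * ybar_i
  list(mean  = total / sum(N),      # ybar_st = tau-hat_st / N
       total = total)
}

# Summary values from Example 1 (Town A, Town B, Rural)
N    <- c(155, 62, 93)
ybar <- c(33.9, 25.125, 19)

est <- strat_estimates(N, ybar)
est$mean    # 27.675 hours per household
est$total   # 8579.25 hours across all households
```

Computing the total first and dividing by \(N\) makes the relationship \(\hat{\tau}_{st} = N \cdot \bar{y}_{st}\) explicit in the code.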

Variance and Precision

  • Key Concept: Estimating variance helps assess the precision and reliability of our estimates.
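  • Estimated Variance of the Mean: \[\hat{V}(\bar{y}_{st}) = \frac{1}{N^2} \sum_{i=1}^L N_i^2 \left(1 - \frac{n_i}{N_i}\right) \frac{s_i^2}{n_i}\] where \(s_i^2\) is the sample variance in stratum \(i\) and \(1 - \frac{n_i}{N_i}\) is the finite population correction.
  • Estimated Variance of the Total: \[\hat{V}(\hat{\tau}_{st}) = N^2 \cdot \hat{V}(\bar{y}_{st})\]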
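As a rough sketch of how these variance formulas translate to R, the block below uses the stratum sizes, sample sizes, and sample standard deviations reported for Example 1 later in these notes. The helper name `strat_variance` is hypothetical, and the standard deviations are the rounded values printed in the notes, so results are approximate.

```r
# Hypothetical helper: estimated variances of the stratified mean and total.
# N  : stratum population sizes N_i
# n  : stratum sample sizes n_i
# s2 : stratum sample variances s_i^2
strat_variance <- function(N, n, s2) {
  stopifnot(length(N) == length(n), length(N) == length(s2), all(n <= N))
  fpc    <- 1 - n / N                              # finite population correction
  v_mean <- sum(N^2 * fpc * s2 / n) / sum(N)^2     # V-hat(ybar_st)
  list(v_mean = v_mean, v_total = sum(N)^2 * v_mean)
}

# Summary values from Example 1 (Town A, Town B, Rural)
N <- c(155, 62, 93)
n <- c(20, 8, 12)
s <- c(5.94625, 15.24502, 9.36143)   # rounded sample standard deviations

v <- strat_variance(N, n, s^2)
v$v_mean          # about 1.97
sqrt(v$v_mean)    # estimated standard error of the mean, about 1.40
```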

Degrees of Freedom and Critical Values

  • Degrees of Freedom: Needed to choose the appropriate \(t\) critical value when constructing confidence intervals; the Satterthwaite approximation gives \[df = \frac{\left(\sum_{i=1}^L a_i s_i^2\right)^2}{\sum_{i=1}^L \frac{(a_i s_i^2)^2}{n_i-1}}\]
  • Factor: \(a_i = \frac{N_i(N_i-n_i)}{n_i}\) is the weight on \(s_i^2\) in the estimated variance of the total, since \(\sum_{i=1}^L a_i s_i^2 = N^2 \cdot \hat{V}(\bar{y}_{st})\).
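The degrees-of-freedom formula can be sketched in R as below. This is an illustration only, using the Example 1 summary values from later in these notes; the helper name `strat_df` is hypothetical, and the result is approximate because the printed standard deviations are rounded.

```r
# Hypothetical helper: approximate degrees of freedom for stratified estimates.
# N  : stratum population sizes N_i
# n  : stratum sample sizes n_i (each at least 2)
# s2 : stratum sample variances s_i^2
strat_df <- function(N, n, s2) {
  stopifnot(all(n >= 2), all(n <= N))
  a <- N * (N - n) / n                        # weights a_i
  sum(a * s2)^2 / sum((a * s2)^2 / (n - 1))
}

# Summary values from Example 1 (Town A, Town B, Rural)
N <- c(155, 62, 93)
n <- c(20, 8, 12)
s <- c(5.94625, 15.24502, 9.36143)   # rounded sample standard deviations

df_st <- strat_df(N, n, s^2)
df_st   # about 21.1
```

The result need not be an integer; R's `qt()` accepts fractional degrees of freedom, so no rounding is required before looking up the critical value.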

Error Bounds and Confidence Intervals

  • Error Bound (for the mean): \[B = t_{1-\frac{\alpha}{2}, df} \cdot \sqrt{\hat{V}(\bar{y}_{st})}\] For the total, use \(\hat{V}(\hat{\tau}_{st})\) in place of \(\hat{V}(\bar{y}_{st})\).
  • Confidence Interval: \[\text{Estimator} \pm B\] Provides a range within which we can be \((1-\alpha) \times 100\%\) confident that the population parameter lies.
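Putting the pieces together, a confidence interval for the population mean might be computed as in the sketch below. It assumes the Example 1 summary values from later in these notes and a 95% confidence level; the helper name `strat_ci` is hypothetical, and the numbers are approximate because the printed standard deviations are rounded.

```r
# Hypothetical helper: error bound and confidence interval for the stratified mean.
# N, n : stratum population and sample sizes
# ybar : stratum sample means
# s2   : stratum sample variances
strat_ci <- function(N, n, ybar, s2, alpha = 0.05) {
  a      <- N * (N - n) / n
  v_tot  <- sum(a * s2)                            # V-hat(tau-hat_st)
  v_mean <- v_tot / sum(N)^2                       # V-hat(ybar_st)
  df     <- v_tot^2 / sum((a * s2)^2 / (n - 1))    # approximate degrees of freedom
  est    <- sum(N * ybar) / sum(N)                 # ybar_st
  B      <- qt(1 - alpha / 2, df) * sqrt(v_mean)   # error bound
  c(estimate = est, bound = B, lower = est - B, upper = est + B)
}

# Summary values from Example 1 (Town A, Town B, Rural)
N    <- c(155, 62, 93)
n    <- c(20, 8, 12)
ybar <- c(33.9, 25.125, 19)
s    <- c(5.94625, 15.24502, 9.36143)   # rounded sample standard deviations

ci <- strat_ci(N, n, ybar, s^2)
ci   # bound is roughly 2.9 hours, so the interval is roughly 24.8 to 30.6
```

Because \(\hat{V}(\hat{\tau}_{st}) = N^2 \cdot \hat{V}(\bar{y}_{st})\), the corresponding bound for the total is simply \(N\) times the bound for the mean.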

Practical Implications

  • Use Case: Stratified sampling is ideal when certain strata are expected to differ significantly, and each requires individual analysis.
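  • Advantage: When strata are internally homogeneous, typically yields more precise estimates than a simple random sample of the same total size, which can reduce the cost of reaching a target precision.
  • Limitation: Requires knowing each stratum size \(N_i\) and which stratum every sampling unit belongs to before sampling, information that may not always be available.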

Example 1: Continuing Example 1 from Day 12, an advertising firm conducts a survey to measure weekly television-viewing habits across different locales. The firm has sufficient resources to collect random samples of households from Town A, Town B, and a rural area; details regarding the sample sizes are addressed later. Each selected household reports its television-viewing hours per week, and the data are summarized in the table below. The task is to estimate both the average and the total weekly television-viewing hours for households in the entire county and to place an error bound on each estimate.

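Weekly television-viewing hours per sampled household (- = no observation)

Town A   Town B   Rural
35       27       8
43       15       14
36       4        12
39       41       15
28       49       30
28       25       32
29       10       21
25       30       20
38       -        34
27       -        7
26       -        11
32       -        24
29       -        -
40       -        -
35       -        -
41       -        -
37       -        -
31       -        -
45       -        -
34       -        -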

Calculations
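# Weekly television-viewing hours for the sampled households in each stratum
A <- c(35, 43, 36, 39, 28, 28, 29, 25, 38, 27, 26, 32, 29, 40, 35, 41, 37, 31, 45, 34)
B <- c(27, 15, 4, 41, 49, 25, 10, 30)
R <- c(8, 14, 12, 15, 30, 32, 21, 20, 34, 7, 11, 24)

# Sample mean, sample standard deviation, and sample size for each stratum
mean(A); sd(A); length(A)
#> [1] 33.9
#> [1] 5.94625
#> [1] 20
mean(B); sd(B); length(B)
#> [1] 25.125
#> [1] 15.24502
#> [1] 8
mean(R); sd(R); length(R)
#> [1] 19
#> [1] 9.36143
#> [1] 12

# Given: stratum population sizes N_i
N_A <- 155
N_B <- 62
N_R <- 93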