Day 14

MATH 313: Survey Design and Sampling

Bastola

Introduction to Stratified Sampling


  • Overview: In stratified sampling, the population is divided into homogeneous subgroups, and samples are taken from each.
  • Key Concept: Unlike simple random sampling with one total sample size \(n\), stratified sampling requires determining \(L+1\) sample sizes: \(n\), the total sample size, and \(n_1, n_2, \ldots, n_L\), the sample size for each stratum.

Objective of Sample Allocation

  • Objective: Minimize the variance of the estimator at the lowest possible cost.
  • Factors Influencing Allocation:
    1. Number of elements in each stratum, \(N_i\).
    2. Variability within each stratum, \(\sigma_i\).
    3. Cost of sampling from each stratum, \(c_i\).

Allocation Formulas

  • Neyman Allocation (Equal Costs): \[ w_i=\frac{N_i \sigma_i}{\sum_{j=1}^L N_j \sigma_j} \]
  • Proportional Allocation (Equal Costs and Variability): \[ w_i=\frac{N_i}{N} \]
  • General Allocation (Varying Costs and Variability): \[ w_i=\frac{\frac{N_i \sigma_i}{\sqrt{c_i}}}{\sum_{k=1}^L \frac{N_k \sigma_k}{\sqrt{c_k}}} \]

Sample Size Determination

  • Estimation of Mean/Average or Total: \[ n=\frac{\sum_{i=1}^L N_i^2 \frac{\sigma_i^2}{w_i}}{N^2 D^2 + \sum_{i=1}^L N_i \sigma_i^2} \]
  • Where:
    • \(D=\frac{B}{Z_{1-\frac{\alpha}{2}}}\) for estimating means.
    • \(D=\frac{B}{N \cdot Z_{1-\frac{\alpha}{2}}}\) for estimating totals.

Practical Considerations

  • Decision Points:
    1. Determining total sample size \(n\) based on budget and desired precision.
    2. Choosing an allocation strategy based on known stratum characteristics and costs.
  • Common Errors: Misestimating \(N_i\), \(\sigma_i\), or \(c_i\) can lead to inefficient allocations.
  • Summary: Effective sample size selection in stratified sampling crucially depends on accurate stratum characterization and cost considerations.
  • Further Steps: Implementation of chosen sampling strategy and subsequent data collection.

Example 1: The advertising firm in Example 1 of Day 13 finds that obtaining an observation from a rural household costs more than obtaining a response in town A or B . The increase is due to the costs of traveling from one rural household to another. The cost per observation in each town is estimated to be \(\$ 9\) (i.e., \(c_1=c_2=9\) ), and the costs per observation in the rural area to be \(\$ 16\) (i.e., \(c_3=16\) ). The stratum standard deviations (approximated by the strata sample variances from a prior survey) are \(\sigma_1 \approx 5, \sigma_2 \approx 15\), and \(\sigma_3 \approx 10\). Find the overall sample size \(n\) and the stratum sample sizes, \(n_1, n_2\), and \(n_3\), that allow the firm to estimate, at minimum cost, the average television-viewing time with a bound on the error of estimation equal to 2 hours. (Recall that \(N=310, N_1=155, N_2=62, N_3=93\))