Day 15

MATH 313: Survey Design and Sampling

Bastola

Introduction to Estimating Proportions

  • Context: Estimating proportions in a population is key for decisions in marketing, public health, and policy making.
  • Example: Determining the proportion of voters favoring a particular policy, the proportion of households with internet access, or the percentage of people immunized in different regions.

Real-World Application

  • Marketing: A company may want to estimate the proportion of consumers who prefer a new product over existing ones. Stratified sampling can be used to ensure various demographics are represented.
  • Public Health: Health officials estimate the proportion of the population that has been vaccinated by stratifying the population by age, location, and other relevant factors.

Objective of Sample Allocation

  • Objective: Minimize the variance of the estimator while controlling costs.
  • Factors Influencing Allocation:
    1. Population proportion in each stratum, \(p_i\).
    2. Variability of the proportion, expressed as \(p_i(1-p_i)\).
    3. Sampling cost per stratum, \(c_i\).

Step 1: Determine Weights for Sample Sizes

  • Weight Formula: \[ W_i=\frac{N_i \frac{\sqrt{p_i (1-p_i)}}{\sqrt{c_i}}}{\sum_{j=1}^L N_j \frac{\sqrt{p_j (1-p_j)}}{\sqrt{c_j}}} \]
  • Purpose: Adjust sample sizes to optimize precision and cost.

Step 2: Calculate Total Sample Size

  • Sample Size Formula: \[ n=\frac{\sum_{i=1}^L N_i^2 \frac{p_i (1-p_i)}{w_i}}{N^2 D^2 + \sum_{i=1}^L N_i p_i (1-p_i)} \]
  • Where:
    • \(D=\frac{B}{Z_{1-\frac{\alpha}{2}}}\), defining the precision level for confidence intervals.

Step 3: Data Collection



  • Implementation: Collect data according to the sample sizes and weights determined in Steps 1 and 2.

Step 4: Estimation of Population Proportion

  • Estimator Formula: \[ \hat{p}_{st}=\frac{1}{N}(N_1\hat{p}_1 + N_2\hat{p}_2 + \cdots + N_L\hat{p}_L) \]
  • Variance of Estimator: \[ \hat{V}(\hat{p}_{st})=\frac{1}{N^2} \cdot \sum_{i=1}^L N_i^2\left(1-\frac{n_i}{N_i}\right)\cdot\left(\frac{\hat{p}_i(1-\hat{p}_i)}{n_i-1}\right) \]

Degrees of Freedom for Variance Estimation

  • Degrees of Freedom Formula: \[ df=\frac{\left(\sum_{i=1}^L a_i p_i(1-p_i)\right)^2}{\sum_{i=1}^L \frac{(a_i p_i(1-p_i))^2}{n_i-1}} \]
  • Importance: Correct degrees of freedom are crucial for accurate inference and confidence interval construction.

Example 1

An advertising firm wants to estimate the proportion of households viewing TV show X in a county. The county is divided into three strata: town A, town B, and the rural area, containing \(N_1=155\), \(N_2=62\), and \(N_3=93\) households, respectively. From a study conducted three years ago, the previous estimates of proportions were \(\hat{p}_1^*=0.80\), \(\hat{p}_2^*=0.25\), and \(\hat{p}_3^*=0.50\) (these values are used for selecting sample sizes, not as the current proportions).

Costs: The costs of obtaining an observation are \(c_1=c_2=9\), \(c_3=16\).

  1. The firm aims to estimate the population proportion \(p\) with a 95% bound on the error of estimation equal to 0.1. The task is to find the total sample size \(n\) and the strata sample sizes \(n_1\), \(n_2\), \(n_3\) that will achieve the desired bound at minimum cost.

  2. Based on the new survey performed with the determined sample sizes, the new sample proportions are \(\hat{p}_1=0.78\), \(\hat{p}_2=0.35\), \(\hat{p}_3=0.52\). Estimate the proportion of households viewing TV show X with a 95% bound on the error of estimation.