Day 13

MATH 313: Survey Design and Sampling

Bastola

Recap: Stratified Sampling

Objective: Leverage stratified random sampling to estimate population characteristics more accurately.
Approach: Divide the population into non-overlapping groups (strata) that are internally homogeneous, then sample each stratum separately to reduce sampling error.
Application: Widely used in surveys and research where population subgroups are to be measured with greater precision.

Stratification Explained

Strata: \(L\) distinct groups within the population, each representing key sub-populations.
Sizes:
- Total population: \(N = \sum_{i=1}^L N_i\)
- Each stratum: \(N_i\) (sub-population size), \(n_i\) (sample size)
Metrics:
- Sample mean for stratum \(i\): \(\bar{Y}_i\)
- Total for stratum \(i\): \(\tau_i\)
- Population total: \(\tau = \sum_{i=1}^L \tau_i\)

Estimation Methodology

Population Mean: \(\bar{y}_{st} = \frac{1}{N} \sum_{i=1}^L N_i \bar{Y}_i\), a weighted average of the stratum sample means with weights \(N_i/N\), the share of the population in stratum \(i\).
Population Total: \(\hat{\tau}_{st} = N \cdot \bar{y}_{st} = \sum_{i=1}^L N_i \bar{Y}_i\), the estimated mean scaled up by the population size.
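The two estimators can be sketched in a few lines of R. This is a minimal illustration, not necessarily the code used in class: the stratum sizes and sample means are the ones reported for Example 1 later in these notes, and the vector names `N_i` and `ybar_i` are chosen here for illustration.

```r
# Stratum population sizes and sample means from Example 1
# (Town A, Town B, Rural); the vector names are illustrative.
N_i    <- c(A = 155, B = 62, R = 93)
ybar_i <- c(A = 33.9, B = 25.125, R = 19)

N       <- sum(N_i)                 # total population size
ybar_st <- sum(N_i * ybar_i) / N    # stratified estimate of the mean
tau_st  <- N * ybar_st              # stratified estimate of the total

ybar_st
tau_st
```

With these inputs the estimated mean is 27.675 hours per household and the estimated total is 8579.25 hours.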

Variance and Precision

Key Concept: Estimating variance helps assess the precision and reliability of our estimates.
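Variance of Mean Estimation: \[\hat{V}(\bar{y}_{st}) = \frac{1}{N^2} \sum_{i=1}^L N_i^2 \left(1 - \frac{n_i}{N_i}\right) \frac{s_i^2}{n_i}\] where \(s_i^2\) is the sample variance within stratum \(i\) and \(1 - \frac{n_i}{N_i}\) is the finite population correction.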
Variance of Total Estimation: \[\hat{V}(\hat{\tau}_{st}) = N^2 \cdot \hat{V}(\bar{y}_{st})\]
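As a rough sketch, the variance formula translates directly into a small R function. The helper name `var_ybar_st` is hypothetical, and the inputs are the stratum sizes, sample sizes, and sample standard deviations reported for Example 1 later in these notes.

```r
# Hypothetical helper: estimated variance of the stratified mean.
# N_i = stratum sizes, n_i = sample sizes, s_i = sample standard deviations.
var_ybar_st <- function(N_i, n_i, s_i) {
  N <- sum(N_i)
  sum(N_i^2 * (1 - n_i / N_i) * s_i^2 / n_i) / N^2
}

# Example 1 summary statistics (Town A, Town B, Rural)
N_i <- c(155, 62, 93)
n_i <- c(20, 8, 12)
s_i <- c(5.94625, 15.24502, 9.36143)

v_mean <- var_ybar_st(N_i, n_i, s_i)   # variance of the estimated mean
v_tau  <- sum(N_i)^2 * v_mean          # variance of the estimated total

v_mean
v_tau
```

For these inputs the variance of the estimated mean comes out near 1.97, so the standard error is about 1.4 hours.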

Degrees of Freedom and Critical Values

Degrees of Freedom: Crucial for determining the appropriate \(t\) critical value when constructing confidence intervals. \[df = \frac{\left(\sum_{i=1}^L a_i s_i^2\right)^2}{\sum_{i=1}^L \frac{(a_i s_i^2)^2}{n_i-1}}\]
Factor: \(a_i = \frac{N_i(N_i-n_i)}{n_i}\) is the coefficient of \(s_i^2\) in the estimated variance of the total, since \(\hat{V}(\hat{\tau}_{st}) = \sum_{i=1}^L a_i s_i^2\).
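The degrees-of-freedom approximation is easy to get wrong by hand, so a short R sketch may help. The function name `strat_df` is hypothetical; the inputs are again the Example 1 summary statistics from later in these notes.

```r
# Hypothetical helper: approximate degrees of freedom for a stratified estimate.
strat_df <- function(N_i, n_i, s_i) {
  a_i  <- N_i * (N_i - n_i) / n_i   # weight on each stratum variance
  term <- a_i * s_i^2               # stratum contribution to V-hat of the total
  sum(term)^2 / sum(term^2 / (n_i - 1))
}

# Example 1 summary statistics (Town A, Town B, Rural)
N_i <- c(155, 62, 93)
n_i <- c(20, 8, 12)
s_i <- c(5.94625, 15.24502, 9.36143)

df <- strat_df(N_i, n_i, s_i)
df
```

Here the result is roughly 21, well below the \(n - L = 37\) one might naively use, because the small and highly variable Town B sample dominates the variance.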

Error Bounds and Confidence Intervals

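Error Bound: \[B = t_{1-\frac{\alpha}{2}, df} \cdot \sqrt{\hat{V}(\bar{y}_{st})}\] This is the bound for the mean. For the total, use \(\sqrt{\hat{V}(\hat{\tau}_{st})}\) in place of \(\sqrt{\hat{V}(\bar{y}_{st})}\), which gives a bound of \(N \cdot B\).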
Confidence Interval: \[\text{Estimator} \pm B\] Provides a range within which we can be \((1-\alpha) \times 100\%\) confident that the population parameter lies.
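Putting the pieces together, a confidence interval for the mean might be computed as below. This is a sketch under stated assumptions: \(\alpha = 0.05\) is assumed here (the notes do not fix a level), the inputs are the Example 1 summary statistics, and base R's `qt()` supplies the \(t\) critical value and accepts non-integer degrees of freedom.

```r
# Example 1 summary statistics (Town A, Town B, Rural)
N_i    <- c(155, 62, 93)
n_i    <- c(20, 8, 12)
ybar_i <- c(33.9, 25.125, 19)
s_i    <- c(5.94625, 15.24502, 9.36143)
alpha  <- 0.05                      # assumed confidence level of 95%

N       <- sum(N_i)
a_i     <- N_i * (N_i - n_i) / n_i
term    <- a_i * s_i^2
ybar_st <- sum(N_i * ybar_i) / N    # point estimate of the mean
v_mean  <- sum(term) / N^2          # estimated variance of the mean
df      <- sum(term)^2 / sum(term^2 / (n_i - 1))

B  <- qt(1 - alpha / 2, df) * sqrt(v_mean)   # error bound
ci <- c(lower = ybar_st - B, upper = ybar_st + B)

B
ci
```

With these inputs the bound is roughly 2.9 hours, so the interval runs from about 24.8 to 30.6 hours per household.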

Practical Implications

Use Case: Stratified sampling is ideal when certain strata are expected to differ significantly, and each requires individual analysis.
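Advantage: For a fixed total sample size, it typically gives more precise estimates than simple random sampling when households within each stratum are similar, and it can reduce cost by concentrating fieldwork within well-defined subgroups.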
Limitation: Requires detailed advance knowledge of the population structure, which may not always be available.

Example 1: In a continuation of Example 1 from Day 12, imagine an advertising firm conducting a survey to measure weekly television-viewing habits across different locales. The firm, equipped with sufficient resources, opts to collect random samples of households from town A, town B, and a rural area. Details regarding the sample sizes will be addressed subsequently. Each selected household is surveyed to record their television-viewing hours per week, and the collected data is summarized in a table. Using this data, the task is to calculate both the average and total weekly television-viewing hours for households throughout the entire county and to establish an error bound for these estimates.

Town A	Town B	Rural
35	27	8
43	15	14
36	4	12
39	41	15
28	49	30
28	25	32
29	10	21
25	30	20
38		34
27		7
26		11
32		24
29
40
35
41
37
31
45
34

Calculations

A <- c(35, 43, 36, 39, 28, 28, 29, 25, 38, 27, 26, 32, 29, 40, 35, 41, 37, 31, 45, 34)
B <- c(27, 15, 4, 41, 49, 25, 10, 30)
R <- c(8, 14, 12, 15, 30, 32, 21, 20, 34, 7, 11, 24)
mean(A); sd(A); length(A)

[1] 33.9

[1] 5.94625

[1] 20

mean(B); sd(B); length(B)

[1] 25.125

[1] 15.24502

[1] 8

mean(R); sd(R); length(R)

[1] 19

[1] 9.36143

[1] 12

# Given: number of households in each stratum (population sizes N_i)
N_A <- 155
N_B <- 62
N_R <- 93
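One way to finish the example from the raw data is sketched below. It repeats the data vectors so that it runs on its own; the list name `samples`, the other object names, and the 95% confidence level are assumptions made for illustration, not part of the original slides.

```r
# Raw samples and stratum sizes from Example 1
samples <- list(
  A = c(35, 43, 36, 39, 28, 28, 29, 25, 38, 27, 26, 32, 29, 40, 35, 41, 37, 31, 45, 34),
  B = c(27, 15, 4, 41, 49, 25, 10, 30),
  R = c(8, 14, 12, 15, 30, 32, 21, 20, 34, 7, 11, 24)
)
N_i <- c(A = 155, B = 62, R = 93)

n_i    <- lengths(samples)                    # sample size per stratum
ybar_i <- vapply(samples, mean, numeric(1))   # stratum sample means
s2_i   <- vapply(samples, var, numeric(1))    # stratum sample variances

N      <- sum(N_i)
a_i    <- N_i * (N_i - n_i) / n_i
tau_st <- sum(N_i * ybar_i)                   # estimated total viewing hours
v_tau  <- sum(a_i * s2_i)                     # estimated variance of the total
df     <- v_tau^2 / sum((a_i * s2_i)^2 / (n_i - 1))

t_crit <- qt(0.975, df)                       # assumed 95% confidence level
B_tau  <- t_crit * sqrt(v_tau)                # error bound for the total

# Mean and its bound follow by dividing by N
round(c(mean = tau_st / N, B_mean = B_tau / N,
        total = tau_st, B_total = B_tau), 2)
```

The estimated total is 8579.25 hours per week across the county's 310 households, with an error bound of roughly 900 hours; dividing both by \(N\) gives the corresponding figures for the mean.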
