Day 25

MATH 313: Survey Design and Sampling

Bastola

Introduction to Systematic Sampling

Systematic sampling simplifies the sample selection process, making it a widely used design in survey sampling due to:

  • Ease of implementation in the field.
  • Lower susceptibility to selection errors compared to simple random and stratified random sampling.

Definition: A 1-in-\(k\) systematic sample with a random start is obtained by randomly selecting one element from the first \(k\) elements of the frame and then selecting every \(k^{th}\) element thereafter.

Advantages of Systematic Sampling

Advantages:

  • Simplicity, especially useful with manual frames or registers.
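  • When the units within each systematic sample are heterogeneous (high intra-sample variability), the estimator can be more precise than one from a simple random sample (SRS) of the same size.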

How to Draw a Systematic Sample

To draw a systematic sample:

  • Choose \(k\) such that \(k \leq \frac{N}{n}\) where \(N\) is the population size and \(n\) is the desired sample size.
  • Randomly select one element from the first \(k\) elements.
  • Continue selecting every \(k^{th}\) element thereafter.

Example: For \(N=15000\) and \(n=1600\), \(\frac{N}{n} = 9.375\), so choose \(k=9\) for a 1-in-9 systematic sample.
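The three steps above can be sketched in Python. This is a minimal illustration, not a prescribed implementation: the function name `systematic_sample` and the `seed` argument are assumptions made for the example, and units are assumed to be labeled 1 through \(N\) in frame order.

```python
import random

def systematic_sample(N, k, seed=None):
    """Return the 1-based labels of a 1-in-k systematic sample from N units."""
    rng = random.Random(seed)
    start = rng.randint(1, k)            # random start among the first k units (inclusive)
    return list(range(start, N + 1, k))  # then every k-th unit thereafter

# Values from the example: N = 15000, k = 9
sample = systematic_sample(N=15000, k=9, seed=1)
print(sample[:5], len(sample))
```

Because \(k\) was rounded down from 9.375, the realized sample size is 1666 or 1667 (depending on the start), slightly above the desired 1600.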

Understanding Sampling Probabilities in Systematic Sampling

In systematic sampling:

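  • Each unit’s inclusion probability, \(\pi_i\), is \(\frac{1}{a} = \frac{n}{N}\) when \(N\) is a multiple of \(a\), where \(a\) is the sampling interval (the \(k\) of a 1-in-\(k\) sample).
  • When there are extra units beyond \(a \times n\), the possible samples differ in size by one unit, so \(\frac{1}{a}\) no longer equals \(\frac{n}{N}\) exactly.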

Key Point: The entire sample can be considered as a single cluster, which complicates variance estimation as many second-order inclusion probabilities are zero.
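The key point can be made concrete with a short worked statement. Assume \(N = na\) and write \(\pi_{ij}\) for the probability that distinct units \(i\) and \(j\) are both in the sample (this notation is introduced here for illustration):

\[
\pi_{ij} =
\begin{cases}
\frac{1}{a}, & \text{if } i \equiv j \pmod{a}, \\
0, & \text{otherwise.}
\end{cases}
\]

Two units appear together only when they belong to the same one of the \(a\) possible samples, so most pairs have \(\pi_{ij} = 0\). Design-unbiased variance estimators need \(\pi_{ij} > 0\) for every pair, which is why a single systematic sample does not provide one.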

Practical Considerations in Systematic Sampling

While systematic sampling offers many advantages, it has some drawbacks:

  • Cyclical (periodic) patterns in the sampling frame may bias the results, particularly when the period coincides with the sampling interval.
  • Unbiased estimation of the sampling variance is difficult because the entire sample depends on a single random start.

The spread of the sample through the population should ideally cover the entire range to ensure representative data collection.

Applications of Systematic Sampling

Systematic sampling’s methodology allows for diverse applications, from national censuses to quality control in manufacturing:

  • Ensures consistent coverage across a population.
  • Facilitates easier and less error-prone data collection in field settings.

This versatility makes it a favored choice in many large-scale and long-term studies.

Handling Non-Integer \(\frac{N}{n}\) in Systematic Sampling

When \(\frac{N}{n}\) is not an integer, write \(N = na + c\) with \(0 < c < a\), and let \(r\) be the random start chosen from \(1, \ldots, a\):

  • If the starting value \(r \leq c\), the sample includes \(n+1\) units.
  • Otherwise, the sample includes \(n\) units.
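The rule above can be checked by enumerating every possible start. The sketch below uses a small hypothetical frame (\(N = 23\), \(a = 5\), so \(n = 4\) and \(c = 3\)); these numbers and the helper name `sample_sizes` are illustrative assumptions, not values from the lecture.

```python
def sample_sizes(N, a):
    """Map each possible start r = 1..a to the size of the resulting 1-in-a sample."""
    return {r: len(range(r, N + 1, a)) for r in range(1, a + 1)}

N, a = 23, 5             # hypothetical frame: 23 = 4 * 5 + 3
n, c = divmod(N, a)      # n = 4 full intervals, c = 3 leftover units
sizes = sample_sizes(N, a)
print(sizes)             # starts r <= c give n + 1 units, the rest give n
```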

Alternative Method:

  1. Generate a random number \(\varepsilon\) from a uniform distribution over \((0, a)\), where \(a = \frac{N}{n}\) is now allowed to be non-integer.
  2. For each \(j = 1, \ldots, n\), select the element \(i \in \{1, \ldots, N\}\) that satisfies \(i-1 < \varepsilon + (j-1)a \leq i\).

This method ensures each element has an equal chance of selection, regardless of the frame’s total size relative to \(n\).
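A minimal Python sketch of this alternative method follows, assuming the interval is the real number \(a = N/n\). The function name `fractional_interval_sample` and the small values \(N = 23\), \(n = 4\) are hypothetical. The condition \(i-1 < x \leq i\) is the same as \(i = \lceil x \rceil\), which is what the code uses.

```python
import math
import random

def fractional_interval_sample(N, n, seed=None):
    """Select exactly n of the units 1..N using a real-valued interval a = N / n."""
    rng = random.Random(seed)
    a = N / n                          # interval; need not be an integer
    eps = a * (1.0 - rng.random())     # random start in (0, a]; avoids eps == 0
    picks = []
    for j in range(1, n + 1):
        point = eps + (j - 1) * a
        i = math.ceil(point)           # unit i with i - 1 < point <= i
        picks.append(min(i, N))        # guard against floating-point overshoot past N
    return picks

print(fractional_interval_sample(N=23, n=4, seed=0))
```

Since \(a \geq 1\), consecutive selection points are at least one unit apart, so the \(n\) selected labels are distinct and the sample size is always exactly \(n\).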

Example 1: (Ex 7.3, Textbook) We are working with a retail store with four departments. Each department has charge accounts, and we want to estimate the proportion of past-due accounts using systematic sampling. The key questions involve listing systematic samples and computing exact variances, followed by comparing these results with simple random sampling variances.

We are given the following information:

                      Department 1   Department 2   Department 3         Department 4
Account numbers       1-11           12-20          21-28                29-40
Delinquent accounts   1, 2, 3, 4     12, 13, 14     21, 22, 23, 24, 25   29, 30, 31, 32

The store wishes to estimate the proportion of past-due accounts using systematic sampling.

Part (a): 1-in-10 Systematic Samples

List all possible 1-in-10 systematic samples and compute the exact variance of the sample proportion.
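One way to carry out this enumeration is sketched below in Python, using the account data from the table. The helper names `systematic_proportions` and `exact_variance` are assumptions for illustration. The exact variance is taken as the average squared deviation of the sample proportion over the \(k\) equally likely samples.

```python
DELINQUENT = {1, 2, 3, 4, 12, 13, 14, 21, 22, 23, 24, 25, 29, 30, 31, 32}
N = 40  # total number of accounts

def systematic_proportions(k):
    """Proportion of delinquent accounts in each of the k possible 1-in-k samples."""
    props = []
    for start in range(1, k + 1):
        sample = range(start, N + 1, k)
        props.append(sum(i in DELINQUENT for i in sample) / len(sample))
    return props

def exact_variance(props):
    """Variance over the equally likely samples (each has probability 1/k)."""
    mean = sum(props) / len(props)
    return sum((p - mean) ** 2 for p in props) / len(props)

props_10 = systematic_proportions(10)
print(props_10)
print(exact_variance(props_10))
```

Calling `systematic_proportions(5)` reuses the same functions for a 1-in-5 design.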

Part (b): 1-in-5 Systematic Samples

Next, list all possible 1-in-5 systematic samples and compute the exact variance of the sample proportion.

Part (c): Comparing Systematic Sampling with Simple Random Sampling

We will now compare the variances obtained in parts (a) and (b) with those of simple random samples of sizes \(n=4\) and \(n=8\), respectively.
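For the comparison, a worked sketch of the simple random sampling side, assuming the standard exact variance of a sample proportion with the finite population correction. Here \(p = \frac{16}{40} = 0.4\) is the population proportion of delinquent accounts from the table, and the 1-in-10 and 1-in-5 designs yield samples of 4 and 8 accounts:

\[
V(\hat{p}_{SRS}) = \frac{p(1-p)}{n} \cdot \frac{N-n}{N-1}
\]

\[
n = 4: \quad \frac{(0.4)(0.6)}{4} \cdot \frac{36}{39} \approx 0.0554,
\qquad
n = 8: \quad \frac{(0.4)(0.6)}{8} \cdot \frac{32}{39} \approx 0.0246
\]

These are the benchmarks against which the exact systematic sampling variances from parts (a) and (b) can be judged.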