Day 6

MATH 313: Survey Design and Sampling

Bastola

Population Probability Calculation

In general, if \(X\) is a normally distributed random variable with mean \(a\) and standard deviation \(b\), then the probability \(P(X>c)\) for some number \(c\) can be computed in R as follows:

```{r}
# P(X > c): the first argument of pnorm() is q (the quantile), not x
1 - pnorm(q = c, mean = a, sd = b)
```
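
As a quick illustration, the sketch below plugs hypothetical values into this formula. The numbers `a = 10`, `b = 2`, and the cutoff 13 are assumptions chosen for illustration, not values from the course. The cutoff is stored in `cutoff` rather than `c` to avoid confusion with R's built-in `c()` function, and `lower.tail = FALSE` is shown as an equivalent way to request the upper-tail probability.

```{r}
# Hypothetical values, chosen only for illustration
a <- 10        # mean
b <- 2         # standard deviation
cutoff <- 13   # plays the role of c in the formula

# Upper-tail probability P(X > cutoff) via the complement rule
p_complement <- 1 - pnorm(q = cutoff, mean = a, sd = b)

# Same probability, requested directly as an upper tail
p_upper <- pnorm(q = cutoff, mean = a, sd = b, lower.tail = FALSE)

p_complement   # about 0.0668, since 13 is 1.5 standard deviations above 10
all.equal(p_complement, p_upper)
```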

Sampling Distribution

For a random sample of size \(n\) from a normal population with mean \(\mu\) and standard deviation \(\sigma\), the sample mean \(\bar{y}\) is normally distributed with mean \(\mu\) and standard deviation \(\frac{\sigma}{\sqrt{n}}\). (For a non-normal population this holds approximately when \(n\) is large, by the central limit theorem.) In the notation above: \[ X = \bar{y}, \quad a = \mu, \quad b = \frac{\sigma}{\sqrt{n}} \] To compute the probability \(P(\bar{y}>c)\), use the R code below:

```{r}
# P(ybar > c), with a = mu and b = sigma / sqrt(n)
1 - pnorm(q = c, mean = a, sd = b)
```
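
One way to see this result is by simulation. The sketch below uses assumed values (`mu = 50`, `sigma = 10`, `n = 25`, and 10,000 repetitions, none of which come from the course) and checks that the simulated sample means have a mean close to \(\mu\) and a standard deviation close to \(\frac{\sigma}{\sqrt{n}}\).

```{r}
set.seed(313)  # arbitrary seed so the run is reproducible

mu <- 50; sigma <- 10; n <- 25   # hypothetical population and sample size

# Draw 10,000 samples of size n and record each sample mean
ybars <- replicate(10000, mean(rnorm(n, mean = mu, sd = sigma)))

mean(ybars)        # should be close to mu = 50
sd(ybars)          # should be close to sigma / sqrt(n) = 2
sigma / sqrt(n)    # theoretical standard deviation of the sample mean
```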

Example 1

Suppose the heights of all male adults from a particular region follow a normal distribution with mean 5.6 ft and variance \(1.44 \mathrm{ft}^2\). A sample of 100 male adults from this region is randomly selected and their heights are measured.

  1. What are the expected value and standard deviation of the average height, \(\bar{y}\), of this sample?

  2. What is the probability that the average height of this sample is less than 5 ft?

  3. What is the probability that the average height of this sample is greater than 6 ft?

  4. What is the probability that the average height of this sample is between 5.2 ft and 5.9 ft?

  5. What is the probability that a randomly selected male adult from this region has a height greater than 6 ft?
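
One possible way to work through the questions of Example 1 in R is sketched below. The variable names (`se`, `p_less_5`, and so on) are chosen here for illustration. Note that the problem gives the variance, so the standard deviation is its square root, and that the last question concerns a single individual, so it uses \(\sigma\) rather than \(\frac{\sigma}{\sqrt{n}}\).

```{r}
mu    <- 5.6          # population mean (ft)
sigma <- sqrt(1.44)   # population sd: the problem gives the variance
n     <- 100          # sample size

# Expected value of ybar is mu = 5.6 ft; its standard deviation is:
se <- sigma / sqrt(n)   # 1.2 / 10 = 0.12 ft

# P(ybar < 5)
p_less_5 <- pnorm(5, mean = mu, sd = se)

# P(ybar > 6)
p_more_6 <- 1 - pnorm(6, mean = mu, sd = se)

# P(5.2 < ybar < 5.9)
p_between <- pnorm(5.9, mean = mu, sd = se) - pnorm(5.2, mean = mu, sd = se)

# P(X > 6) for one individual, so use sigma rather than se
p_one_more_6 <- 1 - pnorm(6, mean = mu, sd = sigma)

c(se = se, p_less_5 = p_less_5, p_more_6 = p_more_6,
  p_between = p_between, p_one_more_6 = p_one_more_6)
```

The two tail probabilities for \(\bar{y}\) are very small because 5 ft and 6 ft lie 5 and about 3.33 standard deviations from the mean of \(\bar{y}\), whereas an individual height above 6 ft is only about a third of a standard deviation above the population mean, giving a probability of roughly 0.37.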


Percentiles in Statistical Distributions

The \(100 \times p\)th percentile of a random variable \(X\) is a value \(c\) such that: \[P(X<c) = p\] That is, \(c\) is the point below which a proportion \(p\) (equivalently, \(100 \times p\) percent) of the distribution falls.

Standard Normal Distribution

  • For a standard normal random variable \(Z\), the \(100 \times p\)th percentile is denoted \(z_p\).
  • Example in R: `qnorm(p)`, where `p` is the cumulative probability; for instance, `qnorm(0.95)` gives the 95th percentile \(z_{0.95}\).
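
As a small check of this definition, the sketch below computes a few standard normal percentiles and confirms that `pnorm()` undoes `qnorm()`. The probabilities in `p` are chosen only for illustration.

```{r}
p <- c(0.90, 0.95, 0.975)   # cumulative probabilities, chosen for illustration

z <- qnorm(p)   # standard normal percentiles z_p: about 1.28, 1.64, 1.96
z

pnorm(z)        # applying pnorm recovers p, since P(Z < z_p) = p
```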

  • General Normal Distribution
    • For a normally distributed variable \(X\) with mean \(\mu\) and standard deviation \(\sigma\), the \(100 \times p\)th percentile is: \[c = \mu + z_p \cdot \sigma\]
    • This is calculated in R using: `qnorm(p, mean = mu, sd = sigma)`
    • Here \(c\) is the value below which \(100 \times p\) percent of the distribution lies.
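
The sketch below checks, for one set of hypothetical values (`mu = 100`, `sigma = 15`, `p = 0.90`, all assumed for illustration), that the formula \(c = \mu + z_p \cdot \sigma\) and the direct `qnorm()` call agree.

```{r}
# Hypothetical values, for illustration only
mu <- 100; sigma <- 15; p <- 0.90

c_formula <- mu + qnorm(p) * sigma            # mu + z_p * sigma
c_direct  <- qnorm(p, mean = mu, sd = sigma)  # same percentile in one call

c(c_formula, c_direct)           # both about 119.2
all.equal(c_formula, c_direct)
```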

Example 2

According to a National Center for Education Statistics report, the SAT scores for all exam takers in Maryland in 2017 had a mean of 1060 and a standard deviation of 199. Assume that the SAT scores follow a normal distribution.

  1. A top-ranked university only accepts the top \(10 \%\) of students in Maryland. Find the minimum SAT score for being admitted to that university in 2017.

  2. Find the SAT scores that separate the middle \(80 \%\) of the scores from the top and bottom \(10 \%\).

  3. A random sample of 200 SAT scores is selected. Find the value \(c\) such that there is only a \(5 \%\) chance that the sample mean \(\bar{y}\) is greater than \(c\).
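
One possible way to work through the questions of Example 2 with `qnorm()` is sketched below. The variable names are chosen here for illustration. The key step in each case is translating the wording into a cumulative probability: the top \(10 \%\) starts at the 90th percentile, the middle \(80 \%\) lies between the 10th and 90th percentiles, and a \(5 \%\) upper-tail chance for \(\bar{y}\) corresponds to the 95th percentile of its sampling distribution.

```{r}
mu <- 1060; sigma <- 199   # mean and sd of SAT scores from the problem

# Admission cutoff: the top 10% begins at the 90th percentile
cutoff_top10 <- qnorm(0.90, mean = mu, sd = sigma)

# The middle 80% lies between the 10th and 90th percentiles
middle80 <- qnorm(c(0.10, 0.90), mean = mu, sd = sigma)

# ybar for n = 200 has sd sigma / sqrt(n); P(ybar > c) = 0.05 means
# c is the 95th percentile of the sampling distribution
n <- 200
c_ybar <- qnorm(0.95, mean = mu, sd = sigma / sqrt(n))

cutoff_top10   # about 1315
middle80       # about 805 and 1315
c_ybar         # about 1083
```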

What is Simple Random Sampling?

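  • Definition: Simple Random Sampling (SRS) is a method in which every possible sample of size \(n\) from a population of size \(N\) has the same chance of being selected.
  • Properties:
    • Every element has an equal probability of selection (\(n/N\) when sampling without replacement).
    • Selections are independent of each other when sampling with replacement; without replacement they are only approximately independent, and the dependence is small when \(n\) is small relative to \(N\).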
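
A minimal sketch of drawing a simple random sample in R with `sample()` follows. The population size `N = 20` and sample size `n = 5` are hypothetical, and units are assumed to be labeled 1 through `N`. The second part repeats the draw many times to check that a given unit is selected with frequency close to \(n/N\).

```{r}
set.seed(313)  # arbitrary seed so the run is reproducible

N <- 20   # hypothetical population size; units are labeled 1, ..., N
n <- 5    # sample size

# One simple random sample without replacement
srs <- sample(N, size = n, replace = FALSE)
srs

# Repeat the draw many times and record how often unit 1 is selected
included <- replicate(10000, 1 %in% sample(N, size = n))
mean(included)   # should be close to n / N = 0.25
```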

Why Use Simple Random Sampling?

  • Fairness and Simplicity:
    • Ensures each member of the population has an equal opportunity to be chosen.
    • Easy to understand and implement.
  • Minimized Bias:
    • Reduces the chance of sampling bias; results are more likely to be representative of the population.

Limitations of Simple Random Sampling

  • Feasibility and Cost:
    • Requires a complete list of the population (a sampling frame), so it is often impractical for very large populations because of logistical complexity and cost.
  • Variability:
    • Can lead to higher variability between samples than more complex designs, such as stratified sampling.

Simple Random Sampling in Action: Nielsen Ratings

  • Application:
    • Nielsen uses a form of SRS to measure television viewership across the U.S.
  • Process:
    • Randomly selects households to install People Meters.
    • Collects and processes viewership data to provide ratings.
  • Impact:
    • Ratings influence TV programming and advertising decisions globally.