32 Finding sample sizes for CIs

You have learnt to ask a RQ, design a study, classify and summarise the data, construct confidence intervals, and conduct hypothesis tests. In this chapter, you will learn to:

  • estimate the sample size for producing a CI of given width for a proportion, mean, mean difference, difference between two means, and difference between two proportions.
  • explain issues relevant to estimating sample sizes.

32.1 Introduction

A confidence interval is an interval which gives a range of values of the parameter that could plausibly have produced the observed value of the statistic. All other things being equal, a larger sample size gives a more precise (Sect. 6.3) estimate of the parameter; that is, a narrower CI. After all, that's why larger samples are preferred over smaller samples: they provide more precise estimates.

Example 32.1 (Impact of sample size on CIs) Suppose we wish to estimate an unknown proportion, and find that \(\hat{p} = 0.52\) from a sample of size \(n = 25\). The approximate \(95\)% CI is \(0.52 \pm 0.200\) (so the margin of error is \(0.200\)).

If the estimate of \(\hat{p} = 0.52\) was found from a sample of size \(n = 100\) (rather than \(n = 25\)), a more precise estimate should be expected. The approximate \(95\)% CI is \(0.52\pm 0.100\); the margin of error is \(0.100\), so the estimate is indeed more precise.

If the estimate of \(\hat{p} = 0.52\) was found from a sample of size \(n = 400\), the approximate \(95\)% CI is \(0.52\pm 0.050\); the margin of error is \(0.050\) (which is more precise again).

At each step, the sample size was four times as large, but the margin of error was halved.
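The margins of error in Example 32.1 can be reproduced with a short calculation. This is a minimal sketch, assuming the approximate \(95\)% CI uses a multiplier of \(2\) and the usual standard error for a sample proportion; the function name `approx_margin_of_error` is hypothetical, chosen only for this illustration.

```python
import math


def approx_margin_of_error(p_hat, n, multiplier=2):
    """Approximate 95% margin of error for a sample proportion.

    Assumes a multiplier of 2 (an approximation to 1.96) and the
    standard error sqrt(p_hat * (1 - p_hat) / n).
    """
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    return multiplier * standard_error


# Each sample size is four times the previous one.
for n in (25, 100, 400):
    moe = approx_margin_of_error(0.52, n)
    print(f"n = {n:3d}: margin of error = {moe:.3f}")
```

Because \(n\) appears under a square root, quadrupling \(n\) halves the margin of error, which is the pattern seen in the example.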

The approximate width of the CI changes for various sample sizes (all else being equal), as shown in Fig. 32.1. Observe that:

  • greater precision (smaller CI width) is obtained using larger sample sizes.
  • for small sample sizes (say, smaller than \(15\)), precision greatly increases with small increases in the sample size.
  • for large sample sizes (say, greater than \(30\)), precision improves only slightly when the sample size is increased.

FIGURE 32.1: The approximate 95% confidence interval for various sample sizes.

That is, improving precision gets more difficult as sample sizes get larger. Large gains in precision are made by moderately increasing small sample sizes, but only small gains in precision are made by large increases in already-large sample sizes.

Remember that the sample size is the number of units of analysis.

32.2 General ideas

If larger samples give more precise estimates, should the largest sample possible always be used? Not necessarily: using large samples also has disadvantages:

  • As seen above, very large sample sizes only slightly improve precision.
  • Studies with larger sample sizes take longer to complete.
  • Studies with larger sample sizes are more expensive.
  • Ethics committees aim to keep sample sizes as small as possible, so that:
    • The environment is impacted as little as possible.
    • The fewest possible animals are harmed.
    • The fewest possible people are harmed or inconvenienced.
    • Resources, time and money are not wasted.

Example 32.2 (The cost of research) Farrar et al. (2021) studied the residual effect of organic biochar compound fertilizers (BCFs) two years after application. This study required planting turmeric in pots using soil previously treated with BCFs.

After the turmeric was grown, the concentration of potassium, phosphorus and nitrogen---as well as many trace minerals---was determined from the soil in every pot. In addition, every turmeric plant was analysed for the number of shoots, the leaf mass fraction, and foliar nutrient information.

Every pot that is used has a substantial cost, both in terms of time and money. Using more pots increases precision, but also increases costs and the time to complete the study.

Determining the sample size to use in a study is a trade-off between the advantages of increasing precision, and the challenges of cost, time, and remaining ethical (Chap. 5). In addition, how the sample is obtained is important: random samples give more accurate estimates (Sect. 6.3) than non-random samples. That is, the sample size is not the only issue to consider.

For these reasons, researchers usually identify a margin of error that is meaningful (i.e., of practical importance) in the context of their study.

Example 32.3 (Practical importance in sample size calculations) In a weight-loss study, estimating the weight loss with a precision of \(1\,\text{g}\) is far more precise than is necessary: a weight loss of \(1\,\text{g}\) has no practical importance, but would require a massive sample size to estimate.

In contrast, the sample size needed to detect a mean weight loss with a precision of \(50\,\text{kg}\) would be far smaller. However, a weight loss so great is of no practical importance either, as most people who are looking to lose weight are hoping to lose far less than \(50\,\text{kg}\).

The researchers may decide that a weight loss of \(5\,\text{kg}\) is sufficient to be of practical importance, and determine the sample size based on this value.

In this chapter, we learn how to compute the (approximate) minimum sample size needed to obtain a given precision (i.e., for a given margin of error) for a \(95\)% confidence interval. The estimation of sample sizes for constructing a CI is studied for these situations:

  • Estimating a proportion: Sect. 32.3.
  • Estimating a mean: Sect. 32.4.
  • Estimating a mean difference: Sect. 32.5.
  • Estimating a difference between two means: Sect. 32.6.
  • Estimating a difference between two proportions: Sect. 32.7.

The formulas given in this chapter only apply for forming \(95\)% CIs, and are very conservative: they will probably give minimum sample sizes that are a little too large, but that is better than being too small.

To ensure that the required targets are met, the results from the sample size calculation should always be rounded up. In addition, sample sizes slightly larger than calculated are often used, to allow for drop-outs: animals or plants that die; people who can no longer be contacted; and so on.

32.3 Sample size for estimating one proportion

In Sect. 22.8, a CI was formed for the population proportion of female college students in the United States that drink coffee daily (Kelpin et al. 2018). From a sample of \(n = 360\), the CI was \(0.1694 \pm 0.0395\) (i.e., the margin of error is \(0.0395\)), or from \(0.130\) to \(0.209\).

To obtain a more precise estimate (i.e., a narrower CI), a larger sample is needed, but how much larger? For instance, suppose we would like a CI with margin of error of \(0.02\) (rather than \(0.0395\)). What size sample is needed?

Definition 32.1 (Sample size: proportion) Conservatively, the size of the simple random sample needed for a \(95\)% CI for a proportion with a specified margin of error is at least \[ \frac{1}{(\text{Margin of error})^2}. \]
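A sketch of why this formula is conservative, assuming the approximate \(95\)% margin of error for a proportion is \(2\) times the standard error \(\sqrt{\hat{p}(1 - \hat{p})/n}\) (an assumption about the form of the CI, not stated in this section). The product \(\hat{p}(1 - \hat{p})\) is never larger than \(0.25\), its value when \(\hat{p} = 0.5\), so:

```latex
\[
  \text{Margin of error}
  = 2 \times \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}
  \le 2 \times \sqrt{\frac{0.25}{n}}
  = \frac{1}{\sqrt{n}}.
\]
% Requiring 1/sqrt(n) to be no larger than the target margin of error gives
\[
  n \ge \frac{1}{(\text{Margin of error})^2}.
\]
```

Under this assumption, the formula plans for the worst case \(\hat{p} = 0.5\), so the sample size is large enough whatever value of \(\hat{p}\) is eventually observed.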

For the coffee-drinking situation above, at least \(\displaystyle 1\div (0.02^2) = 2\ 500\) female college students in the US are needed. This is a substantial increase from the original sample size of \(360\).

Example 32.4 (Sample size calculations for one proportion) To estimate the population proportion of South Africans that smoke, within \(0.07\) with \(95\)% confidence, at least \[ \frac{1}{(\text{Margin of error})^2} = \frac{1}{0.07^2} = 204.08 \] people are needed. In practice, at least \(205\) people are needed to achieve this desired level of precision (that is, always round up in sample size calculations).

Always round up the result of the sample size calculation.
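The calculation in Definition 32.1, including the rounding up, can be sketched in a few lines. The function name `sample_size_proportion` is hypothetical. Converting the input through `Fraction(str(...))` is a design choice made here so that the decimal arithmetic is exact, and a result such as \(2\,500\) is not accidentally rounded up to \(2\,501\) by floating-point error.

```python
import math
from fractions import Fraction


def sample_size_proportion(margin_of_error):
    """Conservative minimum n for an approximate 95% CI for one proportion."""
    # Exact decimal arithmetic avoids floating-point surprises before rounding.
    moe = Fraction(str(margin_of_error))
    # Definition 32.1: 1 / (margin of error)^2, always rounded up.
    return math.ceil(1 / moe**2)


print(sample_size_proportion(0.02))  # coffee-drinking example: 2500
print(sample_size_proportion(0.07))  # Example 32.4: 205
```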


32.4 Sample size for estimating one mean

Estimating a mean depends on the variation in the observations. If the data have a small amount of variation, estimating the mean requires a smaller sample size as most observations are similar.

Definition 32.2 (Sample size: mean) Conservatively, the size of the simple random sample needed for a \(95\)% CI for the mean with a specified margin of error is at least \[ \left( \frac{2 \times s}{\text{Margin of error}}\right)^2, \] where \(s\) is an estimate of the standard deviation in the population.

The formula requires a value for the sample standard deviation, \(s\). But if we don't have a sample yet, how can we have a value for the standard deviation of the sample? An approximate value for \(s\) is used, which can come from:

  • the value of \(s\) from the results of a pilot study (Sect. 9.2).
  • the results of a similar study, where the value \(s\) there can be used (see Example 32.5).

Example 32.5 (Sample size estimation for one mean) Sect. 23.6 discusses a study about the mean cadmium concentrations in peanuts in the United States, where \(s = 0.0460\) ppm (Blair and Lamb 2017).

To estimate the mean cadmium concentration in Canadian peanuts, within \(0.005\) ppm with \(95\)% confidence, this value for \(s\) can be used. Then: \[ \left( \frac{2 \times 0.0460}{0.005}\right)^2 = 338.56; \] we would need at least \(339\) Canadian peanuts.
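The calculation in Example 32.5 can be checked with a small helper based on Definition 32.2. This is a sketch only; the function name `sample_size_mean` is hypothetical, and inputs are converted to exact fractions so the final rounding up is not affected by floating-point error.

```python
import math
from fractions import Fraction


def sample_size_mean(s, margin_of_error):
    """Conservative minimum n for an approximate 95% CI for one mean."""
    s = Fraction(str(s))                  # estimate of the standard deviation
    moe = Fraction(str(margin_of_error))  # desired margin of error
    # Definition 32.2: (2 * s / margin of error)^2, always rounded up.
    return math.ceil((2 * s / moe) ** 2)


# Example 32.5: s = 0.0460 ppm, margin of error 0.005 ppm
print(sample_size_mean(0.0460, 0.005))  # 339
```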

32.5 Sample size for estimating a mean difference

The ideas in the previous section also work for computing sample sizes for estimating mean differences, since the differences can be treated like a single sample.

Definition 32.3 (Sample size: mean difference) Conservatively, the size of the simple random sample needed for a \(95\)% CI for the mean difference with a specified margin of error is at least \[ \left( \frac{2 \times s_d}{\text{Margin of error}}\right)^2, \] where \(s_d\) is an estimate of the standard deviation of the population differences.

Again, an approximate value for \(s_d\) can come from a pilot study (Sect. 9.2), or from the results of a similar study.

Example 32.6 (Sample size estimation for mean differences) In Sect. 29.4, a CI is computed for the difference between the distances walked in \(6\,\text{mins}\) (the six-minute walk test, 6MWT), using a \(20\,\text{m}\) and a \(30\,\text{m}\) walkway (Saiphoklang, Pugongchai, and Leelasittikul 2022), for \(50\) Thai patients. The approximate \(95\)% CI is from \(15.80\,\text{m}\) to \(28.26\,\text{m}\), with greater distances walked on the \(30\,\text{m}\) walkway (i.e., the margin of error is \(6.234\,\text{m}\)).

Suppose we wanted to estimate the mean difference in the 6MWT distances for Malaysian patients; we could use the value of \(s_d\) from this study (i.e., \(s_d = 22.03920\)). Also, suppose we wanted a precision of \(4\,\text{m}\) (that is, the margin of error is \(4\)). For this more precise estimate, we would need a larger sample. So compute: \[ \left( \frac{2 \times 22.03920}{4}\right)^2 = 121.43; \] we would need at least \(122\) patients, after rounding up.
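The two numbers in Example 32.6 are linked by the same formula, used in both directions. The sketch below first recovers the original margin of error from \(n = 50\), then finds the sample size for the tighter margin of \(4\,\text{m}\). It assumes the approximate \(95\)% margin of error is \(2 \times s_d/\sqrt{n}\); both function names are hypothetical.

```python
import math
from fractions import Fraction


def margin_of_error_mean_difference(s_d, n):
    """Approximate 95% margin of error: 2 * s_d / sqrt(n) (multiplier of 2 assumed)."""
    return 2 * s_d / math.sqrt(n)


def sample_size_mean_difference(s_d, margin_of_error):
    """Conservative minimum number of pairs (Definition 32.3), rounded up."""
    s_d = Fraction(str(s_d))
    moe = Fraction(str(margin_of_error))
    return math.ceil((2 * s_d / moe) ** 2)


# Original study: 50 patients, standard deviation of the differences 22.03920 m
print(round(margin_of_error_mean_difference(22.03920, 50), 3))  # 6.234

# New study: margin of error of 4 m
print(sample_size_mean_difference(22.03920, 4))  # 122
```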

32.6 Sample size for estimating a difference between two means

A formula for computing sample sizes for estimating the difference between two means is simple if we make some assumptions:

  • the sample size in both groups being compared is the same; and
  • the standard deviation in both groups being compared is the same.

Formulas are available for computing sample sizes without these restrictions, but are more complicated than that given here.

Definition 32.4 (Sample size: difference between two means) Conservatively, the size of the simple random sample needed for a \(95\)% CI for the difference between two means with a specified margin of error is at least \[ 2\times \left( \frac{2 \times s}{\text{Margin of error}}\right)^2 \] for each sample, where \(s\) is an estimate of the common standard deviation in the population for both groups.

Example 32.7 (Sample size estimation for difference between means) In Sect. 30.7, a CI is computed for the difference between the mean speeds of cars before and after signage was added (Ma et al. 2019). Suppose we wanted to estimate the difference between the mean speeds within \(5\,\text{km.h}^{-1}\).

In Sect. 30.7, the two groups (before and after signage added) produced standard deviations of \(13.194\) and \(13.134\) (which are very similar). We decide to use \(s = 13.15\) in the sample-size calculation as the common value of \(s\): \[ 2 \times \left( \frac{2 \times 13.15}{5}\right)^2 = 55.335. \] We would need to measure the speed of at least \(56\) cars before signage was added, and another \(56\) cars after the addition of signage (rounding up the result).
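The per-group sample size in Example 32.7 can be sketched as follows, under the two assumptions listed above (equal group sizes and a common standard deviation). The function name `sample_size_two_means` is hypothetical.

```python
import math
from fractions import Fraction


def sample_size_two_means(s, margin_of_error):
    """Conservative minimum n in EACH group for an approximate 95% CI
    for the difference between two means (Definition 32.4)."""
    s = Fraction(str(s))                  # common standard deviation estimate
    moe = Fraction(str(margin_of_error))  # desired margin of error
    return math.ceil(2 * (2 * s / moe) ** 2)


# Example 32.7: common s = 13.15, margin of error 5
n_each = sample_size_two_means(13.15, 5)
print(n_each)      # 56 cars before signage, and 56 after
print(2 * n_each)  # 112 cars in total
```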

32.7 Sample size for estimating a difference between proportions

A formula for computing sample sizes for estimating the difference between two proportions is simple if we assume the sample size in both groups being compared is the same. Formulas are available for computing sample sizes without this restriction, but are more complicated than that given here.

Definition 32.5 (Sample size: difference between two proportions) Conservatively, the size of the simple random sample needed for a \(95\)% CI for the difference between two proportions with a specified margin of error is at least \[ \frac{2}{(\text{Margin of error})^2} \] for each sample.

Example 32.8 (Sample size estimation for difference between proportions) In Sect. 31.9, a CI is computed for the difference between the proportion of infected turtle nests, comparing natural and relocated nests (Candan, Katılmış, and Ergin 2021). Suppose we wanted to estimate the difference between the proportion of infected nests within \(0.15\).

We compute: \[ \frac{2}{0.15^2} = 88.89. \] We would need to record data from at least \(89\) natural nests and \(89\) relocated nests.
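A sketch of the same calculation in code, assuming equal sample sizes in both groups as in Definition 32.5. The function name `sample_size_two_proportions` is hypothetical.

```python
import math
from fractions import Fraction


def sample_size_two_proportions(margin_of_error):
    """Conservative minimum n in EACH group for an approximate 95% CI
    for the difference between two proportions (Definition 32.5)."""
    moe = Fraction(str(margin_of_error))
    return math.ceil(2 / moe**2)


# Example 32.8: margin of error 0.15
print(sample_size_two_proportions(0.15))  # 89 nests of each type
```

The result is twice the value given by the one-proportion formula (before rounding), and applies to each of the two samples.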

32.8 More details about these sample size calculations

The above calculations form just one part of the information needed to make the final decision about the necessary sample size. For example, the cost (time and money) of taking the samples has not been considered.

The calculations in this chapter assume a simple random sample will be used, which is often unreasonable. Other, more complex, formulas are available for computing sample sizes for other random-sampling schemes (such as stratified samples). However, the above calculations give an approximate minimum sample size required. In addition, the calculations in this chapter are only for producing \(95\)% confidence intervals.

In practice, researchers often start with a slightly larger sample than calculated to allow for drop-outs (e.g., plants die, or people withdraw from the study).
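One common way to allow for drop-outs is to divide the calculated sample size by the proportion of units expected to remain in the study. This is a sketch of that idea, not a method given in this chapter: the function name `inflate_for_dropouts` and the drop-out rates used are hypothetical, and a realistic rate would need to come from a pilot study or similar studies.

```python
import math
from fractions import Fraction


def inflate_for_dropouts(n_required, dropout_rate):
    """Starting sample size so that about n_required units remain
    if a proportion dropout_rate of units is lost (assumed rate)."""
    rate = Fraction(str(dropout_rate))
    if not 0 <= rate < 1:
        raise ValueError("dropout_rate must be at least 0 and less than 1")
    return math.ceil(n_required / (1 - rate))


# Hypothetical: 100 units needed, 10% expected to drop out
print(inflate_for_dropouts(100, 0.10))  # 112
```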

32.9 Example: emergency residential aged care

Dwyer et al. (2021) studied residential aged care residents in Australia needing emergency care and recorded, among other information, the average age of such residents (\(\bar{x} = 85\); \(s = 7.3\)) and the proportion of calls related to falls (\(\hat{p} = 0.156\)).

Suppose a similar study was to be conducted in New Zealand. The aim was to estimate the mean age of residents within \(2\) years of age, and the proportion of incidents related to falls within \(0.10\).

Using the value of \(s\) from Australia, the sample size required to meet the age requirement is at least \[ n = \left(\frac{2\times s}{\text{Margin of error}}\right)^2 = \left(\frac{2\times 7.3}{2}\right)^2 = 53.29, \] or at least \(54\) residents (rounding up). The sample size required to meet the falls requirement is at least \[ n = \frac{1}{(\text{Margin of error})^2} = \frac{1}{0.1^2} = 100. \] Since the same subjects are needed for both estimates, at least \(100\) residents are needed.
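When one sample must satisfy several precision requirements, the largest of the individual sample sizes is used. A minimal sketch of the calculation above, with hypothetical function names:

```python
import math
from fractions import Fraction


def sample_size_mean(s, margin_of_error):
    """Conservative minimum n for an approximate 95% CI for one mean."""
    s, moe = Fraction(str(s)), Fraction(str(margin_of_error))
    return math.ceil((2 * s / moe) ** 2)


def sample_size_proportion(margin_of_error):
    """Conservative minimum n for an approximate 95% CI for one proportion."""
    moe = Fraction(str(margin_of_error))
    return math.ceil(1 / moe**2)


n_age = sample_size_mean(7.3, 2)        # mean age within 2 years
n_falls = sample_size_proportion(0.10)  # proportion of falls within 0.10

# The same residents provide both estimates, so take the larger value.
n_study = max(n_age, n_falls)
print(n_age, n_falls, n_study)  # 54 100 100
```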

32.10 Chapter summary

Estimating a sample size is a compromise between the precision of the estimate, and the need to remain ethical and reduce costs. All other things being equal, making a sample size four times as large results in a confidence interval half as wide. This means that large gains in precision are made by increasing small sample sizes, but only small gains are made by increasing already-large sample sizes.

32.11 Quick review questions

Are the following statements true or false?

  1. A larger sample size produces a more accurate estimate of the parameter, all other things being equal.
  2. A larger sample size produces a more random sample.
  3. We should always take the largest possible sample size.

32.12 Exercises

Answers to odd-numbered exercises are given at the end of the book.

Exercise 32.1 To obtain a narrower CI, is a larger or smaller sample size necessary (all else being equal)?

Exercise 32.2 Does a narrow CI imply a precise estimate, or an accurate estimate of the parameter?

Exercise 32.3 Suppose we need to estimate a population mean (with \(95\)% confidence), using \(s = 1\,\text{kg}\).

  1. What size sample is needed to estimate the population mean within \(0.4\,\text{kg}\)?
  2. What size sample is needed to estimate the population mean within \(0.2\,\text{kg}\) (that is, the CI will be half as wide as in the first calculation)?
  3. What size sample is needed to estimate the population mean within \(0.1\,\text{kg}\) (that is, the CI will be a quarter as wide as in the first calculation)?
  4. To get a CI half as wide, how many times more units of analysis are needed?
  5. To get a CI a quarter as wide, how many times more units of analysis are needed?
  6. Would a smaller or larger sample be needed to estimate the population mean within \(0.4\,\text{kg}\), with \(99\)% confidence? Explain.

Exercise 32.4 Suppose we need to estimate a difference between two population means (with \(95\)% confidence), using \(s = 8\,\text{cm}\).

  1. What size samples are needed to estimate the difference between the population means within \(4\,\text{cm}\)?
  2. What size samples are needed to estimate the difference between the population means within \(2\,\text{cm}\) (that is, the CI will be half as wide as in the first calculation)?
  3. What size samples are needed to estimate the difference between the population means within \(1\,\text{cm}\) (that is, the CI will be a quarter as wide as in the first calculation)?
  4. To get a CI half as wide, how many times more units of analysis are needed?
  5. To get a CI a quarter as wide, how many times more units of analysis are needed?
  6. Would smaller or larger samples be needed to estimate the difference between the population means within \(4\,\text{cm}\), with \(99\)% confidence? Explain.

Exercise 32.5 Mann and Blotnicky (2017) studied the eating habits of university students in Canada (Sect. 22.3). They estimated the proportion of Canadian students that ate a sufficient number of servings of grains each day.

Suppose we wished to repeat the study but for New Zealand university students; that is, we seek an estimate of the population proportion of New Zealand students that eat a sufficient number of servings of grains each day (with \(95\)% confidence).

  1. What size sample is needed to estimate the proportion within \(0.01\)?
  2. What size sample is needed to estimate the proportion within \(0.02\)?
  3. What size sample is needed to estimate the proportion within \(0.10\)?
  4. Do you think this study would be costly, in terms of time and money?

Exercise 32.6 We wish to estimate the population proportion of Kenyans that smoke.

  1. Suppose we wish our \(95\)% CI to have a margin of error of \(0.05\). How many Kenyans would need to be surveyed?
  2. Suppose we wish our \(95\)% CI to have a margin of error of \(0.025\); that is, we wish to halve the width of the interval above. How many Kenyans would need to be surveyed?
  3. How many times as many Kenyans are needed to halve the width of the CI?

Exercise 32.7 Tager et al. (1979) measured the lung capacity of \(11\)-year-old girls in East Boston, using the forced expiratory volume (FEV) of the children (Exercise 23.3). Suppose we wished to repeat the study, and find a \(95\)% CI for the mean FEV for \(11\)-year-old Australian girls.

Since Australian and American children might be somewhat similar, we could use, as an approximation, the standard deviation from that study: \(s = 0.43\).

  1. What size sample is needed to estimate the mean within \(0.02\,\text{L}\)?
  2. What size sample is needed to estimate the mean within \(0.05\,\text{L}\)?
  3. What size sample is needed to estimate the mean within \(0.10\,\text{L}\)?
  4. Suppose we wished to find a \(99\)% (not \(95\)%) confidence interval for the mean FEV for \(11\)-year-old Australian girls, within \(0.10\,\text{L}\). Would this sample size be larger or smaller than the sample size found for a \(95\)% confidence interval (also within \(0.10\,\text{L}\))?
  5. Do you think this study would be costly, in terms of time and money?

Exercise 32.8 B. Williams and Boyle (2007) asked paramedics (\(n = 199\)) to estimate the amount of blood loss on four different surfaces. When the actual amount of blood spill on concrete was \(1\,000\,\text{mL}\), the mean guess was \(846.4\,\text{mL}\) (with a standard deviation of \(651.1\,\text{mL}\)). For a different study:

  1. how many paramedics are needed to estimate the mean with a precision of \(50\,\text{mL}\)?
  2. how many paramedics are needed to estimate the mean with a precision of \(25\,\text{mL}\)?
  3. how many times greater does the sample size need to be to halve the margin of error?

Exercise 32.9 Skypilot is an alpine wildflower native to the Colorado Rocky Mountains (USA). In recent years, a willow shrub has been encroaching on skypilot territory and, because willow often flowers early, Kettenbach et al. (2017) studied whether the willow may 'negatively affect pollination regimes of resident alpine wildflower species' (p. \(6\,965\)). Data for both species were collected at \(25\) different sites, so the data are paired by site. The 'first-flowering day' is the number of days since the start of the year (e.g., January \(12\) is 'day \(12\)') when flowers were first observed.

Suppose a similar paired study was to be conducted on skypilot growing in Sierra Nevada, California. Using the software output in Fig. 13.4:

  1. determine the sample size needed to estimate the mean difference in first-flowering day within two days.
  2. determine the sample size needed to estimate the mean difference in first-flowering day within three days.

Exercise 32.10 MacGregor et al. (1979) studied treating hypertension with Captopril. Patients had their systolic blood pressure measured (in mm Hg) immediately before and two hours after being given the drug. A pilot study showed that the difference between the two measurements had a standard deviation of about \(9\,\text{mm}\) Hg.

  1. Determine the sample size needed to estimate the mean reduction in systolic blood pressure within \(2\,\text{mm}\) Hg.
  2. Determine the sample size needed to estimate the mean reduction in systolic blood pressure within \(1.5\,\text{mm}\) Hg.

Exercise 32.11 Agbayani, Fortune, and Trites (2020) studied gray whales (Eschrichtius robustus) and measured (among other things) the length of whales at birth. Summary information is shown in Table 30.6. Suppose another research study wanted to study sperm whales, which have an approximately similar size.

  1. Determine the sample size needed to estimate the difference between the mean lengths for female and male sperm whales at birth, within \(0.15\,\text{m}\).
  2. Determine the sample size needed to estimate the difference between the mean lengths for female and male sperm whales at birth, within \(0.10\,\text{m}\).
  3. Determine the sample size needed to estimate the difference between the mean lengths for female and male goldfish at birth, within \(1\,\text{mm}\).

Exercise 32.12 Suppose researchers are trialling a new drug to reduce the recovery time (compared to standard treatments) after contracting double pneumonia. They conduct a pilot study, and find the standard deviation of the duration of the symptoms, in both groups, is about \(s = 1.25\) days.

  1. What size sample is needed to estimate the difference between the mean recovery times between the two treatments within \(1\) day?
  2. What size sample is needed to estimate the difference between the mean recovery times between the two treatments within \(0.5\) days?

Exercise 32.13 Table 31.10 summarises the data from a study of the incidence of in-hospital heart attacks for people admitted following an earlier heart attack. To estimate the difference between the proportion of patients having an in-hospital heart attack (between patients with a low body temperature and patients with a high body temperature) within \(0.03\), what size samples are needed?

Exercise 32.14 Exercise 31.13 describes a study comparing the proportion of females and males who wore sunglasses in Brisbane, Australia (B. Dexter et al. 2019). Suppose we wished to make a similar comparison for people in Auckland, estimating the difference in the proportions within \(0.07\). How many females and males would be needed?