29 Sample sizes for CIs

So far, you have learnt to ask a RQ, design a study, classify and summarise the data, and form confidence intervals. In this chapter, you will learn to:

  • estimate the sample size for producing a CI of given width for a proportion, mean, mean difference and difference between two means.
  • explain issues relevant to estimating sample sizes.

29.1 Introduction

A confidence interval is an interval which gives a range of values of the parameter that could plausibly have produced the observed value of the statistic. All other things being equal, a larger sample size gives a more precise (Sect. 6.3) estimate of the parameter. After all, that's why we prefer larger samples: to get more precise estimates, and hence narrower CIs. If that were not the case, we could take the smallest, cheapest and easiest possible sample of size one... which is clearly absurd.

Example 29.1 (Impact of sample size on CIs) Suppose we wish to estimate an unknown proportion, and find that \(\hat{p} = 0.52\) from a sample of size \(n = 25\). The approximate \(95\)% CI is \(0.52 \pm 0.200\) (so the margin of error is \(0.200\)).

If the estimate of \(\hat{p} = 0.52\) was found from a sample of size \(n = 100\) (rather than \(n = 25\)), a more precise estimate should be expected. The approximate \(95\)% CI is \(0.52\pm 0.100\); the margin of error is \(0.100\).

If the estimate of \(\hat{p} = 0.52\) was found from a sample of size \(n = 400\), the approximate \(95\)% CI is \(0.52\pm 0.050\); the margin of error is \(0.050\).

At each step, the sample size was four times as large, but the margin of error was halved.
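The margins of error in Example 29.1 can be reproduced with a short Python sketch. It assumes that the approximate \(95\)% CI is the sample proportion give-or-take two standard errors, where the standard error is \(\sqrt{\hat{p}(1 - \hat{p})/n}\); the function name `margin_of_error` is purely illustrative.

```python
import math

def margin_of_error(p_hat, n, multiplier=2):
    """Approximate margin of error for a sample proportion.

    Assumes an approximate 95% CI of the form
    p_hat +/- multiplier * sqrt(p_hat * (1 - p_hat) / n), with multiplier = 2.
    """
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    return multiplier * standard_error

# The three sample sizes used in Example 29.1
for n in (25, 100, 400):
    print(n, round(margin_of_error(0.52, n), 3))
```

The printed margins of error are approximately \(0.2\), \(0.1\) and \(0.05\): each four-fold increase in the sample size halves the margin of error.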

The approximate width of the CI changes for various sample sizes (all else being equal), as Fig. 29.1 shows. We can see that:

  • greater precision (smaller CI width) is obtained using larger sample sizes.
  • for small sample sizes (say, smaller than \(15\)), precision greatly increases with small increases in the sample size.
  • for large sample sizes (say, greater than \(30\)), precision improves only slightly when the sample size is increased.

FIGURE 29.1: The approximate 95% confidence interval for various sample sizes

That is, improving precision gets more difficult as sample sizes get larger. Large gains in precision are made by moderately increasing small sample sizes, but only small gains in precision are made by large increases in already-large sample sizes.

Remember that the sample size is the number of units of analysis.

29.2 General ideas

If larger samples give more precise estimates, should the largest sample possible always be used? Not necessarily: using large samples also has disadvantages:

  • Studies with larger sample sizes take longer to complete.
  • Studies with larger sample sizes are more expensive.
  • Ethics committees aim to keep sample sizes as small as possible, so that:
    • The environment is impacted as little as possible.
    • The fewest possible animals are harmed.
    • The fewest possible people are harmed or inconvenienced.
    • Resources, time and money are not wasted.

Example 29.2 (The cost of research) Farrar et al. (2021) studied the residual effect of organic biochar compound fertilizers (BCFs) two years after application. This study required planting turmeric in pots using soil previously treated with BCFs.

After the turmeric was grown, the concentration of potassium, phosphorus and nitrogen---as well as many trace minerals---was determined from the soil in every pot. In addition, every turmeric plant was analysed for the number of shoots, the leaf mass fraction, and foliar nutrient information.

Clearly, every pot that is used comes with a substantial cost, both in terms of time and money.

Determining the sample size to use is a trade-off between the advantages of increasing precision, and the challenges of cost, time, and remaining ethical (Chap. 5). In addition, how the sample is obtained is also important: random samples give more accurate estimates (Sect. 6.3) than non-random samples. For these reasons, researchers usually identify a margin of error that is meaningful (i.e., of practical importance) in the context of their study.

Example 29.3 (Practical importance in sample size calculations) In a weight-loss study, estimating the weight reduction to within \(1\) g is far more precise than is necessary: a weight loss of \(1\) g is of no practical importance, but would require a massive sample size to estimate.

In contrast, the sample size needed to estimate a weight loss to within \(50\) kg would be far smaller. However, a weight loss so great is of no practical importance either, as most people who are looking to lose weight are hoping to lose far less than \(50\) kg.

The researchers may decide that a weight loss to within \(5\) kg is sufficient to be of practical importance, and determine the sample size based on this value.

In this chapter, we learn how to compute the (approximate) minimum sample size needed to obtain a given precision (i.e., for a given margin of error) for a \(95\)% confidence interval. We only study the estimation of sample sizes for constructing a CI in these situations:

  • Estimating a proportion: Sect. 29.3.
  • Estimating a mean: Sect. 29.4.
  • Estimating a mean difference: Sect. 29.5.
  • Estimating a difference between two means: Sect. 29.6.

The formulas given in this chapter only apply for forming \(95\)% CIs, and are very conservative: they will probably give minimum sample sizes a bit too large, but that is better than being too small. In any case, sample sizes slightly larger than calculated are often used anyway, to allow for drop-outs: animals or plants that die; people who can no longer be contacted; and so on.

29.3 Sample size for estimating one proportion

In Sect. 23.7, a CI was formed for the population proportion of female college students in the United States that drink coffee daily (Kelpin et al. 2018). From a sample of \(n = 360\), the CI was \(0.1694 \pm 0.0395\) (i.e., the margin of error is \(0.0395\)), or from \(0.130\) to \(0.209\).

To obtain a more precise estimate (i.e., a narrower CI), a larger sample is needed. For instance, suppose we would like a CI with margin of error of \(0.02\). What size sample is needed? Since we seek a more precise estimate, a larger sample is needed... but how much larger?

Definition 29.1 (Sample size: proportion) Conservatively, the size of the simple random sample needed for a \(95\)% CI for a proportion with a specified margin of error is at least \[ \frac{1}{(\text{Margin of error})^2}. \]

For the coffee-drinking situation above, a sample size of at least \(\displaystyle 1\div (0.02^2) = 2\ 500\) female college students in the US is needed. This is a substantial increase from the original sample size of \(360\).

Example 29.4 (Sample size calculations for one proportion) To estimate the population proportion of South Africans that smoke, to within \(0.07\) with \(95\)% confidence, a sample size of at least \[ \frac{1}{(\text{Margin of error})^2} = \frac{1}{0.07^2} = 204.0816 \] is needed. In practice, at least \(205\) people are needed to achieve this desired level of precision (that is, always round up in sample size calculations).

Always round up the result of the sample size calculation.
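The calculation in Definition 29.1, including the rounding up, can be sketched in Python. The function name `sample_size_proportion` is hypothetical. Using exact fractions rather than floating-point numbers is a design choice made here so that tiny rounding errors cannot push an exact answer (such as \(2\ 500\)) up to the next whole number.

```python
from fractions import Fraction
import math

def sample_size_proportion(margin):
    """Conservative minimum n for a 95% CI for a proportion: 1 / margin^2, rounded up."""
    m = Fraction(str(margin))  # exact value of the decimal as written, e.g. 0.02 -> 1/50
    if m <= 0:
        raise ValueError("margin of error must be positive")
    return math.ceil(1 / m**2)

print(sample_size_proportion(0.02))  # coffee-drinking study
print(sample_size_proportion(0.07))  # Example 29.4
```

These calls give \(2\ 500\) and \(205\), in agreement with the calculations above.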


29.4 Sample size for estimating one mean

Definition 29.2 (Sample size: mean) Conservatively, the size of the simple random sample needed for a \(95\)% CI for the mean with a specified margin of error is at least \[ \left( \frac{2 \times s}{\text{Margin of error}}\right)^2, \] where \(s\) is an estimate of the standard deviation in the population.

The formula requires a value for the sample standard deviation, \(s\). But if we don't have a sample yet... how can we have a value for the standard deviation of the sample? An approximate value for \(s\) is used, which can come from:

  • the value of \(s\) from the results of a pilot study (Sect. 9.1).
  • the results of a similar study, where the value \(s\) there can be used (see Example 29.5).

Example 29.5 (Sample size estimation for one mean) Sect. 24.5 discusses a study about the mean cadmium concentrations in peanuts in the United States, where \(s = 0.0460\) ppm (Blair and Lamb 2017).

Suppose we wanted to estimate the mean cadmium concentration in Australian peanuts, to give-or-take \(0.005\) ppm with \(95\)% confidence. We could use this value for \(s\) as a starting point, and then compute: \[ \left( \frac{2 \times 0.0460}{0.005}\right)^2 = 338.56; \] we would need at least \(339\) peanuts.
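Because the value of \(s\) is only a guess before the data are collected, it can be useful to see how the required sample size changes with \(s\). The following Python sketch applies Definition 29.2; the function name `sample_size_mean` is hypothetical, and the alternative values \(s = 0.040\) and \(s = 0.050\) are illustrative assumptions, not values from any study.

```python
from fractions import Fraction
import math

def sample_size_mean(s, margin):
    """Conservative minimum n for a 95% CI for a mean: (2 * s / margin)^2, rounded up."""
    s_exact = Fraction(str(s))       # exact arithmetic avoids floating-point surprises
    m_exact = Fraction(str(margin))  # when rounding up
    if s_exact <= 0 or m_exact <= 0:
        raise ValueError("s and margin must be positive")
    return math.ceil((2 * s_exact / m_exact) ** 2)

# Example 29.5: s = 0.0460 ppm, margin of error 0.005 ppm
print(sample_size_mean(0.0460, 0.005))

# Hypothetical alternative guesses for s, to see how sensitive n is
for s_guess in (0.040, 0.050):
    print(s_guess, sample_size_mean(s_guess, 0.005))
```

The first call gives \(339\), as in Example 29.5. The hypothetical guesses give \(256\) and \(400\), showing that modest changes in \(s\) can change the required sample size considerably.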

29.5 Sample size for estimating a mean difference

The ideas in the previous section also work for computing sample sizes for estimating mean differences, since the differences can be treated like a single sample.

Definition 29.3 (Sample size: mean difference) Conservatively, the size of the simple random sample needed for a \(95\)% CI for the mean difference with a specified margin of error is at least \[ \left( \frac{2 \times s_d}{\text{Margin of error}}\right)^2, \] where \(s_d\) is an estimate of the standard deviation of the differences in the population.

Again, an approximate value for \(s_d\) can come from a pilot study (Sect. 9.1), or from the results of a similar study.

Example 29.6 (Sample size estimation for mean differences) In Sect. 26.4, a CI is computed for the mean weight gain by \(n = 68\) Cornell University students from Week 1 to Week 12 (D. A. Levitsky, Halbmaier, and Mrdjenovic 2004; D. Levitsky n.d.). The CI is \(0.862\pm 0.232\) kg, where the margin of error is \(0.232\) kg.

Suppose we wanted to estimate the mean weight change at a different university; we could use the value of \(s_d\) from this study (i.e., \(s_d = 0.956\) kg). Also, suppose we wanted a more precise estimate, to give-or-take \(0.15\) kg. For a more precise estimate, we would need a larger sample. So we compute: \[ \left( \frac{2 \times 0.956}{0.15}\right)^2 = 162.4775; \] we would need at least \(163\) students after rounding up.
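As a rough check on Definition 29.3, the same formula can be applied to the margin of error actually achieved in the Cornell study. The Python sketch below uses a standard deviation of the differences of \(0.956\) kg from that study; the function name `sample_size_mean_difference` is hypothetical.

```python
from fractions import Fraction
import math

def sample_size_mean_difference(s_d, margin):
    """Conservative minimum number of pairs for a 95% CI for a mean difference."""
    s_exact = Fraction(str(s_d))     # standard deviation of the differences
    m_exact = Fraction(str(margin))  # desired margin of error
    if s_exact <= 0 or m_exact <= 0:
        raise ValueError("s_d and margin must be positive")
    return math.ceil((2 * s_exact / m_exact) ** 2)

print(sample_size_mean_difference(0.956, 0.15))   # desired margin of error
print(sample_size_mean_difference(0.956, 0.232))  # margin of error in the original study
```

The first call gives \(163\) students. The second gives \(68\), the size of the original sample, which suggests the formula is consistent with how the original CI was formed.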

29.6 Sample size for estimating a difference between two means

A formula for computing sample sizes for estimating a difference between two means is simple if we make some assumptions:

  • the sample size in each group is the same; and
  • the standard deviation in each group is the same.

Formulas are available for computing sample sizes without these restrictions, but are more complicated than that given here.

Definition 29.4 (Sample size: difference between two means) Conservatively, the size of the simple random sample needed for a \(95\)% CI for the difference between two means with a specified margin of error is at least \[ 2\times \left( \frac{2 \times s}{\text{Margin of error}}\right)^2 \] for each sample, where \(s\) is an estimate of the common standard deviation in the population for both groups.

Example 29.7 (Sample size estimation for difference between means) In Sect. 27.7, a CI is computed for the difference between the mean speeds of cars before and after signage was added (Ma et al. 2019). Suppose we wanted to estimate the difference between the mean speeds to within \(5\) km.h\(^{-1}\).

In Sect. 27.7, the two groups (before and after signage added) produced standard deviations of \(13.194\) and \(13.134\) km.h\(^{-1}\) (which are very similar). We decide to use \(s = 13.15\) in the sample-size calculation: \[ 2 \times \left( \frac{2 \times 13.15}{5}\right)^2 = 55.335. \] We would need to measure the speed of at least \(56\) cars before, and at least \(56\) cars after, the addition of signage.
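The calculation in Definition 29.4 can be sketched in Python as follows. The function name `sample_size_two_means` is hypothetical, and the sketch relies on the same two assumptions as the definition: equal sample sizes and a common standard deviation in the two groups.

```python
from fractions import Fraction
import math

def sample_size_two_means(s, margin):
    """Conservative minimum n *per group* for a 95% CI for a difference between two means.

    Assumes equal group sizes and a common standard deviation s.
    """
    s_exact = Fraction(str(s))
    m_exact = Fraction(str(margin))
    if s_exact <= 0 or m_exact <= 0:
        raise ValueError("s and margin must be positive")
    return math.ceil(2 * (2 * s_exact / m_exact) ** 2)

n_per_group = sample_size_two_means(13.15, 5)  # Example 29.7
print(n_per_group, 2 * n_per_group)            # per group, and in total
```

For Example 29.7, this gives \(56\) cars in each group, or \(112\) cars in total.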

29.7 Other issues related to sample size

The above calculations form just one part of the information needed to make the final decision about the necessary sample size. For example, the cost (time and money) of taking a sample of this size has not been considered.

The calculations in this chapter assume a simple random sample will be used, which is often unreasonable. Other, more complex, formulas are available for computing sample sizes for other random-sampling schemes (such as stratified samples). However, the above calculations give an estimate of the minimum sample size required. In addition, the calculations in this chapter are only for producing \(95\)% confidence intervals.

In practice, researchers often start with a slightly larger sample than calculated to allow for drop-outs (e.g., plants die, or people withdraw from the study).
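One simple way to allow for drop-outs, sketched below in Python, is to divide the calculated sample size by the proportion of units expected to remain in the study. This adjustment is an assumption made for illustration and is not a method given in this chapter; the function name `inflate_for_dropouts` and the \(10\)% drop-out rate are hypothetical.

```python
from fractions import Fraction
import math

def inflate_for_dropouts(n_required, dropout_rate):
    """Number of units to recruit so that about n_required remain.

    Assumes (hypothetically) that a known proportion dropout_rate is lost.
    """
    rate = Fraction(str(dropout_rate))
    if not 0 <= rate < 1:
        raise ValueError("dropout_rate must be at least 0 and less than 1")
    return math.ceil(n_required / (1 - rate))

# Hypothetical: 100 units needed, and about 10% expected to drop out
print(inflate_for_dropouts(100, 0.10))
```

With these hypothetical values, \(112\) units would be recruited so that about \(100\) remain.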

29.8 Example: emergency residential aged care

Dwyer et al. (2021) studied residential aged care residents in Australia needing emergency care and recorded, among other information, the average age of such residents (\(\bar{x} = 85\); \(s = 7.3\)) and the proportion of calls related to falls (\(\hat{p} = 0.156\)).

Suppose a similar study was to be conducted in New Zealand. The aim was to estimate the mean age of residents to within \(2\) years, and the proportion of incidents related to falls to within \(0.10\).

The sample size required to meet the age requirement is at least \[ n = \left(\frac{2\times s}{\text{Margin of error}}\right)^2 = \left(\frac{2\times 7.3}{2}\right)^2 = 53.29, \] or at least \(54\) residents (rounding up). The sample size required to meet the falls requirement is at least \[ n = \frac{1}{(\text{Margin of error})^2} = \frac{1}{0.1^2} = 100. \] Since the same subjects are needed for both estimates, at least \(100\) residents are needed.
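When one sample must meet several precision requirements, the required sample size is the largest of the individual sample sizes. A minimal Python sketch of this reasoning for the aged-care example follows; the function names are hypothetical.

```python
from fractions import Fraction
import math

def sample_size_mean(s, margin):
    """Conservative minimum n for a 95% CI for a mean: (2 * s / margin)^2, rounded up."""
    return math.ceil((2 * Fraction(str(s)) / Fraction(str(margin))) ** 2)

def sample_size_proportion(margin):
    """Conservative minimum n for a 95% CI for a proportion: 1 / margin^2, rounded up."""
    return math.ceil(1 / Fraction(str(margin)) ** 2)

n_age = sample_size_mean(7.3, 2)        # mean age, to within 2 years
n_falls = sample_size_proportion(0.10)  # proportion of falls, to within 0.10

# The same residents are used for both estimates, so take the larger requirement
n_needed = max(n_age, n_falls)
print(n_age, n_falls, n_needed)
```

This gives \(54\) and \(100\) for the two requirements, so at least \(100\) residents are needed.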

29.9 Chapter summary

Estimating a sample size is a compromise between increasing the precision of the estimate, and the need to remain ethical and reduce costs. All other things being equal, making a sample size four times as large makes the confidence interval half as wide. This means that large gains in precision are made by increasing small sample sizes, but only small gains are made by increasing already-large sample sizes.

29.10 Quick review questions

  1. True or false: A larger sample size produces a more precise estimate of the parameter, all other things being equal.
  2. True or false: A larger sample size produces a more random sample.
  3. True or false: We should always take the largest possible sample size.
  1. TRUE. The reason why larger samples are "better" is that they estimate the unknown population parameter with greater precision.
  2. FALSE. The size of the sample, and how the sample was obtained, are two different issues.
  3. FALSE. We also need to consider the cost (in terms of money and time) and ethical issues.

29.11 Exercises

Answers to odd-numbered exercises are available in App. E.

Exercise 29.1 Suppose we need to estimate a population mean (with \(95\)% confidence), using \(s = 1\).

  1. What size sample is needed to estimate the population mean within \(0.4\)?
  2. What size sample is needed to estimate the population mean within \(0.2\) (that is, the confidence interval will be half as wide as in the first calculation)?
  3. What size sample is needed to estimate the population mean within \(0.1\) (that is, the confidence interval will be a quarter as wide as in the first calculation)?
  4. To get an estimate half as wide, how many times more units of analysis are needed?
  5. To get an estimate a quarter as wide, how many times more units of analysis are needed?

Exercise 29.2 Suppose we need to estimate a difference between two population means (with \(95\)% confidence), using \(s = 8\).

  1. What size samples are needed to estimate the difference between the population means to within \(4\)?
  2. What size samples are needed to estimate the difference between the population means to within \(2\) (that is, the confidence interval will be half as wide as in the first calculation)?
  3. What size samples are needed to estimate the difference between the population means to within \(1\) (that is, the confidence interval will be a quarter as wide as in the first calculation)?
  4. To get an estimate half as wide, how many times more units of analysis are needed?
  5. To get an estimate a quarter as wide, how many times more units of analysis are needed?

Exercise 29.3 Mann and Blotnicky (2017) studied the eating habits of university students in Canada (Sect. 23.3). They estimated the proportion of Canadian students that ate a sufficient number of servings of grains each day.

Suppose we wished to repeat the study but for New Zealand university students; that is, we seek an estimate of the population proportion of New Zealand students that eat a sufficient number of servings of grains each day (with \(95\)% confidence).

  1. What size sample is needed to estimate the proportion to give-or-take \(0.01\)?
  2. What size sample is needed to estimate the proportion to give-or-take \(0.02\)?
  3. What size sample is needed to estimate the proportion to give-or-take \(0.10\)?
  4. Do you think this study would be costly, in terms of time and money?

Exercise 29.4 We wish to estimate the population proportion of Australians that smoke.

  1. Suppose we wish our \(95\)% CI to be give-or-take \(0.05\). How many Australians would need to be surveyed?
  2. Suppose we wish our \(95\)% CI to be give-or-take \(0.025\); that is, we wish to halve the width of the interval above. How many Australians would need to be surveyed?
  3. How many times as many Australians are needed to halve the width of the interval?

Exercise 29.5 Tager et al. (1979) measured the lung capacity of 11-year-old girls in East Boston, using the forced expiratory volume (FEV) of the children (Exercise 24.3). Suppose we wished to repeat the study, and find a \(95\)% confidence interval for the mean FEV for 11-year-old Australian girls.

Since Australian and American children might be somewhat similar, we could use (as a first approximation) the standard deviation from that study: \(s = 0.43\) litres.

  1. What size sample is needed to estimate the mean to give-or-take \(0.02\) litres?
  2. What size sample is needed to estimate the mean to give-or-take \(0.05\) litres?
  3. What size sample is needed to estimate the mean to give-or-take \(0.10\) litres?
  4. Suppose we wished to find a \(99\)% (not \(95\)%) confidence interval for the mean FEV for 11-year-old Australian girls, to give-or-take \(0.10\) litres. Would this sample size be larger or smaller than the sample size found for a \(95\)% confidence interval (also with give-or-take \(0.10\) litres)?
  5. Do you think this study would be costly, in terms of time and money?

Exercise 29.6 B. Williams and Boyle (2007) asked paramedics (\(n = 199\)) to estimate the amount of blood loss on four different surfaces. When the actual amount of blood spill on concrete was \(1000\) ml, the mean guess was \(846.4\) ml (with a standard deviation of \(651.1\) ml). For a different study:

  1. how many paramedics are needed to estimate the mean with a precision of give-or-take \(50\) ml?
  2. how many paramedics are needed to estimate the mean with a precision of give-or-take \(25\) ml?
  3. how many times greater does the sample size need to be to halve the margin of error?

Exercise 29.7 Skypilot is an alpine wildflower native to the Colorado Rocky Mountains (USA). In recent years, a willow shrub has been encroaching on skypilot territory and, because willow often flowers early, Kettenbach et al. (2017) studied whether the willow may 'negatively affect pollination regimes of resident alpine wildflower species' (p. 6,965). Data for both species were collected at \(25\) different sites, so the data are paired by site. The 'first-flowering day' is the number of days since the start of the year (e.g., January \(12\) is 'day \(12\)') when flowers were first observed.

Suppose a similar paired study was to be conducted on skypilot growing in Sierra Nevada, California. Using the software output in Fig. 14.3:

  1. determine the sample size needed to estimate the mean difference in first-flowering day to within two days.
  2. determine the sample size needed to estimate the mean difference in first-flowering day to within three days.

Exercise 29.8 MacGregor et al. (1979) studied treating hypertension with Captopril. Patients had their systolic blood pressure measured (in mm Hg) immediately before and two hours after being given the drug. A pilot study showed that the difference between the two measurements had a standard deviation of about \(9\) mm Hg.

  1. Determine the sample size needed to estimate the mean reduction in systolic blood pressure to within \(2\) mm Hg.
  2. Determine the sample size needed to estimate the mean reduction in systolic blood pressure to within \(1.5\) mm Hg.

Exercise 29.9 Agbayani, Fortune, and Trites (2020) studied gray whales (Eschrichtius robustus) and measured (among other things) the length of whales at birth. Summary information is shown in Table 27.4. Suppose another research study wanted to study sperm whales, which are of approximately similar size.

  1. Determine the sample size needed to estimate the difference between the mean lengths for female and male sperm whales at birth, to within \(0.15\) m.
  2. Determine the sample size needed to estimate the difference between the mean lengths for female and male sperm whales at birth, to within \(0.10\) m.
  3. Determine the sample size needed to estimate the difference between the mean lengths for female and male goldfish at birth, to within \(1\) mm.

Exercise 29.10 Suppose researchers are trialling a new drug to reduce the recovery time (compared to standard treatments) after contracting double pneumonia. They conduct a pilot study, and find the standard deviation of the duration of the symptoms, in both groups, is about \(s = 1.25\) days.

  1. What size sample is needed to estimate the difference between the mean recovery times for the two treatments to within \(1\) day?
  2. What size sample is needed to estimate the difference between the mean recovery times for the two treatments to within \(0.5\) days?