Preface

This study manual is written especially for honors students in engineering disciplines such as CSE, EEE and Textile.

Introduction to statistics

Probability

Random variable (r.v)

Some special discrete random variables

Bernoulli r.v

  • PMF: \(P(X=x)=f(x)=p^x(1-p)^{1-x}; \ \ x=0,1\)
  • Mean: \(\mu=E(X)=p\)
  • Variance: \(\sigma^2 =E(X-\mu)^2=E(X^2)-\mu^2=p(1-p)\)

Binomial r.v

  • Consider an experiment of tossing a biased coin 3 times (number of trials, \(n=3\)).
  • Tosses are independent, and each toss has only two outcomes: Head (Success) and Tail (Failure) [this type of trial is called a Bernoulli trial].
  • Suppose \(P(H)=p\) remains constant from toss to toss; consequently, \(P(T)=1-p=q\) (say).

Suppose \(X=\) number of heads (successes) in 3 tosses.

Now, what is the probability that we will have exactly 2 heads (successes) in 3 tosses?

That is, \(P(X=2)=?\)

Now, this can happen in the following ways:

\[P(X=2)=P(HHT)+P(HTH)+P(THH)\] \[=P(H)P(H)P(T)+P(H)P(T)P(H)+P(T)P(H)P(H)\] [Since tosses are independent]

\[=p\cdot p\cdot q+p\cdot q\cdot p+q\cdot p\cdot p\] \[=p^2 q+p^2 q+p^2 q=3p^2 q\] \[\therefore P(X=2)=\binom{3}{2}p^2q^{3-2} \] If \(p=0.6\) is given, then we can easily compute \(P(X=2)=f(2)=3(0.6)^2(0.4)=0.432\). Now, if we toss the coin 10 times \((n=10)\) with \(P(H)=p\), what is the value of \(P(X=3)=f(3)\)?
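As a quick check of the counting argument, the short Python sketch below (standard library only) enumerates every H/T sequence of length \(n\), keeps those with exactly \(k\) heads, and multiplies the toss probabilities along each sequence. The helper name `prob_k_heads` is hypothetical, chosen for illustration, and \(p=0.6\) is the value used in the text.

```python
from itertools import product
from math import comb

def prob_k_heads(n, k, p):
    """Brute force: sum P(sequence) over all length-n H/T sequences with exactly k heads."""
    q = 1 - p
    total = 0.0
    for seq in product("HT", repeat=n):
        if seq.count("H") == k:
            prob = 1.0
            for toss in seq:
                prob *= p if toss == "H" else q   # independent tosses: probabilities multiply
            total += prob
    return total

p = 0.6
brute = prob_k_heads(3, 2, p)
formula = comb(3, 2) * p**2 * (1 - p) ** (3 - 2)
print(round(brute, 4), round(formula, 4))  # 0.432 0.432

# The same brute force answers the n = 10 question (1024 sequences)
print(round(prob_k_heads(10, 3, p), 6), round(comb(10, 3) * p**3 * (1 - p) ** 7, 6))
```

The enumeration grows as \(2^n\), which is why the closed-form PMF below is used in practice.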

So, for \(n\) independent Bernoulli trials with a constant probability of success, \(p\), the probability mass function (PMF) of the random variable, \(X\)=# of successes in \(n\) trials is given below:

  • PMF: \(P(X=x)=f(x)=\binom{n}{x} p^x (1-p)^{n-x} \ \ ;x=0,1,2,...,n\)

  • CDF: \(P(X\le x)=F(x)=f(0)+f(1)+...+f(x)\)

  • Mean:\(\mu=E(X)=np\)

  • Variance:\(\sigma^2 =np(1-p)\)

  • We write \(X\sim Binom(n,p)\)

  • \(n\) and \(p\) are said to be the parameters of the Binomial distribution.

N.B: \(f(x)=F(x)-F(x-1)\) i.e \(f(3)=F(3)-F(2)\)
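The PMF, the CDF and the N.B. identity above can be sketched in a few lines of Python using `math.comb` (Python 3.8+). The function names `binom_pmf` and `binom_cdf` are our own, and \(n=10\), \(p=0.6\) are illustrative values; the script checks numerically that the probabilities add to 1, that the mean is \(np\) and that the variance is \(np(1-p)\).

```python
from math import comb

def binom_pmf(x, n, p):
    """f(x) = C(n, x) * p^x * (1 - p)^(n - x), for x = 0, 1, ..., n."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binom_cdf(x, n, p):
    """F(x) = f(0) + f(1) + ... + f(x)."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))

n, p = 10, 0.6
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]
mean = sum(x * fx for x, fx in enumerate(pmf))
var = sum(x**2 * fx for x, fx in enumerate(pmf)) - mean**2

print("sum of f(x):", round(sum(pmf), 10))
print("mean:", round(mean, 10), "np:", n * p)
print("variance:", round(var, 10), "np(1-p):", round(n * p * (1 - p), 10))
# N.B.: f(3) should equal F(3) - F(2)
print(abs(binom_pmf(3, n, p) - (binom_cdf(3, n, p) - binom_cdf(2, n, p))) < 1e-12)
```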

Probability plot of binomial r.v for different values of \(p\) and shape characteristics

Finding Binomial probability manually

Suppose, \(X\sim Binom(n,p)\); where \(n=5\) and \(p=0.6\). Find, (i) \(P(X=2)\) (ii) \(P(X \le 2)\) (iii) \(P(X\ge3)\).

Solution:

PMF of \(X\): \(P(X=x)=f(x)=\binom{5}{x} 0.6^x (0.4)^{5-x} \ \ ;x=0,1,2,...,5\)

(i) \(P(X=2)=f(2)=\binom{5}{2} 0.6^2 (0.4)^{5-2}=0.2304\)

(ii) \(P(X \le 2)=F(2)=f(0)+f(1)+f(2)=0.0102+0.0768+0.2304=0.3174\)

(iii) \(P(X \ge 3)=f(3)+f(4)+f(5)=0.6826\)

Alternative:(iii)

\(P(X \ge 3)=1-P(X< 3)=1-P(X \le 2)=1-F(2)=1-0.3174=0.6826\)
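The same worked example in Python, as a sketch that simply evaluates the PMF (the helper `binom_pmf` is hypothetical, not a library function):

```python
from math import comb

def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 5, 0.6
f = [binom_pmf(x, n, p) for x in range(n + 1)]   # f(0), ..., f(5)

p_eq_2 = f[2]               # (i)   P(X = 2)
p_le_2 = sum(f[:3])         # (ii)  P(X <= 2) = F(2)
p_ge_3 = sum(f[3:])         # (iii) P(X >= 3) = f(3) + f(4) + f(5)
p_ge_3_alt = 1 - p_le_2     # (iii) alternative: 1 - F(2)

print(round(p_eq_2, 4), round(p_le_2, 4), round(p_ge_3, 4), round(p_ge_3_alt, 4))
# 0.2304 0.3174 0.6826 0.6826
```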

Finding Binomial probability using Binomial Table

At the end of most statistics books there are probability distribution tables. We can use these tables to find the required probabilities for specific values of the parameters of a given probability distribution. Here I share the first page of the binomial distribution table from Baron (2019).

Suppose, \(X\sim Binom(n,p)\); where \(n=5\) and \(p=0.6\). Find, (i) \(P(X=2)\) (ii) \(P(X \le 2)\) (iii) \(P(X \ge 3)\) using Table.

Solution:

(i) \(P(X=2)=f(2)=F(2)-F(1)=0.3174-0.0870=0.2304\).

(ii) \(P(X\le 2)=F(2)=0.3174\)

(iii) \(P(X \ge 3)=1-P(X< 3)=1-F(2)=1-0.3174=0.6826\)

Exercise (Walpole et al., 2017)

5.9 In testing a certain kind of truck tire over rugged terrain, it is found that 25% of the trucks fail to complete the test run without a blowout. Of the next 15 trucks tested, find the probability that: (a) from 3 to 6 have blowouts; (b) fewer than 4 have blowouts; (c) more than 5 have blowouts.

Solution:

Let, X= number of trucks that have blowouts

Given, \(n=15; \ \ p=Pr(blowout)=0.25; \ \ q=1-p=0.75\). Hence, \(X\sim Binom(n=15, p=0.25)\), that is:

\[ P(X=x)=f(x)=\binom{15}{x}(0.25)^x (0.75)^{15-x}; x=0,1,2,...,15. \]

Now,

(a) \(P(3\le X\le 6)=f(3)+f(4)+f(5)+f(6)\)=0.225+0.225+0.165+0.092=0.707.

Alternative: \(P(3\le X\le 6)=F(6)-F(2)=0.943-0.236=0.707\) [from Table]

(b) \(P(X<4)=f(0)+f(1)+f(2)+f(3)\)=0.013+0.067+0.156+0.225=0.461.

Alternative: \(P(X< 4)=F(3)=0.461\) [from Table A2]

(c) \(P(X > 5)=1-P(X \le 5)=1-F(5)=1-0.852=0.148\).
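For readers working without a table, a minimal Python sketch of Exercise 5.9 follows. It plays the role of R's binomial CDF `pbinom(q, size, prob)`, but the helpers `binom_pmf` and `binom_cdf` here are hand-written for illustration.

```python
from math import comb

def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binom_cdf(x, n, p):
    return sum(binom_pmf(k, n, p) for k in range(x + 1))

n, p = 15, 0.25
a = binom_cdf(6, n, p) - binom_cdf(2, n, p)   # (a) P(3 <= X <= 6) = F(6) - F(2)
b = binom_cdf(3, n, p)                        # (b) P(X < 4) = F(3)
c = 1 - binom_cdf(5, n, p)                    # (c) P(X > 5) = 1 - F(5)
print(round(a, 3), round(b, 3), round(c, 3))
# 0.707 0.461 0.148
```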

5.12 A traffic control engineer reports that 75% of the vehicles passing through a checkpoint are from within the state. What is the probability that fewer than 4 of the next 9 vehicles are from out of state?

5.16 Suppose that airplane engines operate independently and fail with probability equal to 0.4. Assuming that a plane makes a safe flight if at least one-half of its engines run, determine whether a 4-engine plane or a 2-engine plane has the higher probability for a successful flight.

5.25 Suppose that for a very large shipment of integrated-circuit chips, the probability of failure for any one chip is 0.10. Assuming that the assumptions underlying the binomial distributions are met, find the probability that at most 3 chips fail in a random sample of 20.

Exercise (Montgomery & Runger, 2014)

3-93 Let \(X\) be a binomial random variable with \(p = 0.1\) and \(n = 10\). Calculate the following probabilities from the binomial probability mass function and from the binomial table in Appendix A and compare results. (a) P(X≤2) (b) P(X>8) (c) P(X = 4) (d) P(5≤X≤7)

3-115 The probability that a visitor to a Web site provides contact data for additional information is 0.01. Assume that 1000 visitors to the site behave independently. Determine the following probabilities: (a) No visitor provides contact data. (b) Exactly 10 visitors provide contact data. (c) More than 3 visitors provide contact data.

Exercise (Baron, 2019)

3.21. A lab network consisting of 20 computers was attacked by a computer virus. This virus enters each computer with probability 0.4, independently of other computers. Find the probability that it entered at least 10 computers.

3.22. Five percent of computer parts produced by a certain supplier are defective. What is the probability that a sample of 16 parts contains more than 3 defective ones?

And so on….

Poisson r.v

The number of events occurring randomly in an interval of time or in a region of space often follows a Poisson distribution. The French mathematician Siméon-Denis Poisson (1781–1840) first introduced this distribution.

Example

The Poisson distribution may be useful to model variables like:

  • The no. of calls arriving at a customer care center in 15 minutes
  • The no. of arrivals at a car wash in one hour
  • The no. of repairs needed in 10 miles of highway
  • The no. of leaks in 100 miles of pipeline etc.

The Poisson distribution is often used to evaluate probabilities of “rare” events.

The probability mass function of the Poisson random variable \(X\), representing the number of outcomes occurring in a given time interval denoted by \(t\), is:

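  • PMF: \(P(X=x)=f(x)=\frac{e^{-\lambda t}(\lambda t)^x}{x!}; \ \ x=0,1,2,\ldots\) Here, \(\lambda\) is called the arrival rate, i.e. the long-run average number of occurrences per unit time (or per unit region). For a given interval length \(t\), \(\lambda\) (equivalently the mean \(\lambda t\)) is the only parameter of the Poisson distribution.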

  • Mean: \(\mu=E(X)=\lambda t\)

  • Variance: \(\sigma^2 =\lambda t\)

  • We write: \(X\sim Pois(\lambda t)\)

N.B: The mean and variance of a Poisson random variable are identical (both equal \(\lambda t\)). This is a characteristic property of the Poisson r.v.
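A small numerical check of the N.B. above, written as a Python sketch: it evaluates the Poisson PMF for \(\mu=\lambda t=3\) (an illustrative value), truncating the infinite support at \(x=99\), where the remaining tail is negligible, and compares the resulting mean and variance. The helper name `pois_pmf` is hypothetical.

```python
from math import exp, factorial

def pois_pmf(x, mu):
    """f(x) = e^{-mu} mu^x / x!, where mu = lambda * t."""
    return exp(-mu) * mu**x / factorial(x)

mu = 3.0             # e.g. lambda = 1.5 per unit time over t = 2 units
xs = range(100)      # truncated support: the tail beyond x = 99 is negligible for mu = 3
pmf = [pois_pmf(x, mu) for x in xs]

mean = sum(x * fx for x, fx in zip(xs, pmf))
var = sum(x * x * fx for x, fx in zip(xs, pmf)) - mean**2
print(round(sum(pmf), 10), round(mean, 10), round(var, 10))
# 1.0 3.0 3.0
```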

Probability plot of Poisson r.v for different values of \(\lambda\) (for a fixed interval \(t=1\))

We can see that, for small \(\lambda\), the distribution of the Poisson r.v is positively skewed, and as \(\lambda\) increases the distribution becomes more nearly symmetric.

Finding Poisson probability

Consider a discrete r.v, say \(X\sim Pois(\lambda t)\). Suppose \(\lambda =1.5\) and \(t=2\). Find (i) \(P(X=4)\), (ii) \(P(X\le 2)\), (iii) \(P(X\ge 3)\).

Solution:

PMF of \(X\): \(P(X=x)=f(x)=\frac{e^{-\lambda t}(\lambda t)^x}{x!}; x=0,1,2,\ldots\)

(i) For \(t=2\) , \(\mu=\lambda t=1.5*2=3\).

So, \(P(X=4)=f(4)=\frac{e^{-3}(3)^4}{4!}\)=0.168.

(ii) \(P(X\le2)=\sum_{x=0}^{2}f(x)=\sum_{x=0}^{2}\frac{e^{-3}(3)^x}{x!}=e^{-3}[\frac{3^0}{0!}+\frac{3^1}{1!}+\frac{3^2}{2!}]\)=0.423.

(iii) \(P(X \ge 3)=1-P(X< 3)=1-P(X\le 2)=1-0.423=0.577\)
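The three answers can be reproduced with a short Python sketch (standard library only; `pois_pmf` is a hypothetical helper):

```python
from math import exp, factorial

def pois_pmf(x, mu):
    return exp(-mu) * mu**x / factorial(x)

lam, t = 1.5, 2
mu = lam * t                                   # mean number of events in the interval
p4 = pois_pmf(4, mu)                           # (i)   P(X = 4)
F2 = sum(pois_pmf(x, mu) for x in range(3))    # (ii)  P(X <= 2)
p_ge_3 = 1 - F2                                # (iii) P(X >= 3)
print(round(p4, 3), round(F2, 3), round(p_ge_3, 3))
# 0.168 0.423 0.577
```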

Finding Poisson probability using Table

We can use Poisson distribution table to compute Poisson probabilities. Here I share the 1st page of Poisson distribution table from Baron (2019).

Consider a discrete r.v, say \(X\sim Pois(\lambda t)\). Suppose \(\lambda =1.5\) and \(t=2\). Find (i) \(P(X\le 2)\), (ii) \(P(X=4)\).

Solution by using Table:

For \(t=2\) , \(\mu=\lambda t=1.5*2=3\).

(i) \(P(X \le 2)=F(2)=0.423\)

[For x=2 and \(\mu \ \ or \ \lambda =3;\) corresponding probability in Table A3 is 0.423]

(ii) \(P(X=4)=f(4)=F(4)-F(3)=0.815-0.647=0.168\)

Example 5.17:(Walpole et al., 2017) During a laboratory experiment, the average number of radioactive particles passing through a counter in 1 millisecond is 4. What is the probability that 6 particles enter the counter in a given millisecond?

Example 5.18:(Walpole et al., 2017) Ten is the average number of oil tankers arriving each day at a certain port. The facilities at the port can handle at most 15 tankers per day. What is the probability that on a given day tankers have to be turned away?
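Examples 5.17 and 5.18 are stated without solutions here. As a sketch, both reduce to Poisson PMF/CDF evaluations; the Python below takes \(\mu=4\) for Example 5.17 and \(\mu=10\) for Example 5.18, where tankers are turned away when more than 15 arrive, i.e. \(P(X>15)=1-F(15)\). The helpers `pois_pmf` and `pois_cdf` are hypothetical.

```python
from math import exp, factorial

def pois_pmf(x, mu):
    return exp(-mu) * mu**x / factorial(x)

def pois_cdf(x, mu):
    return sum(pois_pmf(k, mu) for k in range(x + 1))

# Example 5.17: mu = 4 particles per millisecond, P(X = 6)
p_six = pois_pmf(6, 4)
# Example 5.18: mu = 10 tankers per day, P(X > 15) = 1 - F(15)
p_turned_away = 1 - pois_cdf(15, 10)
print(round(p_six, 4), round(p_turned_away, 4))
```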

Example 3.8:(Pishro-Nik, 2014) The number of emails that I get in a weekday can be modeled by a Poisson distribution with an average of 0.2 emails per minute.

  1. What is the probability that I get no emails in an interval of length 5 minutes?

  2. What is the probability that I get more than 3 emails in an interval of length 10 minutes?

Solution

Let, \(X\)=number of emails that I get in a given interval.

Given, \(\lambda =0.2 \ \ min^{-1}\).

\(X\) will follow \(Pois(\lambda t)\)

1. In this case \(\mu=\lambda t=0.2*5=1\). So, \(P(X=0)=f(0)=e^{-\mu}=e^{-1}=0.3679\).

2. In this case \(\mu=\lambda t=0.2*10=2\). So,\(P(X>3)=1-P(X\le 3)=1-F(3)=1-0.857=0.143\). [From Table A3]
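Both parts of Example 3.8 in Python, a sketch that evaluates the Poisson CDF directly instead of reading Table A3 (`pois_cdf` is a hypothetical helper):

```python
from math import exp, factorial

def pois_cdf(x, mu):
    return sum(exp(-mu) * mu**k / factorial(k) for k in range(x + 1))

lam = 0.2                                   # emails per minute
p_none_5min = pois_cdf(0, lam * 5)          # 1. P(X = 0) with mu = 1
p_more3_10min = 1 - pois_cdf(3, lam * 10)   # 2. P(X > 3) with mu = 2
print(round(p_none_5min, 4), round(p_more3_10min, 3))
# 0.3679 0.143
```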

Poisson Approximation to the Binomial Distribution

When,

  • \(p \rightarrow0\) (Success rate is very low);

  • \(n\rightarrow \infty\) (Number of trials is very large);

Then Binomial distribution can be approximated by Poisson distribution.

  • Mathematically, \(Binom (x; n,p)\approx Pois(\lambda)\); where \(\lambda=np\).

N.B: In practice, if \(n \ge 30\) and \(p\le 0.05\) (hence \(q\ge 0.95\)), then the approximation is close enough to use the Poisson distribution for binomial problems (Baron, 2019).

Example 5.20:(Walpole et al., 2017) In a manufacturing process where glass products are made, defects or bubbles occur, occasionally rendering the piece undesirable for marketing. It is known that, on average, 1 in every 1000 of these items produced has one or more bubbles. What is the probability that a random sample of 8000 will yield fewer than 7 items possessing bubbles?

Solution:

Let, \(X=\) number of items in the sample of 8000 that possess bubbles

Given, \(Pr(bubble \ \ occurs)=p=1/1000=0.001\), which is less than \(0.05\), and \(n=8000\), which is greater than \(30\). So, the PMF of \(X\) can be approximated by a Poisson distribution with

\[\lambda =np=8000*0.001=8\] that is \(X\sim Pois (\lambda=8)\)

According to question,

\(P(X<7)=f(0)+f(1)+...+f(6)=F(6)=0.313\) (Ans.)

[By using Table A3 ]
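To see how good the approximation is in Example 5.20, the Python sketch below computes \(P(X<7)\) both exactly from the binomial PMF with \(n=8000\), \(p=0.001\), and from the Poisson PMF with \(\lambda=8\). It uses only the standard library; the exact binomial value is not part of the original solution, so treat the comparison as illustrative.

```python
from math import comb, exp, factorial

n, p = 8000, 0.001
lam = n * p  # Poisson parameter, 8

# P(X < 7) = P(X <= 6) under each model
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(7))   # binomial
approx = sum(exp(-lam) * lam**k / factorial(k) for k in range(7))      # Poisson
print(round(exact, 3), round(approx, 3))
```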

Exercise 5.87:(Walpole et al., 2017) Imperfections in computer circuit boards and computer chips lend themselves to statistical treatment. For a particular type of board, the probability of a diode failure is 0.03 and the board contains 200 diodes.

  1. What is the mean number of failures among the diodes? (Ans: \(\mu=np=200*0.03=6\))
  2. What is the variance?(Ans: \(\sigma^2=np(1-p)=200*0.03*(1-0.03)=5.82\))
  3. The board will work if there are no defective diodes. What is the probability that a board will work? (Ans: Since \(n=200\) is large and \(p=0.03\) is small, use the Poisson approximation with \(\mu=np=6\): P(The board will work)\(=P(X=0)=f(0)\approx e^{-\mu}=e^{-6}=0.0025\))
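A similar sketch for part 3 compares the exact binomial probability \(P(X=0)=q^{n}=0.97^{200}\) with the Poisson value \(e^{-6}\) used above, showing the size of the approximation error in this case:

```python
import math

n, p = 200, 0.03
exact = (1 - p) ** n         # binomial: P(X = 0) = q^n
approx = math.exp(-n * p)    # Poisson approximation with mu = np = 6
print(round(exact, 4), round(approx, 4))
# 0.0023 0.0025
```

Here the Poisson value is slightly larger than the exact binomial value.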

Continuous random variable

A fundamental difference separates discrete and continuous random variables in terms of how probabilities are computed. For a discrete random variable, the PMF \(f(x)\) provides the probability that the random variable assumes a particular value.

With continuous random variables, the counterpart of the probability function is the probability density function (PDF), also denoted by \(f(x)\). The difference is that the probability density function does not directly provide probabilities. However, the area under the graph of \(f(x)\) corresponding to a given interval does provide the probability that the continuous random variable \(X\) assumes a value in that interval.

So when we compute probabilities for continuous random variables we are computing the probability that the random variable assumes any value in an interval.

Definition

The function \(f(x)\) is said to be probability density function (PDF) for the continuous random variable \(X\), defined over the set of real numbers, if

  1. \(f(x)\ge0\) for all \(x\in \mathbb{R}\)
  2. \(\int_{-\infty}^{+\infty} f(x)dx=1\)
  3. \(P(a< X < b)=\int_{a}^{b} f(x)dx\)

N.B: \(P(X=a)=0\) as well as \(P(X=b)=0\). So, \(P(X\le a )\) is the same as \(P(X<a)\).

CDF of \(X\): By definition, the CDF is \(F(x)=P(X\le x)= \int_{-\infty}^{x} f(u)\,du\)

Therefore, \(f(x)=\frac{d}{dx} F(x)\) wherever the derivative exists.

Expectation and variance of continuous r.v

Example 3.11(Walpole et al., 2017) Suppose that the error in the reaction temperature, in \(^\circ C\), for a controlled laboratory experiment is a continuous random variable X having the probability density function

\[ f(x)=\frac{x^2}{3}; -1<x<2. \]

  1. Verify that \(f(x)\) is a density function.
  2. Find \(P(0< X \le 1)\).

Example 3.12(Walpole et al., 2017) Find \(F(x)\), and use it to evaluate \(P(0 < X\le1)\).
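Examples 3.11 and 3.12 can be checked numerically. The Python sketch below assumes the CDF \(F(x)=(x^3+1)/9\) for \(-1<x<2\), obtained by integrating \(f\) from \(-1\) to \(x\), and compares it with a composite midpoint-rule integral; the helper names `f`, `F` and `integrate` are our own.

```python
def f(x):
    """Density from Example 3.11: f(x) = x^2 / 3 on -1 < x < 2, zero elsewhere."""
    return x * x / 3 if -1 < x < 2 else 0.0

def F(x):
    """CDF from integrating f over (-1, x]: F(x) = (x^3 + 1) / 9 on [-1, 2]."""
    if x <= -1:
        return 0.0
    if x >= 2:
        return 1.0
    return (x**3 + 1) / 9

def integrate(g, a, b, steps=100_000):
    """Composite midpoint rule: a simple numerical approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

print(round(integrate(f, -1, 2), 6))                # total area under f (verifies condition 2)
print(round(integrate(f, 0, 1), 6), F(1) - F(0))    # P(0 < X <= 1), both ways
```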

H.W: Find E(X) and Var(X) where,\(f(x)=\frac{x^2}{3}; -1<x<2\).

Exercise 3.29(Walpole et al., 2017) An important factor in solid missile fuel is the particle size distribution. Significant problems occur if the particle sizes are too large. From production data in the past, it has been determined that the particle size (in micrometers) distribution is characterized by \[ f(x)=3x^{-4}; x> 1 \]

  1. Verify that this is a valid density function.
  2. Evaluate \(F(x)\).
  3. What is the probability that a random particle from the manufactured fuel exceeds 4 micrometers?

Exercise 3.69(Walpole et al., 2017) The life span in hours of an electrical component is a random variable with cumulative distribution function \[ F(x)=1-e^{-\frac{x}{50}}; x>0 \]

  1. Determine its probability density function (PDF).

  2. Determine the probability that the life span of such a component will exceed 70 hours.

Some special continuous random variables

Exponential r.v

In many situations, such as modeling waiting times, inter-arrival times, the lifespan of hardware, breakdown times, and the intervals between phone calls, the exponential distribution is used. The time \(T\) between consecutive events in a Poisson process with arrival rate \(\lambda\) (number of arrivals per unit time) follows an exponential distribution.

  • PDF: \(f(t)=\lambda e^{-\lambda t}; t> 0\)

  • CDF: \(F(t)=P(T\le t)=P(T< t)=1-e^{-\lambda t}; t> 0\)

    Hence, \(P(T> t)=1-P(T\le t)=1-F(t)=e^{-\lambda t}\)

  • Mean: \(E(T)=\frac{1}{\lambda}\)

  • Variance: \(Var(T)=\frac{1}{\lambda^2}\)

We write, \(T\sim Exp(\lambda)\)

The quantity \(\lambda\) is a parameter of Exponential distribution, and its meaning is clear from \(E(T) = \frac{1}{\lambda}\) . If T is time, measured in minutes, then \(\lambda\) is a frequency, measured in \(min^{-1}\). For example, if arrivals occur every half a minute, on the average, then \(E(T) = 0.5\)min and \(\lambda=2\), saying that they occur with a frequency (arrival rate) of 2 arrivals per minute. This \(\lambda\) has the same meaning as the parameter of Poisson distribution(Baron, 2019).
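The formulas for the mean, variance and tail probability can be sanity-checked by simulation. This Python sketch uses `random.Random.expovariate`, which draws from an exponential distribution with rate `lambd`, with \(\lambda=2\) per minute as in the half-minute example above; the sample size and seed are arbitrary choices.

```python
import math
import random

lam = 2.0                        # 2 arrivals per minute, so E(T) = 0.5 min
rng = random.Random(42)          # fixed seed so the run is reproducible
sample = [rng.expovariate(lam) for _ in range(200_000)]

mean_hat = sum(sample) / len(sample)
var_hat = sum((t - mean_hat) ** 2 for t in sample) / (len(sample) - 1)
t0 = 0.5
tail_hat = sum(t > t0 for t in sample) / len(sample)

print(round(mean_hat, 3), 1 / lam)                         # E(T) = 1/lambda
print(round(var_hat, 3), 1 / lam**2)                       # Var(T) = 1/lambda^2
print(round(tail_hat, 3), round(math.exp(-lam * t0), 3))   # P(T > t0) = e^{-lambda t0}
```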

Example 4.5(Baron, 2019) Jobs are sent to a printer at an average rate of 3 jobs per hour.

  1. What is the expected time between jobs?

  2. What is the probability that the next job is sent within 5 minutes?

Solution: Given, number of jobs per hour, \(\lambda=3\ \ hr^{-1}\). Let, \(T\)= time elapsed between jobs (in hours).

So, \(T\sim Exp(\lambda)\)

  1. \(E(T)=\frac{1}{\lambda} hr=\frac{1}{3} hr=20\ \ mins\);
  2. Here, \(5 \ \ mins=\frac{5}{60} hr=\frac{1}{12} hr\). We know, \(F(t)=1-e^{-\lambda t}; t>0\).

So, \(P(T<5 \ \ mins)=P(T<\frac{1}{12})=F(\frac{1}{12})=1-e^{-3*\frac{1}{12}}=0.22\)
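The same two answers in Python, as a minimal sketch with time measured in hours:

```python
import math

lam = 3                                  # jobs per hour
mean_hours = 1 / lam                     # E(T) = 1/lambda
t = 5 / 60                               # 5 minutes expressed in hours
p_within_5 = 1 - math.exp(-lam * t)      # F(t) = 1 - e^{-lambda t}
print(round(mean_hours * 60), round(p_within_5, 2))
# 20 0.22
```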

Example 4.58 (Navidi, 2011) A radioactive mass emits particles according to a Poisson process at a mean rate of 15 particles per minute. At some point, a clock is started. What is the probability that more than 5 seconds will elapse before the next emission? What is the mean waiting time until the next particle is emitted?

Solution

Let, \(T=\) elapsed time before the next emission (in seconds)

Given, \(\lambda=15 min^{-1}=\frac{15}{60} s^{-1}=0.25 s^{-1}\) and

\(T\sim Exp(\lambda)\);

\(P(T\le t)=F(t)=1-e^{-\lambda t}\)

P(more than 5 seconds will elapse before the next emission)=\(P(T>5)=e^{-\lambda * 5}=e^{-0.25*5}=0.2865\)

Mean waiting time, \(E(T)=\frac{1}{\lambda} s=\frac{1}{0.25}s=4s\)
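A minimal Python check of Example 4.58, with time measured in seconds:

```python
import math

lam = 15 / 60                            # 15 per minute = 0.25 per second
p_more_than_5s = math.exp(-lam * 5)      # P(T > 5) = e^{-lambda * 5}
mean_wait = 1 / lam                      # E(T) in seconds
print(round(p_more_than_5s, 4), mean_wait)
# 0.2865 4.0
```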

Lack of Memory Property

If \(T \sim Exp(\lambda)\), and \(t\) and \(s\) are positive numbers, then

\[P(T> t+s| T> s)=P(T> t)\] The probability that we must wait additional \(t\) units, given that we have already waited \(s\) units, is the same as the probability that we must wait \(t\) units from the start. The exponential distribution does not “remember” how long we have been waiting.

In particular, if the lifetime of a component follows the exponential distribution, then the probability that a component that is \(s\) time units old will last an additional \(t\) time units is the same as the probability that a new component will last \(t\) time units.

In other words, a component whose lifetime follows an exponential distribution does not show any effects of age or wear (Navidi, 2011).

But if the failure of the component is a result of gradual or slow wear (as in mechanical wear), then the exponential does not apply and either the gamma or the Weibull distribution (see Walpole et al. (2017), Section 6.10) may be more appropriate.
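The memoryless identity follows from \(P(T>t+s\mid T>s)=\frac{P(T>t+s)}{P(T>s)}=\frac{e^{-\lambda(t+s)}}{e^{-\lambda s}}=e^{-\lambda t}\), since the event \(\{T>t+s\}\) is contained in \(\{T>s\}\). The Python sketch below checks this numerically for a few illustrative values of \(\lambda\), \(s\) and \(t\); the helper `surv` (the survival function \(P(T>t)\)) is hypothetical.

```python
import math

def surv(t, lam):
    """Survival function P(T > t) = e^{-lam t} for T ~ Exp(lam)."""
    return math.exp(-lam * t)

checks = []
for lam, s, t in [(0.5, 4.0, 3.0), (2.0, 1.0, 0.5), (0.02, 100.0, 70.0)]:
    # {T > t + s} implies {T > s}, so P(T > t+s and T > s) = P(T > t+s)
    conditional = surv(t + s, lam) / surv(s, lam)
    checks.append(math.isclose(conditional, surv(t, lam)))
    print(lam, s, t, checks[-1])
```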

Example 4.59(Navidi, 2011) The lifetime of a particular integrated circuit has an exponential distribution with mean 2 years. Find the probability that the circuit lasts longer than three years.

Example 4.60(Navidi, 2011) Refer to Example 4.59. Assume the circuit is now four years old and is still functioning. Find the probability that it functions for more than three additional years (Hints: Apply Lack of Memory Property).
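Examples 4.59 and 4.60 can be illustrated by simulation. The Python sketch below draws many lifetimes with mean 2 years (rate \(\lambda=0.5\) per year) and estimates both \(P(T>3)\) and \(P(T>7\mid T>4)\); by the lack of memory property both should be close to \(e^{-1.5}\approx 0.2231\). The sample size and seed are arbitrary.

```python
import math
import random

rng = random.Random(2011)
lam = 1 / 2                        # mean lifetime 2 years -> rate 0.5 per year
lifetimes = [rng.expovariate(lam) for _ in range(200_000)]

# Example 4.59: P(T > 3) for a new circuit
p_new = sum(t > 3 for t in lifetimes) / len(lifetimes)

# Example 4.60: among circuits still working at age 4, fraction lasting beyond age 7
survivors = [t for t in lifetimes if t > 4]
p_old = sum(t > 7 for t in survivors) / len(survivors)

print(round(p_new, 3), round(p_old, 3), round(math.exp(-1.5), 4))
```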

Exercises for Section 4.7(Navidi, 2011)

1. Let \(T \sim Exp(0.45)\). Find \(\mu_T, \sigma^2_T, P(T>3)\) and the median of \(T\).

2. The time between requests to a web server is exponentially distributed with mean 0.5 seconds.

  1. What is the value of the parameter λ?
  2. What is the median time between requests?
  3. What is the standard deviation?
  4. What is the 80th percentile?
  5. Find the probability that more than one second elapses between requests.
  6. If there have been no requests for the past two seconds, what is the probability that more than one additional second will elapse before the next request?

8. A radioactive mass emits particles according to a Poisson process at a mean rate of 2 per second. Let T be the waiting time, in seconds, between emissions.

  1. What is the mean waiting time?
  2. What is the median waiting time?
  3. Find \(P(T > 2)\).
  4. Find \(P(T < 0.1)\).
  5. Find \(P(0.3< T < 1.5)\).
  6. If 3 seconds have elapsed with no emission, what is the probability that there will be an emission within the next second? (Use Lack of Memory Property)
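For the exercises above, the requested quantities follow from the closed forms \(E(T)=1/\lambda\), \(Var(T)=1/\lambda^2\) and the inverse of \(F(t)=1-e^{-\lambda t}\): the \(q\)-th quantile is \(-\ln(1-q)/\lambda\), so the median is \(\ln 2/\lambda\). A minimal Python sketch, applied to Exercise 1 (the helper `exp_quantile` is hypothetical):

```python
import math

def exp_quantile(q, lam):
    """Solve F(x) = 1 - e^{-lam x} = q for x, with 0 < q < 1."""
    return -math.log(1 - q) / lam

lam = 0.45                          # Exercise 1: T ~ Exp(0.45)
mean = 1 / lam
var = 1 / lam**2
tail_3 = math.exp(-lam * 3)         # P(T > 3)
median = exp_quantile(0.5, lam)     # equals ln(2) / lam

print(round(mean, 4), round(var, 4), round(tail_3, 4), round(median, 4))
# 2.2222 4.9383 0.2592 1.5403
```

The same helper gives an 80th percentile as `exp_quantile(0.80, lam)`.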

Normal or Gaussian r.v

Poisson process: Use of Exponential and Poisson random variable (concise)

References

Baron, M. (2019). Probability and statistics for computer scientists (Third edition). CRC Press, Taylor & Francis Group.
Montgomery, D. C., & Runger, G. C. (2014). Applied statistics and probability for engineers (Sixth edition). John Wiley & Sons, Inc.
Navidi, W. C. (2011). Statistics for engineers and scientists (3rd ed). McGraw-Hill.
Pishro-Nik, H. (2014). Introduction to probability, statistics, and random processes. Kappa Research, LLC.
Walpole, R. E., Myers, R. H., Myers, S. L., & Ye, K. (Eds.). (2017). Probability & statistics for engineers & scientists: MyStatLab update (Ninth edition). Pearson.