An intuitive real-life example of the Poisson distribution and how to simulate it in R

Pollard Willow – Image Source: The Cleveland Museum of Art

Unlike the binomial distribution, in which samples of size n are taken from a population with proportion p, the Poisson distribution focuses on the number of discrete occurrences over some interval.

The Poisson distribution describes the occurrence of rare events. For example, serious accidents at a chemical plant are rare, and the number per month might be described by the Poisson distribution.

The Poisson distribution is often used to describe the number of random arrivals per time interval. For example, consider the number of customers arriving during the lunch hour at a bank located in the central area of a large city.

In business, the models used in queuing theory (the theory of waiting lines) are usually based on the assumption that the Poisson distribution is the proper distribution to describe random arrival rates over a period of time. In statistical quality control, the Poisson distribution is the basis for the c control chart, which tracks the number of nonconformities per item or unit.

In this tutorial we will look at the characteristics of the Poisson distribution, build up the Poisson formula, and look at some R functions that calculate probabilities of occurrence using the Poisson distribution.

Characteristics of Poisson distribution

The Poisson distribution has the following characteristics.

  • It is a discrete distribution.
  • It describes rare events.
  • Each occurrence is independent of the other occurrences.
  • It describes discrete occurrences over an interval.
  • The number of occurrences in each interval can range from zero to infinity.
  • The expected number of occurrences must hold constant throughout the experiment.

Let’s consider again the example of customers arriving during the lunch hour at a bank located in the central area of a large city. You are interested in the number of customers who arrive each minute. Does this situation match the characteristics of the Poisson distribution?

  • It is a discrete distribution: The number of customers arriving is a discrete value: 0 customers arrive, 1 customer arrives, and so on.
  • It describes rare events: The probability that two customers arrive within the same very short interval, say 0.01 seconds, is virtually zero.
  • Each occurrence is independent of the other occurrences: The arrival of one customer in any one-minute interval has no effect on the arrival of any other customer in any other one-minute interval.
  • It describes discrete occurrences over an interval: The interval here is 1 minute.
  • The number of occurrences in each interval can range from zero to infinity: Theoretically, anywhere from zero to an infinite number of customers could arrive during a 1-minute interval.
  • The expected number of occurrences must hold constant throughout the experiment: If 5 customers arrive per minute on average, this rate should be the same for every 1-minute interval during the experiment. (This is assumed for the model, although it rarely holds exactly in practice.)

Examples of Poisson distribution include:

  • Number of telephone calls per minute at a small business
  • Number of hazardous accidents per quarter at a manufacturing plant
  • Number of defects in a shift

Poisson distribution formula

If a Poisson-distributed phenomenon is studied over a long period of time, a long-run average can be determined. This average is denoted by lambda (λ).

Each Poisson problem contains a lambda value from which the probabilities of particular occurrences are determined. A Poisson distribution can be determined by lambda alone. The Poisson formula is used to compute the probability of occurrences over an interval for a given lambda value.

\[P(x) = \frac{\lambda^x e^{-\lambda}}{x!}\]

Where,
x = 0, 1, 2, … is the number of occurrences per interval for which the probability is being computed
lambda = long-run average number of occurrences per interval
e ≈ 2.71828, the base of the natural logarithm
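
As a rough sketch, the formula can be typed out in R almost symbol for symbol. The function name poisson_pmf is made up here for illustration and is not part of R, and the lambda of 5 simply echoes the five-customers-per-minute illustration used earlier.

```r
# Hypothetical helper, not a base R function: the Poisson formula written out
poisson_pmf <- function(x, lambda) {
  lambda^x * exp(-lambda) / factorial(x)
}

lambda <- 5     # assumed long-run average, for illustration only
x <- 0:50       # enough terms that the remaining tail is negligible

p <- poisson_pmf(x, lambda)

sum(p)          # very close to 1, as the probabilities of all outcomes should be
sum(x * p)      # very close to lambda, the long-run average
```

The second sum is the average number of occurrences implied by the formula, which is why lambda alone is enough to pin down the whole distribution.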

The lambda value must be held constant throughout the experiment. The analyst must be careful not to apply a given lambda to intervals for which lambda changes. For example, the average number of customers arriving at a restaurant during a 1-hour interval will vary throughout the day, so different hours of the day might produce different lambdas. The analyst should be specific in describing the intervals for which lambda is being used.
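
To see why the interval behind lambda matters, the sketch below evaluates the same x under two different lambdas using R's dpois() function. Both hourly averages are invented for illustration and do not come from any data.

```r
# Assumed, purely illustrative hourly averages for a restaurant
lambda_quiet <- 2    # hypothetical mid-afternoon hour
lambda_busy  <- 8    # hypothetical lunch hour

x <- 5               # exactly 5 arrivals in one hour

p_quiet <- dpois(x, lambda_quiet)
p_busy  <- dpois(x, lambda_busy)

# The same x is noticeably less likely under the quiet-hour lambda
c(quiet = p_quiet, busy = p_busy)
```

Applying the lunch-hour lambda to a quiet hour, or the reverse, would give a misleading probability for the same count.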

Poisson distribution working example

Suppose bank customers arrive randomly on weekday afternoons at an average of 3.2 customers every 4 minutes. What is the probability that exactly 5 customers arrive in a 4-minute interval on a weekday afternoon?

The lambda for this problem is 3.2 customers per 4 minutes. The value of x is 5 customers per 4 minutes. The probability of 5 customers randomly arriving during a 4-minute interval when the long-run average has been 3.2 customers per 4-minute interval is:

\[P(x) = \frac{3.2^5 e^{-3.2}}{5!} = 0.1140\]

If the bank averages 3.2 customers every 4 minutes, the probability of exactly 5 customers arriving during any one 4-minute interval is 0.1140.

Poisson distribution formula in R

In R, the dpois(x, lambda) function returns the Poisson probability of exactly x occurrences for a given lambda.

dpois(5, 3.2)  # P(X = 5) with lambda = 3.2; about 0.1140
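
As a quick sanity check, the same number can be reproduced by typing the formula out by hand and comparing it with dpois(). The variable names below are chosen only for this illustration.

```r
lambda <- 3.2   # average customers per 4-minute interval
x <- 5          # number of arrivals of interest

# Formula written out: lambda^x * e^(-lambda) / x!
by_hand <- lambda^x * exp(-lambda) / factorial(x)

round(by_hand, 4)                      # matches the 0.1140 from the worked example
all.equal(by_hand, dpois(x, lambda))   # checks agreement with dpois()
```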

We can build the full Poisson probability distribution for different numbers of customers arriving during any 4-minute interval, keeping the same lambda value.

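# Probabilities of 0 to 10 customers arriving in a 4-minute interval
probs <- dpois(0:10, 3.2)
barplot(probs, names.arg = 0:10, xlab = "number of customers", ylab = "probability")

Figure: Poisson probability distribution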

You can see that the probabilities of 2 to 4 customers arriving per 4-minute interval are clearly greater than the probabilities of 5 to 10 customers arriving. The long tail toward higher counts indicates a slightly right-skewed distribution.
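
One way to check these probabilities is by simulation. The sketch below draws simulated 4-minute intervals with rpois() and compares the observed proportions with dpois(). The seed and the number of simulated intervals are arbitrary choices made for this illustration.

```r
set.seed(42)            # arbitrary seed, only for reproducibility
n_intervals <- 100000   # assumed number of simulated 4-minute intervals
lambda <- 3.2

# Each draw is the number of customers arriving in one simulated interval
arrivals <- rpois(n_intervals, lambda)

# Observed share of intervals with 0, 1, ..., 10 arrivals
observed <- sapply(0:10, function(k) mean(arrivals == k))
theoretical <- dpois(0:10, lambda)

round(cbind(customers = 0:10, observed, theoretical), 4)

mean(arrivals)          # should land close to lambda = 3.2
```

With this many simulated intervals, the observed and theoretical columns should agree closely, and the simulated mean should sit near lambda.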

Given the Poisson distribution, we can find the probability that the number of customers arriving per 4-minute interval is less than or equal to 5 by using R’s cumulative Poisson distribution function, ppois().

ppois(5, 3.2)  # P(X <= 5) with lambda = 3.2; about 0.89

Thus, there is about an 89% chance that at most 5 customers will arrive per 4-minute interval for the given lambda value. The cumulative probability distribution can be shown graphically as:

# Cumulative probabilities P(X <= x) for x = 0 to 5
barplot(ppois(0:5, 3.2), names.arg = 0:5,
        xlab = "number of customers arriving per 4-minute interval",
        ylab = "cumulative probability")

Figure: Cumulative Poisson distribution
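
The link between dpois() and ppois() can be checked directly: the cumulative probability is the running sum of the individual probabilities. The sketch below also uses the lower.tail argument of ppois() to get the complementary probability of more than 5 arrivals.

```r
lambda <- 3.2

# Summing P(0), ..., P(5) gives the same value as ppois(5, lambda)
sum(dpois(0:5, lambda))
ppois(5, lambda)

# Running totals match the heights of the cumulative bar plot
all.equal(cumsum(dpois(0:5, lambda)), ppois(0:5, lambda))

# Probability of more than 5 arrivals in a 4-minute interval
ppois(5, lambda, lower.tail = FALSE)   # same as 1 - ppois(5, lambda)
```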

Summary

The Poisson distribution is used to model the number of random arrivals over a period of time. Each Poisson problem contains a lambda value from which the probabilities of particular occurrences are determined. We used the Poisson distribution to calculate the probability of a given number of customers arriving at a bank per unit time interval.

In R, Poisson probabilities are calculated with the dpois() function, and cumulative Poisson probabilities are calculated with the ppois() function.
