M241 Probability
Chapter 3 Random Variables
Section 3.1
Introduction
-
Random Variable: A function defined
on the outcome space O of some random experiment.
-
Generally, the ones we consider will
be real-valued functions. (However could be vector-valued or complex-valued,
for example.)
-
Normally denote random variables by
X, Y, or Z.
-
The range of a random variable is the
set of its possible values.
-
We have been working with random variables
already.
-
Some examples that we have already considered:
(See Table 1 page 140)
-
Roll a die. Let X = Number that comes
up. Range of X = {1,2,3,4,5,6}
-
Toss a coin 4 times. Let Y = the number
of heads in 4 tosses. Range of Y = {0,1,2,3,4}
-
Pick a card at random from a deck of
52 cards. Let Z = the suit value of the card. Range of Z is {Club, Diamond,
Heart, Spade}. If we wanted to make Z real-valued we could just associate
a number with each suit value ( 1 = Club, 2 = Diamond, etc.)
Distribution of a Random Variable X
-
For now we are just considering discrete
random variables. These are random variables whose range is discrete (i.e.
either finite or countably infinite).
-
An example of a finite range: { 1, 2,
3, 4, 5, 6 }
-
An example of a countably infinite range:
{ 0, 1, 2, 3, ... }
-
An example of a continuous (non-discrete)
range: [0,1] (The set of all real numbers between 0 and 1 inclusive).
-
The density function for a discrete
random variable is the function p(a) = P(X = a), defined
for all values a in the range of X.
-
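Example: Let X = number showing when
a die is rolled. Then p(a) = 1/6 for a = 1, 2, ..., 6.
-
For any subset B of the range of X, the probability that X lies in B is
P(X in B) = sum of p(a) over all a in B.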
-
Hence by specifying the density function
we uniquely define the distribution of X.
-
Often use x as a possible generic
value of X. Can use any dummy variable however (e.g. k) instead.
-
Example: Toss a coin twice. Let X =
number of heads.
Range of X is {0,1,2}
p(0) = 1/4, p(1) = 1/2, p(2) = 1/4.
In table form the distribution of X is
| x        | 0   | 1   | 2   |
| P(X = x) | 1/4 | 1/2 | 1/4 |
-
Example: Let X be the sum of
two rolls of a die.
-
Range of X is 2 through 12.
-
p(2) = 1/36, p(3) = 2/36, p(4) = 3/36,
p(5) = 4/36, etc.
-
Given a random variable X and a non-random
function g defined on the range of X, Y = g(X) defines a new random variable.
-
Example: Toss a coin twice. Let X =
number of heads. Let g(x) = x^2. Then Y = g(X) = X^2 is a random variable.
Let h(x) = |x - 1| and Z = h(X). Calculate the distribution
of both Y and Z.
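The two examples above (the sum of two rolls of a die, and functions of the number of heads in two tosses) can be checked by enumerating equally likely outcomes. This is a minimal sketch; the helper names `distribution` and `transform` are illustrative and not from the text.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def distribution(outcomes, rv):
    """Return {value: P(rv = value)} when all outcomes are equally likely."""
    dist = defaultdict(Fraction)
    weight = Fraction(1, len(outcomes))
    for outcome in outcomes:
        dist[rv(outcome)] += weight
    return dict(dist)


def transform(dist, g):
    """Distribution of g(X), given the distribution of X as a dict."""
    new_dist = defaultdict(Fraction)
    for x, prob in dist.items():
        new_dist[g(x)] += prob
    return dict(new_dist)


# Sum of two rolls of a die: 36 equally likely ordered pairs.
two_rolls = list(product(range(1, 7), repeat=2))
dice_sum = distribution(two_rolls, sum)

# X = number of heads in two tosses, then Y = X^2 and Z = |X - 1|.
two_tosses = list(product("HT", repeat=2))
heads = distribution(two_tosses, lambda toss: toss.count("H"))
y_dist = transform(heads, lambda x: x ** 2)
z_dist = transform(heads, lambda x: abs(x - 1))

for value in sorted(dice_sum):
    print(value, dice_sum[value])
print("Y:", y_dist)
print("Z:", z_dist)
```

Exact fractions are used so that the results can be compared directly with hand calculations.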
Joint Distributions
-
Suppose we have two different random
variables X and Y defined on the same outcome space. The joint density
function is
p(x,y) = P(X = x, Y = y).
-
Example: Three tickets numbered
1, 2, 3 are placed in a box. Two are drawn one at a time without replacement.
-
Let X = the number on the first draw
and Y = the number on the second draw. Make a table to indicate the possible
values for p(x,y) (See the table on page 145).
-
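Note that summing p(x,y) over all values of y gives the
marginal probability distribution of X, and summing over all values of x gives
the marginal probability distribution of Y. In the table these are the column
sums and the row sums when x runs across the table and y runs down it.
-
In other words, given a joint density
function p(x,y) for two random variables X and Y defined on the same outcome space,
P(X = x) = sum of p(x,y) over all y, and P(Y = y) = sum of p(x,y) over all x.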
Important Notes:
-
Two random variables X and Y have the same distribution if P(X = a) = P(Y
= a) for all a in their range (which must be the same)
-
Two random variables X and Y are equal if X = Y for every outcome in the
outcome space.
If we toss a coin twice and X = number of heads, and Y = number of tails,
then X and Y have the same distribution, but they are clearly not equal.
Functions of two (or more) random variables. We can form new random
variables as functions of two or more random variables. Examples: X + Y,
X - Y, XY, min(X,Y), max(X,Y)
Example 3:
-
3 tickets numbered 1, 2, and 3 in box. Two tickets drawn from the box without
replacement. X = first draw, Y = second draw. Let S = X + Y.
-
Same problem with replacement.
Look at the distributions demonstrated in figure 1, page 148
-
To calculate the probability of any event defined in terms of X and
Y: Add up the probabilities p(x,y) = P(X = x, Y = y) over all pairs
(x,y) which are part of the event.
-
Example: For the joint distribution in table 3 (Drawing 2 tickets from
a box of 3 tickets numbered 1, 2, and 3 without replacement), calculate
the probability that X < Y.
P(X < Y) = p(1,2) + p(1,3) + p(2,3) = 1/6 + 1/6 + 1/6 = 1/2
-
Finding the distribution of a function of two random variables, g(X,Y):
-
Examples of common functions of interest: X+Y, X-Y, XY, min(X,Y), max(X,Y)
-
Finding the distribution of g(X,Y): To calculate P(g(X,Y) = z ), simply
add up the probabilities p(x,y) for all (x,y) where g(x,y) = z.
-
Best seen by an example: (Example 3) For the distribution in table 3 with
g(X,Y) = X + Y.
Example: Sum of the draws without replacement:
Range of X + Y is {3, 4, 5}
P( X + Y = 3) = p(1,2) + p(2,1) = 1/6 + 1/6 = 1/3
P( X + Y = 4) = p(1,3) + p(3,1) + p(2,2) = 1/6 + 1/6 + 0 = 1/3
P( X + Y = 5) = p(2,3) + p(3,2) = 1/6 + 1/6 = 1/3
Example: Sum of the draws with replacement
Range of X + Y is {2, 3, 4, 5, 6}
Table 3 page 145 is for sampling without replacement.
With replacement the table would be:
|            | 1   | 2   | 3   | distn of Y |
| 3          | 1/9 | 1/9 | 1/9 | 1/3        |
| 2          | 1/9 | 1/9 | 1/9 | 1/3        |
| 1          | 1/9 | 1/9 | 1/9 | 1/3        |
| distn of X | 1/3 | 1/3 | 1/3 |            |
-
P( X + Y = 2) = p(1,1) = 1/9
-
P( X + Y = 3) = p(1,2) + p(2,1) = 1/9 + 1/9 = 2/9
-
P( X + Y = 4) = p(1,3) + p(3,1) + p(2,2) = 1/9 + 1/9 + 1/9 = 1/3
-
P( X + Y = 5) = p(2,3) + p(3,2) = 1/9 + 1/9 = 2/9
-
P(X + Y = 6) = p(3,3) = 1/9
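The rule "add up p(x,y) over all pairs in the event" translates directly into code. The sketch below builds the joint density for two draws from tickets 1, 2, 3 with and without replacement and derives the distribution of X + Y from it. The function names are illustrative assumptions, not notation from the text.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import permutations, product

tickets = [1, 2, 3]


def joint_density(pairs):
    """Joint density p(x, y) when the listed (x, y) pairs are equally likely."""
    weight = Fraction(1, len(pairs))
    return {pair: weight for pair in pairs}


def distribution_of(g, joint):
    """P(g(X, Y) = z): add p(x, y) over all (x, y) with g(x, y) = z."""
    dist = defaultdict(Fraction)
    for (x, y), prob in joint.items():
        dist[g(x, y)] += prob
    return dict(dist)


def probability(event, joint):
    """P(event): add p(x, y) over all pairs for which event(x, y) is true."""
    return sum((p for (x, y), p in joint.items() if event(x, y)), Fraction(0))


# Ordered pairs of distinct tickets (without replacement) and all ordered
# pairs (with replacement).
without_repl = joint_density(list(permutations(tickets, 2)))
with_repl = joint_density(list(product(tickets, repeat=2)))

sum_without = distribution_of(lambda x, y: x + y, without_repl)
sum_with = distribution_of(lambda x, y: x + y, with_repl)
marginal_x = distribution_of(lambda x, y: x, with_repl)
p_x_less_y = probability(lambda x, y: x < y, without_repl)

print("without replacement:", sum_without)
print("with replacement:   ", sum_with)
print("P(X < Y), without replacement:", p_x_less_y)
```

The same `distribution_of` helper works for any function of two random variables, such as XY, min(X,Y) or max(X,Y).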
For two tosses of a die, consider the distribution of XY (X times Y).
What is the range of XY? What is the distribution of XY?
Example 4: Minimum and maximum.
Pick three digits at random without replacement.
Let X = minimum digit. Y = maximum digit.
How many different ways can you choose 3 digits from 10 (0 through
9)?
Binomial coefficient: C(10,3) = 120.
What is the probability P(X = 4, Y = 7)? Digits could be (4,5,7)
or (4,6,7). -- 2 ways.
P(X = 4, Y = 7) = 2/120 = 1/60
What is the probability P(X = 3, Y = 8)? Digits could be (3,4,8),
(3,5,8), (3,6,8) or (3,7,8). -- 4 ways.: 4/120 = 1/30
In general, P(X = x, Y = y) = (y - x - 1)/120 for 0 <= x < y <= 9.
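The counting argument in Example 4 can be checked by brute force. The sketch below enumerates every set of three distinct digits and tabulates (minimum, maximum). Variable names such as `joint` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# All ways to choose 3 distinct digits from 0..9 (order does not matter).
hands = list(combinations(range(10), 3))
total = len(hands)

# Count how many choices give each (minimum, maximum) pair.
counts = Counter((min(hand), max(hand)) for hand in hands)
joint = {pair: Fraction(count, total) for pair, count in counts.items()}

# Compare with the formula (y - x - 1) / 120 for every pair that occurs.
formula_holds = all(
    joint[(x, y)] == Fraction(y - x - 1, total) for (x, y) in joint
)

print(total)
print(joint[(4, 7)], joint[(3, 8)])
print(formula_holds)
```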
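Conditional Distributions of Y Given X = x:
P(Y = y | X = x) = P(X = x, Y = y) / P(X = x), defined when P(X = x) > 0
-
Rearranged we have the Multiplication Rule for calculating P(Y = y,
X = x):
P(X = x, Y = y) = P(X = x) P(Y = y | X = x)
-
Two random variables are independent if
P(Y = y | X = x) = P(Y = y) for all x and y with P(X = x) > 0.
-
Equivalently, two random variables are independent if
P(X = x, Y = y) = P(X = x) P(Y = y) for all x and y.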
Several Random Variables (i.e. more than 2)
Extension to functions of several random variables X1, X2, …, Xn
-
Multinomial Distribution. Instead of 2 outcomes have several possible
outcomes in each independent trial. Ni = number of trials that result in
an outcome of type i.
-
Example 7: Roll a die 10 times, record number of fours, fives, and
sixes.
Symmetry of a random variable.
-
Symmetric about 0: A random variable is symmetric about 0 if
P(X = -x) = P(X = x) for all x.
-
Symmetric about b: A random variable is symmetric about b if
P(X = b+x) = P(X = b-x) for all x. (i.e. if X-b is symmetric about
0)
Section 3.2 Expectation
Definition: The expected value of a random variable X, denoted
by E(X), is defined to be
E(X) = sum of x P(X = x) over all x in the range of X
-
Other terms we use for expected value: mean of X, expectation of X.
-
This is the average of all possible values of X weighted by their probabilities.
-
Example 6: Rolling a die. X = number rolled. Each value 1 through
6 occurs with probability 1/6. E(X) = 3.5
-
Example 3. Indicator random variables. If I is the indicator of an event A
(I = 1 if A occurs, I = 0 otherwise), then E(I) = P(A).
-
Important property of expectation: Let X be a random variable representing
the outcome of a single trial (like rolling a die). Then, when the same
trial is repeated independently many times, the long-run average of the
observed values should be close to E(X).
Fair Bets
Addition Rule for Expectation
-
E(X + Y) = E(X) + E(Y).
Note: This does NOT require X and Y to be independent!!!
-
This generalizes to an arbitrary number of random variables:
E(X1 + X2 + … + Xn) = E(X1)
+ E(X2) + … + E(Xn)
(See text page 177 for proof)
-
Apply to Example 5. Tn = Sum of numbers rolled on n dice. E(Tn)
= E(X1 + X2 + … + Xn) = E(X1)
+ E(X2) + … + E(Xn) = n*3.5
The Method of Indicators
-
Example 6: Working components. n components, with component j working
with probability pj.
Let X = number of components working. Calculate E(X). Let Ij be
the indicator random variable for the event that component j works.
Then X = I1 + I2 + … + In, and E(X)
= p1 + p2 + … + pn
-
Example 7: Mean of the binomial distribution. X = number
of successes in n independent trials. Let Ij be the indicator
random variable for the event that success occurs on the j'th trial.
E(X) = p + p + …+ p = np.
-
Example 8: Let X be the number of aces in a 5-card poker hand. Then
X = I1 + I2 + I3 + I4 + I5,
where Ij is the indicator of the event that the j'th card in the hand is an ace.
E(Ij) = P(ace on j'th card) = 4/52. E(X) = 5*4/52 = 5/13. Much easier
than calculating using P(X = x).
-
Apply to exercises 3 and 6 from this section.
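To see that the method of indicators agrees with the direct definition of expectation in the poker-hand example, the sketch below computes E(X) both ways. The direct calculation assumes the usual counting formula P(X = x) = C(4,x) C(48,5-x) / C(52,5) for the number of aces, which is not written out in these notes.

```python
from fractions import Fraction
from math import comb

# Method of indicators: five cards, each an ace with probability 4/52.
expected_by_indicators = 5 * Fraction(4, 52)


def p_aces(x):
    """P(exactly x aces in a 5-card hand), by counting hands."""
    return Fraction(comb(4, x) * comb(48, 5 - x), comb(52, 5))


# Direct definition: sum of x * P(X = x) over x = 0, 1, 2, 3, 4.
expected_directly = sum(x * p_aces(x) for x in range(5))

print(expected_by_indicators, expected_directly)
```

Both routes give the same exact fraction, but the indicator route needs no knowledge of the distribution of X.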
Tail Sum Formula for Expectation:
-
For X a random variable with possible values { 0, 1, 2, …, n},
E(X) = P(X >= 1) + P(X >= 2) + … + P(X >= n)
Proof: See text top of page 172
-
Example 9: Application of Tail Sum Formula to finding the Expectation
of the minimum of four rolls of a die. E(M) = P(M>= 1) + P(M >= 2) + …
+ P(M >= 6)
Easier than calculating directly with formula for E(M).
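For the minimum M of four rolls, M >= k happens exactly when all four rolls are at least k, so P(M >= k) = ((7 - k)/6)^4. The sketch below adds these tail probabilities and, as a check, also computes E(M) by enumerating all 6^4 outcomes. Variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Tail sum: E(M) = P(M >= 1) + P(M >= 2) + ... + P(M >= 6).
tail_sum = sum(Fraction(7 - k, 6) ** 4 for k in range(1, 7))

# Direct check: average of min(rolls) over all equally likely outcomes.
rolls = list(product(range(1, 7), repeat=4))
direct = Fraction(sum(min(r) for r in rolls), len(rolls))

print(tail_sum, float(tail_sum))
print(direct)
```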
Markov's Inequality
If X >= 0, then for every a > 0, P(X >= a) <= E(X)/a.
-
This can be used to give a rough bound on tail probabilities for X if
all we know is E(X).
Expectation of a Function of X
E(g(X)) = sum of g(x) P(X = x) over all x in the range of X
-
From this formula it is easy to show:
-
E(cX) = cE(X)
-
E(aX + b) = aE(X) + b
-
E( c ) = c (where a, b, and c are constants)
-
E(X^k) is called the k'th moment of X.
-
Important to remember that in general E(X^2) is not equal to
[E(X)]^2
(See the example of the uniform distribution on {-1, 0, 1}, where E(X) = 0
but E(X^2) = 2/3.)
Expectation of functions of two or more random variables:
E(g(X,Y)) = sum of g(x,y) P(X = x, Y = y) over all pairs (x,y)
This allows for an easy proof of the addition rule
E(X+Y) = E(X) + E(Y) by using the above formula with g(X,Y) = X
+ Y
Multiplication Rule for Expectation of independent random variables:
-
If X and Y are independent then E(XY) = E(X)E(Y)
Application: Expected winnings in lotteries.
For an interesting example look at the PowerBall lottery web page:
OREGON LOTTERY -
Web Center: http://www.oregonlottery.org/night/power.htm
Section 3.3 Standard Deviation and Normal Approximation
Variance and Standard deviation give a measure of the deviation of a random
variable from its mean. (I.e. how spread out is the distribution.)
Definition
Variance of X is denoted by Var(X). Standard deviation is denoted by
SD(X). They are defined as follows:
Var(X) = E[(X - E(X))^2]
SD(X) = square root of Var(X)
Of course two random variables with the same distribution will always
have the same mean, variance and standard deviation
The equivalent computational form for Var(X) is
Var(X) = E(X^2) - [E(X)]^2
This is easily derived by expanding the square (X - E(X))^2 in the
original definition and using the additive property of expectation.
Examples of computing the Variance
Example 1. Random Sampling
n tickets in a box with numbers. Draw a ticket at random. Let X be the
number on the ticket.
-
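Example 2: Indicators. Let X be an indicator of an event with probability
p. Then E(X) = p and E(X^2) = p, so Var(X) = p - p^2 = p(1 - p).
-
Example 3. Let X be the number that shows up when a fair die is rolled.
Then E(X) = 7/2 and E(X^2) = 91/6, so Var(X) = 91/6 - 49/4 = 35/12.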
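The variance examples can be verified numerically. The sketch below computes the variance both from the standard definition E[(X - E(X))^2] and from the computational form E(X^2) - [E(X)]^2. Distributions are stored as dictionaries; the indicator probability 3/10 is a made-up value for illustration.

```python
from fractions import Fraction


def mean(dist):
    """E(X) for a distribution given as {value: probability}."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Var(X) from the definition E[(X - E(X))^2]."""
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


def variance_computational(dist):
    """Var(X) from the computational form E(X^2) - [E(X)]^2."""
    second_moment = sum(x ** 2 * p for x, p in dist.items())
    return second_moment - mean(dist) ** 2


die = {k: Fraction(1, 6) for k in range(1, 7)}
p = Fraction(3, 10)  # hypothetical success probability for the indicator
indicator = {1: p, 0: 1 - p}
uniform = {x: Fraction(1, 3) for x in (-1, 0, 1)}

for name, dist in [("die", die), ("indicator", indicator), ("uniform", uniform)]:
    print(name, mean(dist), variance(dist), variance_computational(dist))
```

The uniform distribution on {-1, 0, 1} has mean 0 but a nonzero second moment, which is why its variance is positive.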
Scaling and Shifting
-
For constants a and b, Var(aX + b) = a^2 Var(X) and SD(aX + b) =
|a| SD(X)
-
This is easily proved by applying the working definition for variance to
aX + b
-
Worth looking at:
-
Example 4: Conversion of r.v. X in Celsius to r.v. Y = (9/5)X + 32 in Fahrenheit
-
Example 5: Successes and failures. Let X be the number of successes in
n trials and Y be the number of failures. Since Y = n - X, E(Y) = n - E(X)
and SD(Y) = SD(X).
-
X* : The Standardization of a random variable X.
-
Suppose X has mean μ and standard deviation σ (so variance σ^2).
Then
X* = (X - μ)/σ
is the standardization of X and has mean 0 and standard deviation 1.
-
A very important application of this is when X has a normal (or approximately
normal) distribution. Then X* has a normal distribution with mean 0 and
standard deviation 1.
-
We can apply this to calculating probabilities that X lies within a particular
range, by normalizing it and using our Normal (0,1) table.
-
Note: we don't use the continuity correction of ½ that we did for the
normal approximation to the binomial distribution.
-
See Example 6 (page 190): Pick a person at random from a population of
people whose heights are approximately normally distributed with mean 5 feet
10 inches and SD 2 inches. What is the probability of picking someone taller
than 6 feet?
-
Chebychev's Inequality for tail probabilities: For any random
variable X and any k > 0,
P(|X - E(X)| >= k SD(X)) <= 1/k^2
-
See example 7 page 192 for an application of this inequality
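The height question above (mean 5 feet 10 inches, SD 2 inches, taller than 6 feet) can be worked by standardizing and evaluating the standard normal distribution function. The sketch below uses the error function from Python's `math` module in place of a Normal(0,1) table. It also compares the normal two-sided tail at k = 2 with the bound 1/k^2 from Chebychev's inequality in its standard form P(|X - E(X)| >= k SD(X)) <= 1/k^2. Variable names are illustrative.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal distribution function Phi(z), via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


mean_height = 70.0  # 5 feet 10 inches, in inches
sd_height = 2.0
threshold = 72.0    # 6 feet, in inches

# Standardize, then take the upper tail (no continuity correction).
z = (threshold - mean_height) / sd_height
p_taller = 1.0 - normal_cdf(z)

# Chebychev's bound versus the normal tail at k = 2 standard deviations.
k = 2.0
chebychev_bound = 1.0 / k ** 2
normal_two_sided = 2.0 * (1.0 - normal_cdf(k))

print(z, round(p_taller, 4))
print(chebychev_bound, round(normal_two_sided, 4))
```

Chebychev's bound holds for every distribution, so it is much looser than the tail probability of a distribution that is known to be normal.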
Sums and Averages of Independent Random Variables.
-
Addition for Variances of Independent Random Variables:
-
Var(X+Y) = Var(X) + Var(Y) if X and Y are Independent
-
(Generalizes to an arbitrary number of independent random variables also).
-
Proof follows easily from the definition of variance and independence.
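Sums of independent random variables with the same distribution.
-
Square Root Law: Let Sn = X1 + X2 + … + Xn be the sum of n independent
random variables, each with mean μ and standard deviation σ. Then
E(Sn) = nμ and SD(Sn) = σ * (square root of n). For the average Sn/n,
E(Sn/n) = μ and SD(Sn/n) = σ / (square root of n).
-
This allows us to easily derive the standard deviation of the binomial
distribution: SD = square root of npq, where q = 1 - p.
(See example 8)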
-
The Law of Averages can be derived from the Square Root Law and
Chebychev's inequality. It says given a sequence of independent random
variables with the same distribution, the average of these random variables
will be arbitrarily close to their common mean with probability approaching
1.
-
The Central Limit Theorem tells us that for
large enough n the sum of n independent random variables each with the
same distribution with finite mean and standard deviation will have a distribution
approaching the normal distribution. The standardized sum will approximate
the normal distribution with mean 0 and variance 1.
The proof is quite difficult and lengthy, so we won't cover it. A number
of nice simulations of the Central Limit Theorem exist, however. One
is on the network (choose Math, then Probability).
-
Example 9 ( p. 107): Application of Central Limit Theorem.
Random walk. At each time step a particle takes a step to the right (+1),
takes a step to the left (-1), or stays where it is (0), each with
probability 1/3. Each move is independent of the other moves. What is the
probability that after 10000 steps the particle will be more than 100 steps
to the right of its starting point?
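A sketch of the normal approximation for the random walk: the position after n steps is a sum of n independent steps, so its mean is n times the step mean and its SD is the step SD times the square root of n. The code below applies this and then uses the standard normal distribution function. No continuity correction is applied, which is an assumption of this sketch rather than something stated in the example.

```python
from fractions import Fraction
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal distribution function Phi(z), via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


n_steps = 10_000
step_dist = {-1: Fraction(1, 3), 0: Fraction(1, 3), 1: Fraction(1, 3)}

# Mean and variance of a single step.
step_mean = sum(x * p for x, p in step_dist.items())
step_var = sum((x - step_mean) ** 2 * p for x, p in step_dist.items())

# Mean and SD of the position after n_steps independent steps.
position_mean = float(n_steps * step_mean)
position_sd = sqrt(float(n_steps * step_var))

# P(position > 100), approximated by the normal curve.
z = (100 - position_mean) / position_sd
p_right_of_100 = 1.0 - normal_cdf(z)

print(step_mean, step_var)
print(round(position_sd, 2), round(z, 3), round(p_right_of_100, 3))
```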
Section 3.4 Discrete Distributions
-
Random variables with finite distributions: This means the range
of random variable (possible values) is a finite set.
-
Random variables can also take on a non-finite countable number
of possible values.
-
Random variables that take on at most a countable number of possible values
are called discrete random variables. This includes random variables with
a finite number of possible values. They are said to have a discrete
distribution.
-
Random variables with an uncountably infinite number of possible values
are called continuous random variables. Example: Let X = Waiting time for
a light bulb to burn out. Range is [0, infinity). We study these in Chapter
4.
-
All of the basic concepts for finite distributions extend to discrete distributions
including conditional probability, joint distributions, independence, expectation,
variance, standard deviation in the same way.
Geometric Distribution
-
Example of a discrete random variable with a non-finite countable number
of possible values: Let X be the waiting time for the first success in
a sequence of Bernoulli trials. The set of possible values for X is {1,
2, 3, ...} (i.e. all positive integers).
-
The distribution of this random variable X is called the geometric
distribution and is given by the formula:
P(X = k) = q^(k-1) p, for k = 1, 2, 3, ..., where q = 1 - p
(since we must have k-1 failures followed by a success on the k'th trial).
-
Recall that the geometric series has the form:
1 + q + q^2 + q^3 + ... = 1/(1 - q), for |q| < 1
Negative Binomial Distribution
-
Let Tr denote the waiting time in a sequence of Bernoulli trials
until the rth success.
-
(See example 4 page 213). This random variable is said to have the negative
binomial distribution (with parameter r).
-
Thus T4 for example is the waiting time for the 4th
success.
-
Range of Tr is {r, r+1, r+2, ....}
-
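P(Tr = t) = P(r-1 successes in first t-1 trials, and success
on trial t) = C(t-1, r-1) p^r q^(t-r), for t = r, r+1, r+2, ...,
where C(t-1, r-1) is the binomial coefficient and q = 1 - p.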
-
To calculate the expectation, variance and standard deviation of the negative
binomial random variable Tr :
-
Write Tr = W1 + W2 + ... + Wr
where each Wi is the waiting time between the (i-1)st
success and the ith success.
-
The Wi are independent and each has the geometric distribution.
-
So E(Tr) = rE(Wi) = r/p. By independence, Var(Tr) = rVar(Wi) = rq/p^2,
so SD(Tr) = (square root of rq)/p.
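As a numerical check on the negative binomial distribution, the sketch below uses the standard formula P(Tr = t) = C(t-1, r-1) p^r q^(t-r) and sums over a long but finite range of t. The values r = 4 and p = 1/2 and the cutoff of 400 trials are assumptions chosen for illustration. The comparison values r/p and rq/p^2 are the standard mean and variance of this distribution.

```python
from math import comb


def negative_binomial_pmf(t, r, p):
    """P(T_r = t): r - 1 successes in the first t - 1 trials, then a success."""
    q = 1.0 - p
    return comb(t - 1, r - 1) * p ** r * q ** (t - r)


r, p = 4, 0.5        # hypothetical parameters: wait for the 4th head
ts = range(r, 400)   # truncation point; the remaining tail is negligible here

total = sum(negative_binomial_pmf(t, r, p) for t in ts)
mean = sum(t * negative_binomial_pmf(t, r, p) for t in ts)
variance = sum((t - mean) ** 2 * negative_binomial_pmf(t, r, p) for t in ts)

print(total, mean, variance)
print(r / p, r * (1 - p) / p ** 2)
```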

The Collector's Problem (example 5 page 215) Example of application
of the geometric distribution:
-
You want a matched set of all the n different animals (or whatever) from
a box of cereal. Let Tn = number of boxes Mom has to buy to get
a complete set.
-
What is the expected value of Tn?
-
Buy the first box and get first animal
-
Buy more boxes until you get a different second animal. The number of additional
boxes required to get different second animal is geometric
with p = (n-1)/n
-
E(number of boxes required to get two different animals) is 1 + 1/p
= 1+n/(n-1)
-
Now buy more boxes until you get a different third animal. The number of
additional boxes required to get the different third animal is
geometric with p = (n-2)/n
-
E(number of boxes required to get three different animals) is 1 +
n/(n-1) + n/(n-2).
-
Continuing we get E(number of boxes to get all n different animals)
= 1 + n/(n-1) + n/(n-2) + … + n/1 = n(1 + 1/2 + 1/3 + … + 1/n)
This means that if the cereal company has 6 different animals, expect
to buy on the average 14.7 boxes of cereal!
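The expected number of boxes is a sum of geometric waiting-time means, one per new animal. The sketch below adds them with exact fractions; the function name `expected_boxes` is illustrative.

```python
from fractions import Fraction


def expected_boxes(n):
    """E(T_n) = n/n + n/(n-1) + ... + n/1 for n equally likely animals."""
    # After k distinct animals are collected, the wait for a new one is
    # geometric with p = (n - k)/n, so its mean is n/(n - k).
    return sum(Fraction(n, n - k) for k in range(n))


for n in (2, 6, 10):
    print(n, expected_boxes(n), float(expected_boxes(n)))
```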
Section 3.5 The Poisson Distribution
-
Definition of a Poisson random variable.
Let N be the number of occurrences of some kind of event during
a unit of time where
a) The number of occurrences in non overlapping time intervals
are independent.
b) The probability of exactly one occurrence in a short enough
time interval h is approximately mh.
c) The probability of more than one occurrence in a short interval
is close to zero.
Then N has the Poisson distribution with parameter
m
-
Note: Often, instead of a unit of time we are talking about a unit
of material (typically in a manufacturing process).
Examples are cookies in a bakery, yards of fabric in a textile plant,
board feet in a planing mill.
-
We only outline a derivation for the distribution of N. There
are several ways to do it. One involves the following steps:
-
Partition the unit time interval into n small subintervals each of length
1/n.
-
The occurrence or non-occurrence of an event in each subinterval is like
a Bernoulli trial with probability p of occurrence =
m(1/n) = m/n.
-
The number of occurrences in the unit time interval is approximated by
a Binomial distribution.
-
We take the limit of the binomial distribution as n gets arbitrarily large,
and we get
P(N = k) = e^(-m) m^k / k!, for k = 0, 1, 2, ...
-
(Note this uses some properties of limits of sequences that we won't derive.)
-
Mean and SD of Poisson(m) Distribution. It
is fairly easy to derive the mean and standard deviation of the Poisson(m)
distribution by using the Maclaurin series expansion of e^(-m) and
a few slick manipulations of infinite series. See your text page
223.
E(N) = m and Var(N) = m,
SD(N) = square root of m.
-
Examples of applications of the Poisson Distribution:
1. Good approximation to the Binomial distribution when probability
p of success is small and n is large. We can use the Poisson distribution
with m = np.
Example: Number of wins in n games of roulette (n large enough)
for gambler betting only one number per game.
2. Number of radioactive particles emitted during an interval
of time.
3. Number of raindrops that fall on a particular area during
a unit interval of time.
4. Telephone calls arriving during a time period to a switchboard.
5. Number of flaws in a portion of a sheet of steel.
6. Number of chocolate chips in a cookie.
-
What the Poisson Distribution looks like: See page 120 (Section
2.3). As m increases the mean shifts to the
right and the variance increases. It becomes closer and closer to
normal in shape.
-
Important Facts:
1. Sums of Independent Poisson Random Variables are also Poisson!
Let N1, ... Nj be independent Poisson Random
variables with parameters m1, ...,mj.
Then the Sum N1 + ... + Nj is a Poisson random variable
with parameter m1 + ...+ mj.
(See the proof page 227)
2. Changing the unit interval. If the number
of occurrences in an interval of length t is Poisson(m),
then the number of occurrences in an interval of length ct is Poisson(cm).
See examples in the exercises below.
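The facts in this section can be illustrated numerically using the standard Poisson formula P(N = k) = e^(-m) m^k / k!. The sketch below shows Binomial(n, m/n) probabilities approaching the Poisson(m) value as n grows, checks that the mean and variance both equal m, and checks one value of the sum of two independent Poisson variables by convolution. The parameter values are chosen only for illustration.

```python
from math import comb, exp, factorial


def poisson_pmf(k, m):
    """P(N = k) for N with the Poisson(m) distribution."""
    return exp(-m) * m ** k / factorial(k)


def binomial_pmf(k, n, p):
    """P(k successes in n independent trials with success probability p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


m = 2.0  # illustrative parameter

# Binomial(n, m/n) probabilities of k = 3 approach the Poisson(m) value.
for n in (10, 100, 1000, 10000):
    print(n, binomial_pmf(3, n, m / n), poisson_pmf(3, m))

# Mean and variance, summing over enough terms that the tail is negligible.
ks = range(60)
mean = sum(k * poisson_pmf(k, m) for k in ks)
variance = sum((k - mean) ** 2 * poisson_pmf(k, m) for k in ks)

# Sum of independent Poisson(3) and Poisson(2): P(sum = 5) by convolution.
convolved = sum(poisson_pmf(j, 3.0) * poisson_pmf(5 - j, 2.0) for j in range(6))

print(mean, variance)
print(convolved, poisson_pmf(5, 5.0))
```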
Sample Applications of Poisson Distribution from the Exercises:
1. Exercise 1, page 233
-
Probability that the number of successes in n = 200 trials is equal to
4. The probability of success = 1%. This is a binomial
distribution well-approximated by the Poisson distribution, since p is
small and n is large.
-
Here m = np = 200*.01 = 2, so P(4 successes) is approximately
e^(-2) 2^4/4!, or about .0902.
2. Exercise No. 2, page 234
-
Let m equal the average number of raisins in a cookie. Assume
a Poisson distribution for N = the number of raisins per cookie.
-
P[at least one raisin per cookie] = 1 - P[no raisins per cookie] = 1 -
P[N = 0] = 1 - e^(-m),
which
we want to be at least .99 (i.e. 99 percent).
-
Now solve this for m: e^(-m) <= .01, so m >= ln(100), approximately 4.6.
3. Exercise No. 4, page 234
-
Number of misprints on a page is Poisson(1).
-
P[at least 5 misprints on a given page] = 1 - P(four or fewer per page)
= 1 - (e^(-1)/0! + e^(-1)/1! + e^(-1)/2! + e^(-1)/3!
+ e^(-1)/4!), which is approximately .0037
-
The number of misprints on each page is independent of every other page. We can
treat the number of pages with at least 5 misprints in a 300 page book
as binomial (300, .0037), which is well approximated by the Poisson(300*.0037)
= Poisson(1.11).
-
P[at least one page contains at least 5 misprints] = 1 - P[no pages
contain at least 5 misprints] = 1 - e^(-1.11), or approximately .67.
4. Exercise number 5, page 235.
-
Microbes are smeared over a plate at an average density of 5000 per square
inch. What is the chance of at least one microbe in a viewing field
of 10^(-4) square inches?
-
Assuming that number of microbes in the viewing field is Poisson with parameter
m
= 5000*10^(-4) = .5 microbes per viewing field.
-
P(N >= 1) = 1 - P(N = 0) = 1 - e^(-.5), or approx. .3935.
5. Exercise number 6, page 234.
-
Just like previous one, except raindrops instead of microbes.
-
N = number of raindrops per inch during a 10-second period is Poisson(5).
6. Exercise number 8, page 234. This is Poisson(5).
7. Exercise number 7, page 234
a)
-
Number of fresh raisins per muffin is Poisson(3)
-
Number of rotten raisins per muffin is Poisson(2)
-
Number of total raisins per muffin is Poisson(3 + 2) = Poisson(5) (since
the sum of independent Poisson r.v.'s is Poisson -- here we are assuming
that number of fresh is independent of number of rotten)
b)
-
Number of raisins in .20 of a muffin is Poisson (.20*5) = Poisson(1).
-
P(No raisins) = e^(-1), or approx. .3679.
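The numerical answers to several of the exercises above can be reproduced with a few lines of code. This sketch uses the standard Poisson formula P(N = k) = e^(-m) m^k / k!. It assumes the target in Exercise 2 is a probability of at least 0.99, and it keeps the unrounded per-page probability in the misprint exercise, so the book-level answer differs slightly from one computed with the rounded value .0037. Variable names are illustrative.

```python
from math import comb, exp, factorial, log


def poisson_pmf(k, m):
    """P(N = k) for N with the Poisson(m) distribution."""
    return exp(-m) * m ** k / factorial(k)


# Exercise 1: 4 successes in 200 trials with p = .01, so m = np = 2.
approx_p4 = poisson_pmf(4, 2.0)
exact_p4 = comb(200, 4) * 0.01 ** 4 * 0.99 ** 196  # exact binomial value

# Exercise 2: smallest m with 1 - e^(-m) >= 0.99 (assumed target).
min_mean_raisins = log(100)

# Exercise 4: misprints per page are Poisson(1); 300-page book.
p_page = 1.0 - sum(poisson_pmf(k, 1.0) for k in range(5))  # at least 5
p_book = 1.0 - exp(-300 * p_page)  # at least one such page

# Exercise 5: microbes in a viewing field, m = 5000 * 10^(-4) = 0.5.
p_microbe = 1.0 - exp(-0.5)

# Exercise 7(b): no raisins in .20 of a muffin, m = .20 * 5 = 1.
p_no_raisins = exp(-1.0)

print(round(approx_p4, 4), round(exact_p4, 4))
print(round(min_mean_raisins, 3))
print(round(p_page, 4), round(p_book, 3))
print(round(p_microbe, 4), round(p_no_raisins, 4))
```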