Probability

It’s time to introduce some concepts from statistical theory.
The collection of all possible elementary outcomes of an experiment is
called the **sample space**, Ω.
The sample space of our experiment is the natural numbers Ω = {1, 2, 3, …},
a countably infinite set. An **event** E is a subset of the sample
space. Getting Yahtzee in three or fewer tosses corresponds to the
event E = {1, 2, 3}. The sample space can be a discrete set, finite or countably
infinite, but it can also be continuous and uncountable, such as the real numbers ℝ.

In a sequence of trials in an experiment such as ours, the number of times a particular
event E occurs is called its **frequency**. The ratio of that frequency to the number of trials is
the **relative frequency**. The relative frequency of E tends to
stabilize as the number of trials increases. This is illustrated in the applet. The limit is called the
**probability** P of E, and P is interpreted as the likelihood of the outcome in a single trial.
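This stabilization is easy to see numerically. Here is a minimal sketch, assuming for illustration a fair six-sided die (not the actual Yahtzee experiment) and the event E = {1, 2, 3}, whose relative frequency should settle near 1/2:

```python
import random

random.seed(1)  # reproducible runs

def relative_frequency(event, trials):
    """Relative frequency of `event` (a set of outcomes) in `trials` die rolls."""
    hits = sum(1 for _ in range(trials) if random.randint(1, 6) in event)
    return hits / trials

E = {1, 2, 3}  # the event "roll is at most 3"
for n in (100, 10_000, 100_000):
    print(n, relative_frequency(E, n))  # drifts toward 0.5 as n grows
```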

A **probability space** (Ω, 𝓕, P)
is a sample space Ω, a collection 𝓕 of subsets, i.e. events,
and a
probability assignment P to each event. The collection 𝓕
must technically be a Borel field (σ-field) of subsets. Normally it’s just the collection of all subsets of Ω.
The probability assignment to events must satisfy three axioms.

1. P(Ω)=1

2. 0≤P(E)≤1

3. P(∪_{i}E_{i}) = Σ_{i} P(E_{i})
for every sequence of disjoint events E_{i}
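The axioms can be checked mechanically on a small finite example. The sketch below assumes a four-point sample space with the uniform assignment P(E) = |E| / |Ω| (an illustrative choice, not something from the text) and verifies all three axioms over the full collection of subsets:

```python
from itertools import chain, combinations

omega = {1, 2, 3, 4}

def powerset(s):
    """All subsets of s, as frozensets."""
    s = list(s)
    return [frozenset(c)
            for c in chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

def P(event):
    """Uniform probability assignment: P(E) = |E| / |Ω|."""
    return len(event) / len(omega)

events = powerset(omega)

assert P(frozenset(omega)) == 1                 # axiom 1: P(Ω) = 1
assert all(0 <= P(E) <= 1 for E in events)      # axiom 2: 0 ≤ P(E) ≤ 1
for A in events:                                # axiom 3, for disjoint pairs
    for B in events:
        if A.isdisjoint(B):
            assert P(A | B) == P(A) + P(B)
print("all three axioms hold for the uniform assignment")
```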

For a discrete sample space such as ℕ = {1, 2, 3, …}, each point can be assigned a
probability. The probability of an event is then the sum of the parts, P(E) = Σ_{ω} P(ω),
summing over all ω ∈ E. This obviously doesn’t work for
continuous spaces such as ℝ, where probabilities must
be assigned to events like intervals instead of individual points. The assignment of probabilities
to events can be thought of as a distribution of the total probability, which equals one, among points and regions
of the sample space.
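As a sketch of summing point probabilities over an event, assume for illustration that each toss succeeds with some hypothetical probability p and that ω is the number of the first successful toss, so p(k) = (1 − p)^{k−1} p (a geometric distribution; the real Yahtzee odds are more involved):

```python
p = 0.1  # hypothetical per-toss success probability (an assumption)

def point_prob(k):
    """p(k) = probability that the first success occurs on toss k."""
    return (1 - p) ** (k - 1) * p

def P(event):
    """P(E) = Σ p(ω) over ω in E, for a discrete event E."""
    return sum(point_prob(k) for k in event)

E = {1, 2, 3}  # success within three tosses
print(P(E))    # equals 1 - (1 - p)**3
```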

A **random variable** is a function X(ω) defined on a probability
space, with values in a value space X ⊆ ℝ. If the sample space is
the points in the unit disc, Ω = { (x, y) ∈ ℝ²
| x² + y² ≤ 1 }, then the distance from the origin is a random variable X(ω) = √(x² + y²).
When the sample space is part of ℝ it is natural to use the
random variable X(ω)=ω.

The random variable X(ω) provides a natural transfer of
probabilities from the sample space to a value space of real numbers: P(A) = P( { ω | X(ω) ∈ A } ). For a discrete random variable
with values x_{i}, the probabilities are described by the **probability
function** p(x_{i}).

The **distribution function** is used to characterize the distribution of probabilities for a random variable.
It is defined as F(λ) = P( { ω | X(ω) ≤ λ } ).
The distribution function F(λ) of a random variable sweeps up
probability from −∞ to +∞.
It is also called the cumulative distribution function (c.d.f.). F(λ)
is a non-decreasing function with F(−∞) = 0 and F(+∞) = 1.

The distribution F(λ) of the random variable describing
distance from the origin gives the probability for a point to be at a distance
less than λ based on a uniform probability distribution over the unit disc,
i.e. equal areas have the same probability.

Check this!
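One way to check it: since equal areas have equal probability, F(λ) is the area of the disc of radius λ divided by the area of the unit disc, πλ²/π = λ². The Monte Carlo sketch below (an illustration, using rejection sampling to draw uniform points) compares the empirical fraction with λ²:

```python
import random

random.seed(2)

def sample_disc():
    """Rejection sampling: uniform random point in the unit disc."""
    while True:
        x, y = random.uniform(-1, 1), random.uniform(-1, 1)
        if x * x + y * y <= 1:
            return x, y

def empirical_F(lam, n=100_000):
    """Fraction of sampled points whose distance from the origin is <= lam."""
    hits = sum(1 for _ in range(n)
               if sum(c * c for c in sample_disc()) <= lam * lam)
    return hits / n

for lam in (0.3, 0.5, 0.9):
    print(lam, empirical_F(lam), lam ** 2)  # empirical vs F(λ) = λ²
```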

The probability of a random variable being in a certain interval is given by:

P( a < X ≤ b ) = F(b) − F(a).
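As a worked example, take the unit-disc distance variable: equal areas having equal probability gives F(λ) = λ² for 0 ≤ λ ≤ 1, so

P( 0.5 < X ≤ 0.8 ) = F(0.8) − F(0.5) = 0.64 − 0.25 = 0.39.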

The next definition may trigger a déjà vu from calculus:

The derivative function f(λ) = F′(λ) of the c.d.f. of a continuous random
variable is called the **density function**. If probability is like a unit
mass distributed over a certain space then the density function would represent
the density of mass. It’s the continuous version of the probability function.

If you conduct a large number of trials in a chance experiment with k possible
outcomes x_{1},x_{2},... x_{k} and
probabilities p_{1},p_{2},...p_{k} then you would
expect the average value of all outcomes to be Σx_{i}p_{i}.
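A small simulation illustrates this. Assuming, as an example, a fair six-sided die with outcomes 1…6 and probabilities 1/6 each, the sample average should settle near Σx_{i}p_{i} = 3.5:

```python
import random

random.seed(3)

# Outcomes and probabilities of a fair six-sided die (an illustrative choice).
xs = [1, 2, 3, 4, 5, 6]
ps = [1 / 6] * 6

expected = sum(x * p for x, p in zip(xs, ps))  # Σ x_i p_i = 3.5

n = 100_000
average = sum(random.choice(xs) for _ in range(n)) / n
print(expected, average)  # the sample average settles near 3.5
```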

The **expected value** of a random variable X is:

E X = Σ_{i} x_{i} p(x_{i})

Expected value can be a misleading
term: the value might not even exist in the value space of X. An electron spin
direction experiment with X = 1 for up and X = 0 for down could have an expected
value of ½ even though every electron is detected as either up or down. Expected
value is analogous to the center of gravity of a mass distribution.

The operation of calculating expectation values for random variables is linear.
Given two random variables X(ω), Y(ω) and forming a new one
Z(ω) = aX(ω) + bY(ω) gives:

E Z = a E X + b E Y
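Linearity means E(aX + bY) = a E X + b E Y, and for sample averages the identity even holds exactly. A quick numerical sketch, assuming two illustrative fair dice for X and Y:

```python
import random

random.seed(5)

a, b, n = 2.0, -3.0, 100_000
# Two illustrative random variables: fair die rolls X and Y.
xs = [random.randint(1, 6) for _ in range(n)]
ys = [random.randint(1, 6) for _ in range(n)]

EX = sum(xs) / n
EY = sum(ys) / n
EZ = sum(a * x + b * y for x, y in zip(xs, ys)) / n

print(EZ, a * EX + b * EY)  # the two agree (up to float rounding)
```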

The expectation value of a random variable X(ω) = ω on a sample space in ℝ
is:

E X = ∫ x f(x) dx = ∫ x dF(x)

where f(x) is the density function and F(x) is the distribution function.
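The integral can be checked numerically for the unit-disc distance variable, whose density (differentiating F(λ) = λ²) is f(x) = 2x on [0, 1]. A midpoint-rule sketch of ∫ x f(x) dx, which should come out near 2/3:

```python
def f(x):
    """Density of the distance from the origin in the unit disc: f(x) = 2x."""
    return 2 * x

def expectation(f, a=0.0, b=1.0, n=100_000):
    """Midpoint-rule approximation of E X = ∫ x f(x) dx over [a, b]."""
    h = (b - a) / n
    return sum((a + (i + 0.5) * h) * f(a + (i + 0.5) * h) * h for i in range(n))

print(expectation(f))  # ≈ 2/3
```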

The information in the density function for a random
variable is often summarized with a few numbers. The most important are the
expectation value μ = E X and μ_{2} = E( (X−μ)^{2} ), also called the **central
moment of order 2**. μ_{2} measures the amount of spread in a distribution.
μ_{3} = E( (X−μ)^{3} ) has to do with skewness, i.e.
asymmetry in the distribution. Under suitable conditions the distribution is
uniquely determined by its moments μ, μ_{2}, μ_{3}, . . . .

The spread in a distribution of a random variable is called the **variance**:

Var X = σ^{2} = E( (X−μ)^{2} ) = μ_{2}

σ is called the **standard deviation** of X.

Another measure of spread is the mean deviation E( |X−μ| ).
The downside is that the absolute value is not a differentiable operation, which complicates
analysis.

Note that the expectation value of X^{2} doesn’t equal
μ^{2}; in fact E X^{2} − μ^{2} = Var X (show
this!). Here’s another exercise: calculate the density function, expectation
value and standard deviation of the random variable X(ω) = √(x^{2}+y^{2})
in the unit disc example.
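For checking your answers to the exercise, here is a Monte Carlo sketch (rejection sampling of uniform points in the disc, an illustrative method) that estimates the mean and standard deviation of the distance numerically:

```python
import math
import random

random.seed(4)

def distance_sample():
    """Distance from the origin of a uniform random point in the unit disc."""
    while True:
        x, y = random.uniform(-1, 1), random.uniform(-1, 1)
        if x * x + y * y <= 1:
            return math.sqrt(x * x + y * y)

n = 100_000
samples = [distance_sample() for _ in range(n)]
mean = sum(samples) / n
std = math.sqrt(sum((s - mean) ** 2 for s in samples) / n)
print(mean, std)  # compare with your hand-computed E X and σ
```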