In probability and statistics, a probability mass function (sometimes called probability function or frequency function[1]) is a function that gives the probability that a discrete random variable is exactly equal to some value.[2] Sometimes it is also known as the discrete probability density function. The probability mass function is often the primary means of defining a discrete probability distribution, and such functions exist for either scalar or multivariate random variables whose domain is discrete.
A probability mass function differs from a continuous probability density function (PDF) in that the latter is associated with continuous rather than discrete random variables. A continuous PDF must be integrated over an interval to yield a probability.[3]
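For example, if $X$ is continuous with density $f$, then $P(a \leq X \leq b) = \int_a^b f(x)\,dx$, whereas if $X$ is discrete, $P(a \leq X \leq b) = \sum_{a \leq x \leq b} p_X(x)$; the value $f(x)$ itself is not a probability.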
The value of the random variable having the largest probability mass is called the mode.
A probability mass function is the probability distribution of a discrete random variable, and provides the possible values and their associated probabilities. It is the function $p\colon \mathbb{R} \to [0,1]$ defined by
$p_X(x) = P(X = x)$
for $-\infty < x < \infty$,[3] where $P$ is a probability measure. $p_X(x)$ can also be simplified as $p(x)$.[4]
The probabilities associated with all (hypothetical) values must be non-negative and sum up to 1,
$\sum_x p_X(x) = 1$ and $p_X(x) \geq 0.$
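For illustration, here is a minimal Python sketch of a probability mass function satisfying both conditions; the fair-die example and the function name are illustrative choices, not part of the formal definition:

```python
# Sketch: probability mass function of a fair six-sided die.
# p_X(x) = 1/6 for x in {1, ..., 6}, and 0 for every other value.

def pmf_die(x):
    """Probability mass function of a fair six-sided die."""
    return 1 / 6 if x in {1, 2, 3, 4, 5, 6} else 0.0

# The masses are non-negative and sum to 1 over the support.
total = sum(pmf_die(x) for x in range(1, 7))
assert abs(total - 1.0) < 1e-12
print(pmf_die(3), pmf_die(7), total)  # 0.1666..., 0.0, 1.0
```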
Thinking of probability as mass helps to avoid mistakes, since the physical mass is conserved, as is the total probability for all hypothetical outcomes $x$.
A probability mass function of a discrete random variable $X$ can be seen as a special case of two more general measure-theoretic constructions: the distribution of $X$ and the probability density function of $X$ with respect to the counting measure. We make this more precise below.
Suppose that $(A, \mathcal{A}, P)$ is a probability space and that $(B, \mathcal{B})$ is a measurable space whose underlying σ-algebra is discrete, so in particular contains singleton sets of $B$. In this setting, a random variable $X\colon A \to B$ is discrete provided its image is countable. The pushforward measure $X_*(P)$, called the distribution of $X$ in this context, is a probability measure on $B$ whose restriction to singleton sets induces the probability mass function (as mentioned in the previous section) $f_X\colon B \to \mathbb{R}$, since $f_X(b) = P(X^{-1}(b)) = P(X = b)$ for each $b \in B$.
Now suppose that $(B, \mathcal{B}, \mu)$ is a measure space equipped with the counting measure $\mu$. The probability density function $f$ of $X$ with respect to the counting measure, if it exists, is the Radon–Nikodym derivative of the pushforward measure of $X$ (with respect to the counting measure), so $f = dX_*P/d\mu$, and $f$ is a function from $B$ to the non-negative reals. As a consequence, for any $b \in B$ we have
$P(X = b) = P(X^{-1}(b)) = X_*(P)(b) = \int_b f \, d\mu = f(b),$
demonstrating that $f$ is in fact a probability mass function.
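To make the pushforward construction concrete, the following Python sketch (with an assumed three-point sample space; the names are illustrative) computes the distribution of $X$ by summing the probabilities of the preimages $X^{-1}(b)$:

```python
from collections import defaultdict

# Sketch: a finite probability space (A, P) and a random variable
# X: A -> B. The pushforward distribution assigns to each value b
# the probability of its preimage, f_X(b) = P(X^{-1}(b)).
P = {"a1": 0.2, "a2": 0.3, "a3": 0.5}   # probability measure on A
X = {"a1": 0, "a2": 1, "a3": 1}         # random variable X: A -> B

pmf = defaultdict(float)
for outcome, prob in P.items():
    pmf[X[outcome]] += prob             # accumulate mass on X(outcome)

print(dict(pmf))  # {0: 0.2, 1: 0.8}; the masses sum to 1
```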
When there is a natural order among the potential outcomes $x$, it may be convenient to assign numerical values to them (or $n$-tuples in case of a discrete multivariate random variable) and to consider also values not in the image of $X$. That is, $f_X$ may be defined for all real numbers and $f_X(x) = 0$ for all $x \notin X(S)$ as shown in the figure.
The image of $X$ has a countable subset on which the probability mass function $f_X(x)$ sums to one. Consequently, the probability mass function is zero for all but a countable number of values of $x$.
The discontinuity of probability mass functions is related to the fact that the cumulative distribution function of a discrete random variable is also discontinuous. If $X$ is a discrete random variable, then $P(X = x) = 1$ means that the event $(X = x)$ is certain (it occurs in 100% of cases); on the contrary, $P(X = x) = 0$ means that the event $(X = x)$ is impossible. This statement does not hold for a continuous random variable $X$, for which $P(X = x) = 0$ for any possible $x$. Discretization is the process of converting a continuous random variable into a discrete one.
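As a sketch of discretization (the exponential distribution and unit rate are our choice of example): if $Y$ is continuous with cumulative distribution function $F$, then $\lfloor Y \rfloor$ is discrete with $P(\lfloor Y \rfloor = k) = F(k+1) - F(k)$:

```python
import math

# Sketch: discretizing a continuous random variable via the floor map.
# For Y ~ Exponential(1), F(y) = 1 - exp(-y), so
# P(floor(Y) = k) = F(k + 1) - F(k) = exp(-k) - exp(-(k + 1)).

def pmf_floor_exponential(k, rate=1.0):
    """PMF of floor(Y) where Y is exponential with the given rate."""
    if k < 0:
        return 0.0
    return math.exp(-rate * k) - math.exp(-rate * (k + 1))

# The discrete masses sum to 1 (checked here up to a large cutoff).
total = sum(pmf_floor_exponential(k) for k in range(200))
print(pmf_floor_exponential(0), total)  # ~0.632, ~1.0
```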
Three major distributions are associated with probability mass functions: the Bernoulli distribution, which models a single trial with two outcomes; the binomial distribution, which models the number of successes in $n$ independent such trials; and the geometric distribution, which models the number of trials needed to obtain the first success.
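Their probability mass functions can be written down directly from the standard formulas, as in the following Python sketch (parameter names are ours):

```python
from math import comb

# Standard pmf formulas for the three distributions named above.

def bernoulli_pmf(k, p):
    """P(X = k) for Bernoulli(p); support {0, 1}."""
    return p if k == 1 else (1 - p) if k == 0 else 0.0

def binomial_pmf(k, n, p):
    """P(X = k) for Binomial(n, p); support {0, ..., n}."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p**k * (1 - p)**(n - k)

def geometric_pmf(k, p):
    """P(X = k) for Geometric(p); support {1, 2, 3, ...}."""
    return p * (1 - p)**(k - 1) if k >= 1 else 0.0

print(binomial_pmf(2, n=4, p=0.5))  # comb(4, 2) * 0.25 * 0.25 = 0.375
```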
The following exponentially declining distribution is an example of a distribution with an infinite number of possible outcomes, all the positive integers: $\Pr(X = i) = \frac{1}{2^i}$ for $i = 1, 2, 3, \dots$. Despite the infinite number of possible outcomes, the total probability mass is $1/2 + 1/4 + 1/8 + \cdots = 1$, satisfying the unit total probability requirement for a probability distribution.
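A quick numerical check of this geometric series (a sketch; the cutoffs are arbitrary):

```python
# Partial sums of Pr(X = i) = 1/2**i approach 1 as terms are added,
# even though the support {1, 2, 3, ...} is infinite.
for n in (1, 2, 5, 10, 50):
    print(n, sum(1 / 2**i for i in range(1, n + 1)))
    # 0.5, 0.75, 0.96875, 0.9990..., ~1.0
```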
Two or more discrete random variables have a joint probability mass function, which gives the probability of each possible combination of realizations for the random variables.
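For instance (a sketch with an assumed pair of independent fair coins), a joint pmf assigns a mass to every pair of realizations, and each marginal pmf is recovered by summing over the other variable:

```python
# Sketch: joint pmf of two independent fair coin flips X and Y,
# encoded as a table over all pairs of realizations.
joint = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}

# Marginal pmf of X: sum the joint mass over the values of Y.
marginal_x = {x: sum(joint[(x, y)] for y in (0, 1)) for x in (0, 1)}

print(sum(joint.values()))  # 1.0: joint masses sum to one
print(marginal_x)           # {0: 0.5, 1: 0.5}
```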