Entropy
Intuition
We can measure the "surprise" of an event $x$ by its information content $h(x) = -\log p(x)$ (illustrated numerically below):
- If $x$ happens with a high probability, i.e., $p(x)$ is close to $1$, it is not surprising when $x$ happens; $h(x)$ is close to $0$.
- If $x$ happens with a low probability, i.e., $p(x)$ is close to $0$, it is surprising when $x$ happens; $h(x)$ is large (it grows to $\infty$ as $p(x) \to 0$).
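As a quick numerical illustration (a minimal sketch assuming NumPy; the two probabilities are chosen arbitrarily):

```python
import numpy as np

# Surprise (information content) of an event: h(x) = -log p(x), in nats.
def surprise(p):
    return -np.log(p)

print(surprise(0.99))  # ~0.01: a near-certain event is barely surprising
print(surprise(0.01))  # ~4.61: a rare event is very surprising
```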
Definition
Given a random variable $X$ with distribution $p$, the entropy is defined as
$$H(X) = -\sum_x p(x) \log p(x)$$
for discrete $X$, and as the differential entropy
$$H(X) = -\int p(x) \log p(x) \, dx$$
for continuous $X$.
We usually only sum/integrate over $x$ with $p(x) > 0$, following the convention $0 \log 0 = 0$.
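A minimal sketch of the discrete case, assuming NumPy/SciPy are available; the example distribution is arbitrary:

```python
import numpy as np
from scipy.stats import entropy

p = np.array([0.5, 0.25, 0.125, 0.125])

# Direct evaluation of H(X) = -sum_x p(x) log p(x), summing only over p(x) > 0.
mask = p > 0
H = -np.sum(p[mask] * np.log(p[mask]))

print(H)                   # ~1.2130 nats
print(entropy(p))          # same value via scipy.stats.entropy (natural log by default)
print(entropy(p, base=2))  # 1.75 bits
```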
Interpretation
The entropy can be interpreted as a measure of uncertainty:
- If a distribution has high entropy, the observations are hard to predict, and the dataset sampled from the distribution will have high information content.
- If a distribution has small entropy, then observations are likely to be the same, and the dataset sampled from the distribution does not contain much information.
Properties
For discrete $X$:
- $H(X)$ is always non-negative, with $H(X) = 0$ if and only if $X$ is deterministic;
- $H(X)$ is at its maximum, $\log K$, if $X$ follows the uniform distribution over its $K$ states;
- $H(X)$ is at its minimum, $0$, if the probability mass is all in one state.
For continuous $X$ (differential entropy):
- $H(X)$ can be negative;
- For fixed mean and variance, $H(X)$ is at its maximum if $X$ is Gaussian.
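A quick numerical check of the properties above (a sketch assuming SciPy; the specific distributions are picked only for illustration):

```python
import numpy as np
from scipy.stats import entropy, norm, uniform

# Discrete: the uniform distribution maximizes entropy, a point mass minimizes it.
K = 4
print(entropy(np.full(K, 1 / K)))     # log(4) ~ 1.386, the maximum over 4 states
print(entropy([1.0, 0.0, 0.0, 0.0]))  # 0.0, deterministic
print(entropy([0.7, 0.1, 0.1, 0.1]))  # somewhere in between

# Continuous: differential entropy can be negative ...
print(norm(loc=0, scale=0.1).entropy())  # ~ -0.88 nats

# ... and for a fixed variance, the Gaussian has higher entropy than, e.g., a uniform.
var = 1.0 / 12.0                                    # variance of Uniform(0, 1)
print(uniform(loc=0, scale=1).entropy())            # log(1) = 0
print(norm(loc=0.5, scale=np.sqrt(var)).entropy())  # ~ 0.18 > 0
```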
Examples
Bernoulli Distribution
For a Bernoulli distribution parameterized by $\theta$, i.e., $p(X = 1) = \theta$ and $p(X = 0) = 1 - \theta$,
$$H(X) = -\theta \log \theta - (1 - \theta) \log (1 - \theta).$$
The entropy is maximized at $\theta = 0.5$ and vanishes at $\theta = 0$ or $\theta = 1$.
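A sketch evaluating this expression for a few values of $\theta$ and cross-checking against SciPy (the grid of $\theta$ values is arbitrary):

```python
import numpy as np
from scipy.stats import bernoulli

def bernoulli_entropy(theta):
    # H(X) = -theta*log(theta) - (1-theta)*log(1-theta), in nats
    return -theta * np.log(theta) - (1 - theta) * np.log(1 - theta)

for theta in [0.1, 0.3, 0.5, 0.7, 0.9]:
    print(theta, bernoulli_entropy(theta), bernoulli(theta).entropy())
# The entropy peaks at theta = 0.5 (log 2 ~ 0.693 nats) and is symmetric around 0.5.
```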
Gaussian Distribution
The differential entropy of a $d$-dimensional Gaussian $\mathcal{N}(\mu, \Sigma)$ is
$$H(X) = \frac{1}{2} \log \left[ (2 \pi e)^d |\Sigma| \right].$$
In the univariate case,
$$H(X) = \frac{1}{2} \log (2 \pi e \sigma^2).$$
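A sketch comparing the closed-form expressions above with SciPy's built-in entropy (the mean and covariance values are arbitrary):

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

# Univariate: H = 0.5 * log(2*pi*e*sigma^2)
sigma = 2.0
print(0.5 * np.log(2 * np.pi * np.e * sigma**2))
print(norm(loc=0.0, scale=sigma).entropy())

# Multivariate: H = 0.5 * log((2*pi*e)^d * |Sigma|)
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
d = Sigma.shape[0]
print(0.5 * np.log((2 * np.pi * np.e) ** d * np.linalg.det(Sigma)))
print(multivariate_normal(mean=np.zeros(d), cov=Sigma).entropy())
```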
Entropy in Information Theory
Entropy is useful for calculating the smallest amount of information required to convey a message.
Consider the transmission of sequences over a set of characters $\{x_1, \dots, x_K\}$, where each character $x_k$ appears with probability $p(x_k)$ and is encoded with $-\log_2 p(x_k)$ bits. The expected number of bits per character is then the entropy $H = -\sum_k p(x_k) \log_2 p(x_k)$.
Suppose we want to encode a sequence of 4 characters A, B, C, D with equal frequency $\frac{1}{4}$. The entropy is $H = -4 \times \frac{1}{4} \log_2 \frac{1}{4} = 2$ bits. In this case, A, B, C, D are encoded with 00, 01, 10, 11 respectively, using exactly 2 bits per character.
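A sketch checking that the entropy matches the 2-bit code length in this example (assuming NumPy; the dictionaries mirror the characters and codes above):

```python
import numpy as np

# Four equally likely characters and their fixed-length codes.
p = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}
code = {"A": "00", "B": "01", "C": "10", "D": "11"}

# Entropy in bits.
H = -sum(prob * np.log2(prob) for prob in p.values())

# Expected code length in bits per character.
L = sum(p[c] * len(code[c]) for c in p)

print(H, L)  # both 2.0: the code meets the entropy lower bound exactly
```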
Cross Entropy
For two distributions $p$ and $q$, the cross entropy is defined as
$$H(p, q) = -\sum_x p(x) \log q(x).$$
The cross entropy can be interpreted as the expected number of bits needed to encode a data sample drawn from $p$ using a code that is optimal for $q$.
Suppose we want to encode a sequence of the 4 characters A, B, C, D with equal frequency using the codes 0, 10, 110, 111 respectively; the probability of each encoding appearing in a random bit sequence is $q = (\frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \frac{1}{8})$, i.e., each code length equals $-\log_2 q(x)$. The expected number of bits per character under the true frequencies $p = (\frac{1}{4}, \frac{1}{4}, \frac{1}{4}, \frac{1}{4})$ is the cross entropy
$$H(p, q) = -\sum_x p(x) \log_2 q(x) = \frac{1}{4}(1 + 2 + 3 + 3) = 2.25 \text{ bits} > H(p) = 2 \text{ bits}.$$
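A sketch computing this cross entropy and the matching expected code length (assuming NumPy; the distributions follow the example above):

```python
import numpy as np

# True character frequencies p (uniform over A, B, C, D) and the distribution q
# for which the code 0, 10, 110, 111 is optimal: q = (1/2, 1/4, 1/8, 1/8).
p = np.array([0.25, 0.25, 0.25, 0.25])
q = np.array([0.5, 0.25, 0.125, 0.125])
code_lengths = np.array([1, 2, 3, 3])  # lengths of 0, 10, 110, 111

# Cross entropy H(p, q) = -sum_x p(x) log2 q(x), in bits.
H_pq = -np.sum(p * np.log2(q))

# Expected code length when encoding samples from p with the code built for q.
L = np.sum(p * code_lengths)

print(H_pq, L)  # both 2.25 bits, above the entropy H(p) = 2 bits
```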
Gibbs' Inequality
The entropy of a distribution $p$ is a lower bound on its cross entropy with any distribution $q$:
$$H(p) \le H(p, q),$$
where the equality holds if and only if $p = q$.
Since $\log z \le z - 1$ for all $z > 0$,
$$H(p) - H(p, q) = \sum_x p(x) \log \frac{q(x)}{p(x)} \le \sum_x p(x) \left( \frac{q(x)}{p(x)} - 1 \right) = \sum_x q(x) - \sum_x p(x) = 0.$$
Therefore, $H(p) \le H(p, q)$.
The difference between the cross entropy and the entropy is the KL divergence, which is therefore non-negative:
$$\mathrm{KL}(p \,\|\, q) = H(p, q) - H(p) = \sum_x p(x) \log \frac{p(x)}{q(x)} \ge 0.$$
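A quick numerical check of Gibbs' inequality and this decomposition (a sketch assuming SciPy; the two distributions are arbitrary):

```python
import numpy as np
from scipy.stats import entropy

p = np.array([0.5, 0.25, 0.125, 0.125])
q = np.array([0.25, 0.25, 0.25, 0.25])

H_p = entropy(p)               # H(p)
H_pq = -np.sum(p * np.log(q))  # H(p, q)
kl = entropy(p, q)             # KL(p || q), computed by scipy when qk is given

print(H_p, H_pq)       # H(p) <= H(p, q)
print(H_pq - H_p, kl)  # both equal KL(p || q) >= 0
```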
Using Gibbs' inequality, we can also show that the entropy of the uniform categorical distribution upper-bounds the entropy of any categorical distribution.
Given $K$ categories, let $q$ be the uniform distribution, $q(x) = \frac{1}{K}$. For any categorical distribution $p$,
$$H(p) \le H(p, q) = -\sum_x p(x) \log \frac{1}{K} = \log K = H(q).$$
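A sketch verifying the bound $H(p) \le \log K$ on a few random categorical distributions (assuming NumPy/SciPy; the Dirichlet sampling and seed are arbitrary choices):

```python
import numpy as np
from scipy.stats import entropy

rng = np.random.default_rng(0)
K = 5

for _ in range(3):
    p = rng.dirichlet(np.ones(K))  # a random categorical distribution over K categories
    print(entropy(p), "<=", np.log(K))

print(entropy(np.full(K, 1 / K)))  # the uniform distribution attains log(K)
```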