Lecture 2: Entropy and Mutual Information
Entropy
Mutual Information
Dr. Yao Xie, ECE587, Information Theory, Duke University
The winner is:
Eunsu Ryu, with number 60
[Figure: distribution of the submitted numbers, 0-100]
A strategy to win the game?
!"#$"%"&'()%*&+,% Dr. Yao Xie, ECE587, Information Theory, Duke University 2
Uncertainty measure
Let $X$ be a random variable taking on a finite number $M$ of different values $x_1, \ldots, x_M$
What is $X$: English letter in a file, last digit of the Dow-Jones index, result of coin tossing, a password
With probabilities $p_1, \ldots, p_M$, $p_i > 0$, $\sum_{i=1}^{M} p_i = 1$
Question: what is the uncertainty associated with $X$?
Intuitively: a few properties that an uncertainty measure should satisfy
It should not depend on the way we choose to label the alphabet
Desired properties
It is a function of $p_1, \ldots, p_M$
Let this uncertainty measure be
$H(p_1, \ldots, p_M)$
Monotonicity. Let $f(M) = H(1/M, \ldots, 1/M)$. If $M < M'$, then
$f(M) < f(M').$
Picking one person randomly from the classroom should involve less uncertainty than picking a person randomly from the US.
Additivity. Consider two independent RVs $X$ and $Y$, each uniformly distributed, with alphabet sizes $M$ and $L$. The pair $(X, Y)$ is uniform over $ML$ outcomes. However, due to independence, when $X$ is revealed, the uncertainty in $Y$ should not be affected. This means
$f(ML) - f(M) = f(L)$
Grouping rule (Problem 2.27 in the text). Dividing the outcomes into two groups, randomly choosing one group, and then randomly picking an element from that group, does not change the set of possible outcomes, so it should not change the uncertainty.
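One common form of this rule, grouping the first two outcomes into one group (stated here for reference, not verbatim from the text):
$H(p_1, p_2, p_3, \ldots, p_M) = H(p_1 + p_2, p_3, \ldots, p_M) + (p_1 + p_2)\, H\!\left(\frac{p_1}{p_1 + p_2}, \frac{p_2}{p_1 + p_2}\right)$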
Entropy
The only function that satisfies the requirements is the entropy function
$H(p_1, \ldots, p_M) = -\sum_{i=1}^{M} p_i \log_2 p_i$
General definition of entropy:
$H(X) = -\sum_{x \in \mathcal{X}} p(x) \log_2 p(x)$ bits
Convention: $0 \log 0 = 0$
Uncertainty in a single random variable
Can also be written as:
$H(X) = E\left\{ \log \frac{1}{p(X)} \right\}$
Intuition: $H = \log$(# of outcomes/states)
Entropy is a functional of $p(x)$
Entropy is a lower bound on the number of bits needed to represent a RV.
E.g.: a RV that has a uniform distribution over 32 outcomes can be represented with $\log_2 32 = 5$ bits.
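A quick numerical check, as a minimal Python sketch (not part of the original slides; the function name entropy is just illustrative):

import math

def entropy(p):
    # Entropy in bits of a probability vector p, with the convention 0 log 0 = 0
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

print(entropy([1/32] * 32))   # uniform over 32 outcomes: log2(32) = 5.0 bits
print(entropy([0.5, 0.5]))    # fair coin: 1.0 bit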
Properties of entropy
$H(X) \geq 0$
Definition: for a Bernoulli random variable, $X = 1$ w.p. $p$, $X = 0$ w.p. $1 - p$,
$H(p) = -p \log p - (1 - p) \log(1 - p)$
- Concave
- Maximized at $p = 1/2$
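For example, $H(1/2) = 1$ bit, $H(1/4) = -\frac{1}{4}\log_2\frac{1}{4} - \frac{3}{4}\log_2\frac{3}{4} \approx 0.811$ bits, and $H(0) = H(1) = 0$.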
Example: how to ask questions?
Joint entropy
Extend the notion to a pair of discrete RVs $(X, Y)$
Nothing new: can be considered as a single vector-valued RV
Useful to measure dependence of two random variables
$H(X,Y) = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x,y) \log p(x,y)$
$H(X,Y) = -E \log p(X,Y)$
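For example, if $X$ and $Y$ are two independent fair coin flips, each of the four pairs $(x,y)$ has probability $1/4$, so $H(X,Y) = \log_2 4 = 2$ bits.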
Conditional Entropy
Conditional entropy: entropy of a RV given another RV. If
$(X,Y) \sim p(x,y)$,
$H(Y|X) = \sum_{x \in \mathcal{X}} p(x)\, H(Y|X = x)$
Various ways of writing this
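In particular, expanding the definition gives the standard forms
$H(Y|X) = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x,y) \log p(y|x) = -E \log p(Y|X)$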
Chain rule for entropy
Entropy of a pair of RVs = entropy of one + conditional entropy of the other:
$H(X,Y) = H(X) + H(Y|X)$
Proof:
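One way to see this, writing $p(x,y) = p(x)\,p(y|x)$:
$H(X,Y) = -E \log p(X,Y) = -E \log p(X) - E \log p(Y|X) = H(X) + H(Y|X)$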
$H(Y|X) \neq H(X|Y)$
$H(X) - H(X|Y) = H(Y) - H(Y|X)$
Relative entropy
Measure of distance between two distributions
$D(p\|q) = \sum_{x \in \mathcal{X}} p(x) \log \frac{p(x)}{q(x)}$
Also known as Kullback-Leibler distance in statistics: expected
log-likelihood ratio
A measure of the inefficiency of assuming that the distribution is $q$ when the
true distribution is $p$
If we use distribution $q$ to construct a code, we need $H(p) + D(p\|q)$
bits on average to describe the RV
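A small numerical illustration, as a Python sketch (not from the slides; assumes $q(x) > 0$ wherever $p(x) > 0$):

import math

def kl_divergence(p, q):
    # D(p||q) in bits, with the convention 0 log 0 = 0
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]
q = [0.75, 0.25]
print(kl_divergence(p, q))   # ~0.2075 bits
print(kl_divergence(q, p))   # ~0.1887 bits: D(p||q) and D(q||p) differ in general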
Mutual information
Measure of the amount of information that one RV contains about
another RV
$I(X;Y) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x,y) \log \frac{p(x,y)}{p(x)p(y)} = D(p(x,y)\|p(x)p(y))$
Reduction in the uncertainty of one random variable due to the
knowledge of the other
Relationship between entropy and mutual information
$I(X;Y) = H(Y) - H(Y|X)$
Proof:
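Sketch: writing $\frac{p(X,Y)}{p(X)p(Y)} = \frac{p(Y|X)}{p(Y)}$,
$I(X;Y) = E \log \frac{p(Y|X)}{p(Y)} = -E \log p(Y) + E \log p(Y|X) = H(Y) - H(Y|X)$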
$I(X;Y) = H(Y) - H(Y|X)$
$H(X,Y) = H(X) + H(Y|X) \;\Rightarrow\; I(X;Y) = H(X) + H(Y) - H(X,Y)$
$I(X;X) = H(X) - H(X|X) = H(X)$
Entropy is "self-information"
Example: calculating mutual information
[Venn diagram: $H(X,Y)$ is the union of the $H(X)$ and $H(Y)$ regions; $H(X|Y)$ and $H(Y|X)$ are the non-overlapping parts, and $I(X;Y)$ is the overlap]
$I(X;Y)$ is the intersection of the information in $X$ with the information in $Y$
!" #" !#" $"
%&'("
)*+"
!"#$ !"!%$ !"&'$ !"&'$
)*+"!"!%$ !"#$ !"&'$ !"&'$
,&-./0"!"!%$ !"!%$ !"!%$ !"!%$
1.23"!"($ )$ )$ )$
*+$,-../$0123$
4+$567853$9.:$$
;<=8$57853:$
*+$>7:?=87-$@!"'A$!"(A$!"#A$!"#B$
4+$>7:?=87-$@!"(A$!"(A$!"(A$!"(B$
C@*B$D$E"($,=0;$ C@4B$D$'$,=0;$
F.8/=G.87-$380:.21+$C@*H4B$D$!!"#$,=0;A$C@4H*B$D$!&"#$,=0;$
$$
C@4H*B$I$C@*H4B$
JK0K7-$=89.:>7G.8+$L@*M$4B$D$C@*B$N$C@*H4B$D$)O&EP$,=0$ Dr. Yao Xie, ECE587, Information Theory, Duke University 16
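A Python sketch (not part of the slides) that recomputes these quantities from the joint table; the helper names are just illustrative:

import math
from fractions import Fraction as F

# Joint distribution p(x, y): rows are the four Y levels, columns are blood types A, B, AB, O
joint = [
    [F(1, 8),  F(1, 16), F(1, 32), F(1, 32)],
    [F(1, 16), F(1, 8),  F(1, 32), F(1, 32)],
    [F(1, 16), F(1, 16), F(1, 16), F(1, 16)],
    [F(1, 4),  F(0),     F(0),     F(0)],
]

def H(p):
    # Entropy in bits of a probability vector, with 0 log 0 = 0
    return -sum(float(pi) * math.log2(float(pi)) for pi in p if pi > 0)

p_y = [sum(row) for row in joint]            # marginal of Y (row sums)
p_x = [sum(col) for col in zip(*joint)]      # marginal of X (column sums)
H_xy = H([p for row in joint for p in row])  # joint entropy H(X,Y) = 27/8

print(H(p_x), H(p_y))          # 1.75 (= 7/4) and 2.0 bits
print(H_xy - H(p_y))           # H(X|Y) = 1.375 = 11/8 bits
print(H_xy - H(p_x))           # H(Y|X) = 1.625 = 13/8 bits
print(H(p_x) + H(p_y) - H_xy)  # I(X;Y) = 0.375 = 3/8 bit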
!"
!"
#$!"
#$!" Dr. Yao Xie, ECE587, Information Theory, Duke University 17
Summary
Entropy: $H(X) = -\sum_{x \in \mathcal{X}} p(x) \log_2 p(x)$, chain rule $H(X,Y) = H(X) + H(Y|X)$
Mutual information: $I(X;Y) = \sum_{x}\sum_{y} p(x,y) \log \frac{p(x,y)}{p(x)p(y)} = H(X) - H(X|Y) = H(Y) - H(Y|X)$