The presentation gives a basic insight into Information Theory, entropies, various binary channels, and error conditions. It explains principles, derivations, and problems in an easy, detailed manner, with examples.
Size: 937.8 KB
Language: en
Added: Feb 02, 2016
Slides: 77 pages
Slide Content
INFORMATION THEORY
INFORMATION
* It is a quantitative measure of information.
* The most unexpected events give the maximum information.
Relation between information and its probability:
* Information is inversely proportional to its probability of occurrence.
* Information is a continuous function of its probability.
* The total information of two or more independent messages is the sum of the
individual informations.
I(x) = log (1 / p(x)) = -log p(x)
p(x) = probability of occurrence of x
I(x) = information in x
MARGINAL ENTROPY: Average information
* Source S delivers messages X = {x1, x2, x3, ..., xm}
with probabilities p{x} = {p1, p2, p3, ..., pm}.
* A total of N messages are delivered.
* x1 occurs Np1 times, x2 occurs Np2 times, and so on.
* A single occurrence of xi conveys information = -log pi.
* Npi occurrences convey information = -Npi log pi.
* Total information = -Np1 log p1 - Np2 log p2 ... - Npm log pm.
* Condition: complete probability scheme -- Σ pi = 1.
* H(X) = Total / N = -Σ pi log pi  -- bits per symbol or bits per message.
* Information rate R = symbol rate rs * H(X).
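A minimal sketch (assuming base-2 logarithms and a memoryless source) of the entropy and information-rate formulas above; the symbol rate rs is a parameter you supply.

```python
import math

def entropy(probs):
    """H(X) = -sum(p * log2(p)) over symbols with p > 0, in bits per symbol."""
    assert abs(sum(probs) - 1.0) < 1e-9, "needs a complete probability scheme"
    return -sum(p * math.log2(p) for p in probs if p > 0)

def information_rate(probs, symbol_rate):
    """R = rs * H(X), in bits per second."""
    return symbol_rate * entropy(probs)

print(entropy([0.25, 0.25, 0.25, 0.25]))                  # 2.0 bits/symbol (equiprobable)
print(information_rate([0.25, 0.25, 0.25, 0.25], 1000))   # 2000 bits/s at rs = 1000 symbols/s
```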
PRACTICE
If all p’s except one are zero, H(x) =0
If m symbols have all p’s equal, H(x) = log m
Entropy of a binary source:
Symbols are "0" and "1" with probabilities p and (1-p).
H(X) = -p log p - (1-p) log (1-p), written H(p)
EXAMPLE
X = {x1, x2, x3, x4} with probabilities
p = {1/2, 1/4, 1/8, 1/8}.
a. Check whether X is a complete probability scheme.
b. Find the entropy.
c. Find the entropy if all messages are equiprobable.
d. Find the rate of information if the source emits one symbol every
millisecond.
a. Yes.  b. 1.75 bits per symbol.  c. 2 bits per symbol.  d. 1750 bits per second.
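A quick numerical check of parts (a)-(d) above, assuming base-2 logarithms and a symbol rate of 1000 symbols/s (one symbol per millisecond).

```python
import math

p = [1/2, 1/4, 1/8, 1/8]
print(abs(sum(p) - 1.0) < 1e-12)              # (a) complete probability scheme
H = -sum(pi * math.log2(pi) for pi in p)      # (b) 1.75 bits per symbol
H_eq = math.log2(len(p))                      # (c) 2 bits per symbol if equiprobable
R = 1000 * H                                  # (d) 1750 bits/s at 1 symbol per ms
print(H, H_eq, R)
```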
1. A discrete source emits one of 5 symbols once every millisecond.
The symbol probabilities are 1/2, 1/4, 1/8, 1/16, 1/16 respectively. Find the
information rate.
H(X) = 1.875 bits/symbol, R = 1875 bits/second
2. In the Morse code, a dash is represented by a current pulse of 3 units duration
and a dot by 1 unit duration. The probability of occurrence of a dash is 1/3rd
of the probability of occurrence of a dot.
a. Calculate the information content of a single dot and of a dash.
b. Calculate the average information in the dash-dot code.
c. Assuming the dot and the pause between symbols are 1 ms each, find the average
rate of information.
a. p(dot) = 3/4, p(dash) = 1/4
   I(dot) = -log 3/4 = 0.415 bits, I(dash) = -log 1/4 = 2 bits
b. H = 3/4 * 0.415 + 1/4 * 2 = 0.3113 + 0.5 = 0.8113 bits per symbol
c. Average time per symbol:
* For every dash, 3 dots occur.
* Each symbol is succeeded by a pause.
* Hence one set is: dot pause dot pause dot pause dash pause.
* Total 10 units, i.e. 10 ms for 4 symbols.
* Average time per symbol = 2.5 ms.
* Symbol rate = 400 symbols/second.
* R = 400 * 0.8113 = 324.52 bits/s
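A short arithmetic check of the Morse-code example, assuming base-2 logarithms and the 2.5 ms average symbol time derived above.

```python
import math

p_dot, p_dash = 3/4, 1/4
I_dot = -math.log2(p_dot)              # a. ~0.415 bits
I_dash = -math.log2(p_dash)            # a. 2 bits
H = p_dot * I_dot + p_dash * I_dash    # b. ~0.8113 bits per symbol
symbol_rate = 1 / 2.5e-3               # c. 400 symbols/s (2.5 ms per symbol)
R = symbol_rate * H                    # c. ~324.5 bits/s
print(I_dot, I_dash, H, R)
```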
Show that the entropy of a source with equiprobable symbols
is always maximum.
* H(X) - log m = Σ pi log(1/pi) - log m
* = Σ pi log(1/pi) - Σ pi log m
* = Σ pi log(1/(pi * m))
* We have ln a ≤ (a - 1) -- property of logarithms.
* H(X) - log m ≤ Σ pi [(1/(pi * m)) - 1] log2(e)
* = [Σ (1/m) - Σ pi] log2(e) = [1 - 1] log2(e) = 0
* Hence H(X) ≤ log m, with equality when all pi = 1/m.
JOINT ENTROPY
* Two sources of information X and Y give messages x1, x2, x3, ..., xm
and y1, y2, y3, ..., yn respectively.
* They can be the inputs and outputs of a communication system.
* The joint event of X and Y is considered.
* It should have a complete probability scheme, i.e. the sum over all
possible combinations of the joint event of X and Y should be 1:
  Σi Σj p(xi, yj) = 1
* The entropy is calculated in the same way as the marginal entropy.
* Information delivered when one pair (xi, yj) occurs once is -log p(xi, yj).
* The number of times this can happen is Nij out of a total of N.
* Information for the Nij occurrences of this particular combination is
  -Nij log p(xi, yj).
* Total information over all possible combinations of i and j is
  -Σi Σj Nij log p(xi, yj)
* H(X, Y) = Total / N
JOINT ENTROPY
H(X, Y) = -Σi Σj p(xi, yj) log p(xi, yj)
ALSO,
* Σi p(xi, yj) = p(yj)
* Σj p(xi, yj) = p(xi)
* p(xi, yj) = Nij / N
P(xi, yj):
        y1      y2      y3      y4
x1      0.25    0       0       0
x2      0.1     0.3     0       0
x3      0       0.05    0.1     0
x4      0       0       0.05    0.1
x5      0       0       0.05    0
p(yj)   0.35    0.35    0.2     0.1
Find p(xi).  (See the numerical sketch below.)
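A minimal sketch that recovers the marginals p(xi) (and p(yj)) from the joint matrix as reconstructed above.

```python
P = [  # joint matrix p(xi, yj): rows x1..x5, columns y1..y4
    [0.25, 0.00, 0.00, 0.00],
    [0.10, 0.30, 0.00, 0.00],
    [0.00, 0.05, 0.10, 0.00],
    [0.00, 0.00, 0.05, 0.10],
    [0.00, 0.00, 0.05, 0.00],
]
p_x = [sum(row) for row in P]           # row sums -> p(xi)
p_y = [sum(col) for col in zip(*P)]     # column sums -> p(yj)
print(p_x)   # ~[0.25, 0.4, 0.15, 0.15, 0.05]
print(p_y)   # ~[0.35, 0.35, 0.2, 0.1]
```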
H(X) = Average information per character at the source or the entropy of
the source.
H(Y) = Average information per character at the destination or the entropy
at the receiver.
H(X,Y) = Average information per pair of transmitted and received
characters, or the average uncertainty of the communication as a
whole.
In H(X,Y), X and Y both are uncertain.
H(Y/X) = A specific xi is transmitted (known). One of the permissible yj is
received with a given probability. This conditional entropy is the measure
of information about the receiver when it is known that a particular xi is
transmitted. It represents the NOISE or ERROR in the channel.
H(X/Y) = A specific yj is received (known). It may be the result of one of
the xi with a given probability. This conditional entropy is the measure
of information about the source when it is known that a particular yj is
received. It is the measure of equivocation, i.e. how well the input content can
be recovered from the output.
CONDITIONAL ENTROPY H(X/Y)
A complete probability scheme is required.
Bayes' theorem: p(xi, yj) = p(xi) p(yj/xi) = p(yj) p(xi/yj)
For a particular yj received, it can have come ONLY from one of x1, x2, x3, ..., xm:
p(x1/yj) + p(x2/yj) + p(x3/yj) + ... + p(xm/yj) = 1
Σi p(xi/yj) = 1
Similarly
Σj p(yj/xi) = 1
CONDITIONAL ENTROPY H(X/Y)
* yj is received.
* H(X/yj) = -Σi p(xi/yj) log p(xi/yj)
* The average conditional entropy is obtained by averaging such entropies over all yj.
* Number of times H(X/yj) occurs = number of times yj occurs = Nyj.
* H(X/Y) = (1/N) [Ny1 H(X/y1) + Ny2 H(X/y2) + Ny3 H(X/y3) + ...]
* H(X/Y) = Σj p(yj) H(X/yj)
* H(X/Y) = -Σj Σi p(yj) p(xi/yj) log p(xi/yj)
* H(X/Y) = -Σi Σj p(xi, yj) log p(xi/yj)
Similarly
* H(Y/X) = -Σi Σj p(xi, yj) log p(yj/xi)
PROBLEMS - 1
1. A transmitter has an alphabet consisting of five letters x1, x2, x3, x4, x5,
and the receiver has an alphabet consisting of four letters y1, y2, y3, y4.
The following probabilities are given:
        y1      y2      y3      y4
x1      0.25    0       0       0
x2      0.1     0.3     0       0
x3      0       0.05    0.1     0
x4      0       0       0.05    0.1
x5      0       0       0.05    0
Identify the probability scheme and find all entropies.
Answers
HINT: log2 X = log10 X / log10 2 = log10 X * 3.322
* It is P(X, Y).
* H(X) = 2.066 bits/symbol
* H(Y) = 1.856 bits/symbol
* H(X, Y) = 2.665 bits/symbol
* H(X/Y) = 0.809 bits/symbol
* H(Y/X) = 0.6 bits/symbol
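A numerical check of the answers above, computing every entropy directly from the joint matrix P(X,Y) of the problem (base-2 logarithms).

```python
import math

P = [  # P(xi, yj): rows x1..x5, columns y1..y4
    [0.25, 0.00, 0.00, 0.00],
    [0.10, 0.30, 0.00, 0.00],
    [0.00, 0.05, 0.10, 0.00],
    [0.00, 0.00, 0.05, 0.10],
    [0.00, 0.00, 0.05, 0.00],
]

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

p_x = [sum(row) for row in P]
p_y = [sum(col) for col in zip(*P)]
H_X = H(p_x)                              # ~2.066 bits/symbol
H_Y = H(p_y)                              # ~1.857 bits/symbol
H_XY = H([p for row in P for p in row])   # ~2.666 bits/symbol
print(H_X, H_Y, H_XY)
print(H_XY - H_Y)                         # H(X/Y) ~0.809 bits/symbol
print(H_XY - H_X)                         # H(Y/X) ~0.600 bits/symbol
```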
RELATION AMONG ENTROPIES
H(X,Y) = H(X) + H(Y/X) = H(Y) + H(X/Y)
* H(X,Y) = -Σi Σj p(xi, yj) log p(xi, yj)
* Bayes' theorem: p(xi, yj) = p(xi) p(yj/xi) = p(yj) p(xi/yj)
* H(X,Y) = -Σi Σj p(xi, yj) log [p(xi) p(yj/xi)]
* = -Σi Σj p(xi, yj) log p(xi) - Σi Σj p(xi, yj) log p(yj/xi)
* = -Σi p(xi) log p(xi) - Σi Σj p(xi, yj) log p(yj/xi)   (since Σj p(xi, yj) = p(xi))
* H(X,Y) = H(X) + H(Y/X)
* Prove the other half.
H(X) ≥ H(X/Y)    H(Y) ≥ H(Y/X)
* H(X/Y) - H(X) = -Σi Σj p(xi, yj) log p(xi/yj) + Σi p(xi) log p(xi)
* Since Σj p(xi, yj) = p(xi):
* = -Σi Σj p(xi, yj) log p(xi/yj) + Σi Σj p(xi, yj) log p(xi)
* H(X/Y) - H(X) = Σi Σj p(xi, yj) log [p(xi) / p(xi/yj)]
* We have ln a ≤ (a - 1) -- property of logarithms.
* H(X/Y) - H(X) ≤ Σi Σj p(xi, yj) [p(xi)/p(xi/yj) - 1] log2(e)
* ≤ [Σi Σj p(xi, yj) p(xi)/p(xi/yj) - Σi Σj p(xi, yj)] log2(e)
* ≤ [Σi Σj p(yj) p(xi) - Σi Σj p(xi, yj)] log2(e)
* ≤ [1 - 1] log2(e)
* H(X/Y) - H(X) ≤ 0
Prove H(X, Y) ≤ H(X) + H(Y).
PROBLEMS - 2
» Identify the following probability scheme and find all
entropies.
+ Given P(x) =[0.3 0.4 0.3]
[Channel diagram: x1, x2, x3 mapped to y1, y2, y3 with transition probabilities 3/5 and 1/5.]
PROBLEMS - 3
Given p(x) = [ 0.6 0.3 0.1]. Find all entropies.
[Channel diagram: x1, x2, x3 mapped to y1, y2, y3 with transition probabilities p and 1-p.]
P(Y/X) =
0.8  0.1  0.1
0.1  0.8  0.1
0.1  0.1  0.8
Find all entropies.
Find all other probabilities and entropies and show that
H(X) = H(X/Y).
Try the other case too.
MUTUAL INFORMATION
It is the information gain through the channel.
* xi is transmitted with a probability of p(xi). This is the a priori entropy of X.
* The initial uncertainty of xi is -log p(xi).
* A yj is received. Is it due to the transmission of xi?
* The final uncertainty about the transmission of xi is -log p(xi/yj).
* This is the a posteriori entropy of X.
* Information gain = net reduction in the uncertainties.
* I(xi, yj) = -log p(xi) + log p(xi/yj)
* I(xi, yj) = log [p(xi/yj) / p(xi)]
* I(X, Y) = average of I(xi, yj) over all values of i and j.
* I(X, Y) = Σi Σj p(xi, yj) log [p(xi/yj) / p(xi)]
* I(X, Y) = H(X) - H(X/Y)
* I(X, Y) = H(Y) - H(Y/X) = H(X) + H(Y) - H(X,Y)
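A sketch of I(X,Y) computed from a joint probability matrix via I = H(X) + H(Y) - H(X,Y), reusing the Problem-1 matrix as a test case (base-2 logarithms).

```python
import math

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(P):
    """I(X,Y) = H(X) + H(Y) - H(X,Y) from a joint probability matrix."""
    p_x = [sum(row) for row in P]
    p_y = [sum(col) for col in zip(*P)]
    return H(p_x) + H(p_y) - H([p for row in P for p in row])

P = [[0.25, 0, 0, 0], [0.1, 0.3, 0, 0], [0, 0.05, 0.1, 0],
     [0, 0, 0.05, 0.1], [0, 0, 0.05, 0]]
print(mutual_information(P))   # ~1.257 bits = H(X) - H(X/Y) = 2.066 - 0.809
```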
* Find the mutual information for a noiseless channel,
  given p(x) = 1/4, 2/5, 3/20, 3/20, 1/20.
* Find the efficiency for the following probability matrix:
  1/4    0      0      0
  1/10   3/10   0      0
  0      1/20   1/10   0
  0      0      1/20   1/10
  0      0      1/20   0
* Given the channel matrix
  2/3  1/3
  1/3  2/3
  and p(x1) = 3/4, p(x2) = 1/4,
  find H(X), H(Y), H(X/Y), H(Y/X), I(X, Y), C and
  the efficiency.
ASSIGNMENT
4. a. For the given channel, find the mutual information I(X,Y) with
p(x) = {3/4, 1/4} and
       y1    y2    y3
x1     2/3   1/3   0
x2     0     1/6   5/6
b. If the receiver decides that x1 is the most likely
transmitted symbol when y2 is received, the other
conditional probabilities remaining the same, then calculate
I(X, Y).
HINT: b) Find p(X/Y). Change p(x1/y2) to 1 and p(x2/y2) to
0. The rest remains the same. Calculate I.
Types of Channels — Symmetric channel
Every row of the channel matrix contains the same set of
numbers p1 to pJ.
Every column of the channel matrix contains the same set of
numbers q1 to qI.
Examples:
0.2  0.2  0.3  0.3
0.3  0.3  0.2  0.2

0.2  0.3  0.5
0.3  0.5  0.2
0.5  0.2  0.3
Types of Channels — Symmetric channel
* Binary symmetric channel — a special case.
* 0 and 1 are transmitted and received.
* ε is the probability of error.
* Channel matrix:
  1-ε   ε        0 < ε < 1
  ε     1-ε
* H(Y/X) is independent of the input distribution and is solely
determined by the channel matrix:
* H(Y/X) = -Σi Σj p(xi, yj) log p(yj/xi)
Types of Channels — Symmetric channel
* H(Y/X) = -Σi Σj p(xi, yj) log p(yj/xi)
* H(Y/X) = -Σi p(xi) Σj p(yj/xi) log p(yj/xi)
* (Check with the given matrix.)
* H(Y/X) = -p(x1)[(1-ε) log(1-ε) + ε log ε] - p(x2)[(1-ε) log(1-ε) + ε log ε]
*        = -[(1-ε) log(1-ε) + ε log ε]
* Generalizing:
* H(Y/X) = -Σj p(yj/xi) log p(yj/xi)
* H(Y/X) is independent of the input distribution and is solely
determined by the channel matrix.
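A quick numerical illustration that H(Y/X) of a BSC does not depend on the input distribution; ε = 0.1 and the two input distributions are arbitrary choices.

```python
import math

def bsc_h_y_given_x(p_x, eps):
    """H(Y/X) = -sum_i p(xi) * sum_j p(yj/xi) log p(yj/xi) for a BSC."""
    h_row = -sum(q * math.log2(q) for q in (1 - eps, eps) if q > 0)
    return sum(p * h_row for p in p_x)

print(bsc_h_y_given_x([0.5, 0.5], 0.1))   # ~0.469 bits
print(bsc_h_y_given_x([0.9, 0.1], 0.1))   # same value: independent of p(X)
```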
Types of Channels - Lossless channel
+ Output uniquely specifies the input.
* H(X/Y) = 0 -- noiseless channel.
* The matrix has one and only one non-zero element in each column.
* Channels with error probability 0 as well as 1 are noiseless
channels.
* P(Y/X) = [example matrix: first row 3/5, 3/10, 1/10, 0, 0; every column contains exactly one non-zero element]
Types of Channels — Deterministic channel
Input uniquely specifies the output.
H(Y/X) = 0
Matrix has one and only one non-zero element in each
row.
Also, the sum of each row should be 1.
The elements are either 0 or 1.
P(Y/X) = [example matrix with a single 1 in each row and zeros elsewhere]
Types of Channels — Useless channel
X and Y are independent for all input distributions.
H(X/Y) = H(X)
Proof of sufficiency:
Assume the channel matrix has identical rows. The source is NOT equiprobable.
For every output yj:
p(yj) = Σi p(xi, yj)
      = Σi p(xi) p(yj/xi)
      = p(yj/xi) Σi p(xi) = p(yj/xi)    (identical rows: p(yj/xi) is the same for every i)
Also p(xi, yj) = p(xi) p(yj/xi) = p(xi) p(yj), so X and Y are independent.
Types of Channels — Useless channel
Proof of necessity:
Assume the rows of the channel matrix are NOT identical. The source is equiprobable.
Then the elements within a column are not all identical.
Let p(yj0/xi0) be the largest element in the j0-th column.
p(yj0) = Σi p(xi, yj0)
       = Σi p(xi) p(yj0/xi)
       = (1/m) Σi p(yj0/xi) < p(yj0/xi0)
Generalizing: p(yj0) ≠ p(yj0/xi0)
p(xi0, yj0) = p(xi0) p(yj0/xi0) ≠ p(xi0) p(yj0)
Hence for a uniform distribution of X, X and Y are not
independent. The channel is not useless.
DISCRETE COMMUNICATION CHANNELS
— BINARY SYMMETRIC CHANNEL (BSC)
* One of the most widely used channels.
* Symmetric means:
  p00 = p11 = q
  p01 = p10 = p
* 0 → 0 and 1 → 1 with probability q; 0 → 1 and 1 → 0 with probability p;  p + q = 1.
* q is the probability of correct reception.
* p is the probability of wrong reception.
We have to find C, the channel capacity.
p(0) = p(1) = 0.5
As p(X) is given, the matrix above is p(Y/X).
Find p(X,Y) and p(Y).
Find C = {H(Y) - H(Y/X)}max
p(X, Y) = q/2  p/2
          p/2  q/2
C = 1 + p log p + q log q
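A check of C = 1 + p log p + q log q against a direct computation of H(Y) - H(Y/X) with equiprobable inputs; p = 0.25 is an arbitrary example value.

```python
import math

def H(probs):
    return -sum(v * math.log2(v) for v in probs if v > 0)

p = 0.25
q = 1 - p
C_formula = 1 + p * math.log2(p) + q * math.log2(q)

# Direct route: with p(0) = p(1) = 0.5, C = H(Y) - H(Y/X)
joint = [[q / 2, p / 2], [p / 2, q / 2]]     # p(X,Y)
p_y = [sum(col) for col in zip(*joint)]
C_direct = H(p_y) - H([q, p])                # H(Y/X) = H(p) for every input
print(C_formula, C_direct)                   # both ~0.189 bits
```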
BINARY ERASE CHANNEL (BEC)
* ARQ is a common practice in communication.
* Assume x1 is never received as y2 and vice versa.
* The third output y3 is to be erased and an ARQ (repeat request) issued.
* C is to be found. p(X) = 0.5, 0.5
* p(Y/X) = q  0  p
           0  q  p
* p(X,Y) = q/2  0    p/2
           0    q/2  p/2
* p(Y) = q/2  q/2  p
* p(X/Y) = 1  0  1/2
           0  1  1/2
* C = H(X) - H(X/Y) = 1 - p = q
* C = q = probability of correct reception.
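A numerical check of C = q for the binary erase channel above, with p = 0.25 as an arbitrary erasure probability and equiprobable inputs.

```python
import math

def H(probs):
    return -sum(v * math.log2(v) for v in probs if v > 0)

p = 0.25                      # erasure probability
q = 1 - p
# p(X,Y): rows x1, x2; columns y1, y2, y3 (erasure)
joint = [[q / 2, 0, p / 2], [0, q / 2, p / 2]]
p_y = [sum(col) for col in zip(*joint)]
H_X = 1.0                     # equiprobable binary input
H_XY = H([v for row in joint for v in row])
print(H_X - (H_XY - H(p_y)))  # I(X;Y) = 1 - p = q = 0.75
```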
CASCADED CHANNELS
* Two BSCs are cascaded.
* x1 → y1 → z1 and x2 → y2 → z2, each stage with correct-transmission probability q and error probability p.
* The equivalent single channel is:
* p(Z/X) = q'  p'
           p'  q'
* q' = p^2 + q^2 = 1 - 2pq
* p' = 2pq
* Find C.
* C = 1 + p' log p' + q' log q'
* C = 1 + 2pq log (2pq) + (1 - 2pq) log (1 - 2pq)
* Calculate C for 3 stages in cascade.
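A sketch of the cascaded-BSC capacity: two stages give p' = 2pq, and chaining the crossover probability handles any number of stages; p = 0.25 is an arbitrary example value.

```python
import math

def bsc_capacity(p):
    q = 1 - p
    return 1 + sum(v * math.log2(v) for v in (p, q) if v > 0)

def cascade_error(p, stages):
    """Effective crossover probability of `stages` identical BSCs in cascade."""
    p_eff = p
    for _ in range(stages - 1):
        p_eff = p_eff * (1 - p) + (1 - p_eff) * p   # an odd number of flips overall
    return p_eff

p = 0.25
print(bsc_capacity(p))                        # single BSC
print(bsc_capacity(cascade_error(p, 2)))      # 2 stages: p' = 2pq = 0.375
print(bsc_capacity(cascade_error(p, 3)))      # 3 stages
```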
Repetition of signals
To improve channel efficiency and to reduce error, it is a useful
technique to repeat signals.
CASE I: Instead of 0 and 1, we send 00 and 11.
CASE II: Instead of 0 and 1, we send 000 and 111.
Case I: At the receiver, 00 and 11 are valid outputs.
The remaining combinations are erased.
Case II: At the receiver, 000 and 111 are valid outputs.
The above schemes CANNOT be used for non-symmetric
channels.
PROBLEMS - Assignment
Assume a BSC with q = 3/4 and p = 1/4 and
p(0) = p(1) = 0.5.
Calculate the improvement in channel capacity after 2
repetitions of the input.
Calculate the improvement in channel capacity after 3
repetitions of the input.
Calculate the change in channel capacity after 2 such BSCs
are cascaded.
Calculate the change in channel capacity after 3 such BSCs
are cascaded.
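One possible reading of Case I, as a sketch: each bit is sent twice through the BSC, 00 and 11 are accepted, mixed patterns are erased (ARQ), and I(X;Y) is maximised over the input probability by a coarse grid search. The values q = 3/4, p = 1/4 are taken from the assignment above.

```python
import math

def H(probs):
    return -sum(v * math.log2(v) for v in probs if v > 0)

def mutual_info(p0, channel):
    """I(X;Y) for a binary input [p0, 1-p0] and a row-stochastic channel matrix."""
    p_x = [p0, 1 - p0]
    joint = [[p_x[i] * channel[i][j] for j in range(len(channel[i]))] for i in range(2)]
    p_y = [sum(col) for col in zip(*joint)]
    return H(p_x) + H(p_y) - H([v for row in joint for v in row])

p, q = 0.25, 0.75
bsc = [[q, p], [p, q]]
# Case I: each bit sent twice; outputs decoded as 0 (00), 1 (11) or erased (01/10).
rep2 = [[q * q, p * p, 2 * p * q],
        [p * p, q * q, 2 * p * q]]
C1 = max(mutual_info(a / 1000, bsc) for a in range(1, 1000))
C2 = max(mutual_info(a / 1000, rep2) for a in range(1, 1000))
print(C1, C2, C2 / 2)   # single use vs. 2-repetition (per codeword and per channel use)
```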
Channel Capacity of non-symmetric channels
* P = p11  p12
      p21  p22
* [P][Q] = -[H], where Q1 and Q2 are auxiliary variables:
* | p11  p12 | | Q1 |   | p11 log p11 + p12 log p12 |
  | p21  p22 | | Q2 | = | p21 log p21 + p22 log p22 |
* Then
* C = log (2^Q1 + 2^Q2)
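A sketch of the auxiliary-variable method above for an n x n channel matrix, checked on a BSC, where it must reproduce C = 1 + p log p + q log q.

```python
import numpy as np

def capacity_nonsymmetric(P):
    """Auxiliary-variable method: solve [P][Q] = [h] with h_i = sum_j p_ij log2 p_ij,
    then C = log2(sum_j 2**Q_j). Assumes a square, invertible channel matrix."""
    P = np.asarray(P, dtype=float)
    h = np.array([sum(p * np.log2(p) for p in row if p > 0) for row in P])
    Q = np.linalg.solve(P, h)
    return float(np.log2(np.sum(2.0 ** Q)))

p, q = 0.25, 0.75
print(capacity_nonsymmetric([[q, p], [p, q]]))   # ~0.189 = 1 + p log p + q log q
```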
Channel Capacity of non-symmetric channels
* Find the channel capacity of
  0.4  0.6  0
  0.5  0    0.5
  0    0.6  0.4
* C = 0.58 bits
EXTENSION OF ZERO-MEMORY SOURCE
* A binary alphabet can be extended to S^2 to give 4
words: 00, 01, 10, 11.
* A binary alphabet can be extended to S^3 to give 8
words: 000, 001, 010, 011, 100, 101, 110, 111.
* For m messages with m = 2^n, an n-th extension of the
binary source is required.
H(S^n) = nH(S)
* A zero-memory source S has alphabet {x1, x2, ..., xm}
with probability of xi equal to pi.
* The n-th extension of S, called S^n, is a zero-memory source
with m' = m^n symbols {y1, y2, ..., ym'}.
* y(j) = {x_j1, x_j2, ..., x_jn}
* p(y(j)) = p(x_j1) p(x_j2) ... p(x_jn)
* Σ_Sn p(y(j)) = Σ_Sn p(x_j1) p(x_j2) ... p(x_jn)
*             = Σ_j1 Σ_j2 ... Σ_jn p(x_j1) p(x_j2) ... p(x_jn)
*             = Σ_j1 p(x_j1) Σ_j2 p(x_j2) ... Σ_jn p(x_jn)
*             = 1
Using this:
* H(S^n) = -Σ_Sn p(y(j)) log p(y(j))
*        = -Σ_Sn p(y(j)) log [p(x_j1) p(x_j2) ... p(x_jn)]
*        = -Σ_Sn p(y(j)) log p(x_j1) - Σ_Sn p(y(j)) log p(x_j2) - ... (n terms)
* Each term equals H(S), hence
H(S^n) = nH(S)
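A numerical check of H(S^n) = nH(S), building the n-th extension of a zero-memory source by multiplying symbol probabilities (itertools.product enumerates the extended alphabet).

```python
import math
from itertools import product

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def extension(probs, n):
    """Symbol probabilities of the n-th extension of a zero-memory source."""
    return [math.prod(word) for word in product(probs, repeat=n)]

p = [0.25, 0.75]              # binary source, p(0) = 0.25, p(1) = 0.75
p3 = extension(p, 3)          # third extension S^3: 8 words
print(abs(sum(p3) - 1.0) < 1e-12)   # complete probability scheme
print(H(p3), 3 * H(p))              # both ~2.434 bits
```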
* Problem 1: Find the entropy of the third extension S^3 of a
binary source with p(0) = 0.25 and p(1) = 0.75. Show
that the extended source satisfies a complete probability
scheme and that its entropy is three times the primary
entropy.
* Problem 2: Consider a source with alphabet x1, x2,
x3, x4 with probabilities p{x} = {1/2, 1/4, 1/8, 1/8}.
The source is extended to deliver messages with two
symbols. Find the new alphabet and its
probabilities. Show that the extended source satisfies a
complete probability scheme and that its entropy is twice
the primary entropy.
SHANNON’S THEOREM
Given a source of M equally likely messages.
M >> 1.
Source generating information at a rate R.
Given a channel with a channel capacity C, then:
If R ≤ C, there exists a coding technique such
that messages can be transmitted and received at
the receiver with an arbitrarily small probability of
error.
NEGATIVE STATEMENT TO SHANNON’S
THEOREM
Given a source of M equally likely messages.
M>>1.
Source generating information at a rate R.
Given a channel with a channel capacity C, then
If R > C, then probability of error is close to unity
for every possible set of M transmitter messages.
What is this CHANNEL CAPACITY?
CAPACITY OF GAUSSIAN CHANNEL
DISCRETE SIGNAL, CONTINUOUS CHANNEL
The channel capacity of a white, band-limited Gaussian channel is
C = B log2[1 + S/N] bits/s
B — channel bandwidth
S — signal power
N — noise power within the channel bandwidth
ALSO N = ηB, where
η/2 is the two-sided noise power spectral density.
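A direct use of the formula above; B = 3100 Hz and S/N = 1000 are arbitrary illustrative values.

```python
import math

def gaussian_capacity(bandwidth_hz, snr):
    """C = B log2(1 + S/N) in bits per second."""
    return bandwidth_hz * math.log2(1 + snr)

print(gaussian_capacity(3100, 1000))   # ~30,898 bits/s
```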
The formula is for a Gaussian channel.
It can be applied to any general physical system, assuming:
1. Channels encountered in physical systems are generally
approximately Gaussian.
2. The result obtained for a Gaussian channel often provides a
lower bound on the performance of a system operating over a
non-Gaussian channel.
PROOF
* The source generates messages in the form of fixed voltage levels.
* Adjacent levels are separated by λσ, where σ is the RMS value of the noise and λ is a large enough constant.
* [Level diagram: levels at ±λσ/2, ±3λσ/2, ±5λσ/2, ...]
* M possible types of messages, i.e. M levels.
* Assuming all messages are equally likely, the average signal power is
* S = (2/M) { (λσ/2)^2 + (3λσ/2)^2 + (5λσ/2)^2 + ... + ([M-1]λσ/2)^2 }
* S = (2/M) (λσ/2)^2 { 1 + 3^2 + 5^2 + ... + [M-1]^2 }
* S = (2/M) (λσ/2)^2 { M (M^2 - 1) / 6 }
* Hint: Σ n^2 = (1/6) N (N+1) (2N+1)
* S = [(M^2 - 1) / 12] (λσ)^2
* M = [1 + 12 S / (λσ)^2]^(1/2), and with N = σ^2:
* M = [1 + 12 S / (λ^2 N)]^(1/2)
* M equally likely messages:
* H = log2 M
* H = log2 [1 + 12 S / (λ^2 N)]^(1/2)    Let λ^2 = 12.
* H = (1/2) log2 [1 + S/N]
* Square pulses acquire finite rise and fall times when passed through the channel.
* Rise time tr = 0.5/B = 1/(2B).
* The signal will be detected correctly if the symbol duration T is at least equal to tr.
* Maximum symbol rate = 1/tr = 2B symbols per second, so
* C = 2B * H = B log2 [1 + S/N] bits/s.
At Shannon’s limit, B → ∞ and S/(ηB) → 0:
C∞ = (S/η) log2 e
C∞ = 1.44 (S/η)
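A small check that C = B log2(1 + S/(ηB)) approaches 1.44 S/η as B grows; S/η = 1 is chosen for illustration.

```python
import math

S_over_eta = 1.0
for B in (1, 10, 100, 1e4, 1e6):
    C = B * math.log2(1 + S_over_eta / B)   # N = eta * B
    print(B, C)                             # approaches log2(e) ~1.44 times S/eta
```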
CHANNELS WITH FINITE MEMORY
Statistically independent sequence: the occurrence of
a particular symbol during a symbol interval does
NOT alter the probability of occurrence of a symbol
during any other interval.
Most practical sources are statistically
dependent.
A source emitting English text:
— After Q, the next letter is U with 100% probability.
— After a consonant, the probability of occurrence of a
vowel is higher, and vice versa.
Statistical dependence reduces the amount of
information coming out of the source.
CHANNELS WITH FINITE MEMORY
A statistically dependent source:
A source emits a symbol every interval T.
Each symbol can take any value from x1, x2, x3, ..., xm.
The positions are s1, s2, s3, ..., s(k-1), sk.
Each position can take any of the possible x values.
The probability of occurrence of xi at position sk
depends on the symbols at the previous (k-1) positions.
Such sources are called MARKOV SOURCES of order (k-1).
Conditional probability:
p(xi / s1, s2, s3, ..., s(k-1))
The behavior of a Markov source can be predicted from
its state diagram.
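A sketch of a first-order Markov source: given a transition matrix (rows = current symbol), its entropy rate is the stationary-probability-weighted average of the row entropies. The two-symbol matrix below is an arbitrary example, not taken from the slides.

```python
import numpy as np

def markov_entropy_rate(T):
    """Entropy rate of a first-order Markov source with transition matrix T
    (rows: current symbol, columns: next symbol)."""
    T = np.asarray(T, dtype=float)
    # stationary distribution: left eigenvector of T for eigenvalue 1
    vals, vecs = np.linalg.eig(T.T)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    pi = pi / pi.sum()
    row_entropy = [-sum(p * np.log2(p) for p in row if p > 0) for row in T]
    return float(np.dot(pi, row_entropy))

T = [[0.9, 0.1],   # after a "0": stay at 0 with 0.9, go to 1 with 0.1
     [0.4, 0.6]]   # after a "1"
print(markov_entropy_rate(T))   # ~0.57 bits/symbol, less than the memoryless entropy (~0.72)
```

This illustrates the earlier point that statistical dependence reduces the information coming out of the source: the entropy rate is below the entropy of the marginal symbol probabilities.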