Introduction to Information Theory and Coding.pdf


About This Presentation

Information, information theory, entropy.


Slide Content

HARAMAYA UNIVERSITY
HARAMAYA INSTITUTE OF TECHNOLOGY
SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING

Introduction to Information Theory and Coding

Course Coordinator: Dr. Mulugeta Atlabachew (Assistant Professor), Guest Lecturer

INFORMATION
Information can be described as one or more statements or facts that are received by a human and that have some form of worth to the recipient.
A working definition of information therefore must:
1. be something, although its exact nature (substance, energy, or abstract concept) is not clear;
2. provide "new" information: a repetition of previously received messages is not informative;
3. be "true": a lie or other false or counterfactual statement is misinformation, not information;
4. be "about" something.

INFORMATION
Information can be thought of as the resolution of uncertainty.
It answers the question of "what an entity is" and thus defines both its essence and the nature of its characteristics.
The concept of information has different meanings in different contexts.
Thus the concept becomes related to notions of constraint, communication, control, data, form, education, knowledge, meaning, understanding, mental stimuli, pattern, perception, representation, and entropy.

INFORMATION
Information is a stimulus that has meaning in some context for its receiver.
When information is entered into and stored in a computer, it is generally referred to as data.
After processing (such as formatting and printing), output data can again be perceived as information.
When information is packaged or used for understanding or doing something, it is known as knowledge.

Information
How to measure Information?
The uncertainty of an event is determined by its probability of occurrence and is inversely related to it: the less probable an event, the more uncertain it is.
More uncertain events require more information to resolve their uncertainty.
The mathematical theory of information is based on probability theory and statistics, and it quantifies information with several related measures.



Information
How to measure Information?
The choice of logarithmic base in the formulae that follow determines the unit of information entropy that is used.
The most common unit of information is the bit, based on the binary logarithm.
Other units include the nat, based on the natural logarithm (base e), and the hartley, based on the common (base-10) logarithm.
The bit is the typical unit of information: it is "that which reduces uncertainty by half."


Why a logarithm to measure information?
We want our information measure I(p) to have several properties:
1) Information is a non-negative quantity: I(p) ≥ 0.
2) If an event has probability 1, we get no information from the occurrence of the event: I(1) = 0.
3) If two independent events with probabilities p1 and p2 both occur, then the information we get from observing them is the sum of the two informations: I(p1·p2) = I(p1) + I(p2).
4) We want our information measure to be a continuous (and, in fact, monotonic) function of the probability: slight changes in probability should result in slight changes in information.

Why a logarithm to measure information?
We can therefore derive the following: any measure satisfying properties 1-4 must have the form

I(p) = -C log(p) = C log(1/p)

for some positive constant C, which amounts to choosing the base of the logarithm.


Why a logarithm to measure information?
From this, we can derive the nice relationship between the logarithm and the self-information content of an outcome x with probability p(x):

I(x) = log2( 1 / p(x) ) = - log2 p(x)

This formula is called the surprise function: it represents the amount of surprise, or the amount of self-information, that a particular symbol x of a distribution holds.
Intuitively, the definition can be understood as follows: we would be surprised if a rare symbol (one whose p(x) is small) is observed. Thus, the lower p(x), the higher the surprise, which is exactly what the definition above achieves.
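The surprise function translates directly into a few lines of Python. This is a minimal illustrative sketch (not part of the original slides); the function name self_information and the default base-2 logarithm are choices made here:

```python
import math

def self_information(p: float, base: float = 2.0) -> float:
    """Surprise / self-information of an outcome with probability p, in log-`base` units (base 2 -> bits)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return math.log(1.0 / p, base)

print(self_information(0.5))   # 1.0 bit: observing it halves the uncertainty
print(self_information(1/6))   # ~2.585 bits: one face of a fair die, rarer, so more surprising
print(self_information(1.0))   # 0.0 bits: a certain event carries no information
```

Note how the rarer outcome (p = 1/6) yields more bits than the coin flip (p = 1/2), matching the intuition above.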

Why a logarithm to measure information?
Thus, using different bases for the logarithm results in information measures which are just constant multiples of each other, corresponding with measurements in different units:
1. log2 units are bits (from "binary"); this is the typical unit in information theory.
2. log3 units are trits (from "trinary").
3. loge units are nats (from "natural logarithm"); we will write ln(x) for loge(x).
4. log10 units are hartleys, after an early worker in the field (Ralph Hartley).
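The claim that the units differ only by constant multiples can be checked numerically. The following sketch is illustrative only; the helper name information is an assumption made here:

```python
import math

def information(p: float, base: float) -> float:
    """Self-information of an outcome with probability p, measured in log-`base` units."""
    return math.log(1.0 / p, base)

p = 0.25
bits     = information(p, 2)        # 2.0 bits
nats     = information(p, math.e)   # ~1.386 nats
hartleys = information(p, 10)       # ~0.602 hartleys

# The ratios are constants, independent of p:
print(nats / bits)       # ln(2)    ~ 0.693
print(hartleys / bits)   # log10(2) ~ 0.301
```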

Why a logarithm to measure information?
In general, the Shannon information content (self-information content) of an outcome x is defined to be

h(x) = log2( 1 / p(x) )

and it is measured in bits.
The entropy of an ensemble X is defined to be the average Shannon information content of an outcome:

H(X) = Σ_x p(x) log2( 1 / p(x) )

Entropy
How to measure Information?
In information theory, the entropy of a random variable is the average level
of "information", "surprise", or "uncertainty" inherent in the variable's
possible outcomes.
The concept of information entropy was introduced by Claude Shannon in his
work published in 1948.
Entropy should be a measure of how "surprising" the average outcome of a
variable is.
Entropy is the average self-information content of the random variable.
For a continuous random variable, the analogous quantity is the differential entropy.

Entropy
How to measure Information?
Suppose now that we have n symbols {a1, a2, ..., an}, and some source is providing us with a stream of these symbols.
Suppose further that the source emits the symbols with probabilities {p1, p2, ..., pn}, respectively.
Assume that the symbols are emitted independently.
What is the average amount of information we get from each symbol we see in the stream from the source?

Entropy
How to measure Information?
What we really want here is a weighted average.
If we observe the symbol ai, we will get log(1/pi) information from that particular observation.
In a long run of, say, N observations, we will see (approximately) N·pi occurrences of the symbol ai.
Thus, over the N (independent) observations, we will get a total information I of

I = Σ_i N pi log(1/pi)

Entropy
How to measure Information?
Then, the average information we get per symbol observed over the N observations will be

I / N = (1/N) Σ_i N pi log(1/pi) = Σ_i pi log(1/pi)

We define pi log(1/pi) to be zero when pi = 0.
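A quick simulation supports this weighted-average argument: over a long stream, the empirical average of log2(1/pi) per observed symbol approaches Σ_i pi log2(1/pi). This is an illustrative sketch; the alphabet, probabilities, and sample size below are assumptions, not values from the slides:

```python
import math
import random

random.seed(0)
symbols = ["a1", "a2", "a3", "a4"]
probs   = [0.5, 0.25, 0.125, 0.125]

# Weighted average derived above: sum_i p_i * log2(1 / p_i)
analytic = sum(p * math.log2(1 / p) for p in probs)

# Empirical average information per symbol over a long run of N independent observations
N = 100_000
stream = random.choices(symbols, weights=probs, k=N)
info_per_symbol = {s: math.log2(1 / p) for s, p in zip(symbols, probs)}
empirical = sum(info_per_symbol[s] for s in stream) / N

print(f"analytic  = {analytic:.4f} bits/symbol")   # 1.7500
print(f"empirical = {empirical:.4f} bits/symbol")  # close to 1.75 for large N
```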

Entropy
To generalize, let us suppose that we have a set of probabilities (a probability distribution) P = {p1, p2, ..., pn}.
We define the entropy of the distribution P by:

H(P) = Σ_i pi log(1/pi) = - Σ_i pi log(pi)

If we have a continuous rather than discrete probability distribution P(x), the (differential) entropy is:

H(P) = - ∫ P(x) log P(x) dx
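The discrete definition can be written as a small Python function. This is a sketch for illustration (the function name entropy and the tolerance check are our own choices), not code from the slides:

```python
import math
from typing import Sequence

def entropy(probs: Sequence[float], base: float = 2.0) -> float:
    """H(P) = sum_i p_i log(1/p_i), with the convention that 0 * log(1/0) = 0."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * math.log(1.0 / p, base) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # 1.0 bit  (fair coin)
print(entropy([1.0, 0.0]))   # 0.0 bits (no uncertainty)
print(entropy([1/6] * 6))    # ~2.585 bits (fair die)
```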


Entropy
Another way to think about entropy is in terms of expected value.
Given a discrete probability distribution P = {p1, p2, ..., pn} with pi ≥ 0 and Σ_i pi = 1, or a continuous distribution P(x) with P(x) ≥ 0 and ∫ P(x) dx = 1,
we can define the expected value of an associated discrete set F = {f1, f2, ..., fn} or function F(x) by:

E(F) = Σ_i fi pi    or    E(F) = ∫ F(x) P(x) dx

From this we can conclude that

H(P) = E( log(1/p) )

that is, the entropy is the expected value of the self-information log(1/pi).



Entropy
The entropy of a probability distribution is just the expected value of the information of the distribution.
Intuitively, the greater the expected surprise (i.e. the entropy) of a distribution, the harder the distribution is to represent.
Therefore, entropy represents the expected value of the surprise that a distribution holds.

Information Theory
Information theory is the scientific study of the
quantification, storage, and communication of information.
The field was fundamentally established by the works of
Harry Nyquist and Ralph Hartley, in the 1920s, and Claude
Shannon in the 1940s.
The field is at the intersection of probability theory,
statistics, computer science, statistical mechanics,
information engineering, and electrical engineering.

Shannon's Definition of Communication
"The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point."
Shannon's Definition of Communication
"Frequently the messages have meaning"
"... [which is] irrelevant to the engineering problem."
Shannon wanted to find a way to transmit data "reliably" through the channel at the "maximal" possible rate.
Model of a Digital Communication System
[Block diagram] Information Source -> Encoder (Coding) -> Communication Channel -> Decoder (Decoding) -> Destination
Message: e.g., English symbols.
Encoder: e.g., maps English to a 0,1 sequence.
Channel: can introduce noise or distortion.
Decoder: e.g., maps the 0,1 sequence back to English.

Information
Given a discrete random variable X, with possible outcomes x1, x2, ..., xn, which occur with probabilities p(x1), p(x2), ..., p(xn), the entropy of X is formally defined as:

H(X) = - Σ_x p(x) log2 p(x)

This function is called the entropy function.

Information
Example
Tossing a fair die:
The outcomes are 1, 2, 3, 4, 5, 6, and each occurs with probability 1/6.
The information provided by tossing the die is

H = - Σ_i p(i) log2 p(i) = - 6 × (1/6) log2(1/6) = log2 6 ≈ 2.585 bits
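As a quick check (illustrative code, not from the slides), the same number falls out of the definition:

```python
import math

# Entropy of a fair six-sided die, straight from the definition H = -sum p log2 p
H = -sum((1/6) * math.log2(1/6) for _ in range(6))
print(H)              # 2.584962500721156 bits
print(math.log2(6))   # the same value, since all six outcomes are equally likely
```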

ENTROPY-Properties
NOTATIONS
We use capital letters X and Y to name random variables, and lower-case letters x and y to refer to their respective outcomes.
These are drawn from particular sets A and B: x ∈ A and y ∈ B.
The probability of a particular outcome x is denoted p(x).

ENTROPY-Properties
JOINT ENSEMBLE
A joint ensemble XY is an ensemble whose outcomes are ordered pairs (x, y) with x ∈ A and y ∈ B.
The joint ensemble XY defines a joint probability distribution p(x, y) over all possible joint outcomes.
MARGINAL PROBABILITY
From the sum rule, we can see that the probability of X taking on a particular value x = ai is the sum of the joint probabilities of this outcome for X and all possible outcomes for Y:

P(x = ai) = Σ_j P(x = ai, y = bj)

ENTROPY-Properties
In a simplified way, this can be rewritten as

p(x) = Σ_y p(x, y)

and similarly for Y:

p(y) = Σ_x p(x, y)
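A short Python sketch shows the marginalization. The joint table below is an assumed example (not from the slides), stored with (x, y) pairs as dictionary keys:

```python
# Assumed joint distribution p(x, y) over two small alphabets
joint = {
    ("x1", "y1"): 0.20, ("x1", "y2"): 0.10,
    ("x2", "y1"): 0.25, ("x2", "y2"): 0.45,
}

def marginal_x(joint):
    """p(x) = sum over y of p(x, y)."""
    px = {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
    return px

def marginal_y(joint):
    """p(y) = sum over x of p(x, y)."""
    py = {}
    for (x, y), p in joint.items():
        py[y] = py.get(y, 0.0) + p
    return py

print(marginal_x(joint))  # x1 -> 0.30, x2 -> 0.70 (up to float rounding)
print(marginal_y(joint))  # y1 -> 0.45, y2 -> 0.55 (up to float rounding)
```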


ENTROPY-Properties
Conditional probability
From the product rule, we can easily see that the conditional probability of x = ai given y = bj is

P(x = ai | y = bj) = P(x = ai, y = bj) / P(y = bj)

In a simplified way:

p(x | y) = p(x, y) / p(y)

Similarly:

p(y | x) = p(x, y) / p(x)
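Continuing the same assumed joint table, the conditional distribution follows directly from the product rule (again an illustrative sketch, not code from the slides):

```python
def conditional_x_given_y(joint, y):
    """p(x | y) = p(x, y) / p(y) for one fixed outcome y."""
    p_y = sum(p for (_, yy), p in joint.items() if yy == y)
    return {x: p / p_y for (x, yy), p in joint.items() if yy == y}

joint = {
    ("x1", "y1"): 0.20, ("x1", "y2"): 0.10,
    ("x2", "y1"): 0.25, ("x2", "y2"): 0.45,
}
print(conditional_x_given_y(joint, "y1"))  # x1 -> ~0.444, x2 -> ~0.556; the values sum to 1
```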

ENTROPY-Properties
JOINT ENTROPY OF ENSEMBLES
The joint entropy of the ensembles X and Y is

H(X, Y) = - Σ_x Σ_y p(x, y) log2 p(x, y)

CONDITIONAL ENTROPY
The conditional entropy of an ensemble X, given that y = bj, measures the uncertainty remaining about the random variable X after specifying that the random variable Y has taken on the particular value y = bj.
It is defined naturally as the entropy of the probability distribution p(x | y = bj):

H(X | y = bj) = - Σ_i p(x = ai | y = bj) log2 p(x = ai | y = bj)

ENTROPY-Properties
CONDITIONAL ENTROPY
The conditional entropy of an ensemble X, given an ensemble Y, is given by

H(X | Y) = Σ_j p(y = bj) H(X | y = bj) = - Σ_i Σ_j p(x = ai, y = bj) log2 p(x = ai | y = bj)

CHAIN RULE FOR ENTROPY
The joint entropy, conditional entropy, and marginal entropy of two ensembles X and Y are related by:

H(X, Y) = H(X) + H(Y | X) = H(Y) + H(X | Y)
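The chain rule is easy to verify numerically. The sketch below uses the same assumed joint table as the earlier examples (illustrative values only):

```python
import math

# Assumed joint distribution p(x, y); the numbers are illustrative, not from the slides
joint = {
    ("x1", "y1"): 0.20, ("x1", "y2"): 0.10,
    ("x2", "y1"): 0.25, ("x2", "y2"): 0.45,
}

def H(probs):
    """Entropy in bits of an iterable of probabilities."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# Marginals p(x) and p(y)
p_x, p_y = {}, {}
for (x, y), p in joint.items():
    p_x[x] = p_x.get(x, 0.0) + p
    p_y[y] = p_y.get(y, 0.0) + p

H_XY = H(joint.values())   # joint entropy H(X, Y)
H_X = H(p_x.values())

# Conditional entropy H(Y | X) = sum_{x,y} p(x, y) log2( p(x) / p(x, y) )
H_Y_given_X = sum(p * math.log2(p_x[x] / p) for (x, y), p in joint.items())

print(round(H_XY, 6))                # ~1.815 bits
print(round(H_X + H_Y_given_X, 6))   # the same value, as the chain rule requires
```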

ENTROPY-Properties
It should seem natural and intuitive that the joint entropy of a pair of random variables is the entropy of one plus the conditional entropy of the other (the uncertainty that the second variable adds once its dependence on the first has been discounted by conditionalizing on it).

If we have three random variables X, Y, Z, conditioning the joint distribution of any two of them on the third is also expressed by a chain rule:

H(X, Y | Z) = H(X | Z) + H(Y | X, Z)

ENTROPY-Properties
INDEPENDENCE BOUND ON ENTROPY
A consequence of the chain rule for entropy is that if we have many different random variables X1, X2, ..., Xn, then the sum of all their individual entropies is an upper bound on their joint entropy:

H(X1, X2, ..., Xn) ≤ Σ_i H(Xi)

Their joint entropy reaches this upper bound only if all of the random variables are independent.
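With the same assumed joint table as before, the bound can be checked directly; the inequality is strict here because X and Y are not independent (illustrative sketch only):

```python
import math

def H(probs):
    """Entropy in bits of an iterable of probabilities."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# Assumed joint distribution and its marginals (same illustrative numbers as before)
joint = {("x1", "y1"): 0.20, ("x1", "y2"): 0.10,
         ("x2", "y1"): 0.25, ("x2", "y2"): 0.45}
p_x = {"x1": 0.30, "x2": 0.70}
p_y = {"y1": 0.45, "y2": 0.55}

print(H(joint.values()))                  # H(X,Y)      ~ 1.815 bits
print(H(p_x.values()) + H(p_y.values()))  # H(X) + H(Y) ~ 1.874 bits >= H(X,Y)
```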

ENTROPY-Properties
MUTUAL INFORMATION
Mutual information is a quantity that measures the relationship between two random variables that are sampled simultaneously.
In particular, it measures how much information is communicated, on average, in one random variable about another.
Intuitively, one might ask: how much does one random variable tell me about another?

I(X; Y) = Σ_x Σ_y p(x, y) log2 [ p(x, y) / (p(x) p(y)) ]

In this definition, p(x) and p(y) are the marginal distributions of X and Y obtained through the marginalization process.


ENTROPY-Properties
MUTUAL INFORMATION
Note that if X and Y are independent random variables, the numerator inside the logarithm equals the denominator; the log term then vanishes, and the mutual information equals zero, as one should expect.
If the two random variables are perfectly correlated, their mutual information is the entropy of either one alone.
The mutual information of a random variable with itself is just its entropy, also called its self-information.

ENTROPY-Properties
MUTUAL INFORMATION
In a Venn-diagram view, the mutual information I(X; Y) is the intersection between H(X) and H(Y).
These properties are reflected in three equivalent definitions of the mutual information between X and Y:

I(X; Y) = H(X) - H(X | Y)
I(X; Y) = H(Y) - H(Y | X)
I(X; Y) = H(X) + H(Y) - H(X, Y)
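All three forms give the same number, which can be confirmed with the assumed joint table used in the earlier sketches (illustrative values only, not from the slides):

```python
import math

def H(probs):
    """Entropy in bits of an iterable of probabilities."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

joint = {("x1", "y1"): 0.20, ("x1", "y2"): 0.10,
         ("x2", "y1"): 0.25, ("x2", "y2"): 0.45}
p_x = {"x1": 0.30, "x2": 0.70}
p_y = {"y1": 0.45, "y2": 0.55}

# Definitional form: I(X;Y) = sum_{x,y} p(x,y) log2[ p(x,y) / (p(x) p(y)) ]
I_def = sum(p * math.log2(p / (p_x[x] * p_y[y])) for (x, y), p in joint.items())

H_X, H_Y, H_XY = H(p_x.values()), H(p_y.values()), H(joint.values())
H_X_given_Y = H_XY - H_Y   # chain rule
H_Y_given_X = H_XY - H_X

print(round(I_def, 6))               # ~0.059 bits
print(round(H_X - H_X_given_Y, 6))   # same
print(round(H_Y - H_Y_given_X, 6))   # same
print(round(H_X + H_Y - H_XY, 6))    # same
```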

Information
Assignment-1
Using probability theory, show that there is only one way to measure information, namely in terms of a number of bits.



Shannon's 1st Source Coding Theorem
Shannon showed that:
"To reliably store the information generated by some random source X, you need, on average, no more and no less than H(X) bits for each outcome."


Shannon's 1st Source Coding Theorem
If I toss a die 1,000,000 times and record the value from each trial:
1, 3, 4, 6, 2, 5, 2, 4, 5, 2, 4, 5, 6, 1, ...
In principle, I need 3 bits to store each outcome, since 3 bits can represent 8 distinct values and therefore cover the outcomes 1-6. So I need 3,000,000 bits to store the information.
Using an ASCII representation, a computer needs 8 bits = 1 byte to store each outcome, so the resulting file has size 8,000,000 bits.

Shannon's 1st Source Coding Theorem
You only need, on average, log2 6 ≈ 2.585 bits to store each outcome.
So the file can be compressed to a size of 2.585 × 1,000,000 = 2,585,000 bits.
The optimal compression ratio (relative to the 8,000,000-bit ASCII file) is therefore:

2,585,000 / 8,000,000 = 0.3231 = 32.31%
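A tiny Python calculation (illustrative only) reproduces these figures:

```python
import math

N = 1_000_000                 # number of recorded die tosses
H = math.log2(6)              # entropy per outcome, ~2.585 bits

ascii_bits      = 8 * N       # 8,000,000 bits with one ASCII byte per outcome
compressed_bits = H * N       # ~2,585,000 bits at the entropy limit

print(round(compressed_bits))          # ~2,585,000
print(compressed_bits / ascii_bits)    # ~0.3231, i.e. an optimal compression ratio of ~32.31%
```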
