Introduction to power laws

An introduction to power laws
Colin Gillespie
November 15, 2012

Talk outline
1. Introduction to power laws
2. Distributional properties
3. Parameter inference
4. Power law generating mechanisms
http://xkcd.com/

Classic example: distribution of US cities
Some data sets vary over an enormous range
US towns & cities:
Duffield (pop 52)
New York City (pop 8 million)
The data is highly right-skewed
When the data is plotted on a logarithmic scale, it seems to follow a straight line
This observation is attributed to Zipf
[Figure: number of cities (left) and cumulative number of cities (right) against city population, both with logarithmic population axes]

Distribution of world cities
[Figure: world city populations for 8 countries, log population against log rank; the largest cities (New York, Mumbai, São Paulo, Delhi, ...) sit at the top of the line]
http://brenocon.com/blog/2009/05/zipfs-law-and-world-city-populations/

What does it mean?
Let p(x) dx be the fraction of cities with a population between x and x + dx
If this histogram is a straight line on log-log scales, then
ln p(x) = −α ln x + c
where α and c are constants
Hence
p(x) = C x^(−α)
where C = e^c
Distributions of this form are said to follow a power law
The constant α is called the exponent of the power law
We typically don't care about c.

The power law distribution
Name         f(x)                               Notes
Power law    x^(−α)                             Pareto distribution
Exponential  e^(−λx)
Log-normal   (1/x) exp(−(ln x − µ)² / (2σ²))
Power law    x^(−α)                             Zeta distribution
Power law    x^(−α)                             x = 1, ..., n; Zipf's distribution
Yule         Γ(x)/Γ(x + α)
Poisson      λ^x / x!

Alleged power-law phenomena
The frequency of occurrence of unique words in the novel Moby Dick by Herman Melville
The numbers of customers affected in electrical blackouts in the United States between 1984 and 2002
The number of links to web sites found in a 1997 web crawl of about 200 million web pages
The number of hits on web pages
The number of papers scientists write
The number of citations received by papers
Annual incomes
Sales of books, music; in fact anything that can be sold

Zipf plots
[Figure: log-log plots of 1 − P(x) against x for six data sets: Blackouts, Fires, Flares, Moby Dick, Terrorism, Web links]

Distributional properties

The power law distribution
The power-law distribution is
p(x) ∝ x^(−α)
where α, the scaling parameter, is a constant
The scaling parameter typically lies in the range 2 < α < 3, although there are occasional exceptions
Typically, the entire process doesn't obey a power law
Instead, the power law applies only for values greater than some minimum x_min

Power law: PDF & CDF
For the continuous power law, the pdf is
p(x) = ((α − 1)/x_min) (x/x_min)^(−α)
where α > 1 and x_min > 0.
The CDF is:
P(x) = 1 − (x/x_min)^(−α+1)
[Figure: PDF and CDF for α = 1.50, 1.75, 2.00, 2.25, 2.50, with x from 0 to 10]
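These two formulas translate directly into a few lines of R. A minimal sketch (the names dplcon/pplcon simply echo R's d*/p* convention and are illustrative):

# pdf and CDF of the continuous power law, as defined above
dplcon <- function(x, xmin, alpha) (alpha - 1) / xmin * (x / xmin)^(-alpha)
pplcon <- function(x, xmin, alpha) 1 - (x / xmin)^(-alpha + 1)

curve(dplcon(x, xmin = 1, alpha = 2.5), from = 1, to = 10)  # pdf
pplcon(c(2, 5, 10), xmin = 1, alpha = 2.5)                  # CDF values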

Power law: PDF & CDF
For the discrete power law, the pmf is
p(x) = x^(−α) / ζ(α, x_min)
where
ζ(α, x_min) = ∑_{n=0}^{∞} (n + x_min)^(−α)
is the generalised zeta function
When x_min = 1, ζ(α, 1) is the standard zeta function
[Figure: PDF and CDF for α = 1.50, 1.75, 2.00, 2.25, 2.50, with x from 0 to 10]

Moments
Moments:
⟨x^m⟩ = E[X^m] = ∫_{x_min}^{∞} x^m p(x) dx = ((α − 1)/(α − 1 − m)) x_min^m
Hence, when m ≥ α − 1, we have diverging moments
So when
α < 2, all moments are infinite
α < 3, all second and higher-order moments are infinite
α < 4, all third and higher-order moments are infinite
...

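A quick numerical sanity check of the moment formula (a sketch; the α, x_min and m values are arbitrary choices):

# For m = 1, alpha = 2.5, xmin = 1: E[X] = (alpha - 1)/(alpha - 1 - m) = 1.5/0.5 = 3
alpha <- 2.5; xmin <- 1
pdf <- function(x) (alpha - 1) / xmin * (x / xmin)^(-alpha)
integrate(function(x) x * pdf(x), lower = xmin, upper = Inf)$value  # ~3
# With alpha = 1.8 (< 2) the same integral diverges, and integrate() will complain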
Distributional properties
For any power law with exponent α > 1, the median is defined:
x_1/2 = 2^{1/(α−1)} x_min
If we use a power law to model wealth distribution, then we might be interested in the fraction of wealth in the richer half:
∫_{x_1/2}^{∞} x p(x) dx / ∫_{x_min}^{∞} x p(x) dx = (x_1/2 / x_min)^(−α+2) = 2^{−(α−2)/(α−1)}
provided α > 2, so the integrals converge
When the wealth distribution was modelled using a power law, α was estimated to be 2.1, so 2^{−0.091} ≈ 94% of the wealth is in the hands of the richer 50% of the population

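The 94% figure is a one-line check (value of α from the slide):

alpha <- 2.1
2^(-(alpha - 2) / (alpha - 1))  # ~0.939, i.e. ~94% of wealth in the richer half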

Top-heavy distribution & the 80/20 rule
Pareto principle: aka the 80/20 rule
The law of the vital few, and the principle of factor sparsity, states that, for many events, roughly 80% of the effects come from 20% of the causes
For example, the distribution of world GDP:
Population quantile   Income
Richest 20%           82.70%
Second 20%            11.75%
Third 20%              2.30%
Fourth 20%             1.85%
Poorest 20%            1.40%
Other examples are:
80% of your profits come from 20% of your customers
80% of your complaints come from 20% of your customers
80% of your profits come from 20% of the time you spend

Scale-free distributions
The power law distribution is often referred to as a scale-free distribution
A power law is the only distribution that is the same regardless of the scale
For any b, we have
p(bx) = g(b) p(x)
That is, if we increase the scale by which we measure x by a factor of b, the shape of the distribution p(x) is unchanged, except for a multiplicative constant
The PL distribution is the only distribution with this property

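A three-line numerical check of this property (sketch; the α value and scale factor b = 3 are arbitrary):

alpha <- 2.5
x <- c(1, 5, 20, 100)
(3 * x)^(-alpha) / x^(-alpha)  # constant in x: every entry equals 3^(-alpha)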

Random numbers
For the continuous case, we can generate random numbers using the standard inversion method:
x = x_min (1 − u)^{−1/(α−1)}
where u ~ U(0, 1)
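In R, the inversion method is essentially a one-liner. A minimal sketch (the name rplcon follows R's r* convention and is illustrative):

# Inverse-transform sampler for the continuous power law
rplcon <- function(n, xmin, alpha) {
  u <- runif(n)
  xmin * (1 - u)^(-1 / (alpha - 1))
}
x <- rplcon(1e4, xmin = 1, alpha = 2.5)
1 + length(x) / sum(log(x / 1))  # the MLE from a later slide recovers ~2.5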

Random numbers
The discrete case is a bit more tricky
Instead, we have to solve the CDF numerically by "doubling up" and a binary search
So for a given u, we first bound the solution to the equation via:
1: x2 := xmin
2: repeat
3:   x1 := x2
4:   x2 := 2·x1
5: until P(x2) < 1 − u
Basically, the algorithm tests whether the solution lies in [x, 2x), starting with x = xmin
Once we have the region, we use a binary search

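Putting the doubling step and the binary search together gives something like the sketch below. The CDF helper approximates the zeta normalising constant by truncating the infinite sum, which is an assumption for illustration rather than the talk's implementation:

# P(X <= x) for the discrete power law, with a truncated zeta sum (assumption)
pl_cdf <- function(x, xmin, alpha, terms = 1e5) {
  sum(seq(xmin, x)^(-alpha)) / sum(seq(xmin, xmin + terms)^(-alpha))
}

rpldis1 <- function(xmin, alpha) {
  u <- runif(1)
  if (pl_cdf(xmin, xmin, alpha) >= u) return(xmin)
  # Doubling: find an interval [x1, x2] with P(x1) < u <= P(x2); this is
  # equivalent (up to a discrete off-by-one) to the slide's stopping test
  # P(x2) < 1 - u, reading P there as the complementary CDF
  x2 <- xmin
  repeat {
    x1 <- x2
    x2 <- 2 * x1
    if (pl_cdf(x2, xmin, alpha) >= u) break
  }
  # Binary search within [x1, x2]
  while (x2 - x1 > 1) {
    mid <- (x1 + x2) %/% 2
    if (pl_cdf(mid, xmin, alpha) < u) x1 <- mid else x2 <- mid
  }
  x2
}

samples <- replicate(100, rpldis1(xmin = 1, alpha = 2.5))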

Fitting power law distributions
Suppose we know x_min and wish to estimate the exponent α.

Method 1
1. Bin your data: [x_min, x_min + ∆x), [x_min + ∆x, x_min + 2∆x), ...
2. Plot your data on a log-log plot
3. Use least squares to estimate α
[Figure: binned CDF estimates on log-log axes for bin sizes 0.01, 0.1 and 1.0]
You could also use logarithmic binning, which is better (or should I say, not as bad?)

Method 2
Similar to method 1, but
Don't bin, just plot the data CDF
Then use least squares to estimate α
Using linear regression is a bad idea:
Error estimates are completely off
It doesn't even provide a good point estimate of α
On the bright side, you do get a good R² value

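A small simulation makes the point. This sketch reuses the inversion sampler from the random-numbers slide and compares the regression-implied exponent with the MLE from the next slides (true α = 2.5, x_min = 1 are arbitrary choices):

set.seed(1)
x <- sort(1 * (1 - runif(1e4))^(-1 / (2.5 - 1)))  # power-law sample
ccdf <- 1 - (seq_along(x) - 0.5) / length(x)      # empirical 1 - P(x)
fit <- lm(log(ccdf) ~ log(x))
1 - coef(fit)[2]             # regression-implied alpha (slope = 1 - alpha)
1 + length(x) / sum(log(x))  # MLE; typically closer to the true 2.5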

Method 3: Log-Likelihood
The log-likelihood isn't that hard to derive
Continuous:
ℓ(α | x, x_min) = n log(α − 1) − n log(x_min) − α ∑_{i=1}^{n} log(x_i / x_min)
Discrete:
ℓ(α | x, x_min) = −n log[ζ(α, x_min)] − α ∑_{i=1}^{n} log(x_i)
                = −n log[ζ(α) − ∑_{i=1}^{x_min − 1} i^{−α}] − α ∑_{i=1}^{n} log(x_i)

MLEs
Maximising the log-likelihood gives
α̂ = 1 + n [ ∑_{i=1}^{n} ln(x_i / x_min) ]^{−1}
An estimate of the associated error is
σ = (α̂ − 1) / √n
The discrete case is a bit more tricky and involves ignoring higher-order terms, to get:
α̂ ≈ 1 + n [ ∑_{i=1}^{n} ln( x_i / (x_min − 0.5) ) ]^{−1}

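These formulas translate directly into a few lines of R. A sketch for the continuous case (the helper name is illustrative):

# Continuous-case MLE for alpha and its standard error, given a known xmin
est_alpha <- function(x, xmin) {
  x <- x[x >= xmin]   # keep only the tail the power law describes
  n <- length(x)
  alpha <- 1 + n / sum(log(x / xmin))
  list(alpha = alpha, se = (alpha - 1) / sqrt(n))
}

x <- 1 * (1 - runif(1e4))^(-1 / (2.5 - 1))  # sampler from the earlier slide
est_alpha(x, xmin = 1)                      # alpha close to the true 2.5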

Estimating xmin
Recall that the power-law pdf is
p(x) = ((α − 1)/x_min) (x/x_min)^(−α)
where α > 1 and x_min > 0
x_min isn't a parameter in the usual sense; it's a cut-off in the state space
Typically, power laws are only present in the distributional tails.
So how much of the data should we discard so our distribution fits a power law?

Estimating xmin: method 1
The most common way is just to look at the log-log plot
What could be easier!
[Figure: log-log plots of 1 − P(x) against x for the Blackouts, Fires, Flares, Moby Dick, Terrorism and Web links data sets]

Estimating xmin: method 2
Use a "Bayesian approach", the BIC:
−2ℓ + k ln n = −2ℓ + x_min ln n
Increasing x_min increases the number of parameters
Only suitable for discrete distributions

Estimating xmin: method 3
Minimise the distance between the data and the fitted model CDFs:
D = max_{x ≥ x_min} |S(x) − P(x)|
where S(x) is the CDF of the data and P(x) is the theoretical CDF (the Kolmogorov-Smirnov statistic)
Our estimate x̂_min is then the value of x_min that minimises D
Use some form of bootstrapping to get a handle on the uncertainty of x_min
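A sketch of method 3 for the continuous case: for each candidate x_min, fit α by MLE on the tail, then keep the candidate with the smallest KS distance (function name illustrative):

est_xmin <- function(x) {
  cands <- sort(unique(x))
  cands <- cands[seq_len(length(cands) - 2)]  # leave a few points in the tail
  D <- sapply(cands, function(xm) {
    tail_x <- sort(x[x >= xm])
    n <- length(tail_x)
    alpha <- 1 + n / sum(log(tail_x / xm))    # MLE from the earlier slide
    S <- seq_len(n) / n                       # empirical CDF
    P <- 1 - (tail_x / xm)^(-alpha + 1)       # fitted CDF
    max(abs(S - P))                           # KS statistic D
  })
  cands[which.min(D)]
}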

Mechanisms for generating PL distributions

Word distributions
Suppose we type randomly on a typewriter
We hit the space bar with probability q_s and a letter with probability q_l
If there are m letters in the alphabet, then
q_l = (1 − q_s)/m
The distribution of word frequency has the form p(x) ∝ x^(−α)
http://activerain.com/

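This mechanism is easy to simulate. The sketch below types random characters, splits on spaces, and plots the rank-frequency of the resulting "words" (q_s, m and the sample size are arbitrary choices):

set.seed(1)
qs <- 0.2; m <- 5
chars <- sample(c(letters[1:m], " "), 1e5, replace = TRUE,
                prob = c(rep((1 - qs) / m, m), qs))
words <- strsplit(paste(chars, collapse = ""), " +")[[1]]
freq <- sort(table(words[nchar(words) > 0]), decreasing = TRUE)
plot(seq_along(freq), as.integer(freq), log = "xy",
     xlab = "Word rank", ylab = "Frequency")  # roughly straight on log-log axes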

Relationship between the α value and Zipf's principle of least effort
α value        Examples in literature                                                       Least effort for
α < 1.6        Advanced schizophrenia
1.6 ≤ α < 2    Military combat texts, Wikipedia, web pages on the Open Directory Project   Annotator
α = 2          Single-author texts                                                          Equal effort levels
2 < α ≤ 2.4    Multi-author texts                                                           Audience
α > 2.4        Fragmented discourse schizophrenia

Random walks
Suppose we have a 1-d random walk
At each unit of time, we move ±1
[Figure: sample path of the walk, position against time for t = 0 to 30]
If we start at n = 0, what is the probability of the first return time being at time t?

Random walks
With a bit of algebra, we get:
f_{2n} = (2n choose n) / ((2n − 1) 2^{2n})
For large n, Stirling's approximation gives
f_{2n} ≈ 1 / ((2n − 1) √(πn))
So as n → ∞, we get
f_{2n} ∼ n^{−3/2}
So the distribution of return times follows a power law with exponent α = 3/2!
Tenuous link to phylogenetics

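A quick simulation of first-return times checks the n^{−3/2} tail (a sketch; the step cap and sample size are arbitrary):

set.seed(1)
first_return <- function(max_steps = 1e4) {
  pos <- 0
  for (t in seq_len(max_steps)) {
    pos <- pos + sample(c(-1L, 1L), 1)
    if (pos == 0) return(t)
  }
  NA  # no return within max_steps (the walk is recurrent, but slowly)
}
times <- replicate(5e3, first_return())
freq <- table(times)
plot(as.integer(names(freq)), as.integer(freq), log = "xy",
     xlab = "First return time t", ylab = "Frequency")  # slope near -3/2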

Phase transitions and critical phenomena
Suppose we have a simple lattice. Each square is coloured with probability p = 0.5
We can look at the clusters of coloured squares. For example, the mean cluster area, ⟨s⟩, of a randomly chosen square:
If a square is white, then zero
If a square is coloured, but surrounded by white, then one
etc.
When p is small, ⟨s⟩ is independent of the lattice size
When p is large, ⟨s⟩ depends on the lattice size

Phase transitions and critical phenomena
As we increase p, the value of ⟨s⟩ also increases
For some p, ⟨s⟩ starts to increase with the lattice size
This is known as the critical value, and is p = pc = 0.5927462...
If we calculate the distribution p(s), then when p = pc, p(s) follows a power-law distribution
[Figure: example lattices for p = 0.3, p = 0.5927... and p = 0.9]
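For intuition, here is a toy percolation sketch: colour each square with probability p, label connected clusters with a flood fill, and look at the cluster-size distribution near p_c (an illustration only; 4-neighbour connectivity and all parameters are assumptions):

set.seed(1)
n <- 100
p <- 0.5927  # near the critical value quoted above
grid <- matrix(runif(n^2) < p, n, n)
labels <- matrix(0L, n, n)
cur <- 0L
for (i in seq_len(n)) for (j in seq_len(n)) {
  if (grid[i, j] && labels[i, j] == 0L) {
    cur <- cur + 1L
    stack <- list(c(i, j))  # iterative flood fill
    while (length(stack) > 0) {
      cell <- stack[[length(stack)]]
      stack[[length(stack)]] <- NULL
      a <- cell[1]; b <- cell[2]
      if (a < 1 || a > n || b < 1 || b > n) next
      if (!grid[a, b] || labels[a, b] != 0L) next
      labels[a, b] <- cur
      stack <- c(stack, list(c(a - 1, b)), list(c(a + 1, b)),
                 list(c(a, b - 1)), list(c(a, b + 1)))
    }
  }
}
sizes <- table(labels[labels > 0L])  # cluster areas s
freq <- table(as.integer(sizes))
plot(as.integer(names(freq)), as.integer(freq), log = "xy",
     xlab = "Cluster size s", ylab = "Count")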

Forest fire
This simple model has been used as a primitive model of forest fires
We start with an empty lattice and trees grow at random
Every so often, a forest fire strikes at random
If the forest is too connected, i.e. large p, then the forest burns down
So (it is argued) the forest size oscillates around p = pc
This is an example of self-organised criticality

Future work
There isn't even an R package for power law estimation
Writing this talk, I have (more or less) written one
Use a Bayesian change point model to estimate x_min in a vaguely sensible way
RJMCMC to change between the power law and other heavy-tailed distributions
References
A. Clauset, C.R. Shalizi, and M.E.J. Newman. Power-law distributions in empirical data. http://arxiv.org/abs/0706.1062
M.E.J. Newman. Power laws, Pareto distributions and Zipf's law. http://arxiv.org/abs/cond-mat/0412004