An introduction to power laws
Colin Gillespie
November 15, 2012
Talk outline
1. Introduction to power laws
2. Distributional properties
3. Parameter inference
4. Power law generating mechanisms
(Comic: http://xkcd.com/)
Classic example: distribution of US cities
Some data sets vary over an enormous range
US towns & cities:
    Duffield (pop 52)
    New York City (pop 8 million)
The data is highly right-skewed
When the data is plotted on a logarithmic scale, it seems to follow a straight line
This observation is attributed to Zipf
[Figure: histogram of city population against number of cities, and the cumulative number of cities, both on logarithmic axes]
Distribution of world cities
[Figure: world city populations for 8 countries, log size against log rank, each point labelled by city name (New York, Mumbai, São Paulo, Delhi, ...)]
Source: http://brenocon.com/blog/2009/05/zipfs-law-and-world-city-populations/
What does it mean?
Let p(x) dx be the fraction of cities with a population between x and x + dx
If this histogram is a straight line on log-log scales, then

    \ln p(x) = -\alpha \ln x + c

where α and c are constants
Hence

    p(x) = C x^{-\alpha}

where C = e^c
Distributions of this form are said to follow a power law
The constant α is called the exponent of the power law
We typically don't care about c.
The power law distribution
Name          f(x)                                   Notes
Power law     x^{-α}                                 Pareto distribution
Exponential   e^{-λx}
Log-normal    (1/x) exp(-(ln x - μ)^2 / (2σ^2))
Power law     x^{-α}                                 Zeta distribution
Power law     x^{-α}, x = 1, ..., n                  Zipf's distribution
Yule          Γ(x) / Γ(x + α)
Poisson       λ^x / x!
Alleged power-law phenomena
The frequency of occurrence of unique words in the novel Moby Dick by
Herman Melville
The numbers of customers affected in electrical blackouts in the United
States between 1984 and 2002
The number of links to web sites found in a 1997 web crawl of about 200
million web pages
The number of hits on web pages
The number of papers scientists write
The number of citations received by papers
Annual incomes
Sales of books, music; in fact anything that can be sold
The power law distribution
The power-law distribution is

    p(x) \propto x^{-\alpha}

where α, the scaling parameter, is a constant
The scaling parameter typically lies in the range 2 < α < 3, although there are occasional exceptions
Typically, the entire process doesn't obey a power law
Instead, the power law applies only for values greater than some minimum x_min
Power law: PDF & CDF
For the continuous power law, the pdf is

    p(x) = \frac{\alpha - 1}{x_{\min}} \left(\frac{x}{x_{\min}}\right)^{-\alpha}

where α > 1 and x_min > 0.
The CDF is:

    P(x) = 1 - \left(\frac{x}{x_{\min}}\right)^{-\alpha + 1}

[Figure: PDF and CDF panels for x up to 10, with α ranging from 1.50 to 2.50]
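As a quick sanity check, here is a minimal R sketch of these two functions (the function names are my own, not from any package):

dpowerlaw <- function(x, alpha, xmin) {
  # Continuous power-law pdf, zero below xmin
  ifelse(x < xmin, 0, (alpha - 1) / xmin * (x / xmin)^(-alpha))
}
ppowerlaw <- function(x, alpha, xmin) {
  # Continuous power-law CDF
  ifelse(x < xmin, 0, 1 - (x / xmin)^(-alpha + 1))
}
curve(dpowerlaw(x, alpha = 2, xmin = 1), from = 1, to = 10, ylab = "p(x)")
curve(ppowerlaw(x, alpha = 2, xmin = 1), from = 1, to = 10, ylab = "P(x)")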
Power law: PDF & CDF
For the discrete power law, the pmf is

    p(x) = \frac{x^{-\alpha}}{\zeta(\alpha, x_{\min})}

where

    \zeta(\alpha, x_{\min}) = \sum_{n=0}^{\infty} (n + x_{\min})^{-\alpha}

is the generalised zeta function
When x_min = 1, ζ(α, 1) is the standard zeta function
[Figure: PDF and CDF panels for x up to 10, with α ranging from 1.50 to 2.50]
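A rough R sketch (mine, not from the talk): approximate the generalised zeta function by truncating the infinite sum, then use it to normalise the pmf. The truncation is crude when α is close to 1.

gen_zeta <- function(alpha, xmin, n_terms = 1e6) {
  # Truncated version of zeta(alpha, xmin) = sum_{n=0}^Inf (n + xmin)^(-alpha)
  sum((xmin + 0:(n_terms - 1))^(-alpha))
}
dpowerlaw_discrete <- function(x, alpha, xmin) {
  x^(-alpha) / gen_zeta(alpha, xmin)
}
# The pmf should sum to (roughly) one over x = xmin, xmin + 1, ...
sum(dpowerlaw_discrete(1:1e5, alpha = 2.5, xmin = 1))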
Moments
Moments:

    \langle x^m \rangle = E[X^m] = \int_{x_{\min}}^{\infty} x^m p(x)\, dx = \frac{\alpha - 1}{\alpha - 1 - m}\, x_{\min}^m

Hence, when m ≥ α - 1, we have diverging moments
So when
    α < 2, all moments are infinite
    α < 3, all second and higher-order moments are infinite
    α < 4, all third and higher-order moments are infinite
    ...
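For example, setting m = 1 (and so requiring α > 2 for convergence) gives the mean

    \langle x \rangle = \frac{\alpha - 1}{\alpha - 2}\, x_{\min}

so a power law with α = 2.5 and x_min = 1 has mean 3, even though its variance is already infinite.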
Distributional properties
For any power law with exponent α > 1, the median is defined:

    x_{1/2} = 2^{1/(\alpha - 1)} x_{\min}

If we use a power law to model the wealth distribution, then we might be interested in the fraction of wealth held by the richer half:

    \frac{\int_{x_{1/2}}^{\infty} x\, p(x)\, dx}{\int_{x_{\min}}^{\infty} x\, p(x)\, dx} = \left(\frac{x_{1/2}}{x_{\min}}\right)^{-\alpha + 2} = 2^{-(\alpha - 2)/(\alpha - 1)}

provided α > 2, so that the integrals converge
When the wealth distribution was modelled using a power law, α was estimated to be 2.1, so 2^{-0.091} ≈ 94% of the wealth is in the hands of the richer 50% of the population
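A one-line R check of that number:

alpha <- 2.1
2^(-(alpha - 2) / (alpha - 1))   # approximately 0.94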
Top-heavy distribution & the 80/20 rule
Pareto principle: aka 80/20 rule
The law of the vital few (also known as the principle of factor sparsity) states that, for many events, roughly 80% of the effects come from 20% of the causes
For example, the distribution of world GDP
Population quantile Income
Richest 20% 82.70%
Second 20% 11.75%
Third 20% 2.30%
Fourth 20% 1.85%
Poorest 20% 1.40%
Other examples are:
80% of your profits come from 20% of your customers
80% of your complaints come from 20% of your customers
80% of your profits come from 20% of the time you spend
Scale-free distributions
The power law distribution is often referred to as a scale-free distribution
A power law is the only distribution that looks the same regardless of the scale on which we view it
For any b, we have

    p(bx) = g(b)\, p(x)

That is, if we increase the scale on which we measure x by a factor of b, the shape of the distribution p(x) is unchanged, except for a multiplicative constant
The PL distribution is the only distribution with this property
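To see why a power law satisfies this, note that for p(x) = C x^{-α}

    p(bx) = C (bx)^{-\alpha} = b^{-\alpha} p(x)

so g(b) = b^{-α} and the functional form is untouched by the change of scale.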
Random numbers
For the continuous case, we can generate random numbers using the standard inversion method:

    x = x_{\min} (1 - u)^{-1/(\alpha - 1)}

where u ~ U(0, 1)
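A minimal R sketch of this inversion step (the function name is mine):

rpowerlaw <- function(n, alpha, xmin) {
  u <- runif(n)
  xmin * (1 - u)^(-1 / (alpha - 1))
}
x <- rpowerlaw(10000, alpha = 2.5, xmin = 1)
# The empirical CCDF should look roughly straight on log-log axes
plot(sort(x), 1 - (seq_along(x) - 0.5) / length(x), log = "xy",
     type = "l", xlab = "x", ylab = "P(X > x)")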
Random numbers
The discrete case is a bit more tricky
Instead, we have to invert the CDF numerically, using a doubling-up step followed by a binary search
So for a given u, we first bound the solution via:

    x2 := xmin
    repeat
        x1 := x2
        x2 := 2 * x1
    until P(x2) < 1 - u

Basically, the algorithm tests whether the solution lies in [x, 2x), starting with x = xmin
Once we have bracketed the solution, we use a binary search within [x1, x2)
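A rough R sketch of the doubling-up and binary search idea (all names are mine; the generalised zeta function is truncated, so this is approximate and slow, but it shows the algorithm):

gen_zeta <- function(alpha, xmin, n_terms = 1e5) {
  sum((xmin + 0:(n_terms - 1))^(-alpha))
}
cdf_discrete <- function(x, alpha, xmin) {
  1 - gen_zeta(alpha, x + 1) / gen_zeta(alpha, xmin)
}
rpowerlaw_discrete <- function(alpha, xmin) {
  u <- runif(1)
  if (cdf_discrete(xmin, alpha, xmin) >= u) return(xmin)
  # Doubling up: bracket the solution in (x1, x2]
  x2 <- xmin
  repeat {
    x1 <- x2
    x2 <- 2 * x1
    if (cdf_discrete(x2, alpha, xmin) >= u) break
  }
  # Binary search within the bracket
  while (x2 - x1 > 1) {
    mid <- floor((x1 + x2) / 2)
    if (cdf_discrete(mid, alpha, xmin) >= u) x2 <- mid else x1 <- mid
  }
  x2
}
replicate(5, rpowerlaw_discrete(alpha = 2.5, xmin = 1))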
Fitting power law distributions
Suppose we know x_min and wish to estimate the exponent α.
Method 1
1. Bin your data: [x_min, x_min + Δx), [x_min + Δx, x_min + 2Δx), ...
2. Plot your data on a log-log plot
3. Use least squares to estimate α
[Figure: binned CDFs on log-log axes for bin sizes 0.01, 0.1 and 1.0]
You could also use logarithmic binning, which is better (or should I say, not as bad?)
Method 2
Similar to method 1, but:
    Don't bin, just plot the data CDF
    Then use least squares to estimate α
Using linear regression is a bad idea:
    Error estimates are completely off
    It doesn't even provide a good point estimate of α
    On the bright side, you do get a good R² value
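For concreteness, here is a small sketch (mine) of what the regression approach amounts to, applied to simulated data with a known exponent of 2.5; the point of the slide is that neither the estimate nor its apparent precision should be trusted:

set.seed(1)
alpha_true <- 2.5
x <- (1 - runif(1000))^(-1 / (alpha_true - 1))   # inversion sampler, xmin = 1
x_sorted <- sort(x)
ccdf <- 1 - (seq_along(x_sorted) - 0.5) / length(x_sorted)
fit <- lm(log(ccdf) ~ log(x_sorted))
1 - coef(fit)[2]         # slope of the CCDF is -(alpha - 1), so alpha = 1 - slope
summary(fit)$r.squared   # typically very high, but misleading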
Method 3: Log-Likelihood
The log-likelihood isn't that hard to derive
Continuous:

    \ell(\alpha \mid x, x_{\min}) = n \log(\alpha - 1) - n \log(x_{\min}) - \alpha \sum_{i=1}^{n} \log \frac{x_i}{x_{\min}}
Discrete:

    \ell(\alpha \mid x, x_{\min}) = -n \log \zeta(\alpha, x_{\min}) - \alpha \sum_{i=1}^{n} \log(x_i)
                                  = -n \log\left[\zeta(\alpha) - \sum_{i=1}^{x_{\min}-1} i^{-\alpha}\right] - \alpha \sum_{i=1}^{n} \log(x_i)
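A small R sketch (mine) of the continuous log-likelihood, maximised numerically; the closed-form MLE on the next slide should agree:

loglik_continuous <- function(alpha, x, xmin) {
  n <- length(x)
  n * log(alpha - 1) - n * log(xmin) - alpha * sum(log(x / xmin))
}
set.seed(1)
x <- (1 - runif(1000))^(-1 / (2.5 - 1))   # simulated data with alpha = 2.5, xmin = 1
optimise(loglik_continuous, interval = c(1.01, 10), x = x, xmin = 1,
         maximum = TRUE)$maximum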
MLEs
Maximising the log-likelihood gives

    \hat{\alpha} = 1 + n \left[ \sum_{i=1}^{n} \ln \frac{x_i}{\hat{x}_{\min}} \right]^{-1}

An estimate of the associated error is

    \sigma = \frac{\hat{\alpha} - 1}{\sqrt{n}}

The discrete case is a bit more tricky and involves ignoring higher-order terms, to get:

    \hat{\alpha} \simeq 1 + n \left[ \sum_{i=1}^{n} \ln \frac{x_i}{\hat{x}_{\min} - 0.5} \right]^{-1}
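In R, the continuous MLE and its standard error are a couple of lines (again, a sketch rather than a package function):

alpha_mle <- function(x, xmin) {
  x <- x[x >= xmin]
  n <- length(x)
  alpha_hat <- 1 + n / sum(log(x / xmin))
  c(alpha = alpha_hat, se = (alpha_hat - 1) / sqrt(n))
}
set.seed(1)
x <- (1 - runif(1000))^(-1 / (2.5 - 1))   # alpha = 2.5, xmin = 1
alpha_mle(x, xmin = 1)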
Estimating xmin
Recall that the power-law pdf is

    p(x) = \frac{\alpha - 1}{x_{\min}} \left(\frac{x}{x_{\min}}\right)^{-\alpha}

where α > 1 and x_min > 0
x_min isn't a parameter in the usual sense; it's a cut-off in the state space
Typically, power laws are only present in the distributional tails.
So how much of the data should we discard so that our distribution fits a power law?
Estimating xmin: method 1
The most common way is to just look at the log-log plot
What could be easier!
[Figure: empirical CCDFs, 1 - P(x), on log-log axes for six data sets: blackouts, fires, flares, Moby Dick, terrorism and web links]
Estimating xmin: method 2
Use a "Bayesian approach" - the BIC:

    -2\ell + k \ln n = -2\ell + x_{\min} \ln n

Increasing x_min increases the number of parameters
Only suitable for discrete distributions
Estimating xmin: method 3
Minimise the distance between the data and the fitted model CDFs:

    D = \max_{x \ge x_{\min}} | S(x) - P(x) |

where S(x) is the CDF of the data and P(x) is the theoretical CDF (this is the Kolmogorov-Smirnov statistic)
Our estimate \hat{x}_{\min} is then the value of x_min that minimises D
Use some form of bootstrapping to get a handle on the uncertainty of \hat{x}_{\min}
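A rough sketch (mine) of that recipe for the continuous case: for each candidate x_min, fit α by MLE on the tail, record the KS distance, and keep the candidate that minimises it:

estimate_xmin <- function(x, candidates) {
  ks <- sapply(candidates, function(xmin) {
    tail_data <- sort(x[x >= xmin])
    n <- length(tail_data)
    if (n < 10) return(NA)                        # too little tail left to fit
    alpha_hat <- 1 + n / sum(log(tail_data / xmin))
    S <- seq_len(n) / n                           # empirical CDF of the tail
    P <- 1 - (tail_data / xmin)^(-alpha_hat + 1)  # fitted power-law CDF
    max(abs(S - P))
  })
  candidates[which.min(ks)]
}
set.seed(1)
# Noisy body below 1, power-law tail (alpha = 2.5) above 1
x <- c(runif(500, 0.5, 1), (1 - runif(500))^(-1 / 1.5))
estimate_xmin(x, candidates = seq(0.5, 2, by = 0.05))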
Mechanisms for generating PL distributions
Word distributions
Suppose we type randomly on a typewriter
We hit the space bar with probability q_s and a letter with probability q_l
If there are m letters in the alphabet, then

    q_l = (1 - q_s) / m

The distribution of word frequency then has the form p(x) ∝ x^{-α}
(Image: http://activerain.com/)
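A quick simulation sketch (mine) of the random-typing argument: build random "words" one keystroke at a time and look at the resulting word-frequency distribution:

set.seed(1)
qs <- 0.2                    # probability of hitting the space bar
m <- 5                       # letters in the alphabet
random_word <- function() {
  word <- character(0)
  while (runif(1) > qs) word <- c(word, sample(letters[1:m], 1))
  paste(word, collapse = "")
}
words <- replicate(20000, random_word())
words <- words[words != ""]  # drop zero-length "words"
freq <- sort(as.numeric(table(words)), decreasing = TRUE)
plot(seq_along(freq), freq, log = "xy", xlab = "rank", ylab = "frequency")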
Relationship between the α value and Zipf's principle of least effort

α value        Examples in literature                                  Least effort for
α < 1.6        Advanced schizophrenia
1.6 ≤ α < 2    Military combat texts, Wikipedia, web pages listed      Annotator
               on the open directory project
α = 2          Single-author texts                                     Equal effort levels
2 < α ≤ 2.4    Multi-author texts                                      Audience
α > 2.4        Fragmented discourse schizophrenia
Random walks
Suppose we have a 1d random walk
At each unit of time, we move ±1
[Figure: a sample random walk, position against time]
If we start at n = 0, what is the probability that the first return to 0 occurs at time t?
Random walks
With a bit of algebra, we get:

    f_{2n} = \frac{\binom{2n}{n}}{(2n - 1)\, 2^{2n}}

For large n, we get

    f_{2n} \simeq \sqrt{\frac{2}{\pi n (2n - 1)^2}}

So as n → ∞, we get

    f_{2n} \sim n^{-3/2}

So the distribution of return times follows a power law with exponent α = 3/2!
Tenuous link to phylogenetics
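A simulation sketch (mine) to back this up: sample first-return times of a ±1 walk and check the tail behaviour:

set.seed(1)
first_return <- function(max_steps = 1e4) {
  steps <- sample(c(-1L, 1L), max_steps, replace = TRUE)
  hit <- which(cumsum(steps) == 0)
  if (length(hit) == 0) NA else hit[1]
}
times <- replicate(2000, first_return())
times <- times[!is.na(times)]
# If f_t ~ t^(-3/2), the CCDF P(T > t) should fall off like t^(-1/2)
t_sorted <- sort(times)
ccdf <- 1 - (seq_along(t_sorted) - 0.5) / length(t_sorted)
plot(t_sorted, ccdf, log = "xy", xlab = "first return time", ylab = "P(T > t)")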
Phase transitions and critical phenomena
Suppose we have a simple lattice. Each square is coloured with probability p = 0.5
We can look at the clusters of coloured squares. For example, the mean cluster area, ⟨s⟩, of a randomly chosen square:
    If the square is white, then zero
    If the square is coloured, but surrounded by white, then one
    etc.
When p is small, ⟨s⟩ is independent of the lattice size
When p is large, ⟨s⟩ depends on the lattice size
Phase transitions and critical phenomena
As we increase p, the value of ⟨s⟩ also increases
For some p, ⟨s⟩ starts to increase with the lattice size
This is known as the critical value, and is p = p_c = 0.5927462...
If we calculate the distribution p(s), then when p = p_c, p(s) follows a power-law distribution
[Figure: example lattices for p = 0.3, p = 0.5927... and p = 0.9]
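A crude simulation sketch (mine, not from the talk): colour each square of a small lattice with probability p, find the clusters with a simple flood fill, and look at the cluster-size distribution near the critical value:

set.seed(1)
L <- 60
p <- 0.592746                                 # near the critical value
occupied <- matrix(runif(L^2) < p, L, L)
labels <- matrix(0L, L, L)
next_label <- 0L
for (start_i in seq_len(L)) for (start_j in seq_len(L)) {
  if (!occupied[start_i, start_j] || labels[start_i, start_j] > 0L) next
  next_label <- next_label + 1L
  stack <- list(c(start_i, start_j))          # flood fill from this square
  while (length(stack) > 0) {
    cell <- stack[[length(stack)]]
    stack[[length(stack)]] <- NULL
    i <- cell[1]; j <- cell[2]
    if (i < 1 || i > L || j < 1 || j > L) next
    if (!occupied[i, j] || labels[i, j] > 0L) next
    labels[i, j] <- next_label
    stack <- c(stack, list(c(i - 1, j), c(i + 1, j), c(i, j - 1), c(i, j + 1)))
  }
}
sizes <- as.numeric(table(labels[labels > 0]))
size_counts <- table(sizes)
plot(as.numeric(names(size_counts)), as.numeric(size_counts), log = "xy",
     xlab = "cluster size s", ylab = "count")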
Forest fire
This simple model has been used as a primitive model of forest fires
We start with an empty lattice and trees grow at random
Every so often, a forest fire strikes at random
If the forest is too connected, i.e. large p, then the forest burns down
So (it is argued) the forest size oscillates around p = p_c
This is an example of self-organised criticality
Future work
There isn't even an R package for power law estimation
Writing this talk, I have (more or less) written one
Use a Bayesian change point model to estimate x_min in a vaguely sensible way
RJMCMC to move between the power law and other heavy-tailed distributions
References
A. Clauset, C.R. Shalizi, and M.E.J. Newman. Power-law distributions in empirical data. http://arxiv.org/abs/0706.1062
M.E.J. Newman. Power laws, Pareto distributions and Zipf's law. http://arxiv.org/abs/cond-mat/0412004