Research on Social Dynamics in Wikipedia

EdChi 1,303 views 41 slides Nov 14, 2008

About This Presentation

presented at Stanford Open Source lab unconference and Recent Changes Camp 2008


Slide Content

2007-06-17 Ed H. Chi - Who writes Wikipedia? 1
Image from: http://www.flickr.com/photos/ourcommon/480538715/
Augmented Social Cognition:
Who Edits Wikipedia?
Ed H. Chi
Augmented Social Cognition Area
Palo Alto Research Center

Wikipedia

High end of the collaboration spectrum
Groups use systems to make sense of and share complex topics and materials.
Wikipedia (social status)
Slashdot (karma points)
eHow.com
Lostpedia.com

Middle of the spectrum
Systems that evolve structures that can be used to organize information.
Del.icio.us
Flickr
YouTube
Friendster

Lightweight social processes
Counting votes
–A way to increase signal-to-noise ratio
–Information faddishness
Examples:
–Digg.com
–Most bookmarked items on del.icio.us
–Estimating the weight of an ox or temperature of a room
–The true value of a stock
–PageRank or Hub / Authority algorithms
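The ox-weight example can be sketched in a few lines. This is only an illustration under stated assumptions: the guesses are simulated as independent, normally distributed errors around a hypothetical true weight, and the median is one possible way to aggregate them. None of the names or numbers below come from the slides.

```python
import random
import statistics


def simulate_guesses(true_value, n, noise_sd, seed=0):
    """Draw n independent noisy guesses around true_value (assumed Gaussian noise)."""
    rng = random.Random(seed)
    return [rng.gauss(true_value, noise_sd) for _ in range(n)]


def crowd_estimate(guesses):
    """Aggregate the guesses with the median, which is robust to a few wild outliers."""
    return statistics.median(guesses)


TRUE_WEIGHT = 1200.0  # hypothetical weight of the ox
guesses = simulate_guesses(TRUE_WEIGHT, n=800, noise_sd=100.0)

# Error of the aggregated guess versus the average error of a single guesser.
crowd_error = abs(crowd_estimate(guesses) - TRUE_WEIGHT)
individual_errors = [abs(g - TRUE_WEIGHT) for g in guesses]
typical_individual_error = statistics.mean(individual_errors)
```

When the individual errors really are independent, the aggregate lands much closer to the true value than a typical single guess, which is the signal-to-noise gain the slide refers to. Correlated guesses (information faddishness) weaken that effect.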

Layers of Models Needed
Layers, from lighter to heavier collaboration:
–Voting systems: Digg.com, PageRank
–Col. Information Structures: Del.icio.us, IBM dogear, Flickr
–Collaborative Creation: Wikipedia, Slashdot, eHow.com
Understanding of micro-economics
•of foraging [PARC]
•Personal vs. group [Huberman, Adamic]
•Wisdom of Crowd [Surowiecki]
•Information cascades [Anderson and Holt]
Understanding of conflicts and coordination
•Wikipedia coordination costs [PARC]
•Invisible Colleges [Sandstrom]
•Interference effects [Pirolli]
•Co-laboratories [Olson and Olson]
•Community networks / Col. Problem solving [Carroll]
Understanding of info and social networks
•Tag network analysis [PARC, Golder, Yahoo]
•Structural holes (info brokerage) [Burt]
•Network constraints and structure [various]
•Semantics of semiotic structures / words [IR, LSA]

Research Vision
Augmented Social Cognition
Cognition: the ability to remember, think, and reason; the faculty of knowing.
Social Cognition: the ability of a group to remember, think, and reason; the construction of knowledge structures by a group.
–(not quite the same as in the branch of psychology that studies the cognitive processes involved in social interaction, though included)
Augmented Social Cognition: Supported by systems, the enhancement of the ability of a group to remember, think, and reason; the system-supported construction of knowledge structures by a group.

The first step in solving any interesting problem is to get some paper and pencil.
John Tukey
(not a direct quote)

Increasing Coordination Cost in Wikipedia
(joint work with Niki Kittur, Bongwon Suh, Bryan Pendleton)
Published in CHI2007 conference: Aniket Kittur, Bongwon Suh, Bryan Pendleton, Ed H. Chi. He Says, She Says: Conflict and Coordination in Wikipedia. In Proc. of ACM Conference on Human Factors in Computing Systems (CHI2007), pp. 453-462, April 2007. ACM Press. San Jose, CA.

What is Wikipedia?
“Wikipedia is the best thing ever. Anyone in the world can write anything they
want about any subject, so you know you’re getting the best possible
information.”
– Steve Carell, The Office

Increasing Coordination Costs in Wikipedia
Understanding coordination costs is vital for the long-term viability of collaborative information environments
Data:
–Entire dump from July 2, 2006
–58 million revisions
–4.7 million wiki pages
–2.4 million article pages
–800 gigabytes

Less direct work
Decrease in proportion of edits to article page
Chart: edit proportion (y-axis, 0.5 to 1) by year (x-axis, 2001 to 2006); annotation: 70%

More indirect work
Increase in proportion of edits to user talk
Chart: edit proportion (y-axis, 0 to 0.2) by year (x-axis, 2001 to 2006); annotation: 8%

More indirect work
Increase in proportion of edits to user talk
Increase in proportion of edits to procedure
Chart: edit proportion (y-axis, 0 to 0.2) by year (x-axis, 2001 to 2006); annotation: 11%

More maintenance work
Increase in proportion of edits that are reverts
Chart: edit proportion (y-axis, 0 to 0.2) by year (x-axis, 2001 to 2006); annotation: 7%

More wasted work
Increase in proportion of edits that are reverts
Increase in proportion of edits reverting vandalism
Chart: % edits (marked vandalism); edit proportion (y-axis, 0 to 0.03) by year (x-axis, 2001 to 2005); annotation: 1-2%
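The slides do not say how reverts were identified. One common approach, assumed here, is to treat a revision as a revert when its full text is identical to some earlier, non-adjacent revision of the same page, and to compare content hashes instead of whole texts. The function name and the sample history are hypothetical.

```python
import hashlib


def count_reverts(revision_texts):
    """Count revisions whose text exactly matches an earlier revision of the page.

    A revision identical to the one immediately before it is treated as a
    null edit, not as a revert.
    """
    seen_digests = set()
    previous_digest = None
    reverts = 0
    for text in revision_texts:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen_digests and digest != previous_digest:
            reverts += 1
        seen_digests.add(digest)
        previous_digest = digest
    return reverts


# Hypothetical page history: the fourth revision restores the second one.
history = ["Intro.", "Intro. More detail.", "VANDALISM", "Intro. More detail.", "Intro. More detail. Sources."]
revert_proportion = count_reverts(history) / len(history)
```

Dividing the revert count by the total number of edits in a year gives the kind of edit proportion plotted above. Partial reverts, where only some of the text is restored, are missed by this exact-match rule.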

Global level
Conflict and coordination costs are growing
–Less direct work (articles)
+More indirect work (article talk, user, procedure)
+More maintenance work (reverts, vandalism)
Chart: percentage of total edits (y-axis, 60% to 100%) by year (x-axis, 2001 to 2006); series: Article, User, Article Talk, User Talk, Other, Maintenance
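The breakdown above amounts to counting edits per page type within each year and dividing by that year's total. A minimal sketch, assuming each edit has already been reduced to a (year, page type) pair; the function name and the sample data are made up for illustration and are not the dump's actual schema or numbers.

```python
from collections import Counter, defaultdict


def edit_proportions_by_year(edits):
    """Return {year: {page_type: share of that year's edits}}.

    `edits` is an iterable of (year, page_type) pairs.
    """
    counts = defaultdict(Counter)
    for year, page_type in edits:
        counts[year][page_type] += 1
    proportions = {}
    for year, per_type in counts.items():
        total = sum(per_type.values())
        proportions[year] = {ptype: n / total for ptype, n in per_type.items()}
    return proportions


# Hypothetical sample, not real Wikipedia data.
sample_edits = (
    [(2001, "article")] * 9 + [(2001, "article talk")]
    + [(2006, "article")] * 7 + [(2006, "user talk")] * 2 + [(2006, "other")]
)
shares = edit_proportions_by_year(sample_edits)
```

Here `shares[2006]["article"]` is the quantity tracked on the "Less direct work" slide, computed on toy data.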

Conflict at the article level
Conflict is growing at the global level
We have some idea about where it is
But what defines conflict at the local level?
Build a characterization model of article conflict
–Identify metrics relevant to conflict
–Automatically identify high-conflict articles

Measure of controversy
“Controversial” tag
Use # revisions tagged controversial

Page metrics
Possible metrics for identifying conflict in articles
Metric type                       Page type
Revisions (#)                     Article, talk, article/talk
Page length                       Article, talk, article/talk
Unique editors                    Article, talk, article/talk
Unique editors / revisions        Article, talk
Links from other articles         Article, talk
Links to other articles           Article, talk
Anonymous edits (#, %)            Article, talk
Administrator edits (#, %)        Article, talk
Minor edits (#, %)                Article, talk
Reverts (#, by unique editors)    Article
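Several of the count-style metrics in the table can be derived directly from a page's revision history. The sketch below is an illustration only: the `Revision` record and its field names are hypothetical, link counts and reverts are left out, and nothing here is taken from the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class Revision:
    editor: str         # user name, or IP address for anonymous edits
    is_anonymous: bool
    is_admin: bool
    is_minor: bool
    page_length: int    # length of the page after this revision


def page_metrics(revisions):
    """Compute count-style metrics for one page from its ordered revision history."""
    total = len(revisions)
    if total == 0:
        raise ValueError("page has no revisions")
    editors = {rev.editor for rev in revisions}

    def count_and_percent(flag):
        count = sum(1 for rev in revisions if getattr(rev, flag))
        return count, 100.0 * count / total

    anonymous, anonymous_pct = count_and_percent("is_anonymous")
    admin, admin_pct = count_and_percent("is_admin")
    minor, minor_pct = count_and_percent("is_minor")
    return {
        "revisions": total,
        "page_length": revisions[-1].page_length,  # length of the latest revision
        "unique_editors": len(editors),
        "unique_editors_per_revision": len(editors) / total,
        "anonymous_edits": anonymous, "anonymous_pct": anonymous_pct,
        "admin_edits": admin, "admin_pct": admin_pct,
        "minor_edits": minor, "minor_pct": minor_pct,
    }


# Hypothetical four-revision history.
history = [
    Revision("alice", False, True, False, 1200),
    Revision("10.0.0.7", True, False, False, 1350),
    Revision("alice", False, True, True, 1340),
    Revision("bob", False, False, True, 1500),
]
metrics = page_metrics(history)
```

Running the same function over an article's history and over its talk page's history gives the "article" and "talk" variants of each metric.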

Performance: Cross-validation
5x cross-validation, R^2 = 0.897
Chart: predicted controversial revisions vs. actual controversial revisions (both axes 0 to 10000)
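For readers unfamiliar with the procedure, here is a minimal sketch of 5-fold cross-validation scored with R^2. The slide does not name the model, so a single-feature least-squares line stands in for it here; that choice, the synthetic data, and all function names are assumptions made for illustration. R^2 is computed as the coefficient of determination on the held-out predictions.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cross_validated_r2(xs, ys, k=5, seed=0):
    """Predict every point from a model trained on the other folds, then score R^2."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    predictions = [0.0] * len(xs)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in indices if i not in held_out]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            predictions[i] = slope * xs[i] + intercept
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Synthetic data: a noisy linear relation between one metric and the target.
rng = random.Random(1)
feature = [float(x) for x in range(50)]
target = [3.0 * x + 5.0 + rng.gauss(0.0, 2.0) for x in feature]
r2 = cross_validated_r2(feature, target)
```

Because each point is predicted by a model that never saw it, the score reflects how well the fit generalizes, not how well it memorizes the training pages.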

Determinants of conflict
Highly weighted features of conflict model:
Revisions (talk)
Minor edits (talk)
Unique editors (talk)
Revisions (article)
Unique editors (article)
Anonymous edits (talk)
Anonymous edits (article)

Model Generalization and Validation survey
Applied model to untagged articles (100+ edits)
Sampled range of predicted conflict scores
Rated by expert Wikipedians
Significantly correlated with predicted scores
– By rank correlation, p < 0.013 (Spearman’s rho)
Validates characterization model
–Detects conflicts even for articles with no ground truth
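Spearman's rho, the rank correlation named above, is the Pearson correlation of the two variables' ranks. A minimal sketch follows; the function names are hypothetical, tied values share their average rank, and the p-value is not computed.

```python
def average_ranks(values):
    """Rank values from 1 upward; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    covariance = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return covariance / (var_a * var_b) ** 0.5


def spearman_rho(xs, ys):
    """Spearman rank correlation between predicted scores and human ratings."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical predicted conflict scores and expert ratings for five articles.
predicted = [1, 2, 3, 4, 5]
rated = [2, 1, 4, 3, 5]
rho = spearman_rho(predicted, rated)
```

Using ranks means only the ordering of articles matters, so the expert ratings and the predicted scores do not need to be on the same scale.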

Who edits Wikipedia?
% of edits made by administrators

% of edits by 10k+ editors

Word changes made by admins

Shifting user population in Wikipedia
(more and more bottom driven!)

Proportion of edits made by top editors in Wikipedia
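A proportion like this can be computed from per-editor edit counts. The sketch below assumes "top editors" means the top fraction of editors ranked by number of edits; the function name and the sample counts are hypothetical.

```python
def top_share(edit_counts, top_fraction=0.01):
    """Share of all edits made by the top `top_fraction` of editors, ranked by edit count."""
    counts = sorted(edit_counts, reverse=True)
    # Always include at least one editor, even for very small populations.
    k = max(1, int(len(counts) * top_fraction))
    return sum(counts[:k]) / sum(counts)


# Hypothetical population: one very active editor and 99 occasional ones.
counts = [1000] + [10] * 99
share_of_top_1_percent = top_share(counts, top_fraction=0.01)
```

In this toy population the single most active editor accounts for roughly half of all edits, the kind of skew a long-tailed participation curve produces.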

Long tail of participation in Wikipedia

The participation architecture is a power law

Only 60% of top 1% editors stay around month to month!
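One way to measure this kind of turnover is sketched below. It assumes "stay around" means an editor is in the top group in two consecutive months; the slide does not define the term, and the function names and sample data are hypothetical.

```python
def top_editors(edit_counts, top_fraction=0.01):
    """Return the set of editors in the top `top_fraction` by edit count for one month.

    `edit_counts` maps editor name to number of edits in that month.
    """
    ranked = sorted(edit_counts, key=edit_counts.get, reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    return set(ranked[:k])


def retention(this_month, next_month, top_fraction=0.01):
    """Fraction of this month's top editors who are also top editors next month."""
    current = top_editors(this_month, top_fraction)
    following = top_editors(next_month, top_fraction)
    return len(current & following) / len(current)


# Hypothetical two-month sample with four editors; the top half is compared.
june = {"alice": 10, "bob": 9, "carol": 1, "dave": 1}
july = {"alice": 12, "carol": 8, "bob": 2, "dave": 1}
kept = retention(june, july, top_fraction=0.5)
```

If the slide instead means that an editor simply remains active the following month, the second set would be every editor with at least one edit that month.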

Living Laboratory:
Prototyping Social Applications on the Internet
Create a Living Laboratory as a platform to develop, test, and market our innovations, and as a vehicle for creating collaborations and thought leadership.

WikiDashboard
Joint work with
Bongwon Suh, Aniket Kittur, Bryan Pendleton

Risks for Using Wikipedia
Factual accuracy
Motives of editors
Uncertain expertise
Volatility
Spotty coverage
Unproven/non-independent source
[Denning et al. 2005]

Social Dashboard
Social translucence for effective communication and collaboration
–Make socially significant information visible and salient
–Support awareness of the rules and constraints
–Accountability for actions
Wikis can be a prime candidate
–Every edit is logged and retrievable
–WikiScanner.com
–WikiRage.com
–Intellipedia
[Erickson and Kellogg 2002]

WikiDashboard
Surfacing hidden social context to users
For readers
–Any incidents in the past, e.g. a sudden burst of edits?
–Who are the editors?
–What are their motivations / points of view / expertise / topics of interest?
–Help them judge the quality/trustworthiness/usefulness of an article
For writers
–Measure expertise / contribution / reputation
–Motivate them to be more active / responsible (?)

Article Dashboard

User Dashboard

Drilling Down
List of every edit that a user made
Let readers examine each individual revision for validity, which is hard to accomplish when only provided with aggregate visual summaries.

Image from: http://www.flickr.com/photos/ourcommon/480538715/
Augmented Social Cognition:
From Social Foraging to Social Sensemaking
Research Vision: Understand how social computing systems enhance the ability of a group of people to remember, think, and reason.
Living Laboratory: Create breakthrough applications that harness collective intelligence to improve knowledge capture, transfer, and discovery.