Presented at the Stanford Open Source Lab unconference and Recent Changes Camp 2008
Size: 2.97 MB
Language: en
Added: Nov 14, 2008
Slides: 41 pages
Slide Content
2007-06-17 Ed H. Chi - Who writes Wikipedia? 1
Image from: http://www.flickr.com/photos/ourcommon/480538715/
Augmented Social Cognition:
Who Edits Wikipedia?
Ed H. Chi
Augmented Social Cognition Area
Palo Alto Research Center
Wikipedia
High end of the collaboration spectrum
Groups utilize systems to make sense of and share complex topics and materials.
Wikipedia (social status)
Slashdot (karma points)
eHow.com
Lostpedia.com
Middle of the spectrum
Systems that evolve structures that can be used to organize information.
Del.icio.us
Flickr
YouTube
Friendster
Lightweight social processes
Counting votes
–A way to increase signal-to-noise ratio
–Information faddishness
Examples:
–Digg.com
–Most bookmarked items on del.icio.us
–Estimating the weight of an ox or the temperature of a room
–The true value of a stock
–PageRank or Hub/Authority algorithms
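The vote-counting idea above can be illustrated with a small sketch. It assumes a list of independent numeric guesses and combines them with the mean and the median. The guesses are made up for illustration and are not data from the talk, and the function name `aggregate_guesses` is hypothetical.

```python
from statistics import mean, median


def aggregate_guesses(guesses):
    """Combine independent numeric guesses into simple crowd estimates."""
    if not guesses:
        raise ValueError("need at least one guess")
    return {"mean": mean(guesses), "median": median(guesses)}


# Made-up guesses of an ox's weight in pounds (illustrative only).
guesses = [1050, 1180, 1210, 1225, 1300, 1900]
estimate = aggregate_guesses(guesses)
print(estimate)
```

The median is less sensitive than the mean to a single extreme guess such as 1900, which is one reason it is often preferred for this kind of aggregation.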
Layers of Models Needed
Heavier collaboration
Voting systems
Digg.com
Collaborative Creation
Wikipedia
Collaborative Information Structures
Slashdot
eHow.com
Del.icio.us
IBM dogear
PageRank
Flickr
Understanding of micro-economics
•of foraging [PARC]
•Personal vs. group [Huberman, Adamic]
•Wisdom of Crowds [Surowiecki]
•Information cascades [Anderson and Holt]
Understanding of conflicts and coordination
•Wikipedia coordination costs [PARC]
•Invisible Colleges [Sandstrom]
•Interference effects [Pirolli]
•Co-laboratories [Olson and Olson]
•Community networks / Collaborative problem solving [Carroll]
Understanding of info and social networks
•Tag network analysis [PARC, Golder, Yahoo]
•Structural holes (info brokerage) [Burt]
•Network constraints and structure [various]
•Semantic of semiotic structures / words [IR, LSA]
Research Vision
Augmented Social Cognition
Cognition: the ability to remember, think, and reason; the faculty of knowing.
Social Cognition: the ability of a group to remember, think, and reason; the construction of knowledge structures by a group.
–(not quite the same as in the branch of psychology that studies the cognitive processes involved in social interaction, though included)
Augmented Social Cognition: Supported by systems, the enhancement of the ability of a group to remember, think, and reason; the system-supported construction of knowledge structures by a group.
The first step in solving any interesting problem is to get some paper and pencil.
John Tukey
(not a direct quote)
Increasing Coordination Cost in Wikipedia
(joint work with Niki Kittur, Bongwon Suh, Bryan Pendleton)
Published in CHI2007 conference: Aniket Kittur, Bongwon Suh, Bryan Pendleton, Ed H. Chi. He Says, She Says: Conflict and Coordination in Wikipedia. In Proc. of ACM Conference on Human Factors in Computing Systems (CHI2007), pp. 453-462, April 2007. ACM Press. San Jose, CA.
What is Wikipedia?
“Wikipedia is the best thing ever. Anyone in the world can write anything they want about any subject, so you know you’re getting the best possible information.”
– Steve Carell, The Office
Increasing Coordination Costs in Wikipedia
Understanding coordination costs is vital for the long-term viability of collaborative information environments
Data:
–Entire dump on July 2, 2006
–58 million revisions
–4.7 million wiki pages
–2.4 million article pages
–800 gigabytes
Less direct work
Decrease in proportion of edits to article pages
[Chart: proportion of edits to article pages by year, 2001-2006; y-axis: edit proportion, 0.5 to 1; callout: 70%]
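A minimal sketch of how a per-year edit proportion like the one charted above could be computed. It assumes each edit is available as a `(year, page_type)` pair. The toy log, the page-type labels, and the function name `edit_proportions_by_year` are illustrative assumptions, not the authors' pipeline or data.

```python
from collections import Counter, defaultdict


def edit_proportions_by_year(edits):
    """Return {year: {page_type: share of that year's edits}}."""
    counts = defaultdict(Counter)
    for year, page_type in edits:
        counts[year][page_type] += 1
    proportions = {}
    for year, by_type in counts.items():
        total = sum(by_type.values())
        proportions[year] = {t: n / total for t, n in by_type.items()}
    return proportions


# Toy edit log (made up): one (year, page type) pair per edit.
edits = (
    [(2001, "article")] * 3 + [(2001, "talk")]
    + [(2006, "article")] * 7
    + [(2006, "user talk"), (2006, "talk"), (2006, "procedure")]
)
shares = edit_proportions_by_year(edits)
print(shares[2006]["article"])
```

The same function covers the later slides on user talk and procedure pages, since every page type present in the log gets its own share per year.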
More indirect work
Increase in proportion of edits to user talk
[Chart: proportion of edits to user talk pages by year, 2001-2006; y-axis: edit proportion, 0 to 0.2; callout: 8%]
More indirect work
Increase in proportion of edits to user talk
Increase in proportion of edits to procedure
[Chart: proportion of edits to procedure pages by year, 2001-2006; y-axis: edit proportion, 0 to 0.2; callout: 11%]
More maintenance work
Increase in proportion of edits that are reverts
[Chart: proportion of edits that are reverts by year, 2001-2006; y-axis: edit proportion, 0 to 0.2; callout: 7%]
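The slides do not say how reverts were identified. One common heuristic, assumed here purely for illustration, treats a revision as a revert when its text is identical to an earlier revision of the same page, and compares content hashes instead of full texts. The function name `revert_flags` and the toy history are hypothetical.

```python
import hashlib


def revert_flags(revision_texts):
    """Flag each revision whose text exactly matches an earlier revision.

    Assumption: an "identity revert" restores a previous version verbatim.
    A revision identical to its immediate predecessor is not counted.
    """
    seen = set()
    flags = []
    previous = None
    for text in revision_texts:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        flags.append(digest in seen and digest != previous)
        seen.add(digest)
        previous = digest
    return flags


# Toy page history (made up): the fourth revision restores the second.
history = ["A", "A B", "spam", "A B", "A B C"]
flags = revert_flags(history)
revert_share = sum(flags) / len(flags)
print(flags, revert_share)
```

This heuristic misses partial reverts and reverts announced only in edit comments, so it should be read as a lower bound under the stated assumption.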
More wasted work
Increase in proportion of edits that are reverts
Increase in proportion of edits reverting vandalism
[Chart: % Edits (marked Vandalism) by year, 2001-2005; y-axis: edit proportion, 0 to 0.03; callout: 1-2%]
Global level
Conflict and coordination costs are growing
–Less direct work (articles)
+More indirect work (article talk, user, procedure)
+More maintenance work (reverts, vandalism)
[Chart: percentage of total edits by year, 2001-2006; y-axis from 60% to 100%; series: Article, User, Article Talk, User Talk, Other, Maintenance]
Conflict at the article level
Conflict is growing at the global level
We have some idea about where it is
But what defines conflict at the local level?
Build a characterization model of article conflict
–Identify metrics relevant to conflict
–Automatically identify high-conflict articles
Measure of controversy
“Controversial” tag
Use the number of revisions tagged controversial
Page metrics
Possible metrics for identifying conflict in articles
Metric type                      Page type
Revisions (#)                    Article, talk, article/talk
Page length                      Article, talk, article/talk
Unique editors                   Article, talk, article/talk
Unique editors / revisions       Article, talk
Links from other articles        Article, talk
Links to other articles          Article, talk
Anonymous edits (#, %)           Article, talk
Administrator edits (#, %)       Article, talk
Minor edits (#, %)               Article, talk
Reverts (#, by unique editors)   Article
Performance: Cross-validation
5x cross-validation, R² = 0.897
[Scatter plot: predicted vs. actual controversial revisions; both axes from 0 to 10000]
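To make the cross-validated R² concrete, here is a sketch under loud assumptions. The slide does not state which regression model was used, so a one-feature least-squares line stands in for it. The data are synthetic, and the helper names `fit_line` and `cross_validated_r2` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def cross_validated_r2(xs, ys, k=5):
    """Pool held-out predictions over k folds, then compute R^2."""
    n = len(xs)
    preds = [0.0] * n
    for fold in range(k):
        held_out = set(range(fold, n, k))  # every k-th item is held out
        train = [i for i in range(n) if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:
            preds[i] = a + b * xs[i]
    mean_y = sum(ys) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Synthetic data: x could be talk-page revisions, y controversial revisions.
xs = list(range(10, 210, 10))
ys = [3 * x + (5 if i % 2 == 0 else -5) for i, x in enumerate(xs)]
r2 = cross_validated_r2(xs, ys, k=5)
print(round(r2, 4))
```

Each point is predicted by a model that never saw it during fitting, so the pooled R² estimates how well the fit generalizes instead of how well it memorizes.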
Determinants of conflict
Highly weighted features of conflict model:
Revisions (talk)
Minor edits (talk)
Unique editors (talk)
Revisions (article)
Unique editors (article)
Anonymous edits (talk)
Anonymous edits (article)
Model Generalization and Validation survey
Applied model to untagged articles (100+ edits)
Sampled range of predicted conflict scores
Rated by expert Wikipedians
Significantly correlated with predicted scores
– By rank correlation, p < 0.013 (Spearman’s rho)
Validates characterization model
–Detects conflicts even for articles with no ground truth
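The rank correlation mentioned above can be sketched as follows. Spearman's rho is the Pearson correlation of the two rank vectors, with tied values sharing their average rank. The scores and ratings below are invented for illustration, and the p-value reported on the slide is not computed here.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Made-up example: model conflict scores vs. expert ratings for 5 articles.
predicted = [0.9, 0.1, 0.5, 0.7, 0.3]
expert = [5, 1, 2, 4, 3]
rho = spearman_rho(predicted, expert)
print(rho)
```

Because only ranks matter, the measure does not require the expert ratings and the predicted scores to be on the same scale.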
Who edits Wikipedia?
% of edits made by administrators
% of edits by 10k+ editors
Word changes made by admins
Shifting user population in Wikipedia
(more and more bottom driven!)
Proportion of edits made by top editors in Wikipedia
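A small sketch of the quantity named in this slide title: the share of all edits contributed by the most active fraction of editors. The per-editor counts are made up, and the function name `top_editor_share` is hypothetical.

```python
import math


def top_editor_share(edit_counts, top_fraction):
    """Share of all edits made by the most active `top_fraction` of editors."""
    if not edit_counts or not 0 < top_fraction <= 1:
        raise ValueError("need counts and a fraction in (0, 1]")
    ranked = sorted(edit_counts, reverse=True)
    k = max(1, math.ceil(top_fraction * len(ranked)))
    return sum(ranked[:k]) / sum(ranked)


# Made-up per-editor edit counts for ten editors.
counts = [500, 200, 100, 50, 50, 40, 30, 20, 5, 5]
share = top_editor_share(counts, 0.10)
print(share)
```

Tracking this share over time is one way to see whether contribution is shifting away from a small core of heavy editors.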
Long tail of participation in Wikipedia
The participation architecture is a power law
Only 60% of top 1% editors stay around month to month!
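One way to read the retention figure above is as the fraction of one month's top editors who are again among the top editors the following month. That reading is an assumption, as are the toy data and the helper names `top_editors` and `retention`.

```python
import math


def top_editors(edit_counts, top_fraction=0.01):
    """Return the set of editors in the top `top_fraction` by edit count."""
    ranked = sorted(edit_counts, key=edit_counts.get, reverse=True)
    k = max(1, math.ceil(top_fraction * len(ranked)))
    return set(ranked[:k])


def retention(prev_month, next_month, top_fraction=0.01):
    """Fraction of last month's top editors who are top editors again."""
    before = top_editors(prev_month, top_fraction)
    after = top_editors(next_month, top_fraction)
    return len(before & after) / len(before)


# Toy data (made up): editor -> edits in that month. A large fraction is
# used so the tiny example has more than one "top" editor.
january = {"a": 90, "b": 80, "c": 10, "d": 5}
february = {"a": 70, "c": 60, "b": 3, "e": 1}
rate = retention(january, february, top_fraction=0.5)
print(rate)
```

A looser reading, where a top editor counts as retained if they make any edit in the next month, would only change the definition of `after`.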
Living Laboratory:
Prototyping Social Applications on the Internet
Create a Living Laboratory as a platform to develop, test, and market our innovations, and as a vehicle for creating collaborations and thought leadership.
WikiDashboard
Joint work with
Bongwon Suh, Aniket Kittur, Bryan Pendleton
Risks for Using Wikipedia
Factual accuracy
Motives of editors
Uncertain expertise
Volatility
Spotty coverage
Unproven/non-independent source
[Denning et al. 2005]
Social Dashboard
Social translucence for effective communication and collaboration
–Make socially significant information visible and salient
–Support awareness of the rules and constraints
–Accountability for actions
Wikis can be a prime candidate
–Every edit is logged and retrievable
–WikiScanner.com
–WikiRage.com
–Intellipedia
[Erickson and Kellogg 2002]
WikiDashboard
Surfacing hidden social context to users
For readers
–Any incidents in the past, e.g. a sudden burst of edits?
–Who are the editors?
–What are their motivations / points of view / expertise / topics of interest?
–Help them judge the quality/trustworthiness/usefulness of an article
For writers
–Measure expertise / contribution / reputation
–Motivate them to be more active / responsible (?)
Article Dashboard
User Dashboard
Drilling Down
List of every edit that a user made
Let readers examine each individual revision for validity, which is hard to accomplish when only provided with aggregate visual summaries.
Image from: http://www.flickr.com/photos/ourcommon/480538715/
Augmented Social Cognition:
From Social Foraging to Social Sensemaking
Research Vision: Understand how social computing systems enhance the ability of a group of people to remember, think, and reason.
Living Laboratory: Create breakthrough applications that harness collective intelligence to improve knowledge capture, transfer, and discovery.