ABSTRACT. We propose a new approach for modeling and reconciling conflicting data cleaning actions. Such conflicts arise naturally in collaborative data curation settings where multiple experts work independently and then aim to put their efforts together to improve and accelerate data cleaning. The key idea of our approach is to model conflicting updates as a formal argumentation framework (AF). Such argumentation frameworks can be automatically analyzed and solved by translating them to a logic program P_AF whose declarative semantics yield a transparent solution with many desirable properties, e.g., uncontroversial updates are accepted, unjustified ones are rejected, and the remaining ambiguities are exposed and presented to users for further analysis. After motivating the problem, we introduce our approach and illustrate it with a detailed running example introducing both well-founded and stable semantics to help understand the AF solutions. We have begun to develop open-source tools and Jupyter notebooks that demonstrate the practicality of our approach. In future work we plan to develop a toolkit for conflict resolution that can be used in conjunction with OpenRefine, a popular interactive data cleaning tool.
Reconciling Conflicting Data Curation Actions: Transparency Through Argumentation
Data Cleaning: the story so far …
● 80% of data science is data wrangling … (or so they say)
● Interactive data cleaning (e.g., Excel, OpenRefine, …)
● Script-based (e.g., Python/pandas, R, …)
● Single-user/single-curator setting (… only the lonely …)
● Multi-user/multi-curator collaboration (… friends …)
Collaborative Data Cleaning: Pros & Possible Cons
Joining forces & pooling expertise
⇒ higher throughput (efficiency)
⇒ higher data quality output
But also …
⇒ Need to coordinate more (e.g., vertical and/or horizontal splitting, ...)
⇒ Need to resolve conflicts / disputes
⇒ Cost of collaboration
Collaborative Data Cleaning Part-I: Provenance + Expert Merge
Collaborative DC Provenance Model (CDCM)
Expert Recipe Merge
Whole Team > Sum of Members?
● Before: Expert coordinator, merging bits & pieces of data cleaning recipes
● Alternative: Tightly-coupled, well-planned collaboration (“eager”)
● New proposal: Loosely-coupled or ad-hoc collaboration (“lazy”)
  + automated conflict-resolution strategy
[Figure: Ross + Loretta < Rosetta Team]
Loosely-Coupled Multi-Curator Data Cleaning Example
Book Title | Author | Date
Against Method | Feyerabend, P. | 1975
Changing Order | Collins, H.M. | ␣␣1985 ␣
Exceeding Our Grasp | P. Kyle Stanford | 2006
Theory of Information | | 1992
Wrangling Goal: Create an APA-style in-text citation based on the given dataset D
[Figure: curators Ross and Loretta]
Data Cleaning Results
Book-Title | Author | Date | Author 1 | Citation
Against Method | Feyerabend, P. | 1975 | Feyerabend | Feyerabend, 1975
Changing Order | Collins, H.M. | 1985 | Collins | Collins, 1985
Exceeding Our Grasp | Stanford, P. | 2006 | Stanford | Stanford, 2006
Theory of Information | | 1992 | |

Book_Title | Author | Date | Last Name | First Name | Citation
Against Method | Feyerabend, P. | 1975 | Feyerabend | P. | Feyerabend, 1975
Changing Order | Collins, H.M. | 1985 | Collins | H.M. | Collins, 1985
Exceeding Our Grasp | Stanford, P.K. | 2006 | Stanford | P.K. | Stanford, 2006
Theory of Information | Shannon, C.E. | 1992 | Shannon | C.E. | Shannon, 1992
rename("Book Title", "Book-Title")
rename("Book Title", "Book_Title")
del_row(4)
transform("Date", "value.toNumber()")
transform("Date", "value.trim()")
Modeling Data Cleaning Conflicts
Execution Order Data Cleaning Actions
Attack Relationship
defeated(X) ← attacks(Y, X), ¬ defeated(Y).
Operation Attack Relation (example, one of many)
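Attack Relationship (rows: operation A; columns: operation B)
A \ B | update(r,c,v1) | del_row(r) | del_col(c) | split_col(c,sp1) | transform(c,F1) | join_col(c,...ci,sp1,cn1) | rename(c,c1)
update(r,c,v2) | A ⟷ B | | | | | |
del_row(r) | A ⟶ B | ∅ | | | | |
del_col(c) | A ⟶ B | ∅ | ∅ | | | |
split_col(c,sp2) | A ⟵ B | ∅ | A ⟵ B | A ⟷ B | | |
transform(c,F2) | A ⟷ B | ∅ | A ⟵ B | A ⟶ B | A ⟷ B | |
join_col(c,...ci,sp2,cn2) | A ⟵ B | ∅ | A ⟵ B | ∅ | A ⟵ B | A ⟷ B |
rename(c,c2) | A ⟶ B | ∅ | A ⟷ B | A ⟶ B | A ⟶ B | A ⟶ B | A ⟷ B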
Describes whether and how operations A and B are in conflict with each other
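One way to make such a matrix machine-readable is a lookup table keyed by operation type. The sketch below is illustrative only: the dictionary name `ATTACKS`, the string codes for the arrow directions, and the helper `attack_edges` are assumptions, as is the reading that A labels the rows and B the columns. It works at the level of operation types; whether two concrete actions conflict also depends on whether they touch the same row r or column c.

```python
# Hypothetical encoding of the lower-triangular attack matrix.
# Key: (row operation A, column operation B).
# Value: "A->B" (A attacks B), "A<-B" (B attacks A),
#        "A<->B" (mutual attack), or None (no conflict).
ATTACKS = {
    ("update", "update"): "A<->B",
    ("del_row", "update"): "A->B",
    ("del_row", "del_row"): None,
    ("del_col", "update"): "A->B",
    ("del_col", "del_row"): None,
    ("del_col", "del_col"): None,
    ("split_col", "update"): "A<-B",
    ("split_col", "del_row"): None,
    ("split_col", "del_col"): "A<-B",
    ("split_col", "split_col"): "A<->B",
    ("transform", "update"): "A<->B",
    ("transform", "del_row"): None,
    ("transform", "del_col"): "A<-B",
    ("transform", "split_col"): "A->B",
    ("transform", "transform"): "A<->B",
    ("join_col", "update"): "A<-B",
    ("join_col", "del_row"): None,
    ("join_col", "del_col"): "A<-B",
    ("join_col", "split_col"): None,
    ("join_col", "transform"): "A<-B",
    ("join_col", "join_col"): "A<->B",
    ("rename", "update"): "A->B",
    ("rename", "del_row"): None,
    ("rename", "del_col"): "A<->B",
    ("rename", "split_col"): "A->B",
    ("rename", "transform"): "A->B",
    ("rename", "join_col"): "A->B",
    ("rename", "rename"): "A<->B",
}


def attack_edges(op_x, op_y):
    """Return the directed (attacker, target) pairs between two operation types."""
    if (op_x, op_y) in ATTACKS:
        a, b = op_x, op_y
    elif (op_y, op_x) in ATTACKS:
        a, b = op_y, op_x
    else:
        raise KeyError(f"no matrix entry for {op_x!r} and {op_y!r}")
    relation = ATTACKS[(a, b)]
    if relation is None:
        return set()
    if relation == "A->B":
        return {(a, b)}
    if relation == "A<-B":
        return {(b, a)}
    # Mutual attack; for two operations of the same type this collapses
    # to one pair, so concrete actions need their own identifiers.
    return {(a, b), (b, a)}


print(attack_edges("update", "del_row"))
```

The lookup is order-insensitive, so callers do not need to know which operation sits on the row side of the matrix.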
Example Data Cleaning Conflicts
Attack | Description
E ↔ L | rename("Book Title", "Book-Title") ↔ rename("Book Title", "Book_Title")
K ← Q | del_row(4) → cell_edit(4, "Author", "Shannon, C.E.")
F → P | cell_edit(3, "Author", "Stanford, P.") → split_col("Author", ",")
…… | ……
defeated(X) ← attacks(Y, X), ¬ defeated(Y).
Formal Argumentation
BBC4 Moral Maze
Modeling Conflict: Argumentation Frameworks
defeated(X) ← attacks(Y, X), ¬ defeated(Y).
[Figure: attack graph over arguments a, b, c, d with node labels accepted, defeated, undecided, undecided]
1. a isn’t attacked at all
2. ⇒ a is accepted
3. a attacks b
4. ⇒ b is defeated
5. ⇒ the attack of b on c can be ignored
6. c and d attack each other
7. ⇒ the status of c and d is undecided
Solving Conflict: Argumentation Frameworks (AF)
Input AF (attack graph)
Output (solved AF)
defeated(X) ← attacks(Y, X), ¬ defeated(Y).
Argument X is defeated if it is attacked by Y and Y is not defeated
Solving Ross + Loretta (= Rosetta) ad-hoc “collaboration”
Yilin Xia, Shawn Bowers, Lan Li and Bertram Ludäscher. 2023. Games and Argumentation Demo Repository.
https://github.com/idaks/Games-and-Argumentation/tree/idcc
Refined Solution (Stable Model/Stable Extension)
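To illustrate how stable extensions refine the undecided part of a solution, the sketch below enumerates them by brute force. It is a hedged illustration rather than the repository's implementation: the function name `stable_extensions` is an assumption, and the input is the small four-argument graph (a attacks b, b attacks c, c and d attack each other), not the Ross and Loretta recipes. A set of arguments is a stable extension if no member attacks another member and every argument outside the set is attacked by some member.

```python
from itertools import combinations


def stable_extensions(arguments, attacks):
    """Enumerate all stable extensions of a small AF by exhaustive search."""
    arguments = list(arguments)
    attacks = set(attacks)
    found = []
    for size in range(len(arguments) + 1):
        for subset in combinations(arguments, size):
            chosen = set(subset)
            # No accepted argument may attack another accepted argument.
            conflict_free = not any(
                (y, x) in attacks for y in chosen for x in chosen
            )
            # Every rejected argument must be attacked by an accepted one.
            outside = [x for x in arguments if x not in chosen]
            covers_rest = all(
                any((y, x) in attacks for y in chosen) for x in outside
            )
            if conflict_free and covers_rest:
                found.append(chosen)
    return found


arguments = ["a", "b", "c", "d"]
attacks = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "c")]
for extension in stable_extensions(arguments, attacks):
    print(sorted(extension))
```

For this graph the search finds two extensions, one containing a and c and one containing a and d, so each stable model settles the mutual attack between c and d in one of the two possible ways. Exhaustive search is exponential in the number of arguments; an answer set solver would be the practical choice for larger graphs.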
Refined Solution put back in Recipe Order
Step | Actions | Curator
E | rename("Book Title", "Book-Title") | Alice
M | transform("Date", "value.trim()") | Bob
H | del_row(4) | Alice
O | cell_edit(3, "Author", "Stanford, P.K.") | Bob
P | split_col("Author", ",") | Bob
J | del_col("Author 2") | Alice
Q | rename("Author 1", "Last Name") | Bob
S | join_col("Last Name", "Date", ",", "Citation") | Bob
Et voilà! The merged recipe and combined solution!
Conclusions (Work in Progress) & Future Work
An approach based on formal argumentation frameworks for
- modeling the actions of users’ data-cleaning recipes
- identifying conflicting actions across recipes
- providing users with new tools to help resolve these conflicts to generate a single, unified, merged recipe.
An algorithm helps auto-process recipes and solve conflicts
Take dependencies into account when modeling
Explore criteria that can be used to evaluate possible merged recipes