Reconciling Conflicting Data Curation Actions: Transparency Through Argumentation

ludaesch 31 views 23 slides May 10, 2024

About This Presentation

Yilin Xia ([email protected]),
Shawn Bowers ([email protected]),
Lan Li ([email protected]), and
Bertram Ludäscher ([email protected])

Presented at IDCC-2024 in Edinburgh.

ABSTRACT. We propose a new approach for modeling and reconciling conflicting data cleaning actions. Such conflicts...


Slide Content

Reconciling Conflicting Data Curation Actions:
Transparency through Argumentation
Yilin Xia ([email protected])
Shawn Bowers ([email protected])
Lan Li ([email protected])
Bertram Ludäscher ([email protected])

Data Cleaning: the story so far …
● 80% of data science is data wrangling … (or so they say)
● Interactive data cleaning (e.g., Excel, OpenRefine, …)
● Script-based (e.g., Python/pandas, R, …)
● Single-user/single-curator setting (… only the lonely …)
● Multi-user/multi-curator collaboration (… friends …)

Collaborative Data Cleaning: Pros & Possible Cons
Joining forces & pooling expertise
→ higher throughput (efficiency)
→ higher data quality output
But also …
→ Need to coordinate more (e.g., vertical and/or horizontal splitting, ...)
→ Need to resolve conflicts / disputes
→ Cost of collaboration

Collaborative Data Cleaning Part I: Provenance + Expert Merge
Collaborative DC Provenance Model (CDCM)
Expert Recipe Merge

Whole Team > Sum of Members?
● Before: Expert coordinator, merging bits & pieces of data cleaning recipes
● Alternative: Tightly coupled, well-planned collaboration (“eager”)
● New proposal: Loosely coupled or ad-hoc collaboration (“lazy”) + automated conflict-resolution strategy
[Figure: Ross + Loretta < Rosetta Team]

Loosely-Coupled Multi-Curator Data Cleaning Example
Book Title            | Author           | Date
Against Method        | Feyerabend, P.   | 1975
Changing Order        | Collins, H.M.    | ␣␣1985 ␣
Exceeding Our Grasp   | P. Kyle Stanford | 2006
Theory of Information |                  | 1992
Wrangling Goal: Create an APA-style in-text citation based on the given dataset D
Curators: Ross, Loretta

Data Cleaning Actions (Transformation Model)
Action                                                    | Level
cell_edit(row_id, column_name, new_value)                 | Cell-Level
del_row(row_id)                                           | Row-Level
del_col(column_name)                                      | Column-Level
split_col(column_name, separator)                         | Column-Level
transform(column_name, function)                          | Column-Level
join_col(set_of_column_names, separator, new_column_name) | Column-Level
rename(column_name, new_column_name)                      | Column-Level
…                                                         | …
OpenRefine
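
A few of the actions listed above can be sketched in Python. This is only an illustration under stated assumptions, not the authors' implementation: the table representation (a dict from row id to a dict of column values) is hypothetical, `transform` takes a Python callable rather than an OpenRefine expression string, and `del_col`, `split_col`, and `join_col` are left out for brevity.

```python
from typing import Callable, Dict

# Assumed representation: row_id -> {column_name: value}
Table = Dict[int, Dict[str, str]]

def cell_edit(table: Table, row_id: int, column_name: str, new_value: str) -> None:
    """Cell-level action: overwrite a single cell."""
    table[row_id][column_name] = new_value

def del_row(table: Table, row_id: int) -> None:
    """Row-level action: remove the row with the given id."""
    del table[row_id]

def transform(table: Table, column_name: str, function: Callable[[str], str]) -> None:
    """Column-level action: apply a function to every value in a column."""
    for row in table.values():
        row[column_name] = function(row[column_name])

def rename(table: Table, column_name: str, new_column_name: str) -> None:
    """Column-level action: rename a column, keeping the column order."""
    for row_id, row in table.items():
        table[row_id] = {
            (new_column_name if name == column_name else name): value
            for name, value in row.items()
        }

# Dataset D from the example slide; the padded Date approximates the
# whitespace markers shown there.
D: Table = {
    1: {"Book Title": "Against Method", "Author": "Feyerabend, P.", "Date": "1975"},
    2: {"Book Title": "Changing Order", "Author": "Collins, H.M.", "Date": "  1985 "},
    3: {"Book Title": "Exceeding Our Grasp", "Author": "P. Kyle Stanford", "Date": "2006"},
    4: {"Book Title": "Theory of Information", "Author": "", "Date": "1992"},
}

rename(D, "Book Title", "Book_Title")
transform(D, "Date", str.strip)
cell_edit(D, 3, "Author", "Stanford, P.K.")
del_row(D, 4)
```

Row ids are kept stable after `del_row`, so a later `cell_edit(3, ...)` still refers to the same row.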

Data Cleaning Actions → Recipes

Recipe 1
Step | Action
1    | rename("Book Title", "Book-Title")
2    | cell_edit(3, "Author", "Stanford, P.")
3    | transform("Date", "value.toNumber()")
4    | del_row(4)
5    | split_col("Author", ",")
6    | del_col("Author 2")
7    | join_col("Author 1", "Date", ",", "Citation")

Recipe 2
Step | Action
1    | rename("Book Title", "Book_Title")
2    | transform("Date", "value.trim()")
3    | cell_edit(4, "Author", "Shannon, C.E.")
4    | cell_edit(3, "Author", "Stanford, P.K.")
5    | split_col("Author", ",")
6    | rename("Author 1", "Last Name")
7    | rename("Author 2", "First Name")
8    | join_col("Last Name", "Date", ",", "Citation")

Data Cleaning Actions → Recipes

Recipe 1
Step | Action
E    | rename("Book Title", "Book-Title")
F    | cell_edit(3, "Author", "Stanford, P.")
G    | transform("Date", "value.toNumber()")
H    | del_row(4)
I    | split_col("Author", ",")
J    | del_col("Author 2")
K    | join_col("Author 1", "Date", ",", "Citation")

Recipe 2
Step | Action
L    | rename("Book Title", "Book_Title")
M    | transform("Date", "value.trim()")
N    | cell_edit(4, "Author", "Shannon, C.E.")
O    | cell_edit(3, "Author", "Stanford, P.K.")
P    | split_col("Author", ",")
Q    | rename("Author 1", "Last Name")
R    | rename("Author 2", "First Name")
S    | join_col("Last Name", "Date", ",", "Citation")

Data Cleaning Results

Result of Recipe 1
Book-Title            | Author         | Date | Author 1   | Citation
Against Method        | Feyerabend, P. | 1975 | Feyerabend | Feyerabend, 1975
Changing Order        | Collins, H.M.  | 1985 | Collins    | Collins, 1985
Exceeding Our Grasp   | Stanford, P.   | 2006 | Stanford   | Stanford, 2006
Theory of Information |                | 1992 |            |

Result of Recipe 2
Book_Title            | Author         | Date | Last Name  | First Name | Citation
Against Method        | Feyerabend, P. | 1975 | Feyerabend | P.         | Feyerabend, 1975
Changing Order        | Collins, H.M.  | 1985 | Collins    | H.M.       | Collins, 1985
Exceeding Our Grasp   | Stanford, P.K. | 2006 | Stanford   | P.K.       | Stanford, 2006
Theory of Information | Shannon, C.E.  | 1992 | Shannon    | C.E.       | Shannon, 1992

Actions highlighted on the slide:
rename("Book Title", "Book-Title") vs. rename("Book Title", "Book_Title")
del_row(4)
transform("Date", "value.toNumber()") vs. transform("Date", "value.trim()")

Modeling Data Cleaning Conflicts
Execution Order
Data Cleaning Actions
Attack Relationship
defeated(X) ← attacks(Y, X), ¬ defeated(Y).
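Operation Attack Relation (example, one of many)
Attack Relationship (rows: operation A; columns: operation B)

A \ B                      | update(r,c,v1) | del_row(r) | del_col(c) | split_col(c,sp1) | transform(c,F1) | join_col(c,...ci,sp1,cn1) | rename(c,c1)
update(r,c,v2)             | A ⟷ B          |            |            |                  |                 |                           |
del_row(r)                 | A ⟶ B          | ∅          |            |                  |                 |                           |
del_col(c)                 | A ⟶ B          | ∅          | ∅          |                  |                 |                           |
split_col(c,sp2)           | A ⟵ B          | ∅          | A ⟵ B      | A ⟷ B            |                 |                           |
transform(c,F2)            | A ⟷ B          | ∅          | A ⟵ B      | A ⟶ B            | A ⟷ B           |                           |
join_col(c,...ci,sp2,cn2)  | A ⟵ B          | ∅          | A ⟵ B      | ∅                | A ⟵ B           | A ⟷ B                     |
rename(c,c2)               | A ⟶ B          | ∅          | A ⟷ B      | A ⟶ B            | A ⟶ B           | A ⟶ B                     | A ⟷ B

Each entry describes whether and how operations A and B are in conflict with each other.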

Example Data Cleaning Conflicts
Attack | Description
E ↔ L  | rename("Book Title", "Book-Title") ↔ rename("Book Title", "Book_Title")
H → N  | del_row(4) → cell_edit(4, "Author", "Shannon, C.E.")
F → P  | cell_edit(3, "Author", "Stanford, P.") → split_col("Author", ",")
…      | …
defeated(X) ← attacks(Y, X), ¬ defeated(Y).
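
The three example conflicts above can be derived mechanically from the action parameters. The sketch below is a hedged illustration only: the tuple encoding of actions and the function names `attacks` and `cross_recipe_attacks` are hypothetical, and just three rules are encoded (rename vs. rename on the same column, del_row vs. cell_edit on the same row, cell_edit vs. split_col on the same column). The slides describe a larger operation attack relation than this.

```python
# Actions as tuples: (operation, *parameters). Only the actions that
# appear in the three example conflicts are included here.
recipe_1 = {
    "E": ("rename", "Book Title", "Book-Title"),
    "F": ("cell_edit", 3, "Author", "Stanford, P."),
    "H": ("del_row", 4),
}
recipe_2 = {
    "L": ("rename", "Book Title", "Book_Title"),
    "N": ("cell_edit", 4, "Author", "Shannon, C.E."),
    "P": ("split_col", "Author", ","),
}

def attacks(a, b):
    """Return True if action a attacks action b (three illustrative rules)."""
    if a[0] == "rename" and b[0] == "rename":
        # Same column renamed to different names: mutual attack.
        return a[1] == b[1] and a[2] != b[2]
    if a[0] == "del_row" and b[0] == "cell_edit":
        # Deleting a row attacks an edit of a cell in that row.
        return a[1] == b[1]
    if a[0] == "cell_edit" and b[0] == "split_col":
        # Editing a cell attacks a split of the same column.
        return a[2] == b[1]
    return False

def cross_recipe_attacks(r1, r2):
    """Collect attack edges between actions of two different recipes."""
    edges = set()
    for x, ax in r1.items():
        for y, ay in r2.items():
            if attacks(ax, ay):
                edges.add((x, y))
            if attacks(ay, ax):
                edges.add((y, x))
    return edges

edges = cross_recipe_attacks(recipe_1, recipe_2)
```

The resulting edge set is the input attack graph to which the `defeated` rule is then applied.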

Formal Argumentation
BBC4 Moral Maze

Modeling Conflict: Argumentation Frameworks
defeated(X) ← attacks(Y, X), ¬ defeated(Y).
[Figure: attack graph over arguments a, b, c, d with node labels accepted, defeated, undecided, undecided]
1. a isn’t attacked at all
2. ⇒ a is accepted
3. a attacks b
4. ⇒ b defeated
5. ⇒ b attacks c can be ignored
6. c and d attack each other
7. ⇒ status undecided
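
The reasoning steps above can be reproduced with a small fixpoint computation. This is a minimal sketch, assuming the attack graph is exactly the one described in the steps (a attacks b, b attacks c, c and d attack each other); the function name `solve_af` is hypothetical. An argument is accepted once all of its attackers are defeated, and defeated once some attacker is accepted; whatever remains is undecided.

```python
def solve_af(arguments, attacks):
    """Label arguments as accepted, defeated, or undecided.

    attacks is a list of (attacker, target) pairs, mirroring
    defeated(X) <- attacks(Y, X), not defeated(Y).
    """
    attackers = {x: set() for x in arguments}
    for y, x in attacks:
        attackers[x].add(y)

    accepted, defeated = set(), set()
    changed = True
    while changed:
        changed = False
        for x in arguments:
            if x in accepted or x in defeated:
                continue
            if attackers[x] <= defeated:
                # Every attacker (possibly none) is already defeated.
                accepted.add(x)
                changed = True
            elif attackers[x] & accepted:
                # At least one attacker is accepted.
                defeated.add(x)
                changed = True

    undecided = set(arguments) - accepted - defeated
    return accepted, defeated, undecided

arguments = ["a", "b", "c", "d"]
attacks = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "c")]
accepted, defeated, undecided = solve_af(arguments, attacks)
print(accepted, defeated, undecided)
```

For this graph the computation accepts a, defeats b, and leaves c and d undecided, matching the labels in the steps.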

Solving Conflict: Argumentation Frameworks (AF)
Input AF (attack graph)
Output (solved AF)
defeated(X) ⇐ attacks(Y, X), not defeated(Y).
Argument X is defeated if it is attacked by Y and Y is not defeated

Refined Conflict Analysis: Stable Models (Extensions)
Well-founded Solution (“skeptical” reasoning)
Stable Solution 1 (“brave” reasoning)
Stable Solution 2 (“brave” reasoning)
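
Stable solutions resolve the arguments that the well-founded solution leaves undecided. The brute-force sketch below enumerates stable extensions, i.e. conflict-free sets that attack every argument outside the set. It assumes a small four-argument graph (a attacks b, b attacks c, c and d attack each other) as a stand-in, since the slide's own figure is not reproduced here; the function name `stable_extensions` is hypothetical, and exhaustive enumeration is only practical for small graphs.

```python
from itertools import combinations

def stable_extensions(arguments, attacks):
    """Return all stable extensions as sorted lists of arguments."""
    attack_set = set(attacks)
    result = []
    for size in range(len(arguments) + 1):
        for subset in combinations(arguments, size):
            chosen = set(subset)
            # No member of the set attacks another member.
            conflict_free = not any(
                (y, x) in attack_set for y in chosen for x in chosen
            )
            # Every outside argument is attacked by some member.
            attacks_rest = all(
                any((y, x) in attack_set for y in chosen)
                for x in arguments
                if x not in chosen
            )
            if conflict_free and attacks_rest:
                result.append(sorted(chosen))
    return result

arguments = ["a", "b", "c", "d"]
attacks = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "c")]
extensions = stable_extensions(arguments, attacks)
print(extensions)
```

This graph has two stable extensions, {a, c} and {a, d}: each one picks a side in the mutual attack between c and d, while a is accepted and b defeated in both.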

Solving Ross + Loretta (= Rosetta) ad-hoc “collaboration”
Yilin Xia, Shawn Bowers, Lan Li and Bertram Ludäscher. 2023. Games and Argumentation Demo Repository.
https://github.com/idaks/Games-and-Argumentation/tree/idcc

Refined Solution (Stable Model/Stable Extension)
Yilin Xia, Shawn Bowers, Lan Li and Bertram Ludäscher. 2023. Games and Argumentation Demo Repository.
https://github.com/idaks/Games-and-Argumentation/tree/idcc

Refined Solution put back in Recipe Order
Step | Actions                                        | Curator
E    | rename("Book Title", "Book-Title")             | Alice
M    | transform("Date", "value.trim()")              | Bob
H    | del_row(4)                                     | Alice
O    | cell_edit(3, "Author", "Stanford, P.K.")       | Bob
P    | split_col("Author", ",")                       | Bob
J    | del_col("Author 2")                            | Alice
Q    | rename("Author 1", "Last Name")                | Bob
S    | join_col("Last Name", "Date", ",", "Citation") | Bob

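Putting the accepted actions back into an executable order can be sketched as a sort. The slides do not state the ordering rule, so the key used here (step number within the original recipe, with ties broken in favor of Recipe 1) is an assumption; it happens to reproduce the sequence in the table above. The variable names are hypothetical.

```python
# Step labels in their original recipe order (from the labeled recipes).
recipe_1_steps = "EFGHIJK"
recipe_2_steps = "LMNOPQRS"

# Map each label to (step number, recipe number).
position = {
    label: (step, recipe)
    for recipe, labels in ((1, recipe_1_steps), (2, recipe_2_steps))
    for step, label in enumerate(labels, start=1)
}

# Accepted actions of the refined solution, as listed in the table.
accepted = set("EMHOPJQS")

# Assumed rule: order by step number, then by recipe.
merged = sorted(accepted, key=lambda label: position[label])
print("".join(merged))
```

Other orderings may be equally valid as long as they respect the dependencies between actions, for example a split_col has to run before a rename of the column it creates.
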
Et voilà! The merged recipe and combined solution!
Yilin Xia, Shawn Bowers, Lan Li and Bertram Ludäscher. 2023. Games and Argumentation Demo Repository.
https://github.com/idaks/Games-and-Argumentation/tree/idcc

Conclusions (Work in Progress) & Future Work
● An approach based on formal argumentation frameworks for
  - modeling the actions of users’ data-cleaning recipes
  - identifying conflicting actions across recipes
  - providing users with new tools to help resolve these conflicts and generate a single, unified, merged recipe
● An algorithm helps auto-process recipes and solve conflicts
● Take dependencies into account when modeling
● Explore criteria that can be used to evaluate possible merged recipes

Reconciling Conflicting Data Curation Actions:
Transparency Through Argumentation
Yilin Xia [email protected]
Shawn Bowers [email protected]
Lan Li [email protected]
Bertram Ludäscher [email protected]