Big Data - Semantic expressiveness as a function of data complexity levels - ISKO BrasItalia.pdf

CarlosMarcondes17 10 views 23 slides Jun 14, 2024

About This Presentation

This research suggests that data is organized at different levels, from the simplest data to the most complex data aggregates (Foskett, 1961; Barreto, 2008), forming data sets or data systems, such as fields, records, tables, conceptual models, ontologies. As they become more complex and voluminous...


Slide Content

Big Data: semantic expressivity as a
function of data complexity levels
2023-2024
KO across disciplines and media
Brazitalian meeting, June 14, 2024
ROCAD Research Group* – Knowledge Representation and
Organization in Digital Environments, + Prof. Linair Campos,
https://dgp.cnpq.br/dgp/espelhogrupo/793271
*MARTINS, S; RAMOS JR, M; PEREIRA, D; MARCONDES, C.

Summary
1. Problem
2. Research questions
3. Definitions, presuppositions
4. Foundations
5. Results
6. Final remarks

In this image (Fig. 7), which Tufte considers to be one of the most
“effective” scientific diagrams, we understand the origin of this
double bet that makes the scientist win every time he seems to have
asked for direct contact with the world. Marey, the great
physiologist, can superimpose the map of Russia, the measurement of
temperatures, the route of the Grande Armée, the date of its
deployments and, most tragically, the number of surviving soldiers
from each bivouac! (LATOUR 2000, p. 30)

Problem
“While data volume proliferates, the knowledge it creates has not kept pace”,
Cognizant Newsletter (2011).
Human activities are increasingly mediated by information technologies, the
so-called datafication process (Mejias; Couldry, 2019) of contemporary
society.
Large amounts of digital data, called Big Data, have become fundamental
in organizations because of their high semantic potential.
As a semantic resource, Big Data only reaches its full potential when
processed by information technologies.

Research Questions
• What is data? How do semantics (for humans and machines) emerge from
digital data?
“The “strength” of the semantic, in these cases, is linked to “semantic expressivity,” associated with the tractability of the
KOS for different kinds of formalism” (SOUZA et al. 2012, p. 183).
“Representational power, Semantic Expressiveness, Intelligibility” (SOUZA et al. 2012, p. 188).
“Expressivity or semantic expressivity, is a notion used to nominate how accurate a knowledge representation express a
phenomena of reality”. (MARTINS et al 2024, p. 6)
• How does SEMANTIC EXPRESSIVITY increase within the Big Data context?
✓ This research suggests that data is organized at different levels, from the simplest data to the most
complex data aggregates (Foskett, 1961; Barreto, 2008), forming data sets or data systems, such as
fields, records, tables, conceptual models, ontologies. As they become more complex and voluminous,
these data aggregates potentially become more expressive and can generate semantics, information,
insights for humans and machines.

Definitions, presuppositions
• SEMANTIC EXPRESSIVITY (still a notion) – Accuracy in representing reality,
potential information, informative potential of the OUTPUT of a data set
processing, immediately offered to an end user, enabling immediate
action or decision making.
• OUTPUT of the processing of a data set (a data aggregate) – “output” of an
information system immediately offered to an end user.

Foundations ... -> REFERENCES, CONCEPTUAL BASES
- Semantic Web (BERNERS-LEE, HENDLER, LASSILA 2001)
- Theory of Integrative Levels (HARTMANN 1952), (FEIBLEMAN 1954), (GNOLI
2018)
- SNAP and SPAN representations of reality (GRENON, SMITH 2004)
- Big Data and Knowledge Organization (IBEKWE-SANJUAN, BOWKER 2017)
- Data, Big Data and semantics (HJØRLAND 2018)
- The semantics of the Semantic Web (SHETH, RAMAKRISHNAN, THOMAS
2005)
- A taxonomy of KOS according to their semantic expressivity (SOUZA,
TUDHOPE, ALMEIDA 2012)

What is data?
Several scientific areas have also made efforts to understand, conceptualize and instrument Big
Data, such as Computer Science, Health Sciences and Knowledge Organization (Shet, 2020) (Huang et al.
2015).
Big Data today X information explosion, the phenomenon that gave rise to Information Science and KO in
the 1960s
“Big Data is the Information asset characterized by such a High Volume, Velocity and Variety to require
specific Technology and Analytical Methods for its transformation into Value” Mauro, Greco and Grimaldi
(2016, p. 126).
“Data are social artefacts” (Ibekwe-Sanjuan, Bowker 2017, p. 195)
“Data are concrete instantiations of symbolic representations of descriptive propositions, informed by
empirical observation, about the quantitative and qualitative properties of real-world phenomena”; “Data
are always produced for some purposes and perspectives”. (Hjørland 2018, n.p.).
“A datum or data item as a triple <e, a, v>, where e is an entity in a conceptual model, a is an attribute of
entity e, and v is a value from the domain of attribute a. A datum asserts that entity e has value v for
attribute a” (Hjørland 2018, n.p.).
Data are representations of entities or phenomena, semiotic entities
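The triple definition of a datum quoted above can be illustrated with a small sketch. The class name `Datum`, the helper `assertion` and the shortened entity identifier are assumptions made for this illustration only; they do not come from the presentation or from Hjørland's text.

```python
from typing import NamedTuple


class Datum(NamedTuple):
    """A datum <e, a, v>: entity e has value v for attribute a."""
    entity: str     # e: an entity in a conceptual model
    attribute: str  # a: an attribute of entity e
    value: str      # v: a value from the domain of attribute a


def assertion(d: Datum) -> str:
    """Spell out what the datum asserts about its entity."""
    return f"entity {d.entity} has value {d.value} for attribute {d.attribute}"


# Hypothetical identifier, shortened for readability.
datum = Datum(entity="patient_0ae9", attribute="exam", value="URINE 1")
print(assertion(datum))
```

Read this way, a datum is already a small statement about the world, which is what separates it from an isolated token.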

Integrative Level Classification
Reality is organized in increasing levels of complexity, such as the Physical level,
the Biological level, the Psychological level and the Cultural level.
The levels overlap each other. Each higher level adds an emergent
property (or quality) that did not exist at the lower level.
KANISTO, Tony (2018)
https://philosophicallp.quora.com/Emergence-vs-Supervenience
Foundations ...
1. Structures and Form
2. Matter and Energy
3. Cosmos and Earth
4. Life (biological systems)
5. Human beings
6. Societies
7. Material artefacts
8. Intellectual artefacts
9. Spiritual artefacts
(DAHLBERG, 1995)

Foundations ... -> Integrative Level Classification
DATA
(GNOLI 2008, http://www.iskoi.org/ilc/book/strata.php)

Foundations ...
CONTINUANTS AND OCCURRENTS
GRENON, Pierre; SMITH, Barry. SNAP and SPAN:
Towards dynamic spatial ontology. Spatial cognition
and computation, v. 4, n. 1, p. 69-104, 2004.

AN EXAMPLE...
https://iris.who.int/rest/bitstreams/1287200/retrieve

AN EXAMPLE...
STRUCTURE OF THE WHO-COVID-19-Rapid-CRF

AN EXAMPLE...

OUR EXAMPLE:
EXAMS
PATIENTS
DICTIONARY

A DASHBOARD
Online, real time data

Foundations ...
Computational model of Data Processing
PROCESSING
DATA

RESULTS
Column headings: Representing; Processing; TIME DIMENSION
Time dimension: Continuants (SNAPSHOT: processing, querying); Occurrents (DYNAMIC: statistical modelling)

Data conception
Level 0 - Conceptual modeling
Digital data
Non-structured data
Level 1 - Textual data
Level 2 - Discrete tokens, a quasi-sign (NÖTH, 2002), a “data point” (SHAH 2020, p. 16), delimited, decontextualized
Structured data
Level 3 - Contextualized data, a “state of affairs” (JANSEN, 2008, 188), a “datum” (Hjørland 2018): triples of entity, attribute, value; the representation of an isolated fact or phenomenon
Level 4 - Aggregation of triples referring to a single entity or phenomenon
Level 5 - 2 or more aggregations of triples referring to 2 or more interrelated entities or phenomena; implicit schema
Level 6 - Data includes the conceptual model/schema; explicit schema

Processing tools and outputs shown alongside the levels: NLP, NER, text mining; SPARQL, SQL; Data Science; real-time data, a dashboard

RESULTS
EXAMPLES
Level 1 (A text) – “The patient 0ae989c74676b6b5b8bf1a5f57be45f7 took the URINE 1 exam”
Level 2 (Discrete tokens) – “patient”, “0ae989c74676b6b5b8bf1a5f57be45f7”, “URINE 1”, “exam”
Level 3 (RDF triples) – <pacient_0ae989c74676b6b5b8bf1a5f57be45f7> <exam> <URINE 1>.
Level 4 (A table)
Level 5 (2 or more interrelated tables)
Level 6 (An ontology)
Figure labels: SCHEMA, DATA
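The progression from Level 1 to Level 4 in the examples above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the regular expression is written for this one sentence shape, the helper `aggregate` is a hypothetical name, and the second triple ("BLOOD COUNT") is invented here so that the aggregation step has something to group.

```python
import re
from collections import defaultdict

# Level 1: a text (the sentence used in the slide).
text = "The patient 0ae989c74676b6b5b8bf1a5f57be45f7 took the URINE 1 exam"

# Level 2: discrete, decontextualized tokens.
# Hypothetical pattern, valid only for this sentence shape.
match = re.fullmatch(r"The (\w+) (\w+) took the (.+) (\w+)", text)
tokens = list(match.groups())

# Level 3: a contextualized datum, a triple <entity, attribute, value>.
kind, identifier, value, attribute = tokens
triple = (f"{kind}_{identifier}", attribute, value)

# Level 4: aggregation of triples that refer to a single entity.
# The second triple is invented purely for illustration.
triples = [triple, (triple[0], "exam", "BLOOD COUNT")]


def aggregate(items):
    """Group triples by entity, then by attribute, keeping all values."""
    grouped = defaultdict(lambda: defaultdict(list))
    for entity, attr, val in items:
        grouped[entity][attr].append(val)
    return {entity: dict(attrs) for entity, attrs in grouped.items()}


records = aggregate(triples)
print(tokens)
print(triple)
print(records)
```

Each step adds context that the previous level lacked: tokens carry no relations, a triple relates one entity to one value, and the aggregate gathers everything known about that entity in one place.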

RESULTS
Types of Outputs
• SQL query on 1 or more interrelated tables
Output - the subset of the rows and cells of a table that satisfy a given condition
• SPARQL query on a graph
Output - the subset of the triples that satisfy a condition
• Descriptive analysis on 1 or more interrelated tables
Output - general information on a dataset
• Correlations on 1 or more interrelated tables
Output - a view (possibly a graph view) of how the correlated
variables vary in relation to one another over time
• Dashboard (real time) on 1 or more sources
Output - a real-time view of how the correlated variables vary in
relation to one another over time
STATIC: a data field, a cell
DYNAMIC, OVER TIME: a variable
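The first and third output types can be sketched with Python's standard `sqlite3` module. Everything in the schema below is an assumption made for illustration: the column names, the identifiers `p1` and `p2`, and the rows are invented, and only the table names echo the PATIENTS and EXAMS labels of the example.

```python
import sqlite3

# In-memory database with two interrelated tables (hypothetical schema and rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patients (patient_id TEXT PRIMARY KEY, birth_year INTEGER);
CREATE TABLE exams (
    patient_id TEXT REFERENCES patients(patient_id),
    exam TEXT
);
INSERT INTO patients VALUES ('p1', 1980), ('p2', 1995);
INSERT INTO exams VALUES ('p1', 'URINE 1'), ('p1', 'BLOOD COUNT'), ('p2', 'URINE 1');
""")

# SQL query output: the subset of rows that satisfy a given condition.
rows = conn.execute(
    """
    SELECT p.patient_id, e.exam
    FROM patients AS p
    JOIN exams AS e ON e.patient_id = p.patient_id
    WHERE e.exam = ?
    ORDER BY p.patient_id
    """,
    ("URINE 1",),
).fetchall()

# Descriptive analysis output: general information on the dataset.
counts = conn.execute(
    "SELECT exam, COUNT(*) FROM exams GROUP BY exam ORDER BY exam"
).fetchall()

print(rows)
print(counts)
conn.close()
```

The query returns individual cells (a static output), while the grouped count summarizes the dataset as a whole; a dashboard would recompute views like the second one as new rows arrive.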

Final remarks
Semantics “emerges” from the data along 2 axes:
- Complexity of Representations Axis - As data are organized into more
complex sets or systems, they can become more expressive and
more accurately represent things in a domain;
- Processing Axis - by processing the things represented by the data.
CONCEPTUAL MODELING X STATISTICAL MODELING
KO, KR, CM closer to Big Data, Data Science

References
LATOUR, Bruno. Redes que a razão desconhece: laboratórios, bibliotecas, coleções. In: BARATIN, M.; JACOB, C. O poder das bibliotecas. Rio de Janeiro: Ed. UFRJ, 2000.
COGNIZANT. Making Sense of Big Data in the Petabyte Age. Cognizant, 20-20 insights, Jun. 2011. Available at: https://www.cognizant.com/whitepapers/Making-Sense-of-Big-Data-in-thePetabyte-Age.pdf. Accessed: 2 Apr. 2021.
BERNERS-LEE, Tim; HENDLER, James; LASSILA, Ora. The semantic web. Scientific American, May 2001.
FEIBLEMAN, James K. Theory of Integrative Levels. The British Journal for the Philosophy of Science, v. 5, n. 17, p. 59-66, 1954.
GRENON, Pierre; SMITH, Barry. SNAP and SPAN: Towards dynamic spatial ontology. Spatial cognition and computation, v. 4, n. 1, p. 69-104, 2004.
IBEKWE-SANJUAN, F.; BOWKER, G. C. Implications of big data for Knowledge Organization. Knowledge Organization, Baden-Baden, v. 44, n. 3, p. 187-198, 2017.
HJØRLAND, Birger. Data with big data and database semantics. In: IEKO, ISKO Encyclopedia of Knowledge Organization. ISKO: 2018. Available at: https://www.isko.org/cyclo/data. Accessed: 2 Dec. 2020.
HARTMANN, Nicolai. New ways of ontology. Henry Regnery Company, Chicago, Illinois, 1952.
GNOLI, Claudio. Mentefacts as a missing level in theory of information science. Journal of Documentation, v. 74, n. 6, p. 1226-1242, 2018. Available at: https://www.gnoli.eu/mentefacts.docx. Accessed: 7 Aug. 2023.
SHETH, Amit; RAMAKRISHNAN, Cartic; THOMAS, Christopher. Semantics for the semantic web: The implicit, the formal and the powerful. International Journal on Semantic Web and Information Systems (IJSWIS), v. 1, n. 1, p. 1-18, 2005. Available at: https://www.academia.edu/download/90817267/JSWIS.pdf#page=19. Accessed: 12 Sep. 2019.
OECD – ORGANISATION FOR ECONOMIC CO-OPERATION AND DEVELOPMENT. Handbook on Constructing Composite Indicators: Methodology and User Guide, 2008. Available at: https://compositeindicators.jrc.ec.europa.eu/. Accessed: 22 May 2024.
SOUZA, Renato Rocha; TUDHOPE, Douglas; ALMEIDA, Maurício Barcellos. Towards a taxonomy of KOS: Dimensions for classifying Knowledge Organization Systems. Knowledge Organization, v. 39, n. 3, p. 179-192, 2012. Available at: . Accessed: 4 May 2022.

MAC Niterói
Contemporary Art Museum, Niterói, Rio
de Janeiro, Brazil
Comments are welcome
Thank you!
[email protected]
http://profmarcondes.ong.br