Bosman and Kramer, "Open Research: A 2024 NISO Training Series, Session Four: Open Data"

BaltimoreNISO, 30 slides, Oct 18, 2024

About This Presentation

This presentation was provided by Jeroen Bosman of Utrecht University and Bianca Kramer of Sesame Open Science during the fourth session of NISO's 2024 Training Series "Open Research." Session Four, "Open Data," was held on October 17, 2024.


Slide Content

Open Research NISO training fall 2024 session 4: Open Data October 17, 2024 Facilitated by Bianca Kramer & Jeroen Bosman https://tinyurl.com/NISO-fall2024-session04

Course goals and structure
Course goals:
Learn what open research entails and why one should pursue it
Explore practices and tools, getting insight into how these are implemented and used
Discuss open research policies and how to support open research in practice
Each week’s structure:
Review previous week: 20 min short recap, share actions
Session topic: 25 min ‘what’ and ‘why’ (lecture); 25 min ‘how’ (hands-on activity); 20 min support, monitoring, policy (discussion)
Home assignment

Recap session 3: Reproducibility and code sharing Image source: Scriberia for The Turing Way https://doi.org/10.5281/zenodo.3332808 . Source: https://book.the-turing-way.org/

Recap session 3: Home assignment
Before our next session, formulate one or more potential actions at your organization to facilitate ‘reproducibility and code sharing’. These can be things that are being considered already, or fully new ideas. Please share your actions in the slides below; we will then discuss them together.

Recap session 3: Reproducibility and code sharing

Session 4: Open Data
Considered by many as one of the core practices in open research, making data FAIR and open involves many considerations and choices that we address in this session, from data management plans, data privacy and open data sharing to data archives and reusing data.

Session 4: Open Data
In the analysis phase, researchers make choices around: generating data or reusing data; data management; data cleaning, processing and analyzing; data archiving and sharing; data citation.
All practices have implications for reproducibility and (potential) openness of data, and for many steps there are policies and requirements.

FAIR and open data Why is it important? Source: https://vu.nl/en/stories/infected-data-in-biblical-terms

FAIR and open data: Why is it important?
Verifiability & reproducibility: for verification, data needs to be available in some form.
Efficiency: reusing available data saves the time and money of generating new data.
Transparency & accountability: data availability makes the research process more transparent to outsiders.
Relevance & stakeholder involvement: available data helps with meaningful involvement and application of results.
FAIR data is not necessarily also open data, but FAIR data very much helps open data be useful.

Types of data and common data file formats: .rdf .json .txt .csv .rdf .tiff .avi .wav .netcdf4

Versions of data in the data processing workflow
Raw data (directly from measuring, recording, counting, sensors etc.)
Cleaned data (remove failed measurements, disambiguate phrasing, remove unwanted characters, etc.)
Aggregated data (replace detailed data with totals, averages etc.)
Anonymised data (remove any personal data and data that can lead to identification of people, organisations etc.)
Processed data (coding of survey/interview answers, calculations)
Future-proof data (use/convert to preferred data formats)
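As an illustration of how these versions can follow from one another, here is a minimal Python sketch using only the standard library. It is not taken from the slides: the records, the column names (`name`, `site`, `score`) and the function names are all hypothetical. Note too that dropping a name column is only the simplest form of anonymisation; real data sets often need more to prevent re-identification.

```python
import csv
import io
from statistics import mean

# Hypothetical raw records, e.g. a survey export (all values are made up).
RAW = [
    {"name": "A. Jansen", "site": "north", "score": "7"},
    {"name": "B. de Vries", "site": "north", "score": ""},   # failed measurement
    {"name": "C. Smith", "site": "south", "score": " 9 "},   # stray whitespace
    {"name": "D. Lee", "site": "south", "score": "5"},
]

def clean(rows):
    """Cleaned data: drop failed measurements, strip unwanted characters."""
    cleaned = []
    for row in rows:
        score = row["score"].strip()
        if not score:
            continue  # skip rows without a usable measurement
        cleaned.append({**row, "score": int(score)})
    return cleaned

def anonymise(rows):
    """Anonymised data: remove the directly identifying 'name' column."""
    return [{k: v for k, v in row.items() if k != "name"} for row in rows]

def aggregate(rows):
    """Aggregated data: replace detailed rows with a mean score per site."""
    by_site = {}
    for row in rows:
        by_site.setdefault(row["site"], []).append(row["score"])
    return [{"site": s, "mean_score": mean(v)} for s, v in sorted(by_site.items())]

def to_csv(rows):
    """Future-proof data: serialise to CSV, an open plain-text format."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

cleaned = clean(RAW)
anonymised = anonymise(cleaned)
aggregated = aggregate(anonymised)
csv_text = to_csv(aggregated)
```

Keeping each version as a separate step (and ideally a separate file) makes it easier to decide later which version can be shared openly and which must stay restricted.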

Searching for data General search engine Dataset search (multidisciplinary) Searching for (field specific and general) research data archives Indirect data search via publications Field specific data search Government data, official statistics

FAIR data

FAIR principles in plain language
Findable: unique identifier; rich metadata; in a searchable archive; identifier type in metadata
Accessible: metadata retrievable through identifier; open protocol for computer-readability; protocol includes authentication (where applicable); metadata permanently available
Interoperable: language is accessible and open; vocabularies used are also FAIR; (computer-readable) references to other data
Reusable: accurate and relevant metadata; license is available; data provenance is clear; using domain/community standards
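The plain-language checklist above can be turned into a rough self-check on a metadata record. The Python sketch below is an assumption-laden illustration, not an official FAIR assessment tool: the field names in `CHECKS`, the `fair_gaps` function and the example record (including its placeholder identifier and URL) are all hypothetical, and real repositories use established metadata schemas instead.

```python
# Hypothetical mapping from FAIR principles to metadata fields.
# Field names are illustrative only, not a standard schema.
CHECKS = {
    "Findable": ["identifier", "identifier_type", "title", "description"],
    "Accessible": ["access_url", "access_protocol"],
    "Interoperable": ["format", "vocabulary"],
    "Reusable": ["license", "provenance"],
}

def fair_gaps(record):
    """Return, per FAIR principle, the fields that are missing or empty."""
    gaps = {}
    for principle, fields in CHECKS.items():
        missing = [f for f in fields if not record.get(f)]
        if missing:
            gaps[principle] = missing
    return gaps

# Example record with placeholder values (not a real data set or DOI).
record = {
    "identifier": "10.1234/example",
    "identifier_type": "DOI",
    "title": "Example survey data",
    "description": "Illustrative record only.",
    "access_url": "https://example.org/dataset/1",
    "access_protocol": "HTTPS",
    "format": "text/csv",
    "license": "",  # empty: counts as missing
}

gaps = fair_gaps(record)
```

Here `gaps` reports that the vocabulary, license and provenance fields still need attention. A check like this only tests whether fields are present; it says nothing about whether the metadata is accurate or rich enough.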

Levels of FAIR and open Source: https://doi.org/10.3233/ISU-170824

Real FAIRness levels (for instance as monitored at Charité Hospital) https://quest-dashboard.charite.de

Data management Image source: The Turing Way http://doi.org/10.5281/zenodo.3695300 and much more…

Data management plan (DMP)
General: Creator, Affiliation, Template, ORCID, Abstract, Last modified, Copyright
Data collection: What data to collect/create? How will data be collected/created?
Documentation / metadata
How to manage ethical issues/consent? How to manage copyright/IPR?
Storage / backup: How will data be stored during research? How will you manage access and security?
Example with DCC template: https://dmponline.dcc.ac.uk/plans/75069/export.pdf?export%5Bquestion_headings%5D=true

Data management plan (DMP) tools https://dmponline.dcc.ac.uk/ https://dmptool.org/

Data policies
From: institutions (incl. ethics review boards), funders, journals, national academies, governments
On: integrity, ethics (incl. privacy), DMP, retention periods, FAIRness, openness

Data archives and repositories global, multidisciplinary United States field specific

Considerations when sharing data Format Metadata Choice of data repository License Costs Permissions Support Privacy-sensitive data Equity

Promoting reuse of data https://www.uu.nl/en/news/new-application-system-for-youth-data-great-example-of-team-science

Exploring tools and practices
Choose to explore either examples of specific research data sets that have been shared or overviews of archives for research data.
Explore specific examples of research data sets that have been shared: what do these look like, how could they be (re)used?
Art Museum Staff Demographic Survey; Hydrographic data from the Rio Grande Cone region; Deliverable 1.1.1.1 BEL-Float project; Supplementary dataset to self-learning training compared with instructor-led training
Explore overviews of archives for research data: what kind of archives are available, what would be relevant archives for my field or for researchers that we support?
Fairsharing.org list; NIH recommended data archives list; Re3data list

Exploring tools and practices: activity instructions
Examples of specific research data sets
Go to one of the data sets breakout groups. Group size is 4 maximum. Start with filling group D1, 5th person goes to D2 etc. If you end up alone in a group you may add yourself as a fifth person to a group.
As a group, decide which of the data sets below you want to see and click that link. You can likely do two at most. Scan the record and briefly look at the options of the platform as a whole. One person shares the screen. Talk aloud about what you see.
Art Museum Staff Demographic Survey, United States, 2015, 2018, 2022 (ICPSR 38196)
Hydrographic data from the Rio Grande Cone region, southwestern Atlantic Ocean in 2023
Deliverable 1.1.1.1 BEL-Float project | Dataset containing the results of numerical simulations (motions, forces) of the operational performance analysis - Part 9/9
Supplementary dataset to self-learning training compared with instructor-led training
Try to find out what kind of data you are looking at (i.e. what is measured and how), what the data format is, whether the data can indeed (potentially) be downloaded/accessed, what the license is, whether there is a meaningful description and whether there is a link to a publication describing the data or the research project that generated it. Can you think of obvious types of reuse?

Exploring tools and practices: activity instructions
Examples of overviews of data archives
Go to one of the archive overviews breakout groups. Group size is 4 maximum. Start with filling group A1, 5th person goes to A2 etc. If you end up alone in a group you may add yourself as a fifth person to a group.
As a group, decide which of the archive overviews below you want to explore and click that link. You can likely do two at most. One person shares the screen. Talk aloud about what you see.
Fairsharing.org list
NIH supported data archives list
RE3data list
Try to find out whether it is possible and easy to search for or to drill down to a set of archives for a specific discipline (e.g. neuroscience or economics). Also try to find out whether there is information on the quality status of included archives (e.g. indications of ‘certification’, a ‘seal’, ‘recommendation’ or evidence of applying certain standards or policies).

Discussion: monitoring, policies and support
Motivations and barriers
Effects of policies & mandates

Home assignment
Before our next session, formulate one or more potential actions at your organization to facilitate ‘open data’. These can be things that are being considered already, or fully new ideas.
Try to use the SMART rubric, identifying actions that are specific, measurable, achievable, relevant and time-bound.
Formulate actions along the lines of: “[Actor] could [action] for [audience]”
For example:
“The library will offer support with research data management plans”
“IT services will provide an overview of data repositories researchers can deposit their data in”
“We will include researchers’ data sets in our current research information system (CRIS)”

Next session: Open Research NISO training fall 2024 session 5: Open Access, October 31, 2024. Facilitated by Bianca Kramer & Jeroen Bosman