Data management planning is an essential step in preparing to launch a research project, but it often does not receive the careful consideration it deserves. External funders increasingly require research funding proposals to include detailed plans for how data will be accurately and effectively collected, maintained, preserved, and shared. Even without a funder requirement, sound data management planning improves the accuracy and efficiency of research data collection. This session from the Scholarly Communications Librarian at Sam Houston State University will walk step by step through the process of data management planning; participants will leave with an outline of their own plan and a list of useful resources.
Size: 949.44 KB
Language: en
Added: May 17, 2024
Slides: 21 pages
Slide Content
Introduction to Data Management Planning
Erin Owens
Professor, NGL, SHSU
The Importance of Having a Plan
•If your research involves data collection, you should have a plan for
how you will describe, collect, store, and share your data.
•Accuracy
•Efficiency
•Security
•Longevity
•Reproducibility
•Funder mandate / funding application requirement
Some Principles of Data Ethics
•Data ethics should underpin all planning
•At the center: Do No Harm
•FAIR principles set expectations for how
data is collected, documented, and shared
•CARE principles set expectations for data
governance by Indigenous Peoples
•See also Tribal engagement guidelines and
ABOR tribal consultation policy
•Delve deeper into Data Ethics
Five Major Questions a Data Management
Plan (DMP) Should Answer
1.What type of data will be produced?
2.How will it be organized and what standards will be used for
documentation and metadata?
3.What steps will be taken to protect privacy, security, confidentiality,
intellectual property or other rights?
4.If you allow others to reuse your data, how, where and when will the
data be accessed and shared?
5.Where will the data be archived and preserved and for how long?
Core Sections of a DMP
•Data Description
•Format(s)
•Metadata
•Ethics, Privacy, and Intellectual
Property
•Storage and Backup
•Access and Sharing
•Archiving and Preservation
In team projects, the
individual(s) responsible for
each step should be clearly
identified throughout
Data Description
•What data will be collected?
•One Excel spreadsheet of measurements; Ten interview transcripts in Google Docs;
A folder of 100 fMRI images and an accompanying table of patient demographics
•What's the scope and scale of the data?
•Volume/quantity of data, such as number of files and/or rows/records; Chronological
or geographical scope; Level of data (raw, de-identified, aggregated, summarized)
•Who do you expect the audience will be?
•This may affect what specific variables you collect, how you format and describe
them, etc., so this should be clarified up front
•Are there other existing data that are relevant to what you are collecting?
•This may help you decide what specific variables to collect (for interoperability and
comparison), where you may want to archive data, etc.
Format(s)
•What format(s) will you use for the submission, distribution, and
preservation?
•Preservation formats should be platform-independent and non-proprietary so
that data will be reusable in the future; for example:
•CSV, not Excel, for spreadsheets
•TXT or PDF/A, not Word, for text documents
•MP4, not QuickTime, for videos
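The format recommendations above can be turned into a quick check of a project folder's file names. The following is a minimal Python sketch, not an official tool: the mapping simply restates the three examples from the slide, and the specific file extensions (.xls/.xlsx, .doc/.docx, .mov) and the function name are assumptions added for illustration.

```python
from pathlib import Path
from typing import Optional

# Restates the slide's examples. The extensions chosen to represent
# Excel, Word, and QuickTime files are assumptions for this sketch.
PRESERVATION_ALTERNATIVES = {
    ".xls": "CSV",
    ".xlsx": "CSV",
    ".doc": "TXT or PDF/A",
    ".docx": "TXT or PDF/A",
    ".mov": "MP4",
}


def suggest_preservation_format(filename: str) -> Optional[str]:
    """Return a suggested preservation format, or None if no rule applies."""
    suffix = Path(filename).suffix.lower()
    return PRESERVATION_ALTERNATIVES.get(suffix)


if __name__ == "__main__":
    # Hypothetical file names, used only to show the output.
    for name in ["measurements.xlsx", "transcript_01.docx", "notes.txt"]:
        suggestion = suggest_preservation_format(name)
        if suggestion:
            print(f"{name}: consider a {suggestion} copy for preservation")
        else:
            print(f"{name}: no rule in this sketch")
```

A check like this only flags candidates; the actual conversion, and a review that nothing was lost in it, still has to be done per file type.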
Metadata
•What information is needed for the data to be read and interpreted
in the future?
•Minimum: Basic details to help a user find the data: title, creation date,
creator/contributors, conditions of access
•Recommended: Definitions of variables, vocabularies, units of measurement;
definitions of any coding; format(s) and file type(s)
•Optional: Methodology used, analytical and procedural information, assumptions
made
Metadata
•What metadata standards will be used?
•Dublin Core – basic, domain-agnostic, easy to understand and implement
•Also referred to as DCMI (Dublin Core Metadata Initiative)
•Find other options by discipline in the RDA Metadata Standards Catalog
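As an illustration of how lightweight Dublin Core can be, the Python sketch below writes a small record covering the "minimum" details listed earlier (title, creation date, creator/contributors, conditions of access). It uses the standard Dublin Core element namespace; the wrapper element name "metadata", the function name, and all sample values are assumptions made up for this example, and a given repository may expect a different wrapper or schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Namespace of the 15-element Dublin Core Metadata Element Set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_dc_record(fields: Dict[str, List[str]]) -> str:
    """Build a simple XML record with one dc: element per value."""
    # "metadata" is an arbitrary wrapper name chosen for this sketch.
    root = ET.Element("metadata")
    for element_name, values in fields.items():
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{element_name}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical values, not taken from a real dataset.
    record = build_dc_record({
        "title": ["Interview transcripts, example project"],
        "date": ["2024-05-17"],
        "creator": ["Doe, Jane"],
        "contributor": ["Roe, Richard"],
        "rights": ["De-identified data available on request"],
    })
    print(record)
```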
Metadata
•How will you capture / create
documentation and metadata?
•Data dictionary
•Codebook
•README text file
•See also Cornell University’s Guide to
writing “readme” style metadata
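A data dictionary can be as simple as a CSV file with one row per variable. The Python sketch below shows one possible layout; the column names, the function name, and the example variables are all assumptions for illustration, so adapt them to whatever your discipline or repository expects.

```python
import csv
import io

# Column layout is an assumption for this sketch, loosely following the
# "recommended" metadata: definitions, units, and any coding used.
FIELDNAMES = ["variable", "label", "type", "units", "allowed_values"]


def write_data_dictionary(entries, stream):
    """Write one CSV row per variable; missing fields become empty cells."""
    writer = csv.DictWriter(stream, fieldnames=FIELDNAMES)
    writer.writeheader()
    for entry in entries:
        writer.writerow({name: entry.get(name, "") for name in FIELDNAMES})


if __name__ == "__main__":
    # Hypothetical variables, used only to show the format.
    entries = [
        {"variable": "participant_code", "label": "Study-assigned code",
         "type": "string"},
        {"variable": "age_years", "label": "Age at interview",
         "type": "integer", "units": "years"},
        {"variable": "condition", "label": "Study arm", "type": "integer",
         "allowed_values": "1 = control; 2 = treatment"},
    ]
    buffer = io.StringIO()
    write_data_dictionary(entries, buffer)
    print(buffer.getvalue())
```

Writing the dictionary as CSV keeps the documentation itself in a non-proprietary format, consistent with the preservation advice for the data.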
Metadata
•Consider using Common Data Elements (standardized and precisely
defined question/response set, used systematically across different
studies to ensure consistency)
•Consider using standardized vocabulary where appropriate
•Medical Subject Headings (MeSH)
•Art and Architecture Thesaurus (AAT)
•United Nations Educational, Scientific and Cultural Organisation (UNESCO)
Thesaurus
•Find more: Guide from UT Austin
•Find more: Linked Open Vocabularies
Ethics, Privacy, Confidentiality
•How is informed consent handled? How is privacy protected?
•Evaluate sensitivity of data; consider if it contains direct or indirect identifiers
that could be combined with other public information to identify research
participants
•Protect privacy through anonymizing data
•Include a provision for sharing of de-identified results when obtaining informed
consent of research participants
•See also: Primer for Protecting Sensitive Data in Academic Research
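One small, mechanical part of de-identification is removing direct identifiers and replacing them with study codes. The Python sketch below shows that step only, under stated assumptions: the function name, the field names, and the "P001"-style codes are hypothetical, and the result is pseudonymized rather than fully anonymized. Indirect identifiers (for example age combined with location) still need the evaluation described above, and the returned key linking codes to people must be stored separately and securely, or destroyed.

```python
def deidentify(rows, direct_identifiers, id_field):
    """Drop direct identifier fields and swap id_field for a study code.

    Returns the cleaned rows plus the key that maps each original ID to
    its code. The key is itself sensitive and needs protected storage.
    """
    codes = {}
    cleaned = []
    for row in rows:
        original_id = row[id_field]
        if original_id not in codes:
            codes[original_id] = f"P{len(codes) + 1:03d}"
        kept = {
            name: value
            for name, value in row.items()
            if name not in direct_identifiers and name != id_field
        }
        cleaned.append({"study_code": codes[original_id], **kept})
    return cleaned, codes


if __name__ == "__main__":
    # Hypothetical records, made up for this example.
    rows = [
        {"student_id": "000111", "name": "Jane Doe", "age": "30"},
        {"student_id": "000222", "name": "Richard Roe", "age": "41"},
    ]
    cleaned, key = deidentify(rows, {"name"}, "student_id")
    print(cleaned)
```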
Intellectual Property
•Will any copyrighted materials be used/included?
•How will permission be obtained to use and/or disseminate that data?
•Can/will these rights be transferred to another organization for distribution and archiving?
•Who will own the rights to the data and information produced by the project?
•Know institutional policies regarding individual vs. institutional copyright control
•Note: New IP and Data policies from ORSP are in process at SHSU
•USA: Data is not copyrightable (but a specific expression might be, such as a table or chart)
•Data can be licensed: for example, some data providers license data to limit how it can be
used (to protect the privacy of study participants or guide downstream use of data)
•If you want to promote sharing and unlimited use of your data, you can make it available under
a CC0 Declaration to explicitly remove all restrictions, or other Creative Commons licenses
with selected restrictions
Storage and Backup
•Where and how will you store your data to ensure their safety (several copies
are recommended)?
•SHSU networked drive or cloud storage (frequent auto backup) is preferable to a USB drive
•How will data be managed during the project?
•Include information about version control and file-naming conventions
•Microsoft Office 365 provides default versioning; you still may want a system of downloading
backups on a set schedule and identifying versions
•2024_05_InterviewID572_Coding_v2.txt
•Store data in non-proprietary, open standard format for long-term readability
•File formats were covered earlier, but the storage medium also matters
•CDs and DVDs are not reliable in the long term: copy or migrate to new media 2-5 years after creation
•Check data integrity of stored data files at regular intervals
•Learn more about integrity (fixity) checking from the Digital Preservation Coalition
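Two of the practices above lend themselves to a short script: a consistent file-naming convention and periodic integrity (fixity) checks. The Python sketch below is one way to approach both, not a prescribed method. The name pattern is inferred from the single example file name on the slide, SHA-256 is an assumed choice of checksum, and all function names are hypothetical.

```python
import hashlib
from pathlib import Path


def versioned_name(year, month, subject, stage, version, ext="txt"):
    """Build a name like the slide's 2024_05_InterviewID572_Coding_v2.txt."""
    return f"{year:04d}_{month:02d}_{subject}_{stage}_v{version}.{ext}"


def sha256_of(path, chunk_size=65536):
    """Compute a file's SHA-256 checksum, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Map each file name directly inside folder to its checksum."""
    return {
        path.name: sha256_of(path)
        for path in sorted(Path(folder).iterdir())
        if path.is_file()
    }


def changed_files(folder, manifest):
    """List files from an earlier manifest that are now missing or altered."""
    current = build_manifest(folder)
    return sorted(
        name for name, checksum in manifest.items()
        if current.get(name) != checksum
    )


if __name__ == "__main__":
    print(versioned_name(2024, 5, "InterviewID572", "Coding", 2))
```

In practice the manifest would be saved when the data are first stored (for example as a text file kept alongside a backup copy) and changed_files would be run against it on a schedule; any file it reports should be restored from a known-good copy.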
Access and Sharing
•Some funders will require sharing of data (at some level), or a strong
justification for why nothing can be shared
•First make a decision regarding whether you can share ANY level of
the data – then make specific decisions about which level, where, and
under what conditions or restrictions
•How and when are you planning on archiving and sharing your data?
•Why did you choose this method?
•What terms of use do you have, if any?
Archiving and Preservation
•What procedures will you use to ensure long-term archiving and
preservation of your data?
•Local drive or storage device; In the cloud; In a dedicated data repository
•What are the budget costs of preparing data and documentation for
preservation?
•Managed at each step as part of your work? Hire someone to create
documentation afterwards? Hosting costs for a specific repository?
•Consider whether costs can be factored into your funding application budget
•Based on the item’s likely useful “lifetime,” will the data eventually be
deaccessioned and/or destroyed? If so, when and how?
Specific Funder Requirements
•Some funders may require
inclusion of specific details
•When preparing a DMP for
a funding application, be
sure to reference the
funder’s exact requirements
– Just like we tell our
students, read and follow
the instructions!
•Source Document (NSF-BIO)
Using DMP Tool
•Log in to DMPTool.org with SHSU credentials
•Access online DMP creation system
•Includes funder-specific templates and guiding
prompts/questions
•Collaborate on a plan with co-researchers
•Option to invite others (peers/colleagues, ORSP, or
SHSU Scholarly Communications Librarian) to review
your plan and provide feedback
Additional Resources
•SHSU Library Guide to Data Management Plans
•Includes links to example plans and requirements for selected funders
•ICPSR Framework for Creating a Data Management Plan
•Writing a DMP - Step by Step with Examples
•Data Management Plan Self-Assessment Tool
•Data Sharing
•Do I have an open data mandate from my funder? Search Sherpa Juliet
•How do I find an appropriate repository to share data? See the advice
and tools here
•Guidance regarding NIH Requirements for Data Management and
Sharing Plans (DMSP)
Thank You! Questions?
Erin Owens
Professor
Associate Director of Library Public Services
Scholarly Communications Librarian
Newton Gresham Library
Sam Houston State University