Data Science and AI in Biomedicine: The World has Changed
pebourne
50 slides
May 02, 2024
About This Presentation
Presentation to Cardiff University
Slide Content
Data Science and AI in Biomedicine: The World has Changed Philip E. Bourne PhD [email protected] https://www.slideshare.net/pebourne May 2, 2024 Cardiff University http://bournelab.org http://datascience.virginia.edu
My Perspective aka Biases AI User Practical Science Long-standing computational biomedical researcher Open Access Co-Founder and Founding Editor in Chief PLOS Computational Biology Open Knowledge First President of FORCE11 Data are Value Involved in FAIR Translation First Associate Vice Chancellor for Innovation and Industrial Alliances, UCSD Funders as Lever First Associate Director for Data Science, NIH – preprints, data sharing, BD2K, etc. Change Higher Ed Founding Dean School of Data Science, UVA
Further Bias I am privileged to be helping build a new kind of school within a traditional institution. I have drunk my own Kool-Aid
Why do I think the world has changed…. Because history repeats and in some cases amplifies itself
The 6 D’s (Peter Diamandis) Digitization Disruption Demonetization Dematerialization Democratization Time Volume, Velocity, Variety Digital media becomes bona fide form of communication Deception
Kodak – A 6D’s Case study Digital media becomes bona fide form of communication
Will History Repeat Itself? Digitization Deception Disruption Demonetization Dematerialization Democratization Time Volume, Velocity, Variety AI impact minimal Models reach human capacity Augmented reality, sensors Quantum computing Digital media becomes bona fide form of communication Learning modalities change Knowledge workers must adapt job market shifts Robotics Research practice changes What Will Academic Medicine Look Like & In What Time Frame?
https://en.wikipedia.org/wiki/Jim_Gray_(computer_scientist) https://www.microsoft.com/en-us/research/wp-content/uploads/2009/10/Fourth_Paradigm.pdf https://twitter.com/aip_publishing/status/856825353645559808 Science Drivers Over the Millennia
The Human Genome was the Tipping Point and Led the Way http://www.ornl.gov/hgmis High-throughput DNA digital data changed how we think about biomedicine Spawned a new field – bioinformatics / computational biology / systems biology / biomedical data science Spawned a multibillion-dollar industry
Bourne’s Timeline 1980s 1990s 2000s 2010s 2020s The Discipline (Whatever it is Called) Unknown Expt. Driven Emergent Over-sold A Service A Partner The Driver Digital Data Systems Analytics Design Value 4 Pillars of Data Science HPC Cloud GPUs HMMs SVMs NNs CNNs LLMs HIPAA Privacy Security HITECH Mol Graphics Web 2.0 Dashboards
Basic Premise ….. We are at a new tipping point
Basic Premise … “We need to be more aware than ever of developments that may be far outside our discipline that fall under the broad topic of data science. In short, we need to become biomedical data scientists.” Stated another way, the leadership role in data/informatics afforded by the human genome project no longer applies.
Data Science (including AI) – In 45+ Years in Academia I Have Never Seen Anything Like It It is a response to the digital transformation of society It is touching every discipline (aka vertical) We can’t keep the students out of our classes Cause – large amounts of digital data Effect – interdisciplinarity, openness, translation, search for responsibility and more In summary, it is disruptive to current modes of biomedical research
A Data Integration Poster Child Researcher and Assistant Professor of Medicine Dr. Thomas Hartka, also a current online Master’s in Data Science student, is combining two disparate data sets (electronic health records and DMV crash data) to save lives after motor vehicle crashes. “I enrolled in the MSDS program to expand my research on automotive safety. I have already used techniques from classes in my work. I hope to expand my research to real-time analytics to improve emergency room care.” (Dr. Thomas Hartka, UVA School of Medicine)
Data Science As a Driver It’s Just the Beginning…. https://zenodo.org/records/7768414 Data scientist jobs are predicted to experience 36 percent growth between 2021 and 2031, according to the US Bureau of Labor Statistics. The global data science platform market size was valued at USD 64.14 billion in 2021 and is projected to grow from USD 81.47 billion in 2022 to USD 484.17 billion by 2029, exhibiting a CAGR of 29.0% during the forecast period. Data science is the fastest emerging field around the globe. 57 Member Institutions
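The growth rate quoted on this slide can be checked with the standard compound annual growth rate formula. The sketch below is only an arithmetic illustration: it uses the 2022 and 2029 market figures from the slide, and the function and variable names are made up for this example.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.29 means 29%)."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted on the slide, in USD billions.
market_2022 = 81.47
market_2029 = 484.17

# Seven compounding periods separate 2022 and 2029.
rate = cagr(market_2022, market_2029, years=2029 - 2022)
print(f"Implied CAGR 2022-2029: {rate:.1%}")
```

The implied rate comes out at roughly 29%, consistent with the figure on the slide.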
Given these precedents about data and data science we should start with a definition/framework
Big data and data science are like the Internet… If I asked you to define them you would all say something different, yet you use them every day… http://vadlo.com/cartoons.php?id=357
One Definition of Data Science – The 4+1 Model (aka domains) Value – assuring societal benefit Design – communication of the value of data Systems – the means to communicate and convey benefit Analytics – models and methods Practice – where everything happens [Raf Alvarado & Phil Bourne https://doi.org/10.1142/9789811265679_0004]
The Data Science Interplay Value + Design = openness, responsibility Value + Analytics = human-centered AI, algorithmic bias Value + Systems = sustainability, access, environmental impact Design + Analytics = literate programming, visualization Design + Systems = dashboards, engineering design Analytics + Systems = ML engineering Thinking of data as a science unto itself is novel and perhaps controversial [Raf Alvarado & Phil Bourne https://doi.org/10.1142/9789811265679_0004]
With this definition let’s explore the implications for biomedical research …
The 4+1 Model – Systems Value – assuring societal benefit Design – communication of the value of data Systems – the means to communicate and convey benefit Analytics – models and methods Practice – where everything happens [Raf Alvarado & Phil Bourne https://doi.org/10.1142/9789811265679_0004]
Systems…. Need something akin to the electricity grid or banking system Need to consider data and methods as first-class data objects Examples: European Open Science Cloud (EOSC), the CS3MESH4EOSC Science Mesh, the China Science and Technology (CST) Cloud, the African Open Science Platform, the South African National Integrated Cyber Infrastructure System, the Malaysia Open Science Platform, the Global Open Science Cloud (GOSC), the Australian Research Data Commons (ARDC) Nectar Research Cloud, the Digital Research Alliance of Canada (formerly known as the New Digital Research Infrastructure Organization), and the Arab States Research and Education Network. Problems span funding agencies; solutions do not There is a lack of public-private partnership
Analytics ….
AlphaGo – Take Home Messages https://www.alphagomovie.com/ Even the programmers were disquieted by creating something better than any human AlphaGo made a move that no human Go expert nor programmer anticipated It takes a lot of resources to defeat the world champion Go has more possible board positions than there are atoms in the observable universe
A 300-residue protein has ~20**300 possible sequences, also more than the number of atoms in the universe
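To get a feel for the size of 20**300, the sketch below computes it exactly with Python integers and compares it with the number of atoms in the observable universe. The 10**80 atom count is an assumption (a commonly quoted order-of-magnitude estimate, not a figure from the slides), and the constant names are chosen only for this example.

```python
import math

AMINO_ACIDS = 20        # size of the standard amino acid alphabet
CHAIN_LENGTH = 300      # residues, the length implied by 20**300
ATOMS_LOG10 = 80        # assumption: ~10**80 atoms in the observable universe

# Python integers are arbitrary precision, so this is exact.
sequences = AMINO_ACIDS ** CHAIN_LENGTH

# log10 makes the magnitude readable without printing hundreds of digits.
sequences_log10 = CHAIN_LENGTH * math.log10(AMINO_ACIDS)

print(f"20**300 is about 10**{sequences_log10:.0f}")
print(f"That exceeds 10**{ATOMS_LOG10} by about {sequences_log10 - ATOMS_LOG10:.0f} orders of magnitude")
```

Exhaustive enumeration of sequence space is therefore out of reach at any scale of computing, which is part of why learned models matter here.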
AlphaFold2 Numerical optimization – differentiable programming Overall gradient descent trained to win CASP Jumper et al., 2021. Nature, 596(7873), pp. 583-589 Transformer models using attention Geometry invariant to translation/rotation
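The last point, geometry invariant to translation and rotation, can be illustrated with a toy example: the distances between pairs of points do not change when the whole structure is rotated and shifted. This is not AlphaFold2's actual mechanism, only a minimal sketch of what invariance means. The coordinates, the rotation angle, and the helper names are all hypothetical.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Distances between every pair of 3D points, in a fixed order."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]


def rotate_z_then_translate(point, angle, shift):
    """Rotate a point about the z axis by `angle` radians, then shift it."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])


# Hypothetical coordinates standing in for four atoms.
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0), (0.0, 1.5, 1.0)]

# Apply the same rigid motion to every point.
moved = [
    rotate_z_then_translate(p, math.radians(40), (10.0, -3.0, 2.5))
    for p in coords
]

before = pairwise_distances(coords)
after = pairwise_distances(moved)

# The absolute coordinates changed, but the internal geometry did not.
invariant = all(math.isclose(a, b, rel_tol=1e-9) for a, b in zip(before, after))
print("Pairwise distances preserved:", invariant)
```

A model built on quantities like these gives the same answer no matter how the input structure happens to be positioned in space.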
Logistics Behind the Win Nothing fundamentally new from an AI perspective Data Integration Collaboration not competition Engineering challenge beyond most labs Compute power beyond most labs Team size beyond most labs Worked with protein structure specialists
Downstream Implications Cooperation rather than competition Public-private partnership Translational possibilities are endless Made possible by curated open data Appreciate engineering
Scientific Implications
Exploration of Latent Space Rethink fold space? Rethink classification schemes?
AI Analytics Across the Scientific Discovery Process From Yolanda Gil 2023 AI for Science Eds. Choudhary, Fox & Hey p699
The 4+1 Model – Design Value – assuring societal benefit Design – communication of the value of data Systems – the means to communicate and convey benefit Analytics – models and methods Practice – where everything happens [Raf Alvarado & Phil Bourne https://doi.org/10.1142/9789811265679_0004]
Questions I Leave You With …. Are we indeed at a change point? Will biomedicine continue to lead data science? Do we need new models for doing science? Are we placing the right emphasis on our research products, notably data and methods vs papers?
Questions?
Databases organize data around a project. Data warehouses organize the data for an organization. Data commons organize the data for a scientific discipline or field. Data Warehouse Migratory Pressure Data Ecosystems How we think about our infrastructure is important
Challenges Fixed level of funding Opportunities data commons Data commons co-locate data with cloud computing infrastructure and commonly used software services, tools & apps for managing, analyzing and sharing data to create an interoperable resource for the research community.* *Robert L. Grossman, Allison Heath, Mark Murphy, Maria Patterson and Walt Wells, A Case for Data Commons: Toward Data Science as a Service, Computing in Science & Engineering, 2016. Source of image: The CDIS, GDC, & OCC data commons infrastructure at a University of Chicago data center. Bonazzi VR, Bourne PE (2017) Should biomedical research be like Airbnb? PLoS Biol 15(4): e2001818. Systems [Adapted from Bob Grossman]
But wait the picture is more complicated….
Data Science versus Data Engineering – How Much Emphasis Where?
Coming back to the question… So we have a definition of data science and we have a set of guiding principles, where does this take us? Stated another way, what do we want to be recognized for in 10 years? https://pebourne.wordpress.com/
Research ethics committees (RECs) review the ethical acceptability of research involving human participants. Historically, the principal emphases of RECs have been to protect participants from physical harms and to provide assurance as to participants’ interests and welfare.* [The Framework] is guided by Article 27 of the 1948 Universal Declaration of Human Rights. Article 27 guarantees the rights of every individual in the world "to share in scientific advancement and its benefits" (including to freely engage in responsible scientific inquiry) … * Protect human subject data The right of human subjects to benefit from research. *GA4GH Framework for Responsible Sharing of Genomic and Health-Related Data, see goo.gl/CTavQR Data sharing with protections provides the evidence so patients can benefit from advances in research. Balance protecting human subject data with open research that benefits patients [Adapted from Bob Grossman] Value
Why Responsible Data Science? A defining feature A partnership between STEM, social sciences and the humanities Where UVA has strength
Model Transportability Horizontal Integration Multi-scale Integration human mouse zebrafish DNA Gene/Protein Network Cell Tissue Organ Body Population CNV Machine learning Statistics Biophysical Mathematics Semantic SNP methylation 3D structure Gene expression Proteomics Metabolomics Metabolic Signaling transduction Gene regulation Hepatic Myoepithelial Erythrocyte Epithelial Muscle Nervous Liver Kidney Pancreas Heart Physiologically based pharmacokinetics GWAS Population dynamics Microbiota From Harnessing Big Data for Systems Pharmacology 2017 https://doi.org/10.1146/annurev-pharmtox-010716-104659 EHRs Imaging Numerous Databases Environment Current roadblocks are more cultural than technical The Fifth Paradigm: Integration Across Scales?
Gohlke et al. 2022 https://onlinelibrary.wiley.com/doi/10.1002/ctm2.726 Real World Evidence for Preventive Effects of Statins on Cancer Incidence: A Transatlantic Analysis EHR Animal Models Pathways
Daily Challenges Deciding what not to do Competition for the best team members (faculty and staff) Establishing a diverse team Lack of a comprehensive enterprise-wide data infrastructure It’s easier to conform