Data Science and AI in Biomedicine: The World has Changed

pebourne 130 views 50 slides May 02, 2024

About This Presentation

Presentation to Cardiff University


Slide Content

Data Science and AI in Biomedicine: The World has Changed
Philip E. Bourne PhD
https://www.slideshare.net/pebourne
May 2, 2024, Cardiff University
http://bournelab.org http://datascience.virginia.edu

My Perspective aka Biases
AI: User
Practical Science: Long-standing computational biomedical researcher
Open Access: Co-Founder and Founding Editor in Chief, PLOS Computational Biology
Open Knowledge: First President of FORCE11
Data are Value: Involved in FAIR
Translation: First Associate Vice Chancellor for Innovation and Industrial Alliances, UCSD
Funders as Lever: First Associate Director for Data Science, NIH (preprints, data sharing, BD2K, etc.)
Change Higher Ed: Founding Dean, School of Data Science, UVA

Further Bias: I am privileged to be helping build a new kind of school within a traditional institution. I have drunk my own Kool-Aid.

Why do I think the world has changed? Because history repeats and, in some cases, amplifies itself.

The 6 D's (Peter Diamandis): Digitization, Deception, Disruption, Demonetization, Dematerialization, Democratization
[Figure: the six stages plotted against Time. Labels: Volume, Velocity, Variety; Digital media becomes a bona fide form of communication]

Kodak: A 6 D's Case Study
[Figure label: Digital media becomes a bona fide form of communication]

Will History Repeat Itself?
Digitization, Deception, Disruption, Demonetization, Dematerialization, Democratization
[Figure: the six stages plotted against Time. Labels: Volume, Velocity, Variety; AI impact minimal; Models reach human capacity; Augmented reality, sensors; Quantum computing; Digital media becomes a bona fide form of communication; Learning modalities change; Knowledge workers must adapt; Job market shifts; Robotics; Research practice changes]
What Will Academic Medicine Look Like & In What Time Frame?

Science Drivers Over the Millennia
https://en.wikipedia.org/wiki/Jim_Gray_(computer_scientist)
https://www.microsoft.com/en-us/research/wp-content/uploads/2009/10/Fourth_Paradigm.pdf
https://twitter.com/aip_publishing/status/856825353645559808

The Human Genome was the Tipping Point and Led the Way
http://www.ornl.gov/hgmis
High-throughput DNA digital data changed how we think about biomedicine.
Spawned a new field: bioinformatics / computational biology / systems biology / biomedical data science.
Spawned a multibillion-dollar industry.

Bourne's Timeline [figure]
Decades: 1980s, 1990s, 2000s, 2010s, 2020s
The Discipline (Whatever it is Called): Unknown, Expt. Driven, Emergent, Over-sold, A Service, A Partner, The Driver
Digital Data
4 Pillars of Data Science: Systems, Analytics, Design, Value
HPC, Cloud, GPUs
HMMs, SVMs, NNs, CNNs, LLMs
HIPAA, Privacy, Security, HITECH
Mol Graphics, Web 2.0, Dashboards

Basic Premise: We are at a new tipping point.

Basic Premise … “We need to be more aware than ever of developments that may be far outside our discipline that fall under the broad topic of data science. In short, we need to become biomedical data scientists.” Stated another way, the leadership role in data/informatics afforded by the human genome project no longer applies.

Data Science (including AI): In 45+ Years in Academia I Have Never Seen Anything Like It
It is a response to the digital transformation of society.
It is touching every discipline (aka vertical).
We can't keep the students out of our classes.
Cause: large amounts of digital data.
Effect: interdisciplinarity, openness, translation, search for responsibility and more.
In summary, it is disruptive to current modes of biomedical research.

A Data Integration Poster Child
Researcher and Assistant Professor of Medicine Dr. Thomas Hartka, also a current online Master's in Data Science student, is combining two disparate data sets (electronic health records and DMV crash data) to save lives after motor vehicle crashes.
“I enrolled in the MSDS program to expand my research on automotive safety. I have already used techniques from classes in my work. I hope to expand my research to real-time analytics to improve emergency room care.” (Dr. Thomas Hartka, UVA School of Medicine)

Data Science As a Driver: It's Just the Beginning…. https://zenodo.org/records/7768414
Data scientist jobs are predicted to experience 36 percent growth between 2021 and 2031, according to the US Bureau of Labor Statistics.
The global data science platform market size was valued at USD 64.14 billion in 2021 and is projected to grow from USD 81.47 billion in 2022 to USD 484.17 billion by 2029, exhibiting a CAGR of 29.0% during the forecast period.
Data science is the fastest emerging field around the globe.
57 Member Institutions
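The market projection on this slide can be sanity-checked with the standard compound annual growth rate formula. The sketch below is only an arithmetic check of the quoted figures, assuming the forecast period runs the seven years from 2022 to 2029; the variable names are illustrative and not taken from the cited report.

```python
# Figures quoted on the slide (USD billions).
start_value = 81.47   # 2022
end_value = 484.17    # 2029
years = 2029 - 2022   # assumed length of the forecast period

# CAGR = (end / start) ** (1 / years) - 1
cagr = (end_value / start_value) ** (1 / years) - 1

# Going the other way: grow the 2022 value at the quoted 29.0% per year.
projected = start_value * (1 + 0.29) ** years

print(f"Implied CAGR: {cagr:.1%}")
print(f"2029 value at 29.0% growth: USD {projected:.2f} billion")
```

Under that assumption the implied rate rounds to the quoted 29.0%, so the three numbers on the slide are mutually consistent.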

Given these precedents about data and data science, we should start with a definition/framework.

Big data and data science are like the Internet… If I asked you to define them you would all say something different, yet you use them every day… http://vadlo.com/cartoons.php?id=357

One Definition of Data Science: The 4+1 Model (aka domains)
Value: assuring societal benefit
Design: communication of the value of data
Systems: the means to communicate and convey benefit
Analytics: models and methods
Practice: where everything happens
[Raf Alvarado & Phil Bourne https://doi.org/10.1142/9789811265679_0004]

The Data Science Interplay
Value + Design = openness, responsibility
Value + Analytics = human-centered AI, algorithmic bias
Value + Systems = sustainability, access, environmental impact
Design + Analytics = literate programming, visualization
Design + Systems = dashboards, engineering design
Analytics + Systems = ML engineering
Thinking of data as a science unto itself is novel and perhaps controversial.
[Raf Alvarado & Phil Bourne https://doi.org/10.1142/9789811265679_0004]

With this definition let’s explore the implications for biomedical research …

The 4+1 Model - Systems
Value: assuring societal benefit
Design: communication of the value of data
Systems: the means to communicate and convey benefit
Analytics: models and methods
Practice: where everything happens
[Raf Alvarado & Phil Bourne https://doi.org/10.1142/9789811265679_0004]

Systems…. Science, 377(6603). DOI: 10.1126/science.abo5947

Systems….
Need something akin to the electricity grid or banking system.
Need to consider data and methods as first-class data objects.
Examples: the European Open Science Cloud (EOSC), the CS3MESH4EOSC Science Mesh, the China Science and Technology (CST) Cloud, the African Open Science Platform, the South African National Integrated Cyber Infrastructure System, the Malaysia Open Science Platform, the Global Open Science Cloud (GOSC), the Australian Research Data Commons (ARDC) Nectar Research Cloud, the Digital Research Alliance of Canada (formerly known as the New Digital Research Infrastructure Organization), and the Arab States Research and Education Network.
Problems span funding agencies; solutions do not.
There is a lack of public-private partnership.

Analytics ….

AlphaGo: Take Home Messages https://www.alphagomovie.com/
Even the programmers were disquieted by creating something better than any human.
AlphaGo made a move that no human Go expert or programmer anticipated.
It takes a lot of resources to defeat the world champion.
Go has more possible board positions than there are atoms in the universe.

Proteins have ~20**300 combinations (20 amino acids at each of ~300 positions), also more than the number of atoms in the universe.
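To put 20**300 in perspective, the comparison can be worked out directly. This is a rough order-of-magnitude sketch: the figure of about 10**80 atoms in the observable universe is a commonly quoted estimate assumed here, not a number from the slides, and the constant names are illustrative.

```python
import math

AMINO_ACIDS = 20               # standard amino acid alphabet
CHAIN_LENGTH = 300             # chain length used on the slide
ATOMS_IN_UNIVERSE_LOG10 = 80   # assumed estimate: ~10**80 atoms

# Python integers are arbitrary precision, so the count is exact.
sequences = AMINO_ACIDS ** CHAIN_LENGTH

# Work in log10 so the size is easy to compare.
log10_sequences = CHAIN_LENGTH * math.log10(AMINO_ACIDS)

print(f"20**300 is about 10**{log10_sequences:.1f}")
print(f"That exceeds 10**80 by a factor of about "
      f"10**{log10_sequences - ATOMS_IN_UNIVERSE_LOG10:.1f}")
```

The gap is itself hundreds of orders of magnitude, which is why exhaustive enumeration of sequences is not an option.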

Science Games…. https://medium.com/proteinqure/welcome-into-the-fold-bbd3f3b19fdd

AlphaFold2 Makes Significant Leap

AlphaFold2
Numerical optimization: differentiable programming
Overall, gradient descent trained to win CASP
Transformer models using attention
Geometry invariant to translation/rotation
[Jumper et al., 2021. Nature, 596(7873), pp. 583-589]
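The point about geometry being invariant to translation and rotation can be illustrated with a toy example. The sketch below is not AlphaFold2's mechanism; it only shows one simple invariant quantity, the set of pairwise distances, on made-up coordinates. The helper names (pairwise_distances, rotate_z, translate) and the coordinates are hypothetical.

```python
import math
from itertools import combinations

def pairwise_distances(points):
    """Euclidean distance between every pair of 3D points."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]

def rotate_z(point, angle):
    """Rotate a 3D point about the z axis by angle (radians)."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)

def translate(point, shift):
    """Shift a 3D point by a fixed offset."""
    return tuple(a + b for a, b in zip(point, shift))

# Made-up coordinates standing in for four atoms.
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0), (0.0, 1.5, 1.0)]

# Apply a rigid motion: rotate 40 degrees, then shift.
moved = [translate(rotate_z(p, math.radians(40)), (5.0, -2.0, 3.0))
         for p in coords]

original = pairwise_distances(coords)
transformed = pairwise_distances(moved)

# Raw coordinates change, but every pairwise distance is preserved.
invariant = all(math.isclose(a, b, abs_tol=1e-9)
                for a, b in zip(original, transformed))
print(invariant)
```

A model whose outputs depend only on such invariant quantities does not need to learn the same structure separately for every possible orientation.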

Logistics Behind the Win
Nothing fundamentally new from an AI perspective
Data integration
Collaboration, not competition
Engineering challenge beyond most labs
Compute power beyond most labs
Team size beyond most labs
Worked with protein structure specialists

Downstream Implications
Cooperation rather than competition
Public-private partnership
Translational possibilities are endless
Made possible by curated open data
Appreciate engineering

Scientific Implications

Exploration of Latent Space: Rethink fold space? Rethink classification schemes?

AI Analytics Across the Scientific Discovery Process
[From Yolanda Gil, 2023, in AI for Science, eds. Choudhary, Fox & Hey, p. 699]

The 4+1 Model - Design
Value: assuring societal benefit
Design: communication of the value of data
Systems: the means to communicate and convey benefit
Analytics: models and methods
Practice: where everything happens
[Raf Alvarado & Phil Bourne https://doi.org/10.1142/9789811265679_0004]

Questions I Leave You With….
Are we indeed at a change point?
Will biomedicine continue to lead data science?
Do we need new models for doing science?
Are we placing the right emphasis on our research products, notably data and methods vs papers?

Questions?

Databases organize data around a project.
Data warehouses organize the data for an organization.
Data commons organize the data for a scientific discipline or field.
[Figure labels: Data Warehouse, Migratory Pressure, Data Ecosystems]
How we think about our infrastructure is important.

Systems
Challenges: fixed level of funding
Opportunities: data commons
Data commons co-locate data with cloud computing infrastructure and commonly used software services, tools & apps for managing, analyzing and sharing data to create an interoperable resource for the research community.*
*Robert L. Grossman, Allison Heath, Mark Murphy, Maria Patterson and Walt Wells, A Case for Data Commons: Towards Data Science as a Service, IEEE Computing in Science and Engineering, 2016.
Source of image: The CDIS, GDC, & OCC data commons infrastructure at a University of Chicago data center.
Bonazzi VR, Bourne PE (2017) Should biomedical research be like Airbnb? PLoS Biol 15(4): e2001818.
[Adapted from Bob Grossman]

But wait, the picture is more complicated….

Data Science versus Data Engineering – How Much Emphasis Where?

Coming back to the question…
So we have a definition of data science and we have a set of guiding principles. Where does this take us?
Stated another way, what do we want to be recognized for in 10 years?
https://pebourne.wordpress.com/

Value
Research ethics committees (RECs) review the ethical acceptability of research involving human participants. Historically, the principal emphases of RECs have been to protect participants from physical harms and to provide assurance as to participants’ interests and welfare.*
[The Framework] is guided by Article 27 of the 1948 Universal Declaration of Human Rights. Article 27 guarantees the rights of every individual in the world "to share in scientific advancement and its benefits" (including to freely engage in responsible scientific inquiry) …*
Protect human subject data.
The right of human subjects to benefit from research.
Data sharing with protections provides the evidence so patients can benefit from advances in research.
Balance protecting human subject data with open research that benefits patients.
*GA4GH Framework for Responsible Sharing of Genomic and Health-Related Data, see goo.gl/CTavQR
[Adapted from Bob Grossman]

Why Responsible Data Science?
A defining feature
A partnership between STEM, social sciences and the humanities
Where UVA has strength

The Fifth Paradigm: Integration Across Scales?
[Figure from Harnessing Big Data for Systems Pharmacology, 2017, https://doi.org/10.1146/annurev-pharmtox-010716-104659]
Dimensions: Model Transportability; Horizontal Integration; Multi-scale Integration
Organisms: human, mouse, zebrafish
Scales: DNA, Gene/Protein, Network, Cell, Tissue, Organ, Body, Population
Methods: Machine learning, Statistics, Biophysical, Mathematics, Semantic
Other labels: CNV, SNP, methylation, 3D structure, Gene expression, Proteomics, Metabolomics, Metabolic, Signaling transduction, Gene regulation, Hepatic, Myoepithelial, Erythrocyte, Epithelial, Muscle, Nervous, Liver, Kidney, Pancreas, Heart, Physiologically based pharmacokinetics, GWAS, Population dynamics, Microbiota
Additional sources: EHRs, Imaging, Numerous Databases, Environment
Current roadblocks are more cultural than technical.

Real World Evidence for Preventive Effects of Statins on Cancer Incidence: A Transatlantic Analysis
Gohlke et al. 2022 https://onlinelibrary.wiley.com/doi/10.1002/ctm2.726
[Figure labels: EHR, Animal Models, Pathways]

Daily Challenges
Deciding what not to do
Competition for the best team members (faculty and staff)
Establishing a diverse team
Lack of a comprehensive enterprise-wide data infrastructure
It's easier to conform