dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data and Workflow-Based Analyses" 04/26/2024

dkNET 35 views 21 slides May 02, 2024

About This Presentation

Presenter: Chen Li, PhD. Professor, Department of Computer Science, University of California Irvine

Abstract
Many data analytics projects have collaborators with complementary backgrounds, including biologists, bioinformaticians, computer scientists, and AI/ML experts. Many of them have limited exp...


Slide Content

Prof. Chen Li
Computer Science, UC Irvine
April 26, 2024
Texera: A Scalable Cloud Computing Platform for Sharing Data and Workflow-Based Analyses
1

2
Alice: Biologist (PI)
Sally: Bioinformatician
Bob: Bioinformatician
Chen Li, UCI
Example: sequence analysis in biology

[Figure: sequence analysis pipeline with Steps 1-7. Input and preprocessing labels: FASTQ Sequences, Cellranger, Count Matrix. Stage labels: Quality Control, Data Integration, Normalization (Step 3), Dimensionality Reduction, Clustering, Cluster Annotation, Trajectory analysis. Other labels: differential "gene" expression analysis, AI/ML analysis.]
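
To make two of the pipeline stages named above concrete, here is a minimal Python sketch of quality control and normalization on a count matrix. It is an illustration only, not the presenter's pipeline or a Texera operator: the toy matrix, the `min_total` threshold, the 10,000 scale factor, and the function names are all assumptions. The normalization shown is the common "scale each cell to a fixed total, then log(1 + x)" approach.

```python
import math

# Toy count matrix (made-up values): rows are cells, columns are genes.
counts = {
    "cell_a": [10, 0, 5, 85],
    "cell_b": [0, 0, 1, 0],
    "cell_c": [20, 30, 0, 50],
}

def quality_control(matrix, min_total=10):
    """Keep only cells whose total count reaches min_total (hypothetical threshold)."""
    return {cell: row for cell, row in matrix.items() if sum(row) >= min_total}

def normalize(matrix, scale=10_000):
    """Scale each cell to `scale` total counts, then apply log(1 + x)."""
    normalized = {}
    for cell, row in matrix.items():
        total = sum(row)
        if total == 0:
            # An empty cell has nothing to scale; keep it as all zeros.
            normalized[cell] = [0.0] * len(row)
            continue
        normalized[cell] = [math.log1p(value / total * scale) for value in row]
    return normalized

filtered = quality_control(counts)
normalized = normalize(filtered)
for cell, row in normalized.items():
    print(cell, [round(v, 2) for v in row])
```

Later stages such as dimensionality reduction and clustering would take the normalized matrix as input; they are left out here because they normally rely on dedicated libraries.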

-Coding is hard!
-Version control of libraries
-Needs servers
-Slow on large data
-Not every lab can afford a bioinformatician
4
Data preparation
Coding challenges
Data analytics
Visualization
Sally: Bioinformatician

Collaboration challenges
●Collaborators of different backgrounds:
○Biologists
○Bioinformaticians
○Computer scientists
●Collaborators from different organizations
○Same lab: senior students vs new students
○Other labs
5

Collaboration: existing tools
Limitations
●Only file management, no run-time environment
●Inefficient!
6

AI/ML opportunities
●How to utilize state-of-the-art AI/ML technologies?
●Require advanced coding skills
●Not easily available
7

Our solution
Cloud-computing services for sharing data and workflow-based analyses
Benefits:
-Cloud services (no installation, software patches)
-Version control
-Shared editing/execution
-Sharing data and workflows
-Parallel engine, scalable
-…
8

Open source
9

Texera example workflow
10

Demo!
11
Alice: Biologist (PI)
Sally: Bioinformatician
Bob: Bioinformatician

Figures on the entire dataset
12
Quality Control
Elbow plot
Clustered UMAP
Annotated UMAP

Texera Statistics
13
# of user accounts: 332
# of projects: 86
# of workflows: 2,257
# of executions: 31,000
# of workflow versions: 357,000
# of publications: 23
# of deployed servers: 7
# of CPU cores in the largest deployment: 400
# of files on GitHub: 1,291
# of lines of code on GitHub: 101,690
# of pull requests on GitHub: 2,096
# of current PhD students: 7
# of collaborating professors: 17
# of involved undergraduates: 80+
# of completed PhD theses: 3
# of development years: 7

Example: analyzing brain images, 256GB
14

Prof. Chen Li
Teaching non-STEM students AI/ML using Texera
15

Prof. Chen Li
High-school students using Texera
16

New NIH award (dkNET)
Mission: to serve the diabetes, endocrinology, and metabolic diseases research communities through the FAIR sharing of data and knowledge.
17

Pilot project: inviting users
18

Open research problems
-Support a ChatGPT-like interface
-Provide more operators and workflows related to sequencing
-Make analysis parameters configurable
-Parallelize bottleneck steps
-Make more AI/ML techniques available
-Migrate existing programs to workflows
-Support public clouds (e.g., AWS, GCP)
-…
19

Summary
-Cloud-computing platform
-GUI-based workflows (no coding needed)
-Collaboration and sharing of data/analyses
-Parallel computing: for big data
-Supporting multiple languages: Python, R, Java, …
-Supporting AI/ML (training, inference, …)
20

Prof. Chen Li
Computer Science, UC Irvine
Texera: A Scalable Cloud Computing Platform for Sharing Data and Workflow-Based Analyses
21
Acknowledgements: Yicong Huang, Sally Lee, Xinyuan Lin, Xiaozhen Liu, Kun Woo (Chris) Park, Kevin Wu, and the Texera team