Data Mining: Principle and Application in Biology

ShibsekharRoy1 8 views 13 slides Jun 29, 2024

Data Mining

WHAT KIND OF INFORMATION ARE YOU MINING?
●How can one find all the members of a human gene family?
●For a given protein, how can one determine whether it contains any functional domains of interest?
●How does one find a gene of interest and determine that gene's structure, and how does one easily examine other genes in that same region?

DATA MINING
•Uses informatics and statistics
•Helps extract information from a huge amount of data
•Now accessible to everyone

Data
•Publicly available from the Lambert Lab at
http://lambertlab.uams.edu/publicdata.htm
•105 samples run on the Affymetrix HuGeneFL array
•74 myeloma samples
•31 normal samples

Three main genome data browsers
I. University of California, Santa Cruz (UCSC) Genome Browser (http://genome.ucsc.edu/)
II. National Center for Biotechnology Information's Map Viewer (http://www.ncbi.nlm.nih.gov/)
III. European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI) Ensembl (http://www.ensembl.org)

3 levels in data mining
I. Single-query analysis (-> genome browser)
II. Selection of a set of genes that meet a criterion (-> "sister programs")
III. More in-depth analysis (-> R/Bioconductor, biomaRt, ...)

The genome browsers: UCSC & Ensembl
I. UCSC (University of California, Santa Cruz)
●Gene Sorter
●Table Browser
II. Ensembl
●BioMart

How toolboxes work
UCSC
Gene Sorter
•Explore gene families and the relationships among genes
•Select genes based on several characteristics
Table Browser
•Query data using the database structure
Ensembl
BioMart
•Database reorganised for easier data mining

Common Approaches
•Comparing two measurements at a time
•Person 1, gene G: 1000
•Person 2, gene G: 3200
•Greater than 3-fold change: flag this gene
•Comparing one measurement with a population of
measurements… is it unlikely that the new
measurement was drawn from same distribution?
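
The two comparisons on this slide can be sketched in a few lines of Python. This is only an illustration: the reference population values and the cutoff of 3 standard deviations are assumptions made for the example, not figures from the slides, and a z-score is just one simple way to ask whether a measurement is unlikely under the reference distribution.

```python
from statistics import mean, stdev

def fold_change(a, b):
    """Ratio of the larger measurement to the smaller one (always >= 1)."""
    low, high = sorted((a, b))
    return high / low  # assumes both measurements are positive

def z_score(value, population):
    """Distance of value from the population mean, in sample standard deviations."""
    return (value - mean(population)) / stdev(population)

# Two measurements of gene G, taken from the slide.
fc = fold_change(1000, 3200)
flag_pair = fc > 3  # 3-fold threshold from the slide

# Hypothetical reference population for gene G (made-up values).
reference = [900, 1000, 1100, 950, 1050]
z = z_score(3200, reference)
flag_population = abs(z) > 3  # assumed cutoff, chosen only for illustration
```

With real microarray data, expression values are usually log-transformed first, and testing thousands of genes at once calls for a multiple-testing correction.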

Approaches (Continued)
•Clustering or Unsupervised Data Mining
•Hierarchical Clustering, Self-Organizing (Kohonen) Maps
(SOMs), K-Means Clustering
•Cluster patients with similar expression patterns
•Cluster genes with similar patterns across patients or
samples (genes that go up or down together)
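
As a rough illustration of clustering patients by expression pattern, here is a minimal k-means sketch in plain Python. The expression values, the number of clusters, and the choice of starting centroids are all hypothetical; real analyses would typically use an established library and normalised data.

```python
from math import dist

def k_means(points, centroids, n_iter=10):
    """Plain k-means: alternate assignment and centroid update for n_iter rounds."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

# Hypothetical expression profiles: one row per patient, one column per gene.
patients = [
    (1000, 200, 50),
    (1100, 220, 60),
    (950, 180, 40),
    (300, 900, 700),
    (320, 950, 650),
]

# Seed the two clusters with the first and last patient (an arbitrary choice).
labels, centroids = k_means(patients, [patients[0], patients[-1]])
```

To cluster genes instead of patients, the same function can be applied to the transposed table, for example `list(zip(*patients))`, so that each row is one gene measured across all patients.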

Approaches (Continued)
•Classification or Supervised Data Mining.
•Use our knowledge of class values… myeloma vs. normal,
positive response vs. no response to treatment, etc., to gain
added insight.
•Find genes that are best predictors of class.
•Can provide useful tests, e.g. for choosing treatment.
•If predictor is comprehensible, may provide novel insight,
e.g., point to a new therapeutic target.
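
One simple way to look for genes that predict class is to score each gene by how well it separates the two groups. The sketch below uses a signal-to-noise score (difference of class means divided by the sum of class standard deviations). The slides do not say which statistic was used, so this choice is an assumption, and the gene names and expression values are made up.

```python
from statistics import mean, stdev

def signal_to_noise(class_a, class_b):
    """(mean_a - mean_b) / (sd_a + sd_b); a large magnitude means good separation."""
    return (mean(class_a) - mean(class_b)) / (stdev(class_a) + stdev(class_b))

# Hypothetical expression values for three genes in each class.
expression = {
    "gene_A": {"myeloma": [900, 1000, 1100], "normal": [200, 300, 250]},
    "gene_B": {"myeloma": [500, 520, 480], "normal": [510, 490, 505]},
    "gene_C": {"myeloma": [100, 150, 120], "normal": [400, 600, 500]},
}

scores = {
    gene: signal_to_noise(values["myeloma"], values["normal"])
    for gene, values in expression.items()
}

# Rank genes by absolute score: the sign only says which class is higher.
ranked = sorted(scores, key=lambda gene: abs(scores[gene]), reverse=True)
```

Here `gene_A` (higher in myeloma) and `gene_C` (higher in normal) rank above `gene_B`, whose values overlap between the classes.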

Approaches (Continued)
•Classification or Supervised Learning.
•UC Santa Cruz: Furey et al. 2001 (support vector
machines).
•MIT Whitehead: Golub et al. 1999, Slonim et al. 2000
(voting).
•SNPs and Proteomics are coming.
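
The voting approach cited above can be sketched roughly as follows. This is a simplified illustration in the spirit of weighted voting, not a reproduction of the published method: each gene casts a vote weighted by how well it separates the classes in the training data, and the votes are summed. The gene names, values, and the `train_voters` and `predict` helpers are hypothetical.

```python
from statistics import mean, stdev

def train_voters(train_a, train_b):
    """Per gene, store a weight (signal-to-noise) and a boundary (midpoint of class means)."""
    voters = {}
    for gene in train_a:
        mu_a, mu_b = mean(train_a[gene]), mean(train_b[gene])
        weight = (mu_a - mu_b) / (stdev(train_a[gene]) + stdev(train_b[gene]))
        voters[gene] = (weight, (mu_a + mu_b) / 2)
    return voters

def predict(voters, sample, label_a="myeloma", label_b="normal"):
    """Sum signed votes; positive votes favour label_a, negative votes favour label_b."""
    votes = [w * (sample[gene] - b) for gene, (w, b) in voters.items()]
    total_a = sum(v for v in votes if v > 0)
    total_b = -sum(v for v in votes if v < 0)
    winner = label_a if total_a >= total_b else label_b
    total = total_a + total_b
    # Strength is 1.0 when all votes agree and near 0.0 when they cancel out.
    strength = abs(total_a - total_b) / total if total else 0.0
    return winner, strength

# Hypothetical training data: expression values per gene, per class.
train_myeloma = {"gene_A": [900, 1000, 1100], "gene_C": [100, 150, 120]}
train_normal = {"gene_A": [200, 300, 250], "gene_C": [400, 600, 500]}

voters = train_voters(train_myeloma, train_normal)
result_1 = predict(voters, {"gene_A": 950, "gene_C": 130})
result_2 = predict(voters, {"gene_A": 260, "gene_C": 520})
```

A low strength value could be used to abstain from a prediction rather than force a call; any threshold for that would have to be chosen and validated on held-out samples.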

Outline
•Data and Task
•Supervised Learning Approaches and Results
•Tree Models and Boosting
•Support Vector Machines
•Voting
•Bayesian Networks
•Conclusions
