QSAR statistical methods for drug discovery (Pharmacology, M.Pharm 2nd sem)

24,044 views 25 slides Apr 26, 2019

About This Presentation

PRINCIPLES OF DRUG DISCOVERY - Unit 4
Regression analysis, PLS, and other statistical methods for QSAR, together with their applications.


Slide Content

QSAR STATISTICAL METHODS
PRESENTED BY: GAYATRI SATI
CLASS: M.PHARMA, 2nd sem (PHARMACOLOGY)

TABLE OF CONTENTS
» INTRODUCTION OF QSAR
» QSAR STATISTICAL METHODS
» REGRESSION ANALYSIS
» APPLICATION OF REGRESSION ANALYSIS
» PARTIAL LEAST SQUARE ANALYSIS
» APPLICATION OF PLS
» OTHER METHODS
» REFERENCES

INTRODUCTION OF QSAR
Quantitative structure-activity relationship (QSAR) modelling is a strategy of fundamental importance in chemistry and pharmacy. It rests on the idea that changing the structure of a molecule also changes the activity or properties of the substance.
QSARs are mathematical relationships that quantitatively link physicochemical properties to pharmacological or biological activity across a series of compounds.
Biological activity = f(physicochemical properties and/or structural properties)
Statistics is the branch of mathematics that deals with the collection, organization, analysis, interpretation and presentation of data.

INTRODUCTION TO REGRESSION ANALYSIS
In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships among variables.
Regression analysis relates independent X variables to dependent Y variables.
When two variables are involved, the variable used as the basis of estimation is called the independent variable, and the variable whose value is to be estimated is called the dependent variable.
Classical linear regression assumes that, for any given value of X, the Y values are independent and normally distributed.

DEFINITION OF REGRESSION ANALYSIS
Regression analysis is a technique for studying the dependence of one variable (the dependent Y variable, e.g. biological data) on one or more other variables (the independent X variables, e.g. physicochemical parameters), with a view to estimating or predicting the average value of the dependent variable from known or fixed values of the independent variables.
The dependent variable is also called:
» Explained
» Response
» Endogenous
The independent variable is also called:
» Explanatory
» Regressor
» Exogenous

REGRESSION MODELS
Regression models involve the following parameters and variables:
» β = unknown parameters, which may be a scalar or a vector
» X = independent variable
» Y = dependent variable
A regression model relates Y to a function f of X and β:
Y ≈ f(X, β)

Assume now that the vector of unknown parameters β has length K. To perform a regression analysis, the user must provide information about the dependent variable Y:
If N < K data points of the form (Y, X) are observed, most classical approaches to regression analysis cannot be performed, because the system is underdetermined.
If exactly N = K data points are observed and the function f is linear, the equations Y = f(X, β) can be solved exactly rather than approximately, provided the X values are linearly independent.
If N > K data points are observed, the system is overdetermined: the data contain enough information to estimate a unique value of β that best fits them, for example in the least-squares sense.
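The three cases (N < K, N = K, N > K) can be illustrated with a small sketch. It assumes NumPy is available and a straight-line model with K = 2 parameters (intercept and slope); the function name and the data points are hypothetical and chosen only for illustration.

```python
import numpy as np

def solve_linear_model(x, y):
    """Fit y ≈ b0 + b1*x and report how N compares with K = 2 parameters."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])  # design matrix with K = 2 columns
    n, k = X.shape
    # lstsq gives the least-squares solution; when N < K it returns the
    # minimum-norm solution, which is only one of infinitely many.
    beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    if n < k:
        case = "underdetermined"
    elif n == k:
        case = "exactly determined"
    else:
        case = "overdetermined"
    return case, rank, beta

# Hypothetical data: one, two and three observations of (x, y)
case1, rank1, beta1 = solve_linear_model([1.0], [2.0])
case2, rank2, beta2 = solve_linear_model([1.0, 2.0], [2.0, 4.0])
case3, rank3, beta3 = solve_linear_model([1.0, 2.0, 3.0], [2.0, 4.0, 5.0])

for case, rank, beta in [(case1, rank1, beta1), (case2, rank2, beta2), (case3, rank3, beta3)]:
    print(case, "rank:", rank, "beta:", beta)
```

With one point the rank of the design matrix is below K, so β is not unique; with two points the line passes through both exactly; with three points the line is a least-squares compromise.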

SIMPLE LINEAR REGRESSION MODEL
In simple linear regression there is only a single explanatory variable.
Simple linear regression is applied when you want to predict the value of one variable from the value of one other variable.

SIMPLE LINEAR REGRESSION
Yᵢ = β₀ + β₁Xᵢ + εᵢ
where
Yᵢ = dependent variable
β₀ = population Y-intercept
β₁ = population slope coefficient
Xᵢ = independent variable
εᵢ = random error term
β₀ + β₁Xᵢ is the linear component and εᵢ is the random error component.
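A minimal sketch of estimating the intercept and slope by ordinary least squares is given here, using only the Python standard library. The descriptor values (a hypothetical logP) and activities (a hypothetical log 1/C) are invented for illustration and are not taken from any data set.

```python
def fit_simple_linear(x, y):
    """Least-squares estimates of the intercept b0 and slope b1 in y = b0 + b1*x + e."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares about the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx                # estimated slope
    b0 = mean_y - b1 * mean_x     # estimated intercept
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]  # estimates of the error terms
    return b0, b1, residuals

# Hypothetical data: one descriptor (logP) and one activity value per compound
logp = [1.0, 2.0, 3.0, 4.0]
activity = [2.1, 2.9, 4.2, 4.8]

b0, b1, residuals = fit_simple_linear(logp, activity)
print(f"activity = {b0:.2f} + {b1:.2f} * logP")
```

The residuals play the role of the εᵢ in the model; with an intercept in the model they sum to zero.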

MULTIPLE LINEAR REGRESSION
Multiple linear regression is the same idea as simple linear regression, except that now several independent variables predict the dependent variable.
It is used when we want to predict the value of a variable from the values of two or more other variables.
Y = β₀ + β₁X₁ + β₂X₂ + … + βₙXₙ + ε
where
n = number of independent variables
β₀ = intercept term
β₁, …, βₙ = coefficients of the independent variables (the unknown parameters to be estimated)
ε = random error term
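The multiple regression equation can be fitted by least squares as in the sketch below, which assumes NumPy is available. The two descriptors (labelled logP and sigma) and the coefficients used to generate the activities are hypothetical; the data are noise-free so that the fit recovers the generating coefficients.

```python
import numpy as np

def fit_multiple_linear(X, y):
    """Return [b0, b1, ..., bn] for y = b0 + b1*x1 + ... + bn*xn by least squares."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # leading column of ones for the intercept
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta

# Hypothetical descriptors for six compounds: columns are logP and sigma
descriptors = np.array([
    [0.5, 0.10],
    [1.0, -0.20],
    [1.5, 0.30],
    [2.0, 0.00],
    [2.5, -0.10],
    [3.0, 0.25],
])
# Activities generated from assumed coefficients (no noise), for illustration only
activity = 1.0 + 0.5 * descriptors[:, 0] - 2.0 * descriptors[:, 1]

beta = fit_multiple_linear(descriptors, activity)
print("intercept and coefficients:", beta)
```

With real, noisy data the estimates would differ from the true coefficients, and N should comfortably exceed the number of parameters.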

USES OF REGRESSION ANALYSIS
Regression analysis helps to establish the relationship between two or more variables.
Regression analysis predicts the value of the dependent variable from the values of the independent variables.
The coefficient of correlation and the coefficient of determination can be calculated with the help of regression analysis.
Regression analysis is widely used as a statistical tool in QSAR.
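As a sketch of how the coefficient of correlation (r) and the coefficient of determination (R²) follow from a fitted line, the standard-library example below computes both for a single descriptor. The function name and data are hypothetical.

```python
import math

def correlation_and_determination(x, y):
    """Return Pearson's r and R^2 for a simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)  # total sum of squares
    r = sxy / math.sqrt(sxx * syy)
    # R^2 from the residual sum of squares of the fitted line
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1.0 - ss_res / syy
    return r, r_squared

# Hypothetical descriptor and activity values
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 2.9, 4.2, 4.8]
r, r_squared = correlation_and_determination(x, y)
print(f"r = {r:.4f}, R^2 = {r_squared:.4f}")
```

For simple linear regression R² equals r², which the two independent calculations in the function confirm.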

PARTIAL LEAST SQUARE ANALYSIS (PLS)
Partial least square analysis (PLS) is a method for constructing predictive models when the factors are many and collinear.
It is a relatively recent technique that generalizes and combines features of principal component analysis and multiple regression.
Goal: to predict a set of dependent variables Y from a set of independent variables X and to describe their common structure.
It is used to find the fundamental relations between two matrices (X and Y).
COMPACT (computer-optimized molecular parametric analysis of chemical toxicity), a PLS approach, has been described for predicting carcinogenicity and other forms of toxicity.

SOFTWARE USED FOR PLS
The practical application of PLS depends on the availability of software, such as:
» SIMCA-P
» UNSCRAMBLER
» SPM
» SAS PROC PLS

APPLICATIONS OF PLS
PLS is used to find the fundamental relations between two matrices (X and Y).
A PLS model tries to find the multidimensional direction in the X space that explains the direction of maximum multidimensional variance in the Y space.
PLS regression is widely used in chemometrics and related areas, especially when the number of independent variables is much larger than the number of data points.
It is also used in bioinformatics, sensometrics, neuroscience and anthropology.
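A minimal sketch of PLS regression for a single response (often called PLS1, here with NIPALS-style deflation) is shown below, assuming NumPy is available. It is not the implementation used by any of the packages named in these slides; the function names and the simulated data (6 compounds, 10 collinear descriptors built from 2 latent factors) are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model; returns coefficients and centering terms."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean            # centered copies, deflated in the loop
    n_features = X.shape[1]
    W = np.zeros((n_features, n_components))   # weight vectors
    P = np.zeros((n_features, n_components))   # X loadings
    q = np.zeros(n_components)                 # y loadings
    for a in range(n_components):
        w = Xr.T @ yr                          # direction in X space most covarying with y
        w /= np.linalg.norm(w)
        t = Xr @ w                             # scores of the compounds on this component
        tt = t @ t
        p = Xr.T @ t / tt
        q[a] = yr @ t / tt
        Xr = Xr - np.outer(t, p)               # remove the explained part of X
        yr = yr - q[a] * t                     # remove the explained part of y
        W[:, a], P[:, a] = w, p
    coef = W @ np.linalg.solve(P.T @ W, q)     # regression coefficients for centered X
    return coef, x_mean, y_mean

def pls1_predict(X, coef, x_mean, y_mean):
    return (np.asarray(X, dtype=float) - x_mean) @ coef + y_mean

# Simulated data: more descriptors (10) than compounds (6), driven by 2 latent factors
rng = np.random.default_rng(0)
latent = rng.normal(size=(6, 2))
X = latent @ rng.normal(size=(2, 10)) + 0.01 * rng.normal(size=(6, 10))
y = latent @ np.array([1.0, -0.5]) + 0.01 * rng.normal(size=6)

tss = np.sum((y - y.mean()) ** 2)              # total sum of squares
rss = {}
for a in (1, 2):
    coef, x_mean, y_mean = pls1_fit(X, y, a)
    rss[a] = np.sum((y - pls1_predict(X, coef, x_mean, y_mean)) ** 2)
    print(f"{a} component(s): fraction of y variance explained = {1 - rss[a] / tss:.3f}")
```

Ordinary multiple regression could not be fitted uniquely to these data because there are more descriptors than compounds; PLS sidesteps this by regressing on a few latent components. In practice the number of components would be chosen by cross-validation rather than from the training fit.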

OTHER MULTIVARIABLE STATISTICAL MODELS
» Cluster analysis
» Principal component analysis
» Regression-based analysis methods
» Ordinary least squares regression
» Generalized linear models

CLUSTER ANALYSIS
Cluster analysis is a group of multivariate techniques whose primary purpose is to group objects based on the characteristics they possess.
In cluster analysis, the grouping is based on distance (proximity) between objects.
It is a main task of exploratory data mining and a common technique in statistical data analysis, used in pattern recognition, image analysis, bioinformatics, data compression and computer graphics.
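Distance-based grouping can be sketched with k-means, one of many clustering methods (the slides do not name a specific algorithm, so this choice is an assumption). The example assumes NumPy is available, uses hypothetical two-descriptor data, and initializes the centroids from the first k points to stay deterministic.

```python
import numpy as np

def kmeans(points, k, n_iter=100):
    """Group points into k clusters by Euclidean distance to the nearest centroid."""
    points = np.asarray(points, dtype=float)
    centroids = points[:k].copy()              # simple deterministic initialization
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        # Distance of every point to every centroid, shape (n_points, k)
        dist = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)           # assign each point to its nearest centroid
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break                              # assignments have stabilized
        centroids = new_centroids
    return labels, centroids

# Hypothetical compounds described by two descriptors, forming two obvious groups
compounds = [[1.0, 1.1], [5.0, 5.2], [1.2, 0.9], [5.1, 4.9], [0.9, 1.0], [4.8, 5.0]]
labels, centroids = kmeans(compounds, k=2)
print("cluster labels:", labels.tolist())
```

Real descriptor sets would normally be scaled before clustering, since distances are sensitive to the units of each variable.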

ROLES AND APPLICATIONS OF CLUSTER ANALYSIS
ROLES
» Data reduction
» Hypothesis generation
APPLICATIONS
» Medicine
» Analysis of antimicrobial activity
» Biology and bioinformatics
» Psychiatry
» Climate
» Sequence analysis
» Crime analysis
» Transcriptomics

PRINCIPAL COMPONENT ANALYSIS
PCA is an exploratory technique used to reduce the dimensionality of a data set, often to 2D or 3D.
PCA is a procedure that transforms a number of possibly correlated variables into a smaller number of uncorrelated variables called principal components.
Objective of PCA: dimensionality reduction or data compression.
Goal of PCA: to select a subset of variables from a larger set, based on which original variables correlate most strongly with the principal components.
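The transformation into uncorrelated components can be sketched with a singular value decomposition of the centered data, assuming NumPy is available. The function name and the simulated descriptor matrix (20 compounds, 5 correlated descriptors) are hypothetical.

```python
import numpy as np

def pca(X, n_components):
    """Return scores, loadings and explained-variance ratios of the leading components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                          # center each variable
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # coordinates of samples on the components
    loadings = Vt[:n_components]                     # contribution of each original variable
    explained = (S ** 2) / np.sum(S ** 2)            # fraction of total variance per component
    return scores, loadings, explained[:n_components]

# Simulated data: 5 correlated descriptors generated from 2 underlying factors
rng = np.random.default_rng(0)
latent = rng.normal(size=(20, 2))
X = latent @ rng.normal(size=(2, 5)) + 0.05 * rng.normal(size=(20, 5))

scores, loadings, ratio = pca(X, n_components=2)
print("explained variance ratio:", np.round(ratio, 3))
print("score columns uncorrelated:", abs(scores[:, 0] @ scores[:, 1]) < 1e-8)
```

The loadings show which original variables weigh most heavily on each component, which is the information used when selecting a subset of variables.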

APPLICATIONS OF PCA
Neuroscience: A variant of PCA is used in neuroscience to identify the specific properties of a stimulus that increase a neuron's probability of generating an action potential. This technique is known as spike-triggered covariance analysis. In neuroscience, PCA is also used to discern the identity of a neuron from the shape of its action potential.
Quantitative finance: PCA can be directly applied to the risk management of interest rate derivatives portfolios.

REGRESSION-BASED ANALYSIS
Ordinary least squares: In statistics, ordinary least squares (OLS) is a type of linear least squares method for estimating the unknown parameters in a linear regression model. OLS is used in fields as diverse as economics (econometrics), data science, political science, psychology and engineering (control theory and signal processing).
Generalized linear model: In statistics, the generalized linear model (GLM) is a flexible generalization of ordinary linear regression that allows for response variables whose error distributions are other than normal.
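As one illustration of a generalized linear model, the sketch below fits a logistic regression (binomial response, logit link) by iteratively reweighted least squares, assuming NumPy is available. Logistic regression is chosen here only as a familiar GLM; the function name and the active/inactive data are hypothetical.

```python
import numpy as np

def fit_logistic_irls(x, y, n_iter=50, tol=1e-10):
    """Fit P(y = 1) = 1 / (1 + exp(-(b0 + b1*x))) by iteratively reweighted least squares."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        eta = design @ beta                      # linear predictor
        mu = 1.0 / (1.0 + np.exp(-eta))          # mean response via the inverse logit link
        weights = mu * (1.0 - mu)                # binomial variance function
        z = eta + (y - mu) / weights             # working response
        # Weighted least-squares update of the coefficients
        lhs = design.T @ (weights[:, None] * design)
        rhs = design.T @ (weights * z)
        beta_new = np.linalg.solve(lhs, rhs)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

# Hypothetical data: descriptor value and whether the compound was active (1) or not (0)
descriptor = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0])
active = np.array([0, 0, 1, 0, 1, 0, 1, 1], dtype=float)

beta = fit_logistic_irls(descriptor, active)
print("intercept and slope:", beta)
```

Each iteration is a weighted ordinary least squares fit, which shows how the GLM extends OLS to responses that are not normally distributed. The classes in this example overlap on purpose; with perfectly separated classes the estimates would not converge.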

REFERENCES
» https://en.wikipedia.org/wiki/Statistics
» www.statstutor.ac.uk/resources/uploaded/1introduction3.pdf
» http://home.iitk.ac.in/~kundu/Statistical Methods.pdf
» https://en.wikipedia.org/wiki/Regression_analysis#Linear_regression

"Never trust a statistic you didn't forge yourself." - Winston Churchill
THANK YOU