Statistical method used in QSAR.pptx

2,175 views 18 slides May 17, 2023

About This Presentation

This presentation covers the statistical methods used in QSAR, which is part of computer-aided drug design. It deals with choosing the descriptors (independent variables) and validating them. Linear regression method, non-linear regression method, partial least squares method, Cl...


Slide Content

Statistical Method Used in QSAR
SUBMITTED BY: UPASANA SHARMA (M.PHARMA: Pharmaceutical Chemistry)
SUBMITTED TO: Dr. SALAHUDDIN (PROFESSOR)
NOIDA INSTITUTE OF ENGINEERING AND TECHNOLOGY (PHARMACY INSTITUTE), GREATER NOIDA
UPASANA SHARMA (16/05/2023)

QSAR Work Flow:
1) Collection of ligands
2) Generation of descriptors
3) Feature selection
4) Construction of model
5) Validation of model

QSAR Model:
1) Build models and calculate the minimum energy conformation
2) Calculate descriptors
3) Shortlist descriptors:
   i. Based on correlation coefficient
   ii. Based on cross correlation coefficient
   iii. Based on dissimilarity distance
   iv. Based on cluster analysis
   v. Based on genetic function approach
4) Develop regression relationship and estimate statistics (R², R²adj, R²pred, F test, residuals)
5) Test model with external data set
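The first two shortlisting criteria (correlation with activity and cross correlation between descriptors) can be sketched in a few lines. This is only an illustration: the function names, the toy data, and the thresholds `min_r` and `max_cross_r` are assumptions, not values from the slides.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def shortlist(descriptors, activity, min_r=0.5, max_cross_r=0.9):
    """Keep descriptors that correlate with activity and are not redundant.

    descriptors: dict mapping a descriptor name to its list of values.
    min_r, max_cross_r: illustrative thresholds (assumptions).
    """
    # Rank descriptors by |r| with the activity, strongest first.
    ranked = sorted(descriptors,
                    key=lambda d: abs(pearson(descriptors[d], activity)),
                    reverse=True)
    kept = []
    for name in ranked:
        # Criterion 1: sufficient correlation with the activity.
        if abs(pearson(descriptors[name], activity)) < min_r:
            continue
        # Criterion 2: low cross correlation with descriptors already kept.
        if all(abs(pearson(descriptors[name], descriptors[k])) < max_cross_r
               for k in kept):
            kept.append(name)
    return kept

# Made-up toy data: five compounds, three descriptors.
activity = [1.0, 2.1, 2.9, 4.2, 5.0]
descriptors = {
    "logP": [0.5, 1.0, 1.5, 2.0, 2.5],
    "logP_scaled": [1.0, 2.0, 3.0, 4.0, 5.0],  # redundant copy of logP
    "noise": [3.0, 1.0, 4.0, 1.0, 3.5],        # unrelated to activity
}
selected = shortlist(descriptors, activity)
print(selected)
```

Only one of the two perfectly cross-correlated descriptors survives, and the descriptor unrelated to activity is dropped.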

Model construction + Feature selection = Statistical analysis (for a large or a small number of descriptors)
Methods:
  Regression-based approach: simple linear regression method, multiple linear regression method, partial least squares method
  Classification-based approach: cluster analysis, principal component analysis, logistic regression
  Machine learning techniques: artificial neural network, support vector machine, gene expression programming
A model can be constructed using:
  Linear properties: linear regression, partial regression
  Non-linear properties: artificial neural network
  Learning algorithms: supervised, semi-supervised and unsupervised learning algorithms

Validation: Regression-based QSAR model
Validation metrics for internal validation:
  Least squares fitting
  Chi-squared (χ²) and root mean squared error (RMSE)
  Cross validation: leave-one-out (LOO) and leave-some-out (LSO)
  True Q² and rm² metrics
Validation metrics for external validation:
  Predictive R² (Q²F1)
  Q²F2 and Q²F3
  Golbraikh and Tropsha's criteria
  Root mean square error of prediction (RMSEP)
Validation metrics for classification-based methods:
  Wilks' lambda statistic (a lower value is better)
  Canonical index (Rc)
  Chi-square (χ²)
  Squared Mahalanobis distance

Regression based method:
Simple linear regression method (1 descriptor): Y = b + b1x1 + e
Multiple linear regression method: Y = b + b1x1 + b2x2 + ... + bnxn + e
Non-linear regression method (at least one parameter enters non-linearly): Y = f(x, B) + e, where B is the vector of unknown parameters
In non-linear regression, observational data are modeled by a function that is a non-linear combination of the model parameters and depends on one or more independent variables.
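As a rough sketch of how the coefficients b, b1, ..., bn of a linear regression model can be estimated, the code below solves the least squares normal equations in plain Python. The function names and the toy data are assumptions made for illustration; real QSAR work would normally rely on a statistics package.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_mlr(X, y):
    """Least squares fit of Y = b + b1*x1 + ... + bn*xn.

    X: list of descriptor rows, one per compound.
    Returns [b, b1, ..., bn].
    """
    rows = [[1.0] + list(r) for r in X]  # prepend 1 for the intercept b
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

# Simple linear regression: one descriptor (made-up data).
slr = fit_mlr([[1.0], [2.0], [3.0], [4.0]], [3.1, 4.9, 7.2, 8.8])

# Multiple linear regression: two descriptors, data generated from
# Y = 1 + 2*x1 + 3*x2 with no noise.
mlr = fit_mlr([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]], [1, 3, 4, 6, 8])
print(slr, mlr)
```

The same routine covers both cases because simple linear regression is the one-descriptor special case of multiple linear regression.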

Partial least squares method: A multivariate statistical method, related to principal component regression and derived from multiple regression analysis, that predicts or analyzes a set of dependent variables from a set of independent variables.
  It is applied in the 3D-QSAR technique Comparative Molecular Field Analysis (CoMFA).
  It is used in combination with other techniques, as in G/PLS (genetic), FA-PLS (factor analysis) and OSC-PLS (orthogonal signal correction).
  COMPACT (computer optimized molecular parametric analysis of chemical toxicity) predicts carcinogenicity and other forms of toxicity.
Software: SIMCA-P, UNSCRAMBLER, SPM
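To make the idea of partial least squares more concrete, here is a minimal sketch of PLS with a single response and a single latent component, written in plain Python. The function name and the toy data are assumptions; packages such as those listed on the slide extract several components and choose their number by cross validation.

```python
from math import sqrt

def pls1_one_component(X, y):
    """Fit a one-component PLS model for a single response y."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centered X
    yc = [v - y_mean for v in y]                                 # centered y

    # Weight vector: direction in descriptor space with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]

    # Scores (latent variable) and regression of y on the scores.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)

    def predict(x):
        score = sum((x[j] - x_mean[j]) * w[j] for j in range(p))
        return y_mean + q * score

    return w, q, predict

# Two perfectly collinear descriptors (made-up data). Ordinary multiple
# linear regression has no unique solution here, but PLS still works.
X = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
y = [2.0, 4.0, 6.0, 8.0]
w, q, predict = pls1_one_component(X, y)
prediction = predict([5.0, 5.0])
print(w, q, prediction)
```

The example uses collinear descriptors on purpose, since handling many correlated descriptors is the usual reason for choosing PLS over multiple linear regression.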

Classification based approach:
Cluster analysis: Clustering involves placing similar data into groups in a way that maximizes similarity within groups and dissimilarity between groups.
  It is a multivariate technique that analyzes groups based on distance (proximity).
  Hierarchical clustering comes in agglomerative and divisive forms.
  k-means clustering is a partition-based clustering method.
  It serves data reduction and hypothesis generation.
  It is used in data mining, statistical data analysis and pattern recognition.
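A minimal k-means sketch shows what partition-based clustering does in practice. The function names, the toy points, and the choice of the first k points as starting centroids are assumptions made to keep the example deterministic; real implementations usually use randomized or smarter initialization.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def kmeans(points, k, iterations=20):
    """Partition points into k clusters by alternating two steps."""
    centroids = [list(p) for p in points[:k]]  # assumption: first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(v) / len(members) for v in zip(*members)]
    return labels, centroids

# Made-up compounds described by two descriptors each.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (0.9, 1.1), (5.1, 4.9), (4.8, 5.2)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

Compounds that lie close together in descriptor space end up with the same label, which is the "maximize similarity within groups" idea from the slide.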

PRINCIPAL COMPONENT ANALYSIS: It transforms a number of possibly correlated variables into a smaller number of uncorrelated variables called principal components.
  PCA reduces the attribute space from a larger number of variables to a smaller number of factors (non-dependent variables).
  PCA is a dimensionality reduction or data compression method, and there is no guarantee that the resulting dimensions are interpretable.
  Objective: to select a subset of variables from a larger set, based on which original variables have the highest correlations with the principal components.
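The following sketch extracts the first principal component of a small descriptor table using power iteration on the covariance matrix. It is an illustration under assumptions (function name, toy data, fixed iteration count), not the procedure of any particular package.

```python
from math import sqrt

def first_principal_component(rows, iterations=200):
    """Return (loading vector, scores, explained variance ratio) for PC1."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix of the descriptors.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
           for i in range(p)]

    # Power iteration converges to the eigenvector with the largest eigenvalue.
    v = [1.0] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sqrt(sum(a * a for a in w))
        v = [a / norm for a in w]

    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p))
                     for i in range(p))
    total_variance = sum(cov[i][i] for i in range(p))
    scores = [sum(r[j] * v[j] for j in range(p)) for r in centered]
    return v, scores, eigenvalue / total_variance

# Two strongly correlated descriptors (made-up data).
rows = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0], [5.0, 10.1]]
loadings, scores, explained = first_principal_component(rows)
print(loadings, explained)
```

Because the two descriptors are almost perfectly correlated, a single principal component captures nearly all of the variance, which is the data compression effect described above.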

Machine learning techniques:
Artificial neural network
  Mimics the behavior of biological neurons.
  It has an input layer, a hidden layer and an output layer.
  Types: back-propagation neural networks, probabilistic neural networks, Kohonen self-organizing maps and Bayesian regularized neural networks.
Support vector machine
  Uses a linear classifier to classify data into two categories.
  It is used in combination with other methods such as MLR and PLS to build more powerful and accurate QSAR models.
Gene expression programming
  It builds on the genetic algorithm and genetic programming.
  It is used to calculate dermal penetration, EC50 and binding affinity; improved gene expression programming is a further variant.
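The layered structure of an artificial neural network can be illustrated with a single forward pass through an input layer, one hidden layer and an output layer. All weights below are arbitrary placeholders (assumptions); in a real model they would be learned from the training set, for example by back-propagation.

```python
from math import exp

def sigmoid(z):
    """Logistic activation used by the hidden neurons."""
    return 1.0 / (1.0 + exp(-z))

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Propagate one descriptor vector through the network.

    x: input layer values (descriptors).
    w_hidden, b_hidden: one weight list and one bias per hidden neuron.
    w_out, b_out: weights and bias of a single linear output neuron.
    """
    hidden = [sigmoid(sum(w * xi for w, xi in zip(weights, x)) + b)
              for weights, b in zip(w_hidden, b_hidden)]
    output = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return hidden, output

# Three descriptors, two hidden neurons, one predicted activity.
descriptors = [0.8, 1.5, -0.3]
w_hidden = [[0.4, -0.2, 0.1], [-0.3, 0.5, 0.7]]  # placeholder weights
b_hidden = [0.0, 0.1]
w_out = [1.2, -0.8]
b_out = 0.5
hidden, predicted_activity = forward(descriptors, w_hidden, b_hidden, w_out, b_out)
print(hidden, predicted_activity)
```

The non-linear activation in the hidden layer is what lets such a model capture non-linear structure-activity relationships.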

Validation: It guards against chance correlation among the numerous descriptors used in the model and against over-fitting of the data. It assesses the accuracy and predictive ability of the model (training set and test set).
The Organization for Economic Cooperation and Development (OECD) gives 5 principles to test the model:
1) a defined endpoint
2) an unambiguous algorithm
3) a defined domain of applicability
4) appropriate measures of goodness-of-fit, robustness and prediction accuracy
5) a mechanistic interpretation

Regression based QSAR model: Validation metrics for internal validation
Internal validation uses molecules from the training set to test the predictability of the model.
Least squares fitting: It is measured by the squared correlation coefficient (R²) between the predicted and experimental values of activity. If the difference between R² and R²adj is less than 0.3, the QSAR validation is considered good.
χ² (chi-squared) and RMSE: These values are used to assess the predictive quality of a model. The χ² value shows the difference between experimentally determined bioactivity values and the values predicted by the model. Even for a large R² value (that is, >= 0.7), the values of χ² and RMSE should be less than 0.5 and 0.3 respectively.
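The goodness-of-fit statistics named on this slide can be computed as in the sketch below. Two points are assumptions: R² is computed as 1 - SS_res/SS_tot (which equals the squared correlation coefficient for a least squares fit with an intercept), and χ² is taken as the sum of (observed - predicted)² / predicted, one form found in the QSAR validation literature. The data are made up.

```python
from math import sqrt

def fit_statistics(observed, predicted, n_descriptors):
    """Return R2, adjusted R2, RMSE and chi-squared for a fitted model."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)

    r2 = 1 - ss_res / ss_tot
    # Adjusted R2 penalizes the number of descriptors in the model.
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - n_descriptors - 1)
    rmse = sqrt(ss_res / n)
    # Assumed chi-squared definition; requires non-zero predicted values.
    chi2 = sum((o - p) ** 2 / p for o, p in zip(observed, predicted))
    return r2, r2_adj, rmse, chi2

# Made-up experimental and predicted activities for a one-descriptor model.
observed = [5.0, 6.0, 7.0, 8.0]
predicted = [5.1, 5.9, 7.2, 7.8]
r2, r2_adj, rmse, chi2 = fit_statistics(observed, predicted, n_descriptors=1)
print(r2, r2_adj, rmse, chi2)
```

For this toy fit the gap between R² and adjusted R² is small, and both χ² and RMSE fall under the thresholds quoted on the slide.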

Cross validation: It is an internal validation technique that includes Leave-Group-Out (LGO), in which a molecule or a group of molecules is left out while the model is created, and the predictability of the model is then evaluated using the molecules left out.
In LOO (leave-one-out) cross validation:
  One compound is left out and the QSAR model is constructed using the remaining compounds.
  The eliminated compound is used as a test for the model.
  The predictability of the model is assessed by PRESS (Predicted Residual Sum of Squares) and cross-validated R² (Q²); SDEP (Standard Deviation of Error of Prediction) is obtained from PRESS.
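The leave-one-out procedure can be written out directly. The sketch below uses a one-descriptor linear model as the QSAR model; the helper `fit_line`, the toy data, and the particular conventions Q² = 1 - PRESS/SS_tot (with the mean of all activities) and SDEP = sqrt(PRESS/n) are assumptions, since definitions vary slightly between sources.

```python
from math import sqrt

def fit_line(x, y):
    """Least squares fit of y = b0 + b1*x (hypothetical helper)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = (sum((a - mx) * (b - my) for a, b in zip(x, y))
          / sum((a - mx) ** 2 for a in x))
    return my - b1 * mx, b1

def loo_cross_validation(x, y):
    """Return PRESS, Q2 and SDEP from leave-one-out cross validation."""
    n = len(x)
    press = 0.0
    for i in range(n):
        # Build the model without compound i ...
        b0, b1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        # ... then predict the left-out compound and accumulate its error.
        press += (y[i] - (b0 + b1 * x[i])) ** 2
    y_mean = sum(y) / n
    ss_tot = sum((v - y_mean) ** 2 for v in y)
    q2 = 1 - press / ss_tot
    sdep = sqrt(press / n)
    return press, q2, sdep

# Made-up descriptor values and activities for five compounds.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
press, q2, sdep = loo_cross_validation(x, y)
print(press, q2, sdep)
```

Leave-some-out validation follows the same pattern, except that a group of compounds is removed in each round instead of a single one.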

True Q² value:
  True Q² is used for small data sets.
  Q² should not be treated as an ultimate proof of good predictability of a model.
  A Q² value > 0.5 is generally taken as the threshold for an acceptable model.
LSO (Leave-Some-Out) or LMO (Leave-Many-Out):
  A set of compounds is eliminated and models are created with the rest of the compounds.
  The left-out compounds are then used to check the predictability of the model.
A good model has a low RMSE value and a high R² value.

Validation metrics for external validation
Predictive R² or Q²(F1): based on the agreement between observed and predicted data of the test set. The model has good predictive power if Q²(F1) > 0.5.
Q²(F2) and Q²(F3): variants that use the mean of the test data set and of the training data respectively. For validation of a QSAR model, a threshold value of 0.5 is defined for both metrics.
Golbraikh and Tropsha's criteria: four conditions, applied after selection of training and test data sets. To have good predictive power, a QSAR model should satisfy the following conditions:
  i. Q² training > 0.5
  ii. R² test > 0.6
  iii. (r² - r0²)/r² < 0.1 or (r² - r'0²)/r² < 0.1, where r0² and r'0² are the coefficients of determination of the regressions through the origin of predicted vs. observed activities and of observed vs. predicted activities respectively
  iv. 0.85 <= k <= 1.15 or 0.85 <= k' <= 1.15, where k and k' are the slopes of the regression lines through the origin
Other metrics include RMSEP (Root Mean Square Error of Prediction), which measures the prediction error of a QSAR model.
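The external validation metrics can be sketched as follows. The formulas used for Q²(F1), Q²(F2) and Q²(F3) are the forms commonly given in the QSAR literature and should be read as assumptions here, as should the convention chosen for the slopes k and k' (sources differ on which regression is labelled k). The data are made up.

```python
from math import sqrt

def external_metrics(y_train, y_test, y_pred):
    """Return external validation metrics for test set predictions."""
    n_train, n_test = len(y_train), len(y_test)
    mean_train = sum(y_train) / n_train
    mean_test = sum(y_test) / n_test
    # Sum of squared prediction errors on the test set.
    press = sum((o - p) ** 2 for o, p in zip(y_test, y_pred))

    # Q2(F1): test set deviations measured from the training set mean.
    q2_f1 = 1 - press / sum((o - mean_train) ** 2 for o in y_test)
    # Q2(F2): test set deviations measured from the test set mean.
    q2_f2 = 1 - press / sum((o - mean_test) ** 2 for o in y_test)
    # Q2(F3): per-compound test error against per-compound training variance.
    ss_train = sum((o - mean_train) ** 2 for o in y_train)
    q2_f3 = 1 - (press / n_test) / (ss_train / n_train)

    rmsep = sqrt(press / n_test)

    # Slopes of regression lines through the origin (assumed convention).
    sum_op = sum(o * p for o, p in zip(y_test, y_pred))
    k = sum_op / sum(p * p for p in y_pred)
    k_prime = sum_op / sum(o * o for o in y_test)
    return {"Q2F1": q2_f1, "Q2F2": q2_f2, "Q2F3": q2_f3,
            "RMSEP": rmsep, "k": k, "k_prime": k_prime}

# Made-up observed training activities, observed test activities and
# test set predictions.
y_train = [4.0, 5.0, 6.0, 7.0, 8.0]
y_test = [5.0, 6.0, 7.0]
y_pred = [5.2, 5.9, 6.8]
metrics = external_metrics(y_train, y_test, y_pred)
print(metrics)
```

For this toy example all three Q² values exceed 0.5 and both slopes fall inside the 0.85 to 1.15 window of condition iv.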

Validation metrics for classification based methods (cluster analysis and PCA, principal component analysis)
The validation metrics employed in classification-based methods are:
  Wilks' lambda (λ) statistic: the ratio of the within-group sum of squares to the total dispersion. The value ranges between 0 and 1; a lower value corresponds to a higher level of discrimination.
  Canonical index (Rc): used to estimate the strength of the relationship between the dependent and independent variables.
  Chi-square (χ²): used to check the quality of the classification-based model.
  Squared Mahalanobis distance: a distance measure calculated between data points that takes the covariance of the variables into account.
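Two of these metrics are easy to illustrate in simplified settings. The sketch below computes Wilks' lambda for a single variable (within-group sum of squares divided by total sum of squares) and the squared Mahalanobis distance for two variables with a known mean and covariance matrix. The function names, the group labels and the numbers are assumptions; the general multivariate Wilks' lambda uses determinants of scatter matrices instead.

```python
def wilks_lambda_univariate(groups):
    """Wilks' lambda for one variable: within-group SS / total SS."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    ss_within = 0.0
    for values in groups.values():
        group_mean = sum(values) / len(values)
        ss_within += sum((v - group_mean) ** 2 for v in values)
    return ss_within / ss_total

def squared_mahalanobis_2d(point, mean, cov):
    """Squared Mahalanobis distance of a 2D point from a distribution."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    # Inverse of the 2x2 covariance matrix.
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [point[0] - mean[0], point[1] - mean[1]]
    return sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))

# Made-up discriminant scores for two well separated classes.
groups = {"active": [7.9, 8.1, 8.0], "inactive": [4.0, 4.2, 3.8]}
lam = wilks_lambda_univariate(groups)

# Distance of a point from a class centroid with unequal variances.
d2 = squared_mahalanobis_2d(point=[2.0, 1.0], mean=[0.0, 0.0],
                            cov=[[4.0, 0.0], [0.0, 1.0]])
print(lam, d2)
```

A lambda close to 0 reflects the strong separation of the two classes, while the Mahalanobis distance rescales each direction by its variance before measuring how far the point lies from the centroid.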


THANK YOU