ADMET.pptx

920 views 18 slides Nov 03, 2023


About This Presentation

"Optimizing Drug Discovery (ADMET) using Machine Learning" involves leveraging advanced algorithms to enhance the drug development process. By analyzing Absorption, Distribution, Metabolism, Excretion, and Toxicity (ADMET) data with ML models, researchers can predict a drug candidate's...

Size: 1.03 MB

Language: en

Added: Nov 03, 2023

Slides: 18 pages

Slide Content

Optimizing Drug Discovery using ADMET
Translating Data into Actionable Insights and Decisions using ML
Santu Chall, ME, MCA

Molecular Representation
- Representation levels: 0D/1D, 2D, 3D, 4D
- SMILES: SMILES (Simplified Molecular Input Line Entry System) is a concise notation for representing a chemical structure as a single line of text. For example: OC(=O)CN1C(=O)Cc2c1cccc2 (molecular formula C10H9NO3).
- Descriptors: molecular descriptors are quantitative values that characterize a chemical structure and support structure-property relationship studies and computational chemistry analysis. Examples: MW (molecular weight), HBA (hydrogen-bond acceptors), HBD (hydrogen-bond donors), number of atoms.
- Software: several packages calculate and analyze chemical properties, including RDKit, ChemAxon, Dragon, PaDEL, and MOE.

Molecular Fingerprint
- A binary representation of a molecule that is fast, objective, and compact.
- A "keyed" fingerprint indicates the presence or absence of specific structural features.
- Tasks: search and comparison, prediction, and clustering.
- Types of fingerprint
- Selecting the right fingerprint
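The search-and-comparison task is commonly done by scoring how many bits two fingerprints share. The following is a minimal sketch, assuming each fingerprint is stored as a set of on-bit positions and that Tanimoto (Jaccard) similarity is the scoring function; the slides do not name a metric, and the molecule names and bit positions are made up for illustration.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


# Hypothetical on-bit positions, for illustration only.
query = {1, 4, 7, 9}
library = {
    "mol_a": {1, 4, 7, 8},
    "mol_b": {2, 3},
    "mol_c": {1, 4, 7, 9},
}

# Rank library molecules from most to least similar to the query.
ranked = sorted(library, key=lambda name: tanimoto(query, library[name]), reverse=True)
```

Storing only the on-bit positions keeps the comparison to two set operations, which is one reason binary fingerprints suit fast similarity search.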

ADMET
- Absorption
- Distribution
- Metabolism
- Excretion/Elimination
- Toxicity

Data Selection
- Online databases: ChEMBL, PubChem, ChemDB, ChemSpider, DrugBank, etc.
- Reputed scientific journals: Journal of Chemical Information and Modeling, Journal of Cheminformatics, Journal of Computer-Aided Molecular Design, etc.
- Data retrieval from the literature: PubMed, ScienceDirect, Google Scholar, ACS Publications, open-access journals, etc.

Data Division
- Random division: train_test_split(X, y, test_size=0.3, random_state=42)
- Kennard-Stone division: start from the two data points that are farthest apart in feature space, then repeatedly add the point farthest from those already selected.
- Activity-based division: split on the activity or property being modeled so that each subset represents the full range of activity levels in the dataset.
- Euclidean distance based: compute the Euclidean distance between all pairs of data points in the multidimensional feature space. euclidean_distances = np.linalg.norm(X[:, np.newaxis] - X, axis=2)
- K-Medoids based: a clustering algorithm that divides the data into groups. clusterer = KMedoids(n_clusters=K, random_state=0)
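Kennard-Stone division is the least self-explanatory of these splits, so a small worked sketch may help. It assumes plain Euclidean distance and uses only the standard library; the function name kennard_stone and the toy coordinates are illustrative and are not taken from the slides.

```python
import math


def kennard_stone(points, n_select):
    """Return indices of n_select points chosen by the Kennard-Stone procedure."""
    n = len(points)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and len(points)")
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Seed with the two points that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]
    remaining = set(range(n)) - {first, second}

    while len(selected) < n_select:
        # Add the candidate whose nearest already-selected point is farthest away.
        nxt = max(sorted(remaining), key=lambda i: min(dist[i][s] for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected


# Toy 2D feature vectors, for illustration only.
X = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (10.0, 0.0), (10.0, 1.0)]
train_idx = kennard_stone(X, 3)
test_idx = [i for i in range(len(X)) if i not in train_idx]
```

The selected indices form the training set and cover the extremes of the feature space; the unselected points become the test set.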

Feature Selection
- Genetic algorithm: GA-based feature selection chooses a subset of the most relevant features (variables) from the original feature set to improve model performance and reduce computational complexity. ga = GeneticAlgorithm(num_features=X.shape[1], fitness_func=fitness_function)
- Lasso feature selection: Lasso (Least Absolute Shrinkage and Selection Operator) adds an L1 penalty term to the linear or logistic regression cost function, which drives the coefficients of some features to exactly zero and so removes them from the model. lasso = sklearn.linear_model.Lasso(alpha=1.0)
- Stepwise selection: selects the most relevant features step by step (forward selection, backward elimination, bidirectional selection, stopping criteria). rfe = sklearn.feature_selection.RFE(LogisticRegression(), n_features_to_select=10)  # keep the top 10 features
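To show why the Lasso penalty removes features, here is a minimal pure-Python sketch of coordinate descent with soft thresholding. It assumes centered data with no intercept and minimizes (1/(2n))·||y − Xw||² + alpha·||w||₁; it is an illustration, not the solver scikit-learn uses internally, and the toy data and function names are made up.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return exactly 0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Fit Lasso coefficients (no intercept) by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * w[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, alpha) / z if z > 0 else 0.0
    return w


# Toy data: y depends strongly on features 0 and 2 and only weakly on feature 1.
X = [(1, 1, 1), (-1, 1, -1), (1, -1, -1), (-1, -1, 1)]
y = [1.05, -0.95, 2.95, -3.05]  # equals 2*f0 + 0.05*f1 - 1*f2

w = lasso_coordinate_descent(X, y, alpha=0.1)
selected = [j for j, coef in enumerate(w) if coef != 0.0]
```

The weak feature's correlation with the residual falls inside the threshold band, so its coefficient is set to exactly zero, while the strong features are kept with slightly shrunken coefficients.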

Learning Algorithm
Supervised
- Regression: builds predictive models for tasks where the goal is to predict a continuous numeric value. Examples: Random Forest Regression (RF), Support Vector Regression (SVR), Decision Tree Regression, K-Nearest Neighbors Regression, Neural Networks for Regression, etc.
- Classification: builds models that assign data to predefined classes or categories. Examples: Logistic Regression, Decision Trees, Support Vector Machines (SVM), K-Nearest Neighbors (KNN).
Unsupervised
- Clustering: groups data into clusters based on inherent patterns or similarities in the data. Examples: K-Means Clustering, X-Means, Gaussian Mixture Models (GMM).
- Dimensionality reduction: reduces the number of features or dimensions in a dataset while preserving important information and patterns. Examples: Principal Component Analysis (PCA), Independent Component Analysis (ICA), Autoencoders.

Absorption
Property | Definition | Model and method used
%Abs | Percentage absorbed through the (intestinal) barrier | RF and MACCS key
%HIA | Percentage absorbed through the human GI tract | RF and MACCS key
Caco2 | In vitro cell-monolayer model used to predict absorption, including paracellular and active transport | RF and Descriptor
Pgp | Inhibition of P-glycoprotein (P-gp) function to enhance drug absorption | SVM and ECFP4
Amount absorbed | Weight of compound absorbed per kilogram of body weight | RF and Descriptor

Distribution
Property | Definition | Model and method used
BBB partitioning | Blood-brain barrier partitioning: brain vs. blood (serum/plasma) concentration ratio | SVM and ECFP2
%PPB | Percentage of the compound bound to protein in plasma | RF and Descriptor
Vd | Volume of distribution within the body | RF and Descriptor
Fbt | Fraction bound in tissues | SVM and Descriptor
Ktb | Tissue-blood partition coefficient: measures the distribution of a substance between a specific tissue and the blood | SVM and PubChem FP

Metabolism
Property | Definition | Model and method used
Primary enzyme | Predominant enzyme accountable for metabolism (CYP P450 1A2, 2C9, 2C19, 2D6, 3A4, etc.) | 1A2: SVM and ECFP4; 2C9: RF and ECFP2; 2C19: SVM and ECFP2; 2D6: RF and ECFP4; 3A4: SVM and ECFP4
% metabolised | Overall percentage of metabolism | SVM and MACCS
% excreted | Proportion of the compound excreted unchanged in urine | RF and Descriptor
Vmax | Maximum velocity of the metabolic reaction | SVM and MACCS
Cliv | Clearance rate in the liver | RF and Descriptor

Excretion/Elimination
Property | Definition | Model and method used
Clr | Renal clearance | RF and Descriptor
Cltot | Total clearance across all routes | SVM and MACCS key
AUC | Area under the concentration-time curve | RF and Descriptor
t1/2 | Half-life: time for the compound concentration to fall by 50% | RF and Descriptor
Tmax | Time to reach peak concentration | RF and Descriptor
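Three of these endpoints (AUC, t1/2, Tmax) are simple functions of a concentration-time profile, which is how the values a model is trained on are typically obtained. The sketch below assumes the linear trapezoidal rule for AUC and first-order elimination for half-life (t1/2 = ln 2 / k); the slides state neither assumption, and the sample profile, units, and rate constant are invented for illustration.

```python
import math


def auc_trapezoid(times, concentrations):
    """Area under the concentration-time curve by the linear trapezoidal rule."""
    pairs = list(zip(times, concentrations))
    return sum(
        (t1 - t0) * (c0 + c1) / 2
        for (t0, c0), (t1, c1) in zip(pairs, pairs[1:])
    )


def half_life(k_el):
    """Half-life under first-order elimination with rate constant k_el."""
    return math.log(2) / k_el


# Hypothetical profile: time in hours, concentration in mg/L.
times = [0.0, 1.0, 2.0, 4.0]
conc = [0.0, 8.0, 4.0, 2.0]

auc = auc_trapezoid(times, conc)       # units: mg*h/L
t_max = times[conc.index(max(conc))]   # time of the observed peak
t_half = half_life(0.1)                # k_el = 0.1 per hour is an assumed value
```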

Toxicity
Property | Definition | Model and method used
hERG | hERG encodes a potassium ion channel; blocking it can adversely affect the heart's electrical activity | RF, Descriptor, and MACCS
LD50 | Acute toxicity of a substance: its potential to cause harm within a short period after exposure | RF and Descriptor
DILI | Drug-induced liver injury: ingestion of a drug or medication leads to damage, injury, or dysfunction of the liver | RF and MACCS key
Hepatotoxicity | Harmful effects on or damage to the liver caused by drugs | RF and Descriptor
SkinSen | Skin sensitization: the skin's response to certain allergens | RF and MACCS

Model Analysis and Performance
- Predictive variance: measures the variability of predictions; high variance means lower precision. Metrics: MAPE (Mean Absolute Percentage Error) and MAE (Mean Absolute Error).
- Model quality: the effectiveness, reliability, and performance of a machine learning model. Metrics derived from the confusion matrix: accuracy, precision, recall (sensitivity), specificity, and F1 score.
- Error analysis: investigate and analyze model errors to identify patterns or areas where the model needs improvement, then fine-tune the model or collect more relevant data.
- Check response times and throughput to ensure the model can handle the required workload without causing delays.
- Model versioning: keep track of model versions to understand which perform best and to allow easy rollback in case of issues.
- Scheduled retraining: set up a retraining schedule to periodically update the model with new data. This is essential for adapting to changing patterns in the data.
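The metrics named above are short enough to write out directly. This is a minimal standard-library sketch, assuming binary labels encoded as 0/1 and no zero values among the true regression targets (MAPE is undefined otherwise); the function names and sample values are illustrative.

```python
def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error; assumes no true value is zero."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def classification_metrics(y_true, y_pred):
    """Confusion-matrix metrics for binary labels encoded as 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Illustrative values only.
reg_mae = mae([2.0, 4.0, 5.0], [2.5, 3.0, 5.0])
reg_mape = mape([2.0, 4.0, 5.0], [2.5, 3.0, 5.0])
cls = classification_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1])
```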

Model Deployment
- Source code management (Git)
- CI/CD (Jenkins)
- Container (Docker)
- Orchestration (Ansible)
- Log analysis (ELK, Grafana)

Model Monitoring
- Data processing issues: data quality checks, data consistency, input validation, pipeline monitoring, logging and alerting.
- Data schema changes: validate incoming data, automated alerts, data transformation monitoring.
- Data loss at the source: recovery mechanisms, data ingestion monitoring, logging and auditing.
- Anomaly detection: unusual behavior in model outputs or predictions that may indicate a problem, such as a sudden increase in errors.
- Model documentation: data sources, testing and validation, model performance.
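A sudden increase in errors can be flagged with a very simple rule. The sketch below assumes a rolling z-score over a fixed window of recent error rates; the slides do not prescribe a detection method, and the window size, threshold, and sample series are hypothetical.

```python
from statistics import mean, stdev


def flag_anomalies(error_rates, window=5, z_threshold=3.0):
    """Return indices where the error rate jumps well above the recent window."""
    flagged = []
    for i in range(window, len(error_rates)):
        history = error_rates[i - window:i]
        mu, sigma = mean(history), stdev(history)
        # Only upward jumps are flagged; a flat history (sigma == 0) is skipped.
        if sigma > 0 and (error_rates[i] - mu) / sigma > z_threshold:
            flagged.append(i)
    return flagged


# Hypothetical daily error rates with one spike at index 6.
daily_error_rate = [0.050, 0.052, 0.049, 0.051, 0.048, 0.050, 0.120, 0.051]
alerts = flag_anomalies(daily_error_rate)
```

In a monitoring pipeline the flagged indices would feed the logging and alerting step listed above.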

Current Working
- Generating molecules (or similar molecules) with (almost) the desired properties using generative AI (RNN, GNN, etc.)
- Checking the fit score for compatibility
- Working on automated energy minimisation of structures
- Working on DEL, EGFR VIII data analysis
- Working on various biological data analysis projects (NGS, PacBio)
GitHub: https://github.com/santuchal/ADMET
Medium: https://medium.com/@santuchal/admet-an-essential-component-in-drug-discovery-and-development-f503a5aae5dd
Streamlit: https://hav8whwegtyvgwjixnhxqw.streamlit.app/
