J Chem Inf Model 2019 Nov 25;59(11):4906-4920. doi: 10.1021/acs.jcim.9b00489.
STarFish: A Stacked Ensemble Target Fishing Approach and its Application to Natural Products.
Cockroft NT, Cheng X, Fuchs JR.
Abstract
Target fishing is the process of identifying the protein target of a bioactive small molecule. To do so experimentally requires a significant investment of time and resources, which can be expedited with a reliable computational target fishing model. The development of computational target fishing models using machine learning has become very popular over the last several years because of the increased availability of large amounts of public bioactivity data. Unfortunately, the applicability and performance of such models for natural products have not yet been comprehensively assessed. This is, in part, due to the relative lack of bioactivity data available for natural products compared to synthetic compounds. Moreover, the databases commonly used to train such models do not annotate which compounds are natural products, which makes the collection of a benchmarking set difficult. To address this knowledge gap, a data set composed of natural product structures and their associated protein targets was generated by cross-referencing 20 publicly available natural product databases with the bioactivity database ChEMBL. This data set contains 5589 compound-target pairs for 1943 unique compounds and 1023 unique targets. A synthetic data set comprising 107 190 compound-target pairs for 88 728 unique compounds and 1907 unique targets was used to train k-nearest neighbors, random forest, and multilayer perceptron models. The predictive performance of each model was assessed by stratified 10-fold cross-validation and benchmarking on the newly collected natural product data set. Strong performance was observed for each model during cross-validation, with area under the receiver operating characteristic (AUROC) scores ranging from 0.94 to 0.99 and Boltzmann-enhanced discrimination of receiver operating characteristic (BEDROC) scores from 0.89 to 0.94. When tested on the natural product data set, performance decreased dramatically, with AUROC scores ranging from 0.70 to 0.85 and BEDROC scores from 0.43 to 0.59. However, the implementation of a model stacking approach, which uses logistic regression as a meta-classifier to combine model predictions, dramatically improved the ability to correctly predict the protein targets of natural products and increased the AUROC score to 0.94 and the BEDROC score to 0.73. This stacked model was deployed as a web application, called STarFish, and has been made available to aid in target identification for natural products.
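The two metrics quoted above can be illustrated on a single ranked list of targets. The sketch below uses only the Python standard library and is not the authors' evaluation code: the function names `auroc` and `bedroc`, the toy scores, and the choice alpha = 20 are assumptions, since the abstract does not state the BEDROC alpha or how scores were averaged across compounds. BEDROC is written here as the robust initial enhancement (RIE) rescaled between its worst-case and best-case values, so a perfect ranking gives 1 and a fully inverted ranking gives 0.

```python
import math


def auroc(scores, labels):
    """Chance that a random true target outscores a random non-target (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one true target and one non-target")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bedroc(scores, labels, alpha=20.0):
    """BEDROC as min-max rescaled RIE; alpha=20 is an assumed, commonly used value."""
    n_total = len(scores)
    # Rank 1 is the highest score; sorted() is stable, so ties keep input order.
    order = sorted(range(n_total), key=lambda i: scores[i], reverse=True)
    ranks = [rank for rank, i in enumerate(order, start=1) if labels[i] == 1]
    n_pos = len(ranks)
    if n_pos == 0 or n_pos == n_total:
        raise ValueError("need at least one true target and one non-target")
    ratio = n_pos / n_total

    # Exponentially weighted sum over the ranks of the true targets.
    sum_exp = sum(math.exp(-alpha * r / n_total) for r in ranks)
    # Expected value of that sum if the true targets were ranked at random.
    random_sum = ratio * (1 - math.exp(-alpha)) / (math.exp(alpha / n_total) - 1)
    rie = sum_exp / random_sum

    # RIE values for the worst and best possible rankings.
    rie_min = (1 - math.exp(alpha * ratio)) / (ratio * (1 - math.exp(alpha)))
    rie_max = (1 - math.exp(-alpha * ratio)) / (ratio * (1 - math.exp(-alpha)))
    return (rie - rie_min) / (rie_max - rie_min)


if __name__ == "__main__":
    # Hypothetical scores for 10 candidate targets of one query compound;
    # the true targets sit at ranks 1 and 4.
    toy_scores = [0.95, 0.90, 0.80, 0.75, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
    toy_labels = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
    print("AUROC :", round(auroc(toy_scores, toy_labels), 3))
    print("BEDROC:", round(bedroc(toy_scores, toy_labels), 3))
```

Because the exponential weight falls off quickly with rank, BEDROC penalises true targets that sit just outside the top of the list more heavily than AUROC does, which may be one reason the BEDROC scores reported above fall further than the AUROC scores on the natural product set.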