J Chem Inf Model 2019 Nov 25;59(11):4906-4920. doi: 10.1021/acs.jcim.9b00489.

STarFish: A Stacked Ensemble Target Fishing Approach and its Application to Natural Products.

Cockroft NT, Cheng X, Fuchs JR.

Target fishing is the process of identifying the protein target of a bioactive small molecule. Doing so experimentally requires a significant investment of time and resources, and the process can be expedited with a reliable computational target fishing model. The development of computational target fishing models using machine learning has become very popular over the last several years because of the increased availability of large amounts of public bioactivity data. Unfortunately, the applicability and performance of such models for natural products have not yet been comprehensively assessed. This is, in part, due to the relative lack of bioactivity data available for natural products compared to synthetic compounds. Moreover, the databases commonly used to train such models do not annotate which compounds are natural products, which makes the collection of a benchmarking set difficult. To address this knowledge gap, a data set composed of natural product structures and their associated protein targets was generated by cross-referencing 20 publicly available natural product databases with the bioactivity database ChEMBL. This data set contains 5589 compound-target pairs for 1943 unique compounds and 1023 unique targets. A synthetic data set comprising 107,190 compound-target pairs for 88,728 unique compounds and 1907 unique targets was used to train k-nearest neighbors, random forest, and multilayer perceptron models. The predictive performance of each model was assessed by stratified 10-fold cross-validation and by benchmarking on the newly collected natural product data set. Strong performance was observed for each model during cross-validation, with area under the receiver operating characteristic (AUROC) scores ranging from 0.94 to 0.99 and Boltzmann-enhanced discrimination of receiver operating characteristic (BEDROC) scores from 0.89 to 0.94. When tested on the natural product data set, performance decreased substantially, with AUROC scores ranging from 0.70 to 0.85 and BEDROC scores from 0.43 to 0.59. However, the implementation of a model stacking approach, which uses logistic regression as a meta-classifier to combine model predictions, markedly improved the ability to correctly predict the protein targets of natural products and increased the AUROC score to 0.94 and the BEDROC score to 0.73. This stacked model was deployed as a web application, called STarFish, and has been made available to aid in target identification for natural products.
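The stacking idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn, uses randomly generated stand-in data in place of compound fingerprints and ChEMBL targets, and simplifies target prediction to a four-class problem. All hyperparameters (neighbor count, tree count, layer size, fold count) are illustrative choices, not values from the paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data (assumption): rows play the role of compound
# feature vectors and labels play the role of protein targets.
X, y = make_classification(
    n_samples=600, n_features=32, n_informative=12,
    n_classes=4, n_clusters_per_class=1, class_sep=1.5, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The three base model families named in the abstract.
base_models = [
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("mlp", MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)),
]

# Logistic regression acts as the meta-classifier. It is trained on
# out-of-fold class probabilities produced by the base models, so it never
# sees predictions made on data a base model was fitted on.
stacked = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    stack_method="predict_proba",
)
stacked.fit(X_train, y_train)

# Class probabilities on held-out data, scored with one-vs-rest AUROC
# (macro-averaged over classes).
proba = stacked.predict_proba(X_test)
auroc = roc_auc_score(y_test, proba, multi_class="ovr")
print(f"stacked AUROC (one-vs-rest, macro): {auroc:.3f}")
```

Sorting each row of `proba` in descending order gives a ranked list of candidate targets for a compound, which is the form of output a target fishing tool would typically present. BEDROC is not part of scikit-learn and is omitted from this sketch.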

PubMed ID: 31589422
PMC ID: PMC7291623
Article link: J Chem Inf Model

Genes referenced: LOC115925415 LOC583082
