Filtered circular fingerprints improve either prediction or runtime performance while retaining interpretability


Journal of Cheminformatics, 8:60

Received: 19 May 2016
Accepted: 18 October 2016
First Online: 31 October 2016


Background
Even though circular fingerprints were first introduced more than 50 years ago, they are still widely used for building highly predictive, state-of-the-art QSAR models. Historically, these structural fragments were designed to search large molecular databases. Hence, to derive a compact representation, circular fingerprint fragments are often folded to comparatively short bit-strings. However, folding fingerprints introduces bit collisions, which add noise to the encoded structural information and remove its interpretability. Both representations, folded as well as unprocessed fingerprints, are often used for QSAR modeling.
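To make the notion of a bit collision concrete, the following is a minimal sketch, not the procedure used by the authors or by any particular toolkit. It assumes that each fragment is identified by an integer and that folding is done by taking that integer modulo the bit-string length; the fragment identifiers are arbitrary values chosen for illustration.

```python
from collections import defaultdict


def fold(fragment_ids, n_bits):
    """Group fragment identifiers by the bit they set after folding.

    Assumption: folding is a plain modulo on an integer identifier.
    Real toolkits may hash or fold differently.
    """
    buckets = defaultdict(set)
    for frag in fragment_ids:
        buckets[frag % n_bits].add(frag)
    return buckets


def collisions(buckets):
    """Keep only the bits that are set by more than one distinct fragment."""
    return {bit: frags for bit, frags in buckets.items() if len(frags) > 1}


# Hypothetical fragment identifiers (arbitrary integers, for illustration only)
fragment_ids = [1029, 5, 2053, 77, 1101]

buckets = fold(fragment_ids, n_bits=1024)
colliding = collisions(buckets)

for bit, frags in sorted(colliding.items()):
    print(f"bit {bit} is shared by fragments {sorted(frags)}")
```

In this toy case five distinct fragments end up on two bits, so a model that assigns weight to one of these bits cannot say which fragment is responsible. This is the loss of interpretability described above.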

Results
We show that it can be preferable to build QSAR models with circular fingerprint fragments that have been filtered by supervised feature selection, instead of using folded fingerprints or all fragments. Compared to folded fingerprints, filtered fingerprints significantly increase predictive performance and remain unambiguous and interpretable. Compared to unprocessed fingerprints, filtered fingerprints reduce the computational effort and are a more compact and less redundant feature representation. Depending on the selected learning algorithm, filtering yields about equally predictive QSAR models. We demonstrate the suitability of filtered fingerprints for QSAR modeling by presenting our freely available web service Collision-free Filtered Circular Fingerprints (CoFFer), which provides rationales for predictions by highlighting important structural features in the query compound.
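As an illustration of what filtering by supervised feature selection can look like, here is a minimal sketch. The abstract does not name the selection criterion, so the chi-square statistic used here is an assumption, as are the fragment names, the toy data, and the helper names `chi_square` and `filter_fragments`.

```python
def chi_square(feature, labels):
    """Chi-square statistic of the 2x2 table (fragment present? x compound active?)."""
    n = len(labels)
    a = sum(1 for f, y in zip(feature, labels) if f and y)        # present, active
    b = sum(1 for f, y in zip(feature, labels) if f and not y)    # present, inactive
    c = sum(1 for f, y in zip(feature, labels) if not f and y)    # absent, active
    d = n - a - b - c                                             # absent, inactive
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A constant fragment or a constant label carries no information.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def filter_fragments(columns, labels, k):
    """Return the names of the k fragments that score highest against the labels."""
    ranked = sorted(columns, key=lambda name: chi_square(columns[name], labels),
                    reverse=True)
    return ranked[:k]


# Hypothetical data: 6 compounds, 1 = active, 0 = inactive
labels = [1, 1, 1, 0, 0, 0]
columns = {
    "frag_A": [1, 1, 1, 0, 0, 0],  # occurs exactly in the actives
    "frag_B": [1, 0, 1, 0, 1, 0],  # weakly related to activity
    "frag_C": [1, 1, 1, 1, 1, 1],  # occurs everywhere, uninformative
    "frag_D": [1, 1, 0, 0, 0, 0],  # occurs in two of the three actives
}

selected = filter_fragments(columns, labels, k=2)
print(selected)
```

Each retained column still corresponds to exactly one fragment, so the reduced representation stays unambiguous, in contrast to folding.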

Conclusions
Circular fingerprints are potent structural features that yield highly predictive models and encode interpretable structural information. To retain interpretability, however, circular fingerprints should not be folded when building prediction models. Our experiments show that filtering is a suitable option to reduce the high computational effort when working with all fingerprint fragments. Additionally, our experiments suggest that the area under the precision-recall curve is a more sensible statistic for validating QSAR models for virtual screening than the area under the ROC curve or other measures for early recognition.
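The difference between the two statistics can be seen on a toy ranking. The sketch below is not taken from the paper's experiments: the scores and labels are invented, and average precision is used as one common estimator of the area under the precision-recall curve.

```python
def auroc(scores, labels):
    """Area under the ROC curve as the fraction of (active, inactive) pairs
    in which the active compound gets the higher score (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Mean of the precision values measured at the rank of each active compound."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Hypothetical screening result: 10 compounds, scored 10 (best) down to 1
scores = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
early = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]   # one active ranked first, one ranked last
middle = [0, 0, 0, 0, 1, 1, 0, 0, 0, 0]  # both actives in the middle of the list

for name, labels in (("early", early), ("middle", middle)):
    print(name, auroc(scores, labels), round(average_precision(scores, labels), 3))
```

Both rankings have the same area under the ROC curve (0.5), but average precision is clearly higher for the ranking that places an active compound at the very top (0.6 versus about 0.267). In virtual screening only the top of the list is typically tested, which is the setting where this distinction matters.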


Keywords
Fingerprints, QSAR, Virtual screening, Feature selection

Abbreviations
CoFFer: Collision-free Filtered Circular Fingerprints
QSAR: quantitative structure-activity relationship
ECFP: extended-connectivity fingerprint
FCFP: functional class fingerprint
ROC: receiver operating characteristic
AUROC: area under ROC curve
BEDROC: Boltzmann-Enhanced Discrimination of ROC
EF: enrichment factor
AUPRC: area under precision recall curve
CDK: chemistry development kit
REST: representational state transfer
RBF: radial basis function
HPC: high-performance computing

Electronic supplementary material
The online version of this article (doi:10.1186/s13321-016-0173-z) contains supplementary material, which is available to authorized users.


Authors: Martin Gütlein, Stefan Kramer


