Assessment of the significance of patent-derived information for the early identification of compound–target interaction hypotheses

Journal of Cheminformatics, 9:26

Received: 02 December 2016; Accepted: 13 April 2017; First online: 21 April 2017; DOI: 10.1186/s13321-017-0214-2

Cite this article as: Senger, S. J Cheminform (2017) 9:26. doi:10.1186/s13321-017-0214-2


Background

Patents are an important source of information for effective decision-making in drug discovery. Encouragingly, freely accessible patent-chemistry databases are now available in the public domain. However, a wide gap still exists between manually curated data sources, which offer high quality but relatively low coverage, and high-coverage sources that rely on text mining and automated extraction of chemical structures. Securing much-needed funding for further research and improved infrastructure requires hard evidence of the significance of patent-derived information in drug discovery, yet surprisingly little such evidence has been reported so far. To address this, the present study attempts to quantify the relevance of patents for formulating and substantiating compound–target interaction hypotheses.

Results

A manually curated set of 130 compound–target interaction pairs, each annotated with what are considered to be the earliest patent and the earliest publication, has been produced. Analysis of this set revealed that, in stark contrast to what has been reported for novel chemical structures, only about 10% of the compound–target interaction pairs could be found in the scientific literature within one year of being reported in patents. The average delay across all interaction pairs is close to 4 years. To benchmark current capabilities, the study also examined how much of the benefit of patent-derived information is retained when a bioannotated version of SureChEMBL is used as a secondary source for the patent literature. Encouragingly, this approach found the patents in the annotated set for 72% of the compound–target interaction pairs. Similarly, the effect of using the bioactivity database ChEMBL as a secondary source for the scientific literature was studied. Here, the publications from the annotated set were found for only 46% of the compound–target interaction pairs.
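The delay and coverage figures above come down to simple counting over the annotated pairs. The following Python sketch shows one way such statistics could be computed; the record layout, the field names, the 365.25-day year, and the toy pairs are all hypothetical illustrations, not the study's data set or the author's actual procedure.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class InteractionPair:
    """Hypothetical record for one annotated compound-target pair."""
    compound: str
    target: str
    earliest_patent: date
    earliest_publication: date
    # Whether the annotated patent was also retrieved via a secondary source
    # (e.g. a bioannotated patent database); assumed to be pre-computed.
    patent_in_secondary_source: bool


def delay_years(pair: InteractionPair) -> float:
    """Signed delay from earliest patent to earliest publication, in years.

    Assumes 365.25 days per year; a negative value would mean the
    publication preceded the patent.
    """
    return (pair.earliest_publication - pair.earliest_patent).days / 365.25


def summarize(pairs: list[InteractionPair]) -> dict[str, float]:
    """Compute summary statistics analogous to those reported above."""
    if not pairs:
        raise ValueError("need at least one interaction pair")
    delays = [delay_years(p) for p in pairs]
    n = len(pairs)
    return {
        # Share of pairs published no later than one year after the patent.
        "fraction_published_within_1y": sum(d <= 1.0 for d in delays) / n,
        # Arithmetic mean of the signed delays.
        "mean_delay_years": sum(delays) / n,
        # Share of pairs whose annotated patent the secondary source found.
        "secondary_source_coverage": sum(p.patent_in_secondary_source for p in pairs) / n,
    }


# Toy, invented pairs purely for illustration (not the study's data).
pairs = [
    InteractionPair("cpd-A", "target-X", date(2010, 1, 1), date(2010, 7, 1), True),
    InteractionPair("cpd-B", "target-Y", date(2010, 1, 1), date(2014, 1, 1), True),
    InteractionPair("cpd-C", "target-Z", date(2012, 1, 1), date(2018, 1, 1), False),
]
summary = summarize(pairs)
print(summary)
```

With these toy pairs, one of three is published within a year, two of three patents are found by the secondary source, and the mean delay is roughly 3.5 years. The same counting would apply to publication coverage in a literature database such as ChEMBL.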

Conclusion

Patent-derived information is a significant enabler for formulating compound–target interaction hypotheses, even in cases where the respective interaction is later reported in the scientific literature. The findings of this study clearly highlight the importance of future investment in developing and providing databases and tools that allow scientists to search patent information comprehensively, reliably, and efficiently.

Keywords: Patents; Patent chemistry databases; SureChEMBL; ChEMBL

Abbreviations
API: application programming interface
EPO: European Patent Office
GUI: graphical user interface
URI: Uniform Resource Identifier
USPTO: United States Patent and Trademark Office
WIPO: World Intellectual Property Organization

Electronic supplementary material: The online version of this article (doi:10.1186/s13321-017-0214-2) contains supplementary material, which is available to authorized users.

Author: Stefan Senger


