ReliableGenome: annotation of genomic regions with high/low variant calling concordance


Reference: Popitsch, N, Schuh, A and Taylor, J et al. (2016). ReliableGenome: annotation of genomic regions with high/low variant calling concordance. Bioinformatics, 33 (2), 155-160.



Abstract: The increasing adoption of clinical whole-genome resequencing (WGS) demands highly accurate and reproducible variant calling (VC) methods. The observed discordance between state-of-the-art VC pipelines, however, indicates that current practice still suffers from non-negligible numbers of false positive and false negative SNV and INDEL calls, which were shown to be enriched among discordant calls and also in genomic regions with low sequence complexity.

Here, we describe our method ReliableGenome (RG) for partitioning genomes into high- and low-concordance regions with respect to a set of surveyed VC pipelines. Our method combines call sets derived by multiple pipelines from arbitrary numbers of datasets and interpolates expected concordance for genomic regions without data. By applying RG to 219 deep human WGS datasets, we demonstrate that VC concordance depends predominantly on genomic context rather than on the actual sequencing data, which manifests in a high recurrence of regions that can or cannot be reliably genotyped by a single method. This enables the application of pre-computed regions to other data created with comparable sequencing technology and software. RG outperforms comparable efforts in predicting VC concordance and false positive calls in low-concordance regions, which underlines its usefulness for variant filtering, annotation and prioritization. RG allows focusing resource-intensive algorithms (e.g., consensus calling methods) on the smaller, discordant share of the genome (20-30%), which might result in increased overall accuracy at reasonable cost. Our method and analysis of discordant calls may further be useful for the development, benchmarking and optimization of VC algorithms and for the relative comparison of call sets between different studies/pipelines.

RG was implemented in Java; source code and binaries are freely available for non-commercial use.

CONTACT: niko[at]

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
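
The abstract describes the approach only at a high level, so the following is a minimal illustrative sketch of the general idea, not the RG implementation. It assumes fixed-size windows, defines concordance as the fraction of distinct variants in a window that were called by every pipeline, fills windows without calls by linear interpolation between the nearest scored windows, and labels windows with an arbitrary threshold of 0.5. All function names, the window size and the threshold are hypothetical choices made for this example.

```python
from collections import defaultdict


def window_concordance(call_sets, chrom, window):
    """Return {window_index: concordance} for one chromosome.

    call_sets maps a pipeline name to a set of (chrom, pos, ref, alt) tuples.
    Concordance here (an assumption) is the share of distinct variants in a
    window that were reported by all pipelines.
    """
    seen_by = defaultdict(int)
    for calls in call_sets.values():
        for variant in calls:
            seen_by[variant] += 1

    n_pipelines = len(call_sets)
    total = defaultdict(int)
    agreed = defaultdict(int)
    for (c, pos, _ref, _alt), count in seen_by.items():
        if c != chrom:
            continue
        idx = pos // window
        total[idx] += 1
        if count == n_pipelines:
            agreed[idx] += 1
    return {idx: agreed[idx] / total[idx] for idx in total}


def fill_gaps(scores, n_windows):
    """Linearly interpolate scores for windows that contain no calls.

    Windows before the first or after the last scored window copy the
    nearest known score. Returns a list of length n_windows.
    """
    known = sorted(scores)
    if not known:
        return [None] * n_windows
    filled = []
    for i in range(n_windows):
        if i in scores:
            filled.append(scores[i])
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled.append(scores[right])
        elif right is None:
            filled.append(scores[left])
        else:
            t = (i - left) / (right - left)
            filled.append(scores[left] + t * (scores[right] - scores[left]))
    return filled


def partition(call_sets, chrom, n_windows, window=1000, threshold=0.5):
    """Label each window 'high' or 'low' concordance (threshold is arbitrary)."""
    scores = fill_gaps(window_concordance(call_sets, chrom, window), n_windows)
    return ["high" if s is not None and s >= threshold else "low" for s in scores]


if __name__ == "__main__":
    # Toy call sets from two hypothetical pipelines.
    calls = {
        "pipeline_a": {("chr1", 10, "A", "G"), ("chr1", 50, "C", "T"), ("chr1", 310, "G", "A")},
        "pipeline_b": {("chr1", 10, "A", "G"), ("chr1", 50, "C", "T"), ("chr1", 320, "T", "C")},
    }
    print(partition(calls, "chr1", n_windows=4, window=100))
```

In this toy input the two pipelines agree on every call in the first window and on none in the last, so the two windows in between receive interpolated scores. A real analysis would read calls from VCF files and would need to normalize variant representations before comparing them.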

Publication status: Published

Peer Review status: Peer reviewed

Version: Publisher's version

Funder: National Institute for Health Research Oxford Biomedical Research Centre

Notes: Copyright © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions[at]oup.com. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Bibliographic Details

Publisher: Oxford University Press


Journal: Bioinformatics


Volume: 33

Issue: 2

Extent: 155-160

Issue Date: 07 September 2016



Eissn: 1367-4811

Issn: 1367-4803

Uuid: uuid:a1496fd1-3e98-4671-a4d1-be563e45e6c4

Urn: uri:a1496fd1-3e98-4671-a4d1-be563e45e6c4

Pubs-id: pubs:642554

Item Description

Type: journal-article

Language: eng

Version: Publisher's version

Keywords: WGS500 Consortium Journal Article


Author: Popitsch, N - Oxford, MSD, NDM, Human Genetics Wt Centre

Author: Schuh, A - Oxford, MSD, Oncology

Author: Taylor, J - Oxford, MSD, ND


