
Any given human individual carries multiple genetic variants that disrupt protein-coding genes, whether through structural variation, nucleotide variants, or indels. Predicting the phenotypic consequences of a gene disruption remains a significant challenge. Current approaches employ information from a range of biological networks to predict which human genes are haploinsufficient (meaning two copies are required for normal function) or essential (meaning at least one copy is required for viability). Using recently available study gene sets, we show that these approaches are strongly biased towards providing accurate predictions for well-studied genes. By contrast, we derive a haploinsufficiency score from a combination of unbiased large-scale high-throughput datasets, including gene co-expression and genetic variation in over 6000 human exomes. Our approach provides haploinsufficiency predictions for over twice as many genes with no associated publications in PubMed as three commonly used approaches do, and outperforms those approaches at predicting haploinsufficiency for less-studied genes. We also show that fine-tuning the predictor on a set of well-studied 'gold standard' haploinsufficient genes does not improve the prediction for less-studied genes. This new score can readily be used to prioritize gene disruptions resulting from any genetic variant, including copy number variants, indels and single-nucleotide variants.
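As a rough illustration of the kind of scoring the abstract describes, the sketch below combines hypothetical per-gene features (a co-expression connectivity measure and an exome variant burden) into a single haploinsufficiency score using a linear decision function, as a trained linear SVM would apply. All feature names, weights, and values here are invented for illustration and are not taken from the paper.

```python
# Minimal sketch, assuming a trained linear model: a haploinsufficiency
# score is computed as a weighted sum of per-gene features plus a bias,
# i.e. the decision function of a linear SVM. Weights and features below
# are hypothetical placeholders, not values from the published predictor.

def haploinsufficiency_score(features, weights, bias):
    """Linear decision function: score = w . x + b."""
    return sum(w * features[name] for name, w in weights.items()) + bias

# Hypothetical trained parameters.
weights = {
    "coexpression_connectivity": 1.2,   # highly connected genes tend to be dosage-sensitive
    "exome_variant_burden": -0.8,       # variation-tolerant genes accumulate more variants
}
bias = -0.1

# Hypothetical feature values for one gene.
gene_features = {
    "coexpression_connectivity": 0.9,
    "exome_variant_burden": 0.2,
}

score = haploinsufficiency_score(gene_features, weights, bias)
print(round(score, 2))  # higher score = more likely haploinsufficient
```

A score like this can be computed for every gene in a large-scale dataset, regardless of how well-studied the gene is, which is what allows coverage of genes with no associated publications.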

Original publication

DOI

10.1093/nar/gkv474

Type

Journal article

Journal

Nucleic Acids Res

Publication Date

03/09/2015

Volume

43

Keywords

Animals, Disease, Gene Regulatory Networks, Genomics, Haploinsufficiency, Humans, Mice, Support Vector Machine