What is Combined Annotation Dependent Depletion (CADD)?
CADD is a tool for scoring the deleteriousness of single nucleotide variants as well as insertion/deletion variants in the human genome.
While many variant annotation and scoring tools exist, most annotations tend to exploit a single information type (e.g. conservation) and/or are restricted in scope (e.g. to missense changes). Thus, a broadly applicable metric that objectively weights and integrates diverse information is needed. Combined Annotation Dependent Depletion (CADD) is a framework that integrates multiple annotations into one metric by contrasting variants that survived natural selection with simulated mutations.
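The idea of contrasting two variant sets can be illustrated with a toy classifier. The sketch below is not the published CADD model: the two annotations (a conservation value and a missense flag), all numbers, and the function names are made up for illustration, and plain logistic regression stands in for whatever model is actually used. It only shows how several annotations can be weighted into one score by learning to separate "observed" from "simulated" variants.

```python
import math

def train_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit logistic regression by batch gradient descent (toy implementation)."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias

def combined_score(x, weights, bias):
    """Single metric: higher means 'looks more like a simulated mutation'."""
    return bias + sum(w * xi for w, xi in zip(weights, x))

# Hypothetical annotations per variant: [conservation, is_missense].
observed = [[0.1, 0], [0.2, 0], [0.3, 1], [0.2, 0]]   # survived selection (label 0)
simulated = [[0.8, 1], [0.9, 1], [0.6, 0], [0.7, 1]]  # simulated mutations (label 1)

rows = observed + simulated
labels = [0] * len(observed) + [1] * len(simulated)
weights, bias = train_logistic(rows, labels)
```

With weights learned this way, `combined_score` collapses any annotation vector into one number, so a variant at a conserved missense site scores higher than one at an unconserved site in this toy data.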
C-scores strongly correlate with allelic diversity, pathogenicity of both coding and non-coding variants, and experimentally measured regulatory effects, and also rank causal variants highly within individual genome sequences. Finally, C-scores of complex trait-associated variants from genome-wide association studies (GWAS) are significantly higher than those of matched controls and correlate with study sample size, likely reflecting the increased accuracy of larger GWAS.
CADD can quantitatively prioritize functional, deleterious, and disease-causal variants across a wide range of functional categories, effect sizes and genetic architectures, and can be used to prioritize causal variation in both research and clinical settings.
Our manuscript describing the method and its features was published by Nature Genetics in 2014: Kircher M, Witten DM, Jain P, O'Roak BJ, Cooper GM, Shendure J. A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet. 2014 Feb 2. doi: 10.1038/ng.2892. PubMed PMID: 24487276.
How can I obtain CADD scores?
CADD scores are freely available for all non-commercial applications. If you are planning on using them in a commercial application, please contact Jay Shendure and Gregory M. Cooper. CADD is currently developed by Martin Kircher, Philipp Rentzsch, Daniela M. Witten, Gregory M. Cooper, and Jay Shendure.
We have pre-computed CADD-based scores (C-scores) for all 8.6 billion possible single nucleotide variants (SNVs) of the reference genome, a selection of short insertions/deletions, as well as some large variant sets (e.g. gnomAD, ExAC, 1000 Genomes, ESP). We also provide a simple lookup for SNVs and enable scoring of short insertions/deletions.
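As a rough illustration of working with pre-computed scores, the sketch below looks up one SNV in a small tab-separated table. The column layout (chromosome, position, reference allele, alternative allele, raw score, scaled score), the header line starting with "#", the function name `lookup_snv`, and every value in the example table are assumptions made for this example; check the downloaded file's own header before relying on them.

```python
import csv
import io

def lookup_snv(table_text, chrom, pos, ref, alt):
    """Return (raw_score, scaled_score) for one SNV, or None if it is absent.

    Assumes six tab-separated columns per row and '#'-prefixed header lines.
    """
    data_lines = (
        line for line in io.StringIO(table_text) if not line.startswith("#")
    )
    for row in csv.reader(data_lines, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        r_chrom, r_pos, r_ref, r_alt, raw, scaled = row
        if (r_chrom, int(r_pos), r_ref, r_alt) == (chrom, pos, ref, alt):
            return float(raw), float(scaled)
    return None

# Made-up placeholder rows, not real CADD scores.
EXAMPLE = (
    "#Chrom\tPos\tRef\tAlt\tRawScore\tPHRED\n"
    "1\t10001\tT\tA\t0.1\t4.0\n"
    "1\t10001\tT\tC\t0.2\t5.0\n"
)

result = lookup_snv(EXAMPLE, "1", 10001, "T", "C")
```

A linear scan like this is only practical for small extracts; for genome-wide files an indexed lookup would be the more sensible choice.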