Methods and Software Development

Research Overview

A major focus of our method development has been rare-variant aggregate tests for detecting association and linkage, including:

  • The Combined Multivariate and Collapsing (CMC) method (Li and Leal, 2008)
  • Kernel Based Adaptive Cluster (KBAC) method (Liu and Leal, 2010)
  • Burden of Rare Variant (BRV) test, which can control type I error in the presence of missing data (Auer et al., 2013)
  • A rare-variant transmission disequilibrium test (RV-TDT) (He et al., 2014)
  • A rare-variant generalized disequilibrium test (RV-GDT) to test for association in extended families (He et al., 2017)
  • A method to control type I error when sequencing is used for variant discovery in a subset of samples (Liu and Leal, 2012a)
  • Rare-variant aggregate parametric linkage analysis using the collapsed haplotype method (Wang et al., 2015)
  • Rare-variant aggregate non-parametric linkage analysis for binary and quantitative traits using the collapsed haplotype method (Zhao et al., 2019, 2020)
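
The collapsing idea behind the CMC method can be illustrated with a minimal sketch. It assumes a hypothetical minor-allele-frequency cutoff of 1% for defining rare variants, collapses all rare variants in a region into a single carrier indicator per individual, and compares carrier frequencies between cases and controls with a Pearson chi-square test. The published CMC method additionally handles common variants and uses Hotelling's T² test; this sketch omits both, and the function names are illustrative rather than taken from any released software.

```python
import math

def collapse_rare(genotypes, mafs, threshold=0.01):
    """Return 1 for each individual carrying at least one rare allele, else 0.

    genotypes: per-individual lists of minor-allele counts (0, 1 or 2)
    mafs: minor-allele frequency of each variant
    threshold: hypothetical MAF cutoff below which a variant counts as rare
    """
    rare = [j for j, f in enumerate(mafs) if f < threshold]
    return [int(any(g[j] > 0 for j in rare)) for g in genotypes]

def chi2_2x2(carrier, is_case):
    """Pearson chi-square test (1 df) of collapsed carrier status vs case status."""
    a = b = c = d = 0
    for x, y in zip(carrier, is_case):
        if x and y:
            a += 1  # case carrier
        elif x:
            b += 1  # control carrier
        elif y:
            c += 1  # case non-carrier
        else:
            d += 1  # control non-carrier
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # an empty margin makes the test undefined
    stat = n * (a * d - b * c) ** 2 / denom
    return stat, math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square(1)
```

For example, with three variants of frequencies 0.005, 0.002 and 0.2, only the first two are collapsed, so an individual carrying only the common variant is counted as a non-carrier.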

We have also developed methods to estimate genetic effects and quantify the heritability attributable to rare variants (Liu and Leal, 2012e) and to detect secondary-trait associations (Liu and Leal, 2012b; Liu and Leal, 2012c), and we have investigated the best strategies for designing rare-variant studies (Li and Leal, 2009) and for performing replication studies (Liu and Leal, 2010).

Researchers at the Center for Statistical Genetics are currently developing a family-based rare-variant aggregate test for quantitative traits; imputation methods for family data that use both sequence data obtained from family members and imputation reference panels; conditional logistic regression for genetic studies using biobank data; and tests for interactions involving rare variants under the case-only design.

Software Development

In addition to developing methods, members of the Center for Statistical Genetics have developed software to aid in the analysis of genetic data. The software developed is listed below.

KBAC https://github.com/gaow/kbac (R package) or https://github.com/zhanxw/rvtests (rvtests)
Description: The Kernel-Based Adaptive Cluster (KBAC) method is an aggregate rare-variant association test that implements adaptive weighting.
Reference: Liu DJ, Leal SM (2010) A novel adaptive method for the analysis of next generation sequencing data to detect complex trait associations with rare variants due to gene main effects and interactions. PLoS Genetics 6(10):e1001156 PMID: 20976247; PMC2954824

MendelProb https://github.com/statgenetics/mendelprob/
Description: Determines the probability of detecting a minimum number of families or cases with variants in the same gene. It can also calculate the probability of detecting genes with variants in different family types. MendelProb can also determine the number of probands that need to be screened to detect a minimum number of individuals with variants in the same gene.
Reference: He Z, DeWan AT, Leal SM (2019) MendelProb: Probability and sample size calculations for Mendelian studies of exome and whole genome sequence data. Bioinformatics Feb 1;35(3):529-531. PMID: 30032240; PMC6397596
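
The kind of probability MendelProb reports can be sketched with a simple binomial model. This sketch assumes, hypothetically, that each screened family (or case) independently carries a variant in the gene with the same probability p, so the number of families with variants in that gene follows a Binomial(n, p) distribution. It is not MendelProb's implementation and does not model different family types; the function names are illustrative.

```python
from math import comb

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the chance that at least k of n
    independently screened families carry variants in the same gene."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))

def probands_needed(k, p, target=0.8, max_n=100_000):
    """Smallest number of probands n with P(X >= k) >= target, or None
    if the target is not reached by max_n (a hypothetical search limit)."""
    for n in range(k, max_n + 1):
        if prob_at_least(k, n, p) >= target:
            return n
    return None
```

For instance, if each proband has a 50% chance of carrying a variant in the gene, three probands are needed to have at least an 80% chance of observing one carrier.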

PhenoMan https://code.google.com/p/phenoman/
Description: Performs quality control, exploration, and selection of qualitative and quantitative trait data.
Reference: Li B, Wang G, Leal SM (2013) PhenoMan: Phenotypic data exploration, selection, management and quality control for association studies of rare and common variants. Bioinformatics 30:442-4 PMID: 24336645; PMC3904519

RarePedSim http://bioinformatics.org/simped/rare/
Description: Generates sequence data for any pedigree structure, either conditional or unconditional on family members’ qualitative and quantitative traits. It can also generate phenotypes for pedigree members conditional on the generated sequence data.
Reference: Li B, Wang GT, Leal SM (2015) Generation of sequence-based data for pedigree-segregating Mendelian or Complex traits. Bioinformatics 31(22):3706-8 PMID: 26177964; PMC4757949

Rare Variant Generalized Disequilibrium Test (RV-GDT) https://github.com/hezx/RV-GDT
Description: Software to perform the RV-GDT using sequence data. The RV-GDT performs aggregate rare variant association analysis of nuclear and extended pedigrees including pedigrees that contain missing genotype data.
Reference: He Z, Zhang D, Renton AE, Li B, Zhao L, Wang GT, Goate AM, Mayeux R, Leal SM (2017) The rare variant generalized disequilibrium test for association analysis of nuclear and extended pedigrees with application to Alzheimer’s Disease WGS data. American Journal of Human Genetics 100(2):193-204 PMID: 28065470; PMC5294711

Rare Variant Nonparametric Linkage (RV-NPL) https://github.com/statgenetics/rvnpl
Description: Software to perform the RV-NPL using sequence data. The RV-NPL performs aggregate rare variant non-parametric linkage analysis of nuclear and extended pedigrees including pedigrees that contain missing genotype data.
Reference: Zhao L, He Z, Zhang D, Wang GT, Renton AE, Vardarajan BN, Nothnagel M, Goate AM, Mayeux R, Leal SM (2019) A rare variant nonparametric linkage method for nuclear and extended pedigrees with application to late-onset Alzheimer’s Disease using whole genome sequence data. American Journal of Human Genetics Oct 3;105(4):822-835 PMID: 31585107; PMC6817540

Rare Variant Transmission Disequilibrium Test (RV-TDT) http://bioinformatics.org/rv-tdt/
Description: Software to perform the RV-TDT using sequence data. Power calculations can also be performed for the RV-TDT.
Reference: He Z, O’Roak BJ, Smith JD, Wang G, Hooker S, Li B, Kan M, Krumm N, Nickerson DA, Shendure J, Eichler EE, Leal SM (2014) Rare variant extensions of the transmission disequilibrium test: application to autism sequence data. American Journal of Human Genetics 94:33-46 PMID: 24360806; PMC3882934
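
The transmission-counting logic of a collapsed rare-variant TDT can be sketched as follows. This simplified illustration assumes phased parental haplotypes are available, treats any haplotype carrying at least one rare allele as carrying a single collapsed "allele", and applies the classic TDT statistic (b - c)²/(b + c) to parents who carry exactly one such haplotype. The RV-TDT software implements several rare-variant extensions that may differ in weighting and in how p-values are obtained; the function below is hypothetical and only mirrors the basic idea.

```python
import math

def collapsed_tdt(parent_haplotypes):
    """Classic TDT applied to a collapsed rare-variant 'allele'.

    parent_haplotypes: list of (transmitted, untransmitted) pairs, one per
    parent, where each haplotype is a sequence of 0/1 rare-allele indicators.
    A haplotype counts as a carrier if it contains any rare allele.
    Returns the chi-square statistic (1 df) and its p-value.
    """
    b = c = 0
    for transmitted, untransmitted in parent_haplotypes:
        t_carrier = any(transmitted)
        u_carrier = any(untransmitted)
        if t_carrier and not u_carrier:
            b += 1  # carrier haplotype was transmitted
        elif u_carrier and not t_carrier:
            c += 1  # carrier haplotype was not transmitted
        # parents with two carrier or two non-carrier haplotypes are uninformative
    if b + c == 0:
        return 0.0, 1.0
    stat = (b - c) ** 2 / (b + c)
    return stat, math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square(1)
```

Only parents who are "heterozygous" for the collapsed allele contribute, exactly as in the single-marker TDT; collapsing simply redefines what counts as the allele.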

SEQLinkage http://www.bioinformatics.org/seqlink/
Description: Implements the collapsed haplotype pattern method to perform linkage analysis of sequence data.
Reference: Wang GT, Zhang D, Li B, Dai H, Leal SM (2015) Collapsed Haplotype Pattern Method for Linkage Analysis of Next-Generation Sequence Data. European Journal of Human Genetics 23:1739-43 PMID: 25873013; PMC4795207

SIMPED http://bioinformatics.org/simped
Description: Generates haplotype and genotype data for pedigrees of any size or structure.
Reference: Leal SM, Yan K, Müller-Myhsok B (2005) SimPed: A simulation program to generate haplotype and genotype data for pedigree structures. Human Heredity 60:119-22 PMID: 16224189; PMC2909095

SEQPower http://www.bioinformatics.org/spower/start
Description: Power analysis for sequence-based association studies.
Reference: Wang GT, Li B, Santos-Cortez RL, Peng B, Leal SM (2014) Statistical power analysis for sequence-based association studies. Bioinformatics 30:2377-8 PMID: 24778108; PMC4133582
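
A rough analytic power calculation for a collapsed carrier test can be sketched with a normal approximation. This is not SEQPower's implementation; it assumes hypothetical carrier frequencies in cases and controls, a two-sided two-proportion z-test, and a default significance level of 2.5 × 10⁻⁶ (a commonly used gene-based threshold, roughly 0.05 divided by 20,000 genes).

```python
from math import sqrt
from statistics import NormalDist

def burden_power(p_case, p_control, n_case, n_control, alpha=2.5e-6):
    """Approximate power of a two-sided two-proportion z-test comparing
    rare-variant carrier frequencies in cases and controls.

    Uses the unpooled standard error and ignores the (negligible)
    rejection region on the opposite side of the effect.
    """
    z = NormalDist()  # standard normal
    se = sqrt(p_case * (1 - p_case) / n_case + p_control * (1 - p_control) / n_control)
    z_crit = z.inv_cdf(1 - alpha / 2)
    return z.cdf(abs(p_case - p_control) / se - z_crit)
```

For instance, burden_power(0.02, 0.01, 2000, 2000) approximates the power to detect a doubling of carrier frequency with 2,000 cases and 2,000 controls; power grows with sample size and with the difference in carrier frequencies.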

SEQSpark https://github.com/statgenetics/seqspark.git
Description: A complete association analysis pipeline for sequence and imputed sequence data. It uses Apache Spark and Hadoop for parallel processing, allowing the analysis of hundreds of thousands of samples. SEQSpark can perform quality control, annotation, single-variant and rare-variant aggregate association analysis (e.g., CMC, BRV, SKAT, and SKAT-O), analysis of imputed data, and meta-analysis.
Reference: Zhang D, Zhao L, Li B, He Z, Wang GT, Liu DJ, Leal SM (2017) SEQSpark: A complete analysis tool for large-scale rare variant association studies using whole genome and exome sequence data. American Journal of Human Genetics 101(1):115-122 PMID: 28669402; PMC5501866

SIMRare https://code.google.com/p/simrare/
Description: Simulation of rare variant sequence data for method development and study design.
Reference: Li B, Wang G, Leal SM (2012) SimRare: a program to generate and analyze sequence-based data for association studies of quantitative and qualitative traits. Bioinformatics 28:2703-4 PMID: 22914216; PMC3467746

Variant Association Tools (VAT) http://varianttools.sourceforge.net/Association/HomePage
Description: A pipeline to perform quality control and association analysis of sequence and genotype array data. VAT implements many commonly used rare-variant association methods, including CMC, WSS, and SKAT, for the analysis of qualitative and quantitative traits. It can also analyze imputed data and perform meta-analysis.
Reference: Wang GT, Peng B, Leal SM (2014) Variant Association Tools for association analysis and quality control of large scale sequence and genotyping array data. American Journal of Human Genetics 94:770-83 PMID: 24791902; PMC4067555