
Welcome to nf-gwas!

A Nextflow pipeline for performing genome-wide association studies (GWAS) using regenie.



The pipeline takes files in BGEN format (e.g. from UK Biobank) or VCF format (e.g. from Michigan Imputation Server) as input. It outputs association results, annotated top hits and an interactive HTML report that includes plots (e.g. Manhattan plot, QQ plot) and statistics (e.g. phenotype histogram, top loci). The pipeline currently includes the following steps:

  1. Validate phenotype and covariate file (e.g. check file format, replace empty values with NA, create summary statistics).

  2. Convert imputed data in VCF format into the plink2 file format (optional).

  3. Prune micro-array data using plink2 (optional).

  4. Filter micro-array data using plink2 based on MAF, MAC, HWE, genotype missingness and sample missingness.

  5. Run regenie and index the results with tabix for use with LocusZoom.

  6. Parse regenie log and create summary statistics.

  7. Filter regenie results by p-value.

  8. Annotate filtered results using bedtools closest.

  9. Create an HTML report per phenotype, including the annotated Manhattan plot, QQ plot, top loci, phenotype statistics and parsed log files.
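
To make step 7 more concrete, here is a minimal Python sketch of filtering association results by p-value. It is not the pipeline's own implementation. It assumes a space-delimited results table with a header row and a `LOG10P` column holding -log10(p); the function name `filter_by_log10p`, the threshold of 5.0 and the sample rows are hypothetical and chosen only for illustration.

```python
import csv
import io


def filter_by_log10p(lines, min_log10p, column="LOG10P"):
    """Yield rows (as dicts) whose -log10(p) is at least min_log10p.

    Assumption: the input is space-delimited with a header row, and
    `column` holds -log10(p). Rows with a missing value ("NA") are skipped.
    """
    reader = csv.DictReader(lines, delimiter=" ")
    for row in reader:
        value = row[column]
        if value == "NA":
            continue
        if float(value) >= min_log10p:
            yield row


# Hypothetical sample data, not real association results.
sample = io.StringIO(
    "CHROM GENPOS ID LOG10P\n"
    "1 1000 rs1 0.52\n"
    "1 2000 rs2 7.81\n"
    "2 3000 rs3 NA\n"
    "2 4000 rs4 5.00\n"
)

hits = list(filter_by_log10p(sample, min_log10p=5.0))
print([h["ID"] for h in hits])  # ['rs2', 'rs4']
```

Because the threshold is applied on the -log10 scale, a larger value means a stricter filter: 5.0 corresponds to keeping variants with p-values of 1e-5 or smaller.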