
Welcome to nf-gwas!

A Nextflow pipeline to perform genome-wide association studies (GWAS).



This cloud-ready GWAS pipeline lets you run single-variant tests and gene-based tests with regenie in an automated and reproducible way.

For single-variant tests, the pipeline accepts BGEN files (e.g. from UK Biobank) or VCF files (e.g. from the Michigan Imputation Server). For gene-based tests, only BED files are currently supported as input. The pipeline outputs tabix-indexed association results that work out of the box with tools such as LocusZoom, annotated top loci, and an interactive HTML report providing statistics and plots.

The single-variant pipeline currently includes the following steps:

  1. Validate the phenotype and covariate files (e.g. check the file format, replace empty values with NA, create summary statistics).

  2. Convert imputed data in VCF format into the plink2 file format (optional).

  3. Prune micro-array data using plink2 (optional).

  4. Filter micro-array data using plink2 based on MAF, MAC, HWE, genotype missingness and sample missingness.

  5. Run regenie and index the results with tabix for use with LocusZoom.

  6. Parse regenie log and create summary statistics.

  7. Filter regenie results by p-value.

  8. Annotate filtered results using bedtools closest.

  9. Create an HTML report per phenotype, including the annotated Manhattan plot, Q-Q plot, top loci, phenotype statistics and parsed log files.
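To illustrate what steps 1 and 7 do, here is a minimal Python sketch. It is not the pipeline's actual implementation. It assumes a tab-separated phenotype file with a header row, and regenie results in a whitespace-delimited table with a `LOG10P` column holding -log10(p), as regenie's association output typically provides. The function names `validate_phenotypes` and `filter_by_pvalue` and the default threshold of 5e-8 are hypothetical choices for this example.

```python
import csv
import io
import math


def validate_phenotypes(text, sep="\t"):
    """Hypothetical sketch of step 1: check the column count, replace empty
    values with NA, and count non-missing values per column as a simple summary."""
    reader = csv.reader(io.StringIO(text), delimiter=sep)
    header = next(reader)
    counts = {name: 0 for name in header}
    rows = []
    for row in reader:
        if not row:  # skip blank lines
            continue
        if len(row) != len(header):
            raise ValueError(f"expected {len(header)} columns, got {len(row)}: {row}")
        # Empty (or whitespace-only) cells become the missing-value token NA.
        cleaned = [v.strip() if v.strip() else "NA" for v in row]
        for name, value in zip(header, cleaned):
            if value != "NA":
                counts[name] += 1
        rows.append(cleaned)
    return header, rows, counts


def filter_by_pvalue(text, threshold=5e-8):
    """Hypothetical sketch of step 7: keep variants with p < threshold.
    Assumes a LOG10P column (-log10 p), so p < t  <=>  LOG10P > -log10(t)."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    col = header.index("LOG10P")
    min_log10p = -math.log10(threshold)
    hits = []
    for line in lines[1:]:
        fields = line.split()
        if fields[col] == "NA":  # untested variants carry no p-value
            continue
        if float(fields[col]) > min_log10p:
            hits.append(dict(zip(header, fields)))
    return hits
```

Comparing on the -log10 scale avoids converting very small p-values back to floating point, where values below roughly 1e-308 would underflow to zero.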