GWAS with nf-gwas

nf-gwas is a Nextflow pipeline designed for biobank-scale genome-wide association studies (GWAS). The pipeline automatically performs multiple pre- and post-processing steps, integrates regression modeling from the REGENIE package, and currently supports single-variant, gene-based, and interaction testing. All modules are structured as sub-workflows, allowing future extensions to other methods and tools. nf-gwas includes extensive reporting functionality, enabling users to inspect thousands of phenotypes and navigate interactive Manhattan plots directly in a web browser.

Prerequisites

The following software is required to run the nf-gwas workflow locally. The setup has been tested on various Linux distributions as well as macOS:

Java 17 or higher
Nextflow
Docker or Singularity
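
Before continuing, it can help to confirm that these tools are on the PATH. The following is a minimal sketch and not part of nf-gwas: `check_tool` is a hypothetical helper name, and the script only tests whether each command is present, not which version is installed.

```shell
# check_tool NAME : report whether NAME is available on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found:   $1"
  else
    echo "missing: $1"
    return 1
  fi
}

missing=0
check_tool java     || missing=1
check_tool nextflow || missing=1

# Only one of the two container runtimes is needed.
if command -v docker >/dev/null 2>&1 || command -v singularity >/dev/null 2>&1; then
  echo "found:   container runtime"
else
  echo "missing: docker or singularity"
  missing=1
fi

if [ "$missing" -eq 0 ]; then
  echo "All prerequisites appear to be installed."
fi
```

Java must additionally be version 17 or higher; `java -version` prints the installed version.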

Setup

First, we need to prepare the input files for nf-gwas. The files are also available here.

mkdir nf-gwas-test
cd nf-gwas-test
# Download phenotype and covariates
wget https://genepi.i-med.ac.at/downloads/imputation/phenotypes.txt
wget https://genepi.i-med.ac.at/downloads/imputation/covariates.txt
# Download imputed data
wget https://genepi.i-med.ac.at/downloads/imputation/gwas.imputed.chr20.dose.vcf.gz
# Download annotation files for chr20
wget https://genepi.i-med.ac.at/downloads/imputation/rsids-v154-hg19-chr20.index.gz
wget https://genepi.i-med.ac.at/downloads/imputation/rsids-v154-hg19-chr20.index.gz.tbi
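
Once the downloads finish, a quick look at the phenotype file can catch column-name typos before the pipeline starts. The sketch below assumes that phenotypes.txt is whitespace-delimited with the column names on its first line; `has_columns` is a hypothetical helper, not an nf-gwas command.

```shell
# has_columns FILE COL1,COL2,... : succeed only if every listed column
# appears in the header (first line) of FILE.
has_columns() {
  file=$1
  wanted=$2
  header=$(head -n 1 "$file")
  status=0
  for col in $(printf '%s' "$wanted" | tr ',' ' '); do
    found=0
    for name in $header; do
      if [ "$name" = "$col" ]; then
        found=1
      fi
    done
    if [ "$found" -eq 0 ]; then
      echo "column not found in $file: $col" >&2
      status=1
    fi
  done
  return "$status"
}

# Check the two phenotypes used later in this tutorial.
if [ -f phenotypes.txt ]; then
  if has_columns phenotypes.txt pheno_2,pheno_4; then
    echo "phenotype columns OK"
  fi
fi
```

The same helper can be pointed at covariates.txt with the covariate column names.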

nf-gwas Parameters

The nf-gwas pipeline supports a variety of configurable parameters.
In the examples below, we demonstrate a simplified GWAS analysis that skips Step 1 of REGENIE, the model-fitting stage.

Step 1 of REGENIE normally fits a ridge regression model that predicts the phenotype (or its residuals) from genotypes, which accounts for population structure and relatedness. Because the examples set regenie_skip_predictions = true, this step is bypassed and the pipeline proceeds directly to Step 2, which performs the association testing.

Below is a breakdown of the main parameters:

Parameter	Description
`project`	A name for the analysis run. Useful for organizing output files.
`genotypes_association`	The genotype file used in Step 2. Here, it’s a compressed VCF file (`.vcf.gz`) containing imputed genotypes for chromosome 20.
`regenie_skip_predictions`	When `true`, REGENIE Step 1 is skipped and the pipeline proceeds directly to the association testing in Step 2.
`genotypes_association_format`	Specifies the format of the genotype file (`vcf` here). Internally, nf-gwas converts this VCF into PLINK2 format for REGENIE.
`association_build`	The genome reference build used (`hg19` or `hg38`).
`phenotypes_filename`	File containing phenotype data for each individual.
`phenotypes_columns`	Specific phenotype columns to include (`pheno_2` and `pheno_4`).
`phenotypes_binary_trait`	Whether phenotypes are binary (`true`) or continuous (`false`).
`covariates_filename`	File listing covariates (e.g., PCs) to adjust for population stratification.
`covariates_columns`	Covariate columns to include (`PC1` through `PC10`, given as a comma-separated list).
`regenie_test`	Genetic model to test; here, additive.
`regenie_min_imputation_score`	Minimum genotype imputation quality score; variants below are excluded.
`rsids_filename`	Index file mapping variant positions to rsIDs for annotation.
`binning_size`	Genomic bin size (in base pairs) for plotting and indexing results.

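With the covariate parameters set, this configuration performs a straightforward GWAS on two continuous traits (pheno_2 and pheno_4), adjusting for population structure using the top 10 principal components (PCs). Without covariates_filename and covariates_columns, the same analysis runs unadjusted.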

GWAS Without PCA

We first run a linear regression GWAS on the simulated phenotypes without adjusting for population structure. Create a file named gwas.config with the following content:

params {
  project                        = 'test-gwas'
  genotypes_association          = 'gwas.imputed.chr20.dose.vcf.gz'
  regenie_skip_predictions       = true
  association_build              = 'hg19'
  genotypes_association_format   = 'vcf'
  phenotypes_filename            = 'phenotypes.txt'
  phenotypes_columns             = 'pheno_2,pheno_4'
  phenotypes_binary_trait        = false
  regenie_test                   = 'additive'
  regenie_min_imputation_score   = 0.3
  rsids_filename                 = 'rsids-v154-hg19-chr20.index.gz'
  binning_size                   = 50000
}
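
A mistyped file name in the config only surfaces once the pipeline is running, so a pre-flight check can save time. The sketch below is an illustration under assumptions: `config_value` and `check_config_files` are hypothetical helpers, they only understand simple `key = 'value'` lines like the ones above, and they treat each value as a plain path relative to the current directory.

```shell
# config_value CONFIG KEY : print the quoted value assigned to KEY.
config_value() {
  sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*[\"']\([^\"']*\)[\"'].*/\1/p" "$1"
}

# check_config_files CONFIG : verify that the input files named in CONFIG exist.
check_config_files() {
  status=0
  for key in genotypes_association phenotypes_filename covariates_filename rsids_filename; do
    value=$(config_value "$1" "$key")
    if [ -z "$value" ]; then
      continue   # key is not set in this config
    fi
    if [ -e "$value" ]; then
      echo "ok:      $key = $value"
    else
      echo "missing: $key = $value"
      status=1
    fi
  done
  return "$status"
}

if [ -f gwas.config ]; then
  check_config_files gwas.config || echo "Fix the paths above before running the pipeline."
fi
```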

Run GWAS

Execute the pipeline:

nextflow run genepi/nf-gwas -r v1.0.11 -profile docker -c gwas.config

Note

If you are running this on a server that uses Singularity, replace -profile docker with -profile singularity.
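
For scripts that run on both kinds of machine, the profile can be chosen from whichever runtime is installed. This is a minimal sketch: `pick_profile` is a hypothetical helper, and it prefers Docker when both runtimes are present, which is an arbitrary choice.

```shell
# pick_profile : print the nf-gwas profile matching an installed container runtime.
pick_profile() {
  if command -v docker >/dev/null 2>&1; then
    echo docker
  elif command -v singularity >/dev/null 2>&1; then
    echo singularity
  else
    echo "no container runtime found" >&2
    return 1
  fi
}

if profile=$(pick_profile); then
  # Print the command instead of running it, so it can be reviewed first.
  echo "nextflow run genepi/nf-gwas -r v1.0.11 -profile $profile -c gwas.config"
fi
```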

Results

All results (REGENIE outputs, top hits, graphical reports) are available within the results folder.

GWAS With PCA

Next, we run a linear regression GWAS on the same simulated phenotypes while adjusting for population structure using principal components. Create a file named gwas_with_pcs.config with the following content:

params {
  project                        = 'test-gwas-with-pcs'
  genotypes_association          = 'gwas.imputed.chr20.dose.vcf.gz'
  regenie_skip_predictions       = true
  association_build              = 'hg19'
  genotypes_association_format   = 'vcf'
  phenotypes_filename            = 'phenotypes.txt'
  phenotypes_columns             = 'pheno_2,pheno_4'
  phenotypes_binary_trait        = false
  regenie_test                   = 'additive'
  regenie_min_imputation_score   = 0.3
  covariates_filename            = 'covariates.txt'
  covariates_columns             = 'PC1,PC2,PC3,PC4,PC5,PC6,PC7,PC8,PC9,PC10'
  rsids_filename                 = 'rsids-v154-hg19-chr20.index.gz'
  binning_size                   = 50000
}
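
Typing the covariate list by hand becomes error-prone when the number of PCs changes. As a small convenience sketch, the hypothetical `pc_columns` helper below generates the comma-separated value for covariates_columns, assuming the covariate columns are named PC1, PC2, and so on, as in the config above.

```shell
# pc_columns N : print PC1,PC2,...,PCN as a comma-separated string.
pc_columns() {
  seq 1 "$1" | sed 's/^/PC/' | paste -sd, -
}

# Value for covariates_columns with the top 10 PCs.
pc_columns 10
```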

Run GWAS

Execute the pipeline:

nextflow run genepi/nf-gwas -r v1.0.11 -profile docker -c gwas_with_pcs.config

Results

All results (REGENIE outputs, top hits, and graphical reports) are available in the results folder. The folder also includes an HTML file containing an interactive Manhattan plot:

Manhattan Plot without PCs

Manhattan Plot with PCs
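
The two runs can also be compared numerically by counting variants above the genome-wide significance threshold. The sketch below rests on assumptions: it expects an uncompressed, whitespace-delimited results table whose header contains a `LOG10P` column, as in REGENIE's native output. The helper name `count_hits` and the file name in the usage comment are hypothetical.

```shell
# count_hits FILE [THRESHOLD] : count rows whose LOG10P exceeds THRESHOLD.
# The default of 7.3 is approximately -log10(5e-8).
count_hits() {
  awk -v thr="${2:-7.3}" '
    NR == 1 {
      # Locate the LOG10P column by name in the header row.
      for (i = 1; i <= NF; i++) if ($i == "LOG10P") col = i
      next
    }
    col && ($col + 0) > (thr + 0) { n++ }
    END { print n + 0 }
  ' "$1"
}

# Usage (hypothetical file name; decompress the results file first if needed):
#   count_hits pheno_2.results.txt
```

If adjusting for PCs removes inflation caused by population structure, the count for the adjusted run would be expected to be lower or equal.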