Some of the summary statistics that PLINK can generate (through different commands) include missing genotype rate (missingness), Hardy-Weinberg equilibrium, minor allele frequency, and linkage disequilibrium. PLINK can do much more (including family-based association testing for disease traits); these features are described further on the PLINK website.

PLINK Job Script

The following script would run the PLINK commands found in the plink_frq file on a SLURM scheduler.

#!/bin/bash
#SBATCH -p public              # partition aka allocation
#SBATCH --qos general          # quality-of-service (priority)

module load plink/1.07

./plink_frq

If you get an error, try source plink_frq instead of ./plink_frq. Whether the error occurs depends on where you are running the file from.
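One possible cause of an error with ./plink_frq, though not necessarily the one you are seeing, is that the file lacks execute permission; source reads the file into the current shell and does not need that permission. The sketch below assumes this is the problem. The helper name ensure_executable is hypothetical and not part of PLINK or SLURM.

```shell
# Hypothetical helper: add the owner's execute bit to a file if it is missing.
ensure_executable() {
    if [ ! -x "$1" ]; then
        chmod u+x "$1"
    fi
}
```

Calling ensure_executable plink_frq from the directory that holds the file, before submitting the job, would let ./plink_frq run as written in the job script.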

Tests

Below is a sample PLINK information file, plink_frq, containing the command PLINK needs to run to calculate minor allele frequency (MAF).

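# --noweb stops PLINK from checking the web for updates before the run
# --file takes the folder path plus the shared base name of the .ped and .map files
# --nonfounders includes nonfounders, so all individuals are used
# --allow-no-sex prevents phenotypes of individuals with "ambiguous" sex from being set to missing
# --freq is the actual test
# (comments are kept above the command because a comment after a trailing \ breaks the line continuation)
plink --noweb --file /path/to/PED/and/MAP/files \
      --nonfounders \
      --allow-no-sex \
      --freq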

The flag marked as the actual test (--freq here) is the part that changes for each different test. If the path were ~/home/euid123/R_jobs/ and the .ped and .map files were both named example, then the --file argument would be ~/home/euid123/R_jobs/example, with no file extension.
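Because --file expects a single prefix shared by the .ped and .map files, it can help to confirm that both files exist before submitting the job. The following is a minimal sketch; check_plink_prefix is a hypothetical helper, not a PLINK feature.

```shell
# Hypothetical helper: confirm that <prefix>.ped and <prefix>.map both exist
# before the prefix is passed to --file.
check_plink_prefix() {
    prefix="$1"
    for ext in ped map; do
        if [ ! -f "${prefix}.${ext}" ]; then
            echo "missing: ${prefix}.${ext}" >&2
            return 1
        fi
    done
    echo "ok: ${prefix}"
}
```

With the example above, the call would be check_plink_prefix ~/home/euid123/R_jobs/example, which prints an "ok" line only when both example.ped and example.map are present.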

Table: Test specifics for PLINK

Specifier            Test or Function             Test Statistic
--freq               Minor Allele Frequency       MAF
--het                Heterozygosity               F Value
--hardy              Hardy-Weinberg Equilibrium   HWE
--r2                 Linkage Disequilibrium       r2
--out snps           Linkage Disequilibrium       r2
--missing --mind 1   Missingness                  F_MISS
--recodeAD           Recode                       NA

  • Frequency: test for minor allele frequency.
  • Heterozygosity: test for inbreeding coefficients.
  • Hardy-Weinberg Equilibrium: test for Hardy-Weinberg equilibrium.
  • Linkage Disequilibrium: test for linkage disequilibrium.
  • Missingness: test for missingness.
  • Recode: change data coding to additive and dominance components.
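To see how the command changes from test to test, the sketch below prints (without running) one PLINK command per specifier, swapping only the test flag. The helper name print_plink_commands and the plink_<test> output roots passed to --out are illustrative assumptions, and the extra --mind 1 option listed for missingness is left out for brevity.

```shell
# Hypothetical helper: print one PLINK command line per test without running it.
# Each command gets its own --out root so the output and log files stay separate.
print_plink_commands() {
    prefix="$1"
    for flag in freq het hardy r2 missing recodeAD; do
        printf 'plink --noweb --file %s --nonfounders --allow-no-sex --%s --out plink_%s\n' \
            "$prefix" "$flag" "$flag"
    done
}

print_plink_commands /path/to/PED/and/MAP/files
```

Each printed line could be pasted into its own information file, in the same way plink_frq holds the --freq command.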

Using awk on Data

The awk Unix command can be used to pull specific rows out of the output. The following command creates the plinkawk.frq file from the generated plink.frq file, keeping the header row plus only those rows whose value in column 5 (the MAF column) is less than 0.05.

awk 'NR == 1; NR > 1 {if ($5<0.05) print}' plink.frq > plinkawk.frq
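The filter can be tried on a small made-up file before running it on real output. In the sketch below the three SNP rows are invented for illustration, and the column names are an assumption about the plink.frq layout; only the position of MAF in column 5 matters to the command. The files are written to a temporary directory so no real output is overwritten.

```shell
# Work in a temporary directory so no real PLINK output is touched
workdir=$(mktemp -d)

# Made-up sample data in an assumed plink.frq layout; values are illustrative only
cat > "$workdir/plink.frq" <<'EOF'
 CHR   SNP   A1   A2      MAF  NCHROBS
   1  snp1    A    G   0.0200      180
   1  snp2    C    T   0.3100      178
   1  snp3    G    A   0.0450      180
EOF

# Same filter as above: always print the header, then rows with column 5 below 0.05
awk 'NR == 1; NR > 1 {if ($5<0.05) print}' "$workdir/plink.frq" > "$workdir/plinkawk.frq"

cat "$workdir/plinkawk.frq"
```

In this sample the snp2 row is dropped because its MAF of 0.31 is not below 0.05, while the header, snp1, and snp3 rows are kept.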