Some of the summary statistics that PLINK can generate (through different commands) include missing genotype rate (missingness), Hardy-Weinberg equilibrium, minor allele frequency, and linkage disequilibrium. There are many more things PLINK can do (including family-based association testing for disease traits), which are all further described on the PLINK website.
PLINK Job Script
The following script runs the PLINK commands found in the plink_frq file on a SLURM scheduler.
#!/bin/bash
#SBATCH -p public        # partition aka allocation
#SBATCH --qos general    # quality-of-service (priority)
module load plink/1.07   # make the plink executable available to the job
./plink_frq              # run the PLINK commands stored in plink_frq
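If the job fails with an error, try source plink_frq instead of ./plink_frq.
Which form works depends on how the file is set up and where you run it from: ./plink_frq requires plink_frq to be executable (chmod +x plink_frq), whereas source only needs it to be readable. In both cases, check that the path to plink_frq is correct relative to the directory the job runs in.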
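If you want the job script to handle both cases, one option is a small wrapper function. This is only a sketch, not part of the original workflow; run_cmd_file is a hypothetical name. It runs the file directly when it is executable and falls back to source otherwise.

```shell
#!/bin/bash
# run_cmd_file FILE: hypothetical helper that runs a command file such as plink_frq.
# It executes FILE directly when it has execute permission and falls back to
# sourcing it (which only needs read permission) otherwise.
run_cmd_file() {
  local f="$1"
  # A bare name like "plink_frq" would be looked up on $PATH when executed,
  # so anchor it to the current directory instead.
  case "$f" in
    */*) ;;
    *) f="./$f" ;;
  esac
  if [ ! -r "$f" ]; then
    echo "cannot read $f (working directory: $(pwd))" >&2
    return 1
  fi
  if [ -x "$f" ]; then
    "$f"          # runs in a child process, like ./plink_frq
  else
    source "$f"   # runs in the current shell, like source plink_frq
  fi
}
```

With this function defined in the job script, the ./plink_frq line would become run_cmd_file plink_frq.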
Tests
A sample PLINK command file, plink_frq, that tells PLINK to test for minor allele frequency (MAF) is below.
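#!/bin/bash
# --noweb stops PLINK from checking online for a newer version before the run.
# --file takes the folder path plus the shared name of the .ped and .map files.
# --nonfounders includes all individuals, not only founders.
# --allow-no-sex stops PLINK from setting phenotypes of individuals with ambiguous sex to missing.
# --freq is the actual test. (Comments must not follow a trailing \ on the same line.)
plink --noweb --file /path/to/PED/and/MAP/files \
      --nonfounders \
      --allow-no-sex \
      --freq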
The last option (the actual test) changes for each different test; the available specifiers are listed in the table below.
For example, if the path were /home/euid123/R_jobs/ and the .ped and .map files were both named example (example.ped and example.map), the line would be:
plink --noweb --file /home/euid123/R_jobs/example \
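Because --file takes a prefix rather than a file name, a typo in the folder or the shared name only shows up when PLINK fails. A minimal sketch of a pre-check, assuming you want to verify both files before calling PLINK; check_plink_prefix is a hypothetical helper, not a PLINK feature.

```shell
#!/bin/bash
# check_plink_prefix DIR NAME: hypothetical helper that prints the --file prefix
# DIR/NAME, but only if both DIR/NAME.ped and DIR/NAME.map exist.
check_plink_prefix() {
  local prefix="${1%/}/$2"   # drop a trailing slash from DIR, then append NAME
  local ext
  for ext in ped map; do
    if [ ! -f "$prefix.$ext" ]; then
      echo "missing $prefix.$ext" >&2
      return 1
    fi
  done
  echo "$prefix"
}
```

In a command file this could be used as prefix=$(check_plink_prefix /home/euid123/R_jobs example) || exit 1 followed by plink --noweb --file "$prefix" and the remaining options. The prefix is printed without an extension because PLINK adds .ped and .map itself.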
Table: Test specifics for PLINK
Specifier | Test or Function | Test Statistic |
---|---|---|
--freq | Minor Allele Frequency | MAF |
--het | Heterozygosity | F Value |
--hardy | Hardy-Weinberg Equilibrium | HWE |
--r2 | Linkage Disequilibrium | r2 |
--out snps | Linkage Disequilibrium | r2 |
--missing --mind 1 | Missingness | F_MISS |
--recodeAD | Recode | NA |
- Frequency: calculate the minor allele frequency of each SNP.
- Heterozygosity: estimate each individual's inbreeding coefficient (F) from observed versus expected homozygosity.
- Hardy-Weinberg Equilibrium: test each SNP for deviation from Hardy-Weinberg equilibrium.
- Linkage Disequilibrium: calculate pairwise r2 between SNPs.
- Missingness: report the proportion of missing genotypes per individual and per SNP.
- Recode: recode genotypes into additive and dominance components.
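To run several of these tests on the same data, one approach is to generate one command per specifier and give each its own --out name, so the runs do not all write to plink.* files. The sketch below only prints the commands (it does not run PLINK); build_plink_cmds, the choice of specifiers, and the output names are assumptions for illustration.

```shell
#!/bin/bash
# build_plink_cmds PREFIX: hypothetical helper that prints (without running) one
# PLINK 1.07 command per specifier in the table above, each with its own --out name.
build_plink_cmds() {
  local prefix="$1" t out
  local tests=("--freq" "--het" "--hardy" "--r2" "--missing --mind 1" "--recodeAD")
  for t in "${tests[@]}"; do
    out="${t%% *}"   # first flag of the specifier, e.g. "--missing"
    out="${out#--}"  # strip the leading dashes, e.g. "missing"
    echo "plink --noweb --file $prefix --nonfounders --allow-no-sex $t --out $out"
  done
}
```

For example, build_plink_cmds /path/to/PED/and/MAP/files > plink_all writes a command file that can be run the same way as plink_frq. Note that with --out freq the frequency results go to freq.frq instead of plink.frq, so file names in later steps (such as the awk filter) need to match.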
Using awk on Data
The awk Unix command can be used to pull specified rows or columns out of PLINK output files.
The following command creates the plinkawk.frq file from the generated plink.frq file, keeping the header line plus every row whose value in column 5 (MAF) is less than 0.05.
awk 'NR == 1; NR > 1 {if ($5<0.05) print}' plink.frq > plinkawk.frq
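Hard-coding column 5 works for plink.frq, but other PLINK outputs put their statistic in different columns. A sketch of a more general filter, assuming whitespace-delimited output with a header row; filter_below is a hypothetical helper that looks the column up by its header name and skips NA values (which the one-liner above also excludes, since "NA" compares as a string).

```shell
#!/bin/bash
# filter_below FILE COLUMN THRESHOLD: hypothetical helper that prints the header of a
# whitespace-delimited PLINK output file plus every row whose COLUMN value is
# below THRESHOLD. COLUMN is a header name (e.g. MAF), not a column number.
filter_below() {
  awk -v name="$2" -v max="$3" '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == name) col = i
      if (!col) { print "no column named " name > "/dev/stderr"; exit 1 }
      print
      next
    }
    # skip NA values; adding 0 forces a numeric comparison
    $col != "NA" && $col + 0 < max + 0
  ' "$1"
}
```

filter_below plink.frq MAF 0.05 > plinkawk.frq gives the same result as the one-liner above; the same call with another header name (for example F_MISS in plink.imiss) filters other outputs without counting columns.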