Gene-environment interaction occurs when different genotypes respond
differently to variation in the environment.
To test for this, genotypic and phenotypic information must be combined into a
single data set, which can be done with R’s cbind function.
If you’re the type of person who likes combining data in Excel, however,
you can get started as soon as your input file looks clean.
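As a minimal sketch of the combining step, the snippet below joins a genotype table and a phenotype table. The tables, the ID column, and the object names geno, pheno, combined, and merged are hypothetical and exist only for illustration. Note that cbind assumes both tables list the samples in the same order; merge is shown as an alternative that matches rows on a key column instead.

```r
# Hypothetical toy tables; in practice these would come from read.table().
geno <- data.frame(ID = c("s1", "s2", "s3"),
                   VariableA = c(0, 1, 2))
pheno <- data.frame(ID = c("s1", "s2", "s3"),
                    STATUS = c(0, 1, 0),
                    VariableD = c(85.2, 110.4, 97.1))

# cbind() pastes columns side by side, so confirm the row order matches first.
stopifnot(identical(geno$ID, pheno$ID))
combined <- cbind(pheno, geno[, "VariableA", drop = FALSE])

# merge() matches rows on the key column, regardless of row order.
merged <- merge(pheno, geno, by = "ID")
```

If the two files are not guaranteed to be sorted identically, the merge route is the safer of the two.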
Logistic Regression GxE
The only difference between logistic regression for gene-by-environment interaction and the original logistic regression is the addition of interaction terms: each term of interest multiplied by the environmental variable being tested.
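Written out, the model fitted in this section has the form below. The notation is an assumption introduced here for illustration: p is the probability that STATUS equals 1, and A, B, C, and D stand for VariableA through VariableD.

```latex
\operatorname{logit}(p) = \log\frac{p}{1-p}
  = \beta_0 + \beta_A A + \beta_B B + \beta_C C + \beta_D D
  + \beta_{AD}\, A D + \beta_{BD}\, B D + \beta_{CD}\, C D
```

Testing for a gene-environment interaction involving A then amounts to testing the null hypothesis that \beta_{AD} = 0, and likewise for B and C.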
The example tests the environmental variable VariableD against variables A through C (names are sometimes shortened in the output, e.g. VA for VariableA).
> data = read.table("/path/to/text/file/with/data", header=TRUE, na.strings = "NA")
> logisticalGE <- glm(data$STATUS ~ data$VariableA + data$VariableB + data$VariableC + data$VariableD + data$VariableA*data$VariableD + data$VariableB*data$VariableD + data$VariableC*data$VariableD, family = binomial)
> summary(logisticalGE)
Call:
glm(formula = data$STATUS ~ data$VariableA + data$VariableB + data$VariableC +
data$VariableD + data$VariableA * data$VariableD + data$VariableB *
data$VariableD + data$VariableC * data$VariableD, family = binomial)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.7205 -0.8537 -0.5189 1.0470 2.5061
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.5268619 1.0737534 -4.216 2.49e-05 ***
data$VariableA 0.1690201 0.0530564 3.186 0.001444 **
data$VariableB 0.5921538 0.5478683 1.081 0.279772
data$VariableC 0.5843189 0.5787536 1.010 0.312679
data$VariableD 0.0136132 0.0038047 3.578 0.000346 ***
data$VA:data$VD -0.0007583 0.0001849 -4.101 4.12e-05 ***
data$VB:data$VD -0.0012611 0.0018368 -0.687 0.492361
data$VC:data$VD -0.0003966 0.0018996 -0.209 0.834636
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 366.95 on 293 degrees of freedom
Residual deviance: 311.15 on 286 degrees of freedom
(1 observation deleted due to missingness)
AIC: 327.15
Number of Fisher Scoring iterations: 5
The number of asterisks indicates the significance level of each p-value. Three asterisks mean that the p-value for that result is between 0 and 0.001. A significant result allows the null hypothesis to be rejected, and the significance code shows whether this is done at the 90% (.), 95% (*), 99% (**), or 99.9% (***) level. In this output, the VariableA by VariableD interaction (p = 4.12e-05) is the only significant interaction term.
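The interaction rows can also be pulled out of the coefficient table programmatically rather than read off the printed summary. The sketch below runs on simulated data, not the original data set: the values, the seed, and the object names fit, coefs, and gxe are assumptions, and only the column names mirror the example above. It also uses the formula shorthand in which A * D expands to A + D + A:D, so the main effects do not have to be listed separately.

```r
# Simulated stand-in data; column names mirror the example, values are made up.
set.seed(1)
n <- 300
data <- data.frame(
  VariableA = rbinom(n, 2, 0.3),
  VariableB = rbinom(n, 2, 0.2),
  VariableC = rbinom(n, 2, 0.4),
  VariableD = rnorm(n, mean = 100, sd = 20)
)
data$STATUS <- rbinom(n, 1, 0.3)

# (A + B + C) * D expands to all four main effects plus A:D, B:D and C:D.
fit <- glm(STATUS ~ (VariableA + VariableB + VariableC) * VariableD,
           family = binomial, data = data)

# Coefficient matrix with columns Estimate, Std. Error, z value, Pr(>|z|).
coefs <- coef(summary(fit))

# Keep only the interaction rows, whose names contain a colon.
gxe <- coefs[grepl(":", rownames(coefs)), , drop = FALSE]
gxe[, "Pr(>|z|)"]
```

Passing data = data instead of writing data$ before every term also keeps the coefficient names short in the summary.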
Conditional Logistic Regression GxE
As with the logistic regression above, the only change needed to test gene-by-environment interaction in conditional logistic regression is the addition of interaction terms: each term of interest multiplied by the environmental variable being tested.
The example tests the environmental variable VariableD against variables A through C (names are sometimes shortened in the output, e.g. VA for VariableA).
> library(survival)
> survival.clogitGE <- clogit(formula = STATUS ~ VariableA + VariableB + VariableC + VariableD + VariableA*VariableD + VariableB*VariableD + VariableC*VariableD + strata(matched_sets), data = data)
> summary(survival.clogitGE)
Call:
coxph(formula = Surv(rep(1, 295L), STATUS) ~ VariableA + VariableB + VariableC +
VariableD + VariableA * VariableD + VariableB * VariableD + VariableC *
VariableD + strata(matched_sets), data = data, method = "exact")
n= 294, number of events= 93
(1 observation deleted due to missingness)
coef exp(coef) se(coef) z Pr(>|z|)
VariableA 1.776e-01 1.194e+00 5.938e-02 2.991 0.002781 **
VariableB 2.608e-01 1.298e+00 7.080e-01 0.368 0.712598
VariableC 1.764e-01 1.193e+00 5.686e-01 0.310 0.756433
VariableD 1.230e-02 1.012e+00 4.091e-03 3.005 0.002652 **
VA:VD -7.187e-04 9.993e-01 1.988e-04 -3.616 0.000299 ***
VB:VD -6.979e-05 9.999e-01 2.076e-03 -0.034 0.973181
VC:VD -7.048e-04 9.993e-01 1.976e-03 -0.357 0.721264
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
exp(coef) exp(-coef) lower .95 upper .95
VariableA 1.1943 0.8373 1.0631 1.3418
VariableB 1.2980 0.7704 0.3240 5.1991
VariableC 1.1929 0.8383 0.3914 3.6356
VariableD 1.0124 0.9878 1.0043 1.0205
VA:VD 0.9993 1.0007 0.9989 0.9997
VB:VD 0.9999 1.0001 0.9959 1.0040
VC:VD 0.9993 1.0007 0.9954 1.0032
Rsquare= 0.133 (max possible= 0.494 )
Likelihood ratio test= 41.97 on 7 df, p=5e-07
Wald test = 27.16 on 7 df, p=3e-04
Score (logrank) test = 36.31 on 7 df, p=6e-06
The significance codes are read in the same way as for logistic regression: the number of asterisks indicates the significance level of each p-value, and three asterisks mean that the p-value is between 0 and 0.001. A significant result allows the null hypothesis to be rejected, and the code shows whether this is done at the 90% (.), 95% (*), 99% (**), or 99.9% (***) level. Here too, the VariableA by VariableD interaction (p = 0.000299) is the only significant interaction term.
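The second table in the output reports exp(coef), which in a conditional logistic model is the odds ratio for a one-unit increase in that term, together with its 95% confidence interval. A minimal sketch of extracting that table from a fitted model follows. It uses simulated matched sets with one case and two controls each; all values, the seed, and the names d, fit, and or_table are hypothetical.

```r
library(survival)

# Simulated matched sets: one case (STATUS = 1) and two controls per set.
set.seed(2)
n_sets <- 100
d <- data.frame(
  matched_sets = rep(seq_len(n_sets), each = 3),
  STATUS = rep(c(1, 0, 0), times = n_sets),
  VariableA = rbinom(3 * n_sets, 2, 0.3),
  VariableD = rnorm(3 * n_sets, mean = 100, sd = 20)
)

# VariableA * VariableD expands to both main effects plus the interaction.
fit <- clogit(STATUS ~ VariableA * VariableD + strata(matched_sets), data = d)

# Odds ratios with 95% confidence limits, one row per model term.
or_table <- summary(fit)$conf.int
or_table
```

An interval that excludes 1, such as the one for VA:VD in the output above (0.9989 to 0.9997), corresponds to a term that is significant at the 95% level.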