You can’t ignore gene-environment correlations when looking for gene-environment interactions

Humans love interactions because they tell interesting stories (though no study has investigated this bias, AFAIK). Statistics and nature, however, hate interactions. Interactions in general have a low prior probability, and because people fail to appreciate this, reported interactions generally fail to replicate. This is also true for gene-environment interactions (GxE), the love-child of any would-be critic of behavioral genetics (strong interactions make standard ANOVA of family data very tricky and thus let critics retreat into ‘it’s too complicated, we don’t know anything’ territory).

Some large-scale failures of replication

  • Duncan, L. E., & Keller, M. C. (2011). A critical review of the first 10 years of candidate gene-by-environment interaction research in psychiatry. American Journal of Psychiatry, 168(10), 1041-1049.

Objective

Gene-by-environment interaction (G×E) studies in psychiatry have typically been conducted using a candidate G×E (cG×E) approach, analogous to the candidate gene association approach used to test genetic main effects. Such cG×E research has received widespread attention and acclaim, yet cG×E findings remain controversial. The authors examined whether the many positive cG×E findings reported in the psychiatric literature were robust or if, in aggregate, cG×E findings were consistent with the existence of publication bias, low statistical power, and a high false discovery rate.
Method

The authors conducted analyses on data extracted from all published studies (103 studies) from the first decade (2000–2009) of cG×E research in psychiatry.
Results

Ninety-six percent of novel cG×E studies were significant compared with 27% of replication attempts. These findings are consistent with the existence of publication bias among novel cG×E studies, making cG×E hypotheses appear more robust than they actually are. There also appears to be publication bias among replication attempts because positive replication attempts had smaller average sample sizes than negative ones. Power calculations using observed sample sizes suggest that cG×E studies are underpowered. Low power along with the likely low prior probability of a given cG×E hypothesis being true suggests that most or even all positive cG×E findings represent type I errors.
Conclusion

In this new era of big data and small effects, a recalibration of views about “groundbreaking” findings is necessary. Well-powered direct replications deserve more attention than novel cG×E findings and indirect replications.

The authors also note evidence of publication bias among the replications themselves: positive replications have smaller sample sizes than negative ones.
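
The abstract's argument that low power plus a low prior makes most positive findings false can be made concrete with a little arithmetic. Below is a minimal sketch; the prior, power and alpha values are illustrative assumptions of mine, not numbers from the paper.

```python
def positive_predictive_value(prior, power, alpha=0.05):
    """Probability that a 'significant' finding reflects a true effect.

    prior: assumed share of tested hypotheses that are actually true
    power: probability of detecting a true effect
    alpha: false positive rate when there is no effect
    """
    true_positives = power * prior
    false_positives = alpha * (1.0 - prior)
    return true_positives / (true_positives + false_positives)


# Illustrative scenarios only (assumed values, not estimates from the paper).
scenarios = {
    "low prior, low power": (0.05, 0.10),
    "low prior, high power": (0.05, 0.80),
    "coin-flip prior, high power": (0.50, 0.80),
}
for label, (prior, power) in scenarios.items():
    ppv = positive_predictive_value(prior, power)
    print(f"{label}: PPV = {ppv:.2f}")
```

With an assumed 5% prior and 10% power, fewer than one in ten significant results would be real, which is the kind of situation the authors describe.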

I know, I know. The authors frame this one as a success, but it really isn’t.

The hypothesis that the S allele of the 5-HTTLPR serotonin transporter promoter region is associated with increased risk of depression, but only in individuals exposed to stressful situations, has generated much interest, research and controversy since first proposed in 2003. Multiple meta-analyses combining results from heterogeneous analyses have not settled the issue. To determine the magnitude of the interaction and the conditions under which it might be observed, we performed new analyses on 31 data sets containing 38 802 European ancestry subjects genotyped for 5-HTTLPR and assessed for depression and childhood maltreatment or other stressful life events, and meta-analysed the results. Analyses targeted two stressors (narrow, broad) and two depression outcomes (current, lifetime). All groups that published on this topic prior to the initiation of our study and met the assessment and sample size criteria were invited to participate. Additional groups, identified by consortium members or self-identified in response to our protocol (published prior to the start of analysis) with qualifying unpublished data, were also invited to participate. A uniform data analysis script implementing the protocol was executed by each of the consortium members. Our findings do not support the interaction hypothesis. We found no subgroups or variable definitions for which an interaction between stress and 5-HTTLPR genotype was statistically significant. In contrast, our findings for the main effects of life stressors (strong risk factor) and 5-HTTLPR genotype (no impact on risk) are strikingly consistent across our contributing studies, the original study reporting the interaction and subsequent meta-analyses. Our conclusion is that if an interaction exists in which the S allele of 5-HTTLPR increases risk of depression only in stressed individuals, then it is not broadly generalisable, but must be of modest effect size and only observable in limited situations.

You can’t ignore gene-environment correlations

But aside from the usual false positives due to fishing expeditions, there’s a deeper problem: ignoring gene-environment correlations/dependencies causes many methods to falsely detect gene-environment interactions. A number of methods papers have made this point, but it is still not widely acknowledged. Since gene-environment correlations are ubiquitous, this methodological problem is ubiquitous too.

Candidate gene × environment (G × E) interaction research tests the hypothesis that the effects of some environmental variable (e.g., childhood maltreatment) on some outcome measure (e.g., depression) depend on a particular genetic polymorphism. Because this research is inherently nonexperimental, investigators have been rightly concerned that detected interactions could be driven by confounders (e.g., ethnicity, gender, age, socioeconomic status) rather than by the specified genetic or environmental variables per se. In an attempt to eliminate such alternative explanations for detected G × E interactions, investigators routinely enter the potential confounders as covariates in general linear models. However, this practice does not control for the effects these variables might have on the G × E interaction. Rather, to properly control for confounders, researchers need to enter the covariate × environment and the covariate × gene interaction terms in the same model that tests the G × E term. In this manuscript, I demonstrate this point analytically and show that the practice of improperly controlling for covariates is the norm in the G × E interaction literature to date. Thus, many alternative explanations for G × E findings that investigators had thought were eliminated have not been.
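
The covariate point in this abstract is easy to check by simulation. Here is a rough sketch under assumptions of my own: a binary covariate C that is correlated with a binary genotype G, a true model with a C × E interaction but no G × E interaction, and made-up effect sizes. The `ols` helper is a bare-bones least-squares solver written for this example, not anything from the paper.

```python
import random


def ols(rows, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    p = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for k in range(p):
            if k != col:
                f = a[k][col] / a[col][col]
                a[k] = [u - f * v for u, v in zip(a[k], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


random.seed(1)
naive_rows, full_rows, y = [], [], []
for _ in range(20000):
    c = 1.0 if random.random() < 0.5 else 0.0              # covariate
    g = 1.0 if random.random() < 0.2 + 0.6 * c else 0.0    # genotype, correlated with C
    e = random.gauss(0.0, 1.0)                             # environment
    # True model (assumed): C x E interaction, but NO G x E term.
    y.append(0.5 * e + 0.5 * c + 0.5 * c * e + random.gauss(0.0, 1.0))
    naive_rows.append([1.0, g, e, c, g * e])               # C as main effect only
    full_rows.append([1.0, g, e, c, g * e, c * e, c * g])  # plus C x E and C x G

gxe_naive = ols(naive_rows, y)[4]  # coefficient on the G x E column
gxe_full = ols(full_rows, y)[4]
print(f"G x E estimate, covariate as main effect only: {gxe_naive:.3f}")
print(f"G x E estimate, with C x E and C x G terms:    {gxe_full:.3f}")
```

In this setup the first model attributes part of the C × E effect to G × E because G and C are correlated, while the second model's G × E estimate should sit near zero.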

Gene-environment interactions have the potential to shed light on biological processes leading to disease and to improve the accuracy of epidemiological risk models. However, relatively few such interactions have yet been confirmed. In part this is because genetic markers such as tag SNPs are usually studied, rather than the causal variants themselves. Previous work has shown that this leads to substantial loss of power and increased sample size when gene and environment are independent. However, dependence between gene and environment can arise in several ways including mediation, pleiotropy, and confounding, and several examples of gene-environment interaction under gene-environment dependence have recently been published. Here we show that under gene-environment dependence, a statistical interaction can be present between a marker and environment even if there is no interaction between the causal variant and the environment. We give simple conditions under which there is no marker-environment interaction and note that they do not hold in general when there is gene-environment dependence. Furthermore, the gene-environment dependence applies to the causal variant and cannot be assessed from marker data. Gene-gene interactions are susceptible to the same problem if two causal variants are in linkage disequilibrium. In addition to existing concerns about mechanistic interpretations, we suggest further caution in reporting interactions for genetic markers.
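
The marker problem described here can be shown with exact probabilities, no simulation needed. This is a toy sketch with assumptions of my own: binary causal variant G, binary marker M, binary environment E, an outcome that is purely additive in G and E, and M independent of both the outcome and E once G is known. All the probabilities are invented for illustration.

```python
def p_g_given_m_e(p_g_m, p_e_g, m, e):
    """P(G=1 | M=m, E=e) by Bayes' rule, assuming E is independent of M given G."""
    prior = p_g_m[m]
    like1 = p_e_g[1] if e == 1 else 1.0 - p_e_g[1]
    like0 = p_e_g[0] if e == 1 else 1.0 - p_e_g[0]
    return prior * like1 / (prior * like1 + (1.0 - prior) * like0)


def marker_by_env_interaction(p_g_m, p_e_g, b_g=1.0):
    """Additive interaction contrast in E[Y | M, E] when Y = b_g*G + b_e*E + noise.

    The b_e*E term cancels in the contrast, so only b_g matters here.
    """
    def q(m, e):
        return p_g_given_m_e(p_g_m, p_e_g, m, e)
    return b_g * (q(1, 1) - q(1, 0) - q(0, 1) + q(0, 0))


# Assumed linkage between marker and causal variant: P(G=1 | M=m).
p_g_m = {0: 0.1, 1: 0.6}

# Gene-environment independence: P(E=1 | G=g) does not depend on g.
independent = marker_by_env_interaction(p_g_m, {0: 0.5, 1: 0.5})
# Gene-environment dependence: carriers are more often exposed.
dependent = marker_by_env_interaction(p_g_m, {0: 0.2, 1: 0.8})

print(f"marker x E contrast, G and E independent: {independent:.4f}")
print(f"marker x E contrast, G and E dependent:   {dependent:.4f}")
```

The causal model has no G × E term in either case, yet the marker shows a nonzero interaction contrast once G and E are dependent, because E then carries information about G beyond what M provides.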

Studying how genetic predispositions come together with environmental factors to contribute to complex behavioral outcomes has great potential for advancing our understanding of the development of psychopathology. It represents a clear theoretical advance over studying these factors in isolation. However, research at the intersection of multiple fields creates many challenges. We review several reasons why the rapidly expanding candidate gene-environment interaction (cGxE) literature should be considered with a degree of caution. We discuss lessons learned about candidate gene main effects from the evolving genetics literature and how these inform the study of cGxE. We review the importance of the measurement of the gene and environment of interest in cGxE studies. We discuss statistical concerns with modeling cGxE that are frequently overlooked. And we review other challenges that have likely contributed to the cGxE literature being difficult to interpret, including low power and publication bias. Many of these issues are similar to other concerns about research integrity (e.g., high false positive rates) that have received increasing attention in the social sciences. We provide recommendations for rigorous research practices for cGxE studies that we believe will advance its potential to contribute more robustly to the understanding of complex behavioral phenotypes.

We review gene × environment interaction (G×E) research in behavioral and psychiatric genetics. Two approaches to G×E are contrasted: a latent-variable approach that seeks to determine whether the heritability of a behavioral outcome varies by environmental exposure, and a candidate-gene × environment approach that seeks to determine whether genotypes are differentially sensitive to environmental conditions. Three major challenges to current G×E research are identified: (1) most published G×E findings are based on small samples and thus a high proportion are likely to be false-positive reports; (2) imprecision in the assessment of the phenotype, environment, and the genotype can significantly attenuate the power of a G×E study; and (3) a G×E is not an inherent property of the organism but rather a feature of a statistical model and so its identification depends on the structure of that model. The promise of genomic medicine is that interventions can be tailored to individual treatments, a form of G×E. Nonetheless, there is currently limited evidence of gene × intervention interactions in behavioral and psychiatric genetics. Future gene × intervention research will benefit from what we have learned from earlier G×E research and especially the need for large samples and the standardization of assessments to enable pooling of data across multiple studies.
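
Point (3) in this abstract, that an interaction is a feature of the statistical model rather than of the organism, is worth a tiny worked example. A sketch with hypothetical numbers: suppose risk is purely multiplicative, with the genotype doubling it and the exposure tripling it.

```python
import math

# Hypothetical risks under a purely multiplicative model (values are made up).
baseline, g_factor, e_factor = 1.0, 2.0, 3.0
risk = {(g, e): baseline * g_factor**g * e_factor**e
        for g in (0, 1) for e in (0, 1)}


def interaction_contrast(cell):
    """Difference of differences across the 2 x 2 table of G and E."""
    return cell[1, 1] - cell[1, 0] - cell[0, 1] + cell[0, 0]


additive = interaction_contrast(risk)
log_scale = interaction_contrast({k: math.log(v) for k, v in risk.items()})

print(f"interaction on the raw scale: {additive:.2f}")
print(f"interaction on the log scale: {log_scale:.2f}")
```

The same four numbers show an interaction on the raw scale and none on the log scale, so whether a G×E "exists" here depends entirely on which scale the model uses.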
