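Thesis information

Chinese title:

 Research on Heterogeneity and Overlapping Data Problems of Meta-Analysis in Genome-Wide Association Studies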

Name:

 靳琴琴

Student ID:

 1601310103

Confidentiality level:

 Public

Thesis language:

 chi

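Discipline code:

 081001

Discipline name:

 Engineering - Information and Communication Engineering - Communication and Information Systems

Student type:

 Doctoral

Degree:

 Doctor of Engineering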

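University:

 西安电子科技大学 (Xidian University)

School:

 School of Telecommunications Engineering

Major:

 Information and Communication Engineering

Research direction:

 Communication and Information Systems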

First supervisor:

 史罡

First supervisor's affiliation:

 西安电子科技大学 (Xidian University)

Completion date:

 2020-03-28

Defense date:

 2020-05-23

English title:

 Meta-Analysis of SNP-Environment Interaction with Heterogeneity and overlapping data in GWAS    

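Chinese keywords:

 Fixed effect ; Random effect ; Meta-regression analysis ; Overlapping data ; Heterogeneity ; SNP-environment interaction ; Statistical power

English keywords:

 Fixed effect ; Random effect ; Meta-regression ; Overlapping data ; Heterogeneity ; SNP-environment interaction ; Power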

Chinese abstract:

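A genome-wide association study (GWAS) searches the entire human genome for genetic variants associated with complex traits or diseases. The variants in question are mainly single nucleotide polymorphisms (SNPs), which account for more than 90% of all known polymorphisms. Meta-analysis is widely used in GWAS: it combines the results of multiple studies, achieving a large effective sample size and raising the probability of discovering new associations. Fixed effects and random effects models are the two approaches commonly used in meta-analysis. The fixed effects model assumes that the effect is the same across studies. Under the fixed effects model, SNP-environment interaction is studied with two methods: the joint test of the SNP and SNP-environment interaction effects, and meta-regression. In practice, however, common complex diseases or phenotypes often arise from several different genetic mechanisms and therefore show genetic heterogeneity. Variants discovered by GWASs also show different effect sizes, and even different directions of association, in populations with different demographic histories. Many large trans-ancestry meta-analyses have been carried out recently, and these routinely involve genetic heterogeneity. Meta-analysis therefore needs a random effects model to account for heterogeneity in genetic effects. The traditional random effects model tests only the fixed effect of the SNP, treating heterogeneity as a random effect that is regarded as part of the variance of the fixed effect. Recent work recommends testing the fixed and random effects of the SNP simultaneously, and this approach has been shown to have higher statistical power than the traditional random effects method. However, it can only test the genetic main effect of a SNP; models and methods for testing SNP-environment interaction are still lacking. In addition, overlapping data are sometimes used in GWAS practice, either to save research costs or inadvertently, and ignoring the overlap leads to false positives. Recent work has proposed meta-analysis methods that account for overlapping data when testing the genetic main effect of a SNP. Likewise, no meta-analysis method yet exists for testing SNP-environment interaction in the presence of overlapping data. This thesis studies heterogeneity and overlapping data in tests of SNP-environment interaction in GWAS. The main contributions are as follows: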

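First, a meta-analysis method is proposed for testing SNP-environment interaction in the presence of heterogeneity. The heterogeneity of the SNP and of the SNP-environment interaction is introduced into the meta-regression model as random effects. A new test of SNP-environment interaction, called the random effects meta-regression method, tests the fixed and random effects of the interaction simultaneously. Based on this model, a further statistical test is proposed that simultaneously tests the fixed effects of the SNP and the SNP-environment interaction together with their random effects. Simulations are used to study the null distributions and statistical power of the proposed methods. The results show that, when heterogeneity is large, the new methods have higher statistical power than the traditional random effects model and conventional meta-regression. The method is simple, effective, and applicable to different scenarios. In addition, when a gene-environment interaction is known to exist, the method can be extended to estimate different forms of interaction post hoc.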

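Second, another random effects method is proposed for the joint test of the SNP and SNP-environment interaction in the presence of heterogeneity, together with a test of the SNP-environment interaction alone. The method is based on a likelihood ratio test and does not require stratum-level summary statistics. Simulations show that it has statistical power similar to that of the random effects meta-regression method. It can be used to test interaction when stratum-level statistics are not available across studies. Because the method requires the functional form of the SNP-environment interaction to be specified in advance, testing a new interaction hypothesis requires rerunning the genome-wide analysis in each study under the new model.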

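Next, an overlapping-data meta-regression method is proposed for testing SNP-environment interaction when data overlap between studies. Building on the work of Lin and Han, a between-stratum correlation matrix is introduced to generalize the variance-covariance matrix of the conventional meta-regression model. Based on this model, statistical tests are given for the SNP-environment interaction and for the joint effect of the SNP and the SNP-environment interaction. Simulations examine the null distribution and the statistical power of the method at different overlap rates. The results show that the method is valid and achieves statistical power comparable to that of the splitting method, which removes overlapping samples before the meta-analysis. In contrast, ignoring the overlap shifts the points of the null distribution upward. The overlapping-data meta-regression method therefore handles data overlap effectively.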

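Finally, building on the random effects meta-regression method and the overlapping-data meta-regression method, a random effects meta-regression method for overlapping data is proposed that accounts for heterogeneity and data overlap simultaneously. The likelihood ratio statistics for testing the SNP-environment interaction and the joint effect of the SNP and SNP-environment interaction are verified by simulation. Statistical power is used to assess the advantage of the method over the fixed effects meta-analysis method for overlapping data. The simulations show that, when data overlap and heterogeneity is present, the method has higher statistical power than the fixed effects meta-analysis method for overlapping data.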

English abstract:

A genome-wide association study (GWAS) is a genome-wide method for identifying genetic variants associated with complex traits and diseases. The variants in question are mainly single nucleotide polymorphisms (SNPs), which account for more than 90% of known polymorphisms. Meta-analysis is widely used in genome-wide association studies. It synthesizes the results of multiple studies to achieve a large effective sample size and improve the probability of discovering new associations. Fixed effects and random effects models are the two approaches commonly used in meta-analysis. The fixed effects model assumes that the effects are the same across studies. Under the fixed effects model, SNP-environment interaction is tested either with a joint test of the SNP and SNP-environment interaction effects or with meta-regression. In practice, genetic heterogeneity arises when the same or similar diseases or phenotypes are produced by different genetic mechanisms, so heterogeneity must be considered in the meta-analysis. Variants identified in GWASs have been shown to have different effect sizes, and even different directions of association, in populations with different demographic histories. Many large trans-ancestry meta-analyses have been performed recently, and these routinely involve genetic heterogeneity. Genetic heterogeneity, and random effects models that account for it, therefore need to be considered in the meta-analysis. The classical random effects approach treats genetic heterogeneity as a random effect and as part of the variance of the fixed effect. Recent work suggests testing against the null hypothesis that a variant has neither a fixed nor a random effect. This method has been shown to have higher power than the classical random effects method. However, it covers only the SNP main effect model, and no corresponding method exists yet for the SNP-environment interaction model.
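
As a rough illustration of the two model families described above, the following sketch pools per-study SNP effect estimates with an inverse-variance fixed effects model and with a classical random effects model. The DerSimonian-Laird moment estimator of the between-study variance is an assumption made here for concreteness; the abstract does not say which estimator the thesis uses, and the function names and input numbers are hypothetical.

```python
import math

def fixed_effects(betas, ses):
    """Inverse-variance weighted pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    return beta, math.sqrt(1.0 / total)

def random_effects_dl(betas, ses):
    """Classical random effects pooling (assumed DerSimonian-Laird tau^2)."""
    if len(betas) < 2:
        raise ValueError("need at least two studies to estimate tau^2")
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta_fe, _ = fixed_effects(betas, ses)
    # Cochran's Q measures variation between studies beyond sampling error.
    q = sum(w * (b - beta_fe) ** 2 for w, b in zip(weights, betas))
    scale = total - sum(w ** 2 for w in weights) / total
    tau2 = max(0.0, (q - (len(betas) - 1)) / scale)
    # The between-study variance is added to every study's own variance.
    re_weights = [1.0 / (se ** 2 + tau2) for se in ses]
    re_total = sum(re_weights)
    beta = sum(w * b for w, b in zip(re_weights, betas)) / re_total
    return beta, math.sqrt(1.0 / re_total), tau2

def two_sided_p(beta, se):
    """Two-sided normal p-value for H0: beta = 0."""
    return math.erfc(abs(beta / se) / math.sqrt(2.0))

# Hypothetical per-study SNP effects with opposite directions of association.
betas = [0.5, -0.3, 0.1]
ses = [0.1, 0.1, 0.1]
beta_fe, se_fe = fixed_effects(betas, ses)
beta_re, se_re, tau2 = random_effects_dl(betas, ses)
print("fixed: ", beta_fe, se_fe, two_sided_p(beta_fe, se_fe))
print("random:", beta_re, se_re, tau2, two_sided_p(beta_re, se_re))
```

With these heterogeneous inputs the random effects standard error is larger than the fixed effects one, because the estimated between-study variance is added to each study's variance.
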
In addition, data may overlap between the studies in a meta-analysis. The overlap may be deliberate, to save research costs, or inadvertent. Ignoring overlapping data in the meta-analysis can produce spurious associations. Recent studies have proposed methods that handle overlapping data when testing the genetic main effect of a SNP. However, there is still no meta-analysis method for testing SNP-environment interaction when overlapping data exist. Building on these fixed effects and random effects methods, this thesis makes the following contributions:

 

First, we propose a meta-analysis method for testing SNP-environment interaction in the presence of genetic heterogeneity. We introduce random effects of the SNP and of the SNP-environment interaction under test into a meta-regression model to account for heterogeneity. A test for the SNP-environment interaction is formulated that tests the fixed and random effects of the interaction simultaneously. Similarly, a test for the total genetic effect is formulated that tests the fixed effects of the SNP and the SNP-environment interaction together with their random effects. We perform simulations to study the null distributions and statistical power of the proposed tests. The new methods have higher power than the classical random effects and fixed effects meta-regression methods when heterogeneity is large. The approach is simple, effective, and applicable to different scenarios. In addition, when a SNP-environment interaction is known to exist, the approach can be generalized to estimate different forms of interaction after the fact.
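
The fixed effects meta-regression that this method extends can be sketched as a weighted least squares fit of group-level SNP effect estimates on the environmental exposure level of each group, where the slope plays the role of the SNP-environment interaction. The linear form, the function name, and the numbers below are assumptions for illustration, and the random effects that the thesis adds are not modeled here.

```python
import math

def meta_regression(betas, ses, exposures):
    """Weighted least squares fit of beta_i = b0 + b1 * exposure_i.

    Weights are 1 / se_i^2 and the variances are treated as known, so the
    covariance of (b0, b1) is the inverse of X'WX.
    """
    w = [1.0 / se ** 2 for se in ses]
    sw = sum(w)
    sx = sum(wi * x for wi, x in zip(w, exposures))
    sy = sum(wi * b for wi, b in zip(w, betas))
    sxx = sum(wi * x * x for wi, x in zip(w, exposures))
    sxy = sum(wi * x * b for wi, x, b in zip(w, exposures, betas))
    det = sw * sxx - sx * sx
    if det == 0:
        raise ValueError("exposure levels must differ between groups")
    intercept = (sxx * sy - sx * sxy) / det   # SNP main effect at exposure 0
    slope = (sw * sxy - sx * sy) / det        # SNP-environment interaction
    se_intercept = math.sqrt(sxx / det)
    se_slope = math.sqrt(sw / det)
    return intercept, slope, se_intercept, se_slope

# Hypothetical group-level SNP effects at four exposure levels.
exposures = [0.0, 1.0, 2.0, 3.0]
betas = [0.05, 0.15, 0.25, 0.35]
ses = [0.05, 0.05, 0.05, 0.05]
intercept, slope, se_intercept, se_slope = meta_regression(betas, ses, exposures)
z = slope / se_slope
p_interaction = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
print(intercept, slope, se_slope, p_interaction)
```
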

 

Second, we introduce a joint test of the SNP and SNP-environment interaction effects under the random effects model, together with a test of the SNP-environment interaction alone. Both tests are based on likelihood ratio statistics, and we evaluate their null distributions and power by simulation. The simulations show that the method has power similar to that of random effects meta-regression without requiring group-level summary statistics, which makes it the preferred choice when such statistics are unavailable. However, the method requires the form of the interaction to be specified in advance. Testing a new interaction hypothesis requires reformulating the model and rerunning the genome-wide analysis in each study.
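
For orientation, a fixed effects version of a joint test can be sketched as a Wald test with 2 degrees of freedom on pooled (SNP, interaction) estimates. This is an assumed baseline, not the likelihood ratio test of the thesis: it ignores heterogeneity, and the function names and inputs are hypothetical.

```python
import math

def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def joint_test(estimates, covariances):
    """Pool (beta_snp, beta_interaction) pairs and test both against zero.

    estimates: one (beta_snp, beta_interaction) tuple per study
    covariances: the matching 2x2 covariance matrix per study
    """
    information = [[0.0, 0.0], [0.0, 0.0]]
    score = [0.0, 0.0]
    for est, cov in zip(estimates, covariances):
        precision = inverse_2x2(cov)
        for r in range(2):
            for c in range(2):
                information[r][c] += precision[r][c]
            score[r] += precision[r][0] * est[0] + precision[r][1] * est[1]
    pooled_cov = inverse_2x2(information)
    pooled = [pooled_cov[r][0] * score[0] + pooled_cov[r][1] * score[1]
              for r in range(2)]
    # Wald statistic: pooled' * information * pooled, and information * pooled = score.
    chi2 = pooled[0] * score[0] + pooled[1] * score[1]
    p_value = math.exp(-chi2 / 2.0)  # exact chi-square tail for 2 df
    return pooled, chi2, p_value

# Two hypothetical studies reporting the same estimates.
estimates = [(0.2, 0.1), (0.2, 0.1)]
covariances = [[[0.01, 0.0], [0.0, 0.04]], [[0.01, 0.0], [0.0, 0.04]]]
pooled, chi2, p_value = joint_test(estimates, covariances)
print(pooled, chi2, p_value)
```
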

 

Next, inspired by methods that test genetic main effects with overlapping data, we propose an overlapping meta-regression method for testing gene-environment interaction when data overlap. We generalize the variance-covariance matrix of the regular meta-regression model, using Lin’s and Han’s correlation structures to incorporate the correlations introduced by overlapping data. Based on the proposed model, we provide statistical significance tests for the SNP-environment interaction and for the joint effect of the SNP main effect and the interaction. Through simulations, we examine the null distributions and statistical power of the proposed methods at different levels of data overlap among studies. The method is valid and achieves statistical power comparable to that of the splitting method, which removes overlapping samples before the meta-analysis. In contrast, ignoring overlapping data shifts the points of the null distribution upward. The proposed method therefore handles overlapping data effectively and with good statistical efficiency.
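
The effect of overlap can be sketched for the simpler main-effect case by pooling study estimates with generalized least squares under a correlation matrix. The correlation used here, shared samples divided by the geometric mean of the two sample sizes, is a common approximation for quantitative traits and is an assumption of this sketch; the abstract does not give the exact correlation structure or the meta-regression extension used in the thesis. All names and numbers are hypothetical.

```python
import math

def overlap_correlation(n_a, n_b, n_shared):
    """Assumed correlation between two study estimates that share samples."""
    return n_shared / math.sqrt(n_a * n_b)

def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(a[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (a[r][n] - tail) / a[r][r]
    return x

def pooled_with_overlap(betas, ses, corr):
    """GLS pooled estimate with covariance omega_ij = corr_ij * se_i * se_j."""
    k = len(betas)
    omega = [[corr[i][j] * ses[i] * ses[j] for j in range(k)] for i in range(k)]
    weights = solve(omega, [1.0] * k)  # omega^{-1} * 1
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    return beta, math.sqrt(1.0 / total)

# Two hypothetical studies of 1000 samples each, sharing 500 samples.
r = overlap_correlation(1000, 1000, 500)
betas, ses = [0.2, 0.2], [0.1, 0.1]
beta_gls, se_gls = pooled_with_overlap(betas, ses, [[1.0, r], [r, 1.0]])
beta_naive, se_naive = pooled_with_overlap(betas, ses, [[1.0, 0.0], [0.0, 1.0]])
print(beta_gls, se_gls, se_naive)
```

In this example the estimate that ignores the overlap reports a smaller standard error than the one that models it, which would make the corresponding test statistic too large.
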

 

Finally, building on the random effects meta-regression method and the overlapping meta-regression method, we propose a random effects overlapping meta-regression method that accounts for heterogeneity and overlapping data simultaneously. Likelihood ratio tests are given for the SNP-environment interaction effect and for the joint effect of the SNP and the SNP-environment interaction. In simulations, null distributions are used to verify the validity of the method, and power is used to compare it with the fixed effects overlapping meta-regression method. The simulations show that the method has higher power than fixed effects overlapping meta-regression when data overlap and heterogeneity is high.
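
The likelihood ratio idea of testing a fixed effect and a random effect together can be sketched for the simplest case: a single main effect with no overlap between studies. The normal model, the grid search over the between-study variance, and all names below are assumptions of this sketch. The statistics in the thesis additionally cover interaction terms and overlap correlations, which are omitted here, and no p-value is computed because the null distribution of such a statistic is not a standard chi-square.

```python
import math

def log_likelihood(betas, ses, mu, tau2):
    """Log-likelihood of beta_i ~ Normal(mu, se_i^2 + tau2)."""
    total = 0.0
    for b, se in zip(betas, ses):
        var = se ** 2 + tau2
        total += -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (b - mu) ** 2 / var
    return total

def lrt_fixed_and_random(betas, ses, grid_size=2000):
    """Likelihood ratio statistic for H0: mu = 0 and tau2 = 0.

    tau2 is maximized on a grid; for each grid value the best mu is the
    inverse-variance weighted mean under that tau2.
    """
    null_ll = log_likelihood(betas, ses, 0.0, 0.0)
    upper = (max(betas) - min(betas)) ** 2  # crude upper bound for tau2
    best_ll, best_mu, best_tau2 = None, 0.0, 0.0
    for step in range(grid_size + 1):
        tau2 = upper * step / grid_size
        weights = [1.0 / (se ** 2 + tau2) for se in ses]
        mu = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
        ll = log_likelihood(betas, ses, mu, tau2)
        if best_ll is None or ll > best_ll:
            best_ll, best_mu, best_tau2 = ll, mu, tau2
    return 2.0 * (best_ll - null_ll), best_mu, best_tau2

# Homogeneous studies: the statistic reduces to the fixed effects z^2.
stat_hom, mu_hom, tau2_hom = lrt_fixed_and_random([0.2, 0.2], [0.1, 0.1])
# Heterogeneous studies: the random effect adds evidence against the null.
stat_het, mu_het, tau2_het = lrt_fixed_and_random([0.5, -0.3, 0.1], [0.1, 0.1, 0.1])
print(stat_hom, tau2_hom)
print(stat_het, tau2_het)
```
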

Chinese Library Classification number:

 Q34

Collection number:

 46391

Release date:

 2020-12-24
