Methods for Epidemiology Studies

Project Abstract

Methods for Genetic Epidemiology

As more population-based studies suggest associations between genetic variants and disease risk, there is a need to improve the design of follow-up studies (stage II) in independent samples that aim to confirm evidence of association observed at the initial stage (stage I). We proposed using flexible designs developed for randomized clinical trials to calculate sample sizes for follow-up studies. We applied a bootstrap procedure to correct for the upward bias known as the winner's curse (a form of regression to the mean), which arises from choosing to follow up the markers with the strongest associations.

Standard regression models are convenient for assessing main effects and low-order interactions but not for exploring complex higher-order gene-gene interactions. Tree-based methodology is an attractive alternative for disentangling possible interactions, but it has difficulty modeling additive main effects. We proposed a new class of semi-parametric regression models, termed partially linear tree-based regression (PLTR) models, which combine the advantages of generalized linear regression and tree models.

We studied the properties of procedures for case-control genome-wide association studies (CCGWASs) that select the SNPs with the largest chi-square trend test statistics (equivalently, the smallest p-values). We showed that, for rare diseases, association tests for different SNPs are independent if the SNP genotypes are independent in the source population. This result allowed us to develop analytic and simulation techniques for studying CCGWASs. These analyses showed that large samples are needed to achieve a high detection probability, that is, a high chance that a true disease SNP appears among the top-ranked chi-square values.

Statistical power calculations inform the design and interpretation of genetic association studies, but few programs are tailored to case-control studies of single nucleotide polymorphisms (SNPs) in unrelated subjects. We developed algorithms and graphical user interfaces to calculate the sample size and the minimum detectable risk for SNP or haplotype effects under dominant, co-dominant, and recessive models. The programs allowed adjustments for multiple comparisons arising from linkage disequilibrium or multiple testing.

Survey Sampling Methods and Applications

We published methods for estimating the attributable number of deaths (AD) from all causes. Our approach first estimates the population attributable risk (AR) adjusted for confounding covariates and then multiplies the AR by the number of deaths that occurred in the population during a specific time period, as determined from vital mortality statistics. To compute the adjusted AR, adjusted relative hazards estimated by proportional hazards regression from the mortality follow-up data of a cohort were combined with a joint distribution of risk factors.

We developed new statistical methods for inference from logistic regression analysis of clustered data in which some covariate categories have few positive outcomes. The usual asymptotic Wald and score tests for logistic regression coefficients can be slow to converge to their nominal levels when appropriate cluster-level variance estimators are used. We presented a simulation-based method for testing logistic regression coefficients that compared favorably with generalized Wald and score tests and with a bootstrap hypothesis test in maintaining nominal levels. The proposed methods were also useful for testing the goodness of fit of logistic regression models using deciles-of-risk tables.

Models for Relative Risks of Environmental Exposures

To study the joint effects of smoking duration and intensity, we developed a three-parameter linear excess relative risk (ERR) model in total pack-years and cigarettes per day. Using data from a large case-control study of lung cancer, the model compares a total exposure delivered at low intensity over a long period with an equal total exposure delivered at high intensity over a short period. The model suggested that below 15–20 cigarettes per day there was a direct exposure rate (or exposure rate enhancement) effect: the ERR per pack-year for higher-intensity (and shorter-duration) smokers was greater than for lower-intensity (and longer-duration) smokers. Above 20 cigarettes per day there was an inverse exposure rate (or reduced potency) effect: the ERR per pack-year for higher-intensity smokers was smaller than for lower-intensity smokers. We explored this modeling approach in a series of analyses.

Applying the model to data from various cancer studies, including cancers of the lung, bladder, oral cavity, pancreas, and esophagus, revealed consistent reduced potency effects that were statistically homogeneous across studies. This indicates that, after accounting for total pack-years, intensity patterns were comparable across these diverse cancer sites.

An extension of the model for studying interactions and effect modification revealed that the variation in smoking risk with NAT2 status resulted from an interaction with smoking intensity, not with total pack-years of exposure. In addition, the relative increase in smoking risk among NAT2 slow acetylators rose with smoking intensity.

Exposure Assessment, Errors in Exposure Measurements, and Missing Exposure Data

We published two expository papers discussing the practical impact of confounding and exposure misclassification. In occupational epidemiology, these factors are routinely raised to argue that an observed result is a false positive or a false negative finding. We noted that examples of substantial confounding are rare in occupational epidemiology. We also noted that false positive results due to misclassification are unlikely, given the direction and magnitude of bias expected under non-differential measurement error. We suggested that all potential limitations be considered, and that the likelihood of their occurrence and the direction and magnitude of their effects be assessed more carefully and realistically when making judgments about study design or data interpretation.

Epidemiologic data from regions of the world with very high arsenic concentrations in drinking water show a strong association between arsenic exposure and the risk of several internal cancers, and this association can be considered causal. At lower levels of exposure, in the absence of unambiguous human data, risk is estimated by extrapolation from the high-exposure studies. Studies in lower-exposure populations have been limited by the difficulty of estimating past exposures and by relatively small increases in risk. We graphically illustrated the effects of exposure misclassification and small study size on risk estimates under various scenarios.
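
The detection-probability analysis for CCGWASs described above rests on the independence of the per-SNP tests. Below is a minimal Python sketch of how that independence can be exploited, assuming each 1-df trend statistic is chi-square (noncentral for the single disease SNP, central for all others), so the number of null SNPs that outrank the disease SNP is binomial. The function names, the Monte Carlo integration over the disease SNP's statistic, and the example scan size are illustrative assumptions, not the project's software.

```python
import math
import random


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), with terms computed in log space."""
    if k < 0:
        return 0.0
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 1.0 if k >= n else 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(min(k, n) + 1):
        log_term = (log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)


def detection_probability(ncp: float, n_snps: int, top_t: int,
                          reps: int = 4000, seed: int = 1) -> float:
    """Probability that one disease SNP ranks among the top `top_t` of `n_snps`.

    Assumes mutually independent 1-df trend statistics: the disease SNP's is
    noncentral chi-square with noncentrality `ncp`; the other n_snps - 1 are null.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        # Monte Carlo draw of the disease SNP's statistic: (Z + sqrt(ncp))^2.
        x = (rng.gauss(0.0, 1.0) + math.sqrt(ncp)) ** 2
        # Chance that a single null chi-square(1) statistic exceeds x.
        p_exceed = math.erfc(math.sqrt(x / 2.0))
        # Top-T rank <=> at most T - 1 null statistics exceed x; by
        # independence that count is Binomial(n_snps - 1, p_exceed).
        total += binom_cdf(top_t - 1, n_snps - 1, p_exceed)
    return total / reps


if __name__ == "__main__":
    # Hypothetical scan: 100,000 independent SNPs, follow up the top 100.
    for ncp in (10.0, 20.0, 30.0, 40.0):
        prob = detection_probability(ncp, n_snps=100_000, top_t=100)
        print(f"noncentrality {ncp:>4}: detection probability {prob:.3f}")
```

For a fixed effect size, the noncentrality of a trend test grows roughly in proportion to the number of subjects, so sweeping `ncp` is a rough proxy for sweeping sample size.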
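
To make the sample-size and minimum-detectable-risk calculations described above concrete, here is a hedged sketch covering only the dominant and recessive models. It assumes a rare disease (so control genotype frequencies follow Hardy-Weinberg proportions), equal numbers of cases and controls, a two-sided two-proportion test on carrier status, and a Bonferroni split of alpha for multiple testing. All function names are hypothetical; the co-dominant model, haplotype effects, and linkage-disequilibrium adjustments handled by the project's programs are not reproduced.

```python
import math
from statistics import NormalDist


def carrier_frequency(q: float, model: str) -> float:
    """Control 'exposure' frequency for risk-allele frequency q (Hardy-Weinberg)."""
    if model == "dominant":      # carries one or two risk alleles
        return 1.0 - (1.0 - q) ** 2
    if model == "recessive":     # carries two risk alleles
        return q ** 2
    raise ValueError("model must be 'dominant' or 'recessive'")


def case_frequency(p0: float, odds_ratio: float) -> float:
    """Exposure frequency in cases implied by control frequency p0 and the OR."""
    return odds_ratio * p0 / (1.0 + p0 * (odds_ratio - 1.0))


def cases_needed(q, odds_ratio, model, alpha=0.05, power=0.8, n_tests=1):
    """Cases (= controls) for a two-sided two-proportion test, Bonferroni alpha."""
    z = NormalDist().inv_cdf
    z_a = z(1.0 - alpha / (2.0 * n_tests))
    z_b = z(power)
    p0 = carrier_frequency(q, model)
    p1 = case_frequency(p0, odds_ratio)
    p_bar = (p0 + p1) / 2.0
    num = (z_a * math.sqrt(2.0 * p_bar * (1.0 - p_bar))
           + z_b * math.sqrt(p0 * (1.0 - p0) + p1 * (1.0 - p1))) ** 2
    return math.ceil(num / (p1 - p0) ** 2)


def min_detectable_or(n_cases, q, model, alpha=0.05, power=0.8, n_tests=1):
    """Smallest OR > 1 detectable with n_cases per group, found by bisection."""
    lo, hi = 1.0 + 1e-6, 50.0
    if cases_needed(q, hi, model, alpha, power, n_tests) > n_cases:
        return math.inf  # not detectable even at the (arbitrary) upper bound
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if cases_needed(q, mid, model, alpha, power, n_tests) <= n_cases:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    for model in ("dominant", "recessive"):
        n = cases_needed(0.2, 1.5, model)
        print(model, n, round(min_detectable_or(n, 0.2, model), 3))
```

The bisection relies on the required sample size falling as the odds ratio rises, which holds for OR > 1 in this two-proportion setup.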
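
The attributable-deaths approach described above (an adjusted AR multiplied by the number of deaths) can be sketched as follows. This assumes the multi-category form of Levin's formula applied within strata such as age-sex groups, with covariate-adjusted relative hazards from a cohort plugged in as relative risks and each stratum's exposure distribution taken from population data. The `Stratum` class and all inputs are hypothetical, and the published estimator may differ.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Stratum:
    """One population stratum (e.g. an age-sex group); field names are hypothetical."""
    prevalences: Sequence[float]     # exposure-category proportions, summing to 1
    relative_risks: Sequence[float]  # adjusted RR per category (reference = 1.0)
    deaths: float                    # deaths in the stratum from vital statistics


def attributable_risk(prevalences, relative_risks):
    """Multi-category AR = sum p_i (RR_i - 1) / (1 + sum p_i (RR_i - 1))."""
    if len(prevalences) != len(relative_risks):
        raise ValueError("one relative risk per exposure category is required")
    if abs(sum(prevalences) - 1.0) > 1e-9:
        raise ValueError("prevalences must sum to 1")
    excess = sum(p * (rr - 1.0) for p, rr in zip(prevalences, relative_risks))
    return excess / (1.0 + excess)


def attributable_deaths(strata):
    """AD = sum over strata of AR_s * D_s."""
    return sum(attributable_risk(s.prevalences, s.relative_risks) * s.deaths
               for s in strata)


if __name__ == "__main__":
    # Made-up inputs for illustration only.
    strata = [
        Stratum(prevalences=[0.7, 0.3], relative_risks=[1.0, 2.0], deaths=1000),
        Stratum(prevalences=[0.5, 0.5], relative_risks=[1.0, 1.5], deaths=500),
    ]
    print(round(attributable_deaths(strata), 1))  # 230.8 + 100.0 = 330.8
```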
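
One three-parameter ERR form consistent with the smoking model described above (assumed here, not taken from the project) makes the excess relative risk linear in pack-years, with a per-pack-year slope that depends on intensity d in cigarettes per day: ERR = β · pack-years · exp(φ1 ln d + φ2 (ln d)²). When φ2 < 0, the ERR per pack-year peaks at d* = exp(−φ1 / (2φ2)). Below d*, higher intensity raises potency (direct exposure rate effect); above it, potency falls (inverse exposure rate effect). The parameter values in the sketch are purely illustrative and are not estimates from any study.

```python
import math


def err_per_pack_year(cpd, beta, phi1, phi2):
    """ERR per pack-year at intensity `cpd` (cigarettes per day)."""
    log_d = math.log(cpd)
    return beta * math.exp(phi1 * log_d + phi2 * log_d ** 2)


def relative_risk(pack_years, cpd, beta, phi1, phi2):
    """RR = 1 + ERR, linear in total pack-years at a fixed intensity."""
    return 1.0 + pack_years * err_per_pack_year(cpd, beta, phi1, phi2)


def peak_intensity(phi1, phi2):
    """Intensity that maximizes ERR/pack-year; requires phi2 < 0."""
    if phi2 >= 0:
        raise ValueError("no interior maximum unless phi2 < 0")
    return math.exp(-phi1 / (2.0 * phi2))


# Purely illustrative parameter values (NOT estimates from any study).
BETA, PHI1, PHI2 = 0.05, 1.2, -0.2

if __name__ == "__main__":
    print(f"ERR/pack-year peaks near {peak_intensity(PHI1, PHI2):.1f} cigarettes/day")
    # The same total exposure (40 pack-years) delivered at different intensities.
    for cpd in (5, 10, 20, 40):
        years = 40 * 20 / cpd  # duration implied by 40 pack-years at this intensity
        rr = relative_risk(40, cpd, BETA, PHI1, PHI2)
        print(f"{cpd:>3} cig/day for {years:4.0f} years: RR = {rr:.2f}")
```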
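
The point above that non-differential misclassification rarely produces false positives can be illustrated by the expected odds ratio after a binary exposure is misclassified with the same sensitivity (Se) and specificity (Sp) in cases and controls. This is a minimal sketch under those assumptions (binary exposure, errors independent of disease status, Se + Sp > 1), in which the expected observed OR lies between 1 and the true OR. The scenario values are made up, the result describes expected values (a small study can still deviate by chance), and the toward-the-null property does not automatically carry over to polytomous exposures.

```python
def odds(p: float) -> float:
    return p / (1.0 - p)


def case_prevalence(control_prev: float, true_or: float) -> float:
    """True exposure prevalence among cases implied by control prevalence and OR."""
    return true_or * control_prev / (1.0 + control_prev * (true_or - 1.0))


def classified_prevalence(true_prev: float, sensitivity: float, specificity: float) -> float:
    """Expected proportion classified as exposed: true plus false positives."""
    return sensitivity * true_prev + (1.0 - specificity) * (1.0 - true_prev)


def observed_or(control_prev, true_or, sensitivity, specificity):
    """Expected OR when cases and controls share the same Se and Sp."""
    p1 = classified_prevalence(case_prevalence(control_prev, true_or),
                               sensitivity, specificity)
    p0 = classified_prevalence(control_prev, sensitivity, specificity)
    return odds(p1) / odds(p0)


if __name__ == "__main__":
    # Illustrative scenario: 20% of controls truly exposed, true OR = 2.
    for se, sp in [(1.0, 1.0), (0.9, 0.95), (0.8, 0.9), (0.7, 0.8)]:
        print(f"Se={se:.2f} Sp={sp:.2f}: observed OR = {observed_or(0.2, 2.0, se, sp):.2f}")
```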

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Nilanjan Chatterjee

Other grants awarded to Nilanjan Chatterjee

Statistical Methods for Data Integration and Applications to Genome-wide Association Studies
  • Award number: 10889298
  • Fiscal year: 2023
  • Funding amount: $1,599,400
Multifactoral breast cancer risk prediction accounting for ethnic and tumor diversity
  • Award number: 10609504
  • Fiscal year: 2020
  • Funding amount: $1,599,400
Multifactoral breast cancer risk prediction accounting for ethnic and tumor diversity
  • Award number: 10416066
  • Fiscal year: 2020
  • Funding amount: $1,599,400
Multifactoral breast cancer risk prediction accounting for ethnic and tumor diversity
  • Award number: 10263893
  • Fiscal year: 2020
  • Funding amount: $1,599,400
Robust Methods for Polygenic Analysis to Inform Disease Etiology and Enhance Risk Prediction
  • Award number: 9920753
  • Fiscal year: 2019
  • Funding amount: $1,599,400
Robust Methods for Polygenic Analysis to Inform Disease Etiology and Enhance Risk Prediction
  • Award number: 10359748
  • Fiscal year: 2019
  • Funding amount: $1,599,400
Robust Methods for Polygenic Analysis to Inform Disease Etiology and Enhance Risk Prediction
  • Award number: 10112944
  • Fiscal year: 2019
  • Funding amount: $1,599,400
Robust Methods for Polygenic Analysis to Inform Disease Etiology and Enhance Risk Prediction
  • Award number: 10579942
  • Fiscal year: 2019
  • Funding amount: $1,599,400
Methods for Epidemiology Studies
  • Award number: 8565443
  • Funding amount: $1,599,400
Methods for Epidemiology Studies
  • Award number: 9154202
  • Funding amount: $1,599,400

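Similar National Natural Science Foundation of China (NSFC) grants

Convex relaxation and high- and low-order accelerated algorithms for distributed nonconvex nonsmooth optimization problems
  • Award number: 12371308
  • Approval year: 2023
  • Funding amount: CNY 435,000
  • Project category: General Program
Design and hardware implementation of ensemble learning algorithms under resource constraints
  • Award number: 62372198
  • Approval year: 2023
  • Funding amount: CNY 500,000
  • Project category: General Program
Fast electromagnetic field algorithms based on physics-informed neural networks
  • Award number: 52377005
  • Approval year: 2023
  • Funding amount: CNY 520,000
  • Project category: General Program
SPH models and efficient algorithms for deformation and flow of saturated sand considering pile-soil-water coupling effects
  • Award number: 12302257
  • Approval year: 2023
  • Funding amount: CNY 300,000
  • Project category: Young Scientists Fund
Ensemble classification algorithms for high-dimensional imbalanced data
  • Award number: 62306119
  • Approval year: 2023
  • Funding amount: CNY 300,000
  • Project category: Young Scientists Fund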

Similar international grants

Bayesian Statistical Learning for Robust and Generalizable Causal Inferences in Alzheimer Disease and Related Disorders Research
  • Award number: 10590913
  • Fiscal year: 2023
  • Funding amount: $1,599,400
Small molecules combination therapy using polypharmacology approach as a novel treatment paradigm for rare bone disease
  • Award number: 10759694
  • Fiscal year: 2023
  • Funding amount: $1,599,400
Semi-automated bladder cancer screening using machine learning: clinical validation and implementation.
  • Award number: 10349701
  • Fiscal year: 2022
  • Funding amount: $1,599,400
Follow-up and Maintenance of the Newborn Epigenetics STudy (NEST) Cohort
  • Award number: 10443683
  • Fiscal year: 2018
  • Funding amount: $1,599,400
Follow-up and Maintenance of the Newborn Epigenetics STudy (NEST) Cohort
  • Award number: 10205067
  • Fiscal year: 2018
  • Funding amount: $1,599,400