Design and Analysis for Cancer Epidemiology Studies
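Basic Information
- Grant number: 7059077
- Principal investigator:
- Amount: $74,300
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2005
- Funding country: United States
- Project period: 2005-09-30 to 2007-08-31
- Project status: Completed
- Source:
- Keywords:
Project Abstract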
DESCRIPTION (provided by applicant):
The overall goal of this research is to develop novel statistical methods for addressing the difficult issue of multiplicity in current cancer etiology. To identify determinants of cancer and quantify their role, cancer etiology studies are intrinsically multi-factorial because of the multi-step nature of carcinogenesis and the multiple extrinsic factors that turn normal cells into malignant ones. Multiplicity inflates the false positive rate. In the simplest example of searching for a cutpoint of one quantitative biomarker for disease status, the common practice of examining different cutpoints and picking the one with the smallest p-value results in a highly inflated false positive rate. Even in the largest studies, statistical power for testing interactions quickly diminishes, sample sizes rapidly become inadequate with stratification, and risk estimates become unstable. Because there are so many risk factors, model overfitting is a common problem and the predictive performance of the statistical model is poor. It is thus not surprising that even main effects (e.g., candidate gene associations) have proven notoriously difficult to replicate, and reported interactions even harder. The multiplicity issue is acute today as more biomarkers of risk exposures, and even entire pathways easily comprising dozens of genes and their environmental substrates, become available. An effective means to reduce overfitting and prediction error is to constrain model parameters, as in the least absolute shrinkage and selection operator (lasso), to eliminate the large number of irrelevant variables (e.g., genes). Finding the maximum likelihood estimate (MLE) in such regression models with a large number of variables is challenging. Since some measures of exposure may not be indicative of cancer, and these irrelevant variables reduce the accuracy of the regression model, selecting the most relevant variables into the model would be a significant step.
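The cutpoint example in the paragraph above can be illustrated with a small simulation. The sketch below is not part of the application. It assumes a hypothetical null scenario in which a standard-normal biomarker is unrelated to disease status, tests nine candidate cutpoints (the 10th through 90th percentiles) with a pooled two-proportion z-test, and compares the false positive rate of one prespecified cutpoint (the median) with that of reporting the smallest p-value. The function names, sample size, and choice of test are all illustrative assumptions.

```python
import math
import random


def two_proportion_p_value(cases_a, n_a, cases_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (cases_a + cases_b) / (n_a + n_b)
    variance = pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b)
    if variance == 0.0:
        return 1.0  # no variation in disease status: nothing to test
    z = (cases_a / n_a - cases_b / n_b) / math.sqrt(variance)
    return math.erfc(abs(z) / math.sqrt(2.0))


def simulate_false_positive_rates(n_subjects=200, n_studies=1000, alpha=0.05, seed=0):
    """Return (rate with a fixed median cutpoint, rate with the best of nine cutpoints)."""
    rng = random.Random(seed)
    # Candidate cutpoints: the 10th, 20th, ..., 90th percentiles of the biomarker,
    # expressed as the number of subjects falling below each cutpoint.
    split_sizes = [n_subjects * k // 10 for k in range(1, 10)]
    median_split = n_subjects // 2
    hits_fixed = 0
    hits_best = 0
    for _ in range(n_studies):
        # Null scenario (assumption): disease status is independent of the biomarker.
        subjects = sorted(
            (rng.gauss(0.0, 1.0), rng.random() < 0.5) for _ in range(n_subjects)
        )
        status = [int(diseased) for _, diseased in subjects]
        total_cases = sum(status)
        p_values = {}
        for n_low in split_sizes:
            cases_low = sum(status[:n_low])
            p_values[n_low] = two_proportion_p_value(
                cases_low, n_low, total_cases - cases_low, n_subjects - n_low
            )
        hits_fixed += p_values[median_split] < alpha
        hits_best += min(p_values.values()) < alpha
    return hits_fixed / n_studies, hits_best / n_studies


if __name__ == "__main__":
    fixed, best = simulate_false_positive_rates()
    print(f"prespecified median cutpoint: {fixed:.3f}")
    print(f"smallest p-value over nine cutpoints: {best:.3f}")
```

Under these assumptions the prespecified cutpoint should reject at roughly the nominal 5% level, while selecting the smallest p-value rejects noticeably more often even though no association exists.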
However, classic methods for model/variable selection have not had much success in biomedical applications because they eliminate significant predictors too aggressively and are numerically unstable due to collinearity. This pilot project application focuses on the logistic regression model commonly used in cancer etiology studies. Building upon the novel accelerated expectation-maximization (EM) algorithm we developed for variable selection in linear models, we propose to develop fast variable selection procedures for the logistic regression model that reduce overfitting and have improved predictive properties; to develop computer programs; to conduct simulation studies to assess the performance of the method/algorithm; and to analyze the esophageal data from two currently NCI-funded studies. Upon completion of the proposed research, the methods/algorithms developed can be used to analyze cancer epidemiology data more effectively and efficiently. The research also provides a basis for further development of the approach, potentially into an R01 application. Future studies can include extensions to multinomial (i.e., multi-class) logistic regression models for cancer outcomes; to the Cox regression model for time-to-event data, such as time to advanced cancer, in cancer etiology; and to Bayesian hierarchical modeling and model selection that incorporate prior biological knowledge about pathways, which will enhance the ability to detect real causal effects.
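To make the idea of lasso-constrained logistic regression concrete, here is a minimal sketch of an L1-penalized logistic fit. It does not reproduce the applicants' accelerated EM algorithm, which the abstract does not describe. Instead it uses plain proximal gradient descent (soft-thresholding after each gradient step) as a stand-in. The simulated data, the penalty value, and all function names are hypothetical choices made for illustration.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink toward zero, clip small values to 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_l1_logistic(rows, labels, penalty, n_iterations=300):
    """Minimize mean logistic loss + penalty * sum(|beta_j|) by proximal gradient descent.

    The intercept is left unpenalized. Returns (intercept, beta).
    """
    n, p = len(rows), len(rows[0])
    # Step size 1/L, with L an upper bound on the gradient's Lipschitz constant.
    lipschitz = sum(1.0 + sum(v * v for v in row) for row in rows) / (4.0 * n)
    step = 1.0 / lipschitz
    intercept, beta = 0.0, [0.0] * p
    for _ in range(n_iterations):
        grad_intercept, grad_beta = 0.0, [0.0] * p
        for row, y in zip(rows, labels):
            residual = sigmoid(intercept + sum(b * v for b, v in zip(beta, row))) - y
            grad_intercept += residual / n
            for j, v in enumerate(row):
                grad_beta[j] += residual * v / n
        intercept -= step * grad_intercept
        beta = [
            soft_threshold(b - step * g, step * penalty)
            for b, g in zip(beta, grad_beta)
        ]
    return intercept, beta


def simulate_data(n=200, p=8, seed=0):
    """Hypothetical data: only the first two of p exposures affect disease risk."""
    rng = random.Random(seed)
    true_beta = [2.0, -2.0] + [0.0] * (p - 2)
    rows, labels = [], []
    for _ in range(n):
        row = [rng.gauss(0.0, 1.0) for _ in range(p)]
        risk = sigmoid(sum(b * v for b, v in zip(true_beta, row)))
        rows.append(row)
        labels.append(1 if rng.random() < risk else 0)
    return rows, labels


if __name__ == "__main__":
    rows, labels = simulate_data()
    intercept, beta = fit_l1_logistic(rows, labels, penalty=0.08)
    print("intercept:", round(intercept, 3))
    print("coefficients:", [round(b, 3) for b in beta])
```

In this setup the penalty typically sets the coefficients of the irrelevant exposures exactly to zero while keeping shrunken, correctly signed estimates for the two real effects. That is the selection behavior the abstract attributes to the lasso.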
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by MING Tony TAN
Robust Causal Comparisons of Nonrandomized Oncology Studies
- Grant number: 10434299
- Fiscal year: 2022
- Funding amount: $74,300
- Project category:
Robust Causal Comparisons of Nonrandomized Oncology Studies
- Grant number: 10614590
- Fiscal year: 2022
- Funding amount: $74,300
- Project category:
Design and Analysis for Cancer Epidemiology Studies
- Grant number: 7127228
- Fiscal year: 2005
- Funding amount: $74,300
- Project category:
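Similar National Natural Science Foundation of China (NSFC) grants
Ultrasensitive low-frequency sequencing technology applied to early cancer screening and recurrence risk assessment
- Grant number:
- Approval year: 2022
- Funding amount: 520,000 CNY
- Project category: General Program
Development of a venous thromboembolism risk early-warning system and intervention strategies for elderly patients undergoing major orthopedic surgery, based on the Harvard Cancer Index
- Grant number:
- Approval year: 2022
- Funding amount: 300,000 CNY
- Project category: Young Scientists Fund
Multidimensional identification, intelligent early warning, and prevention and control management of key cardiovascular health risk factors in cancer survivors
- Grant number:
- Approval year: 2021
- Funding amount: 300,000 CNY
- Project category: Young Scientists Fund
Construction and validation of a risk prediction model for catheter-related thrombosis in cancer patients based on multidimensional genetic and environmental factors
- Grant number: 72174210
- Approval year: 2021
- Funding amount: 480,000 CNY
- Project category: General Program
Multidimensional risk identification and evolution model construction, intelligent management, and effectiveness evaluation for mental health in cancer survivors
- Grant number: 72174144
- Approval year: 2021
- Funding amount: 480,000 CNY
- Project category: General Program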
Similar overseas grants
METEOR-Comprehensive Radiobiology Assessment TRial (METEOR-CRATR)
- Grant number: 10715021
- Fiscal year: 2023
- Funding amount: $74,300
- Project category:
Label-Free Optical Redox Imaging for Pretreatment Prognosis of Early-Stage Triple Negative Breast Cancer
- Grant number: 10803898
- Fiscal year: 2023
- Funding amount: $74,300
- Project category:
Immunoepigenetic targeting of MHC regulators in FAP
- Grant number: 10677375
- Fiscal year: 2023
- Funding amount: $74,300
- Project category: