Bayesian Variable Selection in Generalized Linear Models with Missing Variables
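Basic information
- Grant number: 8194802
- Principal investigator:
- Amount: $192,700
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2011
- Funding country: United States
- Project period: 2011-08-11 to 2014-04-30
- Project status: Completed
- Source: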
- Keywords: Address; Algorithms; Archives; Autistic Disorder; Behavioral; Benefits and Risks; Biomedical Computing; Biomedical Research; Biomedical Technology; Caring; Childhood; Clinical; Clinical Trials; Complex; Computer software; Data; Data Analyses; Data Set; Development; Dropout; Drug Addiction; Effectiveness; Environmental Risk Factor; Face; Gene Expression; Generic Drugs; Genetic; Guidelines; Individual; Libraries; Linear Models; Linear Regressions; Markov Chains; Measures; Medical Research; Medicine; Methods; Modeling; Outcome; Patients; Performance; Pharmacotherapy; Phenotype; Preventive; Procedures; Process; Proteomics; Resort; Safety; Scientist; Simulate; Solutions; Structure; Testing; Time; analytical tool; base; comparative effectiveness; cytokine; design; effectiveness research; flexibility; health care delivery; patient oriented; rapid growth; response; smoking cessation; software development; therapeutic effectiveness
Project abstract
DESCRIPTION (provided by applicant): The applicant seeks to address the problem of missing values. A major challenge for biomedical research comes from the problems of missing values, which may be caused by subjective reasons (e.g., nonresponse and dropout) and technical reasons (e.g., censoring above or below the quantization level). Generalized linear models (GLMs) and generalized linear mixed models (GLMMs) are widely applied in biomedical data analysis, where a fundamental task is to identify a subset of independent variables (e.g., genetic, proteomic, behavioral, or environmental factors) to interpret or predict a dependent variable (e.g., therapeutic effectiveness and safety). Given an incomplete data set, practitioners may needlessly resort to the strategy of case deletion, where individuals are excluded from consideration if they are missing any of the variables targeted for analysis. This method would not only sacrifice useful information, but also give rise to biased estimates because it requires strong assumptions about the missingness mechanisms. A more satisfactory solution for missing data problems involves multiple imputation, where several imputations are created for the same set of missing values. Across multiply imputed data sets, however, traditional variable selection methods (based on significance tests or likelihood criteria) often result in models with different selected predictors, thus presenting a problem of combining the models to make final inferences. In this R01 proposal, we aim to develop alternative strategies of variable selection for GLMs with missing values by drawing on a Bayesian framework. One approach called "impute, then select" (ITS) involves initially performing multiple imputation and then applying Bayesian variable selection to the multiply imputed data sets.
The second strategy - "simultaneously impute and select" (SIAS) - conducts Bayesian variable selection and missing data imputation simultaneously within one Markov chain Monte Carlo (MCMC) process. ITS and SIAS offer two generic frameworks within which various Bayesian variable selection algorithms and missing data imputation algorithms can be implemented. The strategies will be extended to handle complex data sets such as those with multi-level design structures and/or a large number of variables. The strategies will be developed, evaluated, and implemented in an R library for normal, binomial/multinomial, and Poisson regression models with mixed categorical and continuous explanatory variables. Simulated and practical data sets from studies on childhood autism and drug dependence will be used to assess the effectiveness and flexibility of the proposed strategies.
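The "impute, then select" idea can be illustrated with a minimal sketch. This is not the applicants' method or their R library; it is a toy under loudly stated assumptions: a normal linear model only, a crude imputation step that draws missing values from each column's observed mean and standard deviation, model weights approximated by BIC over all predictor subsets, and pooling by simple averaging of inclusion probabilities across imputations. All function names (`ols_rss`, `inclusion_probs`, `impute_once`, `impute_then_select`, `simulate`) are hypothetical.

```python
import itertools
import math
import random


def ols_rss(X, y):
    """Residual sum of squares of a least-squares fit via the normal equations."""
    n, k = len(X), len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return sum((y[r] - sum(b * x for b, x in zip(beta, X[r]))) ** 2
               for r in range(n))


def inclusion_probs(cols, y):
    """Inclusion probabilities from BIC-approximated model weights (assumption)."""
    n, p = len(y), len(cols)
    log_w = {}
    for model in itertools.product([0, 1], repeat=p):
        # Intercept plus the predictors switched on in this model
        X = [[1.0] + [cols[j][r] for j in range(p) if model[j]] for r in range(n)]
        bic = n * math.log(ols_rss(X, y) / n) + len(X[0]) * math.log(n)
        log_w[model] = -0.5 * bic
    m = max(log_w.values())
    total = sum(math.exp(v - m) for v in log_w.values())
    return [sum(math.exp(v - m) for mod, v in log_w.items() if mod[j]) / total
            for j in range(p)]


def impute_once(col, rng):
    """Crude stand-in for an imputation model: draw from the observed marginal."""
    obs = [v for v in col if v is not None]
    mu = sum(obs) / len(obs)
    sd = math.sqrt(sum((v - mu) ** 2 for v in obs) / (len(obs) - 1))
    return [v if v is not None else rng.gauss(mu, sd) for v in col]


def impute_then_select(cols, y, n_imputations=5, seed=0):
    """Impute several times, select on each completed data set, then average."""
    rng = random.Random(seed)
    runs = []
    for _ in range(n_imputations):
        completed = [impute_once(c, rng) for c in cols]
        runs.append(inclusion_probs(completed, y))
    return [sum(r[j] for r in runs) / n_imputations for j in range(len(cols))]


def simulate(n=200, seed=1, miss_rate=0.2):
    """Toy data: y depends on x1 only; x1 and x2 lose values completely at random."""
    rng = random.Random(seed)
    cols = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(3)]
    y = [1.0 + 2.0 * cols[0][r] + rng.gauss(0, 1) for r in range(n)]
    for j in (0, 1):
        cols[j] = [None if rng.random() < miss_rate else v for v in cols[j]]
    return cols, y


if __name__ == "__main__":
    cols, y = simulate()
    print(impute_then_select(cols, y))  # pooled inclusion probability per predictor
```

Averaging inclusion probabilities gives one pooled summary per predictor, which sidesteps the problem of reconciling different selected models across imputed data sets. A SIAS-style procedure would instead redraw the missing values inside the same sampler that updates the inclusion indicators.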
PUBLIC HEALTH RELEVANCE: Missing data is the normal circumstance when developing large data sets. This issue comes to the forefront when using large data sets to develop personalized and individualized care. To avoid this loss of data and provide better predictions of risk and benefit, an imputation-based Bayesian variable selection strategy provides a powerful analytical tool. The availability of our new method and software package will greatly enhance the capacity and quality of medical research and healthcare delivery.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by XIAOWEI YANG
Bayesian Variable Selection in Generalized Linear Models with Missing Variables
- Grant number: 8317303
- Fiscal year: 2011
- Funding amount: $192,700
- Project category:
Bayesian Variable Selection in Generalized Linear Models with Missing Variables
- Grant number: 8471550
- Fiscal year: 2011
- Funding amount: $192,700
- Project category:
Bayesian Variable Selection in Generalized Linear Models with Missing Variables
- Grant number: 8543193
- Fiscal year: 2011
- Funding amount: $192,700
- Project category:
iPhone-based Real-time Data Solution for Drug Abuse and Other Medical Research
- Grant number: 7672825
- Fiscal year: 2009
- Funding amount: $192,700
- Project category:
Transition Model for Incomplete Longitudinal Binary Data
- Grant number: 6676189
- Fiscal year: 2003
- Funding amount: $192,700
- Project category:
DEVELOPMENT OF AN AUTOMATED NEURAL SPIKE DISCRIMINATOR
- Grant number: 3504570
- Fiscal year: 1991
- Funding amount: $192,700
- Project category:
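Similar grants from the National Natural Science Foundation of China (NSFC)
Efficient structure-preserving algorithms for stochastic damped wave equations
- Grant number: 12301518
- Approval year: 2023
- Funding amount: 300,000 CNY
- Project category: Young Scientists Fund
Large-scale sparse optimization algorithms on Riemannian manifolds and their applications
- Grant number: 12371306
- Approval year: 2023
- Funding amount: 435,000 CNY
- Project category: General Program
Hardware acceleration of quantum information processing algorithms based on arbitrary-precision computing architectures
- Grant number: 62304037
- Approval year: 2023
- Funding amount: 300,000 CNY
- Project category: Young Scientists Fund
Convex relaxation and high- and low-order accelerated algorithms for distributed nonconvex nonsmooth optimization problems
- Grant number: 12371308
- Approval year: 2023
- Funding amount: 435,000 CNY
- Project category: General Program
Algorithms for inverting evaporation ducts from radar echo data using physics-informed neural networks
- Grant number: 42305048
- Approval year: 2023
- Funding amount: 300,000 CNY
- Project category: Young Scientists Fund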
Similar overseas grants
Computer-Aided Triage of Body CT Scans with Deep Learning
- Grant number: 10585553
- Fiscal year: 2023
- Funding amount: $192,700
- Project category:
A visualization interface for BRAIN single cell data, integrating transcriptomics, epigenomics and spatial assays
- Grant number: 10643313
- Fiscal year: 2023
- Funding amount: $192,700
- Project category:
Brain Digital Slide Archive: An Open Source Platform for data sharing and analysis of digital neuropathology
- Grant number: 10735564
- Fiscal year: 2023
- Funding amount: $192,700
- Project category:
Point-of-care diagnostic test for T. cruzi (Chagas) infection
- Grant number: 10603665
- Fiscal year: 2023
- Funding amount: $192,700
- Project category: