CAREER: Scalable algorithms for regularized and non-linear genetic models of gene expression

Project abstract

DNA mutations have a profound effect on how genes work, but it’s still not well understood which mutations affect which genes. Currently, our knowledge is limited due to challenges in analyzing genomics data, such as bias arising from an overrepresentation of European study participants and simplistic statistical models that do not sufficiently capture the data. This project overcomes these challenges across three main scientific goals, in which innovative statistical models map DNA mutations to their target genes, and two educational goals, in which scientific training and diversity are simultaneously cultivated. First, the investigators will improve the fidelity of mapping mutations to target genes for groups of individuals that are not well-studied, such as minority populations. Second, the investigators will develop a new method to connect mutations to genes by considering how genes interact with each other in genome-wide networks, suggesting functional effects for many uncharacterized mutations. Third, the investigators will characterize the specific cells in which mutations exert their effects using scalable models that reflect the natural distribution of data from single cell genomic assays. This research advances the fields of bioinformatics and human genetics by introducing new, robust statistical models that link mutations to their target genes. This project also enhances equity and diversity in biomedical discoveries, while simultaneously enhancing diversity within research environments. Toward the latter, the investigators initiate a multi-week on-campus research program for high school students from under-resourced communities, as well as genetics training courses for undergraduate and graduate students, supplying quantitative interdisciplinary skills coveted by industry and academia alike. 
This award will generate extensive datasets, open-source statistical models and genomics tools, high-impact publications, and course materials, thereby engaging the scientific community and fueling related research. This project focuses on developing new genetic models to understand how specific genetic variations influence gene expression. These models overcome current limitations in characterizing the function of genetic variation, an effort whose downstream goal is often to implicate target genes in the regulation of human phenotypes such as height and cancer risk. Existing algorithms face statistical issues due to finite sample sizes (especially for understudied minority populations), multiple hypothesis testing burdens that restrict the knowledge gained from genome-wide analysis, and model misspecification, especially for increasingly popular new data types such as single cell genomics. The investigators address these challenges across three main objectives. First, the investigators link genetic variation to changes in gene expression in understudied minority populations by jointly modeling genetic associations across globally diverse datasets. Second, the investigators develop a comprehensive approach to map genome-wide genetic variants to changes in gene expression, using a priori knowledge of gene regulatory networks and advanced machine learning algorithms to reduce the burden of multiple testing. Third, the investigators design a new statistical model to characterize the cell-type-specificity of gene expression regulation at high resolution; this model leverages the natural distribution of single cell data, resolving the model misspecification of state-of-the-art methods, and reduces measurement noise by modeling millions of single cell measurements across donors.
This award supports the generation of open-source genomics software and data repositories characterizing the function of genetic variants, while also creating educational and training opportunities for under-resourced high school students and motivated undergraduate and graduate students. The research and educational activities are intertwined in a symbiotic relationship that is expected to enhance diversity both in research environments and in research cohorts. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Tiffany Amariuta-Bartell

Similar National Natural Science Foundation of China (NSFC) grants

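Research on efficient and scalable deep learning algorithms based on randomization
  • Grant number:
    62376131
  • Approval year:
    2023
  • Funding amount:
    CNY 510,000
  • Project category:
    General Program
Highly scalable space-time parallel domain decomposition algorithms for carbon dioxide sequestration and their large-scale applications
  • Grant number:
    12371366
  • Approval year:
    2023
  • Funding amount:
    CNY 435,000
  • Project category:
    General Program
Research on sparse direct solver algorithms scalable to tens of millions of cores
  • Grant number:
    62372467
  • Approval year:
    2023
  • Funding amount:
    CNY 500,000
  • Project category:
    General Program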
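Scalable adaptive deep matrix completion: fast algorithms and theoretical analysis
  • Grant number:
    62202174
  • Approval year:
    2022
  • Funding amount:
    CNY 300,000
  • Project category:
    Young Scientists Fund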

Similar overseas grants

CAREER: Fast Scalable Graph Algorithms
  • Grant number:
    2340048
  • Fiscal year:
    2024
  • Funding amount:
    $600,000
  • Project category:
    Continuing Grant
CAREER: Scalable and Robust Uncertainty Quantification using Subsampling Markov Chain Monte Carlo Algorithms
  • Grant number:
    2340586
  • Fiscal year:
    2024
  • Funding amount:
    $600,000
  • Project category:
    Continuing Grant
Unified, Scalable, and Reproducible Neurostatistical Software
  • Grant number:
    10725500
  • Fiscal year:
    2023
  • Funding amount:
    $600,000
  • Project category:
CAREER: Learning Kernels in Operators from Data: Learning Theory, Scalable Algorithms and Applications
  • Grant number:
    2238486
  • Fiscal year:
    2023
  • Funding amount:
    $600,000
  • Project category:
    Continuing Grant
CAREER: Scalable Algorithms for Nonlinear, Large-Scale Inverse Problems Governed by Dynamical Systems
  • Grant number:
    2145845
  • Fiscal year:
    2022
  • Funding amount:
    $600,000
  • Project category:
    Continuing Grant