Mining Thousands of Genomes to Classify Somatic and Pathogenic Structural Variants

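Basic Information

  • Grant number:
    10453323
  • Principal investigator:
  • Amount:
    $578,500
  • Host institution:
  • Host institution country:
    United States
  • Project type:
  • Fiscal year:
    2022
  • Funding country:
    United States
  • Project period:
    2022-09-23 to 2027-06-30
  • Project status:
    Ongoing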

Project Summary

Structural variants (SVs) have been associated with a wide range of cancers and Mendelian disorders, but the difficulty of interpreting them has slowed their adoption. It is still a challenge to determine which SVs observed in a cancer patient are somatic and which SVs in a rare disease patient are pathogenic. The SV interpretation gap is especially stark when compared with the recent progress made with single nucleotide variants (SNVs), which was driven by the release of large-scale population allele frequency estimates from gnomAD. Given that variants that lead to cancer and rare disease should be rare in the general population, the SNV allele frequency from 125,000 samples is an extremely powerful metric. Allele frequency alone can reduce the number of potentially pathogenic variants by two orders of magnitude. Unfortunately, there is no equivalent resource for SVs.

There are high-quality SV call sets (SV VCFs) from large cohorts, but these static lists do not make good allele frequency references. SV detection involves extensive filtering to reduce false positives, and because filtering is never perfect, real SVs are inevitably removed. This makes it difficult to draw a conclusion about an SV that is found in a patient but not in the VCF: the SV could be rare and absent from the population, or it could have been filtered out.

We propose a new method (STIX) for SV characterization that dynamically searches the raw alignments from thousands of genomes for evidence supporting a putative SV. From such a search we can conclude that an SV with high-level evidence in many samples is likely to be a common variant and unlikely to be somatic or pathogenic. With this method we show that many published somatic and de novo SVs are actually present in reference populations, which implies that these variants are unlikely to cause disease. In fact, STIX is as effective as using calls from a matched-normal sample at removing germline SVs from tumor tissue calls. We also show that by relying on the raw signal, STIX recovers substantially more SVs from a cohort than the cohort's corresponding SV VCF contains.

In addition to large-scale SV searching, we propose a robust statistical framework for estimating SV allele frequency and regional noise. We plan to make the searching technology and statistics freely available for nearly 30,000 genomes through a public web interface and integration with AnVIL. If funded, this project will provide the means to accurately estimate SV population frequency by leveraging the data in tens of thousands of genomes, which will greatly increase our ability to prioritize SVs in patients and pave the way toward broader inclusion of SVs in medical genetics.

Project Outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants by Ryan M Layer

Mining Thousands of Genomes to Classify Somatic and Pathogenic Structural Variants
  • Grant number:
    10709480
  • Fiscal year:
    2022
  • Amount:
    $578,500
  • Project type:
A scalable, integrative, multi-omic analysis platform
  • Grant number:
    9769844
  • Fiscal year:
    2018
  • Amount:
    $578,500
  • Project type:
A scalable, integrative, multi-omic analysis platform
  • Grant number:
    9295640
  • Fiscal year:
    2017
  • Amount:
    $578,500
  • Project type:

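Similar NSFC Grants

Blast-resistance mechanism of gravity dams with composite protective materials under underwater multi-medium coupling
  • Grant number:
    51779168
  • Approval year:
    2017
  • Amount:
    CNY 590,000
  • Project type:
    General Program
Numerical computation of all integer solutions of a class of semi-algebraic systems
  • Grant number:
    11671377
  • Approval year:
    2016
  • Amount:
    CNY 480,000
  • Project type:
    General Program
Study of the MEE algorithm with pinball loss
  • Grant number:
    11401247
  • Approval year:
    2014
  • Amount:
    CNY 230,000
  • Project type:
    Young Scientists Fund
Near-real-time simulation of urban waterlogging using path algorithms and pipe-network simplification
  • Grant number:
    41301419
  • Approval year:
    2013
  • Amount:
    CNY 250,000
  • Project type:
    Young Scientists Fund
Blind channel equalization using the ε-approximation algorithm
  • Grant number:
    60172058
  • Approval year:
    2001
  • Amount:
    CNY 160,000
  • Project type:
    General Program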

Similar Overseas Grants

Move and Snooze: Adding insomnia treatment to an exercise program to improve pain outcomes in older adults with knee osteoarthritis
  • Grant number:
    10797056
  • Fiscal year:
    2023
  • Amount:
    $578,500
  • Project type:
High-throughput thermodynamic and kinetic measurements for variant effects prediction in a major protein superfamily
  • Grant number:
    10752370
  • Fiscal year:
    2023
  • Amount:
    $578,500
  • Project type:
Bioethical, Legal, and Anthropological Study of Technologies (BLAST)
  • Grant number:
    10831226
  • Fiscal year:
    2023
  • Amount:
    $578,500
  • Project type:
Discovering clinical endpoints of toxicity via graph machine learning and semantic data analysis
  • Grant number:
    10745593
  • Fiscal year:
    2023
  • Amount:
    $578,500
  • Project type:
Enhanced Medication Management to Control ADRD Risk Factors Among African Americans and Latinos
  • Grant number:
    10610975
  • Fiscal year:
    2023
  • Amount:
    $578,500
  • Project type: