Scalable detection and interpretation of structural variation in human genomes
Basic information
- Grant number: 9973582
- Principal investigator: Aaron R Quinlan
- Amount: $692,000
- Host institution country: United States
- Fiscal year: 2020
- Funding country: United States
- Project period: 2020-05-01 to 2024-02-29
- Project status: Completed
- Keywords: Acute, Affect, Algorithmic Software, Algorithms, All of Us Research Program, Alleles, Area, Automobile Driving, Biological Assay, Chromatin Structure, Chromosome Structures, Clip, Cloud Computing, Code, Communities, Complex, Computer software, Copy Number Polymorphism, DNA, DNA Sequence, Data, Data Reporting, Detection, Development, Disease, Environment, Error Sources, Exhibits, Family Study, Funding, Future, Gene Duplication, Gene Expression, Gene Fusion, Gene Structure, Genetic, Genetic Diseases, Genetic Variation, Genome, Genomics, Genotype, Goals, Human, Human Genome, Individual, Laboratories, Large-Scale Sequencing, Location, Machine Learning, Maps, Methods, Modeling, Noise, Nucleotides, Paint, Pathogenicity, Performance, Phenotype, Population, Positioning Attribute, Prevalence, Process, Reciprocal Translocation, Research, Running, Sampling, Sea, Sensitivity and Specificity, Sequence Alignment, Series, Signal Transduction, Software Tools, Source, Speed, Structure, Systematic Bias, Techniques, Technology, Training, Trans-Omics for Precision Medicine, United States National Institutes of Health, Untranslated RNA, Variant, algorithm development, base, convolutional neural network, deep learning, developmental disease, dosage, exome, experience, genome analysis, genome sequencing, genome-wide, human disease, improved, innovation, insertion/deletion mutation, insight, large datasets, method development, nanopore, novel, prevent, research and development, software development, success, tool, variant detection, whole genome
PROJECT SUMMARY
Structural variation (SV) is a diverse class of genome variation that includes copy number variants (CNVs)
such as deletions and duplications, as well as balanced rearrangements such as inversions and reciprocal
translocations. A typical human genome harbors more than 4,000 SVs larger than 300 bp, and their large size
increases the potential to delete or duplicate genes, disrupt chromatin structure, and alter gene expression.
Despite their prevalence and potential for phenotypic consequence, SVs remain notoriously difficult to detect and
genotype with high accuracy. Much of this difficulty stems from the fact that the DNA sequence alignment “signals”
indicating SVs are far more complex than those for single-nucleotide variants and insertion/deletion variants
(INDELs). Unlike SNP alignments, which vary only in allele state, alignments supporting SVs vary in state (whether
or not they support an alternate structure), alignment location, and type. Consequently, the accuracy of SV
discovery is much lower than that of SNP and INDEL discovery. Furthermore, SV pipelines scale poorly and are
difficult to run. These challenges are a barrier for single-genome analysis, and studies of families must invest
substantial effort in eliminating a sea of false positives. These problems become far more acute for large-scale
sequencing efforts such as TOPMed, the Centers for Common Disease Genomics, and the All of Us Research Program.
Software efficiency is key to scalability for such projects; however, comprehensive, accurate discovery is of
equal importance.
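The alignment “signals” described above can be made concrete for paired-end reads. The sketch below applies the textbook read-pair orientation and insert-size rules for a standard forward/reverse (FR) Illumina library; it is a simplified illustration, not the SV detection method proposed in this project. The `Signal` labels, the `ReadPair` record, the `classify_pair` function, and the 3-standard-deviation cutoff are all hypothetical choices, and split-read and read-depth evidence are ignored.

```python
from dataclasses import dataclass
from enum import Enum


class Signal(Enum):
    """Hypothetical labels for the SV evidence a single read pair can carry."""
    CONCORDANT = "concordant"
    DELETION = "deletion"
    INSERTION = "insertion"
    TANDEM_DUPLICATION = "tandem_duplication"
    INVERSION = "inversion"
    INTERCHROMOSOMAL = "interchromosomal"


@dataclass
class ReadPair:
    """Hypothetical minimal record for one aligned read pair."""
    chrom1: str
    pos1: int       # alignment start of mate 1
    forward1: bool  # True if mate 1 aligned to the + strand
    chrom2: str
    pos2: int
    forward2: bool


def classify_pair(pair: ReadPair, mean_insert: float, sd_insert: float,
                  k: float = 3.0) -> Signal:
    """Classify one pair using textbook rules for an FR paired-end library."""
    # Mates on different chromosomes suggest a translocation or other
    # interchromosomal rearrangement.
    if pair.chrom1 != pair.chrom2:
        return Signal.INTERCHROMOSOMAL
    # Order the mates by position so "left" and "right" are well defined.
    if pair.pos1 <= pair.pos2:
        left_fwd, right_fwd = pair.forward1, pair.forward2
    else:
        left_fwd, right_fwd = pair.forward2, pair.forward1
    # Same-strand (++ or --) pairs are the classic inversion signature.
    if left_fwd == right_fwd:
        return Signal.INVERSION
    # Outward-facing (reverse/forward) pairs are the tandem-duplication signature.
    if not left_fwd:
        return Signal.TANDEM_DUPLICATION
    # Correctly oriented (FR) pair: compare its span to the library's
    # insert-size distribution. The span is approximated by the distance
    # between mate start positions (read length is ignored for simplicity).
    span = abs(pair.pos2 - pair.pos1)
    if span > mean_insert + k * sd_insert:
        return Signal.DELETION
    if span < mean_insert - k * sd_insert:
        return Signal.INSERTION
    return Signal.CONCORDANT
```

For example, with a library insert size of 400 ± 50 bp, a forward/reverse pair whose mates start 5,000 bp apart is classified as deletion evidence, whereas an outward-facing (reverse/forward) pair is classified as tandem-duplication evidence. Because a single discordant pair is easily produced by mapping error, a practical caller would cluster many such pairs at the same locus before reporting a variant.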
Building on more than a decade of experience developing software and analyzing SVs in diverse disease
contexts, we have invested significant effort in understanding why SV discovery remains insufficiently accurate.
These efforts, together with our research and development experience in this area, give us unique insight into
improving the accuracy and scalability of SV discovery. Our goal is to narrow the accuracy gap between SNP/INDEL
discovery and SV discovery. These developments will empower studies of human genomes in diverse contexts and will
therefore have broad impact. Specifically, we aim to:
1. Develop a deep learning model to correct systematic variation in sequence depth. This new machine
learning model will correct systematic biases in DNA sequence depth and dramatically improve the
discovery of deletions and duplications.
2. Improve the speed, scalability, and accuracy of SV detection and genotyping. Using new algorithms,
we will bring the accuracy of SV detection much closer to that of SNP and INDEL discovery and allow
accurate SV discovery to be deployed at scale.
3. Create a map of genomic constraint for SV from population-scale genome analysis. We will deploy
our new methods to detect and genotype structural variation among tens of thousands of human genomes.
The resulting SV map will empower the creation of a model of genomic constraint for SV and enable new
software to predict deleterious SVs, especially in the noncoding genome.
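Aim 1 does not specify the deep learning model's architecture or inputs. For orientation only, the sketch below shows a classical, non-deep-learning baseline for one well-known systematic depth bias: it rescales per-window read depth by the median depth of windows with similar GC content, converts the result to a copy-number estimate, and flags candidate deletions and duplications. The function names, the 20-bin GC stratification, and the 1.5/2.5 copy-number thresholds are illustrative assumptions, and the approach assumes that most windows in each GC bin are copy-neutral.

```python
import numpy as np


def gc_correct(depth: np.ndarray, gc: np.ndarray, n_bins: int = 20) -> np.ndarray:
    """Rescale per-window depth so every GC bin shares the genome-wide median.

    Hypothetical baseline: assumes most windows in each GC bin are
    copy-neutral; gc is the fraction of G/C bases per window, in [0, 1].
    """
    depth = np.asarray(depth, dtype=float)
    gc = np.asarray(gc, dtype=float)
    # Assign windows to equal-width GC bins (gc == 1.0 falls in the last bin).
    bins = np.minimum((gc * n_bins).astype(int), n_bins - 1)
    global_median = np.median(depth)
    corrected = depth.copy()
    for b in np.unique(bins):
        mask = bins == b
        bin_median = np.median(depth[mask])
        if bin_median > 0:  # leave zero-coverage bins untouched
            corrected[mask] = depth[mask] * global_median / bin_median
    return corrected


def copy_number(corrected: np.ndarray, ploidy: int = 2) -> np.ndarray:
    """Convert corrected depth to copy number relative to the median window."""
    return ploidy * corrected / np.median(corrected)


def call_windows(cn: np.ndarray, del_max: float = 1.5,
                 dup_min: float = 2.5) -> list[str]:
    """Label each window DEL, DUP, or REF using illustrative thresholds."""
    return ["DEL" if c <= del_max else "DUP" if c >= dup_min else "REF"
            for c in cn]
```

On a toy input in which high-GC windows have twice the depth of low-GC windows, this correction maps copy-neutral windows in both GC bins to a copy-number estimate of 2, so only windows whose depth genuinely departs from their GC bin are flagged as deletions or duplications.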
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Aaron R Quinlan
New algorithms and tools for large-scale genomic analyses
- Grant number: 10357060
- Fiscal year: 2022
- Funding amount: $692,000
New algorithms and tools for large-scale genomic analyses
- Grant number: 10560502
- Fiscal year: 2022
- Funding amount: $692,000
Scalable detection and interpretation of structural variation in human genomes
- Grant number: 10576268
- Fiscal year: 2020
- Funding amount: $692,000
Scalable detection and interpretation of structural variation in human genomes
- Grant number: 10341175
- Fiscal year: 2020
- Funding amount: $692,000
Scalable detection and interpretation of structural variation in human genomes
- Grant number: 10153847
- Fiscal year: 2020
- Funding amount: $692,000
Software for exploring all forms of genetic variation in any species
- Grant number: 9749979
- Fiscal year: 2017
- Funding amount: $692,000
New algorithms and tools for large-scale genomic analyses
- Grant number: 8273206
- Fiscal year: 2012
- Funding amount: $692,000
New algorithms and tools for large-scale genomic analyses
- Grant number: 9272425
- Fiscal year: 2012
- Funding amount: $692,000
New algorithms and tools for large-scale genomic analyses
- Grant number: 8661785
- Fiscal year: 2012
- Funding amount: $692,000
New algorithms and tools for large-scale genomic analyses
- Grant number: 8460819
- Fiscal year: 2012
- Funding amount: $692,000
Similar National Natural Science Foundation of China (NSFC) grants
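In situ synthesis of TiC-TiB2 particles by spray forming and its effect on the formation and evolution of eutectic carbides in M2 high-speed tool steel
- Grant number: 52361020
- Approval year: 2023
- Funding amount: CNY 320,000
- Project category: Regional Science Fund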
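Mechanisms by which vegetation community succession affects flow structure and longitudinal dispersion in river channels
- Grant number: 52309088
- Approval year: 2023
- Funding amount: CNY 300,000
- Project category: Young Scientists Fund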
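Numerical simulation of the diurnal variation of sea-surface skin temperature in the tropical Indian Ocean and its effect on air-sea heat flux
- Grant number: 42376002
- Approval year: 2023
- Funding amount: CNY 500,000
- Project category: General Program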
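Mechanism by which the SGO2/MAD2 interaction regulates cell-cycle re-entry of hepatic progenitor cells and affects liver regeneration in acute liver failure
- Grant number: 82300697
- Approval year: 2023
- Funding amount: CNY 300,000
- Project category: Young Scientists Fund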
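Spatiotemporal characteristics of urban heat waves and their effects on heat exposure, combining remote sensing and climate models
- Grant number: 42371397
- Approval year: 2023
- Funding amount: CNY 460,000
- Project category: General Program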
Similar overseas grants
Multivariable Artificial Pancreas System to Detect and Mitigate the Effects of Unannounced Physical Activities and Acute Psychological Stress
- Grant number: 10488195
- Fiscal year: 2021
- Funding amount: $692,000
Multivariable Artificial Pancreas System to Detect and Mitigate the Effects of Unannounced Physical Activities and Acute Psychological Stress
- Grant number: 10290033
- Fiscal year: 2021
- Funding amount: $692,000
CardioWatch: An Omics-Based Prediction Assay for Cardiac Late Effects of Acute Radiation
- Grant number: 10759588
- Fiscal year: 2020
- Funding amount: $692,000
Scalable detection and interpretation of structural variation in human genomes
- Grant number: 10341175
- Fiscal year: 2020
- Funding amount: $692,000
Scalable detection and interpretation of structural variation in human genomes
- Grant number: 10153847
- Fiscal year: 2020
- Funding amount: $692,000