Scalable detection and interpretation of structural variation in human genomes
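Basic information
- Grant number: 10341175
- Principal investigator:
- Amount: $692,000
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2020
- Funding country: United States
- Project period: 2020-05-01 to 2024-02-29
- Project status: Completed
- Source: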
- Keywords: Acute; Affect; Algorithmic Software; Algorithms; All of Us Research Program; Alleles; Area; Automobile Driving; Biological Assay; Chromatin Structure; Chromosome Structures; Clip; Cloud Computing; Code; Communities; Complex; Computer software; Copy Number Polymorphism; DNA; DNA Sequence; Data; Data Reporting; Detection; Development; Disease; Environment; Error Sources; Exhibits; Family Study; Funding; Future; Gene Duplication; Gene Expression; Gene Fusion; Gene Structure; Genetic; Genetic Diseases; Genetic Variation; Genome; Genomics; Genotype; Goals; Human; Human Genome; Individual; Laboratories; Large-Scale Sequencing; Location; Maps; Methods; Modeling; Noise; Nucleotides; Paint; Pathogenicity; Performance; Phenotype; Population; Positioning Attribute; Prevalence; Process; Reciprocal Translocation; Research; Running; Sampling; Sea; Sensitivity and Specificity; Sequence Alignment; Series; Signal Transduction; Software Tools; Source; Speed; Structure; Systematic Bias; Techniques; Technology; Training; Trans-Omics for Precision Medicine; United States National Institutes of Health; Untranslated RNA; Variant; algorithm development; base; convolutional neural network; deep learning; deep learning model; developmental disease; dosage; exome; experience; genome analysis; genome sequencing; genome-wide; human disease; improved; innovation; insertion/deletion mutation; insight; large datasets; machine learning model; method development; nanopore; novel; prevent; research and development; software development; success; tool; variant detection; whole genome
Project summary
Structural variation (SV) is a diverse class of genome variation that includes copy number variants (CNVs)
such as deletions and duplications, as well as balanced rearrangements, such as inversions and reciprocal
translocations. A typical human genome harbors >4,000 SVs larger than 300 bp, and their large size increases
the potential to delete or duplicate genes, disrupt chromatin structure, and alter expression. Despite their
prevalence and potential for phenotypic consequence, SVs remain notoriously difficult to detect and genotype
with high accuracy. Much of this difficulty is driven by the fact that the DNA sequence alignment “signals” indicating
SVs are far more complex than those for single-nucleotide and insertion-deletion (INDEL) variants. Unlike SNP alignments,
which vary only in allele state, alignments supporting SVs vary in state (supports an alternate structure or not),
alignment location, and type. Consequently, the accuracy of SV discovery is much lower than that of SNPs and
INDELs. Furthermore, SV pipelines scale poorly and are difficult to run. These challenges are a barrier for
single-genome analysis, and studies of families must invest substantial effort into eliminating a sea of false
positives. These problems become exponentially more acute for large-scale sequencing efforts such as
TOPMed, the Centers for Common Disease Genetics, and the All of Us program. Software efficiency is key to
scalability for such projects. However, of equal importance is comprehensive, accurate discovery.
Building upon more than a decade of experience developing software and analyzing SV in diverse
disease contexts, we have invested significant effort into understanding the causes of the insufficient accuracy
of SV discovery. These efforts, together with our research and development experience in this area, give us
unique insight into improving the accuracy and scalability of SV discovery. Our goal is to narrow the accuracy
gap between SNP/INDEL variation and structural variation discovery. These developments will empower
studies of human genomes in diverse contexts and will therefore have broad impact. Our goals are to:
1. Develop a deep learning model to correct systematic variation in sequence depth. This new machine
learning model will correct systematic biases in DNA sequence depth and dramatically improve the
discovery of deletions and duplications.
2. Improve the speed, scalability, and accuracy of SV detection and genotyping. Using new algorithms,
we will bring the accuracy of SV detection much closer to that of SNP and INDEL discovery and allow
accurate SV discovery to be deployed at scale.
3. Create a map of genomic constraint for SV from population-scale genome analysis. We will deploy
our new methods to detect and genotype structural variation among tens of thousands of human genomes.
The resulting SV map will empower the creation of a model of genomic constraint for SV and enable new
software to predict deleterious SVs, especially in the noncoding genome.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Aaron R Quinlan
New algorithms and tools for large-scale genomic analyses
- Grant number: 10357060
- Fiscal year: 2022
- Amount: $692,000
New algorithms and tools for large-scale genomic analyses
- Grant number: 10560502
- Fiscal year: 2022
- Amount: $692,000
Scalable detection and interpretation of structural variation in human genomes
- Grant number: 10576268
- Fiscal year: 2020
- Amount: $692,000
Scalable detection and interpretation of structural variation in human genomes
- Grant number: 9973582
- Fiscal year: 2020
- Amount: $692,000
Scalable detection and interpretation of structural variation in human genomes
- Grant number: 10153847
- Fiscal year: 2020
- Amount: $692,000
Software for exploring all forms of genetic variation in any species
- Grant number: 9749979
- Fiscal year: 2017
- Amount: $692,000
New algorithms and tools for large-scale genomic analyses
- Grant number: 8273206
- Fiscal year: 2012
- Amount: $692,000
New algorithms and tools for large-scale genomic analyses
- Grant number: 9272425
- Fiscal year: 2012
- Amount: $692,000
New algorithms and tools for large-scale genomic analyses
- Grant number: 8460819
- Fiscal year: 2012
- Amount: $692,000
New algorithms and tools for large-scale genomic analyses
- Grant number: 8661785
- Fiscal year: 2012
- Amount: $692,000
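Similar grants from the National Natural Science Foundation of China (NSFC)
Effects of ocean hypoxia on the degradation behavior of persistent organic pollutants after they enter the sea
- Grant number: 42377396
- Approval year: 2023
- Amount: CNY 490,000
- Project category: General Program
Effects of nitrogen and phosphorus availability on the toxicity of Cylindrospermopsis blooms and the underlying regulatory mechanisms
- Grant number: 32371616
- Approval year: 2023
- Amount: CNY 500,000
- Project category: General Program
Characterization of electron donor-acceptor interactions on copper-based catalyst surfaces under reducing conditions and their effect on CO2 electrocatalytic reactions
- Grant number: 22379027
- Approval year: 2023
- Amount: CNY 500,000
- Project category: General Program
Mechanisms of CCT2 secretion and endocytosis and their effect on the transmission of toxic protein aggregates
- Grant number: 32300624
- Approval year: 2023
- Amount: CNY 100,000
- Project category: Young Scientists Fund
Flow-boiling mass transfer mechanisms and disturbance-rejection control of space fuel cell systems under on-orbit disturbances
- Grant number: 52377215
- Approval year: 2023
- Amount: CNY 500,000
- Project category: General Program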
Similar overseas grants
Multivariable Artificial Pancreas System to Detect and Mitigate the Effects of Unannounced Physical Activities and Acute Psychological Stress
- Grant number: 10488195
- Fiscal year: 2021
- Amount: $692,000
Multivariable Artificial Pancreas System to Detect and Mitigate the Effects of Unannounced Physical Activities and Acute Psychological Stress
- Grant number: 10290033
- Fiscal year: 2021
- Amount: $692,000
Scalable detection and interpretation of structural variation in human genomes
- Grant number: 9973582
- Fiscal year: 2020
- Amount: $692,000
CardioWatch: An Omics-Based Prediction Assay for Cardiac Late Effects of Acute Radiation
- Grant number: 10759588
- Fiscal year: 2020
- Amount: $692,000
Scalable detection and interpretation of structural variation in human genomes
- Grant number: 10153847
- Fiscal year: 2020
- Amount: $692,000