Representing structural haplotypes and complex genetic variation in pan-genome graphs
Basic Information
- Grant number: 9906038
- Principal investigator:
- Amount: $389,000
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2020
- Funding country: United States
- Project period: 2020-02-10 to 2023-01-31
- Project status: Completed
- Source:
- Keywords: Address, Affect, Alleles, Attention, Cells, Chromosomes, Complement, Complex, Computer software, Copy Number Polymorphism, DNA sequencing, Data, Data Set, Development, Disease, Ensure, Felis catus, Future, Genetic, Genetic Variation, Genome, Genotype, Goals, Graph, Haplotypes, Human, Human Characteristics, Human Genetics, Human Genome, Individual, Infrastructure, Large-Scale Sequencing, Length, Maps, Measures, Methodology, Methods, Minisatellite Repeats, Phase, Population, Resolution, Short Tandem Repeat, Single Nucleotide Polymorphism, Single base substitution, Software Framework, Software Tools, Structure, Variant, base, data standards, design, genome sciences, improved, insertion/deletion mutation, interoperability, novel, open source, pan-genome, paralogous gene, reconstruction, reference genome, software development, success, tool, user-friendly, variant detection, working group
Project Abstract
Title: Representing structural haplotypes and complex genetic variation in pan-genome graphs.
PROJECT SUMMARY
A pan-genome graph (PGG) reference must faithfully reflect structural haplotypes that differ in copy number,
order, and orientation, which are currently poorly represented in a linear reference sequence. This effort
focuses on the most copy-number-variable and complex regions, including segmental duplications (SDs), inversions,
short tandem repeats/variable number tandem repeats (copy-number-variable repeats, CNVRs) and
combinations thereof that are frequently excluded or collapsed in reference genomes. The overarching goal of
this project is to develop the software infrastructure needed to construct whole-chromosome reference
haplotypes that include all of these difficult classes of sequence. There are four specific aims. First, we will
develop methods to construct PGGs from haplotype-phased de novo assemblies, ensuring the graph reflects
both copy number variation and repeat structure, including CNVRs and SDs. Second, we will develop software
that extends SD assembly methods to facilitate the curation of SD loci in PGGs. We will use SD assembly
to detect variants specific to individual copies of a duplication, called paralog-specific variants (PSVs), and
provide software to reconstruct local haplotype paths through the PGG that describe the different copies. Third,
we will design novel methods to exploit single-cell template strand DNA sequencing data (Strand-seq) mapped
to PGGs in order to thread chromosome-length "structural haplotypes" through the graph. The resulting
software will allow physical resolution of haplotypes spanning the full spectrum of structural variation,
including inversions and inverted duplications. By virtue of the PSVs, the structural haplotypes will also embed
sequence-resolved SDs. Fourth, we will develop a scalable open-source software framework to systematically
assess how the inclusion of single-nucleotide variants, short indels, and structural variant classes in the PGG
affects variant detection with short-read data. This will allow the complexity encoded in the
PGG to be optimized for short-read variant detection. It will also provide a comprehensive view of polymorphic and
fixed k-mers in human populations. We will develop tools to detect allele-specific k-mers and demonstrate how
they enable rapid genotyping of variants in the PGG from the k-mer composition of a short-read dataset.
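The allele-specific k-mer genotyping idea can be illustrated with a small sketch. This is a minimal example under stated assumptions, not the project's tool: it handles a single biallelic variant, treats k-mers found in one allele's local sequence but not the other's as allele-specific (a real method would also screen them against the rest of the genome and graph), counts canonical k-mers so reads from both strands contribute, and calls a genotype from the alternative-allele fraction using arbitrary, hypothetical thresholds (`low`, `high`, `min_depth`). All function names are hypothetical.

```python
from collections import Counter
from statistics import mean

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Smaller of a k-mer and its reverse complement, so both strands count together."""
    rc = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmers(seq: str, k: int) -> set[str]:
    """Set of canonical k-mers in seq, skipping windows with non-ACGT characters."""
    out = set()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if set(window) <= set("ACGT"):
            out.add(canonical(window))
    return out


def count_read_kmers(reads: list[str], k: int) -> Counter:
    """Count canonical k-mer occurrences across a short-read data set."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            window = read[i:i + k]
            if set(window) <= set("ACGT"):
                counts[canonical(window)] += 1
    return counts


def genotype(ref_seq: str, alt_seq: str, read_counts: Counter, k: int,
             min_depth: float = 2.0, low: float = 0.2, high: float = 0.8) -> str:
    """Hypothetical genotyper for one biallelic variant.

    ref_seq and alt_seq are the two allele sequences including flanking context.
    Simplification: allele-specific k-mers are defined only against the other
    allele, not against the whole genome.
    """
    ref_kmers = kmers(ref_seq, k)
    alt_kmers = kmers(alt_seq, k)
    ref_only = ref_kmers - alt_kmers
    alt_only = alt_kmers - ref_kmers
    if not ref_only or not alt_only:
        return "./."  # alleles cannot be told apart with k-mers of this length
    # Average read support across each allele's specific k-mers (missing k-mers count as 0).
    ref_depth = mean(read_counts[x] for x in ref_only)
    alt_depth = mean(read_counts[x] for x in alt_only)
    total = ref_depth + alt_depth
    if total < min_depth:
        return "./."  # too little evidence to call
    alt_fraction = alt_depth / total
    if alt_fraction < low:
        return "0/0"
    if alt_fraction > high:
        return "1/1"
    return "0/1"


if __name__ == "__main__":
    ref, alt = "AAAACAAAA", "AAAAGAAAA"  # toy SNP, C>G
    reads = [ref] * 5 + [alt] * 5
    print(genotype(ref, alt, count_read_kmers(reads, 4), 4))  # prints 0/1
```

Because the genotype depends only on k-mer counts, no read alignment to the graph is needed, which is what makes this style of genotyping fast; the price is that variants whose alleles share all their k-mers at the chosen `k` cannot be called.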
Once the framework for enhanced genome representation is established, we will focus on improving efficiency,
scalability, and ease of use to meet the needs of a broad range of users in genetics and genome
science. This proposal will ensure that the most complex regions of the human genome are encoded into the
PGG and that the underlying genetic variation can ultimately be assessed for association with disease.
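To make the summary's central requirement concrete (haplotypes that differ in copy number, order, and orientation), here is a toy sketch, not the project's data model. The graph is stored as named segments, and each haplotype is a path of oriented steps, loosely modeled on GFA path lines, where `-` means the segment is read as its reverse complement. An inversion is then a path that visits segment B in reverse orientation, and a duplication is a path that visits B twice, so copy number is simply the traversal count. The segment sequences, haplotype names, and helper functions are all hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (ACGT only)."""
    return seq.translate(COMPLEMENT)[::-1]


# Hypothetical toy graph: segment id -> forward-strand sequence.
SEGMENTS = {"A": "ACGTA", "B": "GGCAT", "C": "TTACG"}

# Hypothetical haplotypes as (segment, orientation) steps.
HAPLOTYPES = {
    "reference": [("A", "+"), ("B", "+"), ("C", "+")],
    "inversion": [("A", "+"), ("B", "-"), ("C", "+")],
    "duplication": [("A", "+"), ("B", "+"), ("B", "+"), ("C", "+")],
    "inverted_dup": [("A", "+"), ("B", "+"), ("B", "-"), ("C", "+")],
}


def spell(path: list[tuple[str, str]], segments: dict[str, str]) -> str:
    """Concatenate segment sequences along a path, reverse-complementing '-' steps."""
    parts = []
    for seg_id, orient in path:
        seq = segments[seg_id]
        parts.append(seq if orient == "+" else reverse_complement(seq))
    return "".join(parts)


def copy_numbers(path: list[tuple[str, str]]) -> Counter:
    """Number of times each segment is traversed by a path, in either orientation."""
    return Counter(seg_id for seg_id, _ in path)


if __name__ == "__main__":
    for name, path in HAPLOTYPES.items():
        print(name, spell(path, SEGMENTS), dict(copy_numbers(path)))
```

All four haplotypes share the same three segments; only the paths differ. This is why a graph can hold copy-number and orientation variants that a single linear reference would have to collapse or omit.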
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other Grants by Mark Chaisson
Representing structural haplotypes and complex genetic variation in pan-genome graphs
- Grant number: 10832934
- Fiscal year: 2023
- Amount: $389,000
- Project category:
Detection and genotyping complex human genetic variation using single-molecule sequencing
- Grant number: 10186109
- Fiscal year: 2021
- Amount: $389,000
- Project category:
Detection and genotyping complex human genetic variation using single-molecule sequencing
- Grant number: 10655573
- Fiscal year: 2021
- Amount: $389,000
- Project category:
Detection and genotyping complex human genetic variation using single-molecule sequencing
- Grant number: 10447193
- Fiscal year: 2021
- Amount: $389,000
- Project category:
Representing structural haplotypes and complex genetic variation in pan-genome graphs
- Grant number: 10337078
- Fiscal year: 2020
- Amount: $389,000
- Project category:
Similar NSFC (National Natural Science Foundation of China) Grants
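Molecular mechanisms by which promoter sequence variation in KIR3DL1 alleles affects their differential expression
- Grant number:
- Approval year: 2022
- Amount: CNY 300,000
- Project category: Young Scientists Fund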
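Mechanisms by which biallelic NUP205 mutations affect ciliogenesis and cause situs inversus with congenital heart disease
- Grant number:
- Approval year: 2021
- Amount: CNY 540,000
- Project category: General Program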
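Genome-wide characterization of how allele-specific expression patterns in hybrid meat rabbits shape the genetic basis of heterosis
- Grant number: 32102530
- Approval year: 2021
- Amount: CNY 300,000
- Project category: Young Scientists Fund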
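Effects of allelic imbalance in gene expression on postharvest ripening and quality formation in banana fruit
- Grant number: 31972471
- Approval year: 2019
- Amount: CNY 570,000
- Project category: General Program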
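Molecular mechanisms by which high temperature affects the expression of different Wx alleles and amylose content in rice
- Grant number: 31500972
- Approval year: 2015
- Amount: CNY 200,000
- Project category: Young Scientists Fund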
Similar Overseas Grants
Effects of Aging on Neuronal Lysosomal Damage Responses Driven by CMT2B-linked Rab7
- Grant number: 10678789
- Fiscal year: 2023
- Amount: $389,000
- Project category:
Activity-Dependent Regulation of CaMKII and Synaptic Plasticity
- Grant number: 10817516
- Fiscal year: 2023
- Amount: $389,000
- Project category:
Genetic and Environmental Influences on Individual Sweet Preference Across Ancestry Groups in the U.S.
- Grant number: 10709381
- Fiscal year: 2023
- Amount: $389,000
- Project category:
Multi-omic phenotyping of human transcriptional regulators
- Grant number: 10733155
- Fiscal year: 2023
- Amount: $389,000
- Project category:
The immunogenicity and pathogenicity of HLA-DQ in solid organ transplantation
- Grant number: 10658665
- Fiscal year: 2023
- Amount: $389,000
- Project category: