Dfam: sustainable growth, curation support, and improved quality for mobile element annotation
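Basic information
- Grant number: 9764454
- Principal investigator:
- Amount: $603,400
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2018
- Funding country: United States
- Project period: 2018-08-15 to 2023-05-31
- Project status: Completed
- Source: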
- Keywords: Animal Model; Architecture; Awareness; Biological databases; Collaborations; Collection; Common Data Element; Communities; Complex; Computer software; Computing Methodologies; Consensus; Consensus Sequence; DNA; DNA Transposable Elements; Data; Data Set; Data Sources; Data Storage and Retrieval; Databases; Development; Disincentive; Educational workshop; Elements; Evolution; Family; Foundations; Funding; Generations; Genome; Growth; Human; Human Genome; Imagery; Infrastructure; Knowledge; Libraries; Licensing; Medical; Metadata; Methods; Modeling; Movement; Mutation; Nomenclature; Organism; Paper; Production; Protocols documentation; Publications; Quality Control; Repetitive Sequence; Research; Research Institute; Research Personnel; Resources; Sequence Alignment; Sequence Analysis; Source; Standardization; System; Taxonomy; Time; Training; Trust; United States National Institutes of Health; Update; Work; adjudication; annotation system; base; data management; data modeling; expectation; experience; genetic information; genome annotation; genome browser; genome-wide; improved; innovation; markov model; meetings; method development; novel; outreach; reference genome; repository; vertebrate genome; whole genome
Project Summary / Abstract
Repetitive DNA, especially that due to transposable elements (TEs), makes up a large fraction of many genomes.
Thorough and accurate annotation of repetitive content in genomes depends on a comprehensive database of
known TEs, along with robust statistical and procedural methods for recognizing decayed instances of elements
and disentangling their complex relationships.
Annotation of TE instances is usually performed using our RepeatMasker software, which compares a genome
to a database containing representations of known repeat families. These have historically been consensus
sequences, which generally approximate the sequences of the original TEs. The largest repository of such
consensus sequences is Repbase, whose restrictive license and limited interface for curators have led to a lack of
input from third parties and the creation of many unaffiliated, often organism-specific open databases. The parallel
existence of these many databases has led to a divergence in nomenclature and repeat definition.
Our Dfam database is an open access collection of repetitive DNA families, in which each family is represented
by a multiple sequence alignment and a profile hidden Markov model (HMM). We have demonstrated that profile
HMMs support improved annotation sensitivity, and Dfam provides numerous aids to both curators of TE families
and those who make use of the resulting annotations. In this proposal, we describe a plan to develop the
infrastructure of Dfam to expand to thousands of genomes, and to establish a self-sustaining TE Data Commons
dependent on limited centralized curation. We further describe plans to improve the quality of repeat annotation
through development of methods for more reliable alignment adjudication, to expand approaches to visualization
of this complex data type, and to improve the modeling of TE subfamilies.
By further developing this open access database, we will provide a strong disincentive for the proliferation of
unaffiliated non-standard repeat datasets and ease the burden of data management for those developing TE
libraries.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants held by Robert MacDonald Hubley
Development and Maintenance of RepeatMasker and RepeatModeler
- Grant number: 10367846
- Fiscal year: 2022
- Funding amount: $603,400
- Project category:
Development and Maintenance of RepeatMasker and RepeatModeler
- Grant number: 10563214
- Fiscal year: 2022
- Funding amount: $603,400
- Project category:
Dfam: sustainable growth, curation support, and improved quality for mobile element annotation
- Grant number: 10165778
- Fiscal year: 2018
- Funding amount: $603,400
- Project category:
Dfam: sustainable growth, curation support, and improved quality for mobile element annotation
- Grant number: 10714226
- Fiscal year: 2018
- Funding amount: $603,400
- Project category:
Dfam: sustainable growth, curation support, and improved quality for mobile element annotation
- Grant number: 10407543
- Fiscal year: 2018
- Funding amount: $603,400
- Project category:
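Similar grants from the National Natural Science Foundation of China
Research on the spatiotemporal elements and expression system of "Sharing Architecture"
- Grant number:
- Approval year: 2019
- Funding amount: 630,000 CNY
- Project category: General Program
Research on renewal design strategies for ordinary buildings based on the everyday efficiency of urban space
- Grant number: 51778419
- Approval year: 2017
- Funding amount: 610,000 CNY
- Project category: General Program
Research on holistic architecture for livable environments
- Grant number: 51278108
- Approval year: 2012
- Funding amount: 680,000 CNY
- Project category: General Program
The formation and evolution of planetary systems in dense star clusters
- Grant number: 11043007
- Approval year: 2010
- Funding amount: 100,000 CNY
- Project category: Special Fund Project
Application of novel vanadium oxide nano-assembled structures in smart energy saving
- Grant number: 20801051
- Approval year: 2008
- Funding amount: 180,000 CNY
- Project category: Young Scientists Fund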
Similar overseas grants
Photoactivatable cell sorting to link genetic variation with complex cellular phenotypes
- Grant number: 10539111
- Fiscal year: 2022
- Funding amount: $603,400
- Project category:
Mapping Protein Interaction Networks Essential for Gonococcal Pathogenesis
- Grant number: 10401945
- Fiscal year: 2021
- Funding amount: $603,400
- Project category:
Mapping Protein Interaction Networks Essential for Gonococcal Pathogenesis
- Grant number: 10814526
- Fiscal year: 2021
- Funding amount: $603,400
- Project category: