Dfam: sustainable growth, curation support, and improved quality for mobile element annotation
Basic information
- Grant number: 9764454
- Principal investigator:
- Amount: $603,400
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2018
- Funding country: United States
- Project period: 2018-08-15 to 2023-05-31
- Project status: Completed
- Source:
- Keywords: Animal Model, Architecture, Awareness, Biological databases, Collaborations, Collection, Common Data Element, Communities, Complex, Computer software, Computing Methodologies, Consensus, Consensus Sequence, DNA, DNA Transposable Elements, Data, Data Set, Data Sources, Data Storage and Retrieval, Databases, Development, Disincentive, Educational workshop, Elements, Evolution, Family, Foundations, Funding, Generations, Genome, Growth, Human, Human Genome, Imagery, Infrastructure, Knowledge, Libraries, Licensing, Medical, Metadata, Methods, Modeling, Movement, Mutation, Nomenclature, Organism, Paper, Production, Protocols documentation, Publications, Quality Control, Repetitive Sequence, Research, Research Institute, Research Personnel, Resources, Sequence Alignment, Sequence Analysis, Source, Standardization, System, Taxonomy, Time, Training, Trust, United States National Institutes of Health, Update, Work, adjudication, annotation system, base, data management, data modeling, expectation, experience, genetic information, genome annotation, genome browser, genome-wide, improved, innovation, markov model, meetings, method development, novel, outreach, reference genome, repository, vertebrate genome, whole genome
Project Summary / Abstract
Repetitive DNA, especially that due to transposable elements (TEs), makes up a large fraction of many genomes.
Thorough and accurate annotation of repetitive content in genomes depends on a comprehensive database of
known TEs, along with robust statistical and procedural methods for recognizing decayed instances of elements
and disentangling their complex relationships.
Annotation of TE instances is usually performed using our RepeatMasker software, which compares a genome
to a database containing representations of known repeat families. These have historically been consensus
sequences, which generally approximate the sequences of the original TEs. The largest repository of such
consensus sequences is Repbase, whose restrictive license and limited interface for curators have led to a lack of
input from third parties and to the creation of many unaffiliated, often organism-specific, open databases. The parallel
existence of so many databases has led to divergent nomenclature and inconsistent repeat definitions.
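The following is a minimal sketch of majority-rule consensus calling from a multiple sequence alignment. It makes the idea of a consensus sequence concrete. It is only an illustration and is not how Repbase or Dfam consensus sequences are actually built. The function name majority_consensus, the '-' gap symbol, and the rule of dropping gap-majority columns are assumptions made for this example.

```python
from collections import Counter


def majority_consensus(alignment: list[str]) -> str:
    """Majority-rule consensus of equal-length aligned sequences (illustrative only).

    Assumption: '-' marks a gap; columns whose most common symbol is a gap
    are dropped. Ties go to the symbol seen first in the column.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    consensus = []
    for column in zip(*alignment):
        symbol, _count = Counter(column).most_common(1)[0]
        if symbol != "-":
            consensus.append(symbol)
    return "".join(consensus)


# Four hypothetical aligned copies of one element.
copies = ["ACGT-A", "ACGTTA", "ACCT-A", "TCGT-A"]
print(majority_consensus(copies))  # ACGTA
```

The consensus keeps each column's most common symbol and drops the gap-dominated fifth column. It also discards information: it records that the third column is usually G, but not that one copy carries C there.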
Our Dfam database is an open access collection of repetitive DNA families, in which each family is represented
by a multiple sequence alignment and a profile hidden Markov model (HMM). We have demonstrated that profile
HMMs yield improved annotation sensitivity, and Dfam provides numerous aids both to curators of TE families
and to those who use the resulting annotations. In this proposal, we describe a plan to develop Dfam's
infrastructure so that it can scale to thousands of genomes, and to establish a self-sustaining TE Data Commons
that depends on only limited centralized curation. We further describe plans to improve the quality of repeat annotation
through development of methods for more reliable alignment adjudication, to expand approaches to visualization
of this complex data type, and to improve the modeling of TE subfamilies.
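A position-specific scoring matrix (PSSM) gives a rough sketch of why keeping the full alignment carries more information than a consensus alone. A PSSM is roughly the match-state emission part of a profile HMM. This is a simplified stand-in, not a profile HMM: it ignores insertions, deletions, and transition probabilities. The names build_pssm and score_segment, the uniform 0.25 background, and the pseudocount of 1 are illustrative assumptions, not Dfam's actual scoring.

```python
import math
from collections import Counter

BASES = "ACGT"


def build_pssm(alignment, pseudocount=1.0, background=0.25):
    """Per-column log-odds scores (bits) from ungapped, equal-length DNA sequences.

    Simplification (assumption): no insert/delete states, uniform background.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    pssm = []
    for column in zip(*alignment):
        counts = Counter(column)
        # Pseudocounts keep unseen bases from getting probability zero (-inf score).
        total = sum(counts[b] for b in BASES) + pseudocount * len(BASES)
        pssm.append({
            b: math.log2((counts[b] + pseudocount) / total / background)
            for b in BASES
        })
    return pssm


def score_segment(pssm, segment):
    """Sum the per-column log-odds scores of an ungapped candidate segment."""
    if len(segment) != len(pssm):
        raise ValueError("segment length must match the profile length")
    return sum(column[base] for column, base in zip(pssm, segment))


# Hypothetical family: the second column is variable (C or T), the rest are conserved.
family = ["ACGT", "ATGT", "ACGT", "ATGT", "ACGT"]
pssm = build_pssm(family)
# Both candidates differ from the consensus ACGT at exactly one position.
for candidate in ("ATGT", "ACAT"):
    print(candidate, round(score_segment(pssm, candidate), 2))
```

Under a consensus-only view, both candidates are one mismatch away from ACGT. The profile separates them: ATGT scores about 4.66 bits and ACAT about 2.49 bits. The difference arises because T is common in the second column, while the third column is a perfectly conserved G.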
By further developing this open access database, we will provide a strong disincentive for the proliferation of
unaffiliated non-standard repeat datasets and ease the burden of data management for those developing TE
libraries.
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Robert MacDonald Hubley
Development and Maintenance of RepeatMasker and RepeatModeler
- Grant number: 10367846
- Fiscal year: 2022
- Amount: $603,400
- Project category:
Development and Maintenance of RepeatMasker and RepeatModeler
- Grant number: 10563214
- Fiscal year: 2022
- Amount: $603,400
- Project category:
Dfam: sustainable growth, curation support, and improved quality for mobile element annotation
- Grant number: 10165778
- Fiscal year: 2018
- Amount: $603,400
- Project category:
Dfam: sustainable growth, curation support, and improved quality for mobile element annotation
- Grant number: 10714226
- Fiscal year: 2018
- Amount: $603,400
- Project category:
Dfam: sustainable growth, curation support, and improved quality for mobile element annotation
- Grant number: 10407543
- Fiscal year: 2018
- Amount: $603,400
- Project category:
Similar NSFC (National Natural Science Foundation of China) grants
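Research on the spatiotemporal elements and expression system of "shared architecture"
- Grant number:
- Approval year: 2019
- Amount: CNY 630,000
- Project category: General Program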
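Design strategies for the renewal of ordinary buildings based on the everyday efficiency of urban space
- Grant number: 51778419
- Approval year: 2017
- Amount: CNY 610,000
- Project category: General Program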
Holistic architectural research on livable environments
- Grant number: 51278108
- Approval year: 2012
- Amount: CNY 680,000
- Project category: General Program
The formation and evolution of planetary systems in dense star clusters
- Grant number: 11043007
- Approval year: 2010
- Amount: CNY 100,000
- Project category: Special Fund Program
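Application of novel vanadium oxide nano-assembled structures in smart energy saving
- Grant number: 20801051
- Approval year: 2008
- Amount: CNY 180,000
- Project category: Young Scientists Fund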
Similar overseas grants
Photoactivatable cell sorting to link genetic variation with complex cellular phenotypes
- Grant number: 10539111
- Fiscal year: 2022
- Amount: $603,400
- Project category:
Mapping Protein Interaction Networks Essential for Gonococcal Pathogenesis
- Grant number: 10401945
- Fiscal year: 2021
- Amount: $603,400
- Project category:
Mapping Protein Interaction Networks Essential for Gonococcal Pathogenesis
- Grant number: 10814526
- Fiscal year: 2021
- Amount: $603,400
- Project category: