A Comprehensive Genomic Community Resource of Transcriptional Regulation
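Basic Information
- Grant number: 10625529
- Principal investigator:
- Amount: $809,400
- Institution:
- Institution country: United States
- Project category:
- Fiscal year: 2022
- Funding country: United States
- Project period: 2022-06-01 to 2027-03-31
- Project status: Ongoing (not yet closed)
- Source: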
- Keywords: ATAC-seq, Algorithms, Atlases, Automobile Driving, Base Pairing, Benchmarking, Binding, CRISPR/Cas technology, Catalogs, Cells, ChIP-seq, Chromatin, Code, Collaborations, Collection, Communities, Community Outreach, Computer Models, DNA, DNA Sequence, Data, Data Analyses, Data Set, Development, Disease, Education and Outreach, Educational workshop, Elements, Epigenetic Process, Exons, Functional disorder, Future, Genes, Genomics, Histones, Human, Human BioMolecular Atlas Program, Human Genome, Human Genome Project, Human body, Individual, International, Interruption, Maps, Mediating, Methods, Modeling, Nematoda, Online Systems, Organism, Pattern, Physiology, Process, Quality Control, Registries, Regulatory Element, Research, Research Personnel, Resolution, Resources, Role, Scheme, Signal Transduction, Specific qualifier value, Techniques, Technology, Testing, Time, Tissues, Training, Trans-Omics for Precision Medicine, Transcriptional Regulation, Untranslated RNA, Variant, Visualization, Work, base, cell type, community building, community setting, data analysis pipeline, data repository, deep learning, deep learning model, deep sequencing, design, epigenome, epigenomics, experimental study, follow-up, genome wide association study, in silico, in vivo Model, machine learning model, model development, novel, online resource, outreach, predictive modeling, public repository, repository, sequence learning, syntax, tool, trait, transcription factor
Project Summary/Abstract
The Human Genome Project (HGP) completed the first draft human genome sequence two decades ago. The
HGP revealed that human complexity arises from only approximately 20,000 coding genes, roughly the same
number as much simpler organisms such as nematodes. Intricate patterns of transcriptional regulation mediated
by non-coding regulatory elements specify the myriad cell types and states required for human complexity.
Genome-wide association studies have subsequently identified thousands of disease-associated variants, many
of which disrupt the function of these non-coding elements and thereby perturb transcriptional regulation. Thus,
to better understand human physiology and pathophysiology, comprehensive atlases of regulatory elements are
essential. Many previous efforts, including the International Human Epigenome Consortium (IHEC), the
FANTOM Consortium, the Roadmap Epigenomics Project, and the ENCODE Project, have aimed to build
comprehensive collections of regulatory elements, as well as computational models to better predict regulatory
activity and understand the sequence features underlying regulatory function. ENCODE (2003-2022) is a large-scale
consortium effort that aims to annotate every functional non-coding element of the human genome;
during our work on the project, we built a Registry of approximately 1 million human candidate cis-regulatory
elements (cCREs). We further developed deep-learning approaches which model the transcription factor motif
syntax that underlies element function at base-pair resolution and built two web-based resources, SCREEN and
Factorbook, to make our results accessible to the scientific community. Here, we propose to extend this
framework to build the Community Resource for Transcriptional Regulation (CRTR), a comprehensive atlas of
non-coding regulatory elements and machine-learning models which will encompass community and consortium
deep-sequencing data, both bulk and single cell, across a broad array of cell types and states. Our project has
five aims. First, we aim to curate community and consortium data for inclusion in CRTR and perform uniform
processing and quality control. Second, we aim to train deep-learning sequence models on bulk epigenetic
datasets to identify transcription factor motif syntax driving regulatory element activity in distinct tissues and cell
types. Third, we aim to train sequence models on single cell datasets to identify transcription factor motif syntax
driving transcriptional regulation in high-resolution cell states and during cell state transitions. Fourth, we aim to
use the aforementioned results to build comprehensive benchmark datasets and machine-learning model
collections, which will aid future analysts in designing new models to predict regulatory readouts. Fifth, we aim
to build a state-of-the-art web-based user interface that enables users to perform integrative analyses and in silico
experimentation with CRTR, and to hold workshops and other outreach activities that maximize the impact of the resource
and its accessibility to the broader scientific community.
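The second and third aims rest on deep-learning sequence models that learn transcription factor motif syntax at base-pair resolution. The sketch below illustrates only the most basic building block of such models, under stated assumptions, and is not the CRTR or ENCODE models themselves: DNA is one-hot encoded and scanned with a single convolutional filter that scores every window start position, as the first layer of a convolutional sequence model does. The filter is hypothetical (hand-set +1/-1 weights around an AP-1-like consensus, TGACTCA), whereas a trained model learns its filters from data; the helper names `one_hot`, `consensus_filter`, `scan`, and `hits` are illustrative, and a real scan would also cover the reverse strand.

```python
from typing import Dict, List

BASES = "ACGT"
INDEX: Dict[str, int] = {b: i for i, b in enumerate(BASES)}


def one_hot(seq: str) -> List[List[float]]:
    """Encode DNA as an L x 4 matrix (columns A, C, G, T).
    N and other ambiguous bases become all-zero rows."""
    rows = []
    for base in seq.upper():
        row = [0.0, 0.0, 0.0, 0.0]
        if base in INDEX:
            row[INDEX[base]] = 1.0
        rows.append(row)
    return rows


def consensus_filter(motif: str, match: float = 1.0,
                     mismatch: float = -1.0) -> List[List[float]]:
    """Build a hypothetical K x 4 weight matrix rewarding the consensus base
    at each motif position. A trained model would learn these weights."""
    return [[match if b == base else mismatch for b in BASES]
            for base in motif.upper()]


def scan(seq: str, weights: List[List[float]]) -> List[float]:
    """'Valid' 1-D convolution of the one-hot sequence with the filter:
    one score per window start position (base-pair resolution)."""
    x = one_hot(seq)
    k = len(weights)
    return [
        sum(x[i + j][c] * weights[j][c] for j in range(k) for c in range(4))
        for i in range(len(x) - k + 1)
    ]


def hits(scores: List[float], threshold: float) -> List[int]:
    """Window start positions whose score reaches the threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]


if __name__ == "__main__":
    # Hypothetical AP-1-like filter, used only as an example.
    ap1 = consensus_filter("TGACTCA")
    seq = "GGTGACTCAGGCCATGACTCATT"
    scores = scan(seq, ap1)
    # With +1/-1 weights, only exact matches reach the motif length.
    positions = hits(scores, threshold=len("TGACTCA"))
    print(positions)
    # Spacing between consecutive motif instances: one simple syntax feature.
    print([b - a for a, b in zip(positions, positions[1:])])
```

Running the script prints `[2, 14]` (start positions of the two exact matches) and `[12]` (their spacing). Spacing and relative arrangement of motif instances are simple examples of what is often called motif syntax; the models described in the abstract learn such patterns from data rather than from a fixed threshold.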
Project Outputs
Journal articles: 0
Monographs: 0
Scientific awards: 0
Conference papers: 0
Patents: 0
Other Grants by Anshul Kundaje
Multi-Omics DACC: The Data Analysis and Coordination Center for the collaborative multi-omics for health and disease initiative
- Grant number: 10744561
- Fiscal year: 2023
- Amount: $809,400
- Project category:
A Comprehensive Genomic Community Resource of Transcriptional Regulation
- Grant number: 10411262
- Fiscal year: 2022
- Amount: $809,400
- Project category:
A Comprehensive Genomic Community Resource of Transcriptional Regulation
- Grant number: 10842047
- Fiscal year: 2022
- Amount: $809,400
- Project category:
Identifying causal genetic variants and molecular mechanisms impacting mental health
- Grant number: 10116649
- Fiscal year: 2021
- Amount: $809,400
- Project category:
Identifying causal genetic variants and molecular mechanisms impacting mental health
- Grant number: 10380573
- Fiscal year: 2021
- Amount: $809,400
- Project category:
Predicting context-specific molecular and phenotypic effects of genetic variation through the lens of the cis-regulatory code
- Grant number: 10297562
- Fiscal year: 2021
- Amount: $809,400
- Project category:
Multi-omic functional assessment of novel AD variants using high-throughput and single-cell technologies
- Grant number: 10436207
- Fiscal year: 2021
- Amount: $809,400
- Project category:
Predicting context-specific molecular and phenotypic effects of genetic variation through the lens of the cis-regulatory code
- Grant number: 10474459
- Fiscal year: 2021
- Amount: $809,400
- Project category:
Predicting context-specific molecular and phenotypic effects of genetic variation through the lens of the cis-regulatory code
- Grant number: 10659170
- Fiscal year: 2021
- Amount: $809,400
- Project category:
Identifying causal genetic variants and molecular mechanisms impacting mental health
- Grant number: 10571911
- Fiscal year: 2021
- Amount: $809,400
- Project category:
Similar NSFC (National Natural Science Foundation of China) Grants
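Research on efficient structure-preserving algorithms for stochastic damped wave equations
- Grant number: 12301518
- Year approved: 2023
- Amount: CNY 300,000
- Project category: Young Scientists Fund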
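Sparse optimization algorithms on large-scale Riemannian manifolds and their applications
- Grant number: 12371306
- Year approved: 2023
- Amount: CNY 435,000
- Project category: General Program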
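Research on hardware acceleration of quantum information processing algorithms based on arbitrary-precision computing architectures
- Grant number: 62304037
- Year approved: 2023
- Amount: CNY 300,000
- Project category: Young Scientists Fund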
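Research on convex relaxation and high- and low-order accelerated algorithms for distributed nonconvex nonsmooth optimization problems
- Grant number: 12371308
- Year approved: 2023
- Amount: CNY 435,000
- Project category: General Program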
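Research on physics-informed neural network algorithms for retrieving evaporation ducts from radar echo data
- Grant number: 42305048
- Year approved: 2023
- Amount: CNY 300,000
- Project category: Young Scientists Fund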
Similar Overseas Grants
Precision Medicine Digital Twins for Alzheimer’s Target and Drug Discovery and Longevity
- Grant number: 10727793
- Fiscal year: 2023
- Amount: $809,400
- Project category:
Cell type harmonization of single cell data in HuBMAP and GTEx
- Grant number: 10777089
- Fiscal year: 2023
- Amount: $809,400
- Project category:
New software tools for differential analysis of single-cell genomics perturbation experiments
- Grant number: 10735033
- Fiscal year: 2023
- Amount: $809,400
- Project category:
$ 80.94万 - 项目类别: