Novel Use of Genome Information to Understand Mutations
Basic information
- Grant number: 10488281
- Principal investigator:
- Amount: $463,900
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2021
- Funding country: United States
- Project period: 2021-09-13 to 2026-06-30
- Project status: Not yet concluded
- Source:
- Keywords: Adverse effects; Affect; Amino Acid Sequence; Amino Acid Substitution; Amino Acids; Artificial Intelligence; Back; Behavior; Binding; Binding Proteins; Binding Sites; Cell physiology; Characteristics; Clinical; Complex; Computer software; Data; Data Analyses; Databases; Detection; Disease; Drug Design; Drug Targeting; Environment; Family; Foundations; Future; Genetic Variation; Genome; Gold; Impairment; Individual; Knowledge; Learning; Literature; Machine Learning; Methods; Missense Mutation; Molecular; Mutation; Nature; Normal Range; Organism; Outcome; Output; Patients; Pharmaceutical Preparations; Phenotype; Positioning Attribute; Probability; Property; Protein Dynamics; Proteins; Quantitative Evaluations; Reporting; Scientific Advances and Accomplishments; Sequence Alignment; Severities; Site; Solid; Specificity; Structure; Testing; Training; Translating; Ursidae Family; Variant; base; cost; data mining; design; disease phenotype; drug development; experience; genome sequencing; human disease; improved; individual patient; individualized medicine; innovation; large datasets; machine learning algorithm; mutant; novel; protein function; protein structure; random forest; screening; tool; user friendly software
Project abstract
Translating genome sequences into proteins offers significant advantages, because there is a large body
of accumulated knowledge about the relationships among protein sequence, structure and function. Advances in
genome sequencing are producing a deluge of data that can be used to train and test prediction methods that
identify the characteristics of mutants by building on the large body of functional protein data. Clinicians
need to know the functional behavior of mutants: whether they are neutral or deleterious, and whether they
affect protein structure, protein dynamics, or protein binding specificity.
Protein structures provide a local environment for each amino acid in the sequence, and the amino acid at
each position is usually compatible with its local environment. This leads to strong correlations among
amino acids, as manifested in multiple sequence alignments. This project will combine protein sequence and
structure data with amino acid properties and their correlations to characterize each site in the protein
structure, and will investigate the hypothesis that outliers in the distributions of the important amino acid
properties at each position negatively impact function, i.e. that they are deleterious mutants. The project
will drill down to learn the nature of the impaired mechanism. Two different approaches will be taken in the
two aims. Aim 1 will investigate the amino acid property distributions to identify the properties that best
characterize each position in the sequence and structure, and determine how outliers negatively impact
functional structures, dynamics and binding characteristics. Preliminary results show that deleterious mutants
usually show a significantly broader range of values for single amino acid properties. Data from these
analyses will be fed into Aim 2, where two types of machine learning approaches, Extreme Learning Machines
and Random Forests, will be applied jointly. Preliminary results show that incorporating just one amino acid
property yields significant gains over existing methods. One of the major strengths of this project is that
results from the two Aims will be exchanged frequently to improve the predictions of both approaches. The
project builds on the PIs' long experience in data mining from protein structures and sequences, as well
as previous machine learning applications. Important potential outcomes include a more reliable, better
informed understanding of how mutants affect function. In addition, the project aims to predict connections
between mutants and specific diseases. The results will be important for drug development, because identifying
the specific part of the protein where function is impaired will allow drug developers to narrow their
focus to more limited parts of a protein targeted for drug design. The predictors established by this
project will also have the potential to screen large numbers of previously unknown mutations, which could be
used to identify specific regions of a protein structure susceptible to further disease-related mutations.
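As an illustration of the Aim 1 hypothesis, the following Python sketch scores a substitution by how far its property value falls from the distribution observed at one column of a multiple sequence alignment. It is not the project's method: the choice of the Kyte-Doolittle hydropathy scale as the single property, the z-score, the `min_sigma` floor, the threshold of 2.0, and the toy alignment are all assumptions made for the example.

```python
from statistics import mean, pstdev

# Kyte-Doolittle hydropathy scale, used here only as an example property
# (assumption: the abstract does not say which properties are used).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Hypothetical toy alignment: five aligned sequences of equal length.
TOY_ALIGNMENT = ["MKVLA", "MRILA", "MKLLG", "MKIVA", "MRVLA"]


def column_values(alignment, position):
    """Property values of residues seen at one column (gaps/unknowns skipped)."""
    return [
        HYDROPATHY[seq[position]]
        for seq in alignment
        if seq[position] in HYDROPATHY
    ]


def outlier_score(alignment, position, mutant_residue, min_sigma=0.5):
    """Z-score of the mutant's property value against the column distribution.

    min_sigma is a hypothetical floor that keeps highly conserved columns
    (standard deviation near zero) from producing unbounded scores.
    """
    values = column_values(alignment, position)
    if not values:
        raise ValueError("no scorable residues at this position")
    sigma = max(pstdev(values), min_sigma)
    return (HYDROPATHY[mutant_residue] - mean(values)) / sigma


def is_outlier(alignment, position, mutant_residue, threshold=2.0):
    """Flag a substitution whose property value lies far outside the column."""
    score = outlier_score(alignment, position, mutant_residue)
    return abs(score) > threshold


if __name__ == "__main__":
    for residue in ("D", "L"):
        print(residue, is_outlier(TOY_ALIGNMENT, 2, residue))
```

With the toy alignment, column 2 (0-based) holds only hydrophobic residues (V, I, L), so a substitution to aspartate is flagged while a substitution to leucine is not. A fuller analysis would, per the abstract, examine several properties and their correlations and pass the results to the machine learning models of Aim 2.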
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by ROBERT L JERNIGAN
Novel Use of Genome Information to Understand Mutations
- Grant number: 10303852
- Fiscal year: 2021
- Funding amount: $463,900
- Project category:
Novel Use of Genome Information to Understand Mutations
- Grant number: 10661834
- Fiscal year: 2021
- Funding amount: $463,900
- Project category:
Modeling Ribosomal Control, Function and Assembly
- Grant number: 7290378
- Fiscal year: 2006
- Funding amount: $463,900
- Project category:
Modeling Ribosomal Control, Function and Assembly
- Grant number: 7486144
- Fiscal year: 2006
- Funding amount: $463,900
- Project category:
Modeling Ribosomal Control, Function and Assembly
- Grant number: 7681539
- Fiscal year: 2006
- Funding amount: $463,900
- Project category:
Modeling Ribosomal Control, Function and Assembly
- Grant number: 7149659
- Fiscal year: 2006
- Funding amount: $463,900
- Project category:
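Similar grants from the National Natural Science Foundation of China
In situ synthesis of TiC-TiB2 particles by spray forming and its effect on the formation and evolution of eutectic carbides in M2 high-speed tool steel
- Grant number: 52361020
- Approval year: 2023
- Funding amount: CNY 320,000
- Project category: Regional Science Fund Project
Mechanisms by which vegetation community succession affects river flow structure and longitudinal dispersion characteristics
- Grant number: 52309088
- Approval year: 2023
- Funding amount: CNY 300,000
- Project category: Young Scientists Fund Project
Numerical simulation of the diurnal variation of sea surface skin temperature in the tropical Indian Ocean and its effect on air-sea heat flux
- Grant number: 42376002
- Approval year: 2023
- Funding amount: CNY 500,000
- Project category: General Program
Mechanism by which the SGO2/MAD2 interaction regulates cell-cycle re-entry of hepatic progenitor cells and affects liver regeneration in acute liver failure
- Grant number: 82300697
- Approval year: 2023
- Funding amount: CNY 300,000
- Project category: Young Scientists Fund Project
Spatiotemporal characteristics of urban heat waves and their effect on heat exposure, combining remote sensing and climate models
- Grant number: 42371397
- Approval year: 2023
- Funding amount: CNY 460,000
- Project category: General Program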
Similar overseas grants
Cellular Mechanisms of Neuroligin-4 Gene in Human Neurons
- Grant number: 10367707
- Fiscal year: 2022
- Funding amount: $463,900
- Project category:
Cellular Mechanisms of Neuroligin-4 Gene in Human Neurons
- Grant number: 10552576
- Fiscal year: 2022
- Funding amount: $463,900
- Project category:
Novel Use of Genome Information to Understand Mutations
- Grant number: 10303852
- Fiscal year: 2021
- Funding amount: $463,900
- Project category:
Novel Use of Genome Information to Understand Mutations
- Grant number: 10661834
- Fiscal year: 2021
- Funding amount: $463,900
- Project category:
A poly-omic study of the molecular mechanisms underlying maternal diet interventions for offspring obesity and NAFLD
- Grant number: 9903288
- Fiscal year: 2017
- Funding amount: $463,900
- Project category: