Statistical methods for higher order dependences to understand protein functions

Project abstract

This proposal brings together a strong team from molecular science and statistics to tackle the important problem of how to integrate protein structure and sequence information in complex systems. Among the most important characteristics of these data are the strong correlations buried within them; pairwise correlations in sequence data are already routinely used to predict structural contacts. Here, we develop novel ways to use huge data sets to extract higher-order dependences. This is now possible because of the large volumes of sequence data available from genomics. In addition, such higher-order dependences are directly observable in protein structures, where groups of amino acids interact directly. Importantly, these higher-order dependences reflect the dense physical environment in the cell, which requires proper statistical characterization. A new model-free information-theoretic measure is introduced to quantify the higher-order dependences, and it serves as the central method in this project. By identifying the major challenges in drawing statistical inference based on this measure, we develop, evaluate, and improve a new statistical inference and computational framework for analyzing higher-order dependences in discrete data of a general type, motivated by protein multiple sequence data. The new computationally efficient framework makes it possible to discover reliable higher-order dependences while quantifying uncertainty. The preliminary data combine information from sequences and structures and yield unexpected results that relate immediately to the dynamics of protein structures. The outcome is an entirely new approach to handling the large volumes of protein sequence data and other omics data now available, as well as the enormous volumes about to arrive on the doorsteps of omics analysts.

Project outputs

Journal papers (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

No data available

Data last updated: 2024-06-01

Other grants by Wen Zhou

ConProject-001
  • Grant number:
    10707337
  • Fiscal year:
    2021
  • Funding amount:
    $203,300
  • Project category:
ConProject-001
  • Grant number:
    10492724
  • Fiscal year:
    2021
  • Funding amount:
    $203,300
  • Project category:
Statistical methods for higher order dependences to understand protein functions
  • Grant number:
    10378307
  • Fiscal year:
    2021
  • Funding amount:
    $203,300
  • Project category:
Statistical methods for higher order dependences to understand protein functions
  • Grant number:
    10707332
  • Fiscal year:
    2021
  • Funding amount:
    $203,300
  • Project category:
ConProject-002
  • Grant number:
    10492725
  • Fiscal year:
    2021
  • Funding amount:
    $203,300
  • Project category:
ConProject-002
  • Grant number:
    10707338
  • Fiscal year:
    2021
  • Funding amount:
    $203,300
  • Project category:

Similar grants from the National Natural Science Foundation of China (NSFC)

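New enzyme design and molecular evolution of D-amino acid ammonia-lyase based on ancestral sequence reconstruction
  • Grant number:
    32271536
  • Year approved:
    2022
  • Funding amount:
    CNY 540,000
  • Project category:
    General Program
Synthesis of high-molecular-weight sequence poly(amino acid)s by templated cocrystal polymerization
  • Grant number:
  • Year approved:
    2022
  • Funding amount:
    CNY 300,000
  • Project category:
    Young Scientists Fund
Synthesis of high-molecular-weight sequence poly(amino acid)s by templated cocrystal polymerization
  • Grant number:
    22201105
  • Year approved:
    2022
  • Funding amount:
    CNY 300,000
  • Project category:
    Young Scientists Fund
New enzyme design and molecular evolution of D-amino acid ammonia-lyase based on ancestral sequence reconstruction
  • Grant number:
  • Year approved:
    2022
  • Funding amount:
    CNY 540,000
  • Project category:
    General Program
Mechanism by which a C-terminal 40-amino-acid insertion sequence enhances the transcriptional efficiency of the bacterial fatty acid metabolism regulator FadR
  • Grant number:
    82003257
  • Year approved:
    2020
  • Funding amount:
    CNY 240,000
  • Project category:
    Young Scientists Fund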

Similar overseas grants

Proteasomal recruiters of PAX3-FOXO1 Designed via Sequence-Based Generative Models
  • Grant number:
    10826068
  • Fiscal year:
    2023
  • Funding amount:
    $203,300
  • Project category:
Nanopores for Processing Proteins
  • Grant number:
    10645984
  • Fiscal year:
    2023
  • Funding amount:
    $203,300
  • Project category:
Integrative deep learning algorithms for understanding protein sequence-structure-function relationships: representation, prediction, and discovery
  • Grant number:
    10712082
  • Fiscal year:
    2023
  • Funding amount:
    $203,300
  • Project category:
Data-driven, evolution-based design of proteins
  • Grant number:
    10451529
  • Fiscal year:
    2021
  • Funding amount:
    $203,300
  • Project category:
Data-driven, evolution-based design of proteins
  • Grant number:
    10185231
  • Fiscal year:
    2021
  • Funding amount:
    $203,300
  • Project category: