III: Medium: Detecting Low Dimensional Structures in Genomic Data
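Basic information
- Award number: 1705197
- Principal investigator:
- Amount: $1.1997 million
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2017
- Funding country: United States
- Period: 2017-08-15 to 2022-07-31
- Project status: Completed
- Source:
- Keywords:
Project abstract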
New sequencing technologies have made genomics a big data science. Genomic data are complex and comprise many variables, so extracting biological information from genomic sequence often requires reducing that complexity. A number of computational approaches exist, but they often introduce errors because of the assumptions they make about the data. This project will develop novel approaches specific to the type of genomic data collected. One type of data represents the DNA sequence; the other comes from natural modifications to the sequence when genes are expressed. By correctly modeling the unique properties of these data in a statistical framework, the new methods will identify important differences in the two data types more accurately. Methods developed during this project are expected to have a substantial impact on the genomics field, where they may help researchers discover the genetic basis of complex diseases. The broader impacts of this project are gaining deeper insight into the genetic basis of complex diseases, distributing the novel methods through public webservers and software tools for academic research and educational purposes, and training undergraduate students, graduate students, and postdoctoral scholars. In particular, this project will provide training to underrepresented groups through an intensive summer program that recruits minorities traditionally underrepresented in STEM fields.
Discovering a low dimensional structure in high dimensional genomic data is an important step in genomic studies because this structure may reveal unknown confounding factors as well as other important properties of the data, such as the ethnicity of individuals. Several dimensionality reduction methods are widely used in genomics, but they may not recover an accurate low dimensional structure because the assumptions of their underlying statistical models are often violated in the data. This project proposes to develop dimensionality reduction methods tailored to genomic data, especially methylation and genotype data. These methods will incorporate unique properties of genomic data, such as the discrete nature and correlation structure of genotype data and the different methylation patterns across cell types and tissues. This project will also analyze the asymptotic behavior of the novel methods using random matrix theory. Three strategies will be used to validate the methods. First, for all genomics applications, datasets with gold standard information are available. Second, simulated data based on current practices in the genomics community will be used to evaluate the genomics applications. For example, it is standard in the community to simulate the genetics of admixed individuals by combining the genotypes of individuals of known ancestry from a reference dataset such as the 1000 Genomes Project. Third, the team will evaluate the general algorithms on simulated data from various generative models, both to validate that the algorithms have the expected asymptotic behavior and to examine how they perform when their assumptions are violated. The methods will contribute to the statistical field by improving current dimensionality reduction methods and to the genomics field by releasing software tools.
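The second validation strategy, simulating admixed individuals from reference genotypes of known ancestry, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than the project's method: the allele frequencies are synthetic stand-ins for a reference panel such as the 1000 Genomes Project, each SNP is copied independently from a random donor (so linkage disequilibrium is ignored), and plain PCA via SVD stands in for the dimensionality reduction step. The names `sample_genotypes` and `simulate_admixed` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

n_snps = 2000     # number of SNPs (illustrative)
n_ref = 100       # reference individuals per ancestral population (illustrative)
n_admixed = 50    # simulated admixed individuals (illustrative)

# Synthetic allele frequencies for two reference populations (assumption:
# these stand in for frequencies observed in a real reference panel).
freq_a = rng.uniform(0.05, 0.95, n_snps)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.15, n_snps), 0.05, 0.95)


def sample_genotypes(freqs, n):
    """Draw n diploid genotypes coded as 0/1/2 allele counts per SNP."""
    return rng.binomial(2, freqs, size=(n, freqs.size))


def simulate_admixed(ref_a, ref_b, proportion_a):
    """Build one admixed genotype: each SNP is copied from a random
    individual of population A with probability proportion_a, otherwise
    from a random individual of population B (SNPs treated as independent)."""
    m = ref_a.shape[1]
    cols = np.arange(m)
    donor_a = ref_a[rng.integers(ref_a.shape[0], size=m), cols]
    donor_b = ref_b[rng.integers(ref_b.shape[0], size=m), cols]
    from_a = rng.random(m) < proportion_a
    return np.where(from_a, donor_a, donor_b)


ref_a = sample_genotypes(freq_a, n_ref)
ref_b = sample_genotypes(freq_b, n_ref)

# Known ground truth: the ancestry proportion of each admixed individual.
proportions = rng.uniform(0.0, 1.0, n_admixed)
admixed = np.stack([simulate_admixed(ref_a, ref_b, p) for p in proportions])

# Plain PCA via SVD on the column-centered genotype matrix.
genotypes = np.vstack([ref_a, ref_b, admixed]).astype(float)
centered = genotypes - genotypes.mean(axis=0)
u, s, _ = np.linalg.svd(centered, full_matrices=False)
pc1 = u[:, 0] * s[0]   # first principal component score per individual

# Compare the recovered axis with the simulated ancestry proportions.
corr = np.corrcoef(pc1[-n_admixed:], proportions)[0, 1]
print(f"|correlation| between PC1 and ancestry proportion: {abs(corr):.2f}")
```

Under these assumptions, the first principal component of the admixed individuals should track their simulated ancestry proportion, which is the kind of known ground truth a simulation study can check a method against. The sign of a principal component is arbitrary, so the absolute correlation is reported.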
Project outcomes
Journal articles (28)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Enhancing droplet-based single-nucleus RNA-seq resolution using the semi-supervised machine learning classifier DIEM
- DOI: 10.1038/s41598-020-67513-5
- Published: 2020-07-03
- Journal:
- Impact factor: 4.6
- Authors: Alvarez, Marcus; Rahmani, Elior; Pajukanta, Paivi
- Corresponding author: Pajukanta, Paivi
Leveraging allelic imbalance to refine fine-mapping for eQTL studies
- DOI: 10.1371/journal.pgen.1008481
- Published: 2019-12-01
- Journal:
- Impact factor: 4.5
- Authors: Zou, Jennifer; Hormozdiari, Farhad; Eskin, Eleazar
- Corresponding author: Eskin, Eleazar
Contribution of common and rare variants to bipolar disorder susceptibility in extended pedigrees from population isolates.
- DOI: 10.1038/s41398-020-0758-1
- Published: 2020
- Journal:
- Impact factor: 6.8
- Authors: Sul, Jae Hoon; Service, Susan K; Huang, Alden Y; Ramensky, Vasily; Hwang, Sun-Goo; Teshiba, Terri M; Park, Young Jun; Ori, Anil PS; Zhang, Zhongyang; Mullins, Niamh; Olde Loohuis, Loes M; Fears, Scott C; Araya, Carmen; Araya, Xinia; Spesny, Mitzi; Bejaran
- Corresponding author: Bejaran
ForestQC: Quality control on genetic variants from next-generation sequencing data using random forest
- DOI: 10.1371/journal.pcbi.1007556
- Published: 2019-12-01
- Journal:
- Impact factor: 4.3
- Authors: Li, Jiajin; Jew, Brandon; Sul, Jae Hoon
- Corresponding author: Sul, Jae Hoon
Stochasticity constrained by deterministic effects of diet and age drive rumen microbiome assembly dynamics
- DOI: 10.1038/s41467-020-15652-8
- Published: 2020-04-20
- Journal:
- Impact factor: 16.6
- Authors: Furman, Ori; Shenhav, Liat; Mizrahi, Itzhak
- Corresponding author: Mizrahi, Itzhak
Other publications by Eleazar Eskin
MEF: Malicious Email Filter - A UNIX Mail Filter That Detects Malicious Windows Executables
- DOI:
- Published: 2001
- Journal:
- Impact factor: 0
- Authors: M. Schultz; Eleazar Eskin; E. Zadok; Manasi Bhattacharyya; Salvatore J. Stolfo
- Corresponding author: Salvatore J. Stolfo
Other grants held by Eleazar Eskin
III: Medium: Causal inference in biobanks: Leveraging genetics to infer causal relationships using electronic health records
- Award number: 2106908
- Fiscal year: 2021
- Amount: $1.1997 million
- Project type: Continuing Grant
III:Small: Replication Studies for High Dimensional Data: Insights into Confounding and Heterogeneity
- Award number: 1910885
- Fiscal year: 2019
- Amount: $1.1997 million
- Project type: Continuing Grant
III: Small: Causal and Statistical Inference in the Presence of Confounding Factors
- Award number: 1320589
- Fiscal year: 2013
- Amount: $1.1997 million
- Project type: Standard Grant
BSF:2012304:Methods for Preprocessing Population Sequence Data
- Award number: 1331176
- Fiscal year: 2013
- Amount: $1.1997 million
- Project type: Standard Grant
III: Medium: Meta-analysis reinterpreted using causal graphs
- Award number: 1302448
- Fiscal year: 2013
- Amount: $1.1997 million
- Project type: Continuing Grant
III: Medium: Private Identification of Relatives and Private GWAS: First Steps in the New Field of CryptoGenomics
- Award number: 1065276
- Fiscal year: 2011
- Amount: $1.1997 million
- Project type: Standard Grant
III: Small: Inference of Causal Regulatory Relationships from Genetic Studies
- Award number: 0916676
- Fiscal year: 2009
- Amount: $1.1997 million
- Project type: Continuing Grant
Collaborative Research: Design and Analysis of Compressed Sensing DNA Microarrays
- Award number: 0729049
- Fiscal year: 2007
- Amount: $1.1997 million
- Project type: Continuing Grant
Collaborative Research: SEIII: Estimating Haplotype Frequencies
- Award number: 0731455
- Fiscal year: 2007
- Amount: $1.1997 million
- Project type: Standard Grant
Collaborative Research: SEIII: Estimating Haplotype Frequencies
- Award number: 0513612
- Fiscal year: 2005
- Amount: $1.1997 million
- Project type: Standard Grant
Similar grants from the National Natural Science Foundation of China (NSFC)
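Plasmon-enhanced optical response in composite low-dimensional topological materials
- Award number: 12374288
- Approval year: 2023
- Amount: CNY 520,000
- Project type: General Program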
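Vanishing medium-sized enterprises from the perspective of the managerial market and intervention in the division of labor: stylized facts, underlying mechanisms, and optimization paths
- Award number: 72374217
- Approval year: 2023
- Amount: CNY 410,000
- Project type: General Program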
Multiscale algorithms and numerical simulation of plasma in tokamak divertors
- Award number: 12371432
- Approval year: 2023
- Amount: CNY 435,000
- Project type: General Program
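Dark matter distribution near intermediate-mass black holes and detection of gravitational-wave echoes from their IMRI systems
- Award number: 12365008
- Approval year: 2023
- Amount: CNY 320,000
- Project type: Regional Science Fund Project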
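Physical mechanisms of rapid intensification of asymmetric tropical cyclones under moderate vertical wind shear
- Award number: 42305004
- Approval year: 2023
- Amount: CNY 300,000
- Project type: Young Scientists Fund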
Similar overseas grants
SaTC: CORE: Medium: After the Breach: Detecting Lateral Movement, Reconnaissance, and Exfiltration in Enterprise Networks
- Award number: 2152644
- Fiscal year: 2022
- Amount: $1.1997 million
- Project type: Standard Grant
III: Medium: Collaborative Research: Detecting and Controlling Network-based Spread of Hospital Acquired Infections
- Award number: 1955797
- Fiscal year: 2020
- Amount: $1.1997 million
- Project type: Standard Grant
III: Medium: Collaborative Research: Detecting and Controlling Network-based Spread of Hospital Acquired Infections
- Award number: 1955883
- Fiscal year: 2020
- Amount: $1.1997 million
- Project type: Standard Grant
III: Medium: Collaborative Research: Detecting and Controlling Network-based Spread of Hospital Acquired Infections
- Award number: 1955939
- Fiscal year: 2020
- Amount: $1.1997 million
- Project type: Standard Grant
CPS: Medium: Detecting and Controlling Unwanted Data Flows in the Internet of Things
- Award number: 1953740
- Fiscal year: 2019
- Amount: $1.1997 million
- Project type: Cooperative Agreement