AF: Small: Redundancy exploiting algorithms for high throughput genomics
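Basic Information
- Award number: 1619081
- Principal investigator:
- Amount: $400,000
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2016
- Funding country: United States
- Project period: 2016-08-01 to 2020-07-31
- Project status: Completed
- Source:
- Keywords: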
Project Abstract
Determining the genomic makeup of individuals is crucial for understanding how certain genomic variants ultimately lead to disease (such as cancer). Determining the genomic makeup of agriculturally important plants, trees, farm animals, and wildlife helps improve agriculture, forestry, veterinary medicine, and environmental science. Since the introduction of "next generation sequencing technologies" in 2008, the cost of genome sequencing has dropped by a factor of 1000. As a result, genomic data are now generated at a rate that far outpaces improvements in our computing and data storage capabilities. With the advent of these cheap and fast genome sequencing technologies, the scientific community has been able to launch mega-projects such as the Pan-Cancer Analysis of Whole Genomes project, which aims to determine the genome sequences of thousands of cancer patients. Our project aims to address the imminent data size challenges in these large-scale genomic studies through new genomic data compression methods that reduce the redundancy in how genomic sequences are represented. The source of this redundancy is the high similarity among the genome sequences of individual patients, as well as the high similarity among different regions within a single human genome. Since the main difficulty in extracting information from genome sequences is computational, reducing the computational resources needed to manage and analyze genomic data through compression will help genomics improve human life and the environment. The project's impact on student and personnel training will take the form of two new graduate courses at Indiana University: a course on data management, access, and processing for genomic data by PI Sahinalp, and a course on compressed algorithms with a focus on genomic data, emphasizing the impact of new big data paradigms on compression, by PI Ergun. Both courses will fit into the CS PhD program as well as into the existing Bioinformatics and Data Science Master's programs; they are also intended to attract the more curious undergraduates.
The rapid advancement of nucleic acid sequencing technology has reshaped almost every field of life science, from agriculture to bioenergy and from environmental science to biomedicine. Large-scale genome projects are producing petabytes of data, whether from thousands of patients or from mobile sensors collecting environmental samples. As the technology marches forward, most people who visit hospitals will eventually have their (possibly tissue-specific) genomes sequenced. Genomic data will be collected from thousands to millions of non-model organisms and their populations to assess the biodiversity within the corresponding ecosystems. Complex microbial communities will be sampled from thousands of geographic locations to study the influence of environmental conditions. Furthermore, these studies will involve continuous data collection, with the aim of monitoring dynamic changes in biosystems through genome-wide or transcriptome-wide sequencing. As a result, genomic data will be generated at an unprecedented pace, necessitating novel algorithms that reduce the burden of genomic sequence data on computation, storage, and transmission systems. This project combines the unique strengths of the two investigators at Indiana University, bringing a principled, algorithmic approach to critical infrastructure problems in genomics. It will address the needs of the next stage of genomic data generation (by mega cancer projects, portable devices collecting environmental samples, and even smaller sensors to be embedded in the human body) through new compression tools and compressed data structures for communicating, storing, managing, and accessing large collections of (streaming) genome data. For this purpose, we will employ and expand the existing algorithmic repertoire of approximation algorithms, sublinear algorithms, lossless data compression, and I/O-efficient, memory-hierarchy-aware/oblivious, and compressed data structures.
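To make the idea of cross-genome redundancy concrete, the following is a minimal sketch of reference-based encoding: because two individual genomes are highly similar, a target sequence can be stored as just the positions where it differs from a shared reference. This is an illustration only, not the project's method; it assumes equal-length sequences and substitution-only differences, and the function names are hypothetical.

```python
from typing import List, Tuple

# A variant is (position, base): the target carries `base` at `position`
# where it differs from the reference. Substitutions only (assumption).
Variant = Tuple[int, str]


def encode_against_reference(reference: str, target: str) -> List[Variant]:
    """Keep only the positions where the target differs from the reference."""
    if len(reference) != len(target):
        raise ValueError("this sketch handles equal-length sequences only")
    return [(i, t) for i, (r, t) in enumerate(zip(reference, target)) if r != t]


def decode_from_reference(reference: str, variants: List[Variant]) -> str:
    """Rebuild the target by applying the stored substitutions to the reference."""
    bases = list(reference)
    for pos, base in variants:
        bases[pos] = base
    return "".join(bases)


if __name__ == "__main__":
    reference = "ACGTACGTACGTACGTACGT"
    target = "ACGTACCTACGTACGTACGA"
    variants = encode_against_reference(reference, target)
    print(variants)  # [(6, 'C'), (19, 'A')]
    # Decoding must reproduce the target exactly (lossless).
    assert decode_from_reference(reference, variants) == target
```

Here two differing bases are stored instead of all twenty. Practical reference-based compressors typically also handle insertions and deletions and entropy-code the resulting variant list.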
Project Outcomes
- Journal articles: 0
- Monographs: 0
- Research awards: 0
- Conference papers: 0
- Patents: 0
Other Publications by Qin Zhang
Communication-Efficient Collaborative Regret Minimization in Multi-Armed Bandits
- DOI: 10.1109/cdc45484.2021.9683253
- Published: 2021-12-14
- Journal:
- Impact factor: 0
- Authors: Nikolai Karpov; Qin Zhang
- Corresponding author: Qin Zhang
False-positive Transesophageal Echocardiography after False-positive Computed Tomography Angiography in Suspected Type A Aortic Dissection.
- DOI: 10.1097/aln.0000000000002347
- Published: 2018-11-01
- Journal:
- Impact factor: 8.8
- Authors: E. Gologorsky; Qin Zhang; Angela Gologorsky
- Corresponding author: Angela Gologorsky
Identification of novel rheumatoid arthritis-associated MiRNA-204-5p from plasma exosomes
- DOI: 10.1038/s12276-022-00751-x
- Published: 2022-03-01
- Journal:
- Impact factor: 12.8
- Authors: Long; Qin Zhang; X. Mo; Jun Lin; Yang Wu; Xin Lu; P. He; Jian Wu; Yufan Guo; Ming; W. Ren; H. Deng; S. Lei; F. Deng
- Corresponding author: F. Deng
Thoracic ultrasound-guided real-time pleural biopsy in the diagnosis of pleural diseases: a systematic review and meta-analysis
- DOI: 10.1080/17476348.2023.2266377
- Published: 2023-09-02
- Journal:
- Impact factor: 3.9
- Authors: Qin Zhang; Ming; Xue; Ye Lu; Gang Hou
- Corresponding author: Gang Hou
Rare presentation of immunoglobulin A vasculitis as acute pancreatitis in a 10‐year‐old girl
- DOI:
- Published: 2018
- Journal:
- Impact factor: 1.7
- Authors: Li Wang; Shen; Fang Deng; Qin Zhang; Ling Lu; Jing; Yao Xu
- Corresponding author: Yao Xu
Other Grants by Qin Zhang
Collaborative Research: AF: Small: Parallel Reinforcement Learning with Communication and Adaptivity Constraints
- Award number: 2006591
- Fiscal year: 2020
- Amount: $400,000
- Project type: Standard Grant
CAREER:Foundation of Communication-Efficient Distributed Computation and Monitoring
- Award number: 1844234
- Fiscal year: 2019
- Amount: $400,000
- Project type: Continuing Grant
BIGDATA: Collaborative Research: F: Efficient Distributed Computation of Large-Scale Graph Problems in Epidemiology and Contagion Dynamics
- Award number: 1633215
- Fiscal year: 2016
- Amount: $400,000
- Project type: Standard Grant
AF: Small: Efficient Algorithms for Querying Noisy Distributed/Streaming Datasets
- Award number: 1525024
- Fiscal year: 2015
- Amount: $400,000
- Project type: Standard Grant
Similar NSFC (National Natural Science Foundation of China) Grants
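Mechanism by which the small-molecule metabolite catechin interacts with TRPV1 to activate peripheral sensory neurons and mediate uremic pruritus
- Award number: 82371229
- Approval year: 2023
- Amount: CNY 490,000
- Project type: General Program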
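Mechanism by which DHEA inhibits Fis1 lactylation in microglia to alleviate POCD
- Award number: 82301369
- Approval year: 2023
- Amount: CNY 300,000
- Project type: Young Scientists Fund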
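Immunological mechanism by which aberrantly activated microglia upregulate CTSS to suppress expression of the microvascular-specific factor MFSD2A and promote type 1 diabetic retinopathy
- Award number: 82370827
- Approval year: 2023
- Amount: CNY 490,000
- Project type: General Program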
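SETDB1 regulation of microglial function and its role in the pathogenesis of Alzheimer's disease
- Award number: 82371419
- Approval year: 2023
- Amount: CNY 490,000
- Project type: General Program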
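Mechanism by which PTBP1 drives an H4K12la/BRD4/HIF1α complex-PKM2 positive feedback loop to promote glucose metabolic reprogramming in non-small cell lung cancer, and exploration of therapeutic strategies
- Award number: 82303616
- Approval year: 2023
- Amount: CNY 300,000
- Project type: Young Scientists Fund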
Similar Overseas Grants
Identification of small molecule inhibitors to exonuclease 1 for breast cancer treatment
- Award number: 10735307
- Fiscal year: 2023
- Amount: $400,000
- Project type:
NSF-BSF: FET: Small: Redundancy for Storage in the Edge
- Award number: 2120262
- Fiscal year: 2021
- Amount: $400,000
- Project type: Standard Grant
Cerebral Spinal Fluid Shunt System with Dual Lumen Distal Catheter Redundancy to Minimize Revision Surgery
- Award number: 10169627
- Fiscal year: 2019
- Amount: $400,000
- Project type:
CIF: Small: Collaborative Research: Error Correction with Natural Redundancy
- Award number: 1718886
- Fiscal year: 2017
- Amount: $400,000
- Project type: Standard Grant
CIF: Small: Collaborative Research: Error Correction with Natural Redundancy
- Award number: 1717884
- Fiscal year: 2017
- Amount: $400,000
- Project type: Standard Grant