A Software Framework for Exploring 1,000 Genomes of African Descent
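Basic information
- Grant number: 9096211
- Principal investigator:
- Amount: $450,900
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2015
- Funding country: United States
- Project period: 2015-07-01 to 2018-06-30
- Project status: Completed
- Source: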
- Keywords: Africa; African; Algorithms; Americas; Architecture; Asthma; Authorization documentation; Bacteria; Caribbean region; Cataloging; Catalogs; Central America; Communities; Computational algorithm; Computer software; DNA Sequence; DNA Sequence Databases; Data; Data Analyses; Data Set; Databases; Devices; Disease; Genes; Genetic Variation; Genetic study; Genome; Genomics; Goals; Health; Human; Human Genome; Hypersensitivity; Individual; Investigation; Length; Licensing; Life; Location; Maps; Methods; Modeling; Mutation; Mutation Detection; National Heart, Lung, and Blood Institute; Nucleic Acid Regulatory Sequences; Nucleotides; Population; Process; Protocols documentation; RNA Splicing; Reading; Research Personnel; Resources; Retrieval; Risk; Scheme; Scientist; Sepsis; Sequence Alignment; Site; Software Framework; Software Tools; South America; Speed; System; Time; United States; Variant; base; data sharing; database of Genotypes and Phenotypes; deep sequencing; design; fusion gene; gene discovery; genome database; genome sequencing; high risk; human subject; indexing; interest; microbial; next generation sequencing; novel; open source; prevent; programs; reference genomes; sample collection; search engine; software development; terabyte; tool; trait
Project abstract
DESCRIPTION (provided by applicant): We propose to create new software and analysis methods designed to make possible the exploration of a unique dataset, the 1,004 genomes sequenced by the Consortium on Asthma among African-Ancestry Populations in the Americas (CAAPA). The size of this dataset, over 130 terabytes, currently prevents it from being explored with alignment-based tools, and researchers are instead limited to using the much smaller files containing single-nucleotide variants. Our proposed software will make this dataset and others like it available for real-time searching, a capability that is not yet possible for any genomic database of this size. Since the early 1990s, scientists have used DNA sequence databases to study a wide range of problems, including novel gene discovery, mutation detection, the investigation of larger structural variants, and evolutionary processes. The ability to search all known genes and genomes using BLAST and similar programs has long been assumed, and sequence search engines throughout the world provide this ability. However, the vast size of the CAAPA dataset makes it impossible to search the data itself using current tools. One cannot look for specific mutations, extract and re-analyze data for any particular gene or regulatory region, or look for structural variants. Newer next-generation sequence alignment programs such as Bowtie, originally developed in our group, allow far faster alignment of NGS reads to the genome, but even these programs cannot search data on the scale of CAAPA in real time. Different architectures need to be designed and built to accommodate these very large datasets. The CAAPA exploration system (CESYS) will use a combination of a highly efficient database, very fast storage, and fast search algorithms to achieve our goals. This project aims to accomplish several goals that will dramatically enhance the value of CAAPA.
First, the data will be made available to a very large community of researchers, who can use it not only to study the genetics of asthma and allergy in the CAAPA populations, but also to compare these subjects to other groups. The data currently resides on hard drives and is available only to a small number of the project's PIs, a situation that limits its value. Second, by creating an authentication system consistent with dbGaP, we will create a data sharing model that other projects can use and that will remove some of the technical barriers to sharing genome data from human subjects. Third, as part of building the database, we will re-call all the SNPs using the newly released human genome build (hg20), creating a consistent set of variants that we will also share freely through the project database. Fourth, we will identify all bacterial contaminants, including those in a subset of subjects known to have bloodstream infections at the time of sample collection. Fifth, we will identify structural variants unique to the CAAPA population, which we can then explore for any association with the risk of asthma.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Kathleen C Barnes
PRIDE Academy: Impact of Ancestry and Gender to omics of lung diseases
- Grant number: 10378108
- Fiscal year: 2019
- Funding amount: $450,900
- Project category:
PRIDE Academy: Impact of Ancestry and Gender to omics of lung diseases
- Grant number: 10077882
- Fiscal year: 2019
- Funding amount: $450,900
- Project category:
Multi-omic studies of asthma severity in an African ancestry population
- Grant number: 10094181
- Fiscal year: 2018
- Funding amount: $450,900
- Project category:
Multi-omic studies of asthma severity in an African ancestry population
- Grant number: 9522470
- Fiscal year: 2018
- Funding amount: $450,900
- Project category:
Multi-omic studies of asthma severity in an African ancestry population
- Grant number: 10331294
- Fiscal year: 2018
- Funding amount: $450,900
- Project category:
New Approaches for Empowering Studies of Asthma in Populations of African Descent
- Grant number: 9256781
- Fiscal year: 2016
- Funding amount: $450,900
- Project category:
A Software Framework for Exploring 1,000 Genomes of African Descent
- Grant number: 9301024
- Fiscal year: 2015
- Funding amount: $450,900
- Project category:
Integrative Genomics in Asthmatics of African Descent
- Grant number: 9230688
- Fiscal year: 2014
- Funding amount: $450,900
- Project category:
Integrative Genomics in Asthmatics of African Descent
- Grant number: 8798769
- Fiscal year: 2014
- Funding amount: $450,900
- Project category:
Integrative Genomics in Asthmatics of African Descent
- Grant number: 9244716
- Fiscal year: 2014
- Funding amount: $450,900
- Project category:
Similar overseas grants
mAnaging siCkle CELl disease through incReased AdopTion of hydroxyurEa in Nigeria (ACCELERATE)
- Grant number: 10638598
- Fiscal year: 2023
- Funding amount: $450,900
- Project category:
Development and implementation of a pediatric AI multi-modal digital stethoscope and respiratory surveillance system in South Africa
- Grant number: 10740943
- Fiscal year: 2023
- Funding amount: $450,900
- Project category:
Leveraging artificial intelligence/machine learning-based technology to overcome specialized training and technology barriers for the diagnosis and prognostication of colorectal cancer in Africa
- Grant number: 10712793
- Fiscal year: 2023
- Funding amount: $450,900
- Project category:
Clinical decision support algorithm to optimize management of respiratory tract infection in children attending primary health facilities in Kilimanjaro Region, Tanzania
- Grant number: 10734148
- Fiscal year: 2023
- Funding amount: $450,900
- Project category:
Computer Vision for Malaria Microscopy: Automated Detection and Classification of Plasmodium for Basic Science and Pre-Clinical Applications
- Grant number: 10576701
- Fiscal year: 2023
- Funding amount: $450,900
- Project category: