Computing persistent homology in biological datasets


Basic information

Project abstract

Persistent homology (PH) is a tool from topological data analysis that identifies robust topological features in a data set. We compute PH for point-cloud data sets, in our case sets of points in two- or three-dimensional space. The topological features we compute can be interpreted as holes in the point cloud in two-dimensional space and as hollow polyhedra in three-dimensional space. After determining which topological features are significant, we aim to explain their possible functional relevance in the physical system underlying the data. Such an analysis might reveal mechanistic features of the system that are related to or driven by its spatial structure.

A data set of interest to us is the genome-wide Hi-C interaction map at 1 kb resolution, which comprises around 3 million points when treated as a point cloud. Significant topological features in this data set indicate that genes far apart along the linear chromosome are spatially close to each other in the folded genome. This might elucidate long-range genomic interaction and regulation that arise from the folding of the chromosome. However, the size of this data set turned out to be beyond the computational ability of pre-existing software packages: they either ran out of memory or ran for hours before we manually interrupted them. To overcome this hurdle, we developed a novel algorithm that processed the same data set in under four minutes using only 4 GB of memory. Further, we computed PH of the human genome under two experimental conditions, with and without auxin. Auxin is a molecule that impairs the function of cohesin, a protein complex that has been observed to localize at the anchors of chromatin loops in the DNA. The results showed a decrease in the number of significant topological features upon addition of auxin. This provides supporting evidence for the prevalent hypothesis that cohesin is integral to loop formation in the human genome. In general, we have shown that our algorithm for computing PH outperforms others in most cases and strikes an efficient balance between memory consumption and computation time. We call it Dory and make it available as a user-friendly Python package. We have submitted this work to a journal for review.

After computing PH, we explore the possible functional significance of the computed topological features. This requires determining a representative location or boundary of each significant topological feature in the point-cloud data set. However, this computation is not well-defined, and the resulting locations are not geometrically precise. As a result, most analyses that use PH are limited to studying the significance of topological features. To overcome this hurdle, we developed new strategies to compute representative boundaries with improved geometric precision. We used our tool to analyze the arrangement of cells (alpha, delta, beta) in human pancreatic islets. It has been hypothesized that these cells are arranged so that clusters of beta cells are surrounded by an alpha-delta mantle, which confers efficient signaling in the pancreatic endocrine system. We compared the possibility of such an arrangement between control and diabetic subjects. This was done by using our algorithm to compute the locations of holes in the alpha-delta structure (experimental data from 2D slices of human pancreatic islets) and then counting the beta cells inside them. Our results showed that, compared to control subjects, a higher percentage of diabetic subjects have a low percentage of beta cells that are surrounded by an alpha-delta mantle. This may support the alpha-delta mantle hypothesis by suggesting that disruption of this particular arrangement of cells might contribute to impaired function of the endocrine system in diabetic subjects.
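To make concrete what "computing PH for a point cloud" involves, here is a minimal sketch of the textbook boundary-matrix reduction over Z/2 on a Vietoris-Rips filtration, restricted to loops (1-dimensional features). It is not the Dory algorithm and does not use the Dory API; the function name `rips_h1_pairs` and its `max_edge` parameter are hypothetical, and the cubic number of triangles makes it suitable only for very small inputs.

```python
import math
from itertools import combinations


def rips_h1_pairs(points, max_edge=math.inf):
    """Return (birth, death) pairs of loops in a Vietoris-Rips filtration.

    Illustrative only: enumerates every triangle, so it scales as O(n^3)
    in memory and is unusable for data sets with millions of points.
    """
    n = len(points)

    def dist(i, j):
        return math.dist(points[i], points[j])

    # Simplices up to dimension 2, each with the scale at which it appears.
    simplices = [((i,), 0.0) for i in range(n)]
    for edge in combinations(range(n), 2):
        if dist(*edge) <= max_edge:
            simplices.append((edge, dist(*edge)))
    for tri in combinations(range(n), 3):
        value = max(dist(i, j) for i, j in combinations(tri, 2))
        if value <= max_edge:
            simplices.append((tri, value))

    # Filtration order: by scale, then dimension, so faces precede cofaces.
    simplices.sort(key=lambda s: (s[1], len(s[0]), s[0]))
    index = {simplex: k for k, (simplex, _) in enumerate(simplices)}

    reduced = {}  # pivot (largest row index) -> reduced column owning it
    pairs = []
    for j, (simplex, death) in enumerate(simplices):
        # Boundary over Z/2: the set of codimension-1 faces.
        column = set()
        if len(simplex) > 1:
            faces = combinations(simplex, len(simplex) - 1)
            column = {index[face] for face in faces}
        # Standard reduction: cancel a shared pivot by adding columns mod 2.
        while column and max(column) in reduced:
            column ^= reduced[max(column)]
        if column:
            pivot = max(column)
            reduced[pivot] = column
            creator, birth = simplices[pivot]
            # An edge paired with a triangle is a loop that gets filled in.
            if len(creator) == 2 and death > birth:
                pairs.append((birth, death))
    return pairs


# Four corners of a unit square: one loop appears when the sides connect
# (scale 1.0) and is filled when the diagonals appear (scale sqrt(2)).
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
loops = rips_h1_pairs(square)
print(loops)
```

The difference `death - birth` is the persistence of a loop; features with large persistence are the ones usually treated as significant.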
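The final step of the islet analysis, counting beta cells that lie inside holes of the alpha-delta structure, can be sketched as a point-in-polygon test, assuming each hole's representative boundary is available as an ordered list of 2D vertices. The names `point_in_polygon` and `fraction_enclosed` are hypothetical, and the coordinates are made-up illustrative values, not data from the study.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: is `point` strictly inside the closed `polygon`?

    `polygon` is a list of (x, y) vertices in order around the boundary.
    Points exactly on an edge may be classified either way.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for k in range(n):
        (x1, y1), (x2, y2) = polygon[k], polygon[(k + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # one more crossing to the right
    return inside


def fraction_enclosed(cells, boundaries):
    """Fraction of `cells` lying inside at least one boundary polygon."""
    if not cells:
        return 0.0
    enclosed = sum(
        any(point_in_polygon(cell, boundary) for boundary in boundaries)
        for cell in cells
    )
    return enclosed / len(cells)


# Hypothetical example: one square hole and four beta-cell positions,
# two of which fall inside the hole.
hole = [(0, 0), (4, 0), (4, 4), (0, 4)]
beta_cells = [(1, 1), (2, 3), (5, 5), (-1, 2)]
print(fraction_enclosed(beta_cells, [hole]))  # 0.5
```

Computing this fraction per subject gives a quantity that can be compared between control and diabetic groups, which is the kind of comparison the abstract describes.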

Project outcomes

Journal articles (1)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Tight basis cycle representatives for persistent homology of large biological data sets.
  • DOI:
    10.1371/journal.pcbi.1010341
  • Published:
    2023-05
  • Journal:
    PLOS Computational Biology
  • Impact factor:
    4.3
  • Authors:
    Aggarwal, Manu; Periwal, Vipul
  • Corresponding author:
    Periwal, Vipul

Other grants held by Vipul Periwal

Adipocyte development and insulin resistance
  • Grant number:
    7967147
  • Fiscal year:
  • Funding amount:
    $269,900
  • Project category:
Single Cell Data Analysis Algorithms
  • Grant number:
    9553307
  • Fiscal year:
  • Funding amount:
    $269,900
  • Project category:
Liver regeneration after partial hepatectomy
  • Grant number:
    10697819
  • Fiscal year:
  • Funding amount:
    $269,900
  • Project category:
Adipocyte development and insulin resistance
  • Grant number:
    7733953
  • Fiscal year:
  • Funding amount:
    $269,900
  • Project category:
Single Cell Data Analysis Algorithms
  • Grant number:
    10253772
  • Fiscal year:
  • Funding amount:
    $269,900
  • Project category:
Inferring epidemic characteristics with networks
  • Grant number:
    10253777
  • Fiscal year:
  • Funding amount:
    $269,900
  • Project category:
Model of mitochondrial function
  • Grant number:
    10253711
  • Fiscal year:
  • Funding amount:
    $269,900
  • Project category:
Growth and development of islets and beta-cells in the pancreas
  • Grant number:
    7967846
  • Fiscal year:
  • Funding amount:
    $269,900
  • Project category:
Pattern Identification in Sequence Activity Data
  • Grant number:
    8939733
  • Fiscal year:
  • Funding amount:
    $269,900
  • Project category:
Adipocyte development and insulin resistance
  • Grant number:
    8939489
  • Fiscal year:
  • Funding amount:
    $269,900
  • Project category:

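Similar National Natural Science Foundation of China (NSFC) grants

Convex relaxation and high- and low-order accelerated algorithms for distributed nonconvex nonsmooth optimization problems
  • Grant number:
    12371308
  • Approval year:
    2023
  • Funding amount:
    CNY 435,000
  • Project category:
    General Program
Design and hardware implementation of ensemble learning algorithms under resource constraints
  • Grant number:
    62372198
  • Approval year:
    2023
  • Funding amount:
    CNY 500,000
  • Project category:
    General Program
Fast algorithms for electromagnetic fields based on physics-informed neural networks
  • Grant number:
    52377005
  • Approval year:
    2023
  • Funding amount:
    CNY 520,000
  • Project category:
    General Program
SPH model and efficient algorithms for deformation and flow of saturated sand considering pile-soil-water coupling effects
  • Grant number:
    12302257
  • Approval year:
    2023
  • Funding amount:
    CNY 300,000
  • Project category:
    Young Scientists Fund
Ensemble classification algorithms for high-dimensional imbalanced data
  • Grant number:
    62306119
  • Approval year:
    2023
  • Funding amount:
    CNY 300,000
  • Project category:
    Young Scientists Fund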

Similar overseas grants

Using auxin to understand context-dependent hormone response
  • Grant number:
    10605909
  • Fiscal year:
    2023
  • Funding amount:
    $269,900
  • Project category:
The C. elegans Germline: A Test Tube for Cell and Developmental Biology
  • Grant number:
    10893272
  • Fiscal year:
    2022
  • Funding amount:
    $269,900
  • Project category:
A toolkit to reversibly disrupt nuclear bodies and move genes among compartments
  • Grant number:
    9134116
  • Fiscal year:
    2015
  • Funding amount:
    $269,900
  • Project category:
A toolkit to reversibly disrupt nuclear bodies and move genes among compartments
  • Grant number:
    9326959
  • Fiscal year:
    2015
  • Funding amount:
    $269,900
  • Project category:
Computing persistent homology in biological datasets
  • Grant number:
    10697862
  • Fiscal year:
  • Funding amount:
    $269,900
  • Project category: