Computing persistent homology in biological datasets

Basic information

Project abstract

Persistent homology (PH) is a tool from topological data analysis that identifies robust topological features in a data set. We compute PH for point-cloud data, in our case sets of points in two- or three-dimensional space. The features we compute can be interpreted as holes in a two-dimensional point cloud and as hollow polyhedra in a three-dimensional one. After identifying the significant topological features, we aim to explain their possible functional relevance in the physical system underlying the data. Such an analysis might reveal mechanistic features of the system that are related to, or driven by, its spatial structure.

A data set of interest to us is the genome-wide Hi-C interaction map at 1 kb resolution, which comprises around 3 million points when treated as a point cloud. Significant topological features in this data set indicate that genes far apart along the linear chromosome are spatially close to each other in the folded genome. This might elucidate long-range genomic interaction and regulation that arise from the folding of the chromosome. However, the size of this data set turned out to be beyond the capability of pre-existing software packages: they either ran out of memory or ran for hours before we manually interrupted them. To overcome this hurdle, we developed a novel algorithm that processed the same data set in under four minutes using only 4 GB of memory. Further, we computed PH of the human genome under two experimental conditions, with and without auxin. Auxin is a molecule that impairs the function of cohesin, a protein complex that has been observed to localize at the anchors of chromatin loops in the DNA. The number of significant topological features decreased upon addition of auxin, which provides supporting evidence for the prevalent hypothesis that cohesin is integral to loop formation in the human genome. In general, we have shown that our algorithm for computing PH outperforms others in most cases and strikes an efficient balance between memory consumption and computation time. We call it Dory and make it available as a user-friendly Python package. We have submitted this work to a journal for review.

After computing PH, we explore the possible functional significance of the computed topological features. This requires determining a representative location, or boundary, for each significant topological feature in the point cloud. However, this computation is not well defined, and the resulting locations are not geometrically precise. As a result, most analyses that use PH are limited to studying the significance of topological features. To overcome this hurdle, we developed new strategies to compute representative boundaries with improved geometric precision.

We used our tool to analyze the arrangement of alpha, delta, and beta cells in human pancreatic islets. It has been hypothesized that clusters of beta cells are surrounded by an alpha-delta mantle, an arrangement thought to confer efficient signaling in the pancreatic endocrine system. We assessed how well this arrangement holds in control and diabetic subjects by using our algorithm to locate holes in the alpha-delta structure (experimental data from 2D slices of human pancreatic islets) and then counting the beta cells inside them. Compared with control subjects, a higher percentage of diabetic subjects had a low percentage of beta cells surrounded by an alpha-delta mantle. This may support the alpha-delta mantle hypothesis by suggesting that disruption of this arrangement might contribute to impaired endocrine function in diabetic subjects.
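To make the notion of a "hole" with a birth and death scale concrete, here is a minimal sketch of one-dimensional persistent homology for a tiny 2D point cloud. It is not the Dory algorithm and does not use the Dory API: it assumes a Vietoris-Rips filtration and uses the textbook column reduction over Z/2, which is far too slow for data sets of the size described above. The function name `rips_h1_pairs` and its parameters are hypothetical, chosen for illustration only.

```python
from itertools import combinations
from math import dist, inf


def rips_h1_pairs(points, max_edge=inf, min_persistence=0.0):
    """Return sorted (birth, death) pairs for holes (H1) of a Vietoris-Rips filtration.

    Illustrative only: holes that are never filled below max_edge are not reported.
    """
    n = len(points)
    # Edges enter the filtration at their length; ties are broken by vertex indices.
    edges = sorted(
        (dist(points[i], points[j]), (i, j))
        for i, j in combinations(range(n), 2)
        if dist(points[i], points[j]) <= max_edge
    )
    edge_value = [value for value, _ in edges]
    edge_index = {edge: k for k, (_, edge) in enumerate(edges)}

    # A triangle enters when its longest edge enters; its boundary is its three edges.
    triangles = []
    for i, j, k in combinations(range(n), 3):
        sides = [(i, j), (i, k), (j, k)]
        if all(side in edge_index for side in sides):
            boundary = {edge_index[side] for side in sides}
            triangles.append((max(edge_value[e] for e in boundary), boundary))
    triangles.sort(key=lambda t: t[0])

    # Standard column reduction over Z/2: symmetric difference adds two columns.
    reduced_by_pivot = {}
    pairs = []
    for death, boundary in triangles:
        column = set(boundary)
        while column and max(column) in reduced_by_pivot:
            column ^= reduced_by_pivot[max(column)]
        if column:
            pivot = max(column)  # youngest edge of the cycle that this triangle fills
            reduced_by_pivot[pivot] = column
            birth = edge_value[pivot]
            if death - birth > min_persistence:
                pairs.append((birth, death))
    return sorted(pairs)


# Four corners of a unit square: one hole appears once all four sides are present
# and is filled in once the diagonals (and hence the triangles) appear.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
pairs = rips_h1_pairs(square)
print(pairs)
```

In this toy case the single reported pair is born at the side length and dies at the diagonal length. The difference between death and birth is the persistence that is used to judge whether a feature is significant.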
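The final counting step in the islet analysis can be sketched as follows, assuming each hole in the alpha-delta structure has already been reduced to a representative boundary given as an ordered list of 2D vertices. The function names, the polygon representation, and the coordinates are hypothetical; the abstract does not state how boundaries are stored or how enclosure is tested, so a plain ray-casting test stands in here.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test for a point against a closed polygon (list of (x, y) vertices).

    Points lying exactly on the boundary are not treated specially.
    """
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def fraction_enclosed(beta_cells, hole_boundaries):
    """Fraction of beta cells that lie inside at least one hole boundary."""
    if not beta_cells:
        return 0.0
    enclosed = sum(
        any(point_in_polygon(cell, boundary) for boundary in hole_boundaries)
        for cell in beta_cells
    )
    return enclosed / len(beta_cells)


# Made-up coordinates: one square hole and four beta cells, two of them inside it.
hole = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
beta_cells = [(1.0, 1.0), (3.0, 1.0), (0.5, 1.5), (-1.0, -1.0)]
fraction = fraction_enclosed(beta_cells, [hole])
print(fraction)
```

A per-subject value of this kind is what would be compared between control and diabetic groups. The quality of the result depends on how geometrically precise the representative boundaries are, which is the problem the abstract describes.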

Project outcomes

Journal articles (1)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Tight basis cycle representatives for persistent homology of large biological data sets.
  • DOI:
  • Published:
    2023-05
  • Journal:
  • Impact factor:
    4.3
  • Authors:
    Aggarwal, Manu;Periwal, Vipul
  • Corresponding author:
    Periwal, Vipul

Other grants of Vipul Periwal

Growth and development of islets and beta-cells in the pancreas
  • Grant number:
    7967846
  • Fiscal year:
  • Funding amount:
    $269,900
  • Project category:
Autoregulation of free radicals via control of uncoupling proteins in beta-cells
  • Grant number:
    8939488
  • Fiscal year:
  • Funding amount:
    $269,900
  • Project category:
Adipocyte development and insulin resistance
  • Grant number:
    8939489
  • Fiscal year:
  • Funding amount:
    $269,900
  • Project category:
Inferring epidemic characteristics with networks
  • Grant number:
    10697857
  • Fiscal year:
  • Funding amount:
    $269,900
  • Project category:
Liver regeneration after partial hepatectomy
  • Grant number:
    8349967
  • Fiscal year:
  • Funding amount:
    $269,900
  • Project category:
Inferring epidemic characteristics with networks
  • Grant number:
    10253777
  • Fiscal year:
  • Funding amount:
    $269,900
  • Project category:
Model of mitochondrial function
  • Grant number:
    10253711
  • Fiscal year:
  • Funding amount:
    $269,900
  • Project category:
Single Cell Data Analysis Algorithms
  • Grant number:
    10919520
  • Fiscal year:
  • Funding amount:
    $269,900
  • Project category:
Quantitative Estimation of Sensitivity of Lipolysis to Insulin
  • Grant number:
    10919382
  • Fiscal year:
  • Funding amount:
    $269,900
  • Project category:
Quantitative Estimation of Sensitivity of Lipolysis to Insulin
  • Grant number:
    8553371
  • Fiscal year:
  • Funding amount:
    $269,900
  • Project category:

Similar grants from the National Natural Science Foundation of China

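Research on identifying targeted-drug sensitivity biomarkers from tumor pathology images and on the associated statistical algorithms
  • Grant number:
    82304250
  • Approval year:
    2023
  • Funding amount:
    CNY 300,000
  • Project category:
    Young Scientists Fund
Research on deepfake detection algorithms driven by multimodal high-level semantics
  • Grant number:
    62306090
  • Approval year:
    2023
  • Funding amount:
    CNY 300,000
  • Project category:
    Young Scientists Fund
Research on high-precision remote sensing algorithms for sea surface albedo
  • Grant number:
    42376173
  • Approval year:
    2023
  • Funding amount:
    CNY 510,000
  • Project category:
    General Program
Identifying the molecular mechanisms of non-coding-region splicing mutations in amyotrophic lateral sclerosis using novel deep learning algorithms and multi-omics research strategies
  • Grant number:
    82371878
  • Approval year:
    2023
  • Funding amount:
    CNY 490,000
  • Project category:
    General Program
Research on accurate segmentation algorithms for cardiac MR images based on deep learning and level set methods
  • Grant number:
    62371156
  • Approval year:
    2023
  • Funding amount:
    CNY 500,000
  • Project category:
    General Program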

Similar overseas grants

Using auxin to understand context-dependent hormone response
  • Grant number:
    10605909
  • Fiscal year:
    2023
  • Funding amount:
    $269,900
  • Project category:
The C. elegans Germline: A Test Tube for Cell and Developmental Biology
  • Grant number:
    10893272
  • Fiscal year:
    2022
  • Funding amount:
    $269,900
  • Project category:
A toolkit to reversibly disrupt nuclear bodies and move genes among compartments
  • Grant number:
    9134116
  • Fiscal year:
    2015
  • Funding amount:
    $269,900
  • Project category:
A toolkit to reversibly disrupt nuclear bodies and move genes among compartments
  • Grant number:
    9326959
  • Fiscal year:
    2015
  • Funding amount:
    $269,900
  • Project category:
Computing persistent homology in biological datasets
  • Grant number:
    10697862
  • Fiscal year:
  • Funding amount:
    $269,900
  • Project category: