Cluster Analysis for High-Dimensional and Multi-Source Data
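Basic Information
- Award number: 2013905
- Principal investigator:
- Amount: $225,000
- Host institution:
- Host institution country: United States
- Award type: Standard Grant
- Fiscal year: 2020
- Funding country: United States
- Period: 2020-08-01 to 2023-07-31
- Status: Completed
- Source:
- Keywords:
Abstract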
Rapid advances in devices and computer systems continue to expand our capacity to collect and store data. Clustering is often the first-stage analysis performed to discover patterns, gain insights, and extract knowledge from the massive amounts of data routinely faced in science, engineering, and commercial domains. For instance, in biomedical studies, clustering is used to reveal pathological subgroups and help researchers form new hypotheses for in-depth investigation. It is thus imperative to develop new clustering methods to meet the ever-increasing challenges of data that are highly complex, huge in volume, and drawn from distributed sources. In this project, novel statistical and optimization-based approaches and software packages will be developed to address these challenges. Graduate students will be trained to conduct research at the forefront of machine learning. The research results will be used to enrich courses and outreach educational materials in data science.
A prominent statistical paradigm for clustering is based on mixture models; it is objective, parsimonious, not biased toward known clusters, and has a probabilistic framework that can be extended and interpreted in standard ways. For high-dimensional large-scale data, however, existing mixture-model-based methods have fundamental limitations. Furthermore, a big data environment can require the integration of clustering results at distributed sites, a problem called multi-source clustering. This research will advance cluster analysis in several respects. First, the hidden Markov model on variable blocks (HMM-VB), a special Gaussian mixture model (GMM), is developed to tackle high dimensionality. The estimation of HMM-VB will be enhanced by computationally efficient methods to identify the latent variable block structure and by mixture factor analyzers. Second, leveraging the latent states of HMM-VB, a new variable selection approach will be developed for clustering high-dimensional data. Third, the emerging topic of multi-source clustering will be studied. New methods based on optimal transport and the Wasserstein barycenter will be developed for aggregating clustering results from multiple sources. Applications in biomedical areas will be pursued.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
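The abstract describes mixture models as a probabilistic framework for clustering. As a minimal illustration of that general idea, and not of HMM-VB or any method developed in this project, the sketch below fits a two-component one-dimensional Gaussian mixture by expectation-maximization (EM) and assigns each point to its most probable component. The function names (`fit_gmm_1d`, `assign_clusters`), the toy data, and the initial means are all assumptions made for illustration.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, means, n_iter=50, min_var=1e-6):
    """Fit a 1-D Gaussian mixture by EM, starting from the given initial means.

    Hypothetical helper for illustration only; it starts from equal weights
    and unit variances and runs a fixed number of iterations.
    """
    k = len(means)
    means = list(means)
    weights = [1.0 / k] * k
    variances = [1.0] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            joint = [w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            # Floor the variance so a component cannot collapse onto one point.
            variances[j] = max(var, min_var)
    return weights, means, variances, resp


def assign_clusters(resp):
    """Hard cluster labels: the component with the largest posterior."""
    return [max(range(len(r)), key=lambda j: r[j]) for r in resp]


# Toy data (an assumption): two well-separated groups on the real line.
data = [-2.1, -1.9, -2.0, -1.8, -2.2, 3.0, 3.2, 2.9, 3.1, 2.8]
weights, means, variances, resp = fit_gmm_1d(data, means=[-1.0, 1.0])
labels = assign_clusters(resp)
print(labels)
print([round(m, 2) for m in means])
```

The posterior probabilities in `resp` are what make the result interpretable in a standard way: each point carries a soft membership in every cluster, not only a hard label. On data this well separated a plain mixture suffices; the high-dimensional setting the abstract targets is where such unstructured fits run into the limitations it mentions.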
Project Outputs
Journal articles (7)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Multisource Single-Cell Data Integration by MAW Barycenter for Gaussian Mixture Models
- DOI: 10.1111/biom.13630
- Published: 2022
- Journal:
- Impact factor: 1.9
- Authors: Lin, Lin; Shi, Wei; Ye, Jianbo; Li, Jia
- Corresponding author: Li, Jia
Optimal Transport With Relaxed Marginal Constraints
- DOI: 10.1109/access.2021.3072613
- Published: 2021
- Journal:
- Impact factor: 3.9
- Authors: Li, Jia; Lin, Lin
- Corresponding author: Lin, Lin
VtNet: A neural network with variable importance assessment
- DOI: 10.1002/sta4.325
- Published: 2020-10
- Journal:
- Impact factor: 1.7
- Authors: Lixiang Zhang; Lin Lin; Jia Li
- Corresponding author: Lixiang Zhang; Lin Lin; Jia Li
Robust deep neural network surrogate models with uncertainty quantification via adversarial training
- DOI: 10.1002/sam.11610
- Published: 2023-01
- Journal:
- Impact factor: 0
- Authors: Lixiang Zhang; Jia Li
- Corresponding author: Lixiang Zhang; Jia Li
Mixture of Linear Models Co-supervised by Deep Neural Networks
- DOI: 10.1080/10618600.2022.2107533
- Published: 2021-08
- Journal:
- Impact factor: 2.4
- Authors: Beomseok Seo; Lin Lin; Jia Li
- Corresponding author: Beomseok Seo; Lin Lin; Jia Li
Other publications by Jia Li
An Assessment of Autonomous Vehicles: Traffic Impacts and Infrastructure Needs—Final Report
- DOI:
- Published: 2017
- Journal:
- Impact factor: 0
- Authors: K. Kockelman; S. Boyles; P. Stone; Daniel J. Fagnant; Rahul Patel; M. Levin; Guni Sharon; M. Simoni; Michael Albert; Hagen Fritz; Rebecca Hutchinson; P. Bansal; Gleb B. Domnenko; P. Bujanovic; Bumsik Kim; Elham Pourrahmani; Sudesh Agrawal; Tianxin Li; Josiah P. Hanna; Aqshems Nichols; Jia Li
- Corresponding author: Jia Li
SimXRD-4M: Big Simulated X-ray Diffraction Data Accelerate the Crystalline Symmetry Classification
- DOI:
- Published: 2024
- Journal:
- Impact factor: 0
- Authors: Bin Cao; Yang Liu; Zinan Zheng; Ruifeng Tan; Jia Li; Tong
- Corresponding author: Tong
The influence of rolling pressure on the changes in non-volatile compounds and sensory quality of congou black tea: The combination of metabolomics, E-tongue, and chromatic differences analyses.
- DOI: 10.1016/j.fochx.2023.100989
- Published: 2023-12-30
- Journal:
- Impact factor: 6.1
- Authors: Shan Zhang; Shimin Wu; Qinyan Yu; Xujiang Shan; Le Chen; Yuliang Deng; Jinjie Hua; Jiayi Zhu; Qinghua Zhou; Yongwen Jiang; Haibo Yuan; Jia Li
- Corresponding author: Jia Li
Robust Jump Regressions
- DOI:
- Published: 2017
- Journal:
- Impact factor: 0
- Authors: Jia Li; V. Todorov; George Tauchen
- Corresponding author: George Tauchen
We use these formulas and numerical simulations to examine the relative importance of different stages of infection and different chronic levels of virus to the spreading of the disease
- DOI:
- Published: 1999
- Journal:
- Impact factor: 0
- Authors: J. Hyman; Jia Li; E. Stanley
- Corresponding author: E. Stanley
Other grants by Jia Li
RII Track-4:NSF: Resistively-Detected Electron Spin Resonance in Multilayer Graphene
- Award number: 2327206
- Fiscal year: 2024
- Amount: $225,000
- Award type: Standard Grant
CAREER: studying superconductivity and ferromagnetism in 2D material heterostructures with flat energy band
- Award number: 2143384
- Fiscal year: 2022
- Amount: $225,000
- Award type: Continuing Grant
CIF: Small: Interpretable Machine Learning based on Deep Neural Networks: A Source Coding Perspective
- Award number: 2205004
- Fiscal year: 2022
- Amount: $225,000
- Award type: Standard Grant
EAGER-DynamicData: Generative Statistical Modeling for Dynamic and Distributed Data
- Award number: 1462230
- Fiscal year: 2015
- Amount: $225,000
- Award type: Standard Grant
Statistical Learning for Image Annotation
- Award number: 1521092
- Fiscal year: 2015
- Amount: $225,000
- Award type: Standard Grant
Parametric and nonparametric regressions on spot volatility
- Award number: 1326819
- Fiscal year: 2013
- Amount: $225,000
- Award type: Standard Grant
Estimation and Inference Methods for Continuous-Time Models
- Award number: 1227448
- Fiscal year: 2012
- Amount: $225,000
- Award type: Standard Grant
Modeling of Mosquitoes Carrying Transgenes or Genetically Modified Bacteria in Preventing the Transmission of Mosquito-Borne Diseases
- Award number: 1118150
- Fiscal year: 2011
- Amount: $225,000
- Award type: Standard Grant
The Second International Conference on Mathematical Modeling and Analysis of Populations in Biological Systems; October 2009; Huntsville, Alabama
- Award number: 0931213
- Fiscal year: 2009
- Amount: $225,000
- Award type: Standard Grant
Essential Roles of Receptor-Like Kinases in Brassinosteroid and Cell-Death Control Signaling Pathways
- Award number: 0849206
- Fiscal year: 2009
- Amount: $225,000
- Award type: Standard Grant
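Similar NSFC grants
Algorithm design and analysis for network construction with two-dimensional variable-sized bin packing features and related problems
- Award number: 12361066
- Approval year: 2023
- Amount: CNY 270,000
- Award type: Regional Science Fund Project
Mechanical property analysis and reinforcement design methods for large-size ultrathin flexible silicon wafers
- Award number:
- Approval year: 2022
- Amount: CNY 300,000
- Award type: Young Scientists Fund
Analysis of the effect of structural size on the seismic performance of CFRP-bar-reinforced concrete shear walls
- Award number:
- Approval year: 2022
- Amount: CNY 300,000
- Award type: Young Scientists Fund
Control and analysis of defects in large-size boron arsenide crystals
- Award number:
- Approval year: 2021
- Amount: CNY 580,000
- Award type: General Program
Reliability analysis and design of FRP shear-strengthened concrete beams considering multiple size effects
- Award number: 51908372
- Approval year: 2019
- Amount: CNY 220,000
- Award type: Young Scientists Fund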
Similar overseas grants
Rapid Free-Breathing 3D High-Resolution MRI for Volumetric Liver Iron Quantification
- Award number: 10742197
- Fiscal year: 2023
- Amount: $225,000
- Award type:
Open-source Software Development Supplement for 3D quantitative analysis of mouse models of structural birth defects through computational anatomy
- Award number: 10839199
- Fiscal year: 2023
- Amount: $225,000
- Award type:
A Multidimensional Approach to Understanding Prenatal Health and Psychosocial Factors in Relation to the Maternal Inflammatory Milieu and Offspring Neurodevelopment
- Award number: 10607823
- Fiscal year: 2023
- Amount: $225,000
- Award type:
Single molecule biomolecular condensate analysis in neurons
- Award number: 10583437
- Fiscal year: 2023
- Amount: $225,000
- Award type:
Integrating multi-omics, imaging, and longitudinal data to predict radiation response in cervical cancer
- Award number: 10734702
- Fiscal year: 2023
- Amount: $225,000
- Award type: