Collaborative Research: Design-Based Optimal Subdata Selection Using Mixture-of-Experts Models to Account for Big Data Heterogeneity

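Basic Information

  • Award number:
    2210546
  • Principal investigator:
  • Amount:
    $150,000
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Standard Grant
  • Fiscal year:
    2022
  • Funding country:
    United States
  • Project period:
    2022-08-15 to 2025-07-31
  • Project status:
    Ongoing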

Project Abstract

With technological advances, it has become easy to collect massive amounts of data for most areas of research. But with the size of datasets measured in terabytes or even petabytes, analyzing such datasets can become an expensive computational challenge and may be impossible on a typical desktop or laptop computer. However, for making impactful discoveries, it may be unnecessary to analyze an entire dataset. Consequently, there is great interest in developing and studying methods for selecting a subset from a massive dataset and for drawing conclusions based on the much smaller selected dataset. Such methods are known as subdata selection or subsampling methods. One obvious subsampling method consists of randomly selecting data from the entire dataset. While this is often the simplest and fastest option, it has been established that better options are often available. In this project, the principal investigators (PIs) aim to develop and study a rigorous framework and new methods for optimal subdata selection by using models that account for heterogeneity in the data, which is often present in large datasets. Research findings will be incorporated in topical courses to train graduate students in large-scale data analysis. The work will also be disseminated via the PIs' collaborations in public health, biomedical science, and business.

Rather than assuming a multiple regression model, the PIs plan to develop and study subdata selection methods based on mixture-of-experts (ME) models, which can account for heterogeneity in the data. The PIs will initially develop and study subdata selection methods for a subclass of the ME models, known as clusterwise linear regression models, for which the gate functions are constant. This will be followed by studying logistic-normal mixture models, in which the gate functions depend on the regression variables. For both cases, the investigators plan to develop information-based optimal subdata selection methods, first for continuous response variables and then for binary response variables, study their statistical properties, and develop efficient algorithms for the methods that will be made available in an R package.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
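
The contrast the abstract draws between uniform random subsampling and information-based selection can be illustrated with a minimal sketch. It assumes an ordinary linear regression model with an intercept, not the mixture-of-experts models the project targets, and it uses an extreme-value rule (for each covariate, keep the rows with the smallest and largest values) as a stand-in for an information-based method. The function names, the simulated data, and the use of the log-determinant of the information matrix as the comparison criterion are all assumptions made for illustration; none of them comes from the award itself.

```python
import numpy as np


def uniform_subsample(n, k, rng):
    """Pick k row indices uniformly at random without replacement."""
    return rng.choice(n, size=k, replace=False)


def extreme_value_subsample(X, k):
    """Illustrative rule (an assumption, not the project's method):
    for each covariate in turn, keep the r rows with the smallest and the
    r rows with the largest values among rows not yet chosen,
    where r = k // (2 * p)."""
    n, p = X.shape
    r = k // (2 * p)
    if r < 1 or 2 * r * p > n:
        raise ValueError("k must satisfy 2*p <= k and 2*(k // (2*p))*p <= n")
    available = np.ones(n, dtype=bool)
    chosen = []
    for j in range(p):
        idx = np.flatnonzero(available)
        # Sort the still-available rows by covariate j.
        order = idx[np.argsort(X[idx, j], kind="stable")]
        picked = np.concatenate([order[:r], order[-r:]])
        available[picked] = False
        chosen.append(picked)
    return np.concatenate(chosen)


def log_det_information(X, rows):
    """log det(Z'Z) for the selected rows, with Z = [1, X]."""
    Z = np.column_stack([np.ones(len(rows)), X[rows]])
    _, logdet = np.linalg.slogdet(Z.T @ Z)
    return logdet


# Simulated full data: 100,000 rows, 5 standard normal covariates.
rng = np.random.default_rng(0)
X = rng.standard_normal((100_000, 5))
k = 1_000

random_rows = uniform_subsample(len(X), k, rng)
extreme_rows = extreme_value_subsample(X, k)

print("random subset :", log_det_information(X, random_rows))
print("extreme subset:", log_det_information(X, extreme_rows))
```

On simulated Gaussian covariates such as these, the extreme-value subset should yield a larger log-determinant than a random subset of the same size, which is the sense in which a designed subset can carry more information about the regression coefficients.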

Project Outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Min Yang

[Association between polymorphisms of XPD gene and susceptibility to chronic benzene poisoning].
A second-order finite volume element method on quadrilateral meshes for elliptic equations
Throughput Range Based on Concurrent Transmission in Wireless Mesh Networks
Crystallization of poly(ethylene oxide) with acetaminophen--a study on solubility, spherulitic growth, and morphology.
A Novel Selective ERK1/2 Inhibitor, Laxiflorin B, Targets EGFR Mutation Subtypes in Non-small-cell Lung Cancer
  • DOI:
    10.1101/2022.06.27.497627
  • Publication year:
    2022
  • Journal:
  • Impact factor:
    0
  • Authors:
    Chengyao Chiang;M. Zhang;Junrong Huang;Juanxian Zeng;Chun;Dongmei Pan;Heng Yang;Min Yang;Qiangqiang Han;Wang Zou;Tian Xiao;Yongdong Zou;F. Yin;Zigang Li;Lizhi Zhu;D. Zheng
  • Corresponding author:
    D. Zheng

Other grants awarded to Min Yang

Collaborative Research: Information-Based Subdata Selection Inspired by Optimal Design of Experiments
  • Award number:
    1811291
  • Fiscal year:
    2018
  • Funding amount:
    $150,000
  • Award type:
    Standard Grant
Collaborative research: A major leap forward: Optimal designs for correlated data, multiple objectives, and multiple covariates
  • Award number:
    1407518
  • Fiscal year:
    2014
  • Funding amount:
    $150,000
  • Award type:
    Continuing Grant
Synthesis of glycosyl-novobiocins: probes of Hsp90 C-terminal affinity binding and novel anti-cancer drugs
  • Award number:
    EP/K023071/1
  • Fiscal year:
    2013
  • Funding amount:
    $150,000
  • Award type:
    Research Grant
CAREER: Optimal Design of Experiments for Generalized Linear Models
  • Award number:
    1322797
  • Fiscal year:
    2012
  • Funding amount:
    $150,000
  • Award type:
    Continuing Grant
CAREER: Optimal Design of Experiments for Generalized Linear Models
  • Award number:
    0748409
  • Fiscal year:
    2008
  • Funding amount:
    $150,000
  • Award type:
    Continuing Grant
Collaborative Research: Optimal Design of Experiments for Categorical Data
  • Award number:
    0707013
  • Fiscal year:
    2007
  • Funding amount:
    $150,000
  • Award type:
    Continuing Grant
Crossover Designs for Comparing Test Treatments with a Control Treatment: Optimality, Efficiency, and Robustness
  • Award number:
    0600943
  • Fiscal year:
    2005
  • Funding amount:
    $150,000
  • Award type:
    Standard Grant
Crossover Designs for Comparing Test Treatments with a Control Treatment: Optimality, Efficiency, and Robustness
  • Award number:
    0304661
  • Fiscal year:
    2003
  • Funding amount:
    $150,000
  • Award type:
    Standard Grant

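Similar grants from the National Natural Science Foundation of China (NSFC)

Design and Fault-Tolerant Control of Terrain-Adaptive Multi-Platform Cooperative Takeoff and Landing Mechanisms for Manned Aircraft
  • Award number:
    52305039
  • Approval year:
    2023
  • Funding amount:
    CNY 300,000
  • Award type:
    Young Scientists Fund
Design and Optimization Techniques for Cooperative Multi-Agent Systems under Imperfect Information
  • Award number:
  • Approval year:
    2022
  • Funding amount:
    CNY 300,000
  • Award type:
    Young Scientists Fund
Design of Fully Sensing Intelligent Force-Controlled Robot Systems and Safe Human-Robot Collaborative Control Methods
  • Award number:
    U22A2060
  • Approval year:
    2022
  • Funding amount:
    CNY 2,550,000
  • Award type:
    Joint Fund Project
Design and Optimization Techniques for Cooperative Multi-Agent Systems under Imperfect Information
  • Award number:
    62206091
  • Approval year:
    2022
  • Funding amount:
    CNY 300,000
  • Award type:
    Young Scientists Fund
Design Mechanisms of Human-Computer Interaction Interfaces for Human-Machine Collaboration
  • Award number:
    72271053
  • Approval year:
    2022
  • Funding amount:
    CNY 450,000
  • Award type:
    General Program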

Similar overseas grants

Collaborative Research: Beyond the Single-Atom Paradigm: A Priori Design of Dual-Atom Alloy Active Sites for Efficient and Selective Chemical Conversions
  • Award number:
    2334970
  • Fiscal year:
    2024
  • Funding amount:
    $150,000
  • Award type:
    Standard Grant
Collaborative Research: Concurrent Design Integration of Products and Remanufacturing Processes for Sustainability and Life Cycle Resilience
  • Award number:
    2348641
  • Fiscal year:
    2024
  • Funding amount:
    $150,000
  • Award type:
    Standard Grant
Collaborative Research: DMREF: Closed-Loop Design of Polymers with Adaptive Networks for Extreme Mechanics
  • Award number:
    2413579
  • Fiscal year:
    2024
  • Funding amount:
    $150,000
  • Award type:
    Standard Grant
Collaborative Research: Design and synthesis of hybrid anode materials made of chemically bonded carbon nanotube to copper: a concerted experiment/theory approach
  • Award number:
    2334039
  • Fiscal year:
    2024
  • Funding amount:
    $150,000
  • Award type:
    Continuing Grant
Collaborative Research: Design: Strengthening Inclusion by Change in Building Equity, Diversity and Understanding (SICBEDU) in Integrative Biology
  • Award number:
    2335235
  • Fiscal year:
    2024
  • Funding amount:
    $150,000
  • Award type:
    Standard Grant