BIGDATA: F: Reliable Inference with Big Data: Reproducibility, Data Sharing, Heterogeneity


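Basic information

  • Award number:
    1741162
  • Principal investigator:
  • Amount:
    $650,000
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2017
  • Funding country:
    United States
  • Duration:
    2017-09-01 to 2021-08-31
  • Project status:
    Completed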

Project abstract

Over the last decade, 'big data' technologies have allowed the acquisition of vast amounts of data (e.g. through smartphones) and their accumulation into large-scale databases. Powerful hardware and software systems have been developed to crunch these data and extract statistical models. For instance, the outcome of a certain medical procedure can be modeled in terms of the features of the patient, thus in principle providing a personalized risk score for that procedure. Unfortunately, the increasing complexity of these data and of the algorithms used has made statistical models significantly less transparent. How certain are we of these statistical predictions? What is their limit of validity? How biased is the resulting model?

This project focuses on four main challenges that are ubiquitous in big data and crucial to extracting reliable insights: reproducibility; data sharing; missing data; data heterogeneity.

(1) Reproducibility requires being able to compare two models extracted from different data sets (e.g. after additional data have been accumulated). This is in turn impossible unless we have reliable procedures to quantify uncertainty and confidence in complex high-dimensional models. Recently proposed ideas in this direction are still insufficient to cope with realistic large-scale applications.

(2) Data sharing is a key feature of modern data analysis, whereby a single massive data set is studied by hundreds of independent researchers. Unguarded statistical inference by such a population of researchers unavoidably leads to large numbers of false discoveries. The project builds on false discovery rate-controlling methods to propose safe approaches for decentralized data analysis.

(3) Missing data are ubiquitous in big data. While several methods have been developed in the past to deal with missing data, it is unclear to what extent they are applicable to modern scenarios. The project aims at developing principled guidelines based on a rigorous comparison of various approaches, and developing new algorithms based on maximum likelihood.

(4) Data heterogeneity. Big data are often produced by the aggregation of multiple data sources. How can we prevent standard statistical procedures from being critically affected by such heterogeneities? The project uses new regularization schemes to fuse information across multiple sources.

Project outputs

Journal articles (28)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Discussion of: “Nonparametric regression using deep neural networks with ReLU activation function”
  • DOI:
    10.1214/19-aos1910
  • Published:
    2020-08
  • Journal:
  • Impact factor:
    0
  • Authors:
    B. Ghorbani;Song Mei;Theodor Misiakiewicz;A. Montanari
  • Corresponding author:
    B. Ghorbani;Song Mei;Theodor Misiakiewicz;A. Montanari
Learning with invariances in random features and kernel models
  • DOI:
  • Published:
    2021-02
  • Journal:
  • Impact factor:
    0
  • Authors:
    Song Mei;Theodor Misiakiewicz;A. Montanari
  • Corresponding author:
    Song Mei;Theodor Misiakiewicz;A. Montanari
When do neural networks outperform kernel methods?
Optimization of the Sherrington--Kirkpatrick Hamiltonian
  • DOI:
    10.1137/20m132016x
  • Published:
    2021
  • Journal:
  • Impact factor:
    1.6
  • Authors:
    Montanari, Andrea
  • Corresponding author:
    Montanari, Andrea
The threshold for SDP-refutation of random regular NAE-3SAT
  • DOI:
    10.1137/1.9781611975482.140
  • Published:
    2018-04
  • Journal:
  • Impact factor:
    0
  • Authors:
    Y. Deshpande;A. Montanari;R. O'Donnell;T. Schramm;S. Sen
  • Corresponding author:
    Y. Deshpande;A. Montanari;R. O'Donnell;T. Schramm;S. Sen

Other publications by Andrea Montanari

Understanding Inverse Scaling and Emergence in Multitask Representation Learning
  • DOI:
  • Published:
    2024
  • Journal:
  • Impact factor:
    0
  • Authors:
    M. E. Ildiz;Zhe Zhao;Samet Oymak;Xiangyu Chang;Yingcong Li;Christos Thrampoulidis;Lin Chen;Yifei Min;Mikhail Belkin;Aakanksha Chowdhery;Sharan Narang;Jacob Devlin;Maarten Bosma;Gaurav Mishra;Adam Roberts;Liam Collins;Hamed Hassani;M. Soltanolkotabi;Aryan Mokhtari;Sanjay Shakkottai;Provable;Simon S. Du;Wei Hu;S. Kakade;Chelsea Finn;A. Rajeswaran;Deep Ganguli;Danny Hernandez;Liane Lovitt;Amanda Askell;Yu Bai;Anna Chen;Tom Conerly;Nova Dassarma;Dawn Drain;Sheer Nelson El;El Showk;Stanislav Fort;Zac Hatfield;T. Henighan;Scott Johnston;Andy Jones;Nicholas Joseph;Jackson Kernian;Shauna Kravec;Benjamin Mann;Neel Nanda;Kamal Ndousse;Catherine Olsson;D. Amodei;Tom Brown;Jared Ka;Sam McCandlish;Chris Olah;Dario Amodei;Trevor Hastie;Andrea Montanari;Saharon Rosset;Jordan Hoffmann;Sebastian Borgeaud;A. Mensch;Elena Buchatskaya;Trevor Cai;Eliza Rutherford;Diego de;Las Casas;Lisa Anne Hendricks;Johannes Welbl;Aidan Clark;Tom Hennigan;Eric Noland;Katie Millican;George van den Driessche;Bogdan Damoc;Aurelia Guy;Simon Osindero;Karen Si;Erich Elsen;Jack W. Rae;O. Vinyals;Jared Kaplan;B. Chess;R. Child;S. Gray;Alec Radford;Jeffrey Wu;I. R. McKenzie;Alexander Lyzhov;Michael Pieler;Alicia Parrish;Aaron Mueller;Ameya Prabhu;Euan McLean;Aaron Kirtland;Alexis Ross;Alisa Liu;Andrew Gritsevskiy;Daniel Wurgaft;Derik Kauff;Gabriel Recchia;Jiacheng Liu;Joe Cavanagh;Tom Tseng;Xudong Korbak;Yuhui Shen;Zhengping Zhang;Najoung Zhou;Samuel R Kim;Bowman Ethan;Perez;Feng Ruan;Youngtak Sohn
  • Corresponding author:
    Youngtak Sohn
Optimization of random cost functions and statistical physics
  • DOI:
  • Published:
    2024-01
  • Journal:
  • Impact factor:
    0
  • Authors:
    Andrea Montanari
  • Corresponding author:
    Andrea Montanari
Provably Efficient Posterior Sampling for Sparse Linear Regression via Measure Decomposition
  • DOI:
  • Published:
    2024
  • Journal:
  • Impact factor:
    0
  • Authors:
    Andrea Montanari;Yuchen Wu
  • Corresponding author:
    Yuchen Wu
Phase diagram of random heteropolymers.
  • DOI:
    10.1103/physrevlett.92.185509
  • Published:
    2003
  • Journal:
  • Impact factor:
    8.6
  • Authors:
    Andrea Montanari;Markus Müller;Marc Mézard
  • Corresponding author:
    Marc Mézard
Tractability from overparametrization: the example of the negative perceptron
  • DOI:
    10.1007/s00440-023-01248-y
  • Published:
    2024
  • Journal:
  • Impact factor:
    2
  • Authors:
    Andrea Montanari;Yiqiao Zhong;Kangjie Zhou
  • Corresponding author:
    Kangjie Zhou

Other grants of Andrea Montanari

CIF: Small: Learning and estimation with rough non-convex objectives: Fundamental limits and efficient algorithms
  • Award number:
    2006489
  • Fiscal year:
    2020
  • Funding amount:
    $650,000
  • Project type:
    Standard Grant
Workshop: Advances in Asymptotic Probability
  • Award number:
    1839440
  • Fiscal year:
    2018
  • Funding amount:
    $650,000
  • Project type:
    Standard Grant
CIF: Small: Information-theoretic and Computational Thresholds in Statistical Learning
  • Award number:
    1714305
  • Fiscal year:
    2017
  • Funding amount:
    $650,000
  • Project type:
    Standard Grant
CIF: Small: Optimal Iterative Estimation in Signal Processing, Information Theory and Machine Learning
  • Award number:
    1319979
  • Fiscal year:
    2013
  • Funding amount:
    $650,000
  • Project type:
    Standard Grant
The game dynamics of social interaction: Algorithms and applications
  • Award number:
    0915145
  • Fiscal year:
    2009
  • Funding amount:
    $650,000
  • Project type:
    Standard Grant
CAREER: New Information Processing Techniques from Statistical Physics and Probability Theory
  • Award number:
    0743978
  • Fiscal year:
    2008
  • Funding amount:
    $650,000
  • Project type:
    Continuing Grant

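Similar NSFC grants

Concepts, inference methods, and reliability evaluation models for spatial analogical reasoning
  • Award number:
    42371463
  • Approval year:
    2023
  • Funding amount:
    CNY 520,000
  • Project type:
    General Program
Research on reliable-condition early-warning methods for high-speed train traction converters based on causal inference
  • Award number:
    52302426
  • Approval year:
    2023
  • Funding amount:
    CNY 300,000
  • Project type:
    Young Scientists Fund
Research on reliability prognosis methods for bridge structures based on long-term monitoring data and Bayesian inference
  • Award number:
    51878235
  • Approval year:
    2018
  • Funding amount:
    CNY 600,000
  • Project type:
    General Program
Optimized evidential reasoning based on fine-grained reliability assessment and its applications
  • Award number:
    61672431
  • Approval year:
    2016
  • Funding amount:
    CNY 610,000
  • Project type:
    General Program
Reliability assessment of existing structures based on uncertainty reasoning
  • Award number:
    50678143
  • Approval year:
    2006
  • Funding amount:
    CNY 280,000
  • Project type:
    General Program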

Similar overseas grants

Reliable and robust causal inference approaches for effective connectivity research with fMRI data
  • Award number:
    10709066
  • Fiscal year:
    2022
  • Funding amount:
    $650,000
  • Project type:
Development of reliable wireless network technology for drone swarm activities in underground spaces where wireless communication is difficult
  • Award number:
    21K18746
  • Fiscal year:
    2021
  • Funding amount:
    $650,000
  • Project type:
    Grant-in-Aid for Challenging Research (Exploratory)
Establishing a Flexible and Reliable Automatic Approximate Inference Method to Accelerate the Social Execution of Statistical Modeling.
  • Award number:
    21J11859
  • Fiscal year:
    2021
  • Funding amount:
    $650,000
  • Project type:
    Grant-in-Aid for JSPS Fellows
Development of High Reliable Optimization System for Asteroid Deflection Missions
  • Award number:
    19K15210
  • Fiscal year:
    2019
  • Funding amount:
    $650,000
  • Project type:
    Grant-in-Aid for Early-Career Scientists
Fundamental research on finite length analysis in information theory and optimization theory for practical, reliable, and highly efficient communications
  • Award number:
    17K06446
  • Fiscal year:
    2017
  • Funding amount:
    $650,000
  • Project type:
    Grant-in-Aid for Scientific Research (C)