BIGDATA: Small: DA: DCM: Measurement and Learning in Large-Scale Social Networks

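Basic Information

Award number:
1251267
Principal investigator:
Animashree Anandkumar
Amount:
$746,800 (approx.)
Host institution:
University of California-Irvine
Host institution country:
United States
Project type:
Standard Grant
Fiscal year:
2013
Funding country:
United States
Project period:
2013-09-01 to 2016-08-31
Project status:
Completed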

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1251267&HistoricalAwards=false
Keywords:
BIGDATA Small DA DCM Measurement

Project Abstract

In the context of social networks, "big data" generally involves information on very large social systems whose elements of interest display complex dependence. State-of-the-art statistical models for such systems require the use of computationally expensive stochastic simulation techniques to capture this dependence; these techniques do not generally scale well to the large-population case. One potential solution to this problem is to focus detailed modeling efforts on smaller subpopulations (e.g., groups, communities, etc.) extracted from the larger system. While scalability of the subsystem models is less challenging in this case, one must have appropriate methods for sampling from large networks in such a manner as to permit principled inference, and modeling techniques that recognize the coupling between local subpopulations and the broader network in which they are embedded.

The PI will bridge the gap between expensive, highly detailed models and the limits of computability imposed by Big Data by combining expertise from machine learning and social network modeling within a unifying exponential family framework. The research will develop novel methods for the scalable measurement and analysis of large social networks, validating these techniques by deploying them in the context of dynamic data collection from online social networks. Specifically, the researchers will combine probabilistic graphical models and exponential family random graph models (ERGMs) to: (i) identify models with low computational requirements by exploiting limited-range dependence; (ii) develop machine learning techniques for identifying weakly coupled regimes in large networks to facilitate sampling and subgraph modeling; and (iii) develop integrated sampling and modeling strategies for inference from subgraphs of large networks that capture coupling to the structures in which they are embedded. This proposal investigates these questions in both the cross-sectional and dynamic contexts, for networks with and without vertex attributes. The sampling techniques created via this project will be deployed as an extension of a broader infrastructure for data collection in online social networks developed and maintained by one of the PIs, allowing for evaluation in a practical setting.

The methods developed via this research will allow for analysis of data relating to many problems of public interest, including epidemiological, security, and emergency management applications; data collection and analysis activities within the project will include applications in the natural hazard context, with the potential to inform policies that can save lives and property during disasters. The project will be integrated with graduate and undergraduate education, as well as postdoctoral mentoring. Tools developed via this project will be released as part of a widely used open-source toolkit for statistical network analysis (statnet), allowing widespread dissemination to researchers and practitioners in a range of fields.
