Collaborative Research: SI2:SSE: Extending the Physics Reach of LHCb in Run 3 Using Machine Learning in the Real-Time Data Ingestion and Reduction System
Basic Information
- Award Number: 1740102
- Principal Investigator:
- Amount: $224,600
- Host Institution:
- Host Institution Country: United States
- Award Type: Standard Grant
- Fiscal Year: 2017
- Funding Country: United States
- Project Period: 2017-09-01 to 2021-08-31
- Status: Completed
- Source:
- Keywords:
Project Summary
In the past 200 years, physicists have discovered the basic constituents of ordinary matter and developed a very successful theory to describe the interactions (forces) between them. All atoms, and the molecules from which they are built, can be described in terms of these constituents. The nuclei of atoms are bound together by strong nuclear interactions; their decays result from strong and weak nuclear interactions. Electromagnetic forces bind atoms together, and bind atoms into molecules. The electromagnetic, weak nuclear, and strong nuclear forces are described in terms of quantum field theories. The predictions of these theories can be very, very precise, and they have been validated with equally precise experimental measurements. Most recently, a new fundamental particle required to unify the weak and electromagnetic interactions, the Higgs boson, was discovered at the Large Hadron Collider (LHC), located at the CERN laboratory in Switzerland. Despite the vast amount of knowledge acquired over the past century about the fundamental particles and forces of nature, many important questions remain unanswered. For example, most of the matter in the universe that interacts gravitationally does not have ordinary electromagnetic or nuclear interactions. Because it has only been observed via its gravitational interactions, it is called dark matter. What is it? Equally interesting, why is there so little anti-matter in the universe when the fundamental interactions we know describe matter and anti-matter as almost perfect mirror images of each other? The LHC was built to discover and study the Higgs boson and to search for answers to these questions. The first data-taking run of the LHC (Run 1, 2010-2012) was a huge success, producing over 1000 journal articles, highlighted by the discovery of the Higgs boson. The current LHC run (Run 2, 2015-present) has already produced many world-leading results; however, the most interesting questions remain unanswered.
The LHCb experiment, located on the LHC at CERN, has unique potential to answer some of these questions. LHCb is searching for signals of dark matter produced in high-energy particle collisions at the LHC, and performing high-precision studies of rare processes that could reveal the existence of the as-yet-unknown forces that caused the matter/anti-matter imbalance observed in our universe. The primary goal of this project - supported by the Office of Advanced Cyberinfrastructure in the Directorate for Computer and Information Science and Engineering, and by the Physics Division and the Division of Mathematical Sciences in the Directorate for Mathematical and Physical Sciences - is to develop and deploy software utilizing Machine Learning (ML) that will enable the LHCb experiment to significantly improve its discovery potential in Run 3 (2021-2023). Specifically, the ML developed will greatly increase the sensitivity to many proposed types of dark matter and new forces by making it possible to identify and study potential signals much more efficiently using the finite computing resources available. The data sets collected by the LHC experiments are among the largest in the world. For example, the sensor arrays of the LHCb experiment, on which both PIs work, produce about 100 terabytes of data per second, close to a zettabyte of data per year. Even after drastic data reduction performed by custom-built read-out electronics, the data volume is still about 10 exabytes per year, comparable to the largest-scale industrial data sets. Such large data sets cannot be stored indefinitely; therefore, all high energy physics (HEP) experiments employ a data-reduction scheme executed in real time by a data-ingestion system - referred to as a trigger system in HEP - to decide whether each event is to be persisted for future analysis or permanently discarded.
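The data volumes quoted above can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes roughly 1e7 seconds of live data-taking per year, a common HEP rule of thumb that is not stated in the summary itself:

```python
# Back-of-the-envelope check of the quoted LHCb data volumes.
SENSOR_RATE_BPS = 100e12        # ~100 terabytes/second read off the detector
LIVE_SECONDS_PER_YEAR = 1e7     # assumed live data-taking time per year

raw_bytes_per_year = SENSOR_RATE_BPS * LIVE_SECONDS_PER_YEAR
# 1e21 bytes = 1 zettabyte, consistent with "close to a zettabyte per year"

after_readout_bytes = 10e18     # ~10 exabytes/year after the read-out electronics
reduction_factor = raw_bytes_per_year / after_readout_bytes
# The custom read-out electronics therefore discard roughly 99% of the raw data
print(raw_bytes_per_year, reduction_factor)
```

Under these assumptions the custom read-out electronics already achieve a ~100x reduction before the trigger system ever sees the data.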
Trigger-system designs are dictated by the rate at which the sensors can be read out, the computational power of the data-ingestion system, and the available storage space for the data. The LHCb detector is being upgraded for Run 3 (2021-2023), when the trigger system will need to process 25 exabytes per year. Currently, only 0.3 exabytes of the 10 exabytes per year processed by the trigger are analyzed using high-level computing algorithms; the rest is discarded prior to this stage using simple algorithms executed on FPGAs. To process all the data on CPU farms, ML will be used to develop and deploy new trigger algorithms. The specific objectives of this proposal are to characterize LHCb data more fully using ML and to build algorithms based on these characterizations: to replace the most computationally expensive parts of the event pattern recognition; to increase the performance of the event-classification algorithms; and to reduce the number of bytes persisted per event without degrading physics performance. Many potential explanations for dark matter and the matter/anti-matter asymmetry of our universe are currently inaccessible due to trigger-system limitations. As HEP computing budgets are projected to remain approximately flat, the LHCb trigger system must be redesigned for the experiment to realize its full potential. This redesign must go beyond scalable technical upgrades; radical new strategies are needed.
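The keep-or-discard decision at the heart of a trigger system can be illustrated with a toy event classifier. The sketch below trains a logistic-regression "trigger line" on synthetic two-feature events; the features, model, thresholds, and event mixture are all invented for illustration and bear no relation to LHCb's actual algorithms, which rely on far richer event characterizations:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "events": 10% signal, 90% background, two features each
# (purely illustrative stand-ins for quantities a trigger might use).
n = 10_000
signal = rng.normal(loc=[2.0, 2.0], scale=1.0, size=(n // 10, 2))
background = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(n - n // 10, 2))
X = np.vstack([signal, background])
y = np.concatenate([np.ones(len(signal)), np.zeros(len(background))])

# Train a logistic-regression classifier with plain gradient descent.
w, b = np.zeros(2), 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted signal probability
    w -= 1.0 * (X.T @ (p - y)) / len(y)
    b -= 1.0 * np.mean(p - y)

def persist_event(features, threshold=0.5):
    """Keep-or-discard decision: the core operation of a trigger line."""
    score = 1.0 / (1.0 + np.exp(-(features @ w + b)))
    return score > threshold

kept = persist_event(X)
efficiency = kept[y == 1].mean()   # fraction of signal retained
retention = kept.mean()           # fraction of all events persisted
```

At the chosen threshold this toy "trigger" keeps most of the injected signal while persisting only a small fraction of all events; a real trigger line tunes such thresholds against its output-bandwidth budget.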
Project Outcomes
Journal Articles (2)
Monographs (0)
Research Awards (0)
Conference Papers (0)
Patents (0)
A hybrid deep learning approach to vertexing
- DOI: 10.1088/1742-6596/1525/1/012079
- Publication Date: 2020-04
- Journal:
- Impact Factor: 0
- Authors: Fang, Rui; Schreiner, Henry F; Sokoloff, Michael D; Weisser, Constantin; Williams, Mike
- Corresponding Author: Williams, Mike
Progress in developing a hybrid deep learning algorithm for identifying and locating primary vertices
- DOI: 10.1051/epjconf/202125104012
- Publication Date: 2021-01
- Journal:
- Impact Factor: 0
- Authors: Akar, Simon; Atluri, Gowtham; Boettcher, Thomas; Peters, Michael; Schreiner, Henry; Sokoloff, Michael; Stahl, Marian; Tepe, William; Weisser, Constantin; Williams, Mike
- Corresponding Author: Williams, Mike
Other Publications by Michael Sokoloff
Other Grants by Michael Sokoloff
Collaborative Research: Elements: Extending the physics reach of LHCb by developing and deploying algorithms for a fully GPU-based first trigger stage
- Award Number: 2004364
- Fiscal Year: 2020
- Amount: $224,600
- Award Type: Standard Grant
Collaborative Research: S2I2: Cncp: Conceptualization of an S2I2 Institute for High Energy Physics
- Award Number: 1558219
- Fiscal Year: 2016
- Amount: $224,600
- Award Type: Standard Grant
Collaborative Research: SI2-SSI: Data-Intensive Analysis for High Energy Physics (DIANA/HEP)
- Award Number: 1450319
- Fiscal Year: 2015
- Amount: $224,600
- Award Type: Continuing Grant
Collaborative Research: Construction of the Upstream Tracker for the LHCb Upgrade
- Award Number: 1433120
- Fiscal Year: 2014
- Amount: $224,600
- Award Type: Continuing Grant
Enabling High Energy Physics at the Information Frontier Using GPUs and Other Many/Multi-Core Architectures
- Award Number: 1414736
- Fiscal Year: 2014
- Amount: $224,600
- Award Type: Continuing Grant
Similar Grants (National Natural Science Foundation of China)
Identification of targeted-drug-sensitivity biomarkers from tumor pathology images and associated statistical algorithms
- Award Number: 82304250
- Year Approved: 2023
- Amount: ¥300,000
- Program Type: Young Scientists Fund
Mechanism by which butyrate, a gut Faecalibacterium prausnitzii metabolite, inhibits ventricular-myocardial ferroptosis to improve age-related cardiac insufficiency
- Award Number: 82300430
- Year Approved: 2023
- Amount: ¥300,000
- Program Type: Young Scientists Fund
The influence of social-network ties on corporate cash-holding decisions: the mechanism of joint risk resistance
- Award Number: 72302067
- Year Approved: 2023
- Amount: ¥300,000
- Program Type: Young Scientists Fund
Novel weakly supervised learning methods for image object detection
- Award Number: 62371157
- Year Approved: 2023
- Amount: ¥500,000
- Program Type: General Program
Accuracy of information acquisition for open-domain dialogue systems
- Award Number: 62376067
- Year Approved: 2023
- Amount: ¥510,000
- Program Type: General Program
Similar Overseas Grants
SI2-SSI: Collaborative Research: Einstein Toolkit Community Integration and Data Exploration
- Award Number: 2114580
- Fiscal Year: 2020
- Amount: $224,600
- Award Type: Continuing Grant
Collaborative Research: SI2-SSI: Expanding Volunteer Computing
- Award Number: 2039142
- Fiscal Year: 2020
- Amount: $224,600
- Award Type: Standard Grant
Collaborative Research: SI2-SSI: Expanding Volunteer Computing
- Award Number: 2001752
- Fiscal Year: 2019
- Amount: $224,600
- Award Type: Standard Grant
Collaborative Research: NISC SI2-S2I2 Conceptualization of CFDSI: Model, Data, and Analysis Integration for End-to-End Support of Fluid Dynamics Discovery and Innovation
- Award Number: 1743180
- Fiscal Year: 2018
- Amount: $224,600
- Award Type: Continuing Grant
Collaborative Research: NISC SI2-S2I2 Conceptualization of CFDSI: Model, Data, and Analysis Integration for End-to-End Support of Fluid Dynamics Discovery and Innovation
- Award Number: 1743191
- Fiscal Year: 2018
- Amount: $224,600
- Award Type: Continuing Grant