III: Small: Probabilistic Hashing for Efficient Search Learning
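Basic information
- Award number: 1319830
- Principal investigator:
- Amount: about $475,100
- Host institution:
- Host institution country: United States
- Award type: Continuing Grant
- Fiscal year: 2013
- Funding country: United States
- Period: 2013-09-01 to 2013-10-31
- Status: Completed
- Source:
- Keywords:
Abstract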
Numerous applications involve massive, high-dimensional datasets. For example, the search industry routinely deals with billions of web pages, where each page is often represented as a binary vector in 2^64 dimensions. In computer vision, images are often represented as non-binary vectors in millions of dimensions. Algorithms that can efficiently compress, retrieve, and mine these datasets are of high practical importance. Mathematically rigorous and computationally efficient hashing methods will be developed to dramatically reduce ultra-high-dimensional datasets. These algorithms will be integrated with a variety of learning techniques, including classification, clustering, near-neighbor search, and matrix factorization. The project builds on and extends minwise hashing and b-bit minwise hashing, which are standard hashing techniques in search applications. The project aims to (i) rigorously analyze b-bit minwise hashing and develop, analyze, and apply significantly more efficient (and more accurate) hashing methods to problems in search and learning; (ii) develop a unified framework of probabilistic hashing that essentially consists of one permutation followed by (at most) one random projection; and (iii) develop a unified theory of summary statistics under a variety of engineering constraints (storage space, computational speed, indexing capability, adaptation to streaming, etc.). Hashing algorithms developed under this framework are expected to be substantially more efficient and more accurate than existing popular algorithms such as random projections and minwise hashing. This general framework allows the design of algorithms that accommodate many different data types (sparse or dense data, binary or real-valued data, static or streaming data), many different engineering needs (computing inner products or lp distances, kernel learning or linear learning), and different storage requirements.
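For readers unfamiliar with the two techniques the abstract builds on, the following is a minimal Python sketch of minwise hashing and its b-bit variant for estimating the resemblance (Jaccard similarity) of two sets. It is an illustration under stated assumptions, not the project's own algorithm: all function names are hypothetical, a keyed BLAKE2b hash stands in for a true random permutation, and the b-bit estimator uses the simplified chance-collision correction that holds approximately for very sparse data (the full estimator in the literature also depends on the set sizes).

```python
import hashlib

def hash64(seed, x):
    """Deterministic 64-bit hash of (seed, x); an assumed stand-in for a random permutation."""
    digest = hashlib.blake2b(f"{seed}:{x}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def minhash_signature(items, k):
    """Return k min-hash values of a non-empty set, one per hash function j = 0..k-1."""
    return [min(hash64(j, x) for x in items) for j in range(k)]

def estimate_resemblance(sig1, sig2):
    """Fraction of positions where two equal-length signatures agree."""
    return sum(u == v for u, v in zip(sig1, sig2)) / len(sig1)

def lowest_bits(sig, b):
    """Keep only the lowest b bits of every min-hash value (the b-bit idea)."""
    mask = (1 << b) - 1
    return [v & mask for v in sig]

def estimate_resemblance_b_bit(bits1, bits2, b):
    """Correct the b-bit match rate for chance collisions (sparse-data approximation)."""
    match_rate = estimate_resemblance(bits1, bits2)
    chance = 1.0 / (1 << b)  # probability that two unrelated b-bit values coincide
    return (match_rate - chance) / (1.0 - chance)

if __name__ == "__main__":
    s1, s2 = set(range(0, 300)), set(range(100, 400))  # true resemblance is 200/400
    sig1, sig2 = minhash_signature(s1, 512), minhash_signature(s2, 512)
    print("full min-hash estimate:", estimate_resemblance(sig1, sig2))
    print("2-bit estimate:", estimate_resemblance_b_bit(
        lowest_bits(sig1, 2), lowest_bits(sig2, 2), 2))
```

Storing 2 bits instead of 64 per hash value cuts storage by a factor of 32, at the cost of a somewhat higher estimator variance for the same number of hash functions.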
Anticipated results of the proposed research include rigorous and computationally efficient hashing algorithms for dealing with ultra-high-dimensional datasets; the integration of the resulting hashing algorithms with a variety of learning techniques for classification, clustering, near-neighbor search, singular value decomposition, matrix factorization, etc.; and rigorous experimental evaluation of the resulting methods on big (e.g., terabyte or potentially petabyte) data with up to 2^64 dimensions. Broader Impacts: Effective approaches to building predictive models from extremely high-dimensional data can impact many areas of science that rely on machine learning as the primary methodology for knowledge acquisition from data. The PI's education and outreach efforts aim to broaden the participation of women and underrepresented groups. The publications, software, and datasets resulting from the project will be freely disseminated to the larger scientific community.
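The abstract does not say how hashed data would be fed to a learner. One plausible construction, shown here purely as an assumption, expands each of the k stored b-bit hash values into a one-hot block of length 2^b, so that the inner product of two expanded vectors equals the number of positions where the b-bit values match. A linear classifier can then be trained on these sparse vectors. The function names below are hypothetical.

```python
def expand_b_bit(values, b):
    """Map k b-bit values to the indices of the ones in a (k * 2**b)-dimensional binary vector.

    Position j owns the index block [j * 2**b, (j + 1) * 2**b); the stored value
    selects one index inside that block (an assumed one-hot encoding).
    """
    width = 1 << b
    indices = []
    for j, v in enumerate(values):
        if not 0 <= v < width:
            raise ValueError(f"value {v} does not fit in {b} bits")
        indices.append(j * width + v)
    return indices

def sparse_inner_product(idx1, idx2):
    """Inner product of two binary vectors given as lists of their non-zero indices."""
    return len(set(idx1) & set(idx2))

if __name__ == "__main__":
    a = expand_b_bit([3, 0, 2, 1], b=2)
    c = expand_b_bit([3, 1, 2, 0], b=2)
    print(a, c, sparse_inner_product(a, c))
```

Each expanded vector has exactly k non-zero entries regardless of the original dimensionality, which is what makes this kind of encoding attractive for linear learning on very high-dimensional data.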
Project outcomes
Journal papers (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

No data available
Data last updated: 2024-06-01
Other publications by Ping Li
Some results by energy methods on large-time behavior of viscous gas
- DOI:
- Publication date: 2012
- Journal:
- Impact factor: 0
- Authors: Ping Li; Kefeng Liu; X. Huang; Martin Guest; X. Huang; A. Matsumura
- Corresponding author: A. Matsumura
Global strong solution to the 2D nonhomogeneous density-dependent incompressible MHD and Navier-Stokes system
- DOI:
- Publication date: 2013
- Journal:
- Impact factor: 0
- Authors: Ping Li; Kefeng Liu; X. Huang; Martin Guest; X. Huang
- Corresponding author: X. Huang
Some applications of Hirzebruch \chi_y genus
- DOI:
- Publication date: 2012
- Journal:
- Impact factor: 0
- Authors: Ping Li; Kefeng Liu; X. Huang; Martin Guest; X. Huang; A. Matsumura; Martin Guest; X. Huang; Ping Li
- Corresponding author: Ping Li
Elastic anisotropies and thermal conductivities of WB2 diborides in different crystal structures: A first-principles calculation
- DOI: 10.1016/j.jallcom.2018.03.109
- Publication date: 2018-05
- Journal:
- Impact factor: 6.2
- Authors: Ping Li; Lishi Ma; Mingjun Peng; Baipo Shu; Yonghua Duan
- Corresponding author: Yonghua Duan
Large aperture and non-critical phase-matched fourth harmonic generation of Nd:Glass lasers
- DOI: 10.1063/1.5087453
- Publication date: 2019-04
- Journal:
- Impact factor: 5.1
- Authors: Fang Wang; Fuquan Li; Wei Han; Wei Wang; Ping Li; Lidan Zhou; Yong Xiang; Bin Feng; Xuewei Deng; Jingqin Su; Qihua Zhu
- Corresponding author: Qihua Zhu
Other grants by Ping Li
Collaborative Research: Study of A- and B-class dye-decolorizing peroxidases (DyPs): From molecular mechanisms to applications in dye removal and lignin degradation
- Award number: 1807532
- Fiscal year: 2018
- Funding amount: about $475,100
- Award type: Standard Grant
Efficient Data Reduction and Summarization
- Award number: 1444124
- Fiscal year: 2014
- Funding amount: about $475,100
- Award type: Continuing Grant
Neurocognitive Mechanisms of Second Language Learning: Role of Learning Context and Cognitive Functions
- Award number: 1338946
- Fiscal year: 2013
- Funding amount: about $475,100
- Award type: Standard Grant
III: Small: Probabilistic Hashing for Efficient Search Learning
- Award number: 1360971
- Fiscal year: 2013
- Funding amount: about $475,100
- Award type: Continuing Grant
BIGDATA: Small: DA: A Random Projection Approach
- Award number: 1419210
- Fiscal year: 2013
- Funding amount: about $475,100
- Award type: Standard Grant
BIGDATA: Small: DA: A Random Projection Approach
- Award number: 1250914
- Fiscal year: 2013
- Funding amount: about $475,100
- Award type: Standard Grant
EAGER: Preliminary Study of Hashing Algorithms for Large-Scale Learning
- Award number: 1249316
- Fiscal year: 2012
- Funding amount: about $475,100
- Award type: Standard Grant
Collaborative Research: Cross-Language Lexical Interaction
- Award number: 1057877
- Fiscal year: 2011
- Funding amount: about $475,100
- Award type: Standard Grant
Efficient Data Reduction and Summarization
- Award number: 0808864
- Fiscal year: 2008
- Funding amount: about $475,100
- Award type: Continuing Grant
RUI: Self-organization and the Acquisition, Representation, and Processing of Language
- Award number: 0131829
- Fiscal year: 2003
- Funding amount: about $475,100
- Award type: Continuing Grant
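Similar grants from the National Natural Science Foundation of China (NSFC)
Reliability-based design optimization methods for turbine disk fatigue life under high-dimensional, small failure probabilities
- Award number: 12302154
- Approval year: 2023
- Funding amount: CNY 300,000
- Award type: Young Scientists Fund
Influence mechanism and nudging effect of average-risk time representation on decisions about low-probability persistent risks
- Award number: 72001158
- Approval year: 2020
- Funding amount: CNY 240,000
- Award type: Young Scientists Fund
Theory and application of resampling-based assessment of low-probability, high-impact risks in power systems
- Award number:
- Approval year: 2019
- Funding amount: CNY 570,000
- Award type: General Program
A joint wavelet-analysis and probability-imaging method for self-potential data and its exploratory application to landslide monitoring
- Award number: 41874082
- Approval year: 2018
- Funding amount: CNY 650,000
- Award type: General Program
Reliability analysis methods based on active-learning Kriging models under small failure probabilities
- Award number: 51705433
- Approval year: 2017
- Funding amount: CNY 250,000
- Award type: Young Scientists Fund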
Similar overseas grants
III: Small: Scalable Probabilistic Inference for Large Knowledge Bases
- Award number: 1614738
- Fiscal year: 2016
- Funding amount: about $475,100
- Award type: Standard Grant
III: Small: Efficient Query Processing over Large Probabilistic Knowledge Bases
- Award number: 1526753
- Fiscal year: 2015
- Funding amount: about $475,100
- Award type: Standard Grant
III: Small: Collaborative Research: Probabilistic Models using Generalized Exponential Families
- Award number: 1564765
- Fiscal year: 2015
- Funding amount: about $475,100
- Award type: Standard Grant
III: Small: Probabilistic Hashing for Efficient Search Learning
- Award number: 1360971
- Fiscal year: 2013
- Funding amount: about $475,100
- Award type: Continuing Grant
III: Small: Collaborative Research: Probabilistic Models using Generalized Exponential Families
- Award number: 1117705
- Fiscal year: 2011
- Funding amount: about $475,100
- Award type: Standard Grant