CAREER: A Theoretical Exploration of Efficient and Accurate Clustering Algorithms

职业生涯:高效准确聚类算法的理论探索

基本信息

  • 批准号:
    2337832
  • 负责人:
  • 金额:
    $ 64.77万
  • 依托单位:
  • 依托单位国家:
    美国
  • 项目类别:
    Continuing Grant
  • 财政年份:
    2024
  • 资助国家:
    美国
  • 起止时间:
    2024-05-01 至 2029-04-30
  • 项目状态:
    未结题

项目摘要

Clustering, a fundamental technique in data analysis and machine learning, involves grouping data points to ensure higher similarities within a group (cluster) than across clusters. Since its inception in the early 20th century, clustering has proven highly valuable in diverse fields such as biology, economics, marketing, statistics, computer science, and social network analysis. This CAREER project aims to create a unified framework along with various tools and techniques to design optimal approximation algorithms for a broad range of NP-hard clustering problems. Despite significant progress in developing efficient approximations for computationally hard clustering problems, the analysis of the large-scale and complex datasets remains challenging resulting in suboptimal accuracy and limiting practical applications in science and engineering. The algorithmic development and analysis are expected to have a direct impact on the fields of data science and bioinformatics. The educational components of the project include a 3-day summer workshop for high school students, training and mentoring of graduate students, development of new courses, and the fostering of student participation from underrepresented groups in theoretical computer science.The project comprises two primary thrusts. The first thrust focuses on addressing clustering challenges in strings with special emphasis on centroid-based clustering problems such as k-median and k-center. This study will encompass various metric spaces including edit distance, Ulam, and Kendall tau. The objective is to formulate methodologies that transcend the limitations set by the triangle inequality, yielding polynomial-time algorithms with approximations arbitrarily close to one. The second thrust focuses on issues related to hierarchical clustering. Here the objective is to evaluate how well the techniques and frameworks originally devised for addressing string clustering problems can be adapted for aggregating hierarchical clusters. Additionally, the project will develop new methods for constructing hierarchical clusters specifically tailored to address challenges related to large datasets. The new tools and combinatorial methods developed in this project are expected to enhance algorithms for clustering problems and should find applicability in various other domains including communication complexity, streaming algorithms, and distributed systems.This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
聚类是数据分析和机器学习中的一项基本技术,涉及对数据点进行分组,以确保组(簇)内的相似性高于簇间的相似性。自 20 世纪初诞生以来,集群已被证明在生物学、经济学、市场营销、统计学、计算机科学和社交网络分析等不同领域具有极高的价值。该 CAREER 项目旨在创建一个统一的框架以及各种工具和技术,为广泛的 NP 难聚类问题设计最佳逼近算法。尽管在开发计算困难聚类问题的有效近似方法方面取得了重大进展,但对大规模和复杂数据集的分析仍然具有挑战性,导致精度不理想并限制了科学和工程中的实际应用。算法的开发和分析预计将对数据科学和生物信息学领域产生直接影响。该项目的教育组成部分包括为高中生举办为期 3 天的夏季研讨会、研究生培训和指导、新课程开发以及促进理论计算机科学领域代表性不足群体的学生参与。该项目包括两个主要目标。第一个重点是解决字符串中的聚类挑战,特别强调基于质心的聚类问题,例如 k 中值和 k 中心。这项研究将涵盖各种度量空间,包括编辑距离、Ulam 和 Kendall tau。目标是制定超越三角不等式限制的方法,产生近似值任意接近于 1 的多项式时间算法。第二个重点关注与层次聚类相关的问题。这里的目标是评估最初为解决字符串聚类问题而设计的技术和框架如何适用于聚合层次聚类。此外,该项目将开发新的方法来构建专门定制的分层集群,以解决与大型数据集相关的挑战。该项目中开发的新工具和组合方法预计将增强聚类问题的算法,并应在其他各个领域找到适用性,包括通信复杂性、流算法和分布式系统。该奖项反映了 NSF 的法定使命,并被认为值得支持通过使用基金会的智力优点和更广泛的影响审查标准进行评估。

项目成果

期刊论文数量(0)
专著数量(0)
科研奖励数量(0)
会议论文数量(0)
专利数量(0)

数据更新时间:{{ journalArticles.updateTime }}

{{ item.title }}
{{ item.translation_title }}
  • DOI:
    {{ item.doi }}
  • 发表时间:
    {{ item.publish_year }}
  • 期刊:
  • 影响因子:
    {{ item.factor }}
  • 作者:
    {{ item.authors }}
  • 通讯作者:
    {{ item.author }}

数据更新时间:{{ journalArticles.updateTime }}

{{ item.title }}
  • 作者:
    {{ item.author }}

数据更新时间:{{ monograph.updateTime }}

{{ item.title }}
  • 作者:
    {{ item.author }}

数据更新时间:{{ sciAawards.updateTime }}

{{ item.title }}
  • 作者:
    {{ item.author }}

数据更新时间:{{ conferencePapers.updateTime }}

{{ item.title }}
  • 作者:
    {{ item.author }}

数据更新时间:{{ patent.updateTime }}

Debarati Das其他文献

AdBERT: An Effective Few Shot Learning Framework for Aligning Tweets to Superbowl Advertisements
AdBERT:一种有效的少数镜头学习框架,用于将推文与超级碗广告对齐
  • DOI:
  • 发表时间:
    2022
  • 期刊:
  • 影响因子:
    0
  • 作者:
    Debarati Das;Roopana Chenchu;M. Abdollahi;J. Huh;Jaideep Srivastava
  • 通讯作者:
    Jaideep Srivastava
Improved Approximation Algorithms for Dyck Edit Distance and RNA Folding
改进的 Dyck 编辑距离和 RNA 折叠近似算法
A Linear-Time n0.4-Approximation for Longest Common Subsequence
最长公共子序列的线性时间n0.4近似
  • DOI:
    10.1145/3568398
  • 发表时间:
    2021
  • 期刊:
  • 影响因子:
    1.3
  • 作者:
    K. Bringmann;Vincent Cohen;Debarati Das
  • 通讯作者:
    Debarati Das
A Near-Optimal Offline Algorithm for Dynamic All-Pairs Shortest Paths in Planar Digraphs
平面有向图中动态全对最短路径的近最优离线算法
Theory of Computing
计算理论
  • DOI:
    10.4086/toc
  • 发表时间:
    2013
  • 期刊:
  • 影响因子:
    0
  • 作者:
    Alexandr Andoni;Nikhil Bansal;P. Beame;Giuseppe Italiano;Sanjeev Khanna;Ryan O’Donnell;T. Pitassi;T. Rabin;Tim Roughgarden;Clifford Stein;Rocco Servedio;Amir Abboud;Nima Anari;Ibm Srinivasan Arunachalam;T. J. Watson;Research Center;Petra Berenbrink;Aaron Bernstein;Aditya Bhaskara;Sayan Bhattacharya;Eric Blais;H. Bodlaender;Adam Bouland;Anne Broadbent;Mark Bun;Timothy Chan;Arkadev Chattopadhyay;Xue Chen;Gil Cohen;Dana Dachman;Anindya De;Shahar Dobzhinski;Zhiyi Huang;Ken;Robin Kothari;Marvin Künnemann;Tu Kaiserslautern;Rasmus Kyng;E. Zurich;Sophie Laplante;D. Lokshtanov;S. Mahabadi;Nicole Megow;Ankur Moitra;Technion Shay Moran;Google Research;Christopher Musco;Prasad Raghavendra;Alex Russell;Laura Sanità;Alex Slivkins;David Steurer;Epfl Ola Svensson;Chaitanya Swamy;Madhur Tulsiani;Christos Tzamos;Andreas Wiese;Mary Wootters;Huacheng Yu;Aaron Potechin;Aaron Sidford;Aarushi Goel;Aayush Jain;Abhiram Natarajan;Abhishek Shetty;Adam Karczmarz;Adam O’Neill;Aditi Dudeja;Aditi Laddha;Aditya Krishnan;Adrian Vladu Afrouz;J. Ameli;Ainesh Bakshi;Akihito Soeda;Akshay Krishnamurthy;Albert Cheu;A. Grilo;Alex Wein;Alexander Belov;Alexander Block;Alexander Golovnev;Alexander Poremba;Alexander Shen;Alexander Skopalik;Alexandra Henzinger;Alexandros Hollender;Ali Parviz;Alkis Kalavasis;Allen Liu;Aloni Cohen;Amartya Shankha;Biswas Amey;Bhangale Amin;Coja;Yehudayoff Amir;Zandieh Amit;Daniely Amit;Kumar Amnon;Ta;Beimel Anand;Louis Anand Natarajan;Anders Claesson;André Chailloux;André Nusser;Andrea Coladangelo;Andrea Lincoln;Andreas Björklund;Andreas Maggiori;A. Krokhin;A. Romashchenko;Andrej Risteski;Anirban Chowdhury;Anirudh Krishna;A. Mukherjee;Ankit Garg;Anna Karlin;Anthony Leverrier;Antonio Blanca;A. Antoniadis;Anupam Gupta;Anupam Prakash;A. Singh;Aravindan Vijayaraghavan;Argyrios Deligkas;Ariel Kulik;Ariel Schvartzman;Ariel Shaulker;A. Cornelissen;Arka Rai;Choudhuri Arkady;Yerukhimovich Arnab;Bhattacharyya Arthur Mehta;Artur Czumaj;A. Backurs;A. Jambulapati;Ashley Montanaro;A. Sah;A. Mantri;Aviad Rubinstein;Avishay Tal;Badih Ghazi;Bartek Blaszczyszyn;Benjamin Moseley;Benny Pinkas;Bento Natura;Bernhard Haeupler;Bill Fefferman;B. Mance;Binghui Peng;Bingkai Lin;B. Sinaimeri;Bo Waggoner;Bodo Manthey;Bohdan Kivva;Brendan Lucier Bundit;Laekhanukit Burak;Sahinoglu Cameron;Seth Chaodong Zheng;Charles Carlson;Chen;Chenghao Guo;Chenglin Fan;Chenwei Wu;Chethan Kamath;Chi Jin;J. Thaler;Jyun;Kaave Hosseini;Kaito Fujii;Kamesh Munagala;Kangning Wang;Kanstantsin Pashkovich;Karl Bringmann Karol;Wegrzycki Karteek;Sreenivasaiah Karthik;Chandrasekaran Karthik;Sankararaman Karthik;C. S. K. Green;Larsen Kasturi;Varadarajan Keita;Xagawa Kent Quanrud;Kevin Schewior;Kevin Tian;Kilian Risse;Kirankumar Shiragur;K. Pruhs;K. Efremenko;Konstantin Makarychev;Konstantin Zabarnyi;Krišj¯anis Pr¯usis;Kuan Cheng;Kuikui Liu;Kunal Marwaha;Lars Rohwedder László;Kozma László;A. Végh;L'eo Colisson;Leo de Castro;Leonid Barenboim Letong;Li;Li;L. Roditty;Lieven De;Lathauwer Lijie;Chen Lior;Eldar Lior;Rotem Luca Zanetti;Luisa Sinisclachi;Luke Postle;Luowen Qian;Lydia Zakynthinou;Mahbod Majid;Makrand Sinha;Malin Rau Manas;Jyoti Kashyop;Manolis Zampetakis;Maoyuan Song;Marc Roth;Marc Vinyals;Marcin Bieńkowski;Marcin Pilipczuk;Marco Molinaro;Marcus Michelen;Mark de Berg;M. Jerrum;Mark Sellke;Mark Zhandry;Markus Bläser;Markus Lohrey;Marshall Ball;Marthe Bonamy;Martin Fürer;Martin Hoefer;M. Kokainis;Masahiro Hachimori;Matteo Castiglioni;Matthias Englert;Matti Karppa;Max Hahn;Max Hopkins;Maximilian Probst;Gutenberg Mayank Goswami;Mehtaab Sawhney;Meike Hatzel;Meng He;Mengxiao Zhang;Meni Sadigurski;M. Parter;M. Dinitz;Michael Elkin;Michael Kapralov;Michael Kearns;James R. Lee;Sudatta Bhattacharya;Michal Koucký;Hadley Black;Deeparnab Chakrabarty;C. Seshadhri;Mahsa Derakhshan;Naveen Durvasula;Nika Haghtalab;Peter Kiss;Thatchaphol Saranurak;Soheil Behnezhad;M. Roghani;Hung Le;Shay Solomon;Václav Rozhon;Anders Martinsson;Christoph Grunau;G. Z. —. Eth;Zurich;Switzerland;Morris Yau — Massachusetts;Noah Golowich;Dhruv Rohatgi — Massachusetts;Qinghua Liu;Praneeth Netrapalli;Csaba Szepesvári;Debarati Das;Jacob Gilbert;Mohammadtaghi Hajiaghayi;Tomasz Kociumaka;B. Saha;K. Bringmann;Nick Fischer — Weizmann;Ce Jin;Yinzhan Xu — Massachusetts;Virginia Vassilevska Williams;Yinzhan Xu;Josh Alman;Kevin Rao;Hamed Hatami;—. XiangMeng;McGill University;Edith Cohen;Xin Lyu;Tamás Jelani Nelson;Uri Stemmer — Google;Research;Daniel Alabi;Pravesh K. Kothari;Pranay Tankala;Prayaag Venkat;Fred Zhang;Samuel B. Hopkins;Gautam Kamath;Shyam Narayanan — Massachusetts;Marco Gaboardi;R. Impagliazzo;Rex Lei;Satchit Sivakumar;Jessica Sorrell;T. Korhonen;Marco Bressan;Matthias Lanzinger;Huck Bennett;Mahdi Cheraghchi;V. Guruswami;João Ribeiro;Jan Dreier;Nikolas Mählmann;Sebastian Siebertz — TU Wien;The Randomized k ;Conjecture Is;False;Sébastien Bubeck;Christian Coester;Yuval Rabani — Microsoft;Wei;Ethan Mook;Daniel Wichs;Joshua Brakensiek;Sai Sandeep — Stanford;University;Lorenzo Ciardo;Stanislav Živný;Amey Bhangale;Subhash Khot;Dor Minzer;David Ellis;Guy Kindler;Noam Lifshitz;Ronen Eldan;Dan Mikulincer;George Christodoulou;E. Koutsoupias;Annamária Kovács;José Correa;Andrés Cristi;Xi Chen;Matheus Venturyne;Xavier Ferreira;David C. Parkes;Yang Cai;Jinzhao Wu;Zhengyang Liu;Zeyu Ren;Zihe Wang;Ravishankar Krishnaswamy;Shi Li;Varun Suriyanarayana
  • 通讯作者:
    Varun Suriyanarayana

Debarati Das的其他文献

{{ item.title }}
{{ item.translation_title }}
  • DOI:
    {{ item.doi }}
  • 发表时间:
    {{ item.publish_year }}
  • 期刊:
  • 影响因子:
    {{ item.factor }}
  • 作者:
    {{ item.authors }}
  • 通讯作者:
    {{ item.author }}

{{ truncateString('Debarati Das', 18)}}的其他基金

Travel: NSF Student Travel Grant for TCS for All Meeting at STOC 2023 and Professional Mentoring Panel at FOCS 2023
旅行:为 STOC 2023 的所有会议和 FOCS 2023 的专业指导小组提供 TCS 的 NSF 学生旅行补助金
  • 批准号:
    2326395
  • 财政年份:
    2023
  • 资助金额:
    $ 64.77万
  • 项目类别:
    Standard Grant

相似国自然基金

基于新一代信息技术的复杂油气储层地震勘探理论和方法
  • 批准号:
    42330801
  • 批准年份:
    2023
  • 资助金额:
    231 万元
  • 项目类别:
    重点项目
主动源与被动源数据联合驱动的多震源地震勘探理论方法研究
  • 批准号:
    42130805
  • 批准年份:
    2021
  • 资助金额:
    290 万元
  • 项目类别:
    重点项目
煤系气高效勘探开发的岩石力学地层理论方法体系研究
  • 批准号:
  • 批准年份:
    2020
  • 资助金额:
    300 万元
  • 项目类别:
    重点项目
地震波的形态特征属性与智能识别方法研究
  • 批准号:
    41904098
  • 批准年份:
    2019
  • 资助金额:
    25.0 万元
  • 项目类别:
    青年科学基金项目
深度偏移地震数据特征剖析与深度域直接反演方法研究
  • 批准号:
    41874153
  • 批准年份:
    2018
  • 资助金额:
    63.0 万元
  • 项目类别:
    面上项目

相似海外基金

Sibling-Support for Adolescent Girls (SSAGE): A whole-family, gendertransformative approach to preventing mental illness among forcibly displaced adolescent girls
青春期女孩兄弟姐妹支持 (SSAGE):一种全家庭、性别变革的方法,用于预防被迫流离失所的青春期女孩的精神疾病
  • 批准号:
    10730656
  • 财政年份:
    2023
  • 资助金额:
    $ 64.77万
  • 项目类别:
Exploration of universal planet formation processes through the theoretical and observational studies of chemical structures in protoplanetary disks and exoplanetary atmospheres
通过对原行星盘和系外行星大气中化学结构的理论和观测研究,探索普遍的行星形成过程
  • 批准号:
    23KJ0329
  • 财政年份:
    2023
  • 资助金额:
    $ 64.77万
  • 项目类别:
    Grant-in-Aid for JSPS Fellows
Experimental and theoretical studies of trace species in Earth materials, from mantle geodynamics to environmental applications and mineral exploration
地球材料中痕量物质的实验和理论研究,从地幔地球动力学到环境应用和矿物勘探
  • 批准号:
    RGPIN-2018-04106
  • 财政年份:
    2022
  • 资助金额:
    $ 64.77万
  • 项目类别:
    Discovery Grants Program - Individual
Experimental and theoretical studies of trace species in Earth materials, from mantle geodynamics to environmental applications and mineral exploration
地球材料中痕量物质的实验和理论研究,从地幔地球动力学到环境应用和矿物勘探
  • 批准号:
    RGPIN-2018-04106
  • 财政年份:
    2021
  • 资助金额:
    $ 64.77万
  • 项目类别:
    Discovery Grants Program - Individual
A theoretical exploration of the evolution of wind pollination
风授粉演化的理论探索
  • 批准号:
    566053-2021
  • 财政年份:
    2021
  • 资助金额:
    $ 64.77万
  • 项目类别:
    Alexander Graham Bell Canada Graduate Scholarships - Master's
{{ showInfoDetail.title }}

作者:{{ showInfoDetail.author }}

知道了