CAREER: Towards Trustworthy Analytics

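Basic Information

Award number:
1942429
Principal investigator:
Khairi Reda
Amount:
About $538,100
Institution:
Indiana University
Institution country:
United States
Award type:
Continuing Grant
Fiscal year:
2020
Funding country:
United States
Project period:
2020-05-01 to 2025-04-30
Project status:
Ongoing (not yet closed)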

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1942429&HistoricalAwards=false
Keywords:
CAREER Towards Trustworthy Analytics

Project Abstract

Tools that create visualizations of data are increasingly important for discovery and decision-making in a range of domains, from science and engineering to commerce. Data analysts use these tools to rapidly slice and dice their data, often inspecting a large number of visualizations in the process. Though useful for exploration, these visualizations can also expose random data fluctuations, which could be mistaken for real patterns. If analysts are not careful in interpreting these apparent patterns, they could inadvertently make false discoveries or take incorrect decisions. The goal of this research is to reduce the risk from spurious patterns arising in interactive data analyses. The project comprises three stages: (1) developing techniques for capturing analyst beliefs, expectations, and intentions as they conduct visual analysis; (2) using this data to develop algorithms that forecast the reliability of emerging visualizations; and (3) evaluating strategies for communicating the risk of false patterns. The resulting techniques will be validated and incorporated in tools for detecting RNA modifications from noisy sequencing data, in collaboration with bioinformatics researchers. The expected impact of this project is to aid analysts in assessing the reliability of insights, while guarding against visualizations that seem convincing but that are likely to be misleading. This in turn could broaden the adoption of visual analytics tools, increase the confidence in conclusions, and potentially reduce the incidence of false discovery. As part of this research, the team will develop interactive educational materials for training students in reliable data-driven inference. These learning modules will be disseminated in a format that allows customization by data science instructors for inclusion into existing curricula. Lastly, the project will provide opportunities for graduate research training and incorporate K-12 outreach activities that introduce young learners to data science.

The project comprises three main activities: (1) Prototyping techniques to incrementally elicit analysts' belief and prior knowledge as they make sense of data. The elicited knowledge will then be used to distinguish between a gamut of intentions: from planned analyses with substantive hypotheses, to purely exploratory actions with minimal expectations. (2) The project will next develop a model to predict the reliability of apparent patterns and insights unearthed at different points in the analysis cycle. To build this model, the research team will use a variety of features, including the specificity of analyst intents, the degree to which their expectations are borne out in the data, as well as their behavior and interactions with visualizations. The elicitation techniques and the insight reliability model will then be refined in a series of visual analysis studies and through crowdsourced experiments, in which participants' declared priors and discoveries are used to improve the accuracy of the model in forecasting spurious patterns. Lastly, (3) the project will identify and characterize strategies for communicating the risk of spurious insights to analysts in real time. In particular, the team will evaluate techniques for directly visualizing risk indicators, as well as indirect methods whereby the visual encodings of the data will be adjusted depending on how risky it is predicted to be. The developed interventions will be evaluated both in experiments and in a bioinformatics application, to assess whether they reduce the rate of false discovery. The expected results include new methods for eliciting analyst beliefs, techniques to forecast and communicate the trustworthiness of insights, and instructional materials for teaching robust data analytic practices. The products will be disseminated in publications, and in the form of open-source software and learning modules.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outcomes

Journal papers (6)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Color Nameability Predicts Inference Accuracy in Spatial Visualizations

DOI:
10.1111/cgf.14288
Publication date:
2021-06
Journal:
Computer Graphics Forum
Impact factor:
2.5
Authors:
K. Reda; Ameya Salvi; Jack Gray; M. Papka
Corresponding authors:
K. Reda; Ameya Salvi; Jack Gray; M. Papka

Data Prophecy: Exploring the Effects of Belief Elicitation in Visual Analytics

DOI:
10.1145/3411764.3445798
Publication date:
2021-01
Journal:
Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems
Impact factor:
0
Authors:
Ratanond Koonchanok; Parul Baser; Abhinav Sikharam; Nirmal Kumar Raveendranath; K. Reda
Corresponding authors:
Ratanond Koonchanok; Parul Baser; Abhinav Sikharam; Nirmal Kumar Raveendranath; K. Reda

Rainbow Colormaps: What are They Good and Bad for?

DOI:
10.1109/tvcg.2022.3214771
Publication date:
2023-12-01
Journal:
IEEE Transactions on Visualization and Computer Graphics
Impact factor:
5.2
Authors:
Reda, Khairi
Corresponding author:
Reda, Khairi

Rainbows Revisited: Modeling Effective Colormap Design for Graphical Inference

DOI:
10.1109/tvcg.2020.3030439
Publication date:
2020-08
Journal:
IEEE Transactions on Visualization and Computer Graphics
Impact factor:
5.2
Authors:
K. Reda; D. Szafir
Corresponding authors:
K. Reda; D. Szafir

Visual Belief Elicitation Reduces the Incidence of False Discovery

DOI:
10.1145/3544548.3580808
Publication date:
2023
Journal:
Proceedings of CHI 2023: ACM Conference on Human Factors in Computing Systems
Impact factor:
0
Authors:
Koonchanok, Ratanond; Tawde, Gauri Yatindra; Narayanasamy, Gokul Ragunandhan; Walimbe, Shalmali; Reda, Khairi
Corresponding author:
Reda, Khairi