Statistical Analysis of Complex Featured Data: High Dimensionality, Measurement Error and Missing Values

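Basic information

  • Grant number:
    RGPIN-2018-03819
  • Principal investigator:
  • Amount:
    $27,100
  • Host institution:
  • Host institution country:
    Canada
  • Project category:
    Discovery Grants Program - Individual
  • Fiscal year:
    2020
  • Funding country:
    Canada
  • Period:
    2020-01-01 to 2021-12-31
  • Project status:
    Completed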

Project abstract

With advances in modern data-acquisition technology, data with diverse features are more accessible than ever before. The increasing complexity of data structures and the high dimension of data have created an urgent need for novel and flexible modeling and analysis tools. While many complex features may be present in different applications, this research focuses on two issues that are prevalent in modern data: the quality and the dimensionality of data. I plan to explore important problems in the following areas.

(1) High dimensional data with measurement error and missing values. In the era of Big Data, large scale data are often available in which the dimension of the variables is much larger than the number of subjects in the study. This presents a great challenge to traditional statistical methods, which normally require the sample size to exceed the dimension of the variables. In addition, we face challenges related to data quality, namely measurement imprecision and missing observations. This research aims to investigate problems concerning high dimensionality, measurement error, and missing observations. The plan is to examine how measurement error and missing values interact in the analysis of high dimensional data. The objective is to develop valid inference methods for data with all of these features. Applications of the developed methods to survival data, image data and longitudinal data are planned.

(2) Causal inference with complex featured data. As opposed to association studies, causal inference is often the focus of empirical research. While many methods are available for various settings, they are vulnerable to poor quality data. Most existing methods require that the data be “perfect” in the sense that neither missing observations nor measurement error is present, but these assumptions are often violated in practice. Measurement error and missing observations have been a long-standing concern in many fields, including epidemiological, nutritional and environmental studies. However, research on causal inference with these features is rather limited, and much remains unexplored. I plan to explore this exciting area and develop new methods to address the complex effects of measurement error and/or missing observations on causal inference. Furthermore, I intend to investigate these problems in the presence of large scale data where the dimension of potential confounders is high.

My primary goals are to develop original and innovative methodology that advances foundational work and to facilitate applications. This research is anticipated to provide valuable insights into making the best use of available large scale data and to broaden the scope of existing strategies and research. It is expected to have a significant impact on the statistical community as well as on other fields, including public health, medical studies and data science.

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Yi, Grace

The Effect of Intimate Partner Violence and Probable Traumatic Brain Injury on Mental Health Outcomes for Black Women
  • DOI:
    10.1080/10926771.2019.1587657
  • Published:
    2019-01-01
  • Journal:
  • Impact factor:
    1.8
  • Authors:
    Cimino, Andrea N.; Yi, Grace; Stockman, Jamila K.
  • Corresponding author:
    Stockman, Jamila K.
Assessing trauma and related distress in refugee youth and their caregivers: should we be concerned about iatrogenic effects?
  • DOI:
    10.1007/s00787-020-01635-z
  • Published:
    2021-09
  • Journal:
  • Impact factor:
    6.4
  • Authors:
    Greene, M. Claire; Kane, Jeremy C.; Bolton, Paul; Murray, Laura K.; Wainberg, Milton L.; Yi, Grace; Sim, Amanda; Puffer, Eve; Ismael, Abdulkadir; Hall, Brian J.
  • Corresponding author:
    Hall, Brian J.

Other grants held by Yi, Grace

Statistical Analysis of Complex Featured Data: High Dimensionality, Measurement Error and Missing Values
  • Grant number:
    RGPIN-2018-03819
  • Fiscal year:
    2022
  • Amount:
    $27,100
  • Project category:
    Discovery Grants Program - Individual
Data Science
  • Grant number:
    CRC-2019-00427
  • Fiscal year:
    2022
  • Amount:
    $27,100
  • Project category:
    Canada Research Chairs
Statistical Analysis of Complex Featured Data: High Dimensionality, Measurement Error and Missing Values
  • Grant number:
    RGPIN-2018-03819
  • Fiscal year:
    2021
  • Amount:
    $27,100
  • Project category:
    Discovery Grants Program - Individual
Data Science
  • Grant number:
    CRC-2019-00427
  • Fiscal year:
    2021
  • Amount:
    $27,100
  • Project category:
    Canada Research Chairs
Data Science
  • Grant number:
    CRC-2019-00427
  • Fiscal year:
    2020
  • Amount:
    $27,100
  • Project category:
    Canada Research Chairs
Statistical Analysis of Complex Featured Data: High Dimensionality, Measurement Error and Missing Values
  • Grant number:
    RGPIN-2018-03819
  • Fiscal year:
    2020
  • Amount:
    $27,100
  • Project category:
    Discovery Grants Program - Individual
Data Science
  • Grant number:
    CRC-2019-00427
  • Fiscal year:
    2019
  • Amount:
    $27,100
  • Project category:
    Canada Research Chairs
Statistical Analysis of Complex Featured Data: High Dimensionality, Measurement Error and Missing Values
  • Grant number:
    RGPIN-2018-03819
  • Fiscal year:
    2019
  • Amount:
    $27,100
  • Project category:
    Discovery Grants Program - Individual
Statistical Analysis of Complex Featured Data: High Dimensionality, Measurement Error and Missing Values
  • Grant number:
    RGPIN-2018-03819
  • Fiscal year:
    2018
  • Amount:
    $27,100
  • Project category:
    Discovery Grants Program - Individual
Statistical Methods on Challenging Issues of Biosciences
  • Grant number:
    239733-2013
  • Fiscal year:
    2017
  • Amount:
    $27,100
  • Project category:
    Discovery Grants Program - Individual

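Similar grants from the National Natural Science Foundation of China

Research on fault monitoring and diagnosis methods for complex equipment based on multivariate statistical analysis
  • Grant number:
  • Approval year:
    2022
  • Amount:
    CNY 540,000
  • Project category:
    General Program
Analysis and statistical inference based on high-dimensional medical imaging and complex dependent survival data
  • Grant number:
    12226416
  • Approval year:
    2022
  • Amount:
    CNY 450,000
  • Project category:
    General Program
Modeling strategies and statistical methods for two-stage transcriptome-wide association analysis of complex diseases based on Bayesian nonparametric genetic prediction
  • Grant number:
  • Approval year:
    2021
  • Amount:
    CNY 550,000
  • Project category:
    General Program
Statistical analysis of causal inference problems for complex survival data in the presence of interventions
  • Grant number:
  • Approval year:
    2021
  • Amount:
    CNY 300,000
  • Project category:
    Young Scientists Fund
Integrative analysis and statistical inference for complex network data
  • Grant number:
    12171079
  • Approval year:
    2021
  • Amount:
    CNY 500,000
  • Project category:
    General Program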

Similar overseas grants

Time series clustering to identify and translate time-varying multipollutant exposures for health studies
  • Grant number:
    10749341
  • Fiscal year:
    2024
  • Amount:
    $27,100
  • Project category:
REU Site: University of North Carolina at Greensboro - Complex Data Analysis using Statistical and Machine Learning Tools
  • Grant number:
    2244160
  • Fiscal year:
    2023
  • Amount:
    $27,100
  • Project category:
    Standard Grant
Bayesian genetic association analysis of all rare diseases in the Kids First cohort
  • Grant number:
    10643463
  • Fiscal year:
    2023
  • Amount:
    $27,100
  • Project category:
Neuroimaging Dimensions at the Extremes of the Schizophrenia Spectrum
  • Grant number:
    10753887
  • Fiscal year:
    2023
  • Amount:
    $27,100
  • Project category:
New approaches for leveraging single-cell data to identify disease-critical genes and gene sets
  • Grant number:
    10768004
  • Fiscal year:
    2023
  • Amount:
    $27,100
  • Project category: