SEER RRSS #5 - Constructing Geographic Areas in GIS for Cancer Data Analysis


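Basic Information

  • Grant number:
    7952665
  • Principal investigator:
  • Amount:
    $63,500
  • Host institution:
  • Host institution country:
    United States
  • Project category:
  • Fiscal year:
    2005
  • Funding country:
    United States
  • Project period:
    2005-08-01 to 2010-07-31
  • Project status:
    Completed

Project Abstract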

Compared with other diseases such as cardiovascular disease and diabetes, cancer is relatively rare. The analysis of cancer incidence often suffers from the small population problem, which manifests in unreliable rate estimates, sensitivity to missing data and other data errors, and data suppression in sparsely populated areas. When creating maps of cancer incidence, the choice of areal unit of analysis (e.g., county or parish, zip code, census tract) and the geographic region of interest determine whether there will be sufficient numbers of cases in each area. For example, on the State Cancer Profiles website, cancer rates are mapped at the county or parish level. A map of Louisiana's parish-level incidence rates for cancer of the brain and other nervous system (ONS) would have rates suppressed for 43 (67%) of 64 parishes, while a map of childhood cancer incidence would have rates suppressed for 53 (80%) parishes (see companion proposal from the Louisiana Tumor Registry (LTR)). In contrast, for California, brain/ONS and childhood cancer rates would be suppressed in only 13 (22%) and 21 (36%) of the state's 58 counties, respectively. Meanwhile, rate variations within the largest counties or parishes, such as Orleans, Jefferson, and East Baton Rouge in Louisiana and Los Angeles, San Diego, Alameda, and Santa Clara in California, are not revealed. Rates in these areas have limited value to researchers and concerned citizens interested in describing cancer incidence patterns at finer geographic scales. Furthermore, within these county boundaries are areas with distinct concentrations of racial/ethnic groups and of high and low socioeconomic status that may have different rates of cancer. Incidence rates may be generated for smaller and more homogeneous geographic units such as census tracts.
The total population in a census tract (year 2000), however, ranges between 1,500 and 8,000, with an optimal size of 4,000, which makes these geographic units too small to yield reliable tract-level incidence rates that do not jeopardize patients' privacy and confidentiality.
Several geographic strategies have been proposed to mitigate the problem. Spatial smoothing computes an average rate for each area of interest by incorporating rates in adjacent areas. Spatial smoothing methods include the floating catchment area method, kernel density estimation, empirical Bayes estimation, locally-weighted-average approaches, and adaptive spatial filtering. While spatial smoothing helps reveal the overall trend of spatial patterns (see www.uiowa.edu/iowacancermaps for an example), the result is an estimate of the average rate derived from the area of interest and surrounding areas, and it may not reflect the true rate for the area of interest.
This proposal seeks to construct larger geographic areas from smaller areas so that the total base population is sufficiently large for generating reliable incidence rates. Geography has a long tradition of grouping areas together for the purposes of "regionalization" or identifying "spatial clustering". Traditional methods place the first priority on attribute (e.g., sociodemographic characteristics) similarity within areas, and most are implemented manually or semi-automatically. Attribute information was first used to form initial regions, and then several subjective rules and local knowledge were applied to further adjust the region boundaries. Advancements in geographic information systems (GIS) technology have enabled researchers to develop methods automating the process.
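The spatial smoothing idea described above, in which each area's rate is averaged with the rates of adjacent areas, can be illustrated with a minimal sketch. It is not one of the named methods (floating catchment, kernel density, empirical Bayes, and so on); it simply pools cases and population over an area and its neighbors. The function name `smoothed_rates`, the tract labels, and all numbers are hypothetical.

```python
from typing import Dict, List


def smoothed_rates(cases: Dict[str, int],
                   population: Dict[str, int],
                   neighbors: Dict[str, List[str]],
                   per: int = 100_000) -> Dict[str, float]:
    """Pool each area's cases and population with those of its adjacent areas."""
    result = {}
    for area in cases:
        pool = [area] + neighbors.get(area, [])
        total_cases = sum(cases[a] for a in pool)
        total_pop = sum(population[a] for a in pool)
        result[area] = per * total_cases / total_pop
    return result


# Hypothetical toy data: three tracts in a row, A - B - C.
cases = {"A": 2, "B": 10, "C": 3}
population = {"A": 4_000, "B": 4_000, "C": 4_000}
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}

# Raw (unsmoothed) rates per 100,000 for comparison.
raw = {a: 100_000 * cases[a] / population[a] for a in cases}
smooth = smoothed_rates(cases, population, neighbors)

for a in cases:
    print(a, raw[a], smooth[a])
```

In this toy example the high raw rate in tract B is pulled down and the low rates in A and C are pulled up, which shows the limitation noted in the abstract: the smoothed value describes the neighborhood average rather than the area itself.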
Two other earlier methods emphasized spatial proximity: one used space-filling curves to measure the nearness or spatial order of areal units and then grouped areas consecutively until a capacity constraint was reached; the other constructed regions of approximately equal population size by beginning with an area and adding the nearest areas until each region reached the desired threshold population. Neither of these methods, however, accounts for within-area homogeneity of the attribute.
Most recent work aims to develop GIS-based automated methods that account for spatial contiguity and attribute homogeneity within the derived areas. A preliminary assessment has identified two promising methods. The first is a family of methods, termed "Regionalization with dynamically constrained agglomerative clustering and partitioning (REDCAP)", developed to identify clusters of areas. Using three distance definitions to measure attribute dissimilarity and two constraining strategies to account for spatial contiguity, REDCAP comprises six methods. REDCAP allows users to specify the desired spatial contiguity, attribute dissimilarity, number of derived regions, and other parameters. The second is a modified scale-space clustering (MSSC) method, devised to form a series of geographic areas. Scale-space theory is based on the notion that an image contains structures at different scales, and that its more significant structures can be preserved as the scale of observation becomes coarser. Similar to this operation on an image, the MSSC method merges or melts areas of higher value with surrounding areas of lower values but similar structure to form larger areas. The process is guided by a clear objective of minimizing loss of information. The method does not depend on any probability distribution of the data and is robust for unsupervised hierarchical classification. Like REDCAP, the MSSC method does not guarantee that newly formed areas have a minimum population.
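The earlier proximity-based approach mentioned above (begin with an area and keep adding the nearest areas until the region reaches a threshold population) can be sketched roughly as follows. This is an illustrative reading of that one-sentence description, not the published algorithm: the use of centroid distance, the alphabetical seed choice, the name `grow_regions`, and the toy data are all assumptions.

```python
import math
from typing import Dict, List, Tuple


def grow_regions(centroids: Dict[str, Tuple[float, float]],
                 population: Dict[str, int],
                 threshold: int) -> List[List[str]]:
    """Grow regions from a seed by adding the nearest unassigned areas
    until each region reaches the threshold population."""
    unassigned = set(centroids)
    regions = []
    while unassigned:
        seed = min(unassigned)  # deterministic seed choice (an assumption)
        unassigned.remove(seed)
        region, total = [seed], population[seed]
        while total < threshold and unassigned:
            # Nearest remaining area to the seed; ties broken by name.
            nearest = min(
                unassigned,
                key=lambda a: (math.dist(centroids[seed], centroids[a]), a),
            )
            unassigned.remove(nearest)
            region.append(nearest)
            total += population[nearest]
        regions.append(region)
    return regions


# Hypothetical tracts: two pairs of nearby centroids.
centroids = {"T1": (0.0, 0.0), "T2": (1.0, 0.0), "T3": (5.0, 0.0), "T4": (6.0, 0.0)}
population = {"T1": 3_000, "T2": 3_000, "T3": 3_000, "T4": 3_000}

regions = grow_regions(centroids, population, threshold=6_000)
print(regions)
```

As the abstract notes, a procedure like this looks only at proximity and population: it ignores attribute homogeneity, and in this sketch the last region formed may fall short of the threshold if too few areas remain.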
Both the REDCAP and MSSC methods account for attribute similarity when grouping contiguous areas together. The major difference lies in the objective functions to be optimized during the clustering process. REDCAP minimizes the total heterogeneity value (i.e., the sum of squared deviations over all regions), while MSSC attempts to preserve the overall spatial structure by grouping around local maxima. Both methods have demonstrated advantages over other existing ones when evaluated for total heterogeneity, region size balance, internal variation, preservation of data distribution, and spatial compactness. However, neither method has been applied to cancer studies. Analysis of cancer data merits special attention because of data confidentiality and privacy concerns, and it poses unique challenges such as additional constraints (e.g., creating areas above a threshold population and respecting important geopolitical boundaries).
The proposed project plans to evaluate and modify these two methods to enhance the presentation and visualization of cancer surveillance data by geographic area. The study will combine adjacent similar small areas to mask identity while keeping areas with a sufficient number of cancer cases (e.g., ≥ 15) and population (≥ 50,000) intact.
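To make the stated goal concrete, here is a minimal greedy sketch of merging adjacent, similar small areas until every region meets the example thresholds (≥ 15 cases and ≥ 50,000 population), while leaving areas that already meet them intact where possible. It is neither REDCAP nor MSSC, and it is not the project's method. The merge rule, the function names, and the toy data are assumptions made for illustration. The helper `total_heterogeneity` computes the sum of squared deviations within regions, the quantity the abstract says REDCAP minimizes.

```python
MIN_CASES = 15      # example thresholds quoted in the abstract
MIN_POP = 50_000


def merge_small_areas(cases, population, attribute, adjacency):
    """Greedily merge regions below the thresholds with their most similar
    adjacent region (hypothetical rule, for illustration only)."""
    regions = [{a} for a in sorted(cases)]

    def total(region, values):
        return sum(values[a] for a in region)

    def sufficient(region):
        return (total(region, cases) >= MIN_CASES
                and total(region, population) >= MIN_POP)

    def mean_attr(region):
        return total(region, attribute) / len(region)

    def touches(r1, r2):
        return any(b in r2 for a in r1 for b in adjacency[a])

    while True:
        small = [r for r in regions if not sufficient(r)]
        if not small:
            break
        region = min(small, key=lambda r: total(r, population))
        candidates = [r for r in regions if r is not region and touches(region, r)]
        if not candidates:
            break  # isolated region: this sketch simply stops here
        # Prefer other insufficient neighbors so sufficient areas stay intact.
        preferred = [r for r in candidates if not sufficient(r)] or candidates
        partner = min(preferred, key=lambda r: abs(mean_attr(r) - mean_attr(region)))
        regions.remove(partner)
        regions.remove(region)
        regions.append(region | partner)
    return [sorted(r) for r in regions]


def total_heterogeneity(regions, attribute):
    """Sum of squared deviations from each region's mean attribute value."""
    ssd = 0.0
    for region in regions:
        values = [attribute[a] for a in region]
        mean = sum(values) / len(values)
        ssd += sum((v - mean) ** 2 for v in values)
    return ssd


# Hypothetical areas in a row: A - B - C - D.
cases = {"A": 20, "B": 6, "C": 5, "D": 4}
population = {"A": 60_000, "B": 20_000, "C": 18_000, "D": 15_000}
attribute = {"A": 10.0, "B": 24.0, "C": 25.0, "D": 27.0}  # e.g., percent in poverty
adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}

result = merge_small_areas(cases, population, attribute, adjacency)
print(result, total_heterogeneity(result, attribute))
```

In this toy run, area A already meets both thresholds and is left intact, while B, C, and D are combined into one region with 15 cases and a population of 53,000. A real implementation would also need to respect geopolitical boundaries, which this sketch does not attempt.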

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)



Other grants by VIVIEN CHEN

Quality Control of Electronic Pathology (E-Path) Results; Period of Performance: September 15, 2014 - September 14, 2015
  • Grant number:
    8947591
  • Fiscal year:
    2014
  • Funding amount:
    $63,500
  • Project category:
Patterns of Care (POC) Quality of Care Dx Yr 2011
  • Grant number:
    8565158
  • Fiscal year:
    2012
  • Funding amount:
    $63,500
  • Project category:
SURVEILLANCE, EPIDEMIOLOGY, AND END RESULTS (SEER) PROGRAM
  • Grant number:
    8481440
  • Fiscal year:
    2012
  • Funding amount:
    $63,500
  • Project category:
TAS::75 0849::TAS SURVEILLANCE, EPIDEMIOLOGY, AND END RESULTS (SEER) PROGRAM
  • Grant number:
    8317507
  • Fiscal year:
    2011
  • Funding amount:
    $63,500
  • Project category:
TAS::75 0849::TAS SURVEILLANCE, EPIDEMIOLOGY, AND END RESULTS (SEER) PROGRAM
  • Grant number:
    8317508
  • Fiscal year:
    2011
  • Funding amount:
    $63,500
  • Project category:
TAS::75 0849::TAS SURVEILLANCE, EPIDEMIOLOGY, AND END RESULTS (SEER) PROGRAM
  • Grant number:
    8163677
  • Fiscal year:
    2010
  • Funding amount:
    $63,500
  • Project category:
TAS::75 0849::TAS SURVEILLANCE, EPIDEMIOLOGY, AND END RESULTS (SEER) PROGRAM
  • Grant number:
    8131534
  • Fiscal year:
    2010
  • Funding amount:
    $63,500
  • Project category:
RRSS #12 - Patterns of Care - Diagnosis Year 2008
  • Grant number:
    7962444
  • Fiscal year:
    2005
  • Funding amount:
    $63,500
  • Project category:
Surveillance, Epidemiology and End Results (SEER) Program - LSU
  • Grant number:
    7824258
  • Fiscal year:
    2005
  • Funding amount:
    $63,500
  • Project category:
RRSS #9 - Patterns of Care - Dx 2006 Feasibility Adolescent and Young Adult - LSU
  • Grant number:
    7824260
  • Fiscal year:
    2005
  • Funding amount:
    $63,500
  • Project category:

Similar Overseas Grants

IGF::OT::IGF SEER RRSS IMPROVING OUTPATIENT REPORTING OF CANCER OCCURRENCE AND TREATMENT; 9/19/16-9/18/17
  • Grant number:
    9361196
  • Fiscal year:
    2016
  • Funding amount:
    $63,500
  • Project category:
IGF::OT::IGF SEER RRSS IMPROVING OUTPATIENT REPORTING OF CANCER OCCURRENCE AND TREATMENT; 9/19/16-9/18/17
  • Grant number:
    9361195
  • Fiscal year:
    2016
  • Funding amount:
    $63,500
  • Project category:
RRSS Evaluate Completeness Liver Cancer Reporting Under New Clinical Guidelines
  • Grant number:
    8351018
  • Fiscal year:
    2011
  • Funding amount:
    $63,500
  • Project category:
SEER RRSS Improving SES Data: Linkage State Vital Records, Birth Certificate Data
  • Grant number:
    8351002
  • Fiscal year:
    2011
  • Funding amount:
    $63,500
  • Project category:
RRSS Improving SES Data: Linkage w State Vital Records, Birth Certificate Data
  • Grant number:
    8351016
  • Fiscal year:
    2011
  • Funding amount:
    $63,500
  • Project category: