SEER RRSS #5 - Constructing Geographic Areas in GIS for Cancer Data Analysis
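Basic information
- Grant number: 7952665
- Principal investigator:
- Amount: $63,500
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2005
- Funding country: United States
- Project period: 2005-08-01 to 2010-07-31
- Project status: Completed
- Source:
- Keywords:
Project abstract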
Compared with other diseases such as cardiovascular disease and diabetes, cancer is relatively rare. The analysis of cancer incidence often suffers from the small population problem, which manifests as unreliable rate estimates, sensitivity to missing data and other data errors, and data suppression in sparsely populated areas. When creating maps of cancer incidence, the choice of areal unit of analysis (e.g., county or parish, zip code, census tract) and the geographic region of interest determine whether there will be sufficient numbers of cases in each area. For example, on the State Cancer Profiles website, cancer rates are mapped at the county or parish level. A map of Louisiana's parish-level incidence rates for cancer of the brain and other nervous system (ONS) would have rates suppressed for 43 (67%) of 64 parishes, while a map of childhood cancer incidence would have rates suppressed for 53 (80%) parishes (see companion proposal from the Louisiana Tumor Registry (LTR)). In contrast, for California, brain/ONS and childhood cancer rates would be suppressed in only 13 (22%) and 21 (36%) of the state's 58 counties, respectively. Meanwhile, rate variations within the largest counties or parishes, such as Orleans, Jefferson, and East Baton Rouge in Louisiana and Los Angeles, San Diego, Alameda, and Santa Clara in California, are not revealed. Rates in these areas have limited value to researchers and concerned citizens interested in describing cancer incidence patterns at finer geographic scales. Furthermore, within these county boundaries are areas with distinct concentrations of racial/ethnic groups and of high and low socioeconomic status that may have different rates of cancer. Incidence rates may be generated for smaller and more homogeneous geographic units such as census tracts. The total population in a census tract (year 2000), however, ranges between 1,500 and 8,000 with an optimal size of 4,000, which makes these geographic units too small for estimating reliable tract-level incidence rates that would not jeopardize patients' privacy and confidentiality.
Several geographic strategies have been proposed to mitigate the problem. Spatial smoothing computes an average rate for each area of interest by incorporating rates in adjacent areas. Spatial smoothing methods include the floating catchment area method, kernel density estimation, empirical Bayes estimation, locally weighted average approaches, and adaptive spatial filtering. While spatial smoothing helps reveal the overall trend of spatial patterns (see www.uiowa.edu/iowacancermaps for an example), the result is an estimate of the average rate derived from the area of interest and its surrounding areas, and it may not reflect the true rate for the area of interest.
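The locally weighted average idea can be illustrated with a minimal Python sketch. It assumes a simple form of smoothing in which each area's rate is replaced by the pooled rate of the area and its adjacent areas; the function name `smoothed_rates`, the toy counts, and the adjacency lists are hypothetical and do not come from the proposal.

```python
def smoothed_rates(cases, population, neighbors, per=100_000):
    """Pooled rate for each area together with its adjacent areas.

    cases, population: dicts keyed by area id (hypothetical toy inputs).
    neighbors: dict mapping an area id to a list of adjacent area ids.
    """
    result = {}
    for area in cases:
        window = [area] + list(neighbors.get(area, []))
        total_cases = sum(cases[a] for a in window)
        total_pop = sum(population[a] for a in window)
        # Pooling weights each area's rate by its population.
        result[area] = per * total_cases / total_pop if total_pop else float("nan")
    return result


# Toy data: area C has zero observed cases.
cases = {"A": 2, "B": 10, "C": 0}
population = {"A": 4_000, "B": 6_000, "C": 2_000}
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}

rates = smoothed_rates(cases, population, neighbors)
raw_rate_c = 100_000 * cases["C"] / population["C"]
```

In this toy case the raw rate for area C is 0 per 100,000 while its smoothed rate is 125 per 100,000, which shows how a smoothed value describes the neighborhood rather than the area itself.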
This proposal seeks to construct larger geographic areas from smaller areas so that the total base population is sufficiently large for generating reliable incidence rates. Geography has a long tradition of grouping areas together for the purposes of "regionalization" or identifying "spatial clustering". Traditional methods place the first priority on attribute (e.g., sociodemographic characteristics) similarity within areas, and most are implemented manually or semi-automatically. Attribute information was first used to form initial regions, and several subjective rules and local knowledge were then applied to further adjust the region boundaries. Advancements in geographic information systems (GIS) technology have enabled researchers to develop methods that automate the process. Two other earlier methods emphasized spatial proximity. One used space-filling curves to measure the nearness or spatial order of areal units and then grouped areas consecutively until a capacity constraint was reached. The other constructed regions of approximately equal population size by beginning with an area and adding the nearest areas until each region reached the desired threshold population. Neither of these methods, however, accounts for within-area homogeneity of the attribute.
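The second proximity-based approach (start with an area and add the nearest areas until a threshold population is reached) can be sketched as follows. This is an illustration under stated assumptions, not the published method: nearness is taken as straight-line distance between centroids, seeds are chosen in sorted id order, and a leftover group below the threshold is folded into the previous region. The name `grow_regions` and the toy tracts are hypothetical.

```python
import math


def grow_regions(areas, threshold):
    """areas: {id: (x, y, population)}; returns a list of lists of area ids."""
    unassigned = dict(areas)
    regions = []
    while unassigned:
        seed = min(unassigned)  # assumption: deterministic seed choice by id
        sx, sy, total = unassigned.pop(seed)
        region = [seed]
        while total < threshold and unassigned:
            # Add the unassigned area whose centroid is nearest to the seed.
            nearest = min(
                unassigned,
                key=lambda a: math.hypot(unassigned[a][0] - sx, unassigned[a][1] - sy),
            )
            total += unassigned.pop(nearest)[2]
            region.append(nearest)
        if total < threshold and regions:
            regions[-1].extend(region)  # assumption: fold leftovers into previous region
        else:
            regions.append(region)
    return regions


# Toy tracts laid out along a line, with populations in the census-tract range.
tracts = {
    "t1": (0, 0, 3_000),
    "t2": (1, 0, 2_500),
    "t3": (2, 0, 4_000),
    "t4": (5, 0, 3_500),
    "t5": (6, 0, 1_000),
}
regions = grow_regions(tracts, threshold=5_000)
```

The grouping depends only on location and population, so dissimilar tracts can end up in the same region; this is the limitation noted in the paragraph above.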
Most recent work aims to develop GIS-based automated methods that account for both spatial contiguity and attribute homogeneity within the derived areas. A preliminary assessment has identified two promising methods. The first, termed "Regionalization with dynamically constrained agglomerative clustering and partitioning (REDCAP)", is a family of methods for identifying clusters of areas. Using three distance definitions to measure attribute dissimilarity and two constraining strategies to account for spatial contiguity, REDCAP comprises six methods. REDCAP allows users to specify the desired spatial contiguity, attribute dissimilarity, number of derived regions, and other parameters. The second, a modified scale-space clustering (MSSC) method, was devised to form a series of geographic areas. Scale-space theory is based on the notion that an image contains structures at different scales, and that its more significant structures can be preserved as the scale of observation becomes coarser. Similar to this operation on an image, the MSSC method merges or melts areas of higher value with surrounding areas of lower value but similar structure to form larger areas. The process is guided by a clear objective of minimizing loss of information. The method does not depend on any probability distribution of the data and is robust for unsupervised hierarchical classification. Like REDCAP, the MSSC method does not guarantee that newly formed areas have a minimum population.
Both the REDCAP and MSSC methods account for attribute similarity when grouping contiguous areas together. The major difference lies in the objective functions to be optimized during the clustering process. REDCAP minimizes the total heterogeneity value (i.e., the sum of squared deviations of all regions), while MSSC attempts to preserve the overall spatial structure by grouping around local maxima. Both methods have demonstrated advantages over other existing ones when evaluated for total heterogeneity, region size balance, internal variation, preservation of data distribution, and spatial compactness. However, neither method has been applied to cancer studies. Analysis of cancer data requires special attention to concerns such as data confidentiality and privacy, and poses unique challenges such as additional constraints (e.g., creating areas above a threshold population and respecting important geopolitical boundaries).
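To make the heterogeneity objective concrete, the sketch below computes a sum of squared deviations over a set of regions. It assumes a single numeric attribute and deviations measured from each region's unweighted mean; the function name `total_heterogeneity` and the toy values are hypothetical, and the actual REDCAP implementation may define the measure differently (for example, over several attributes).

```python
def total_heterogeneity(values, regions):
    """Sum over regions of squared deviations from the region mean.

    values: {area_id: attribute value}; regions: list of lists of area ids.
    """
    total = 0.0
    for members in regions:
        mean = sum(values[a] for a in members) / len(members)
        total += sum((values[a] - mean) ** 2 for a in members)
    return total


# Toy attribute values for four areas.
values = {"a": 10.0, "b": 12.0, "c": 30.0, "d": 34.0}

similar = [["a", "b"], ["c", "d"]]  # groups areas with similar values
mixed = [["a", "c"], ["b", "d"]]    # groups dissimilar areas

ssd_similar = total_heterogeneity(values, similar)
ssd_mixed = total_heterogeneity(values, mixed)
```

Here the grouping of similar areas scores 10.0 and the mixed grouping scores 442.0, so a method minimizing this quantity would prefer the first partition.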
The proposed project plans to evaluate and modify these two methods to enhance the presentation and visualization of cancer surveillance data by geographic area. The study will combine adjacent, similar small areas to mask identity, while keeping intact those areas that already have a sufficient number of cancer cases (e.g., ≥ 15) and a sufficient population (≥ 50,000).
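One way to picture the combination step is a greedy merge, sketched below in Python. This is not the modified REDCAP or MSSC procedure the project proposes to develop; it is a simplified stand-in that assumes symmetric adjacency, a single similarity attribute, and a rule that an area below either threshold is merged into its most similar adjacent area, preferring neighbors that are also below threshold so that sufficient areas stay intact. The names `merge_small_areas`, `attr`, and the toy tracts are hypothetical.

```python
def merge_small_areas(units, adjacency, min_cases=15, min_pop=50_000):
    """units: {id: {"cases": int, "pop": int, "attr": float}}
    adjacency: {id: iterable of adjacent ids} (assumed symmetric).
    Returns {surviving_id: sorted list of original ids in that region}."""
    units = {k: dict(v) for k, v in units.items()}
    adj = {k: set(adjacency.get(k, ())) for k in units}
    members = {k: [k] for k in units}

    def small(u):
        return units[u]["cases"] < min_cases or units[u]["pop"] < min_pop

    while True:
        candidates = [u for u in units if small(u) and adj[u]]
        if not candidates:
            break
        # Start from the least populous area that fails a threshold.
        u = min(candidates, key=lambda k: (units[k]["pop"], k))
        # Prefer small neighbours so that sufficient areas stay intact.
        pool = [n for n in adj[u] if small(n)] or list(adj[u])
        v = min(pool, key=lambda n: (abs(units[n]["attr"] - units[u]["attr"]), n))
        a, b = units[u], units[v]
        pop = a["pop"] + b["pop"]
        if pop:
            # Population-weighted mean of the similarity attribute.
            b["attr"] = (a["attr"] * a["pop"] + b["attr"] * b["pop"]) / pop
        b["cases"] += a["cases"]
        b["pop"] = pop
        members[v].extend(members.pop(u))
        # Rewire the neighbours of u to point at the merged area v.
        for n in adj.pop(u):
            adj[n].discard(u)
            if n != v:
                adj[n].add(v)
                adj[v].add(n)
        del units[u]
    return {k: sorted(m) for k, m in members.items()}


# Toy example: three small tracts and one area that already meets both thresholds.
tracts = {
    "T1": {"cases": 4, "pop": 20_000, "attr": 0.10},
    "T2": {"cases": 6, "pop": 18_000, "attr": 0.12},
    "T3": {"cases": 9, "pop": 25_000, "attr": 0.30},
    "T4": {"cases": 40, "pop": 90_000, "attr": 0.11},
}
ring = {"T1": {"T2", "T4"}, "T2": {"T1", "T3"}, "T3": {"T2", "T4"}, "T4": {"T1", "T3"}}

merged = merge_small_areas(tracts, ring)
```

In this toy run the three small tracts combine into one region with 19 cases and a population of 63,000, while T4 is left unchanged. A greedy rule like this does not optimize any global objective, which is one reason the project turns to REDCAP and MSSC instead.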
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by VIVIEN CHEN
Quality Control of Electronic Pathology (E-Path) Results; Period of Performance: September 15, 2014 - September 14, 2015
- Grant number: 8947591
- Fiscal year: 2014
- Funding amount: $63,500
- Project category:
Patterns of Care (POC) Quality of Care Dx Yr 2011
- Grant number: 8565158
- Fiscal year: 2012
- Funding amount: $63,500
- Project category:
SURVEILLANCE, EPIDEMIOLOGY, AND END RESULTS (SEER) PROGRAM
- Grant number: 8481440
- Fiscal year: 2012
- Funding amount: $63,500
- Project category:
TAS::75 0849::TAS SURVEILLANCE, EPIDEMIOLOGY, AND END RESULTS (SEER) PROGRAM
- Grant number: 8317507
- Fiscal year: 2011
- Funding amount: $63,500
- Project category:
TAS::75 0849::TAS SURVEILLANCE, EPIDEMIOLOGY, AND END RESULTS (SEER) PROGRAM
- Grant number: 8317508
- Fiscal year: 2011
- Funding amount: $63,500
- Project category:
TAS::75 0849::TAS SURVEILLANCE, EPIDEMIOLOGY, AND END RESULTS (SEER) PROGRAM
- Grant number: 8163677
- Fiscal year: 2010
- Funding amount: $63,500
- Project category:
TAS::75 0849::TAS SURVEILLANCE, EPIDEMIOLOGY, AND END RESULTS (SEER) PROGRAM
- Grant number: 8131534
- Fiscal year: 2010
- Funding amount: $63,500
- Project category:
Surveillance, Epidemiology and End Results (SEER) Program - LSU
- Grant number: 7824258
- Fiscal year: 2005
- Funding amount: $63,500
- Project category:
RRSS #9 - Patterns of Care - Dx 2006 Feasibility Adolescent and Young Adult - LSU
- Grant number: 7824260
- Fiscal year: 2005
- Funding amount: $63,500
- Project category:
Similar overseas grants
IGF::OT::IGF SEER RRSS IMPROVING OUTPATIENT REPORTING OF CANCER OCCURRENCE AND TREATMENT; 9/19/16-9/18/17
- Grant number: 9361196
- Fiscal year: 2016
- Funding amount: $63,500
- Project category:
IGF::OT::IGF SEER RRSS IMPROVING OUTPATIENT REPORTING OF CANCER OCCURRENCE AND TREATMENT; 9/19/16-9/18/17
- Grant number: 9361195
- Fiscal year: 2016
- Funding amount: $63,500
- Project category:
RRSS Evaluate Completeness Liver Cancer Reporting Under New Clinical Guidelines
- Grant number: 8351018
- Fiscal year: 2011
- Funding amount: $63,500
- Project category:
SEER RRSS Improving SES Data: Linkage State Vital Records, Birth Certificate Data
- Grant number: 8351002
- Fiscal year: 2011
- Funding amount: $63,500
- Project category:
RRSS Improving SES Data: Linkage w State Vital Records, Birth Certificate Data
- Grant number: 8351016
- Fiscal year: 2011
- Funding amount: $63,500
- Project category: