The Influence of Data Aggregation on the Stability of Location Model Solutions

Alan T. Murray
Australian Housing and Urban Research Institute
Queensland University of Technology
Brisbane, Queensland, Australia
and
Jonathan Gottsegen
NCGIA

The effect of data aggregation on the results of analyses and models is a classic problem in Geography. Understanding it is critical because many analyses and models must use aggregated data, either because aggregate data are the only data available or because aggregation is needed to reduce complexity and make a model tractable. Location planning often uses data in aggregated form without a clear understanding of the consequences, and previous studies have reached conflicting findings on the stability of location model solutions obtained from aggregate data. The purpose of this research is to investigate the stability of solutions to optimization models (in particular, the p-median model) when the input data are aggregated to several resolutions.

We began with census block group data for the Buffalo metropolitan area:

The Buffalo metropolitan area has 913 census block groups. We chose it as our study area because previous work by Fotheringham et al. used it as the basis for a study of p-median solution stability, and we would like to compare our results with theirs.

[Figure: map of the census block groups in the Buffalo metropolitan area.]

We used the elderly cohort (65+ years of age) as the hypothetical population to be served by 10 optimally sited facilities.

[Figure: map of the distribution of the elderly population in the Buffalo metropolitan area.]

We aggregated the 913 census block groups to four levels of aggregation (100, 200, 400 and 800 units) using two different aggregation methods, and generated 20 realizations for each combination of aggregation level and method. The first method was a random aggregation process: randomly chosen seed polygons were grown by repeatedly picking one of the remaining polygons at random and merging it with a seed unit if it was adjacent to that unit. The second method was a Thiessen polygon method: it also began with randomly chosen seeds, constructed Thiessen polygons around the centroid of each seed, and merged each remaining polygon with the seed whose Thiessen polygon contained the remaining polygon's centroid. The following are samples of the random aggregations at each aggregation level.

[Figure: sample random aggregations at 100, 200, 400 and 800 units.]
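The two aggregation methods described above can be sketched as follows. This is a minimal illustration, not the authors' code: the input structures (an `adjacency` dictionary of touching block groups and a `centroids` dictionary of (x, y) points), the function names and the random tie-break when a polygon touches several units are all assumptions. The Thiessen version relies on the fact that, ties aside, a centroid lies inside a seed's Thiessen polygon exactly when that seed's centroid is the nearest seed centroid, so the polygons themselves never need to be built.

```python
import random
from math import hypot


def random_aggregation(adjacency, n_units, rng):
    """Seed-and-grow random aggregation (hypothetical helper).

    adjacency: dict mapping each block-group id to the set of ids it touches.
    Returns a dict mapping every block-group id to the seed id of its unit.
    """
    ids = list(adjacency)
    seeds = rng.sample(ids, n_units)
    unit = {s: s for s in seeds}
    remaining = [i for i in ids if i not in unit]
    while remaining:
        rng.shuffle(remaining)  # visit unassigned polygons in random order
        unassigned = []
        for i in remaining:
            # units that already own at least one neighbour of polygon i
            touching = [unit[j] for j in adjacency[i] if j in unit]
            if touching:
                unit[i] = rng.choice(touching)  # assumed tie-break: random unit
            else:
                unassigned.append(i)  # not adjacent to any unit yet; retry later
        if len(unassigned) == len(remaining):
            raise ValueError("some block groups are not connected to any seed")
        remaining = unassigned
    return unit


def thiessen_aggregation(centroids, n_units, rng):
    """Thiessen-polygon aggregation (hypothetical helper).

    centroids: dict mapping each block-group id to its (x, y) centroid.
    Each block group joins the seed whose centroid is nearest to its own,
    which is the same as asking which seed's Thiessen polygon contains it.
    """
    seeds = rng.sample(list(centroids), n_units)
    return {
        i: min(seeds, key=lambda s: hypot(x - centroids[s][0], y - centroids[s][1]))
        for i, (x, y) in centroids.items()
    }


if __name__ == "__main__":
    # Toy 4 x 4 grid standing in for the real block groups
    cells = {(r, c): (float(c), float(r)) for r in range(4) for c in range(4)}
    adjacency = {
        (r, c): {(r + dr, c + dc) for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                 if (r + dr, c + dc) in cells}
        for (r, c) in cells
    }
    rng = random.Random(1)
    print(len(set(random_aggregation(adjacency, 4, rng).values())))  # 4
    print(len(set(thiessen_aggregation(cells, 4, rng).values())))    # 4
```

Generating the 20 realizations per level would amount to calling either function 20 times with `n_units` set to 100, 200, 400 or 800.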

In addition to the two aggregation methods, we used two methods for locating the centroid of each unit (the centroid is the point used as the location of both the facilities and the demand served in that unit): the geometric centroid and a demand-weighted centroid (the Weber centroid). Each Weber centroid was located by weighting positions according to the demand of the block groups that make up the aggregate unit, which amounts to solving the Weber location problem for each aggregate unit. Combining the two centroid methods with the two aggregation methods gave four sets of aggregation instances.
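The two centroid definitions can be sketched as below, under stated assumptions. We assume the "geometric centroid" is the area centroid of the merged polygon, which for non-overlapping pieces equals the area-weighted mean of the block-group centroids. For the Weber centroid the page does not say which solver was used; this sketch swaps in the standard Weiszfeld iteration with the Vardi-Zhang modification, which handles iterates that land exactly on a block-group centroid. The Weber point minimizes the sum over block groups of demand times Euclidean distance, so it is generally not the same as the demand-weighted mean used here only as a starting point.

```python
from math import hypot


def geometric_centroid(parts):
    """Area centroid of an aggregate unit (assumed definition).

    parts: list of (area, (x, y)) for the block groups merged into the unit.
    """
    total = sum(a for a, _ in parts)
    return (sum(a * c[0] for a, c in parts) / total,
            sum(a * c[1] for a, c in parts) / total)


def weber_point(points, weights, tol=1e-9, max_iter=10_000):
    """Demand-weighted Weber point: minimises sum_i w_i * ||x - p_i||.

    Weiszfeld iteration with the Vardi-Zhang step for iterates that
    coincide with a data point (solver choice is an assumption).
    """
    wsum = sum(weights)
    # start at the demand-weighted mean
    x = sum(w * p[0] for p, w in zip(points, weights)) / wsum
    y = sum(w * p[1] for p, w in zip(points, weights)) / wsum
    for _ in range(max_iter):
        num_x = num_y = den = 0.0
        eta = 0.0  # demand located exactly at the current iterate
        for (px, py), w in zip(points, weights):
            d = hypot(x - px, y - py)
            if d < tol:
                eta += w
                continue
            num_x += w * px / d
            num_y += w * py / d
            den += w / d
        if den == 0.0:  # all demand sits at the current point
            return x, y
        # r = length of sum_i w_i (p_i - x) / d_i, the pull of the other points
        r = hypot(num_x - den * x, num_y - den * y)
        if r <= eta:  # optimality condition at (or away from) a data point
            return x, y
        lam = eta / r  # 0 unless the iterate sits on a data point
        nx = (1 - lam) * num_x / den + lam * x
        ny = (1 - lam) * num_y / den + lam * y
        if hypot(nx - x, ny - y) < tol:
            return nx, ny
        x, y = nx, ny
    return x, y
```

For example, with equal demand at x = 0, 1 and 10 on a line, `weber_point` converges to the median location x = 1, whereas the demand-weighted mean would be 11/3.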

For each aggregation instance at each aggregation level, we generated the p-median solution for 10 facilities.
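The page does not state how the p-median problems were solved. As a stand-in, the sketch below uses the Teitz and Bart vertex-substitution heuristic, which returns a local optimum and is not guaranteed to match an exact solution. It assumes every unit centroid is both a demand point and a candidate facility site, and the function names are hypothetical. The objective is the total demand-weighted distance from each unit to its nearest open facility.

```python
import random
from math import hypot


def distance_matrix(coords):
    """Euclidean distances between all pairs of unit centroids."""
    return [[hypot(ax - bx, ay - by) for (bx, by) in coords] for (ax, ay) in coords]


def pmedian_cost(dist, demand, facilities):
    """Total demand-weighted distance when each unit uses its nearest facility."""
    return sum(w * min(dist[i][j] for j in facilities) for i, w in enumerate(demand))


def teitz_bart(dist, demand, p, rng):
    """Vertex-substitution heuristic for the p-median problem.

    Starts from p random sites and keeps swapping an open site for a closed
    one while the swap strictly lowers the objective.
    """
    n = len(demand)
    current = set(rng.sample(range(n), p))
    best = pmedian_cost(dist, demand, current)
    improved = True
    while improved:
        improved = False
        for cand in range(n):
            if cand in current:
                continue
            best_out = None
            for out in current:  # try replacing each open site with cand
                trial = (current - {out}) | {cand}
                cost = pmedian_cost(dist, demand, trial)
                if cost < best - 1e-12:
                    best, best_out = cost, out
            if best_out is not None:
                current = (current - {best_out}) | {cand}
                improved = True
    return sorted(current), best


if __name__ == "__main__":
    coords = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
    demand = [1, 1, 1, 1, 1, 1]
    sites, cost = teitz_bart(distance_matrix(coords), demand, 2, random.Random(0))
    print(sites, cost)  # [1, 4] 4.0
```

Each swap evaluation here recomputes the full objective, which keeps the sketch short but is slow for 800 units; a practical implementation would update nearest-facility distances incrementally.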

We also calculated each solution's objective function value as a deviation from the optimal value obtainable with the original block group structure. The innovative aspect of this research is that we calculated the objective function value after mapping the facilities of each aggregate solution onto the closest block group centroid, which is more realistic in terms of the way the models would be used. The objective function values showed considerable stability. The random aggregations show some fairly significant deviation from optimality at the 100-unit level, but the 200-unit level and above perform reasonably well. In addition, the Thiessen polygon approach shows very small deviations from optimality.
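The evaluation step can be sketched as follows, assuming the deviation is expressed as a percentage, 100 × (Z − Z*) / Z*, where Z is the p-median objective of the mapped solution evaluated on the original block groups and Z* is the block-group optimum; the page does not give the exact formula, and the function names are hypothetical.

```python
from math import hypot


def nearest_block_group(point, bg_coords):
    """Index of the block-group centroid closest to an aggregate facility site."""
    px, py = point
    return min(range(len(bg_coords)),
               key=lambda k: hypot(px - bg_coords[k][0], py - bg_coords[k][1]))


def block_group_cost(sites, bg_coords, bg_demand):
    """p-median objective evaluated on the original (disaggregate) block groups."""
    return sum(w * min(hypot(x - bg_coords[j][0], y - bg_coords[j][1]) for j in sites)
               for (x, y), w in zip(bg_coords, bg_demand))


def percent_deviation(agg_sites, bg_coords, bg_demand, optimal_cost):
    """Map aggregate facility locations to block groups and compare with the optimum."""
    # Two aggregate sites could map to the same block group; this sketch simply
    # keeps the distinct block groups (an assumption, not from the study).
    mapped = {nearest_block_group(p, bg_coords) for p in agg_sites}
    z = block_group_cost(mapped, bg_coords, bg_demand)
    return 100.0 * (z - optimal_cost) / optimal_cost


if __name__ == "__main__":
    bg = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
    demand = [1, 1, 1, 1, 1, 1]
    z_opt = 4.0  # optimum for p = 2 on this toy layout (sites at x = 1 and x = 11)
    print(percent_deviation([(0.9, 0.0), (11.2, 0.0)], bg, demand, z_opt))  # 0.0
    print(percent_deviation([(0.1, 0.0), (11.0, 0.0)], bg, demand, z_opt))  # 25.0
```

In the toy run, the first aggregate solution snaps onto the optimal block groups and shows no deviation, while the second snaps one facility to the end of its cluster and costs 25% more.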

