'Econometric theory is like an exquisitely balanced French recipe, spelling
out precisely with how many turns to mix the sauce, how many carats of
spice to add, and for how many seconds to bake the mixture at exactly 474
degrees of temperature. But when the statistical cook turns to the raw
materials, he finds that hearts of cactus fruits are unavailable, so he
substitutes cantaloupe; where the recipe calls for vermicelli he uses shredded
wheat; and he substitutes green garment dye for curry, ping pong balls
for turtle's eggs, and, for Chalifougnac vintage 1883, a can of turpentine.'
Valavanis 1959: 83, quoted in Kennedy 1979.
Ever since my undergraduate days in Bristol in the 1970s, I have felt fully imbued with the quantitative locational analysis tradition in geography, not least because some of the origins of the approach can be traced to Bristol in the 1960s. Yet I share the frustration aired in some of the other position papers that the spatial 'mainstream' of geography has been sidelined in the major geography journals, that it accounts for a reduced real share of intellectual activity in the subject, that its interdisciplinary outreach has been limited, and that today's GIS practice appears to develop largely separately from academia. (At least GIS and RS together make up one of six 'specialisms' that are key to UK central government's ranking of subject performance in what remains a mainstream discipline.) I should like to comment on the way that research practice and, in particular, data handling may contribute to this state of affairs, and to suggest how reconfiguring some priorities in spatial analysis might be beneficial.
Spatial analysis, like econometrics, has benefited from the proliferation of digital data sources in recent years. Today's spatial data models allow far 'thicker' depictions of geographical reality to be created than those I cut my own teeth on. The transformational (Martin 1996) or simplifying assumptions entailed in building GIS-based models of real world spatial distributions have become much less heroic as a consequence. And of course it is well known that developments in computer hardware remain more or less commensurate with the increase in available data, making it possible to explore and model spatial interactions in more detail than ever before. Viewed in this context it is paradoxical that, in the UK at least, there is less faith than ever in 'predict and provide' approaches to planning, that business and service planning is turning away from conventional spatial analysis and that some of the spatial analysis community view essentially 'black box' techniques with increasing favour. Why is this the case?
In the socio-economic realm one suggestion might be that any increase in the sophistication of analytical models has been more than outpaced by increases in the complexity of the systems themselves: witness, for example, the scale and pace of change in the physical forms of urban systems, or the fragmentation of household consumption patterns and lifestyles. A variant on this theme is offered by Curry (1995) and others, who seem to suggest that the quality of digital data can never be adequate for the resolution of significant problems of real world concern. A third suggestion, which is the one I will pursue here, is that the research community should refocus effort away from abstract semantic discussion or analytical elegance and towards the messy empirical problems of data integration. This should be done in as rational, orderly and application-centric a fashion as possible.
Goodchild and Longley (1999) appraise the 'linear project design' as a model for contemporary research in natural and social science. For generations of students, the formulation of research hypotheses has been followed by choice of a data collection method (and designing a survey schedule, as appropriate), identifying a sample design, piloting, field collection of data (with verification and resampling), collation of results, analysis, and report-writing. They reflect that, although this robust and defensible schema has underlain generations of student dissertations, it was never a panacea in practice, for reasons of data resolution, surrogacy and timeliness, and because of the limited amount of funding available for scientific research (we're all researchers now!). Today's GIS environment is also characterised by datasets which are collected by many different means and which pass through many hands. Many of the problems of data resolution, surrogacy and timeliness are today less acute, yet more data are second hand and more data are collected using unscientific research designs (indeed they are often not principally collected for 'research' at all).
2 The developing digital data infrastructure
2.1 Changes in supply, pricing and access
In physical and social science alike, the costs of data have generally
been a (sometimes the) major component of the costs of GIS creation. The
order of magnitude of data costs reflects a number of technological and
secular imperatives which govern the supply, pricing, and access aspects
of data availability.
In the early days of GIS, the ‘data bottleneck’ of (manual or semi-automated) digitising presented a major impediment to the creation of spatially-referenced databases, particularly if the hard copy source documents were complicated or ambiguous. Early software systems provided (by present day standards) fairly unsophisticated procedures for detecting and correcting the results of error-prone digitising. Moreover, ‘framework’ spatial data, such as those created and maintained by national mapping agencies were available only in hard-copy printed form, and in the early days of GIS there was resistance to initiating the task of converting ‘legacy’ hard copy maps to digital form.
A wealth of digital data has since come into existence. First, and as with computer hardware, new technology is playing an important role. In particular, the wide (selective) availability of global positioning systems makes creation of new digital datasets much more straightforward than hitherto. Second, most national mapping agencies have gradually overcome their initial reluctance to create digital versions of their paper records, while at smaller scales private providers have created a range of digital atlas products. And third, computerised logging of the physical and social environment takes place with ever-increasing frequency, and to ever-greater levels of detail—for example through high-resolution remote sensing of the physical and built environments and the digital encoding of consumer purchasing behaviour (through loyalty programmes and the development of ‘relationship marketing’) in the socioeconomic realm.
Yet this has not created a panacea for data modelling. In practice, accurate field recording of data remains an expert task and sound geographical analysis presumes sound data standards. Many national mapping agencies (such as Great Britain’s Ordnance Survey) have only succeeded in ‘going digital’ in the face of increasingly stringent public expenditure constraints by recovering vastly increased proportions of their creation and maintenance costs through user charges: the inevitable consequence is a rationing of framework data on an ‘ability to pay’ basis. Similarly hawkish data pricing regimes may apply to the data products from the new generation of high-resolution satellite sensors, while high royalty charges dissuade many business users from census data and census data products in some countries (such as the UK). At the same time, governments are reluctant to fund even their traditional linear project design-driven surveys, in view of the apparent tide of information created using new data capture technologies. With respect to the academic realm, the rise of interdisciplinary science is leading to a higher incidence of jointly-funded projects, and the commonplace situation in which the creators of spatial data may be widely separated from some of the communities of end users. As creators and users of data become more and more separated, in space, time, and intellectual tradition, the ability to describe data becomes increasingly critical. The creator must be able to tell the user about methods, accuracies, formats, and all of the details needed to transfer, open, and make effective use of the data. Moreover, the user must be able to determine whether a given data set meets or falls short of requirements, and this is increasingly accomplished through metadata.
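The role of metadata in judging fitness for use can be made concrete with a minimal sketch. The record fields, the requirement thresholds and the example values below are all illustrative assumptions rather than any published metadata standard; the point is only that a creator's description lets a distant user test a dataset against stated needs before acquiring it.

```python
from dataclasses import dataclass


@dataclass
class Metadata:
    """Hypothetical description supplied by the creator of a dataset."""
    title: str
    collected_year: int
    positional_accuracy_m: float  # stated positional accuracy, in metres
    file_format: str
    collection_method: str


@dataclass
class Requirements:
    """Hypothetical statement of what a user's application needs."""
    earliest_year: int
    max_positional_error_m: float
    readable_formats: tuple


def shortfalls(meta: Metadata, req: Requirements) -> list:
    """List the ways in which a dataset falls short of the requirements."""
    problems = []
    if meta.collected_year < req.earliest_year:
        problems.append("too old")
    if meta.positional_accuracy_m > req.max_positional_error_m:
        problems.append("too inaccurate")
    if meta.file_format not in req.readable_formats:
        problems.append("unreadable format")
    return problems


# Invented example values, for illustration only.
census = Metadata("Small area statistics", 1991, 100.0, "csv", "census enumeration")
need = Requirements(earliest_year=1995, max_positional_error_m=50.0,
                    readable_formats=("csv", "shp"))

print(shortfalls(census, need))  # ['too old', 'too inaccurate']
```

An empty list would indicate that, on the evidence of the metadata alone, the dataset meets the stated requirements; it says nothing about properties that the creator did not describe.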
2.2 The changing remit and requirements of modelling
The early years of the spatial analysis paradigm were associated with
the development of wide-ranging models of physical and social systems.
The remit of such models was avowedly ambitious, yet on reflection the
data infrastructure was not commensurate with the tasks in hand. A number
of commentators have identified reasons for the subsequent demise of large
scale socioeconomic modelling activity, although the innovation of GIS
has brought with it a renaissance in model-building activity. Moreover,
any decline in large-scale modelling of socio-economic systems has been
matched by the rapid growth of environmental modelling, much of it coupled
with or otherwise making use of GIS.
The new is quite different from the old, however. Within the socioeconomic realm, Birkin (1996) has described how the current generation of spatial interaction models, for example, seeks only to model limited (in terms of spatial extent, time frame and attribute range) aspects of urban sub-systems. This in part reflects secular trends in all developed societies away from system-wide planning, yet it also reflects a profound reappraisal of what we now consider to be the appropriate domain and capability of analytical models. Today’s urban models are much more data-rich in two respects. First, the revolution in the supply and availability of geographical information means that data no longer represent coarse zonal aggregations, and thus that the data model of spatial distributions bears a closer correspondence with reality. Second, the first generation of urban models used data derived exclusively from public sector sources and which were thus restricted to the limited range of variables of interest to officialdom. Whilst such data can be used, singly or in combination, to create crude indicators of human behaviour and activity patterns, such indicators bear at best a very imperfect correspondence with reality.
Within the socioeconomic realm, the present status of modelling is rather ambiguous. Within academia, disenchantment with urban modelling leaves it as an area of activity with a significantly reduced real share of intellectual activity compared to, say, twenty years ago. Business applications of data-rich partial models of components of urban systems are buoyant, and today client repeat purchases provide vindication of the validity of spatial interaction and other modelling approaches. Within planning, there has never been a greater need for accurate data and analytical models of urban systems, because the rate, scale, and pace of change have never been greater. Yet, in the UK at least, there is disquiet about the ‘predict and provide’ approach to planning which has hitherto been based upon aggregate modelling approaches.
2.3 Model linkage: towards a new perspective?
The linear project design presumed that resources were available for
a linear, vertically integrated sequence of events. Today’s research environment
is much less straightforward. The strictures of public expenditure make
it less likely that large-scale purpose-specific research will be funded,
while information commerce makes it far from certain that the best
secondary data will be available. Yet data warehouses are bursting with
data that might be combined to create richer profiles of landscapes, morphologies,
households, and activity patterns than have ever been created before. While
the developing geocomputation paradigm presents us with some ‘brute force’
mechanisms for searching out generalisations from large and complex datasets,
we may have no way of knowing whether such generalisations hold any scientific
validity.
A negative view of this research environment would suggest that a price has been put on scientific truth that lies beyond the budget of many researchers. There is some truth in this, yet economic imperatives need also to be viewed in their technological context. In truth, as our retrospective of urban modelling above has illustrated, data collected through the linear project design did not provide a panacea in practice. Today’s digital data infrastructure is more detailed, relevant, and up-to-date than ever before. The problem is that this infrastructure is also more piecemeal, and hence possibly ill-founded and unsafe.
The environment for spatial analysis is GIS, which has always been an applications-led technology. The sophistication of current applications requires a breadth and depth of data that could never have been sustained by established data collection methods. Today’s open and desk-top GIS alike are geared towards the analysis of application-specific ‘horses for courses’ datasets. Such datasets are required to model real-world systems that are dynamic and fast-changing, and thus the timescale between data collection and availability of secondary analysis needs also to be shortened. Our understanding of physical and social systems alike is now of such sophistication that infrequently collected, aggregate, and surrogate spatial data are simply not good enough. These are all crucial considerations, yet they all lie outside the remit of the linear project design. Are we therefore faced with a stark choice between scientific validity and ‘making do’ with inappropriate, overly-aggregate, out-of-date indicators? The rejection of Census-based geodemographics in favour of lifestyles (i.e. data warehouse) analysis in much of business geographics suggests that the road to scientific truth is no simple one-way street, and that proponents of inductive data-led thinking have their supporters in the world of application.
Framed in these terms, one of the big questions for GIS at the turn of the millennium must be: Can the new digital data infrastructure be assembled in a sufficiently accurate, orderly and rational way to bridge relevance, richness and academic respectability? Goodchild and Longley (1999) use the term ‘concatenation’ to describe the integration of two or more different data sources, such that the contents of each are accessible in the product. The polygon overlay operation is one simple form of concatenation. They use the complementary term ‘conflation’ to describe the range of functions that attempt to overcome differences between data sets, or to merge their contents. Conflation thus attempts to replace two or more versions of the same information with a single version that reflects the pooling, or weighted averaging, of the sources.
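The distinction between the two terms can be illustrated with a small sketch. It is a toy, not Goodchild and Longley's procedure: the zone identifiers, counts and weights are invented, and a simple join on a shared zone key stands in for the geometric work of a true polygon overlay.

```python
def concatenate(source_a: dict, source_b: dict) -> dict:
    """Join two sources on a shared zone key, keeping the contents of each."""
    shared = source_a.keys() & source_b.keys()
    return {zone: {"a": source_a[zone], "b": source_b[zone]}
            for zone in sorted(shared)}


def conflate(value_a: float, value_b: float,
             weight_a: float, weight_b: float) -> float:
    """Replace two versions of the same quantity with one weighted average."""
    return (weight_a * value_a + weight_b * value_b) / (weight_a + weight_b)


# Invented figures: census household counts and lifestyle survey returns by zone.
census_households = {"z1": 120, "z2": 80}
lifestyle_returns = {"z1": 15, "z2": 9, "z3": 4}

# Concatenation: both attributes remain separately accessible for each zone.
print(concatenate(census_households, lifestyle_returns))
# {'z1': {'a': 120, 'b': 15}, 'z2': {'a': 80, 'b': 9}}

# Conflation: two estimates of one population figure are pooled into one,
# with the first source trusted three times as much as the second.
print(conflate(1000.0, 1200.0, weight_a=3.0, weight_b=1.0))  # 1050.0
```

The choice of weights is where the difficulty lies in practice: it encodes a judgement about the relative quality of the sources, which is precisely the information that metadata are meant to supply.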
3 Model linkage in practice
3.1 RS–GIS concatenation
Census information and satellite imagery are diverse sources of information.
Longley and Mesev (1997) use information from the 1991 UK small area census
statistics as ancillary information to improve the classification accuracy
of a contemporary (LANDSAT TM) image of Bristol. Information from the Census
is used to assist in sample training and post-classification sorting. The
resultant hybridised dataset is designed with a specialised purpose in
mind—to provide detailed data models of the distribution of population
and domestic property. This is used to reappraise conventional analysis
of the density at which urban space is occupied—and through comparisons
Longley and Mesev (1997) develop density gradient profiles for different
categories of urban space filling, such as ‘built form’, ‘residential’,
‘households’, and ‘population’. They demonstrate that the differences between
these apparently similar categories are more than semantic, and can heavily
condition whether and to what extent we might consider density profiles
characteristic of particular settlement types. The optimistic message of
this work is that, once the differences between different conceptions of
‘urbanity’ have been clearly grasped, it is possible to develop a range
of customised indicators of urban morphology. In this way, customised GIS-based
data models are informing our thinking about the ways in which urban settlements
fill space, as well as providing detailed information as to the morphology
of particular settlement structures.
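A density gradient profile of the kind referred to above can be sketched as the share of cells occupied by a given category within successive distance bands from the settlement centre. The sketch below is an assumption-laden illustration and not the procedure of Longley and Mesev (1997): the cell records, the category flags and the one kilometre ring width are all invented.

```python
from collections import defaultdict


def density_profile(cells, category, ring_width_km=1.0):
    """Share of cells in each distance ring that belong to `category`."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for cell in cells:
        ring = int(cell["distance_km"] // ring_width_km)
        totals[ring] += 1
        if cell[category]:
            hits[ring] += 1
    return {ring: hits[ring] / totals[ring] for ring in sorted(totals)}


# Invented classified cells: distance from the centre and two category flags.
cells = [
    {"distance_km": 0.2, "built": True, "residential": False},
    {"distance_km": 0.7, "built": True, "residential": True},
    {"distance_km": 1.1, "built": True, "residential": True},
    {"distance_km": 1.8, "built": False, "residential": False},
    {"distance_km": 2.3, "built": False, "residential": False},
    {"distance_km": 2.9, "built": True, "residential": True},
    {"distance_km": 2.5, "built": False, "residential": False},
    {"distance_km": 2.1, "built": False, "residential": False},
]

print(density_profile(cells, "built"))        # {0: 1.0, 1: 0.5, 2: 0.25}
print(density_profile(cells, "residential"))  # {0: 0.5, 1: 0.5, 2: 0.25}
```

Even in this toy case the two categories yield different gradients near the centre, which is the sense in which the choice between apparently similar definitions of urban space filling is more than semantic.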
3.2 Conflating geodemographics and lifestyles
‘Lifestyles’ is a broad term that has been used to describe data pertaining
to the consumption of a wide range of goods and services by identifiable
individuals and households. Lifestyles data originate from a diverse range
of sources, such as guarantee card returns, questionnaires attached to
nationally circulated prize draw entries, and market research surveys.
They are usually georeferenced through the postcode system (e.g. in the
UK to the unit postcode, which typically comprises 15 or so addresses in
urban areas). At least one UK ‘data warehouse’ estimates that it holds
up-to-date information on 11 million UK households. Such data have evident
use for direct marketing, for past consumption habits are key guides to
future behaviour. Harris (1999) has analysed the anonymised individual/household
records from one particular lifestyles questionnaire which was mailed out
in October 1996. The number of respondents to this survey constitutes 10.8%
of all households in Bristol, UK (population 636,000): this makes the survey
larger in size than a mini census, yet the respondents are self-selecting and
are unlikely to be representative of non-respondents. In recent years,
lifestyles approaches have gained some ground as tools for geomarketing
at the expense of the use of census and composite geodemographic indicators,
because the latter are increasingly out of date (the last UK Census was
held in 1991), they are expensive to use because of UK royalty structures
and, perhaps most damning of all, the census contains too few variables
that bear an identifiable correspondence with consumer behaviour (most
notably in the UK, because of the absence of an income question in the
Census).
The ‘geodemographics–lifestyles’ debate thus epitomises the tensions described in Section 2 above. Geodemographics is based on tried and trusted techniques and derives from a dataset (the Census) which has been designed and implemented using the most rigorous research design principles; and yet at the end of the day, it is out of date, and can supply at best only very imperfect indicators of real-world consumer behaviour. Sampling theory tells us that reweighting of largely self-selecting samples on the basis of sub-group response rates is foolhardy; yet survey research practice tells us that quantitative indicators should be direct and transparent, and that survey results are only directly applicable to the population from which the respondents were drawn (few of us would wholly identify with our digital past-selves who filled out a census form at the start of this decade).
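The reweighting that sampling theory warns against can be shown in a few lines. The sketch below assumes two invented sub-groups with known population counts, invented respondent counts and invented mean spending figures; each respondent is weighted by the inverse of the response rate of his or her sub-group.

```python
def inverse_response_weights(population, respondents):
    """Weight per respondent = sub-group population / sub-group respondents."""
    return {group: population[group] / respondents[group]
            for group in respondents}


def mean_spend(group_means, respondents, weights=None):
    """Mean of a variable across respondents, optionally reweighted."""
    if weights is None:
        weights = {group: 1.0 for group in respondents}
    total = sum(weights[g] * respondents[g] * group_means[g] for g in respondents)
    weight_sum = sum(weights[g] * respondents[g] for g in respondents)
    return total / weight_sum


# Invented figures, for illustration only.
population = {"young": 400, "old": 600}
respondents = {"young": 20, "old": 80}
group_means = {"young": 50.0, "old": 30.0}

weights = inverse_response_weights(population, respondents)
print(weights)                                        # {'young': 20.0, 'old': 7.5}
print(mean_spend(group_means, respondents))           # 34.0
print(mean_spend(group_means, respondents, weights))  # 38.0
```

The adjustment corrects only for differing response rates between sub-groups. It rests on the assumption that respondents within each sub-group resemble the non-respondents, and it is exactly this assumption that self-selection calls into question.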
A middle path between these two lies in Batey and Brown’s (1995) assertion that lifestyle descriptors can be used as a wrapper to add depth to the labels assigned to different geodemographic groups. Thus, for example, the SuperProfiles category ‘affluent achievers’ has fairly distinctive Census characteristics in terms of house construction type, socio-economic status and car ownership, to which lifestyle labels about theatre and restaurant patronage, share registers, newspaper readership, and credit card usage are added. The data from which these labels are obtained are in many cases collected by unscientific means or strictly pertain only to coarser aggregations of households. Yet Harris’s (1999) cluster analysis of (unweighted) lifestyle data finds some practical validity to this approach: it nevertheless runs rough-shod over conventional views about how scale and aggregation issues should be tackled.
4 The future of spatial analysis
Goodchild and Longley (1999) suggest that the kinds of circumstances
and imperatives presented in the preceding discussion will lead to the
emergence of the following kinds of spatial analysis in the coming years:
This statement has highlighted the way in which the advanced information economy of the late 1990s has multiplied the number of potential sources of (rich) digital information, yet in ways which will be less standardised and project-specific than those implied by the linear project design. A major challenge to the GIS community is to devise methods to reconcile diverse datasets with different data structures or spatial referencing systems. Only in this way will GIS be able to tease out the complex relationships that exist between projects, data sets, and analytic techniques in modern science. The self-perception of rigour amongst spatial analysts has hitherto been misplaced because of the vagaries and inadequacies of data quality, resolution and richness: progress requires us to face up to the fact that the linear project design was never a panacea in practice.
References
Batey P, Brown P 1995 From human ecology to customer targeting: the
evolution of geodemographics. In Longley P, Clarke G (eds) GIS for business
and service planning. Cambridge, GeoInformation International: 77–103
Birkin M 1996 Retail location modelling in GIS. In Longley P A, Batty
M (eds) Spatial analysis: modelling in a GIS environment. Cambridge,
GeoInformation International: 207–25
Curry M R 1995 GIS and the inevitability of ethical inconsistency.
In Pickles J (ed.) Ground truth: the social implications of geographic
information systems. New York, Guilford Press: 68–87
Goodchild M F, Longley P A 1999 The future of GIS and spatial analysis.
In Longley P A, Goodchild M F, Maguire D J, Rhind D W (eds) Geographical
information systems: principles, techniques, management and applications.
New York, Wiley: 1: 567–80
Harris R 1999 A comparative analysis of a lifestyle and geodemographic
typology. Working paper. Bristol, University of Bristol
Kennedy P 1979 A guide to econometrics. Oxford, Martin Robertson
Longley P A, Mesev V 1997 Beyond analogue models: space filling and
density measures of an urban settlement. Papers in Regional Science
76: 409–27
Martin D J 1996 Geographic information systems: socioeconomic applications.
London, Routledge
Valavanis S 1959 Econometrics. New York, McGraw-Hill
Research Interests
His research interests are grouped around the use of geographical information
systems (GIS) and quantitative methods in urban analysis. They include:
information integration within GIS (notably remote sensing - GIS integration);
fractal geometry;
local taxation;
urban housing markets;
statistical modelling;
social survey research practice.
He is editor of the journal Computers, Environment and Urban Systems,
reviews co-editor of Environment and Planning B: Planning and Design and
an editorial board member of Papers in Regional Science, Geographical Systems,
and GIS Europe. He is co-author (with Michael Batty) of Fractal Cities:
a Geometry of Form and Function (Academic Press 1994). He is co-editor
(with Michael Goodchild, David Maguire and David Rhind) of the second edition
of Geographical Information Systems: Principles, Techniques, Management,
Applications (John Wiley, 1998). Other co-edited works include: GIS for
Business and Service Planning (with Graham Clarke: GeoInformation International
1995), Spatial Analysis: Modelling in a GIS Environment (with Michael Batty:
GeoInformation International 1996) and Geocomputation: a Primer (with Sue
Brooks, Rachel McDonnell
and Bill Macmillan). He is a past chairperson of the (then) Institute of
British Geographers Quantitative Methods Study Group and between 1991 and
1996 was European Organising Secretary of the Regional Science Association
International.
Telephone (Direct): +44 (0)117 928 7509
Telephone (Dept. Sec.): +44 (0)117 928 7875
Fax: +44 (0)117 928 7878
Web page: http://www.ggy.bris.ac.uk/staff/pl/pl.htm
Email: Paul.Longley@bristol.ac.uk