Dean Djokic, Andrew Coates, and James E. Ball

GENERIC DATA EXCHANGE - INTEGRATING MODELS AND DATA PROVIDERS

The integration of GIS with modeling software has usually been a one-off exercise, with large amounts of time spent creating and maintaining specialized interfaces and data exchange formats. This paper presents a generic format and method for data exchange (GOODES) that can significantly streamline integration between information users and information providers. Information users are usually analysis or modeling packages, such as hydrologic models or statistical packages, but can also include GIS when used as an analysis tool. Information providers are usually data storage and retrieval systems, but can also include models when used to feed back information (results) from the simulation. GOODES stands for generic, object-oriented, open data exchange system; it allows exchange of data between any platforms or packages through an open, object-oriented structure. Application-specific drivers translate the data to and from the generic format and can be developed for any application as required. Examples of data exchange between the GOODES format and the ARC/INFO GIS, the hydrologic/hydraulic model SWMM, and the time-series database HYDSYS illustrate the system’s implementation and features.


Introduction

Integration of computer models describing various environmental processes is becoming a major obstacle in the development of comprehensive planning and analysis tools that can handle complex process interactions. Most models developed over the last 30 or so years have concentrated on solving small portions of the overall problem in a field of engineering.

For example, in hydrology, models have often been developed for a single component of the hydrological cycle, such as groundwater flow, surface flow, or flow in the unsaturated zone. In many cases even these partial solutions are limited in scope, often by the dimensionality of the problem (1-D, 2-D, 3-D, time variant or invariant). There have been few attempts to create comprehensive solutions to general problems (e.g. the whole hydrological cycle), and those solutions usually compromise on the detail of certain aspects of the overall problem.

The main reasons for such an approach are:

In the last few years, with wider use of remote sensing as a data provider for model input and the rapid growth of computing power, two of the three indicated reasons are slowly being removed, allowing analysis of more complex problems. Those interested in such analyses face a dilemma over which approach to take in developing complex models. There are two main options:

When analyzing the requirements for development of modern problem solving tools in environmental engineering, the second approach seems more rational (Djokic, 1993). An opportunity to use the first approach will arise as the need for new solutions to the old problems comes about due to better understanding of the basic underlying principles. In some areas of our endeavor that is already happening, but in some others, the existing techniques will be used in their present form for a long time to come.

Need for Generic Data Exchange Format

A complex decision-making tool can consist of several components that are more or less tightly integrated. Depending on the accessibility of the source code for the programs of interest and on the interface strategy, two programs can be linked, integrated, or embedded (Djokic et al., 1995a). Integrating computer programs at the data level is a tedious, but not very difficult, task.

ARC/HEC2 Interface Schematics

Figure 1 shows an example of an application in which the ARC/INFO GIS and the HEC-2 hydraulic model are integrated using several interface programs and temporary data transfer files (Djokic et al., 1994). The figure is a schematic representation of the interface; the actual implementation consists of more than 30 different programs, macros, and temporary files. As the number of programs to be interfaced increases, the complexity of the data exchange system increases significantly. Figure 2 presents a detailed data exchange structure between the ARC/INFO GIS, the HYDSYS temporal database, and the SWMM hydraulic model (Chui, 1995).

ARC/SWMM Interface Schematics - Traditional Approach

Standard practice is to create a separate, program-specific data interface for each pair of programs to be integrated. Although convenient for development, such an approach makes maintenance of complex interfaces difficult, especially as individual components (programs) go through version changes. The approach can be rationalized by using standard data transfer file formats (Djokic et al., 1995a). The ability to use these standards can greatly reduce the effort required for data integration, but the standards often limit the types of data that can be transferred.

There are several standards available for data exchange between different computer programs and program/data types. Most of them are specific either to a program or data type, such as SDTS (USFIPS, 1992) for exchange of spatially distributed data, DXF (Thomas, 1989) for CAD data, and HDF (NCSA, 1995) for raster-based data, or to a product developer.

The main problem in using these standards is their inability to handle the diversity of data types encountered in environmental models. For example, depending on the problem being analyzed, what is apparently the same measure will be treated differently by different models. Consider a case of conjunctive surface water and groundwater use. The watershed boundaries, possibly delineated by hand, can be stored in a GIS. Surface water and groundwater boundaries can be different and stored as different coverages. A modeling program can retrieve these areas and use them for detailed hydrologic/hydraulic computations. As a result of the model calibration, the initial watershed areas may have been changed. We now have several places that hold a (different) number representing the same spatial feature.

Now, another application wants to use the watershed areas for further analyses. Several issues arise:

Such questions make the use of existing data exchange standards difficult. This does not mean that the existing standards should not be used. They certainly should. It is important, however, to realize that in a number of cases they will not be sufficiently flexible.

Exchange Format Definition - GOODES

To accommodate the diversity of data and models that could be integrated, a new data exchange method has been proposed (Djokic et al., 1995a). To acknowledge the generality of the method and its structure, it has been renamed GOODES: generic, object-oriented, open data exchange system. GOODES consists of two major components. The first is a data exchange file structure, described in detail by Djokic et al. (1995a), with more details published on the WWW at http://www.water.civeng.unsw.edu.au/department/hydrology/goodes2.htm. This structure consists of up to three ASCII files that define the data structure and character, the actual data, and all auxiliary information needed for data interpretation.

GOODES Exchange System Schematics

The second component of the GOODES method is the set of drivers that convert GOODES files into a format the application understands, or create GOODES files from the application. Figure 3 is a schematic representation of the GOODES exchange system. The role of the drivers is more than just reformatting the data; it is discussed in detail in the following section.

GOODES Drivers

There are three distinct GOODES driver types. These fulfill all of the requirements for data exchange and are intended to simplify the process of application integration. The driver types are application to GOODES, GOODES to application, and GOODES to GOODES.

Application to GOODES

The application to GOODES driver simply puts data objects from the application into GOODES format. The driver is generally written in the application’s own language if there is one available, as this will usually simplify access to the native data objects. The driver will generally work from the application’s internal data structure if that is exposed to the author; for some applications, however, there may be no access to that structure and the driver will have to work from the application’s standard output. No processing is performed on the data objects (other than specifying which are to be exported), as the driver assumes no knowledge of the target application.

The sequence for exporting the application’s data to GOODES format is shown in Figure 4. The user first supplies details of the application’s data objects required for export. This specification can be given either through a user interface or through a controlling script file. Next, the user provides any additional information required for the class definitions in the header file, which is then written. The global section details are then provided and the data file is written. Finally, any information for an auxiliary file is written.
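The export sequence described above can be sketched in a few lines of Python. This is only an illustration of the order of the steps: the function name export_to_goodes, the file suffixes (.hdr, .dat, .aux) and the CLASS and GLOBAL line layouts are assumptions made for this sketch and are not the GOODES syntax, which is defined in the GOODES documentation.

```python
import tempfile
from pathlib import Path


def export_to_goodes(objects, selection, class_info, global_info, aux_info, out_dir, stem):
    """Write header, data and (if needed) auxiliary files for the selected objects.

    All line layouts below are placeholders, not the real GOODES syntax.
    """
    out_dir = Path(out_dir)
    # Step 1: keep only the data objects named for export; no other processing.
    selected = {name: objects[name] for name in selection}

    # Step 2: write the header file holding the class definitions.
    header = out_dir / f"{stem}.hdr"
    header.write_text("".join(f"CLASS {name} {class_info.get(name, '')}\n" for name in selected))

    # Step 3: write the global section, followed by the data records.
    data = out_dir / f"{stem}.dat"
    lines = [f"GLOBAL {key}={value}" for key, value in global_info.items()]
    for name, records in selected.items():
        lines += [f"{name} {' '.join(map(str, rec))}" for rec in records]
    data.write_text("\n".join(lines) + "\n")

    # Step 4: write the auxiliary file only when there is auxiliary information.
    written = [header, data]
    if aux_info:
        aux = out_dir / f"{stem}.aux"
        aux.write_text("\n".join(aux_info) + "\n")
        written.append(aux)
    return written


# Hypothetical application data: two object classes, of which one is exported.
objects = {"subcatchment": [(1, 12.5), (2, 8.0)], "pipe": [(10, 150.0)]}
with tempfile.TemporaryDirectory() as tmp:
    files = export_to_goodes(objects, ["subcatchment"], {"subcatchment": "id area_ha"},
                             {"units": "metric"}, ["areas digitised by hand"], tmp, "catchment")
    print([f.name for f in files])
```

In this sketch the arguments stand in for the choices that the user would make interactively or through a controlling script file.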

Application to GOODES Driver Sequence

GOODES to Application

The second driver type takes information in GOODES format and converts it to the application’s input data format. This may be the application’s native format if that is available to the author, or may be an intermediate format with which the application is familiar. By choice, the GOODES to application driver will be written in the native language of the application if there is one, as this will generally provide the easiest access to the application’s data structure.

The GOODES to application driver will generally have some limited amount of “intelligence”. It should, for example, be able to carry out basic database operations such as single-level joins, aggregation, summary, some simple statistical analysis and boolean selection. To achieve this, it will often have to go through an intermediate data format (such as import into an RDBMS) if the application itself does not support such operations. This intermediate format is not intended to be a standard and its structure is entirely at the discretion of the driver author.
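As an illustration of the kind of basic operations meant here, the following Python sketch implements a boolean selection, a single-level join and an aggregation over records held as dictionaries. The helper names (select, join, aggregate) and the sample subcatchment and land-use records are hypothetical; a real driver might equally delegate these operations to a database.

```python
from collections import defaultdict


def select(rows, predicate):
    """Boolean selection: keep the rows for which the predicate holds."""
    return [row for row in rows if predicate(row)]


def join(left, right, key):
    """Single-level join: attach the matching right-hand row to each left-hand row."""
    lookup = {row[key]: row for row in right}
    return [{**row, **lookup[row[key]]} for row in left if row[key] in lookup]


def aggregate(rows, group_key, value_key):
    """Aggregation: sum value_key within each group_key."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_key]] += row[value_key]
    return dict(totals)


# Hypothetical records, as they might look after reading a GOODES data file.
subcatchments = [
    {"id": 1, "area_ha": 12.5, "landuse": "R"},
    {"id": 2, "area_ha": 8.0, "landuse": "C"},
    {"id": 3, "area_ha": 4.5, "landuse": "R"},
]
landuses = [{"landuse": "R", "name": "residential"}, {"landuse": "C", "name": "commercial"}]

joined = join(subcatchments, landuses, "landuse")
large = select(joined, lambda row: row["area_ha"] > 5.0)
area_by_use = aggregate(large, "name", "area_ha")
print(area_by_use)  # {'residential': 12.5, 'commercial': 8.0}
```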

Generally, much more user interaction is required for the GOODES to application driver than for the application to GOODES driver. This suggests that a facility for script file processing of data exchange, as well as manual interaction, should be provided.

GOODES to Application Driver Process

The process for the GOODES to application driver is shown in Figure 5. First, the user supplies the name of the data file (either interactively or through the controlling script file) and the driver checks that the data file, the header file and any auxiliary files are available. The structures of the files are then checked for consistency. The checks performed are listed in Table 1.
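The first of these steps, checking that the files are available, might look like the Python sketch below. The function name missing_files and the file names are hypothetical, and the caller passes the header and auxiliary paths explicitly because the rule for locating them from the data file name is not given here. The structural consistency checks are not reproduced.

```python
import tempfile
from pathlib import Path


def missing_files(data_file, header_file, auxiliary_files=()):
    """Return the paths (as strings) of any expected files that do not exist."""
    candidates = [Path(data_file), Path(header_file), *map(Path, auxiliary_files)]
    return [str(path) for path in candidates if not path.is_file()]


# Build a file set in which the auxiliary file is deliberately absent.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "catchment.dat").write_text("")
    (tmp / "catchment.hdr").write_text("")
    missing = missing_files(tmp / "catchment.dat", tmp / "catchment.hdr",
                            [tmp / "catchment.aux"])
    print([Path(name).name for name in missing])  # ['catchment.aux']
```

A driver would normally stop and report the missing names before attempting any consistency checks.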

GOODES to GOODES

The final driver type is used for complex data manipulation tasks. It operates on one or more GOODES files and either produces a new GOODES file or appends its results to an input file. The GOODES to GOODES driver is used when the format of the origin data cannot be resolved into an appropriate structure by the simple operations available through the GOODES to application driver.

The GOODES to GOODES driver supports complex relational database operations, and as such will usually use an intermediate data format, for example by importing the data objects into tables in an RDBMS. This format is entirely at the discretion of the driver author and is not in itself a standard. The driver should be controllable either interactively or through a script file. Actions performed interactively should be recordable in the script file format for later automatic repetition.
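A minimal Python sketch of this idea follows, assuming an in-memory SQLite database as the intermediate format and a JSON list of statements as a stand-in for the script file. Both choices, the RecordingSession class, the replay function and the rainfall table are illustrative assumptions, not part of the GOODES definition.

```python
import json
import sqlite3


class RecordingSession:
    """Runs SQL against an in-memory database and records each action."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.script = []  # recorded (sql, params) pairs, in execution order

    def run(self, sql, params=()):
        self.script.append((sql, tuple(params)))
        return self.db.execute(sql, tuple(params)).fetchall()


def replay(script):
    """Repeat a recorded script on a fresh database; return the last result."""
    session = RecordingSession()
    result = []
    for sql, params in script:
        result = session.run(sql, params)
    return result


# Interactive session: load rainfall records and total them per gauge.
live = RecordingSession()
live.run("CREATE TABLE rain (gauge TEXT, depth_mm REAL)")
for gauge, depth in [("G1", 2.0), ("G1", 3.5), ("G2", 1.0)]:
    live.run("INSERT INTO rain VALUES (?, ?)", (gauge, depth))
totals = live.run("SELECT gauge, SUM(depth_mm) FROM rain GROUP BY gauge ORDER BY gauge")

# The recorded actions serialise to text, standing in for a script file.
script_text = json.dumps(live.script)
repeated = replay(json.loads(script_text))
print(totals == repeated)  # True
```

Recording every action through a single entry point is what makes the later automatic repetition possible; anything done outside run would be lost from the script.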

Some of the database operations that could be supported by a GOODES to GOODES driver are:

Case Study

An existing one-off integration exercise was used as a case study for this generic approach to model integration. This exercise, which is described in more detail in Djokic et al. (1995b) and Chui (1995), involved the use of data from ARC/INFO about the spatial elements in an urban catchment and from a time series manager, HYDSYS, that held the information about rainfall. The data were combined to run simulations in the EPA’s Storm Water Management Model (SWMM), and the results were fed back into ARC/INFO for display. This was achieved through custom-written integration routines in AML and QBASIC. A schematic diagram of the integration process is shown in Figure 2. As can be seen, several custom-written interface programs were required. The case study involved the replacement of these with new components based on the GOODES format. In particular, the following drivers were written:

A schematic of the resultant process after the generic method was applied is shown in Figure 6. As can be seen from a comparison of Figures 2 and 6, the generic method produces a considerably clearer process.

This generic approach has brought several benefits. Firstly, each of the components is now modularised. Should the requirements of the system change such that, for example, a different time series manager would be preferable, only the small part of the diagram relating to the HYDSYS components needs to be rewritten. In fact, with the generic approach, it is likely that the driver for the new time series manager is already available (perhaps having been developed for some completely unrelated project), in which case replacing the component is extremely simple.

ARC/SWMM Interface Schematics - Generic Approach

The driver development for the case study also clarified the philosophy behind the three driver types (and actually revealed the need for a GOODES to GOODES driver). At first, the application to GOODES drivers were perceived to require some knowledge of the target application in their object structure. While it is still true that a very general knowledge of the purpose of the output data is required, it was decided that a truly generic system would not impose structures on export. The import drivers should manipulate the GOODES data, rather than relying on it being pre-formatted. To reduce the need for extremely complex capabilities for all GOODES to application drivers, the GOODES to GOODES driver was developed. It is a common module that can be used prior to GOODES to application drivers, providing the manipulative, summary, and other capabilities that may be required, without the need for complex development effort by all (GOODES to application) driver authors.

Discussion and Conclusion

The GOODES data exchange system facilitates data integration of applications with diverse backgrounds. The GOODES file type and structure enable easy cross-platform and cross-application access to the data presented for sharing. Additional information about the data helps in proper interpretation of its meaning, which is crucial for the development of complex, integrated modeling systems that require little or no user intervention.

Drivers add intelligence to the data exchange process. They are used to interpret the data as a function of the data character and auxiliary information stored in GOODES transfer files. It is envisioned that drivers to and from an application will be developed and maintained by the application developers. Since it is impossible for any developer to envision and accommodate all the possible uses of their product, often the GOODES files will not be complete enough for direct transfer to another application.

A GOODES to GOODES driver, possibly in conjunction with GOODES files derived from some other programs, can be used to create a GOODES file that a GOODES to application driver can interpret directly. This approach allows the application integrator to concentrate on the major issues of the data integration process. That involves fine tuning of the GOODES to GOODES driver for the specific problem and programs at hand, but as a benefit, it relieves the application integrator from dealing with details of individual programs’ input and output structure. As the number of programs to be integrated increases, this benefit becomes more pronounced.
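A short calculation shows why the benefit grows with the number of programs. The Python sketch below assumes the worst case, in which every program must exchange data with every other program in both directions, and compares the number of one-way pairwise translators with the number of drivers needed when each program has one driver to and one driver from the generic format. The function names are illustrative only.

```python
def pairwise_translators(n):
    """One-way translators needed when every program talks to every other."""
    return n * (n - 1)


def hub_drivers(n):
    """Drivers needed when each program has one export and one import driver."""
    return 2 * n


for n in (2, 3, 5, 10):
    print(n, pairwise_translators(n), hub_drivers(n))
# 2 2 4
# 3 6 6
# 5 20 10
# 10 90 20
```

Under this assumption the two approaches break even at three programs; beyond that the pairwise count grows quadratically while the driver count grows linearly.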

The GOODES system is public and open for users’ input. It is documented and will be maintained through WWW entries at http://www.water.civeng.unsw.edu.au/department/hydrology/goodes2.htm. Comments and submissions are welcome and appreciated. The contact regarding GOODES maintenance is Andrew Coates (A.Coates@unsw.edu.au).

References

Chui, S.K. (1995). “Hydrologic Application of ARC/INFO GIS to SWMM for Modelling Urban Stormwater Problem.” M.Eng.Sci thesis. School of Civil Engineering, University of New South Wales, Sydney, Australia.

Djokic, D., Coates, A., and Ball, J.E. (1995a). "GIS as Integration Tool for Hydrological Modeling: A Need for Generic Hydrologic Data Exchange Format." 15th Annual ARC/INFO User Conference, Palm Springs, CA.

Djokic, D., Ball, J.E., and Chui, S.K. (1995b). “Integration of ARC/INFO and Stormwater Management Model (SWMM).” 9th Annual Australian ARC/INFO User Conference, Sydney, Australia.

Djokic, D., Beavers, M.A., and Deshakulakarni, C.K. (1994). "ARC/HEC2: an ARC/INFO - HEC-2 Interface." Proc. 21st Water Resources Planning and Management Division Annual Specialty Conference, ASCE, New York, N.Y., 41-44.

Djokic, D. (1993). "Towards General Purpose Spatial Decision Support System Using Existing Technologies." NCGIA Second International Conference/Workshop on Integrating GIS and Environmental Modeling, Breckenridge, CO.

NCSA. (1995). “Hierarchical Data Format.” National Center for Supercomputing Applications, WWW, http://www.ncsa.uiuc.edu/SDG/Software/HDF/HDFIntro.html, Urbana-Champaign, IL.

Thomas, R.M. (1989). “Autocad Desktop Companion.” Sybex, Alameda, Ca.

USFIPS. (1992). “Spatial Data Transfer Standard.” US Federal Information Processing Standard Publication 173.


Dean Djokic, Lecturer
Andrew Coates, Professional Officer
James E. Ball, Senior Lecturer
School of Civil Engineering
The University of New South Wales
Sydney, NSW 2052
Australia
Telephone: 61-2-385-5771
Fax: 61-2-385-6139
Email: D.Djokic@unsw.edu.au