An undergraduate interested in teaching high
school students about Earth science has designed a basic computer model
that can be used to show how rising air pollution levels
affect the ozone hole over the South Pole. It's a simple
model with a friendly interface designed to run on a basic PC. When
she is done with the model, she registers it with a model registry.
At about the same time, a postdoctoral researcher
who has dedicated his life to understanding the ozone hole is putting the
finishing touches on a model used to interpolate the interactions of 16
parameters that affect ozone levels regionally. The model requires
a Unix workstation and a detailed understanding of a number of programs.
The input data must be in a specific format obtainable only
from particular sources. Data input and calibration require months
of work. To share his model (and be recognized by his colleagues),
the postdoc registers his model through the same registry.
Several years later, an over-ambitious high
school student, hoping to get into a good college, decides to enter the
regional science fair. He wants to do a project on the ozone hole.
Wanting first prize, and being computer savvy, he decides to
incorporate a computer model in his project, so he goes to a model registry
and looks up models on the ozone hole.
By this time, several other models have been
registered in addition to the two described above. The boy looks through
the long list of models and tries to determine which model he wants
to use.
It's a question of quality. What model is best suited for a particular application, and what tells the intended user that this model, or that model, is fit for their use? What part of the metadata tells how good or how accurate the model is? How can a model registry or digital library tell the possible user what quality the model has?
A few different approaches have been suggested.
The Model Producer
One method of describing a model is to tell where
it came from. In general, different producers of products and
information carry a certain reputation for quality and attention to
detail. For example, map data obtained from the USGS is thought
to be more accurate and precise than map data obtained from a no-name commercial
source. Information about the latest political unrest in some
foreign country reported by CNN is generally thought to be more accurate
than what is read in a chat room or heard on the street.
The source of a model is similar. Let's take
an example. Prof. A wants to use a model in his research. He
finds one in his subject area, and the model will
work on the system that he has. The model was developed by Dr. X
at the Univ. of Y.
Prof. A missed the last conference where Dr. X presented
his model and doesn't know anything about Dr. X, but he is friends with
another professor at the Univ. of Y. He calls his friend and asks if
Dr. X knows what he's talking about. His friend tells him that Dr.
X is a quack, and so Prof. A looks for another model to use.
Prof. A used the knowledge that he could obtain
about the source of the model to determine the quality of the model.
Is this a good approach? Does the quality or popularity of a source
determine the quality of a product? Can we judge a model's fitness
for our use by the creator or source of the model?
Intended User
When metadata is collected, it would be easy enough
for someone who understands the model to describe the intended user.
If the model registry where our over-ambitious high school student was
looking for models had listed that one model was for research and another
model was for high school education, it would be easy for the
boy to know which models are fit for his use. Or would it?
What if the input data for the computer model
were in a format that was difficult to obtain or required preprocessing?
What if the model required the use of a computer program that was unfamiliar
or unavailable to a prospective user? What if the writer of the model
thought that any high school student should be able to use his model, but
in reality it was too complex? What if the time required to
operate the model were more than a user was willing to put into
it? What if the model was intended to be used by one group (students,
for example) but only if an instructor had a deeper understanding of the
subject?
Many questions can arise from supplying only
the intended user of the model. Models depend on more variables than just
who should use them. Would the other information offered in the metadata
say enough about these other variables?
Calibration Procedures
Another proposed idea for reporting fitness for use,
or quality of a model, would be to describe the calibration techniques
used to make a model accurate.
But what if the model didn't require any calibration
techniques? Should we consider that model to be substandard and
not fit for use? I think not.
The whole goal here is to create a format for metadata
so that many models can be cataloged and compared with one another.
If one model that does require calibration were to be compared with one
that doesn't, how would a prospective user have any idea whether the model that
doesn't need calibration is any good? There would be no way to
tell.
Current Users/Applications
Another idea that some have proposed is to
include current users or current applications. This idea isn't new,
nor is it unfamiliar in today's society. Just yesterday I heard that
a woman lost 35 pounds on the Jenny Craig Diet, and the other day a woman
on TV was explaining that by using Tide with bleach, her clothes were whiter
and, therefore, her father was happy to be living with her and her two filthy
sons.
Quite often we try to make what we have to offer, or what we have created,
look as good as we can. One way of doing this is to describe
our very best user or the most ideal application of what we have.
Perhaps we might even overlook some specific steps or problems in order
to make what we have look good.
This, again, is a reason that the writer of
the metadata is just as important as the content of the metadata.
If the creator of a model were writing the metadata, and he wanted his model
to be presented in a favorable light, he might look for his prized application
to report on in the metadata. However, if a digital librarian were
recording the metadata of the model, he or she might look for the most
basic, simple application. If we have one collection where the model
producer creates the metadata and another collection where an unbiased
librarian is the metadata writer, how could fitness for use be
compared between models from these two different collections?
So what is the answer? Well, maybe it lies in
the combination of a number of qualities. Maybe it can't be reported,
and the intended user needs to experiment with a model in order to determine
if it's fit for their use. Perhaps if every model came with the differential
equations that it runs on, one could tell if it's a fit model. Maybe
they should give me a Ph.D. in computer models and let me sit around for the
next 20 years and rank models on the Crosier-O-Meter.
I don't know if any of these ideas resolves the
question, but it is an issue that needs to be addressed. The number
of computer models is growing at quite a rate, and the need for a standardized
metadata format for models is growing with each new model. Only through
a standardized format can models be compared and contrasted to find a model
fit for a particular application.
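
As a rough sketch of what comparing models through one standardized record
might look like, the Python below gathers the descriptors discussed above
(producer, intended user, platform, input format, setup time, calibration,
and who wrote the metadata) into a single record and filters a small
registry against a user's constraints. Every field name, every example value
(including the setup hours), and the shortlist function itself are
hypothetical, made up here for illustration; none of it comes from an
existing registry or metadata standard.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelRecord:
    """One hypothetical metadata record; all field names are illustrative."""
    title: str
    producer: str
    intended_user: str           # e.g. "education" or "research"
    platform: str                # e.g. "PC" or "Unix workstation"
    input_format: str
    setup_hours: float           # rough time to prepare inputs and calibrate
    calibration: Optional[str]   # None means no calibration is required
    metadata_author: str         # "producer" or "librarian"


def shortlist(records: List[ModelRecord], intended_user: str,
              platform: str, max_setup_hours: float) -> List[ModelRecord]:
    """Keep only the records that match the user's stated constraints."""
    return [
        r for r in records
        if r.intended_user == intended_user
        and r.platform == platform
        and r.setup_hours <= max_setup_hours
    ]


# Two invented entries loosely based on the story at the start of this essay.
registry = [
    ModelRecord(
        title="Classroom ozone model",
        producer="Undergraduate student",
        intended_user="education",
        platform="PC",
        input_format="built-in sample data",
        setup_hours=1.0,          # assumed value
        calibration=None,
        metadata_author="producer",
    ),
    ModelRecord(
        title="Regional ozone model",
        producer="Postdoctoral researcher",
        intended_user="research",
        platform="Unix workstation",
        input_format="specific format from particular sources",
        setup_hours=500.0,        # assumed stand-in for months of work
        calibration="manual calibration against input data",
        metadata_author="producer",
    ),
]

# The high school student's constraints: a PC and a weekend of effort.
for record in shortlist(registry, "education", "PC", max_setup_hours=10):
    print(record.title)
```

Note that a missing calibration entry is recorded here as "not required"
rather than as a mark against the model, and that the metadata author is
stored alongside the content. A filter like this only narrows the list; it
does not answer the harder question of how good each remaining model is.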