Editorial: Top ten metadata mistakes
Lynda Wayne asked a number of people:
What are the top ten most common metadata errors?
As the 'brain-trust' of the national metadata effort, your input would be
appreciated. Feel free to interpret the question in your own manner -
specific fields that are commonly misunderstood, bad approaches, general
confusion, or organizational/management issues.
(Lynda received replies from many people and has posted an edited
compilation of the responses on the FGDC web site. I've included my
response to the initial inquiry here because some of my concerns
aren't reflected in her final version.)
Reply by Peter Schweitzer on 25 July 2000:
I would enlarge the problem beyond the metadata itself to
information processing in general and the process of data
management. Here's my list, Letterman-style:
(Later I noticed that I had written 12, not 10, so I've renumbered them here.)
12.
(for Arc/Info users)
Taking time to document things that are artifacts of the GIS software,
such as writing detailed descriptions of AREA, PERIMETER, LPOLY#, RPOLY#,
FNODE#, TNODE#, cover#, cover-ID, and the like.
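If such an item must appear at all, a one-line acknowledgment is plenty.
A minimal sketch in the indented-text form that mp reads (the wording is
only an illustration, not a prescription):

  Attribute:
    Attribute_Label: AREA
    Attribute_Definition: Area of the polygon, computed and maintained by the GIS software.
    Attribute_Definition_Source: Software-computed value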
11.
(for Arc/Info users)
Simply dumping the results of ITEMS into an
Entity_and_Attribute_Overview and calling that enough.
People need to know the units of measured variables, and "percent"
is not a unit of measure.
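What a user needs instead is a real definition with real units; "percent"
becomes meaningful only when you say percent of what. A sketch, using a
hypothetical measured attribute (CLAY_PCT):

  Attribute:
    Attribute_Label: CLAY_PCT
    Attribute_Definition: Clay content of the soil sample, as a percentage of dry weight.
    Attribute_Definition_Source: this report
    Attribute_Domain_Values:
      Range_Domain:
        Range_Domain_Minimum: 0
        Range_Domain_Maximum: 100
        Attribute_Units_of_Measure: percent of dry weight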
10.
Putting too much
faith in mp. Human review is the thing that really matters.
mp can help, but isn't the arbiter of what is and what is not good
metadata. Prioritize errors like this, from most serious (fix) to
least serious (understand and let go):
- Indentation problems
- Unrecognized elements
- Misplaced elements
- Too many of some element
- Missing elements
- Empty elements
- Improper element values
- Warnings and upgrades
9.
Making too many metadata records. People who try to document every GIS coverage or data table
can wear themselves out. Some aggregation is good for both the producer
and the user. Ancillary coverages can be described as
Source_Information.
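A sketch of how an ancillary coverage might be folded into the lineage
rather than given its own record (the originator, title, date, and
abbreviation here are hypothetical):

  Source_Information:
    Source_Citation:
      Citation_Information:
        Originator: (whoever built the coverage)
        Publication_Date: Unpublished material
        Title: Study-area boundary coverage
    Type_of_Source_Media: online
    Source_Time_Period_of_Content:
      Time_Period_Information:
        Single_Date/Time:
          Calendar_Date: 1999
      Source_Currentness_Reference: ground condition
    Source_Citation_Abbreviation: BOUNDARY
    Source_Contribution: Outline of the study area, used to clip the data coverages.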
8.
Not making enough
metadata records. Trying to cram all of the information about
an entire research program into a single metadata record will drive
you and your potential users crazy. Split when sources, processing,
or spatial reference varies.
7.
Agonizing over the most
difficult elements. These include, but probably aren't limited to:
- Latitude_Resolution
- Longitude_Resolution
- Abscissa_Resolution
- Ordinate_Resolution
- Entity_Type_Definition_Source
- Attribute_Definition_Source
- Enumerated_Domain_Value_Definition_Source
and, to a lesser extent,
- Attribute_Accuracy_Report
- Logical_Consistency_Report
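For what it's worth, the resolution elements can usually be answered
mechanically from how the coordinates are stored. A sketch for coordinates
stored as decimal degrees rounded to six decimal places (the values are
illustrative, not a recommendation for your data):

  Geographic:
    Latitude_Resolution: 0.000001
    Longitude_Resolution: 0.000001
    Geographic_Coordinate_Units: Decimal degrees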
6.
Misunderstanding Enumerated_Domain as numerical values. Attributes
measure (or count), categorize, or characterize real things.
Those functions are expressed as Range_Domain (for measures
or counts) and Enumerated_Domain (for categories). The hardest
part is how to describe attributes whose values are simply text
describing something, or are names like place names. This is a
deficiency in the FGDC standard; there should be another type
of Attribute_Domain_Values for descriptions; a better alternative
might be to make Attribute_Domain_Values mandatory if applicable
and not applicable if the values are sufficiently descriptive.
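A sketch of an Enumerated_Domain whose values happen to be numbers; what
makes the domain enumerated is that each value stands for a category, not
that the values are text (the attribute and its codes are hypothetical):

  Attribute:
    Attribute_Label: LANDCOVER
    Attribute_Definition: Dominant land cover within the map unit.
    Attribute_Definition_Source: this report
    Attribute_Domain_Values:
      Enumerated_Domain:
        Enumerated_Domain_Value: 1
        Enumerated_Domain_Value_Definition: Forest
        Enumerated_Domain_Value_Definition_Source: this report
    Attribute_Domain_Values:
      Enumerated_Domain:
        Enumerated_Domain_Value: 2
        Enumerated_Domain_Value_Definition: Open water
        Enumerated_Domain_Value_Definition_Source: this report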
5.
Substituting statements
about precision for statements about accuracy. I do this often,
because what I know is how variable the values are, and I don't know
the true values that they estimate.
4.
Larding the metadata with uninformative values. People can honestly
disagree about this, but I find it aggravating to see more than a few
"N/A", "unknown", "not applicable", "implied", or "see above (or below)"
entries. Reasonable defaults should be
assumed. For example, if no
Process_Contact is indicated,
people should assume that the
Point_of_Contact either did
the processing or knows who did, and that the people who did
are either
Originators or are listed in
Data_Set_Credit.
Likewise, any of the elements
- Entity_Type_Definition_Source
- Attribute_Definition_Source
- Enumerated_Domain_Value_Definition_Source
if missing, should be assumed to have the value "this report"
or some similar self-reference.
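To make the first of those defaults concrete: a process step written like
the sketch below (the description and date are hypothetical) carries no
Process_Contact, and the reader should simply ask the Point_of_Contact
about it:

  Process_Step:
    Process_Description: Clipped the source coverages to the study-area boundary.
    Process_Date: 2000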
3.
Choosing a tool because it's free or because it's commercial.
Making, maintaining, reviewing, and reading metadata cost so
much more time and energy than the tools do that price per se
shouldn't direct the choice of tools.
2.
Not recognizing that "the metadata problem" involves not only tools and training,
but also work-flow strategy and even the philosophy governing how
your organization interacts with the users of its data.
1.
Not asking for help from the community. Beyond all the hype and promises (empty and
fulfilled), beyond all the tools, training, and technology, what NSDI
has done is bring a common language and common purpose to a highly
diverse group of people, and we have found in each other consolation,
challenge, and care.