Formal metadata: information and software
How to fix metadata created by DOCUMENT.aml
This document is a guide for users of ARC/INFO who have already used
DOCUMENT.AML or who have received coverages documented using it. The
problem is that DOCUMENT handles so much of the metadata so poorly
that the metadata must be almost completely rewritten to be usable in
the Clearinghouse. My objective is to make that rewrite easier by
telling you what to do to create good metadata using the information
that was gathered with DOCUMENT.
It is not my purpose here to deride ESRI or the original authors of
DOCUMENT or its users, but rather to explain what the DOCUMENT AML
does that is not, in my opinion, the best practice for creating
usable metadata for the NSDI. For the reasons explained here and in
Dan's report about FGDCMETA, I now recommend that people not use DOCUMENT
to create metadata for ARC/INFO coverages. FGDCMETA does a better job,
although I would like to see it enhanced somewhat.
Read the documentation for FGDCMETA.AML
by Dan Nelson of the Illinois State Geological Survey. That page has a
lot of useful information for people who need to document coverages in
ARC/INFO.
Don't create too many files. If you find yourself creating
metadata documents for a whole bunch of nearly-identical coverages,
lump them together into one metadata record. It'll be a lot easier to
refine and update.
Don't document aspects of the data set that are consequences of the
file format or the data model. For example, don't document the
perimeter, area, cover# and cover-ID items in an ARC/INFO polygon
attribute table; any user of ARC/INFO will know what these are and what
they mean, and even if they don't, ArcHelp describes them well enough.
Don't document any items that exist because you have employed a
published data model. If the item names, definitions, and permissible
values are published elsewhere, refer to the published information
instead. That's one of the advantages of using a published data model.
Don't pad the metadata with meaningless information. If you
don't know anything about attribute accuracy, and don't feel
comfortable substituting for accuracy information about the precision
of the attributes, leave the element out. Never fill in a value with
"Unknown" or "Not available" or direct the reader to another part of the
metadata. Just leave the element out. If you're keeping score with mp,
it'll look worse, but for the user, no data is no data, and saying so
adds nothing.
Strategies for converting old DOCUMENT output
If you have only a few coverages
The best way to deal with DOCUMENT output is to write the metadata out
using DOCUMENT FILE, then use the information it contains as the basis for
creating an entirely new metadata record. If you're a Unix user, you can
take advantage of the cut-and-paste facilities available in xedit and xtme.
Open the DOCUMENT FILE output in xedit, and open a new xtme window
alongside it. Follow the questions given in Metadata in
Plain Language, using the data from the old DOCUMENT record as basic
information, and filling in by hand where necessary. This means you'll do
some typing and a lot of cutting and pasting from the xedit window to the
xtme window. With some practice, this can be fairly efficient. But you'll
need to know what parts of the DOCUMENT output you need to examine closely;
some useful information is in the wrong places, and lots of useless
information may be included that you can simply ignore.
If you have a lot of coverages
Converting records one by one is okay if you have five or fewer of
them. But if you've got 20 or 100 or 200 old records, you'll want to
use some automated procedure for fixing them up. You'll have to make
some changes manually to each file, but these steps can be made easier
by using a multi-file text editor. I'm assuming that you have already
run DOCUMENT FILE to extract the metadata from the INFO tables.
I followed this procedure with 90 files created by DOCUMENT. I automated
the process somewhat by creating a Makefile that
encapsulated the command-line options. If you're familiar with the UNIX
make utility, you might want to try this arrangement out.
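As a sketch, a suffix-rule Makefile along these lines can encapsulate the
cns and mp options used below. The names doc.cfg and aliases and the .doc
suffix are assumptions to be adapted to your own files, and recipe lines
must begin with a tab character:

```makefile
# Transform file.doc -> file.cns -> file.txt using suffix rules.
.SUFFIXES: .doc .cns .txt

# $< is the source file, $@ the target, $* the shared stem.
.doc.cns:
	cns -c doc.cfg -a aliases $< -i $*.info -e $*.leftovers -o $@

.cns.txt:
	mp -c doc.cfg $< -fixdoc -t $@ -e $*.err
```

With this in place, "make mycover.txt" runs the whole chain for one record.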
- Edit the input files, making the following changes:
- Where Description: occurs within Supplemental_Information under the headings "Revisions" or "Reviews applied to data", change it to read "Description of update".
- Where "Attributes" occurs within "Entity and Attribute Overview", change it to "List of Attributes".
- Where STATUS appears in a list of INFO table items, put > before the word STATUS. For such lists it is useful to put that character before each list element.
- Where Purpose occurs within Supplemental_Information, put any letter before Purpose.
- Where Point of contact occurs within Supplemental_Information, put any letter before Point.
- Run cns with an alias file and an extension file designed specifically for this problem:
cns -c doc.cfg -a aliases input_file -i info -e leftovers -o cns.out
- Check the leftovers file to see that nothing important is in it.
Check the output file to see that things have been put in
reasonable places. Watch the indentation carefully--it's
supposed to be right now, so any irregularities indicate that cns
didn't do what you wanted it to do. Look especially for any case
in which things that you think are standard elements are aligned
with plain text--that usually means cns thinks those elements are
really just text. Check the info file to get more clues as to what
it was thinking.
- Run mp specifying -fixdoc. Generate only text:
mp -c doc.cfg cns.out -fixdoc -t mp.out -e err
Note that you have to feed mp the config file that brings in
the extensions found in doc.ext. The -fixdoc option
doesn't do that automatically.
Look carefully at the error file and the output file. You'll
probably see lots of "missing element" errors and a few "bad
value" errors. You should not see any "unrecognized"
or "misplaced" errors, and you should look carefully if you see
any "too_many" errors. These indicate that something was
misinterpreted in the process, and the solution will probably
require editing the input file.
Note also that you should not use -fixdoc
unless you are following this procedure. The code it executes
carries out some rather radical surgery on the metadata, which
must be examined closely when it is finished.
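Put together, the procedure above can be scripted. This is a sketch,
assuming POSIX sh, that cns and mp are on your PATH, and that the
DOCUMENT FILE outputs are named *.doc (an assumption). The sed patterns
are blunt approximations of the hand edits in the first step--the real
rules are context-sensitive, for example applying only within
Supplemental_Information--so review the results by hand.

```shell
#!/bin/sh
# fixdoc_edits: rough sed versions of the pre-cns hand edits.  These
# patterns apply everywhere in the file, not just in the contexts the
# instructions describe, so the output must still be checked manually.
fixdoc_edits() {
  sed \
    -e 's/^\( *\)Description:/\1Description of update:/' \
    -e 's/^\( *\)Attributes/\1List of Attributes/' \
    -e 's/^\( *\)STATUS/\1>STATUS/' \
    -e 's/^\( *\)Purpose/\1xPurpose/' \
    -e 's/^\( *\)Point of contact/\1xPoint of contact/'
}

for f in *.doc; do
  [ -e "$f" ] || continue              # skip cleanly if no *.doc files exist
  fixdoc_edits < "$f" > "$f.fixed"
  # cns with the problem-specific config and alias files:
  cns -c doc.cfg -a aliases "$f.fixed" -i "$f.info" -e "$f.leftovers" -o "$f.cns"
  # mp with -fixdoc, generating only text output:
  mp -c doc.cfg "$f.cns" -fixdoc -t "$f.txt" -e "$f.err"
done
```

Check each leftovers and err file afterward, exactly as described above.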
Problems in DOCUMENT output
Attributes should be identifiable data items in the INFO tables.
DOCUMENT creates an attribute without a label, whose definition and
definition source are copied from the corresponding Entity_Type.
Do not include this attribute in the metadata.
Attribute_Accuracy is not informative
DOCUMENT produces a structurally-complete but uninformative
Attribute_Accuracy section in Data_Quality_Information. In general
it looks like this:
Attribute_Accuracy_Report: See Entity_Attribute_Information
Quantitative_Attribute_Accuracy_Assessment:
  Attribute_Accuracy_Value: See Explanation
  Attribute_Accuracy_Explanation:
    Attribute accuracy is described, where present, with each
    attribute defined in the Entity and Attribute Section.
The element Entity_and_Attribute_Information is misspelled, the
Quantitative_Attribute_Accuracy_Assessment is superfluous,
and the section provides no information. Replace the whole thing with
a simple narrative as follows:
The (features) are identified using (characteristics) and are
questionable where (logical expression, like characteristic < threshold).
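For example, a filled-in report might read like this (the feature and
item names here are made up):

```text
Attribute_Accuracy_Report:
  Wells are identified using the DEPTH and USE items, and are
  questionable where DEPTH < 0 or USE is blank.
```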
Logical_Consistency_Report is not informative
DOCUMENT produces a Logical_Consistency_Report that has no practical
value, describing only what kind of topology was built for the coverage,
for example, "Polygon topology present" or "Chain-node topology is
present". A user wants to know whether the relationships between features
and attributes varied through the spatial or the temporal range contained
within the data set and if so, how.
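A more useful report answers that question directly. For example (a
hypothetical report, not DOCUMENT output):

```text
Logical_Consistency_Report:
  Polygon topology present. Every polygon carries a soil-type
  attribute, and the attribute rules were applied uniformly across
  the map extent and throughout the period of content.
```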
Supplemental_Information mostly contains info that should be elsewhere
- Procedures_Used is at least one Process_Step and should be moved into the Lineage.
- Text under Revisions should be a Process_Step.
- The heading Reviews_Applied_to_Data is irrelevant; the results of the reviews should be reported under Data_Quality_Information in the Attribute_Accuracy_Report, Logical_Consistency_Report, and Completeness_Report.
- Related_Spatial_and_Tabular_Data_Sets should be described as Cross_References if the relationship is topical or Sources if the relationship is genetic.
- Other_References_Cited should generally be described in the Lineage as Source_Information unless this is a long list of bibliographic references; cull from that list any references describing data that were used directly to produce the current data set and describe them as sources in the Lineage. Leave the remaining references in Supplemental_Information.
- Notes belong in Supplemental_Information.
Use Security_Information only if your data are secret
DOCUMENT puts useless security information into the metadata, like this:

Security_Information:
  Security_Classification_System: None
  Security_Classification: Unclassified
  Security_Handling_Description: None

Who cares? If there aren't any legal restrictions on the use of the data,
then you should have

Access_Constraints: none
Use_Constraints: none

and that should be sufficient. The same holds for security information in
the Metadata_Reference_Information. Unless your metadata are secret, just
leave the Metadata_Security_Information out entirely.
Cloud_Cover is rarely useful
Unless it's imagery where clouds obscure the thing you're trying to see, leave it out!