USGS - science for a changing world

Formal metadata: information and software


How to fix metadata created by DOCUMENT.aml


This document is a guide for users of ARC/INFO who have already used DOCUMENT.AML or who have received coverages documented using it. The problem is that DOCUMENT handles so much of the metadata so poorly that the metadata must be almost completely rewritten to be usable in the Clearinghouse. My objective is to make that rewrite easier by telling you what to do to create good metadata using the information that was gathered with DOCUMENT.
It is not my purpose here to deride ESRI or the original authors of DOCUMENT or its users, but rather to explain what the DOCUMENT AML does that is not, in my opinion, the best practice for creating usable metadata for the NSDI. For the reasons explained here and in Dan's report about FGDCMETA, I now recommend that people not use DOCUMENT to create metadata for ARC/INFO coverages. FGDCMETA does a better job, although I would like to see it enhanced somewhat.

General advice

Strategies for converting old DOCUMENT output

If you have only a few coverages

The best way to deal with DOCUMENT output is to write the metadata out using DOCUMENT FILE, then use the information it contains as the basis for creating an entirely new metadata record. If you're a Unix user, you can take advantage of the cut-and-paste facilities available in xedit and xtme. Open the DOCUMENT FILE output in xedit, and open a new xtme window alongside it. Follow the questions given in Metadata in Plain Language, using the data from the old DOCUMENT record as basic information, and filling in by hand where necessary. This means you'll do some typing and a lot of cutting and pasting from the xedit window to the xtme window. With some practice, this can be fairly efficient. But you'll need to know what parts of the DOCUMENT output you need to examine closely; some useful information is in the wrong places, and lots of useless information may be included that you can simply ignore.

If you have a lot of coverages

Converting records one-by-one is okay if you have five or fewer of them. But if you've got 20 or 100 or 200 old records, you'll want to use some automated procedure for fixing them up. You'll have to make some changes manually to each file, but these steps can be made easier by using a multi-file text editor. I'm assuming that you have already run DOCUMENT FILE to extract the metadata from the INFO tables.
  1. Edit the input files, making the following changes:

    1. Where Description: occurs within Supplemental_Information under the headings "Revisions" or "Reviews applied to data", change it to read "Description of update".
    2. Where "Attributes" occurs within "Entity and Attribute Overview", change it to "List of Attributes".
    3. Where STATUS appears in a list of INFO table items, put > before the word STATUS. For such lists, it is useful to put that character before each list element.
    4. Where Purpose occurs within Supplemental_Information, put any letter before Purpose.
    5. Where Point of contact occurs within Supplemental_Information, put any letter before Point.
  2. Run cns with an alias file and an extension file designed specifically for this problem:
    cns -c doc.cfg -a aliases input_file -i info -e leftovers -o cns.out
    1. Check the leftovers file to see that nothing important is in it.
    2. Check the output file to see that things have been put in reasonable places. Watch the indentation carefully; at this stage it is supposed to be correct, so any irregularities indicate that cns didn't do what you wanted it to do. Look especially for any case in which things that you think are standard elements are aligned with plain text--that usually means cns thinks those elements are really just text. Check the info file to get more clues as to what it was thinking.
  3. Run mp specifying -fixdoc. Generate only text:
    mp -c doc.cfg cns.out -fixdoc -t mp.out -e err
    1. Note that you have to feed mp the config file that brings in the extensions found in doc.ext. The -fixdoc option doesn't do that automatically.
    2. Look carefully at the error file and the output file. You'll probably see lots of "missing element" errors and a few "bad value" errors. You should not see any "unrecognized" or "misplaced" errors, and you should look carefully if you see any "too_many" errors. These indicate that something was misinterpreted in the process, and the solution will probably require editing the input file.
    3. Note also that you should not use -fixdoc unless you are following this procedure. The code it executes carries out some rather radical surgery on the metadata, which must be examined closely when it is finished.
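The manual edits in step 1 can be scripted. Below is a sketch using sed; note that these patterns do NOT check the section-context conditions described above (e.g. "within Supplemental_Information"), and the exact spellings in DOCUMENT output may differ, so run this on copies and review the diff by hand before feeding the results to cns.

```shell
#!/bin/sh
# Sketch of the step-1 edits as sed substitutions.  Context conditions
# from the list above are not checked here; review the output manually.
fixdoc_edits() {
    sed \
        -e 's/^\( *\)Description:/\1Description of update:/' \
        -e 's/^\( *\)Attributes:/\1List of Attributes:/' \
        -e 's/^\( *\)STATUS/\1>STATUS/' \
        -e 's/^\( *\)Purpose/\1xPurpose/' \
        -e 's/^\( *\)Point of contact/\1xPoint of contact/' \
        "$@"
}
# Example: fixdoc_edits DOCUMENT_output.met > edited.met
```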
I followed this procedure with 90 files created in DOCUMENT. I automated the process somewhat by creating a Makefile that encapsulated the command-line options. If you're familiar with the UNIX make utility, you might want to try this arrangement out.
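A Makefile along those lines might look like the sketch below. The file suffixes (.met for DOCUMENT FILE output) and the doc.cfg and aliases names are assumptions; the cns and mp command-line options are the ones shown above. Remember that recipe lines must begin with a tab.

```make
# Sketch only: convert each DOCUMENT FILE output (*.met) to text
# via cns and then mp -fixdoc, one output per input.
SRCS = $(wildcard *.met)
OUTS = $(SRCS:.met=.txt)

all: $(OUTS)

%.cns: %.met
	cns -c doc.cfg -a aliases $< -i $*.info -e $*.left -o $@

%.txt: %.cns
	mp -c doc.cfg $< -fixdoc -t $@ -e $*.err
```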

Problems in DOCUMENT output

Attribute_Label: -
Attributes should be identifiable data items in the info tables. DOCUMENT creates an attribute without a label, whose definition and definition source are copied from the corresponding Entity_Type. Do not include this attribute in the metadata.
Attribute_Accuracy is not informative
DOCUMENT produces a structurally complete but uninformative Attribute_Accuracy section in Data_Quality_Information. In general it looks like this:
    Attribute_Accuracy:
      Attribute_Accuracy_Report:  See Entity_Attribute_Information
      Quantitative_Attribute_Accuracy_Assessment:
        Attribute_Accuracy_Value:  See Explanation
        Attribute_Accuracy_Explanation:
          Attribute accuracy is described, where present, with each
          attribute defined in the Entity and Attribute Section.
The element Entity_and_Attribute_Information is misspelled, the Quantitative_Attribute_Accuracy_Assessment is superfluous, and the section provides no information. Replace the whole thing with a simple narrative as follows:
      The (features) are identified using (characteristics) and are
      questionable where (logical expression, like characteristic <
      critical value).
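For example, a filled-in report might read like this (the feature and item names here are hypothetical):

      The wells are identified using the item DEPTH and are
      questionable where DEPTH < 0.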
Logical_Consistency_Report is not informative
DOCUMENT produces a Logical_Consistency_Report that has no practical value, describing only what kind of topology was built for the coverage, for example, "Polygon topology present" or "Chain-node topology is present". A user wants to know whether the relationships between features and attributes varied through the spatial or temporal range of the data set and, if so, how.
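A report that does have practical value describes the checks that were actually made. For example (an entirely hypothetical report):

    Logical_Consistency_Report:
      Polygon topology present.  Every polygon has exactly one label
      point.  Attribute values were checked against the list of valid
      codes; no other tests were performed.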
Supplemental_Information mostly contains info that should be elsewhere
Use Security_Information only if your data are secret
DOCUMENT puts useless security information into the metadata, like this:
    Security_Classification_System: None
    Security_Classification:  UNCLASSIFIED
    Security_Handling_Description: None
Who cares? If there aren't any legal restrictions on the use of the data, then you should have
  Access_Constraints: none
  Use_Constraints: none
and that should be sufficient. The same holds for security information in the Metadata_Reference_Information. Unless your metadata are secret, just leave the Metadata_Security_Information out entirely.
The same goes for the Cloud_Cover element: unless your data are imagery in which clouds obscure the thing you're trying to see, leave it out!

Page Contact Information: Peter Schweitzer
Page Last Modified: Thursday, 29-Dec-2016 18:24:53 EST