Frequently-asked questions on FGDC metadata

This list of frequently asked questions (FAQs) carries no official sanction from the USGS or the FGDC.
Contents

  • Motivation
  • The metadata standard
  • Metadata file format
  • Running mp, xtme, and cns
  • Metadata storage and management
  • Metadata dissemination

Motivation

What is metadata?
Metadata consist of information that characterizes data. Metadata are used to provide documentation for data products. In essence, metadata answer who, what, when, where, why, and how about every facet of the data that are being documented.

Online systems for handling metadata need to rely on the metadata (a plural noun, like data) being predictable in both form and content. Predictability is assured only by conformance to standards. The standard referred to in this document is the Content Standard for Digital Geospatial Metadata (CSDGM). I refer to this as the FGDC standard even though FGDC deals with other standards as well, such as the Spatial Data Transfer Standard (SDTS).

Why should I create metadata?
Metadata help publicize and support the data you or your organization have produced.

Metadata that conform to the FGDC standard are the basic product of the National Geospatial Data Clearinghouse, a distributed online catalog of digital spatial data. This clearinghouse will allow people to understand diverse data products by describing them in a way that emphasizes aspects that are common among them.

Who should create metadata?
Data managers who are either technically literate scientists or scientifically literate computer specialists. Creating correct metadata is like library cataloging, except the creator needs to know more of the scientific information behind the data in order to properly document them. Don't assume that every -ologist or -ographer needs to be able to create proper metadata. They will complain that it is too hard and they won't see the benefits. But ensure that there is good communication between the metadata producer and the data producer; the former will have to ask questions of the latter.
Why is this so hard?!
While gain need not be proportional to pain, certainly if there is no pain, there will likely be no gain. Library catalog records aren't produced by the authors of books or magazines, and with good reason: to get consistency in documentation that emphasizes the common aspects of highly diverse products, you need catalogers with some sophistication in the relevant standard (for libraries, MARC). The FGDC metadata effort is quite similar, but asks for more detail about the products themselves.

How do we deal with people who complain that it's too hard? The solution in most cases is to redesign the work flow rather than to develop new tools or training. People often assume that data producers must generate their own metadata. Certainly they should provide informal, unstructured documentation, but they should not necessarily have to go through the rigors of fully structured formal metadata. For scientists or GIS specialists who produce one or two data sets per year, it simply isn't worth their time to learn the FGDC standard. Instead, they should be asked to fill out a less complicated form or template that will be rendered in the proper format by a data manager or cataloger who is familiar (not necessarily expert) with the subject and well-versed in the metadata standard. If twenty or thirty scientists are passing data to the data manager in a year, it is worth the data manager's time to learn the FGDC standard. With good communication this strategy will beat any combination of software tools and training.


The metadata standard

Why is the metadata standard so complex?
The standard is designed to describe all possible geospatial data.

There are 334 different elements in the FGDC standard, 119 of which exist only to contain other elements. These compound elements are important because they describe the relationships among other elements. For example, a bibliographic reference is described by an element called Citation_Information which contains both a Title and a Publication_Date. You need to know which publication date belongs to a particular title; the hierarchical relationship described by Citation_Information makes this clear.
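For example, in the indented-text encoding read by mp (described later in this FAQ), a citation might look like the following sketch. The element names are from the standard; the values are made up for illustration:

  Citation_Information:
    Originator: U.S. Geological Survey
    Publication_Date: 1996
    Title: Geology of the Gold Spring quadrangle
    Geospatial_Data_Presentation_Form: map

Because Publication_Date and Title are both children of the same Citation_Information, there is no doubt about which date belongs to which title.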

What about Metadata-Lite?
The most cogent discussion of this topic is from Hugh Phillips, posted to the email list NSDI-L.

Begin excerpt from Hugh Phillips

Over the past several months there have been several messages posted in regard to Metadata 'Core.' Several messages reflected frustration with the complexity of the CSDGM and suggested the option of a simplified form or 'Core' subset of the full standard. At the other end of the spectrum was concern that the full standard already is the 'Core' in that it represents the information necessary to evaluate, obtain, and use a data set.

One suggestion has been for the definition of a 'Minimum Searchable Set' i.e. the fields which Clearinghouse servers should index on, and which should be individually searchable. There have been proposals for this set, e.g. the Dublin Core or the recently floated 'Denver Core.' The suggested fields for the 'Denver Core' include:

Theme_Keywords
Place_Keywords
Bounding_Coordinates
Abstract
Purpose
Time_Period_of_Content
Currentness_Reference
Geospatial_Data_Presentation_Form
Originator
Title
Language
Resource_Description
Language (for the metadata) is an element not currently appearing in the CSDGM. I have no problem with the Denver Core as a Minimum Searchable Set; it is mainly just a subset of the mandatory elements of the CSDGM, and hence should always be present.

In contrast, I am very much against the idea of defining a Metadata Content 'Core' which represents a subset of the CSDGM. If this is done, the 'Core' elements will become the Standard. No one will create metadata to the full extent of the Standard and as a result it may be impossible to ascertain certain aspects of a data set such as its quality, its attributes, or how to obtain it. I have sympathy for those who feel that the CSDGM is onerous and that they don't have time to fully document their data sets. Non-federal agencies can do whatever parts of the CSDGM they want to and have time for. As has been said, 'There are no metadata police.' However, whatever the reason for creating abbreviated metadata, it shouldn't be validated by calling it 'Core.' 'Hollow Core' maybe.

Okay. Let us cast aside the term 'Core' because it seems like sort of a loaded word. The fact is, there are many people and agencies who want a shortcut for the Standard because "It's too hard" or because they have "Insufficient time."

"It's too hard" is a situation resulting from lack of familiarity with the CSDGM and from frustration with its structural overhead. This could be remedied if there were more example metadata and FAQs available to increase understanding, through the act of actually trying to follow through the standard to the best of ones ability, and metadata tools that insulated the user from the structure. The first data set documented is always the worst. The other aspect to "Its too hard" is that documenting a data set fully requires a (sometimes) uncomfortably close look at the data and brings home the realization of how little is really known about its processing history.

"Insufficient time" to document data sets is also a common complaint. This is a situation in which managers who appreciate the value of GIS data sets can set priorities to protect their data investment by allocating time to document it. Spending one or two days documenting a data set that may have taken months or years to develop at thousands of dollars in cost hardly seems like an excessive amount of time.

These 'pain' and 'time' concerns have some legitimacy, especially for agencies that may have hundreds of legacy data sets which could be documented, but for which the time spent documenting them takes away from current projects. At this point in time, it seems much more useful to have a lot of 'shortcut' metadata rather than a small amount of full-blown metadata. So what recommendations can be made to these agencies with regard to a sort of 'minimum metadata' or means to reduce the documentation load?

  1. Don't invent your own standard. There already is one. Try to stay within its constructs. Subtle changes from the CSDGM such as collapse of compound elements will be costly in the long run - you won't be able to use standard metadata tools and your metadata may not be exchangeable.
    Don't confuse the metadata presentation (view) with the metadata itself.
  2. Consider data granularity. Can you document many of your data sets (or tiles) under an umbrella parent? Linda Hill and Mary Larsgaard have recently proposed a robust way to accomplish this in a modification of the standard which seems very insightful.
  3. Prioritize your data. Begin by documenting those data sets which have current or anticipated future use, data sets which form the framework upon which others are based, and data sets which represent your organization's largest commitment in terms of effort or cost.
  4. Document at a level that preserves the value of the data within your organization. Consider how much you would like to know about your data sets if one of your senior GIS operators left suddenly in favor of a primitive lifestyle on a tropical island.
End of excerpt from Hugh Phillips
Can I make new metadata elements?
Certainly. These are called extensions and should be used with caution. First of all, you should not add any element that you think is an obvious omission. If you think that FGDC left out something that everybody is going to need, you probably will find a place for the information in the existing standard. But the name or position of the standard element might be different than what you are expecting. Application-specific extensions, on the other hand, will be common. Every scientific discipline has terms and qualities that are unique or shared with only a few others. These cannot be practically incorporated into the Standard. Here are guidelines for creating extensions that will work:
  1. Extensions are elements not already in the Standard that are added to convey information used in a particular discipline.

    Example: in the NBII, Taxonomy is a component of Metadata, and is the root of a subtree describing biological classification.

  2. Extensions should not be added just to change the name of an existing element. Element names are essentially a problem to be solved in the user-interface of metadata software.
  3. Extensions must be added as children of existing compound elements. Do not redefine an existing scalar element as compound.

    Example: Do not add elements to Supplemental_Information; that field is defined as containing free text.

  4. Redefining an existing compound element as a scalar does not constitute an extension, but is an error.

    Example: Description contains the elements Abstract, Purpose, and Supplemental_Information. These components must not be replaced with free text.

  5. Existing elements may be included as children of extensions, but their inclusion under the extensions must not duplicate their functions within the standard elements.

    Example: To indicate contact information for originators who are not designated as the Point_of_Contact, create an additional element Originator_Contact, consisting of Contact_Information. But the element Point_of_Contact is still required even if the person who would be named there is one of the originators.
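    As a sketch of guideline 5, the hypothetical Originator_Contact extension might appear in indented-text metadata roughly like this (its placement inside Citation_Information, and the particular children shown, are illustrative assumptions, not part of the Standard):

      Citation_Information:
        Originator: ...
        Originator: ...
        Originator_Contact:
          Contact_Information:
            Contact_Person_Primary:
              Contact_Person: (one of the originators)
              Contact_Organization: ...
            ...

    The separately required Point_of_Contact, with its own Contact_Information, would still appear elsewhere in the record as usual.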

How do I create metadata?

First you have to understand both the data you are trying to describe and the standard itself. Then you need to decide how you will encode the information. Normally, you will create a single disk file for each metadata record; that is, one disk file describes one data set. You then use some tool to enter information into this disk file so that the metadata conform to the standard (a small sketch of such a file follows the steps below). Specifically,

  1. Assemble information about the data set.
  2. Create a digital file containing the metadata, properly arranged.
  3. Check the syntactical structure of the file. Modify the arrangement of information and repeat until the syntactical structure is correct.
  4. Review the content of the metadata, verifying that the information describes the subject data completely and correctly.
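As a rough sketch, such a disk file in the indented-text encoding might begin like this (the element names are from the standard; the parenthetical values are placeholders):

  Identification_Information:
    Citation:
      Citation_Information:
        Originator: (who created the data set)
        Publication_Date: (when it was published)
        Title: (what the data set is called)
    Description:
      Abstract: (a brief summary of the data set)
      Purpose: (why the data were collected)
    ...

The structure of the file mirrors the hierarchy of the standard; checking that structure is what step 3 is about.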
A digression on conformance and interoperability

The FGDC standard is truly a content standard. It does not dictate the layout of metadata in computer files. Since the standard is so complex, this has the practical effect that almost any metadata can be said to conform to the standard; the file containing metadata need only contain the appropriate information, and that information need not be easily interpretable or accessible by a person or even a computer.

This rather broad notion of conformance is not very useful. Unfortunately it is rather common. Federal agencies wishing to assert their conformance with the FGDC standard need only claim that they conform; challenging such a claim would seem to be petty nitpicking. But to be truly useful, the metadata must be clearly comparable with other metadata, not only in a visual sense, but also to software that indexes, searches, and retrieves the documents over the internet. For real value, metadata must be both parseable, meaning machine-readable, and interoperable, meaning they work with software used in the Clearinghouse.

  • Parseable

    To parse information is to analyze it by disassembling it and recognizing its components. Metadata that are parseable clearly separate the information associated with each element from that of other elements. Moreover, the element values are not only separated from one another but are clearly related to the corresponding element names, and the element names are clearly related to each other as they are in the standard.

    In practice this means that your metadata must be arranged in a hierarchy, just as the elements are in the standard, and they must use standard names for the elements as a way to identify the information contained in the element values.

  • Interoperable

    To operate with software in the Clearinghouse, your metadata must be readable by that software. Generally this means that they must be parseable and must identify the elements in the manner expected by the software.

    The FGDC Clearinghouse Working Group has decided that metadata should be exchanged in Standard Generalized Markup Language (SGML) conforming to a Document Type Definition (DTD) developed by USGS in concert with FGDC.

What tools are available to create metadata?

You can create metadata in SGML using a text editor. However, this is not advisable because it is easy to make errors, such as omitting, misspelling, or misplacing the tags that close compound elements. These errors are difficult to find and fix. Another approach is to create the metadata using a tool that understands the Standard.

One such tool is Xtme (which stands for Xt Metadata Editor). This editor runs under UNIX with the X Window System, version 11, release 5 or later. Its output format is the input format for mp (described below).

Hugh Phillips has prepared an excellent summary of metadata tools, including reviews and links to the tools and their documentation. It is at <http://sco.wisc.edu/wisclinc/metatool/>
What tools are available to check the structure of metadata?
mp is designed to parse metadata encoded as indented text, check the syntactical structure against the standard, and reexpress the metadata in several useful formats (HTML, SGML, TEXT, and DIF).
What tools are available to check the accuracy of metadata?
No tool can check the accuracy of metadata. Moreover, no tool can determine whether the metadata properly include elements designated by the Standard to be mandatory if applicable. Consequently, human review is required. But human review should be simpler in those cases where the metadata are known to have the correct syntactical structure.
Can't I just buy software that conforms to the Standard?
No! Tools cannot be said to conform to the Standard. Only metadata records can be said to conform or not. A tool that claimed to conform to the Standard would have to be incapable of producing output that did not conform. Such a tool would have to anticipate all possible data sets. This just isn't realistic. Instead, tools should assist you in entering your metadata, and the output records must be checked for both conformance and accuracy in separate steps.
Why is Attribute a component of Range_Domain and Enumerated_Domain?
This element appears to be intended to describe attributes that explain the value of another attribute. I have actually seen such a situation in one of the data sets I have studied. In that case the author of the data provided a real-valued number (meaning something like 0.1044327) in one attribute, and another attribute nearby could have the values "x" or not (empty). The presence of the value "x" in the second attribute indicated that the first attribute value was extremely suspect due to characteristics of the measured sample that were observed after the measurement was done. So, for example, we had something like this:
Sample-ID   Measurement1   Quality1   Measurement2
A1          0.880201                  0.3
B2          0.910905       x          0.4
C3          0.570118       x          0.2
C3          0.560518                  0.1

So the variable Quality1 exists only to indicate that some values of Measurement1 are questionable. Note that values of Measurement2 are not qualified in this way; variations in the quality of Measurement2 are presumably described in the metadata.

In summary, the Attribute component of Range_Domain and Enumerated_Domain allows the metadata to describe data in which some attribute qualifies the value of another attribute.
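An abbreviated sketch of how that example might be encoded (element names are from section 5 of the standard; the labels and values are hypothetical, and some mandatory elements are omitted for brevity):

  Attribute:
    Attribute_Label: Measurement1
    Attribute_Definition: (what was measured)
    Attribute_Domain_Values:
      Range_Domain:
        Range_Domain_Minimum: 0.0
        Range_Domain_Maximum: 1.0
        Attribute:
          Attribute_Label: Quality1
          Attribute_Definition: flag indicating that the corresponding value of Measurement1 is suspect
          Attribute_Domain_Values:
            Enumerated_Domain:
              Enumerated_Domain_Value: x
              Enumerated_Domain_Value_Definition: the measurement is considered suspect
              ...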

I agree with Doug that this describes data with more structural detail than many people expect, and in the case I described there were so many variables (430) in the data set that I quickly gave up on the entire Detailed_Description and provided an Overview_Description instead. If we had some fancy tools (Visual Data++?) that understood relationships among attributes like this, people would be more interested in providing the metadata in this detailed manner. Nevertheless I think the basic idea makes sense.

What if my Process_Step happened over a period of time, not just one day, month, or year?
This is a weakness in the metadata standard. It assumes that the "date" of a process can be described well as a day, a month, or a year. I have encountered process steps that spanned multiple years, and I agree that it seems pointless to attach a single date to such things. It's especially annoying when the single date would probably be the date the process was completed, which is often the same as the publication date of the data set. That date shows up so often anyway in the metadata that it becomes meaningless.

There are two solutions. The first is to "fix the standard" by using an extension. For example, I could define an extension as

Local:
Name: Process_Time_Period
Parent: Process_Step
Child: Time_Period_Information
SGML: procper
Then to describe something that happened between 1960 and 1998, I could write
...
Process_Step:
  Process_Description: what happened over these many years...
  Process_Date: 1998
  Process_Time_Period:
    Time_Period_Information:
      Range_of_Dates/Times:
        Beginning_Date: 1960
        Ending_Date: 1998
This is elegant in its way, but is likely to be truly effective only if many people adopt this convention. A more practical solution for the present would be to skirt the rules about the content of the Process_Date element. In this example, I would just write
...
Process_Step:
  Process_Description: what happened over these many years...
  Process_Date: 1960 through 1998
Note that the value of Process_Date begins with a proper date and contains some additional text. So any software that looks at this element will see a date, and may complain that there's more stuff there, but will at least have that first date. That's what mp does; if it finds a date, it won't complain about any additional text it finds after the date.

Metadata file format

What is the file format for metadata?
The format for exchange of metadata is SGML conforming to the FGDC Document Type Definition (DTD). This is not generally something you want to make by hand. The most expedient way to create such a file is to use mp, a compiler for formal metadata. That tool takes as its input an ASCII file in which the element names are spelled out explicitly and the hierarchical structure of the metadata is expressed using consistent indentation. A more complete specification of this encoding format is at <https://geology.usgs.gov/tools/metadata/tools/doc/encoding.html>
Could you explain a little about the rationale behind recommending SGML?
Arguments FOR SGML:
  1. It is an international standard, used extensively in other fields such as the publishing industry.
  2. It is supported by a lot of software, both free and commercial.
  3. It can check the structure of the metadata as mp does. (It can't check the values well, but this isn't a serious limitation because mp doesn't check the values especially well either--it is designed to assist human reviewers by assuring them that the structure is correct. In theory we could use SGML's attribute mechanism to check values, but this will make the DTD more complicated. I think that would be unwise until we have developed a broad base of expertise in using SGML among metadata producers.)
  4. Additional tools (relatively new, unfortunately) allow SGML documents to be reexpressed in arbitrary ways using a standard scripting language, Document Style Semantics and Specification Language (DSSSL), also an ISO standard.
  5. It can handle arbitrary extensions (in principle).
Arguments AGAINST SGML:
  1. The metadata-producing community doesn't have much experience using it to solve problems yet.
  2. We aren't using SGML tools; the only thing we do with SGML is create our searchable indexes with it.
  3. Learning to use SGML effectively adds significantly to the educational cost of handling metadata. Imagine an interested GIS user, struggling to learn Arc/Info, who wants to produce well-documented data and so starts to learn the metadata standard, with its 334 elements in a complex hierarchy. To use SGML effectively, she'll need to know the general principles of SGML, along with some procedures. She'll have to select, locate, install, and learn to use some SGML software too. To create customized reports she'll need to learn DSSSL (a 300-page manual), which is really written in a subset of LISP called Scheme (another 50-page manual). Until the use of SGML for metadata has been pioneered by others, this is not a satisfactory solution.
  4. Our current DTD doesn't allow extensions yet. I'm the only one working on the DTD, and I don't have enough experience with SGML to really exploit it, although I sort of understand what to do to make the DTD more flexible. There's a shortage of manpower and time needed to solve this problem.
Conclusion:

We should aim to handle metadata using SGML in the future, but I should continue to develop mp and its relatives, ensuring that my tools support the migration to SGML. We need much more expertise devoted to SGML development, and that isn't happening yet. For practical purposes the more complete solution at the moment is xtme->mp or cns->xtme->mp. These tools handle arbitrary extensions already, and mp can create SGML output if needed for subsequent processing. Where possible, we should encourage agencies to invest in the development of tools for handling metadata in SGML, but this isn't a "buy it" problem, it's a "learn it" problem--much more expensive. With the upcoming revision of the metadata standard, we need to build a DTD that can be easily extended.

Why do I have to use indentation?
You don't. What you have to do is communicate by some method the hierarchical nature of your metadata. You have to present the hierarchy in a way that a computer can understand without user intervention. The simplest readable way to do this is by using indented text with the element names as tags. You can use SGML directly, but you have to make sure that you close each element properly. The DTD doesn't allow end-tags to be omitted. And mp will generate SGML for you, if you feed it indented text.
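As an illustration, an indented-text fragment such as

  Citation_Information:
    Originator: U.S. Geological Survey
    Publication_Date: 1996
    Title: (title of the data set)

corresponds to SGML along these lines (the short tag names citeinfo, origin, pubdate, and title are the ones the FGDC DTD uses for these elements), with every end-tag present:

  <citeinfo>
  <origin>U.S. Geological Survey</origin>
  <pubdate>1996</pubdate>
  <title>(title of the data set)</title>
  </citeinfo>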
Why shouldn't I use the section numbers from the Standard?
  1. They will probably change. They are essentially like page numbers; with a revision of the standard, both the page numbers and the section numbers will change.
  2. They aren't meaningful. Readers will generally be less aware of the metadata standard's structure than will data producers, and they won't understand the numbers at all.
  3. They express the hierarchy but not the instance information. For elements that are nested and repeated, the numbers show the nesting but not the repetition. Thus they don't really convey the structure well.
  4. It isn't easier to use the numbers. The long names can be pasted into your metadata using the dynamic data exchange of your window system, so you don't have to type them. Better still, start with a template that contains the long names, or use an editor that provides them.
But I have already been using a template for metadata that mp can't read. What do I do with the records?
Put them through cns. This is a pre-parser that will attempt to figure out the hierarchical structure from metadata that aren't properly indented. This job is tricky, and cns isn't likely to understand everything you've done. So you'll have to look carefully at its output, and merge information from its leftovers file in with the parseable output that it generates. Then you should run the results through mp.
How does mp handle elements that are "mandatory if applicable"?
"Mandatory if applicable" is treated by mp the same as optional. Remember that mp is a tool to check syntactical structure, not accuracy. A person still has to read the metadata to determine whether what it says about the data is right.

In principle, you could create elaborate rules to check these mandatory-if-applicable dependencies, but I think that would complicate mp too much, making it impossible to support and maintain.

Can I start an element's value right after the element name and continue the value on subsequent lines?
Yes! Previously not permitted, this form is now supported:
    Title: Geometeorological data collected by the USGS Desert Winds
      Project at Gold Spring, Great Basin Desert, northeastern
      Arizona, 1979 - 1992
Can I vary the indentation in the text?
Yes! But the variations in indentation won't be preserved in the output files. Don't try to maintain any formatting of the text in your input files; the formatting will not survive subsequent processing. (We hope eventually to be able to exploit the DTD of the Text Encoding Initiative (TEI) for this purpose, but at the moment any such tags will simply be passed through as is.) The variation of indentation that is permitted looks like this:
    Title:
      Geometeorological data collected
           by the USGS Desert Winds
         Project at Gold Spring, Great Basin Desert, northeastern
      Arizona, 1979 - 1992

Running mp, xtme, and cns

Help!
Help is available. Please email pschweitzer@usgs.gov. (Here I used to mention an mp-users email list, but IT security concerns have made it difficult for me to maintain a specific list for this software. Questions that might be of interest to others can be directed to metadata@geocomm.com, and you're welcome to contact me for assistance or advice.)
Why do I get so many messages?
Sometimes a single error will produce more than one message. If you put in too many of something, you'll get a message at the parent element and you'll get a similar message at the offending child element.
What are these line numbers?
The numbers correspond to lines in your input metadata file. Use a text editor that can indicate what line you're on (or better, can jump to any particular line by number) to help you understand the message.
What are these errors "(unknown) is not permitted in name"?
You've got something in the wrong place. If you think it is in the right place, look closely--you may have omitted a container such as Citation_Information, Time_Period_Information, or Contact_Information. Mp requires that the full hierarchy be included even when the structure is clear from the context.
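For example, mp would produce such errors for this fragment, because the Contact_Information container has been omitted (the value shown is a placeholder):

  Point_of_Contact:
    Contact_Organization_Primary:
      Contact_Organization: (name of the organization)

The fix is to include the container, even though it may seem redundant:

  Point_of_Contact:
    Contact_Information:
      Contact_Organization_Primary:
        Contact_Organization: (name of the organization)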
Can I just ignore warnings?
Always read them to understand what they mean. Sometimes a warning is just an unexpected arrangement. Other times a warning may indicate that your metadata are not being interpreted the way you think they should be.
What are these warnings "Element name 1 has child name 2, expected (unknown); reclassified as text"?
A standard element name appears at the beginning of a line within the text of element name 1. mp is telling you that it considers that line to be plain text belonging to name 1 rather than a node of the hierarchy. Ignore the warning if it really is plain text. If it isn't, see whether that element was supposed to go somewhere else.
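For example, in this made-up fragment

  Abstract:
    This coverage replaces an earlier version of the same data.
    Title: Gold Spring geometeorological data (superseded)

the last line begins with the standard element name Title, so mp warns and then treats that line as part of the text of Abstract. If the line really is just text, ignore the warning; if Title was meant to be an element, it belongs under a Citation_Information elsewhere in the record.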
How does mp handle URLs and other HTML code?
(Revised 25-March-1998) mp now recognizes URLs in all contexts and makes them live links in the HTML output. You should not use HTML code in your element values because there's no reason to believe that in the future the metadata will be processed by systems that understand HTML. If you must add HTML to the resulting documents, I recommend that you hack the HTML output of mp for this purpose.

Note that mp now provides "preformatting" in which groups of lines that begin with greater-than symbols will be rendered preformatted, prefaced with <pre> and followed by </pre> in the HTML output. The leading >'s will be omitted from the HTML output. For example, the following metadata element

  Completeness_Report:
    Data are missing for the following days
    >19890604
    >19910905
    >19980325
will be rendered as follows in HTML:
<dt><em>Completeness_Report:</em>
<dd>
<pre>
19890604
19910905
19980325
</pre>
Why does cns choke when an element name appears at the beginning of a line in the text?
This is a limitation of cns; it is not a fully automatic procedure. The logic that it uses to determine what's in the file cannot cope well with some of these situations, because it is trying to divine hierarchical structure in text that isn't structured hierarchically. It has to make assumptions about where standard element names will be, so that it can recognize them properly when they are in the right places. When you're using cns, you have to look carefully at both the input and the output. Always look at the leftovers file, because it will show where the really severe problems occur. But be aware that less obvious problems occur as well; for example, an element name that is misspelled may be lumped into the text of the previous element.
Can you forecast the fate of mp? A number of my colleagues here have expressed concern about committing to tools that "go away."
In the long run this is an argument in favor of SGML. In the short run that doesn't carry much weight, because we haven't developed the capability to do with SGML what mp now does with indented text. Moreover, I don't see anybody working on that problem yet.

Also, I would point out that during the two years of its existence mp has had a better support history than many of the other tools for producing metadata (see mp-doc). Corpsmet and MetaMaker are probably the next-best-supported tools. The PowerSoft-based NOAA tool was created by contractors who have since disappeared. USGS-WRD tried to pass maintenance of DOCUMENT off to ESRI, and ESRI hasn't made the needed improvements; Sol Katz (creator of blmdoc) still works for BLM but has been assigned to other work. None of the other tools seems to have gotten wide acceptance. Paying contractors to write software seems to carry no guarantee that the software will be adequately supported. Home-grown software carries no guarantee either. Whether you "pays your money" or not, you still "takes your chances".

On the other hand...

The source code of mp is freely available. It has been built for and runs on many systems--I support 6 different varieties of Unix, MS-DOS, and Win95+NT, and I know it is working on several other Unix systems. The task of updating it might be daunting for an individual not conversant in C, but if I were hit by a truck tomorrow, the task wouldn't likely fall to an individual--it would be a community effort because lots of people have come to depend on it.

And remember...

The most fundamental thing we can do to make progress is to create parseable, structured documentation. The key to the whole effort is to emphasize what is consistent about our diverse data sets, and to exploit that consistency as a way of making it easier to discover and use spatial data of all types. You can always combine metadata elements to fit a more general schema; the difficult operation (because it requires that a sophisticated person devote attention and time to each record) is to go the other way, searching through unstructured text to cull out key facts.
Are mp, xtme, Tkme, and cns year-2000 compliant?
Yes, dates are handled using the standard ANSI C date structures and functions. On most UNIX systems dates are stored internally as signed 32-bit integers containing the number of seconds since January 1, 1970, so the problems, if any, would not occur until 2038. None of these programs bases any decision on the difference between two dates.
Do mp, Tkme, and cns run on Windows 2000? XP?
Yes. These run on 95, 98, ME, NT, 2000, and XP.
How can I make the text output fit within the page?

This shouldn't be necessary, since metadata are best printed from one of the HTML formats, and the web browser will wrap the text to fit the screen and page. However, for those who really want to have the plain text version fit within an 80-column page, there is a way to do it. Use a config file, with an output section, and within that a text section. Within output:text, specify wrap 80 like this:

output
  text
    wrap 80
You don't have to use 80. I think it looks better with a narrower page, like 76. mp factors in the indentation of each line, assuming 2 spaces per level of indentation. Blank lines are preserved. Any line beginning with the greater-than sign > is preserved as is.

Note that this affects only the text output. Neither mp nor cns ever modifies the input file. But if you like the resulting text file, you can replace your input file with it.


Metadata storage and management

How do I put FGDC metadata into my relational database?
This turns out to be a fairly complicated problem. I had originally answered this question with a simplistic assumption that it could not be easily done in a general way, but I now defer to others who know much more about relational database management systems than I do.
Jim Frew writes:
You can easily represent recursion in a relational model. For example:
CREATE TABLE attribute (
  pk_attribute          key_t  PRIMARY KEY,
  fk_enumerated_domain  key_t  REFERENCES enumerated_domain,

  attribute_stuff ...
  )

CREATE TABLE enumerated_domain (
  pk_enumerated_domain  key_t  PRIMARY KEY,
  fk_attribute          key_t  REFERENCES attribute,

  enumerated_domain_stuff ...
  )
where key_t is a type for storing unique identifiers (e.g., Informix's SERIAL).

The tricky part, of course, is getting the information back OUT again. It's true, you can't write a query in standard SQL-92 that will traverse the tree implicit in the above example (i.e., will ping-pong between fk_enumerated_domain and fk_attribute until fk_attribute is NULL.)

However, most (all?) DBMS vendors support procedural extensions (e.g., looping) to SQL, which make the query possible. Additionally, some vendors have extended SQL to directly support tree-structured information (e.g., Oracle's CONNECT BY.)

Ultimately, you have to consider why you're storing FGDC metadata in a relational database. As we learned on the Alexandria Project:

  1. Attributes that are likely to be searched (e.g. Bounding_Coordinates) can be managed differently from attributes that will only be regurgitated for an occasional report (e.g. Metadata_Security_Handling_Description)
  2. Some nooks and crannies of the standard (e.g. Dialup_Instructions) just aren't worth supporting, period. Often these are the pieces that add the most complexity.
In other words, while it's possible to do everything with a fully-normalized relational schema, it may not be desirable.
Examples of recursive SQL queries (references from Jim Frew)
  • Celko, Joe (1995) Joe Celko's SQL for Smarties : Advanced SQL programming. Morgan Kaufmann, San Francisco. [see chapter 26, "Trees"]
  • Date, C. J. (1995) An Introduction to Database Systems, 6th ed. Addison-Wesley, Reading, MA. [see pp. 266..267]
  • Informix Software, Inc. (1996) Informix Guide to SQL (Tutorial, Version 7.2). Informix Press, Menlo Park, CA. [see pp. 5-27..5-29]
  • Koch, George, and Kevin Loney (1997) ORACLE8: The Complete Reference. Osborne/McGraw-Hill, Berkeley, CA. [see pp. 313..324]
I ran DOCUMENT in ARC/INFO. Now what do I do?
Run DOCUMENT FILE to extract the metadata from the INFO tables, then rewrite the metadata using what DOCUMENT FILE supplies as input. More details are in How to fix metadata created by DOCUMENT.aml.
How do I handle data that already has metadata?
When we acquire a GIS map layer that was created by some other entity, and that entity has already created metadata for the layer, how should that layer be documented in our metadata? Should that metadata be part of, or referenced, in the metadata we create for it?

I think how you handle it depends on what you do with the data:

  1. You use the data layer pretty much as is, maybe changing projection. You don't intend to distribute the modified layer to the public.

    Use their metadata. No real need to change it, but if you do some non-destructive change like reprojection, just add a Process_Step to the metadata indicating what you did. You can even add a Process_Contact with your info so that anyone who has questions about that particular operation can ask questions.

  2. You modify the data and repackage it for distribution to the public, perhaps as part of a group of layers making up a map set.

    Start with their metadata. Take the Contact_Information in Point_of_Contact, and move it to all of the Process_Steps that don't already have a Process_Contact. Replace Point_of_Contact with yourself. Take Metadata_Contact, move it into a new Process_Step whose description is "create initial metadata", where Process_Date is the previous value of Metadata_Date. Modify other parts of the metadata to reflect your changes to the data (document these in your own Process_Step, too), then make yourself the Metadata_Contact. Tag--you're IT! A sketch of the resulting Process_Step follows this list.

  3. You use it as a basis for a study of the same information, adding and changing features and attributes as you make new observations.

    Use the existing metadata record to create a Source_Information which you will annotate (Source_Contribution) to describe how you incorporated this layer in your own work. Put this Source_Information into a new metadata record that describes your data; it will thus properly attribute the work of the people who created the source data.
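As a sketch of case 2 above, the new Process_Step recording the original metadata work might look something like this, where the date is whatever Metadata_Date used to be (the date shown is made up) and the Contact_Information is the previous Metadata_Contact:

  Process_Step:
    Process_Description: Created the initial metadata for the source data set.
    Process_Date: 1997
    Process_Contact:
      Contact_Information:
        ...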

What about these errors with metadata from ArcCatalog?

It depends on what sort of errors they are. ArcCatalog, like Tkme, must allow you to create metadata with errors such as missing elements and empty elements. If I'm using a metadata editor, I don't want it to refuse to work if I merely leave something out--I might want to work in stages, adding some information now and more information later.

What's more important, of course, is that ArcCatalog has no way to know whether what people type into it is actually correct (meaning what you say about the data--is it right?). So we don't want people to rely on mp alone to judge the correctness of metadata. We should instead use mp to help us find out what we've left out or done wrong in the structure of the metadata, and then we have to read the metadata itself to figure out whether it actually describes the data well.

There is one way that valid metadata from ArcCatalog might be judged incorrect by mp, however. If I create metadata in ArcCatalog, then read it with mp but without telling mp that the metadata record uses ESRI extensions, then mp will complain that some of the elements aren't recognized. For example, ESRI includes in the metadata an element called Attribute_Type that tells whether a given attribute is an integer, character, or floating-point variable. This isn't in the FGDC standard, so mp will complain when it sees this element in the metadata. The fix is to tell mp you're using the ESRI extensions. A config file can be used for this purpose.


Metadata dissemination

How do I become a Clearinghouse node?
I defer to FGDC. Specifically, look at Doug Nebert's December 1995 discussion paper What it means to be an NSDI Clearinghouse Node, as well as his on-line training materials for the preparation, validation, and service of FGDC metadata using the spatial Isite software.