Formal metadata: information and software
Reply by Peter Schweitzer on 8 Feb 2001:
This is a good question. First of all, thanks for taking the task seriously. I think it helps all of us to look carefully for quality in all that we do, and ensuring that we have good metadata helps to ensure that the data are usable and understandable.
I recommend three strategies. First, you'll want to get the record into good shape structurally. Second, review some specific groups of elements that people tend to have trouble with. Third, generate FAQ-style HTML output using mp and read it as though you were a non-expert user, looking for unanswered questions.
In general, I recommend that you familiarize yourself with "Metadata in Plain Language: A guide to authors and reviewers", which is at <http://geology.usgs.gov/tools/metadata/tools/doc/ctc/>.
It helps to know how the author generated the metadata record. If the author used a word-processor like Microsoft Word, the record may have all kinds of problems and you might need to use cns to help clean it up. Most authors nowadays know to avoid these troubles, so this is becoming a less common problem.
When it all looks okay, run mp. The web
version of mp is handy for first attempts.
If you're using the downloaded version, generate an error file
using the -e switch (mp input_file.met -e err).
There are two ways to look at the error file. You can just read it
or you can run
err2html to generate a more friendly-looking
report. The things to fix first are "ambiguous indentation" and
"Extraneous text following..." which indicate hard-to-spot
indentation problems. Next focus on any messages that say "no
element recognized", then on any that say "is not permitted in".
These are the most important structural problems and all need to
be fixed. Then look carefully wherever you get a "too many" error;
these can almost always be done better.
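The order of attack above can be sketched as a small script. This is only an illustration: it assumes the error file is plain text with one message per line containing the phrases quoted above, which may not match mp's actual layout. The function name and the sample messages are made up.

```python
# Priority order taken from the paragraph above; lower rank = fix first.
PRIORITY = [
    "ambiguous indentation",
    "extraneous text following",
    "no element recognized",
    "is not permitted in",
    "too many",
]

def triage(lines):
    """Return (rank, message) pairs sorted so structural problems come first.

    Messages matching none of the phrases (for example missing or empty
    elements) get the last rank, since they can wait until later.
    """
    ranked = []
    for line in lines:
        text = line.strip()
        if not text:
            continue
        lowered = text.lower()
        rank = len(PRIORITY)
        for i, phrase in enumerate(PRIORITY):
            if phrase in lowered:
                rank = i
                break
        ranked.append((rank, text))
    # sorted() is stable, so messages of equal rank keep their file order.
    return sorted(ranked, key=lambda pair: pair[0])

# Hypothetical messages for illustration, not real mp output.
sample = [
    "Error: Theme is missing in Keywords",
    "Error: too many Abstract in Description",
    "Error: Foo is not permitted in Citation",
    "Warning: ambiguous indentation at line 12",
]
for rank, message in triage(sample):
    print(rank, message)
```

To use it on a real file, pass the lines of the error file (for example open("err").readlines()) instead of the sample list.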
Don't worry too much about missing or empty elements or improper values at this stage of the review. Deal with those later.
U.S. Geological Survey Open-File Report
U.S. Geological Survey Miscellaneous Investigations Map
U.S. Geological Survey Miscellaneous Field Studies Map
U.S. Geological Survey Professional Paper
U.S. Geological Survey Digital Data Series
For Issue_Identification, I use MF-xxxx and I-xxxx and DDS-xxx, but for OFRs and Professional Papers I use only the numbers, not OF or PP.
Procedures Used
Reviews_Applied_to_Data
Related_Spatial_and_Tabular_Data_Sets
Other_References_Cited
Most of the information in these "subsections" really goes into Process_Description, Cross_Reference, or Source_Information.
What really goes here is how the authors checked the attribute data. Certainly if you know there are errors and you aren't going to fix them before release, write that up too. But that rarely happens; most of the time we think we have it right, so here's the place to say what we did to review them.
A common question is whether to include as sources all of the references given in the report. I don't. I would include only those references from which data were taken directly, so for example if you can point to a line or attribute value and say "that came from Smith's 1997 map", make Smith's 1997 map a source.
ITEMS cover.PAT and paste that into the file. People need to know what the field names mean in real terms, what the values mean individually if they are abbreviations, and what the units of the numbers are. Also people need to know what value is used to indicate missing data. All of these are better expressed in a Detailed_Description. As a reviewer, you might not be able to persuade the author to do a Detailed_Description, but you should insist that the information that real data users need be there.
If their value would be "author" or "this report", omit both
Read Metadata in Plain Language for help on how to do Attribute_Domain_Values. These are often done wrong, but doing them right means you end up checking the values, so it's a really good idea to do them right.
For shapefiles and other DBF files, I wrote a program called
dbfmeta that will generate a Detailed_Description
that you can use to document the data. You still have to put in the things
that users need to know, but
dbfmeta will give you the framework
to put that in. Also the Enumerated_Domain helper web form
can make some of this work a lot easier.
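To make the idea of a framework concrete, here is a rough sketch that prints a Detailed_Description skeleton as indented text. It is not dbfmeta and does not read DBF files; the field names are passed in by hand, the entity and field names in the example are hypothetical, and the two-space indentation is an assumption. The empty values are the places where you still have to write what users need to know.

```python
def detailed_description(entity_label, field_names, indent="  "):
    """Return an indented-text Detailed_Description skeleton.

    Element names follow the FGDC metadata standard; values other than
    the labels are left empty for the author to fill in.
    """
    one, two = indent, indent * 2
    lines = [
        "Detailed_Description:",
        f"{one}Entity_Type:",
        f"{two}Entity_Type_Label: {entity_label}",
        f"{two}Entity_Type_Definition:",
        f"{two}Entity_Type_Definition_Source:",
    ]
    for name in field_names:
        lines += [
            f"{one}Attribute:",
            f"{two}Attribute_Label: {name}",
            f"{two}Attribute_Definition:",
            f"{two}Attribute_Definition_Source:",
            f"{two}Attribute_Domain_Values:",
        ]
    return "\n".join(lines)

# Hypothetical entity and field names, for illustration only.
print(detailed_description("cover.PAT", ["AREA", "UNIT"]))
```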
Format_Name is the data format. Common values are
Arc/Info export (.e00)
1.0 if it's a shapefile
Format_Version_Date should not be used unless a date is how that particular format is distinguished from other formats. This rarely occurs.
Format_Specification should be skipped unless you're dealing with a non-standard format, in which case describe it in detail here.
Transfer_Size is supposed to be in megabytes. I always try to include the word "megabytes" after the number anyway.
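If you'd rather compute the number than estimate it, something like the following would do. It is a minimal sketch: the function names are made up, and it assumes a megabyte means 1024 x 1024 bytes, which is my assumption here and not something the standard is being quoted on.

```python
import os

BYTES_PER_MEGABYTE = 1024 * 1024  # assumption: binary megabytes

def format_transfer_size(num_bytes, digits=3):
    """Format a byte count as a Transfer_Size value with the unit spelled out."""
    megabytes = num_bytes / BYTES_PER_MEGABYTE
    return f"{megabytes:.{digits}f} megabytes"

def transfer_size_of(path, digits=3):
    """Same, but measure an existing file on disk."""
    return format_transfer_size(os.path.getsize(path), digits)

print(format_transfer_size(1048576))
```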
What formats and files should be documented in this way? I prefer to see the main data files or packages of them done. Often authors will make individual files available for download, and will include PDFs or PostScript versions as well, and additional text. The metadata needs to focus on the data, but it's okay to describe how to get these ancillary files too.
Go back through the record and remove any empty elements. Tkme can do this for you if you choose Prune from the Edit menu.
Run mp and generate FAQ-style HTML. Do this with the
-f switch, like this:
C:> mp myfile.met -e myfile.err -f myfile.faq.html
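If you review many records, the same command can be scripted. This is just a sketch: it assumes the mp executable is on your PATH, it uses only the -e and -f switches shown above, and the function names and output file naming are my own choices.

```python
import subprocess
from pathlib import Path

def mp_command(record):
    """Build the mp command line for one metadata record."""
    stem = Path(record).with_suffix("")  # myfile.met -> myfile
    return ["mp", str(record), "-e", f"{stem}.err", "-f", f"{stem}.faq.html"]

def run_mp(record):
    """Run mp on a record and return its exit status.

    Assumes mp is installed and on the PATH; nothing is assumed here
    about what a particular exit status means.
    """
    return subprocess.run(mp_command(record), check=False).returncode

# Show the command without running it.
print(" ".join(mp_command("myfile.met")))
```

Calling run_mp("myfile.met") would then write myfile.err and myfile.faq.html next to the record, the same as typing the command by hand.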
Now look at myfile.faq.html with a web browser. Look for questions that don't have answers, and ask whether there really should be an answer for these. Check to see whether the links work. Read the answers to see whether anything comes out strangely. Let me know if you see something that really doesn't make sense. This part is more art than science, so there's room for improvement.
There's a cheat-sheet that you can use to find out how mp is making the answers to the questions. It's on the web at <http://geology.usgs.gov/tools/metadata/tools/doc/plain.faq.html>. This file was generated using mp, of course!
Don't hesitate to ask me for help. I want this to work for all of us. Of course, the down side is that I will probably reply, and it might turn out as a 5-page report. I know this is a lot to wade through, but I hope if you take it a little at a time it will be helpful.