The config file
Introduction
See also Simple uses of the config file.
This note explains how to create and use a configuration file for cns, mp, xtme, and tkme. It's a good idea to look at the simple example and the deluxe example. A configuration file allows you to customize the following aspects of the behavior of these programs:
- When reading metadata
  - Recognize extensions and check their hierarchical structure
  - Recognize registered profiles
  - Show user-specified tips for extensions (in Xtme and Tkme)
  - Require strict indentation
  - Ignore blank lines
  - Disable upgrade of the metadata to CSDGM version 2
  - Prune empty subtrees from the metadata
  - Help cns recognize element names
- When writing errors
  - Name the error file
  - Output the parse tree (used for debugging mp)
- When writing text
  - Name the text output file
  - Use an indent string other than the default (two spaces)
  - Include the top level Metadata element
  - Prepend a string to each element name
  - Wrap lines to fit page width
- When writing SGML
  - Name the SGML output file
  - Use 10-character ASTM tags for SGML rather than 8-character tags (no longer supported)
  - Indent SGML output (not recommended)
  - Output a specified string for blank lines in SGML (not recommended)
  - Output only FGDC standard elements in SGML, omitting extensions
- When writing HTML
  - Name the outline-style HTML output file
  - Name the FAQ-style HTML output file
  - Pass HTML code found in element values unchanged to the output
  - Disable preformatted lists
  - Disable Dublin-core <META> tag generation
  - Specify the URL where you'll store your metadata (makes local links work)
  - Give options for the <BODY> tag
  - Specify markup to precede the metadata (a "header")
  - Specify markup to follow the metadata (a "footer")
  - Change the line that links to alternative forms of metadata
  - Specify default markup for element names
  - Specify markup for blank lines
  - On an element-by-element basis, specify markup for the
    - element name
    - element value, with the option to break lines as in the input file
- When writing DIF
- When writing XML output
  - Name the XML output file
  - Specify the order of the elements
  - Exclude extensions
Config file basics
- General syntax
The config file is indented text (what a surprise!). It consists of an input section and an output section, either of which may be left out. Within the output section there are subsections for each of the possible output file formats plus one for the parse tree and one for the errors. Anything that isn't recognized is going to be ignored, so make sure you've spelled the element names correctly. If you want to put comments in the file, I recommend you put them all at the beginning of the file and start each comment line with a pound character (#). You can put comments at other places but you have to maintain your indentation.
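As a rough sketch of that layout, the skeleton below puts a comment first, then an input section and an output section with two format subsections. It is written in the style of the link examples later in this note; the two-space indentation and the file names are assumptions chosen for illustration.

```text
# Config file for mp; comments are kept at the top of the file.
input
  blanks ignore
output
  text
    file %s.txt
  html
    file %s.html
```

Either section could be dropped entirely, since both input and output are optional.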
- Getting the programs to read the config file
cns and mp have to be given a command-line argument -c cfile where cfile is the name of the config file. Xtme can take its config file name that way or through the X resource xtme*configFile. Choose Version from the Help menu in xtme for details.
- Errors in the config file
These programs will complain if your config file has ambiguous indentation or is empty or cannot be found, but they won't complain if you have misspelled, misplaced, or omitted configuration elements. If you have a hard time getting them to recognize your config elements, you can insert a call to list_children() after the call to unify_scalars() in config.c and recompile the source code; this will dump the config file's parse tree to stderr, and should indicate what elements, if any, are not recognized. Recognized elements that aren't in the right places (for example, obeylines under SGML) are simply ignored; you'll just have to read carefully to find mistakes like that. Start with a template and you probably won't have this problem.
Elements of the config file
In general the file may contain an input element and an output element. Either of these may be left out. Letter case doesn't matter; you can capitalize if you want to. For some consistency, this description and all of the examples use all lower-case letters.
Compound elements
input = (profile) + (indent) + (blanks) + (1{extensions}n) +
        (codeset) + (1{tips}n) + (upgrade) + (prefix) + (prune) +
        (1{ext}n) + (language)
output = (errors) + (tree) + (text) + (sgml) + (xml) + (html) +
         (dif) + (info)
text = (file) + (cns) + (indent) + (top_level) + (prefix) + (wrap)
sgml = (file) + (tags) + (indent) + (blanks) + (skip_extensions)
xml = (file) + (indent) + (blanks) + (doctype) + (stylesheet) +
      (skip_extensions) + (order)
html = (file) + (faq) + (err) + (translate) + (preformat) +
       (meta) + (base) + (body) + (header | header_file) +
       (footer | footer_file) + (link) + (keyword) + (blanks) +
       (1{element}n)
info = (file)
dif = (file)
stylesheet = type + href
link = (label) + (link_faq) + (link_html) + (link_text) +
       (link_sgml) + (link_xml) + (link_dif)
keyword = (prefix) + (suffix)
element = key + (name) + (value)
name = (prefix) + (suffix)
value = (prefix) + (suffix) + (obeylines)
Data elements
- file
Under output, the value given will be used as the name of the output file for the format under which the keyword appears. If the characters "%s" appear in the value, they will be replaced with the name of the input file as given on the command line. So, for example,
text:
  file: %s.txt
will result in the textual output being created with the name input_file.txt. If the input file ends with a standard suffix, that suffix will be clipped off before creating the output file name. Standard file name suffixes are
.txt
.sgml
.sgm
.xml
.text
.met
.bin
Additional suffixes can be recognized if they are specified using the ext element under input.
What appears in the configuration file for this option is overridden by the command-line option. Thus it is possible to overwrite the input file by specifying its name as one of the output files using a command-line option -e, -t, -h, -s, or -d.
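To illustrate the naming rule, here is a sketch of an output section that names one file per format. The file names are assumptions. With a hypothetical input file called roads.met, the description above implies that .met is clipped off and the text output is written to roads.txt, the SGML to roads.sgml, and so on.

```text
output
  text
    file %s.txt
  sgml
    file %s.sgml
  xml
    file %s.xml
  html
    file %s.html
```

Giving each format its own suffix keeps the outputs from colliding with each other or with the input file.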
- faq
Under output:html, this element specifies the file name to be used for the FAQ-style HTML output. The syntax of the value is the same as for file above.
- err
Under output:html, this element specifies the file name to be used for the HTML error output of err2html. The syntax of the value is the same as for file above. This element is noticed only by err2html.
- cns
Under output:text, this element specifies the file name to be used for the indented text output of cns. The syntax of the value is the same as for file above.
- ext
Under input, this element specifies a file name suffix that should be removed if found on the end of the input file name. This allows the user to augment the list of standard file name suffixes. This element is repeatable. Specify one ext element per file name suffix.
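A minimal sketch of two extra suffixes, one ext element each. The suffixes .meta and .fgdc are made up for illustration, and writing them with a leading period is an assumption based on how the standard suffixes are listed above.

```text
input
  ext .meta
  ext .fgdc
```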
- indent
Under input, the value strict causes indented text to be processed as hierarchical rather than linear. The default is to interpret indented unrecognized keywords as plain text.
Under output, the value given will be used once for each level in the hierarchy, preceding the output data on each line. This is a string that may be delimited by quotation marks ("). Backslash is used to include a character literally.
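The two uses of indent can be sketched together as follows. This assumes you want strict indentation when reading and four spaces per level in the text output; the quotation marks delimit the indent string as described above, and the file name is only an example.

```text
input
  indent strict
output
  text
    file %s.txt
    indent "    "
```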
- top_level
Under output:text, the value skip causes the tree to be output at the first node immediately below Metadata. This is a convenience since, presumably, everything in the file is contained in Metadata, so if this is not done, everything is indented except the keyword Metadata.
The default is to skip the Metadata keyword in text output, so the appearance of the keyword top_level WITHOUT the value skip causes the Metadata keyword (and its indentation throughout the file) to be retained.
- blanks
Under input, the value ignore causes blank lines to be ignored in the metadata. Otherwise blank lines are included and processed as data. The parser attempts to prevent the assignment of children to blank lines.
Under output, the value associated with the keyword is output for every blank line in the input file, unless blanks are being ignored, in which case there won't be any in the parse tree when the output routine is called. For SGML output, the default is "" (empty); for HTML output, the default is "<P>\n".
- extensions
Under input, the argument is the full path to a local file containing extensions to the standard that are used in the current document.
- codeset
Under input, the argument is one of the values ISO8859-1, DOS, or MAC (letter case is ignored). This indicates the encoding of characters beyond ASCII 127. Default is ISO8859-1.
- tips
Under input, the argument is the full path to a local file containing tips to be used by xtme.
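A sketch of an input section that uses extensions, codeset, and tips together. Both paths are hypothetical; substitute the full paths to your own files.

```text
input
  extensions /home/user/metadata/local.ext
  codeset DOS
  tips /home/user/metadata/local.tips
```

Because extensions and tips are repeatable, additional files could be listed with further extensions or tips lines.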
- tree
If specified, this keyword causes the parse tree to be dumped to stdout. If an argument is present, it is interpreted as the name of a file into which the parse tree should be dumped.
- errors
The argument is like the argument to the file options under text, sgml, xml, html, dif, and binary.
- tags
Under output:sgml, the value astm caused the ten-character tags described in the proposed ASTM D18.01.05 standard to be used. The use of ASTM tags is no longer supported. By default the eight-character tags in sgmltags.txt are used.
- skip_extensions
Under output:sgml, this keyword causes the sgml code generator to skip elements that are not part of the 19940608 version of the metadata standard. By default extensions are included.
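For instance, an SGML subsection that leaves out extensions might look like the sketch below. The file name is an assumption; skip_extensions is written without a value, since the description says only that the keyword triggers the behavior.

```text
output
  sgml
    file %s.sgml
    skip_extensions
```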
- translate
Under output:html, textual strings in metadata values are translated into HTML. Specifically, the characters <, >, ", and & are converted into the corresponding entities (&lt;, &gt;, &quot;, and &amp;) and URLs of the form http://theURL or ftp://theURL are rendered as <a href="theURL"><theURL></a> in the output. If you don't want this behavior, specify translate off.
- preformat
Under output:html, preformat causes any groups of one or more lines that begin with > to be enclosed in <pre></pre> tags. An optional single-character value associated with this tag allows the user to determine what character will be used for indicating the lines that should be rendered in this manner. Default is >.
- meta
Under output:html, the value off causes no meta tags to be generated. Otherwise Dublin Core meta tags will be generated.
- body
Under output:html, the argument is used to replace %s in <body %s> allowing the user to modify the background color of the HTML output.
- base
Under output:html, the argument is used to create a proper <BASE> tag in all HTML output. The argument should be the URL of the directory that will contain all of the output files that are generated by mp (except the error file) on the same run. This causes both relative links and the links to the other files to work in HTML output whether the record is accessed directly over the web or through the clearinghouse.
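A sketch combining body and base. The URL is a placeholder for the directory where your output files will live, and the body value is an assumption; by the description of body above, it would be spliced in to give <body bgcolor="white">.

```text
output
  html
    file %s.html
    faq %s.faq.html
    body bgcolor="white"
    base https://example.org/metadata/
```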
- header
Under output:html, the argument is written at the beginning of the body in the HTML code, before the title and table of contents.
- header_file
Under output:html, the argument is the name of a file whose contents are read and used as the header, as if they had been given as the value of the header element. Ignored if header has been specified.
- footer
Under output:html, the argument is written at the end of the body in the HTML code, after the "generated by mp" line.
- footer_file
Under output:html, the argument is the name of a file whose contents are read and used as the footer, as if they had been given as the value of the footer element. Ignored if footer has been specified.
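As a sketch of the two styles, the fragment below gives the header inline and reads the footer from a file. The markup, the URL, and the path are all hypothetical. Only one of header and header_file (and one of footer and footer_file) is given, since the file form is ignored when the inline form is present.

```text
output
  html
    file %s.html
    header <p><a href="https://example.org/">Example Agency</a></p>
    footer_file /home/user/metadata/footer.html
```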
- link
At the request of a user I have recently rewritten mp's handling of the HTML files so that the line linking alternate versions of the metadata record can be written with a little more control. Specifically, there is now a config file option output:html:link containing the following components:
output
  html
    link
      label (text)
      link_faq (text)
      link_html (text)
      link_text (text)
      link_sgml (text)
      link_xml (text)
      link_dif (text)
    header, footer, etc.
The text is optional in each case. If any of the text values is omitted, the default link will be given. That's a relative link to the file if the file was requested as one of the outputs. If the element (for example, link_sgml) is omitted, no link is provided for that format. To omit the link line altogether, just write the link element alone:
output
  html
    link
    header, footer, etc.
If you don't have a link element within html of the output section of the config file, a link line is created with whatever formats you requested as output. Remember that this depends on having a proper <base> tag in the HTML files, so use the output:html:base element to specify the document root URL for your metadata.
So if you like the links as they are but would rather not have links to the SGML and DIF files, you can run with a config file that omits link_sgml and link_dif, and leaves out the text values:
output
  html
    link
      link_faq
      link_html
      link_text
      link_xml
    header, footer, etc.
The text values provided for each type of output are used as the URL in the link line. So if you really want to generate your FAQ-style HTML on the fly using a CGI, you can write something like this:
output
  html
    link
      link_faq https://geo-nsdi.er.usgs.gov/cgi-bin/getmeta?form=faq&rec=%s
      link_html https://geo-nsdi.er.usgs.gov/cgi-bin/getmeta?form=html&rec=%s
Note the %s in the text value. mp will replace that %s with the name (the name only, with the path and extension clipped off) of the input file. So for example, I could run as follows:
$ mp -c config_file /wherever/metadata/echinoid.met
and if config_file were written as in the last example, the link line would look like this:
Available as [Questions & Answers] - [Outline]
The thing to note is that only the word "echinoid" was spliced into the URL provided in the config file. Both the "/wherever/metadata/" and the ".met" were omitted. This would generate a GET request to the HTTP server on geo-nsdi, passing the variables "form" and "rec" to the CGI program "getmeta".
Let me reiterate that because disk space is so inexpensive, I question the need to do on-the-fly generation of metadata. But people seem to want to try it, so perhaps this will provide some flexibility in how it is done.
- label
Under html:link this allows you to specify replacement text for the phrase "Metadata also available as".
- key
Under output:html:element, the argument is the name of the element for which an HTML prefix and suffix may be associated with the name, the value, or both.
- prefix
Under input, the argument is an unusual character string that some other metadata-generating program has used to identify CSDGM elements. It is used only by cns to distinguish element names that form part of the metadata structure from those that may occur within the text of an element's value.
Under output:text, the argument is an unusual character string that will be prepended to each element name. This could be used in conjunction with input:prefix which cns uses to distinguish between element names that are intended to denote the structure and those that merely appear at the beginning of a line in a text value.
Under output:html:element:name, the argument is the HTML code to be output immediately before the name of the element indicated by the associated key keyword.
Under output:html:element:value, the argument is the HTML code to be output immediately before the value associated with the element indicated by the associated key keyword.
Under output:html:keywords, the argument is the HTML code to be output immediately before each element name.
- suffix
Under output:html:element:name, the argument is the HTML code to be output immediately following the name of the element indicated by the associated key keyword.
Under output:html:element:value, the argument is the HTML code to be output immediately following the value associated with the element indicated by the associated key keyword.
Under output:html:keywords, the argument is the HTML code to be output immediately following each element name.
- obeylines
Under output:html:element:value, the presence of this element causes <br> tags to be emitted at the end of every line of the element's value. This element takes no modifiers.
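Putting key, name, value, prefix, suffix, and obeylines together, a single element block might be sketched as below. The key Abstract and the markup are assumptions chosen for illustration; check how your copy of mp expects element names to be written in key.

```text
output
  html
    file %s.html
    element
      key Abstract
      name
        prefix <h3>
        suffix </h3>
      value
        prefix <blockquote>
        suffix </blockquote>
        obeylines
```

Since element is repeatable under html, one such block can be written for each element that needs its own markup.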
- upgrade
Under input, the value "no" causes mp to not upgrade the metadata to conform to the 1998 version of the CSDGM. By default mp will upgrade metadata if the input format is text, and not if the input format is XML.
- prune
Under input, the presence of this element causes mp to remove all empty subtrees from the metadata. By default this is not done. Its effect is the same as running Xtme, selecting the top Metadata element, and choosing Prune from the Edit menu.
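A sketch of an input section that turns off the upgrade and prunes empty subtrees. The prune element is written without a value because only its presence matters.

```text
input
  upgrade no
  prune
```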
- wrap
Under output:text, this directive causes the lines in the element values to be wrapped to fit a particular page width. Give a number as the value of this element; that will be the number of columns on the page. A good choice is 76. Blank lines are preserved, as are any lines beginning with a greater-than sign '>'. Note that the input file is not changed, and only the indented text output file is modified, not the SGML, XML, or HTML files.
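For example, using the suggested width of 76 columns (the file name is an assumption):

```text
output
  text
    file %s.txt
    wrap 76
```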
- doctype
- href
Under output:xml:stylesheet, this element allows you to specify the hypertext reference (URL) for the stylesheet.
- language
- link_dif
- link_faq
- link_text
- link_html
- link_sgml
- link_xml
- order
- profile
Appearing within input, this element identifies the official profile of the FGDC metadata standard to which the record aspires to conform. Valid values are
  - bio
    Biological Data Profile, FGDC-STD-001.1-1999
  - sh
    Metadata Profile for Shoreline Data, FGDC-STD-001.2-2001
  - rs
    Extensions for Remote Sensing Metadata, FGDC-STD-012-2002
- stylesheet
Appearing within output:xml, this element groups type and href, which together cause mp to include in the XML output a stylesheet reference.
- type
Under output:xml:stylesheet, this optional element allows you to specify the type attribute of the xml-stylesheet element in XML output. If omitted, the value output will be "text-xsl".
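A sketch of a stylesheet block under output:xml. The href URL is a placeholder, and the type is given explicitly as text/xsl, the value browsers conventionally expect for XSLT stylesheets, rather than relying on the default.

```text
output
  xml
    file %s.xml
    stylesheet
      type text/xsl
      href https://example.org/styles/fgdc.xsl
```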