USGS - science for a changing world

Formal metadata: information and software

The config file

Introduction

See also Simple uses of the config file.

This note explains how to create and use a configuration file for cns, mp, xtme, and tkme. It's a good idea to look at the simple example and the deluxe example. A configuration file allows you to customize aspects of the behavior of these programs, as described in the sections below.

Config file basics

  1. General syntax

    The config file is indented text (what a surprise!). It consists of an input section and an output section, either of which may be left out. Within the output section there are subsections for each of the possible output file formats plus one for the parse tree and one for the errors. Anything that isn't recognized is going to be ignored, so make sure you've spelled the element names correctly. If you want to put comments in the file, I recommend you put them all at the beginning of the file and start each comment line with a pound character (#). You can put comments at other places but you have to maintain your indentation.

  2. Getting the programs to read the config file

    cns and mp have to be given a command-line argument -c cfile where cfile is the name of the config file. Xtme can take its config file name that way or through the X resource xtme*configFile. Choose Version from the Help menu in xtme for details.

  3. Errors in the config file

    These programs will complain if your config file has ambiguous indentation or is empty or cannot be found, but they won't complain if you have misspelled, misplaced, or omitted configuration elements. If you have a hard time getting them to recognize your config elements, you can insert a call to list_children() after the call to unify_scalars() in config.c and recompile the source code; this will dump the config file's parse tree to stderr, and should indicate what elements, if any, are not recognized. Recognized elements that aren't in the right places (for example, obeylines under SGML) are simply ignored; you'll just have to read carefully to find mistakes like that. Start with a template and you probably won't have this problem.
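
As a rough illustration of the general syntax described above, here is a minimal sketch of a config file. It assumes the name-colon-value form used in the file example later in this note; the output file names are only illustrative.

# Sketch of a small config file. Comments are kept at the top, as recommended.
# Assumption: %s is replaced with the input file name (see the file element).
input:
  blanks: ignore
output:
  text:
    file: %s.txt
  html:
    file: %s.html

A file like this would be given to cns or mp with the -c option, for example mp -c cfile input_file.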

Elements of the config file

In general the file may contain an input element and an output element. Either of these may be left out. Letter case doesn't matter; you can capitalize if you want to. For some consistency, this description and all of the examples use all lower-case letters.

Compound elements

input = (profile) + (indent) + (blanks) + (1{extensions}n) +
  (codeset) + (1{tips}n) + (upgrade) + (prefix) + (prune) +
  (1{ext}n) + (language)
output = (errors) + (tree) + (text) + (sgml) + (xml) + (html) +
  (dif) + (info)
text = (file) + (cns) + (indent) + (top_level) + (prefix) + (wrap)
sgml = (file) + (tags) + (indent) + (blanks) + (skip_extensions)
xml = (file) + (indent) + (blanks) + (doctype) + (stylesheet) +
  (skip_extensions) + (order)
html = (file) + (faq) + (err) + (translate) + (preformat) +
  (meta) + (base) + (body) + (header | header_file) +
  (footer | footer_file) + (link) + (keyword) + (blanks) +
  (1{element}n)
info = (file)
dif = (file)
stylesheet = type + href
link = (label) + (link_faq) + (link_html) + (link_text) +
  (link_sgml) + (link_xml) + (link_dif)
keyword = (prefix) + (suffix)
element = key + (name) + (value)
name = (prefix) + (suffix)
value = (prefix) + (suffix) + (obeylines)

Data elements

file
Under output, the value given will be used as the name of the output file for the format under which the keyword appears. If the characters "%s" appear in the value, they will be replaced with the name of the input file as given on the command line. So, for example,
text:
  file: %s.txt
will result in the textual output being created with the name input_file.txt. If the input file ends with a standard suffix, that suffix will be clipped off before creating the output file name. Standard file name suffixes are
.txt
.sgml
.sgm
.xml
.text
.met
.bin
Additional suffixes can be recognized if they are specified using the ext element under input.

What appears in the configuration file for this option is overridden by the command-line option. Thus it is possible to overwrite the input file by specifying its name as one of the output files using a command-line option -e, -t, -h, -s, or -d.

faq
Under output:html, this element specifies the file name to be used for the FAQ-style HTML output. The syntax of the value is the same as for file above.

err
Under output:html, this element specifies the file name to be used for the HTML error output of err2html. The syntax of the value is the same as for file above. This element is noticed only by err2html.

cns
Under output:text, this element specifies the file name to be used for the indented text output of cns. The syntax of the value is the same as for file above.

ext
Under input, this element specifies a file name suffix that should be removed if found on the end of the input file name. This allows the user to augment the list of standard file name suffixes. This element is repeatable. Specify one ext element per file name suffix.
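
A sketch of how ext and file might work together. The suffixes .meta and .fgdc are hypothetical, and writing them with a leading dot is an assumption based on the list of standard suffixes above.

# Hypothetical suffixes; one ext element per suffix.
input:
  ext: .meta
  ext: .fgdc
output:
  text:
    file: %s.txt
  html:
    file: %s.html
    faq: %s.faq.html

With an input file named survey.meta, the outputs would be expected to be named survey.txt, survey.html, and survey.faq.html.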

indent
Under input, the value strict causes indented text to be processed as hierarchical rather than linear. The default is to interpret indented unrecognized keywords as plain text.

Under output, the value given will be used once for each level in the hierarchy, preceding the output data on each line. This is a string that may be delimited by quotation marks ("). Backslash is used to include a character literally.

top_level
Under output:text, the value skip causes the tree to be output starting at the first node immediately below Metadata. This is a convenience since, presumably, everything in the file is contained in Metadata, so if this is not done, everything is indented except the keyword Metadata.

The default is to skip the Metadata keyword in text output, so the appearance of the keyword top_level WITHOUT the value skip causes the Metadata keyword (and its indentation throughout the file) to be retained.

blanks
Under input, the value ignore causes blank lines to be ignored in the metadata. Otherwise blank lines are included and processed as data. The parser attempts to prevent the assignment of children to blank lines.

Under output, the value associated with the keyword is output for every blank line in the input file, unless blanks are being ignored, in which case there won't be any in the parse tree when the output routine is called. For SGML output, the default is "" (empty); for HTML output, the default is "<P>\n".
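
The indent, top_level, and blanks elements described above might be combined as in the following sketch. The values are illustrative only, and the quoting of the html blanks value (including the \n escape) is assumed to follow the form in which the default is written above.

# Illustrative values only; not defaults unless stated in the text.
input:
  indent: strict
output:
  text:
    file: %s.txt
    indent: "  "
    top_level: skip
  html:
    file: %s.html
    blanks: "<br>\n"

Here blank lines are left in the input (blanks: ignore is not given), so the html blanks value has something to act on.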

extensions
Under input, the argument is the full path to a local file containing extensions to the standard that are used in the current document.

codeset
Under input, the argument is one of the values ISO8859-1, DOS, or MAC (letter case is ignored). This indicates the encoding of characters beyond ASCII 127. Default is ISO8859-1.

tips
Under input, the argument is the full path to a local file containing tips to be used by xtme.

tree
If specified, this keyword causes the parse tree to be dumped to stdout. If an argument is present, it is interpreted as the name of a file into which the parse tree should be dumped.

errors
The argument has the same syntax as the argument to the file element under text, sgml, xml, html, dif, and binary.
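
A sketch showing where the extensions, codeset, tips, tree, and errors elements would sit. Every path and file name here is hypothetical.

# All paths and file names below are hypothetical.
input:
  extensions: /usr/local/lib/metadata/extensions.txt
  codeset: ISO8859-1
  tips: /usr/local/lib/metadata/tips.txt
output:
  errors: %s.err
  tree: tree.txt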

tags
Under output:sgml, the value astm formerly caused the ten-character tags described in the proposed ASTM D18.01.05 standard to be used. The use of ASTM tags is no longer supported. By default the eight-character tags in sgmltags.txt are used.

skip_extensions
Under output:sgml, this keyword causes the sgml code generator to skip elements that are not part of the 19940608 version of the metadata standard. By default extensions are included.

translate
Under output:html, textual strings in metadata values are translated into HTML. Specifically, the characters <, >, ", and & are converted into the corresponding entities (&lt;, &gt;, &quot;, and &amp;) and URLs of the form http://theURL or ftp://theURL are rendered as <a href="theURL">&lt;theURL&gt;</a> in the output. If you don't want this behavior, specify translate off.

preformat
Under output:html, preformat causes any group of one or more lines that begin with > to be enclosed in <pre></pre> tags. An optional single-character value associated with this element lets you choose the character that marks the lines to be rendered this way. Default is >.

meta
Under output:html, the value off causes no meta tags to be generated. Otherwise Dublin Core meta tags will be generated.
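
For example, an html section that turns off translation and meta tags and uses a different preformat character might look like the sketch below. The choice of | as the preformat character is arbitrary.

# Sketch only; the | character is an arbitrary choice.
output:
  html:
    file: %s.html
    translate: off
    preformat: |
    meta: off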

body
Under output:html, the argument is used to replace %s in <body %s> allowing the user to modify the background color of the HTML output.

base
Under output:html, the argument is used to create a proper <BASE> tag in all HTML output. The argument should be the URL of the directory that will contain all of the output files that are generated by mp (except the error file) on the same run. This causes both relative links and the links to the other files to work in HTML output whether the record is accessed directly over the web or through the clearinghouse.

header
Under output:html, the argument is written at the beginning of the body in the HTML code, before the title and table of contents.

header_file
Under output:html, the argument is the name of a file whose contents are read and used as the header, as if they had been given as the value of the header element. Ignored if header has been specified.

footer
Under output:html, the argument is written at the end of the body in the HTML code, after the "generated by mp" line.

footer_file
Under output:html, the argument is the name of a file whose contents are read and used as the footer, as if they had been given as the value of the footer element. Ignored if footer has been specified.
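
The body, base, header, and footer elements could be combined roughly as follows. The URL, the header text, and the footer path are hypothetical.

# Hypothetical URL, header text, and footer path.
output:
  html:
    file: %s.html
    body: bgcolor=white
    base: http://www.example.com/metadata/
    header: <p>Draft metadata, subject to revision</p>
    footer_file: /path/to/footer.html

With this body value, the HTML output would contain <body bgcolor=white>. Because header is given directly, a header_file element here would be ignored.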

link
At the request of a user I have recently rewritten mp's handling of the HTML files so that the line linking alternate versions of the metadata record can be written with a little more control. Specifically, there is now a config file option output:html:link containing the following components:
 output
   html
     link
       label (text)
       link_faq (text)
       link_html (text)
       link_text (text)
       link_sgml (text)
       link_xml (text)
       link_dif (text)
     header, footer, etc.
The text is optional in each case. If any of the text values is omitted, the default link will be given. That's a relative link to the file if the file was requested as one of the outputs. If the element (for example, link_sgml) is omitted, no link is provided for that format. To omit the link line altogether, just write the link element alone:
 output
   html
     link
     header, footer, etc.

If you don't have a link element within html of the output section of the config file, a link line is created with whatever formats you requested as output. Remember that this depends on having a proper <base> tag in the HTML files, so use the output:html:base element to specify the document root URL for your metadata.

So if you like the links as they are but would rather not have links to the SGML and DIF files, you can run with a config file that omits link_sgml and link_dif, and leaves out the text values:

 output
   html
     link
       link_faq
       link_html
       link_text
       link_xml
     header, footer, etc.

The text values provided for each type of output are used as the URL in the link line. So if you really want to generate your FAQ-style HTML on the fly using a CGI, you can write something like this:

 output
   html
     link
       link_faq  http://geo-nsdi.er.usgs.gov/cgi-bin/getmeta?form=faq&rec=%s
       link_html  http://geo-nsdi.er.usgs.gov/cgi-bin/getmeta?form=html&rec=%s

Note the %s in the text value. mp will replace that %s with the name (the name only, with the path and extension clipped off) of the input file. So for example, I could run as follows:

 $ mp -c config_file /wherever/metadata/echinoid.met
and if config_file were written as in the last example, the link line would look like this:
Available as [Questions & Answers] - [Outline]

The thing to note is that only the word "echinoid" was spliced into the URL provided in the config file. Both the "/wherever/metadata/" and the ".met" were omitted. This would generate a GET request to the HTTP server on geo-nsdi, passing the variables "form" and "rec" to the CGI program "getmeta".

Let me reiterate that because disk space is so inexpensive, I question the need to do on-the-fly generation of metadata. But people seem to want to try it, so perhaps this will provide some flexibility in how it is done.

label
Under output:html:link, this allows you to specify replacement text for the phrase "Metadata also available as".

key
Under output:html:element, the argument is the name of the element for which an HTML prefix and suffix may be associated with the name, the value, or both.

prefix
Under input, the argument is an unusual character string that some other metadata-generating program has used to identify CSDGM elements. It is used only by cns to distinguish element names that form part of the metadata structure from those that may occur within the text of an element's value.

Under output:text, the argument is an unusual character string that will be prepended to each element name. This could be used in conjunction with input:prefix which cns uses to distinguish between element names that are intended to denote the structure and those that merely appear at the beginning of a line in a text value.

Under output:html:element:name, the argument is the HTML code to be output immediately before the name of the element indicated by the associated key keyword.

Under output:html:element:value, the argument is the HTML code to be output immediately before the value associated with the element indicated by the associated key keyword.

Under output:html:keywords, the argument is the HTML code to be output immediately before each element name.

suffix
Under output:html:element:name, the argument is the HTML code to be output immediately following the name of the element indicated by the associated key keyword.

Under output:html:element:value, the argument is the HTML code to be output immediately following the value associated with the element indicated by the associated key keyword.

Under output:html:keywords, the argument is the HTML code to be output immediately following each element name.

obeylines
Under output:html:element:value, the presence of this element causes <br> tags to be emitted at the end of every line of the element's value. This element takes no modifiers.
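
The key, name, value, prefix, suffix, and obeylines elements fit together as in this sketch. The keys Title and Abstract and the HTML fragments are illustrative, and writing obeylines with an empty value is an assumption about how a valueless element is entered.

# Illustrative keys and HTML; obeylines is assumed to need no value.
output:
  html:
    file: %s.html
    element:
      key: Title
      name:
        prefix: <b>
        suffix: </b>
      value:
        prefix: <i>
        suffix: </i>
    element:
      key: Abstract
      value:
        obeylines:

Each element entry names one metadata element with key, so the element block is repeated once per element to be styled.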

upgrade
Under input, the value "no" causes mp to not upgrade the metadata to conform to the 1998 version of the CSDGM. By default mp will upgrade metadata.

prune
Under input, the presence of this element causes mp to remove all empty subtrees from the metadata. By default this is not done. Its effect is the same as running Xtme, selecting the top Metadata element, and choosing Prune from the Edit menu.

wrap
Under output:text, this directive causes the lines in the element values to be wrapped to fit a particular page width. Give a number as the value of this element; that will be the number of columns on the page. A good choice is 76. Blank lines are preserved, as are any lines beginning with a greater-than sign '>'. Note that the input file is not changed, and only the indented text output file is modified, not the SGML, XML, or HTML files.
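
A short sketch combining upgrade and wrap, using the page width suggested above:

# Sketch only; 76 is the page width suggested in the text.
input:
  upgrade: no
output:
  text:
    file: %s.txt
    wrap: 76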

doctype

href
Under output:xml:stylesheet, this element allows you to specify the hypertext reference (URL) for the stylesheet.

language

link_dif

link_faq

link_text

link_html

link_sgml

link_xml

order

profile
Appearing within input, this element identifies the official profile of the FGDC metadata standard to which the record aspires to conform. Valid values are
bio   Biological Data Profile                  FGDC-STD-001.1-1999
sh    Metadata Profile for Shoreline Data      FGDC-STD-001.2-2001
rs    Extensions for Remote Sensing Metadata   FGDC-STD-012-2002

stylesheet
Appearing within output:xml, this element groups type and href, which together cause mp to include in the XML output a stylesheet reference.

type
Under output:xml:stylesheet, this optional element allows you to specify the type attribute of the xml-stylesheet element in XML output. If omitted, the value output will be "text-xsl".
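
As a sketch, a config file that declares a profile and asks for a stylesheet reference in the XML output might look like this. The stylesheet URL is hypothetical, and text/xsl is simply a commonly used type value, not one taken from this note.

# Hypothetical stylesheet URL; text/xsl is an assumed type value.
input:
  profile: bio
output:
  xml:
    file: %s.xml
    stylesheet:
      type: text/xsl
      href: http://www.example.com/metadata/fgdc.xsl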

U.S. Department of the Interior | U.S. Geological Survey
URL: http://(none)/tools/metadata/tools/doc/config.html
Page Contact Information: Peter Schweitzer
Page Last Modified: Tuesday, 11-Dec-2012 15:49:45 EST