MQ: A Tcl interface to query formal metadata

In the course of my work with the National Geospatial Data Clearinghouse, I find I need to query metadata records to get a variety of information from them. For example, it is helpful to have a list of records with the title of each record, not just its file name. Similarly, the keywords might need to be extracted so that browse interfaces can be built for the collection of records. After doing some of these tasks with custom programs written in C, based on the guts of mp, it became clear that a simpler high-level programming interface would be a lot easier to use. After working with the XSL transformation language, I have come to the conclusion that the operations I need to do are better suited to a different high-level programming language such as Tcl.

About mq

mq is an extension of Tcl/Tk that provides the Tcl script writer a number of capabilities for handling metadata. In addition to the standard Tcl language, the following procedures may be called.

config commands

config read config_file
This command reads a standard config file, which will apply to all metadata read in this Tcl session. You can read only one config file, and you have to do it before you read any metadata.
config find_first option
This command returns the address of the first occurrence of the given option in the config file. If the result is zero, the option was not found in the config file.
config find_next address ?option?
This command returns the address of the next occurrence of the given option in the config file, starting from the config node whose address is given. The address has to come from a config find_first command. If no option is specified, the search is for the next option having the same name as the one whose address is given.
config find_in address option
Return the address of the first occurrence of option within the config subtree headed by the address given. The address has to come from a config find_first, find_in, or find_next command.
config value_of address
Return the value of the config option whose address is specified as the argument. The address has to come from one of the config find_ commands.
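
As a minimal sketch of how the config commands fit together, the following script reads a config file and prints the first two occurrences of one option. The file name mq.cfg and the option name extensions are assumptions chosen for illustration, and the script assumes that find_next, like find_first, returns zero when nothing more is found.

```tcl
package require mq

# Hypothetical config file name; it must be read before any metadata.
config read mq.cfg

# "extensions" is an assumed option name; use one from your own config file.
set first [config find_first extensions]
if {$first == 0} {
    puts "option not found in the config file"
} else {
    puts "first value: [config value_of $first]"

    # With no option name, find_next searches for the same option again.
    # Assumption: a result of zero means there is no further occurrence.
    set second [config find_next $first]
    if {$second != 0} {
        puts "second value: [config value_of $second]"
    }
}
```

The address passed to find_next here comes directly from find_first, as the description above requires.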

metadata commands

metadata m -parse input_file
metadata m -create ?file_name?
Here m is the name of a Tcl variable. On return, mq stores in it a unique value that identifies the metadata record; that value, written $m, is the command used by the subcommands below. This command returns 1 if the parsing was successful, 0 otherwise.
$m find_first element_name
Here m is a Tcl variable previously passed to the metadata command above, and element_name is a standard or extended element name. This command returns the address of a matching element in hex, or zero. Use the address in subsequent commands.
$m find_in address element_name
Here address is the value returned from the find_first subcommand above or a similar subcommand below. This subcommand returns the address of the first node at or below the given address that matches the target element_name.
$m insert ?-before | -after | -child? address element
The element specified as the last argument is inserted into the metadata record before, after, or as a child of the element whose address is given as the next-to-last argument. If no placement is specified, the new element is inserted as a child of the existing element. Note that mq does NOT verify that the FGDC standard allows the inserted element to appear in this place, nor does mq prevent you from putting elements in the wrong places.
$m delete address
The subtree whose address is specified is removed from the metadata record.
$m detach address
The subtree whose address is specified is detached from the metadata record. It can be subsequently reattached.
$m copy address
A copy is made of the element and all of its descendants at the address given. The copy is not attached to the metadata. Its address is returned by this function. That address can be used in the attach function to place the copy at another location or in another metadata record.
$m attach ?-before | -after | -child? address address
The final argument is the address of a previously detached subtree. The next-to-last argument is the address of an element in the current metadata record. The loose subtree will be attached to the current metadata record before, after, or as a child of the specified element. If the placement is not specified, the subtree is attached as a child of the specified element.
$m paste ?-before | -after | -child? address text-subtree
The final argument is the text (or XML) of a snippet of metadata. The next-to-last argument is the address of an element in the current metadata record. The text of the snippet will be parsed, then attached to the current metadata record before, after, or as a child of the specified element. If the placement is not specified, the subtree is attached as a child of the specified element.
$m prune address
The argument is the address of a metadata element within the current parse tree. All empty branches of the subtree headed by this element will be removed. This has the same effect as the Prune choice of the Edit menu of Tkme.
$m value_of ?-nonewline | -list? address
If the address given matches a data element, this returns its value. If -list is not specified, the value is returned as a string. In this case, if you specify -nonewline, then the lines are combined into one, with each line of the input separated from the next by a single blank space. With neither -list nor -nonewline, each line is separated by the newline character (ASCII 10).
If -list is specified, the result is returned as a list of strings, where each such string contains the text in one line of the metadata record. Using this method, it should be easier to recognize groups of lines that begin with > and handle them differently than lines that do not. Note that blank lines in the input will appear as empty list elements.
$m value_set address text
If the address given matches a data element, its current value is deleted and replaced with the specified text.
$m contains address
This returns 1 if the address corresponds to one of the elements in the tree, 0 otherwise.
$m name address
This returns the name of the element at the given address.
$m is_compound address
If the element at the given address is compound, 1 is returned. If that element is not compound, 0 is returned.
$m next address
$m prev address
$m parent address
$m child address
These subcommands return the address of the next, previous, parent, or child node in the tree, relative to the given address. Zero is returned if there is no corresponding element. These provide a way for your Tcl program to walk through the tree.
$m line address
This returns the line number in the input file where the element at the given address begins.
$m count ?address?
This subcommand returns the number of nodes in the tree that are at and below the address specified. If no address is given, it returns the total number of nodes in the tree.
$m write ?-format text | sgml | xml? file_name
The metadata record is written to the output file specified. If no format is specified and the output file name ends with .sgml or .sgm, the output file will be written as SGML. If no format is specified and the output file name ends with .xml, the output file will be written as XML. Otherwise the output file will be written as indented text.
$m forget
This command releases the memory used to store the metadata record. I cannot recommend using the Tcl command unset for this purpose because it seems to take a long time to complete.
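
The navigation subcommands above can be combined into a simple tree walk. The sketch below prints a record as an indented outline. It assumes that child returns the first child of a node, that the root element can be found under the name Metadata, and that record.met is an input file you supply; none of these details is confirmed by the descriptions above.

```tcl
package require mq

# Print the subtree headed by addr as an indented outline.
# m is the value stored by the metadata command.
proc outline {m addr {depth 0}} {
    set indent [string repeat "  " $depth]
    if {[$m is_compound $addr]} {
        puts "$indent[$m name $addr]"
        # Assumption: child gives the first child; next steps through siblings.
        set kid [$m child $addr]
        while {$kid != 0} {
            outline $m $kid [expr {$depth + 1}]
            set kid [$m next $kid]
        }
    } else {
        # Data element: show its value on a single line.
        puts "$indent[$m name $addr]: [$m value_of -nonewline $addr]"
    }
}

# record.met is a hypothetical input file name.
if {[metadata m -parse record.met]} {
    puts "nodes in tree: [$m count]"
    # Assumption: the root element is named Metadata.
    set root [$m find_first Metadata]
    if {$root != 0} {
        outline $m $root
    }
    $m forget
} else {
    puts stderr "could not parse record.met"
}
```

Passing $m into the procedure works because the value stored in m is itself the command name for that record.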

element commands

element name | tag
Returns 1 if the argument is a recognized element name, 2 if it is a recognized tag, and 0 if it is not recognized.
element name | tag -name
returns long name of the element
element name | tag -tag
returns XML tag of the element
element name | tag -source
Tells where the element name or tag was defined, according to the following table:
Return value    Meaning
standard        Element defined in the FGDC metadata standard (FGDC-STD-001-1998)
profile         Element defined in a recognized profile of the FGDC standard. At this writing the Biological Data Profile is the only profile supported by these tools.
extension       Element defined in a file loaded using an extensions directive in the config file.
element name | tag -type
returns compound or data
element name | tag -parents
returns long names of elements within which the given element may occur
element name | tag -parenttags
returns XML tags of elements within which the given element may occur
element name | tag -children
returns long names of elements that may occur within the given element
element name | tag -childtags
returns XML tags of elements that may occur within the given element
version
returns the version of mq that is being run
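
A short sketch of the element command follows. It looks up a few element names from the FGDC standard and prints what mq reports about each. The choice of names is only illustrative, and no particular output is promised here.

```tcl
package require mq

# Illustrative element names; any standard or extended name may be used.
foreach name {Title Keywords Fees} {
    if {[element $name] == 0} {
        puts "$name: not recognized"
        continue
    }
    puts "$name: tag=[element $name -tag] type=[element $name -type] source=[element $name -source]"
    puts "  may occur within: [element $name -parents]"
}
```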

Note that this interface allows you to read and manage more than one metadata record at a time. If you're going to read a lot of records, you will probably want to use the forget command when you're done with each one.
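
As a sketch of processing many records, the loop below lists each file name together with the first Title found in the record, releasing each record before reading the next. The *.met file pattern is an assumption about how your collection is stored.

```tcl
package require mq

# Hypothetical collection: every .met file in the current directory.
foreach file [lsort [glob -nocomplain *.met]] {
    if {![metadata m -parse $file]} {
        puts stderr "$file: parse failed"
        continue
    }
    set addr [$m find_first Title]
    if {$addr != 0} {
        puts "$file\t[$m value_of -nonewline $addr]"
    } else {
        puts "$file\t(no title)"
    }
    # Release the record before reading the next one.
    $m forget
}
```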

Installation

You'll need Tcl/Tk installed on your system in order to make use of mq. You can download Tcl/Tk from the Tcl Developer Xchange, which currently provides the core distribution:

<http://www.tcl.tk/>
If you're running Microsoft Windows, you will download a file whose name is something like tcl846.exe (that's correct for the 8.4.6 version of Tcl/Tk; as Tcl version numbers go up, the file name will change accordingly). Execute this self-extracting package and remember where it installs itself. It will create a directory called Tcl in one of your existing directories like "Program Files". Within that Tcl directory are subdirectories bin and lib. To install mq, find the file mq26.dll from the metadata subdirectory /usgs/tools/bin and copy that dll into Tcl/bin. Then create a folder in Tcl/lib. Name the folder "mq" and copy into that folder the file pkgIndex.tcl which is also found in the metadata subdirectory /usgs/tools/bin. To test, open a Tcl shell and type the command package require mq. You should get a response that is the version number of the mq package, which was 2.6.11 at this writing.
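
The installation test can also be run as a small script. This sketch uses only standard Tcl commands (catch and package require) and reports either the version that was loaded or the reason loading failed.

```tcl
# Try to load mq; on failure, result holds the error message.
if {[catch {package require mq} result]} {
    puts stderr "mq could not be loaded: $result"
} else {
    puts "mq version $result"
}
```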

Examples

  1. Replace all incorrect Fees element values with a reference to the USGS product pricing page
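
A rough sketch of how such a script might look is given here; it is not the original example. It replaces the value of every Fees element rather than deciding which values are incorrect. The replacement text, the file names, the root element name Metadata, and the assumption that child returns the first child are all hypothetical.

```tcl
package require mq

# Hypothetical replacement text; the real wording is up to you.
set new_fees "See the USGS product pricing page"

# Visit addr and its following siblings, descending into compound elements,
# and replace the value of every data element named Fees.
proc replace_fees {m addr text} {
    while {$addr != 0} {
        if {[$m is_compound $addr]} {
            # Assumption: child returns the first child of the element.
            replace_fees $m [$m child $addr] $text
        } elseif {[string equal -nocase [$m name $addr] "Fees"]} {
            $m value_set $addr $text
        }
        set addr [$m next $addr]
    }
}

# record.met and record_fixed.met are hypothetical file names.
if {[metadata m -parse record.met]} {
    # Assumption: the root element is named Metadata.
    replace_fees $m [$m find_first Metadata] $new_fees
    $m write -format text record_fixed.met
    $m forget
} else {
    puts stderr "could not parse record.met"
}
```

A real script would add a test on the current value, obtained with value_of, before calling value_set, so that only the incorrect entries are changed.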

Technical contact:

    Peter N. Schweitzer
    Mail Stop 954, National Center
    U.S. Geological Survey
    Reston, VA 20192

    Tel: (703) 648-6533
    FAX: (703) 648-6252
    Email: pschweitzer@usgs.gov