- Log in to the system
login: meta01
password:
If you're using the standard Bourne shell, the system prompt is a
dollar sign ($); if you're running the C shell or one of its variants,
it is a percent sign (%). The examples here use the $ prompt.
- Look at the files in your home directory
$ ls -laF
- Change to the directory examples/kygs and list its contents
$ cd examples/kygs
$ ls -laF
- Find the file hazard; it contains the metadata for Digital Geology of the Hazard 30′ x 45′ Quadrangle, Kentucky
$ ls -laF hazard
- Determine whether cns is properly installed
$ cns
Chew and Spit, v 1.4 19970610
Usage: cns [-c config_file] [-i info_file] [-a aliases] [-e leftovers] [-o output_file] input_file
- Parse the metadata file with mp, displaying errors on the screen
$ mp hazard
Ouch! 1179 errors!
- The messages are numerous, and long ones wrap on the screen. Write them
to a file instead so that you can examine them more closely with a text editor.
$ mp hazard -e hazard.err
- Look at the error file and the metadata using a text editor.
$ xed hazard.err hazard
In xed, the top line of the window is a "status line" that
shows you what line you're on, what column your cursor is in, whether
you're in Insert and Autoindent mode, the width that tabs will be
expanded, and the name of the file you're editing (a '+' before the name
means you've made changes to the file).
- Note that mp gives a lot of warnings about ambiguous
indentation along with messages like "element Originator found in textual
value, some information may be lost". Despite their being tagged as
warnings, these messages indicate serious problems with the file's
format and must not be ignored.
- Click the mouse on the status line to bring up the menus.
- Click File to open the File menu, and click Next file
twice to switch to the next file. The file hazard appears.
- The indentation in the file is inconsistent. For example, the text of
the abstract is flush with the left margin; it should be indented more
than the element name Abstract. Similar problems abound
throughout the file. However, the elements are generally in the right
order and the values appear to be reasonable.
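To make the indentation problem concrete, here is a rough shell sketch (POSIX sh and awk). It lists text lines that are not indented past the element name they follow, such as the flush-left abstract. The function name flag_flush_text is hypothetical. It uses a simplified model: an element line is assumed to look like Name: with no spaces in the name, and a tab counts as one column. It is not how mp itself parses indentation.

```shell
#!/bin/sh
# flag_flush_text [FILE...] (hypothetical helper): print "LINE: text" for each
# text line that is not indented deeper than the element line above it.
# Simplified model, NOT mp's parser: an element line is assumed to be
# "Name:" with no spaces in Name, and a tab counts as a single column.
flag_flush_text() {
  awk '
    /^[ \t]*$/ { next }                          # ignore blank lines
    { t = $0; sub(/^[ \t]+/, "", t)              # strip leading whitespace
      ind = length($0) - length(t) }             # indentation width
    /^[ \t]*[A-Za-z_]+:/ { elem_ind = ind; seen = 1; next }  # element line
    seen && ind <= elem_ind { print NR ": " $0 } # text not indented past it
  ' "$@"
}
```

After you define the function in your shell, flag_flush_text hazard would list candidate lines. mp's own messages remain the authority on what is actually wrong.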
- You could attempt to fix this file with a text editor. Knowing the
standard well, you could add indentation and, where needed, extra
container elements, such as the Citation_Information element that
belongs between lines 2 and 3. By repeatedly editing the file and
running mp, you could amend its format until mp can analyze it
sensibly.
There is a better way.
- cns was designed to help people rearrange files like this. It guesses
the structure of the metadata from the occurrence of recognizable
elements in the file, sets aside the parts it cannot understand, and
keeps a detailed record of its analysis.
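The central idea of recognizing element names and setting aside the rest can be shown with a toy shell sketch. This is not the algorithm cns uses, because cns also infers nesting and places values. The hypothetical toy_leftovers function only checks each Name: value line against a list of known names and reports unrecognized lines with their input line numbers. The list format, one name per line, is an assumption.

```shell
#!/bin/sh
# toy_leftovers KNOWN INPUT (hypothetical helper, NOT cns's algorithm):
# print "LINE: text" for every "Name: value" line of INPUT whose Name is
# not listed in KNOWN (assumed format: one element name per line).
toy_leftovers() {
  awk '
    FILENAME == ARGV[1] { known[$1] = 1; next }   # first file: known names
    {
      line = $0
      sub(/^[ \t]+/, "", line)                    # drop indentation
      c = index(line, ":")
      if (c > 1) {                                # looks like "Name: value"
        name = substr(line, 1, c - 1)
        if (!(name in known)) print FNR ": " $0   # unrecognized element
      }
    }
  ' "$1" "$2"
}
```

Adding a name such as Principal_Investigator to the known list makes its line drop out of the report. That is roughly the effect the extensions file has on cns later in this exercise.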
- Run cns, specifying the input file and the names of the output files for
leftovers, information, and the cleaned-up metadata.
$ cns hazard -e leftovers -i info -o output
- Examine the input and output files
$ xed leftovers info output hazard
- Each line in the leftovers file has a number showing what
line of the input file hazard the information came from.
- Look for groups of consecutive line numbers. In this case we see lines
12 through 17 in the leftovers file. Switch to the info file, and for lines
12 through 17 we see the message "text could not be placed".
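Spotting groups by eye works for short files. For long ones, a small shell sketch can collapse the line numbers into ranges. It assumes, without verification, that each leftovers line begins with its input line number, possibly after some spaces. The name leftover_runs is hypothetical.

```shell
#!/bin/sh
# leftover_runs [FILE...] (hypothetical helper): collapse the input line
# numbers at the start of each leftovers line into ranges such as 12-17.
# ASSUMPTION: every leftovers line begins with its input line number.
leftover_runs() {
  awk '
    function emit() { if (first == prev) print first; else print first "-" prev }
    match($0, /^[ \t]*[0-9]+/) {
      n = substr($0, RSTART, RLENGTH) + 0     # numeric input line number
      if (started && (n == prev || n == prev + 1)) { prev = n; next }
      if (started) emit()                     # close the previous run
      first = n; prev = n; started = 1        # start a new run
    }
    END { if (started) emit() }
  ' "$@"
}
```

For example, leftover_runs leftovers would print one range per line, which you can then compare against the messages in the info file.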
- Switch to the input file hazard and look at lines 12-17.
These look like standard metadata, but the elements shown here,
Principal_Investigator and Digital_Compilers, aren't
standard metadata elements.
- We'll assume that these elements are intended as extensions
of the FGDC standard, so we'll create a file that lets cns know
their names and where they should appear in the metadata. This
information is stored in a separate file that cns,
xtme, and mp all know how to read.
- We won't get into the structure of the extensions file in detail
here. You'll find the right file in the same directory. It is called
kygs.ext. Open it in xed by selecting Open from the File menu,
pressing Enter, and selecting kygs.ext from the popup list.
- Notice that the extensions file also describes the extensions
Coverage_name and Coverage_description. Look at
the leftovers file again and find these element names. Which lines
of the input file will be properly recognized when these elements are made
known to cns?
- cns needs to be told where the extensions file is, and this
is done through a configuration file. Since these extensions
are the only unusual thing we need to tell cns, our configuration
file will contain only a reference to the extensions file. Open the file
kygs.cfg to see how this is specified.
- Close all of the files by repeatedly pressing F2 until the xed
window disappears.
- Run cns again, specifying the configuration file along with
the other files that were specified before.
$ cns -c kygs.cfg hazard -e leftovers -i info -o output
- Examine the input and output files
$ xed leftovers info output hazard
- Look at lines 21 and 70 in the input file. These lines contain the
element names Description and Bounding_Coordinates. The
other information contained on these lines is found elsewhere. In the
case of Description the extra text is the title of the data set.
With Bounding_Coordinates we see a hint probably used to help the
metadata author enter the proper values. The extraneous information on
these lines can be ignored, so don't worry that cns has discarded
it.
- The next group of lines in the leftovers file is 895-903. These begin
with what looks like an element name, Horizontal Coordinate System.
Checking the Alphabetical List of Compound Elements and Data Elements,
part of the FGDC metadata standard, shows that this is not a standard
element; it has been misspelled. But the lines that follow it contain
elements that should be part of the element Grid_Coordinate_System.
Let's assume that the metadata author has misspelled the element
Grid_Coordinate_System as Horizontal Coordinate System: grid.
Does cns have a way to handle misspelled elements? Of course!
- Open the file alias. On each line, the first word is the
correctly-spelled name of a standard element. Following that word are
one or more spaces and some text that will be recognized as an
alias of that element. You'll see that I found nine misspelled
elements in the file hazard.
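Based on that description of the alias file, here is a shell sketch of the lookup. It splits each line into the first word, which is the standard name, and the remaining text, which is the alias. It then reports the standard name for a given alias. The alias_lookup function and the alias entries in its test are invented. It assumes an exact, case-sensitive match with trailing blanks ignored; how cns actually matches aliases may differ.

```shell
#!/bin/sh
# alias_lookup ALIAS_FILE TEXT (hypothetical helper): print the standard
# element name whose alias text equals TEXT. Assumed matching rule: exact
# and case-sensitive, ignoring trailing blanks; cns may behave differently.
alias_lookup() {
  awk -v want="$2" '
    NF >= 2 {
      std = $1                              # first word: standard element name
      text = $0
      sub(/^[ \t]*[^ \t]+[ \t]+/, "", text) # rest of the line: the alias
      sub(/[ \t]+$/, "", text)              # ignore trailing blanks
      if (text == want) { print std; exit }
    }
  ' "$1"
}
```

For example, alias_lookup alias 'Some Misspelled Name' would print the standard name if exactly that text follows the first word on some line of the file.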
- Go to line 4 of the file alias. What element name spelling
will this line correct? Now go to line 1016 of the file output.
Because the name of a major section of the metadata was misspelled,
cns treated all of the elements of that section as part of the value
of the element Ellipsoid_Name.
This is not what the metadata producer intended. With the alias
list, cns will properly recognize this section. Let's try it.
- Close xed by repeatedly pressing F2.
- Since cns is the only program that understands aliases, the
alias file is not specified in the configuration file but is named on the
command line with the -a switch.
Run cns yet again, specifying the alias file and the
configuration file along with the other files that were specified before.
$ cns -c kygs.cfg hazard -a alias -e leftovers -i info -o output
- Examine the input and output files
$ xed leftovers info output hazard
- Note that the leftovers file now contains only two lines, 21 and 70,
which we have already determined we don't need.
- Scan through the file output and notice that it seems to be
in good order. Time to try mp again!
- Run mp again, specifying the name of the configuration file,
which lets mp know about the extensions.
$ mp output -c kygs.cfg -e hazard.err
- Examine the file hazard.err
$ xed hazard.err
We're down to 36 errors, mostly missing, empty, or improper values.
Not bad, coming from 1179. And we didn't have to modify the actual input
file at all. The number can be further reduced using the prune
function of xtme, but that's part of the next session.
- This completes the exercise.