MP: A compiler for formal metadata

Schweitzer, Peter N. 1995 MP: A compiler for formal metadata 2.9.50 computer program Reston, Virginia U.S. Geological Survey https://geology.usgs.gov/tools/metadata/

MP is a program for validating the syntactical structure of formal metadata, testing the structure against the Content Standards for Digital Geospatial Metadata devised by the Federal Geographic Data Committee (FGDC). MP is described as a compiler because it contains not only a lexical parser but also code to analyze the tree that the parser generates, and code to output the metadata in several different formats. It is written in Standard C (i.e. ANSI C) and runs on Linux, UNIX, and all versions of Microsoft Windows (95 and later, including Windows 10). MP generates a textual report indicating errors in the metadata, primarily in the structure but also in the values of some of the scalar elements (i.e. those whose values are restricted by the standard). Output formats include text (the same as the input format), Hypertext Markup Language (HTML), Standard Generalized Markup Language (SGML), Extensible Markup Language (XML) and Directory Interchange Format (DIF). MP has the ability to recognize and process elements that are not part of the FGDC standard, provided these elements are properly described in a local file.

Throughout the 1980s and early 1990s, the improving capability of desktop computers to carry out complex analyses increased the popularity of geographic information systems (GIS). As they have become familiar with GIS technology, people at all levels of government, in industry, and in academia have been calling for better access to publicly available geospatial information and more general use of standard terms of reference and of standard formats for the exchange of geospatial data and information.
Answering this need is the goal of the National Spatial Data Infrastructure (NSDI), a government-wide coordination effort initiated at the Federal level through Executive Order 12906, which was signed by President Clinton in April of 1994. A key component of NSDI is the development of a National Geospatial Data Clearinghouse, a general source of information about geospatial data that are available to the public. With the Clearinghouse a user can determine whether geospatial data on a region of interest exist and are appropriate for solving the problem at hand. The Clearinghouse is a distributed network of internet sites providing metadata (information about geospatial data) to users in approximately the same way. Its success depends on the overall consistency of the metadata that are made available, because users are expected to evaluate metadata from numerous sources in order to determine which data meet their needs. To promote consistency in metadata, the Federal Geographic Data Committee (FGDC), an interagency council charged with coordinating the Federal implementation of NSDI, has produced the Content Standards for Digital Geospatial Metadata (CSDGM). That document provides standard terms describing elements common to most geospatial data, and encourages people who document geospatial data sets to use these terms. The Content Standards for Digital Geospatial Metadata (hereafter referred to simply as "the standard") describes not only the terms of reference but also specifies the relationships among those terms. The relationships, many of which are hierarchical, are complex and a formal syntax is provided to specify them. Because the syntax of the standard is complex and the number of descriptive elements is fairly large (341), creating metadata that conform to the standard is not an easy task. 
In addition to the problem of assembling the information needed to properly describe the subject data sets, data producers must arrange that information using the terms given in the standard and arrange the terms using the syntactical rules given in the standard. The resulting metadata are formally structured and use standard terms of reference, hence the term "formal metadata" in the title of this report. It is important to be able to say with confidence that metadata conform to the structure of the standard. Human review is still required--no software can determine whether metadata are accurate--but human review of the content is easier if the syntactical structure is determined to be correct. Tools whose purpose is to facilitate the creation of metadata cannot generally be said to conform to the standard, since they typically allow, but do not encourage, the user to create incomplete metadata or to enter improper values. Conformance testing should be carried out on the output of metadata generation tools rather than on the tools themselves. MP is designed to carry out conformance testing on formal metadata of this type.

1. Command line options

MP is started by issuing a command to the operating system. Its behavior is controlled both by switches on the command line and through a configuration file. Command-line switches are as follows:

> option     resulting action
> --------------------------------------------------------
> -l code    use language indicated by code (en,es,id,fr)
> -c cfile   obtain configuration information from cfile
> -e efile   direct syntax errors to efile
> -t tfile   create text output in tfile
> -h hfile   create html output in hfile
> -f ffile   create FAQ-style html in ffile
> -s sfile   create sgml output in sfile
> -d dfile   create DIF output in dfile

The name of the input file that contains formal metadata is given on the command line but is not preceded by a switch. The input file name must not begin with a hyphen.
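The rule that the input file is the one argument not preceded by a switch can be illustrated with a short sketch. This is not code from mp; the helper name find_input_name is hypothetical, and the sketch assumes that every switch in the table above takes exactly one argument and that any other argument beginning with a hyphen can simply be skipped.

```c
#include <stddef.h>
#include <string.h>

/* Switches that take one argument, per the option table above. */
static const char *const switches_with_arg[] = {
    "-l", "-c", "-e", "-t", "-h", "-f", "-s", "-d"
};

/* Return 1 if arg is one of the switches listed above, else 0. */
static int takes_argument(const char *arg)
{
    size_t i;
    size_t n = sizeof switches_with_arg / sizeof switches_with_arg[0];

    for (i = 0; i < n; i++)
        if (strcmp(arg, switches_with_arg[i]) == 0)
            return 1;
    return 0;
}

/*
 * Hypothetical helper (not part of mp): return the first argument that is
 * neither a switch nor the value of a switch, or NULL if there is none.
 * argv[0] is assumed to be the program name.
 */
const char *find_input_name(int argc, char **argv)
{
    int i;

    for (i = 1; i < argc; i++) {
        if (argv[i][0] == '-') {
            if (takes_argument(argv[i]))
                i++;        /* skip the file name or code that follows */
            continue;
        }
        return argv[i];
    }
    return NULL;
}
```

Because a leading hyphen is what marks a switch here, an input file whose name began with a hyphen could not be told apart from one, which is consistent with the restriction stated above.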
Example:

> mp -c config.dat foo -e foo.err -t foo.txt -h foo.html -f foo.faq.html -s foo.sgml -d foo.dif

This command causes mp to read options from the file config.dat and metadata from the file foo. Errors are directed to the file foo.err, and output is directed to the following files:

> foo.txt        ASCII, using the input format
> foo.html       Hypertext Markup Language
> foo.faq.html   FAQ-style Hypertext Markup Language
> foo.sgml       Standard Generalized Markup Language
> foo.xml        Extensible Markup Language
> foo.dif        Directory Interchange Format

Names of output files may be omitted if the corresponding information is included in the configuration file, which is described in section 3 below. Errors are printed to the error file or, if none was specified, to the standard error device (stderr), which is usually the console or the terminal from which the program was launched. Error messages and warnings refer to the line in the input metadata file by number (i.e. "Error (line 1):" refers to the first line in the input file). Warnings are issued when conditions that are considered unusual are detected; errors indicate a condition that runs counter to the dictates of the FGDC standard.

2. Input file format

Since the FGDC Content Standards for Digital Geospatial Metadata, as the name implies, specifies only the contents of metadata files and not their encoding, it is necessary to choose or devise a specification for metadata encoding in order to create formal metadata. The encoding format interpreted by this compiler is purely textual. It describes the hierarchical structure of the metadata using indentation, in which the members of compound elements are indented more than their parent element and all of the elements at the same level in the hierarchy are indented alike. The full hierarchy is specified, even elements like Contact_Information that exist only to contain other elements and would not normally be needed by a human reader.
The format is designed to be both parseable and human-readable, with a minimum of unnecessary jargon and code, but the requirement to be parseable makes it a lot more structured than ordinary text. Precise rules governing the format of the input file are as follows:

Terms:

tab := ASCII 9
space := ASCII 32
element name := A sequence of bytes consisting of alphanumeric characters, the underscore, hyphen, and forward slash. This sequence is one of the formal names given to metadata elements in the standard. Examples:
>Citation
>Identification_Information
>Data_Set_G-Polygon_Outer_G-Ring
>Range_of_Dates/Times
value := A text string associated with an element by the originator of the metadata.

Arrangement:

a. Metadata files are plain text without markup. Non-ASCII characters may be used and will be passed through to the output files, but in the HTML output these may be interpreted as having ISO-8859-1 encoding by subsequent software.
b. The number of characters per line is not limited.
c. Indentation is accomplished using tabs, spaces, or a combination of the two, but for purposes of determining indentation level, one tab equals one space.
d. Blank lines may occur anywhere in the file.
e. Element names are spelled out in the metadata file exactly as in the syntax rules of the metadata content standard. Where the descriptive portion of the standard differs from the syntactical rules, the syntactical rules are regarded as authoritative, e.g. Attribute_Units_of_Measure is correct.
f. A single colon or equal sign may follow each element name but is not required.
g. Whitespace may occur between element name and colon or equal sign, and may occur after the colon or equal sign.
h. Values are associated with an element in one of three ways:
(1) The value begins at the first nonblank following the element name (or following colon or equal sign) and extends to the end of the line.
(2) The value begins on the line following the element name. It is indented more than the element name, i.e. there are more spaces or tabs preceding the value than precede the element name.
(3) The value begins on the line containing the element name. It extends onto subsequent lines, where it is indented more than the element name, i.e. there are more spaces or tabs preceding the value on lines following the element name than precede the element name.
i. Values of compound types occur on successive lines using the same degree of indentation. Example:

> Citation_Information:
>   Originator:
>   Publication_Date:
>   Publication_Time:
>   Title:
>   Type_of_Map:
>   Serial_Information:
>     Serial_Name:
>     Issue_Identification:
>   Publication_Information:
>     Publication_Place:
>     Publisher:
>   Other_Citation_Details:
>   Online_Linkage:
>   Larger_Work_Citation:

3. Configuration file format

Described in https://geology.usgs.gov/tools/metadata/tools/doc/config.html

4. Local extension file format

The local extension file is encoded like the metadata; hierarchical structure is indicated using indentation.

>local:
>  name <element name>
>  sgml <tag name>
>  parent <element name>
>  child <element name>

Compound types:

> local = name + (sgml) + 1{parent}n + 1{child}n

Scalar types:

name     The name of the element as it will appear in the metadata. This must consist only of upper- and lower-case letters, numbers, and the underscore character.
sgml     The element tag to be used for output in SGML. If omitted, the full name is used.
parent   The name of the element under which this element may appear. These names may be standard elements or local extensions.
child    The name of an element which may appear under this element.

1995 initial release date; later revision is ongoing and continuous Complete As needed None FGDC Metadata Software none none Peter N. Schweitzer mailing address

Mail Stop 954 National Center U.S. Geological Survey 12201 Sunrise Valley Drive

Reston VA 20192 USA (703) 648-6533 (703) 648-6252 pschweitzer@usgs.gov https://geo-nsdi.er.usgs.gov/mp.gif Diagram showing the processing of a fairly complex metadata record from the editing process (on the left) through formal parsing, syntactical analysis, and generation of human- and computer-readable reports (on the right). Boxes indicate files; their labels show the format of those files. Arrows indicate the processing done by specific computer programs. GIF Spanish-language element names kindly provided by Dr. Ing. Carlos López of the Clearinghouse Nacional de Datos Geográficos, Uruguay <http://www.clearinghouse.gub.uy/> Indonesian-language element names kindly provided by the Indonesian National Coordination Agency for Surveys and Mapping BAKOSURTANAL French-language element names kindly provided by Environment Canada (John Cree) German-language element names kindly provided by Peter Korduan (University of Rostock) Portuguese-language element names kindly provided by Luis Cavalcanti (Bahiana Pesquisador em Informações Geográficas, IBGE- Coordenação de Geografia Brazil) The software has been developed using UNIX systems of various versions and suppliers. Versions for Microsoft Windows have been compiled using GNU gcc that comes with the MinGW tools. Federal Geographic Data Committee 1994 Content Standards for Digital Geospatial Metadata <https://geology.usgs.gov/tools/metadata/standard/metadata.html> Federal Geographic Data Committee 1998 Content Standard for Digital Geospatial Metadata <https://www.fgdc.gov/metadata/csdgm/> Source code and make description files are included. Where discrepancies exist between the syntactical and semantic descriptions of elements in the Content Standards for Digital Geospatial Metadata, the syntactical descriptions are regarded as authoritative. 
The compiler embodies the syntax of the 1998 version of the Content Standard for Digital Geospatial Metadata, known as FGDC-STD-001-1998 and also supports the following formal profiles of that standard: >Biological Data Profile FGDC-STD-001.1-1999 >Shoreline Data Profile FGDC-STD-001.2-2001 >Remote-sensing Extensions FGDC-STD-012-2002 In addition the following extension is included by default: >Extension_Information: > Extended_Element_Name: Metadata_Language > Short_Name: metalang > Parent: Metadata_Reference_Information Local extensions to the standard are permitted and a mechanism is provided that allows these extensions to be described to the compiler. Federal Geographic Data Committee 19940608 Content Standards for Digital Geospatial Metadata https://www.fgdc.gov/ paper and digital text 19940608 approval date CSDGM The element names, syntactical rules governing their use, and the domains and encoding of certain values are given in this source. Federal Geographic Data Committee 1998 Content Standard for Digital Geospatial Metadata https://www.fgdc.gov/metadata/csdgm/ paper and digital text 1998 approval date FGDC-STD-001-1998 A revision of the 1994 version of the metadata standard. Specifies modifications in the production rules and adds a few new elements. Biological Data Working Group, FGDC USGS Biological Resources Division 1999 Content Standard for Digital Geospatial Metadata--Biological Data Profile, FGDC-STD-001.1-1999 Washington, D.C. Federal Geographic Data Committee https://www.fgdc.gov/standards/documents/standards/biodata/biodatap.html paper and digital text 1999 approval date FGDC-STD-001.1-1999 A registered profile of the 1998 version of the metadata standard. Specifies modifications in the production rules and adds new elements to improve consistency of documentation of biological data. MP was used to process metadata for the opening of the USGS node of NSDI. CSDGM 19950123 Modified html.c to cope better with titles that span more than one line. 
The separate lines of the title are concatenated for both the document title and the top-level heading. 19950222 Modified html.c to always output two spaces after the </em> tag in a keyword. 19950223 Modified sgml.c to use eight-character tags that are different from the ASTM tags. The ASTM tags cannot be used in sgml because many are ten characters long, and SGML names are restricted to eight characters. Created a new module astm.c containing the ASTM tags, Z39.50 numbers, and the code to go between those and FGDC keywords. This module is not currently linked into the executable. 19950320 Modified astm.c to include tags for map projection names taken from Doug Nebert's DTD:
> ALBERSCEA
> AZIMUTHAL
> EQUIDISTC
> EQUIRECT
> GENERALVNP
> GNOMONIC
> LAMBERTAZ
> LAMBERTCC
> MERCATOR
> MILLER
> MODSALASKA
> OBMERCATOR
> ORTHO
> POLYCONIC
> PSTEREO
> ROBINSON
> SINUSOIDAL
> SPOBLMERC
> STEREO
> TM
> VANDERG
Excised sgml tags from sgml.c; transferred the eight-character tags to a new file called ps8.c. Choice of tags to use in generating SGML output is governed by a local variable called do_astm in sgml.c; currently this is always set to 1, causing the astm tags to be used. 19950403 Modified syntax.c to issue warning rather than error when a scalar value is missing, and to include text of unknown keywords in error messages. 19950425 Modified text.c to not output the error message about too much indentation. The code already functions correctly for files that are indented consistently within sections. 19950526 Modified text.c to output a warning if the indentation is ambiguous, as in
>A:
>    B:
>        C:
>  D:
Here the parser assumes that D is a member of A, not of B. But the user might have intended D to be a member of B. The parser cannot tell, so it issues a warning.
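One way to detect the ambiguous case is to compare the indentation of each new line with the indentation of the elements it closes. The following is a minimal sketch under stated assumptions, not the logic in text.c: the function name choose_parent and the array of open indentations are hypothetical, and the columns used in the example (A at 0, B at 4, C at 8, D at 2) are assumed for illustration.

```c
/*
 * Hypothetical sketch, not mp's actual code.  open_indent[0..depth-1] holds
 * the indentation of the currently open elements, outermost first.  Given
 * the indentation of a new line, return the index of the element taken as
 * its parent (-1 for none) and set *ambiguous when the line was outdented
 * to a column that matches none of the elements it closes.
 */
int choose_parent(const int *open_indent, int depth, int indent, int *ambiguous)
{
    int i = depth - 1;
    int last_closed = -1;   /* indentation of the last element closed */

    /* Close every open element indented at least as far as the new line. */
    while (i >= 0 && open_indent[i] >= indent) {
        last_closed = open_indent[i];
        i--;
    }

    /* Aligned with a closed element: a clear sibling.  Otherwise: a guess. */
    *ambiguous = (last_closed >= 0 && last_closed != indent);
    return i;
}
```

With A, B, and C open at columns 0, 4, and 8, a line at column 2 closes C and B, is assigned to A, and is flagged as ambiguous because it aligns with neither B nor C. A line at column 4 is also assigned to A but is not flagged, since it lines up with B.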
19950526 Modified mp.c to add functions for node handling:
> void deallocate_item (struct item *p);
> struct item *insert_item_after (struct item *r);
> struct item *insert_item_before (struct item *q);
> struct item *add_child (struct item *p);
and added comments explaining these functions. Added code to main in mp.c to insert a parent Metadata node if one is not already present. This causes the syntax checker to report missing major sections that are required. 19950531 Created module extend.c to handle lookup and translation of local element names (i.e. element names not part of the FGDC standard). Modified keyword.c to allow the functions key_of and text_of to return valid data when extensions are given as their arguments. Modified astm.c and ps8.c to report as the sgml tag of an extension the tag name returned from text_of_extension() in extend.c. This is not entirely satisfactory, because extensions might be structured like FGDC keywords, i.e. a long form for textual reports and a short (8- or 10-character) tag name for sgml. 19950610 Added functions in html.c to handle translation of reserved characters in html output. This converts <URL:theURL> in the input to <a href="theURL"><tt>&lt;URL:theURL&gt;</tt></a> in the output, thus activating URLs embedded in the metadata. It also converts other occurrences of < and > to &lt; and &gt; respectively. This is not necessarily desirable in all cases; there is an internal variable called do_text_translation that controls whether or not this gets done. If do_text_translation is set to 0, textual values are conveyed to the html file as they appear in the input file. If do_text_translation is nonzero, textual values are translated as described. 19950622 Added code to text.c, syntax.c, html.c, and sgml.c to handle blank lines. Modified item structure to include a prev pointer within compound types. 19950623 Modified html.c to correctly skip the Metadata tag at the top of the tree.
Added code to correctly handle ampersands and double-quotes. 19950626 Modified text.c to handle blank lines in a more logical and comprehensive fashion. Blank lines are now assigned indentation prior to the disruption of links that forms the overall parse tree. Indentation assigned to a blank line is the larger of the indentation of the previous non-blank element or the next non-blank element in the file. This ensures that blank lines will not have children (if they did, the children would not appear in the output) and only occur as members of lists. 19950626 Created config.c and config.h to handle configuration issues through a configuration file, containing key words in the same general form of the metadata. Modified mp.c, text.c, html.c, and sgml.c to consult the information contained in the config file. 19950627 Created local.c to replace extend.c for handling local extensions to the standard. This module provides a mechanism for user-specified element names, with corresponding SGML tags, to be handled properly by the parser and by the code generators. Syntax checking is relatively primitive, and is based on a parent list and child list associated with each local element. Essentially this means that local extensions are always optional and repeatable and their children, if any, are always optional and repeatable. 19950630 Modified sgml.c and html.c and config.c to allow users to specify a string that will be output wherever blank lines occur in the input file. The default for SGML is ""; the default for HTML is "<P>\n". This option is specified by putting the keyword blanks under the keyword sgml or html under output in the configuration file. 19950630 Modified ps8.c to reflect the following tag name changes:
> tempkeyt -> tempkt
> accscons -> accconst
> accsinst -> accinstr
> columns -> colcount
> vertcnt -> vrtcount
Rebuilt makedtd and rebuilt ps8.dtd. 19950711 Modified config.c to allow the keyword skip_extensions under output:sgml.
Modified sgml.c to look for this keyword in the configuration info. If present, then elements that are not part of the 19940608 CSDGM will not be included in the sgml output. 19950714 Modified sgml.c to make 8-character tags the default, as specified in the GEO attribute set for Z39.50. 19950926 Modified mp.c to remove the generic tree-handling routines to tree.c, which was added to the Makefiles. 19951101 Modified syntax.c to permit more than one Entity_and_attribute_Overview in an Overview_Description. Thanks to Chuck Stein for pointing out this bug. 19960201 Modified config.c to correctly recognize the component "top_level" of "text"; this controls whether the top-level Metadata element is preserved on output or omitted. 19960215 Modified html.c and config.c to allow new syntax for specifying the prefix and suffix tags of element names and element values in html. The new syntax is >output > html > element > name > prefix <html code> > suffix <html code> > value > prefix <html code> > suffix <html code> This allows specific html code to precede and follow both the element name and the element value. Created a new file called deluxe.cfg in doc that shows how this new syntax can be used to link every element name back to the correct section of my hypertext rendition of the standard. 19960328 Modified astm.c to remove the information about z39.50 numbers. Added the module z3950.c to carry this information appropriately, and modified local.c to allow users to include in the description of extensions a characteristic z3950 whose value is the numeric tag assigned to the element. 19960405 Modified html.c to fix bug in which default prefix for element names was taken to be "prefix", and default suffix "suffix". Thanks to Hugh Phillips for pointing this out. 19960521 When a line in a text value was indented relative to those preceding it, the line was correctly recognized by the parser as being a part of the textual value. 
But when such an indented line immediately followed a blank line, the indented line was considered a sibling of the blank and a child of the preceding text line. This caused the line to be omitted from the output. Not good. The fix is to modify check_unknown in syntax.c so that this case is recognized and the topology is rearranged to fit the situation. I'm not sure whether this is the right place to fix this problem. I think we could also fix it in the parser, by assigning the indent value of lines that follow blanks differently if the preceding nonblank line has key Wunknown. This would require more sophisticated look-back at that point in the parser, however, and I don't know whether it would correctly handle more complicated situations. 19960623 The code described in the previous process step has been excised. Instead, the function equalize_indented_scalars was added to try to fix up textual values that have indentation. The basic problem is that you don't want blank lines to be parents of text values, nor for that matter do you want text values to be parents of text values. This code is not comprehensive, and probably needs more work. 19960629 Modified main() in xtme.c, mp.c, and cns.c to read more than one local extensions file. This should enable people to choose more carefully which extensions will apply to a given input file. 19960705 Modified equalize_indented_scalars() in text.c to avoid writing into p->next->prev when p->next is NULL. Modified sgml.c to avoid warning on some systems casting char * to unsigned char *. 19960719 Modified all source files containing #ifdef's so that these preprocessor directives occur only at the beginnings of lines, to work with non-standard compilers such as the one supplied with OSF/1. The ANSI standard came out in 1987. Haven't these vendors *read* that document by now? 19960806 Modified html.c to remove the default formatting of elements. These should be controlled by a configuration file. 
Also modified this file so that it correctly recognizes when no value is given as the default prefix or suffix of element names. 19960806 Modified config.c to recognize and html.c to use the configuration element output:html:element:value:obeylines which, when present, causes html.c to append a line break <br> to each line of the value of the specified element. 19960806 Modified text.c to output {single scalar followed by blank} the same way as {single scalar}; immediately following the element name. Modified syntax.c to strip enveloping quotes from scalars that have restricted domains. 19960809 Modified syntax.c to limit the number of Enumerated_Domain to one per Attribute_Domain_Values. This bug spotted by Gerry Daumiller. 19960826 Modified syntax.c to not test blank lines in scalar values. 19960906 Modified sgml.c to include a rudimentary SGML parser. It has essentially no flexibility and no error recovery. Modified mp.c to use this SGML parser if the input file's name ends with .sgml or .sgm (case not sensitive). 19960907 Modified sgml.c to swat bug. If a configuration file was used and the output:sgml:blanks was also used, a seg fault could occur. 19960918 Modified config.c to recognize options output:html:header and footer. Modified html.c to output text of header before the title in the body of the html, and to output text of footer after the "generated by mp" line at the end of the html. 19960926 Modified dif.c to output a newline where Wblank appears in the parse tree rather than "(blank):" 19960927 Modified syntax.c to classify errors and maintain counts of six different kinds of errors: unrecognized elements, misplaced elements, missing elements, superfluous elements, empty elements, and elements with the wrong sort of values. Modified mp.c to write a one-line report to stderr showing the number of each type of error. Fixed a bug in syntax.c that caused the less-than-intuitive error message "(unknown) is not permitted in <element>". 
The new text is more informative. 19960930 Modified config.c to fix bug in unify_strings() where allocated block dst was not initialized before a call to strcat (crashed on Linux). Modified sgml.c to output Eric's suggested public identifier for DTD 1.0. 19961017 Modified config.c to understand element 'body' in output:html; the text given for 'body' will be appended to the <body> tag in the html output, allowing the user to specify background color for the html. 19961028 Modified keyword.h to define the values of enum fgdc_keyword, so that binary files will have more of a chance at portability. Modified config.c to recognize the element "binary" in the config file. Modified mp.c to read and write binary files on request. Added module binary.c to carry out encoding and decoding of binary files. 19970108 Modified decode_tree() in text.c to allow scalar elements to begin on the line that contains the element name. 19970130 Modified write_contact_info() in dif.c to use more stack space for lname, mname, and fname (was 32 bytes in each case). 19970317 Modified write_html() in html.c to allocate more scratch space for the title. This caused rare crashes that could not be easily anticipated or duplicated because they depended on the granularity of heap space, on the number of lines in the title, and on the lengths of the title lines. 19970401 Modified parse_sgml() in sgml.c to not aggregate lines in the input. Modified write_html_item() in html.c to use <dd> for single-item data values of length 64 or greater. Previously these would be put into the <dd> tag. Modified translate() in html.c to use a managed dynamic buffer rather than a static array. 19970507 Modified parse_sgml() in sgml.c to fix bug in which unrecognized SGML tags were not processed properly. Changed parse_sgml() to permit execution to proceed when unrecognized tags occur. Added code to decode ISO 8859-1 entities in incoming SGML text. 
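The entity decoding mentioned in the 19970507 step can be pictured with a small table-driven routine. This is a sketch under stated assumptions rather than the code in sgml.c: the function name decode_entities is hypothetical, the table covers only a handful of the ISO 8859-1 entities, and unrecognized references are assumed to pass through unchanged.

```c
#include <stddef.h>
#include <string.h>

struct entity {
    const char *name;       /* entity name without '&' and ';' */
    unsigned char code;     /* corresponding ISO 8859-1 byte */
};

/* A small, deliberately incomplete subset of the ISO 8859-1 entity set. */
static const struct entity entities[] = {
    { "amp", 0x26 }, { "lt", 0x3C }, { "gt", 0x3E }, { "quot", 0x22 },
    { "eacute", 0xE9 }, { "ntilde", 0xF1 }, { "uuml", 0xFC }
};

/*
 * Hypothetical helper, not taken from sgml.c: copy src to dst, replacing
 * each recognized "&name;" with its single ISO 8859-1 byte.  Unrecognized
 * references are copied unchanged.  dst must hold at least strlen(src) + 1
 * bytes, since decoding never lengthens the text.
 */
void decode_entities(const char *src, char *dst)
{
    size_t n = sizeof entities / sizeof entities[0];

    while (*src) {
        size_t i, len = 0;

        if (*src == '&') {
            for (i = 0; i < n; i++) {
                len = strlen(entities[i].name);
                if (strncmp(src + 1, entities[i].name, len) == 0 &&
                    src[1 + len] == ';')
                    break;
            }
            if (i < n) {
                *dst++ = (char) entities[i].code;
                src += len + 2;     /* skip '&', the name, and ';' */
                continue;
            }
        }
        *dst++ = *src++;
    }
    *dst = '\0';
}
```

Leaving unknown references untouched loses nothing from the input, which fits the same step's decision to let execution proceed when unrecognized tags occur.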
19970519 Modified print_item() in tree.c to fix bug--was missing the variable to be printed in one of the printf calls. 19970610 Modified parse_sgml() in sgml.c to not output the warning about extraneous text if the extra text is entirely composed of whitespace. 19970708 If you specified the html prefix or suffix for an element value but not for the element name, it didn't use the default element name prefix and suffix, it used nothing. In this change html.c was modified to use the default name prefix or suffix if the output:html:element:name: prefix or :suffix are not specified. If ...:name:prefix is specified but is blank, then the default name prefix is not used. This allows you to specify that no prefix be used even if there is a default prefix (same for suffix). 19970821 Modified keyword.c, config.c, and local.c to avoid overflowing the local variable string when trying to recognize an element name. 19970912 Modified config.c to recognize preformat and meta elements under output:html. preformat causes <pre> tags to enclose sections of textual values whose lines all begin with >. To use a different character, give it as the value of preformat. It is on by default, so specify "preformat off" to disable this feature. meta enables the generation of dublin-core meta elements in html output. It is on by default. To disable dublin-core metadata, use "meta off". 19971031 Modified write_html_item() in html.c to simplify code and properly preformat all sections preceded by >. 19971106 Modified write_html_item() in html.c to not display the preformat indicator character in preformatted sections. 19971112 Modified key_of_ps8_tag() in ps8.c and key_of_astm_tag() in astm.c to call a new function extension_of_sgml() in local.c that returns the key of a local extension. This causes mp and stomp to recognize extensions in sgml input as well as in text input, using the same mechanism. Thanks to Lisa Peoples for helping to find this bug. 
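The preformat option described in the 19970912 step depends on recognizing a section of text whose lines all begin with the indicator character. A minimal sketch of that test follows. It is not taken from html.c; the function name all_lines_marked is hypothetical, and the section is assumed to arrive as a single string with newline-separated lines.

```c
#include <string.h>

/*
 * Hypothetical check, not mp's code: return 1 if every line of text begins
 * with marker, 0 otherwise.  Empty text returns 0.
 */
int all_lines_marked(const char *text, char marker)
{
    const char *p = text;

    if (*p == '\0')
        return 0;
    while (*p) {
        const char *eol;

        if (*p != marker)
            return 0;           /* this line lacks the indicator */
        eol = strchr(p, '\n');
        if (eol == NULL)
            break;              /* last line, no trailing newline */
        p = eol + 1;
    }
    return 1;
}
```

Passing the indicator as a parameter mirrors the option to give a different character as the value of preformat.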
19980114 Modified html.c to translate bare URLs into live links. This recognizes ftp://url and http://url as forms of urls as well as the older format <URL:theURL>. Note that <http://theURL> will now become live as well. 19980217 Fixed minor bug in text.c. If a textual value began on the same line as the element name but that line was followed by a blank, it always discarded the text following the element name. Workaround was to close up the blank line. It now looks forward in the list of elements to the next non-blank element; if that is Wunknown or EOF, the text following the element name is retained as part of the value of the element. If the next non-blank line is a recognized element, the text following the original element name is discarded with a warning. 19980220 Reordered the Process_Step elements in this document to be monotonic with time. 19980225 Modified html.c to fix bug (found by Susan Stitt) in which the order of options under output:html was significant (the first of {preformat, meta, translate} was interpreted properly, the others ignored). 19980311 Modified dif.c to conform to version 6 (19980202) of the Directory Interchange Format. Thanks to Lynn Halpern (STX) for prompting and assisting in this upgrade. 19980316 Modified write_html() in html.c to place the name tag around only the <hr> at the top of each major section. The concern was that poor parsers of HTML might not properly match the </A> with the most recent <A ...> tag, which would result in an error where links appear within major sections. Thanks to Curtis Price of USGS WRD for pointing this out. 19980501 Modified several points within dif.c to fix bugs that caused it to crash when generating dif records for the examples. 19980624 Modified syntax.c to use CSDGM version 2. Added upgrade() in upgrade.c to automatically restructure those portions of the metadata that need to change to fit version 2. Added version strings in revision.c, declared in revision.h.
Set version number to 2.0 for mp, cns, xtme, tkme, and stomp. FGDC-STD-001-1998 19980826 Modified text.c to test whether top_level has an argument rather than simply to use the value in a stricmp, which caused core dump when the value was not present. Thanks to John Heuer for pointing this out. 19980909 Renamed metadata.dtd as csdgm1.dtd, copied into file csdgm2.dtd. Modified csdgm2.dtd to reflect syntactical structure of version 2 of the CSDGM. Added a file "catalog" to relate public identifiers to the SGML files included with the software, tested with nsgmls on one metadata record. Updated sgml.c to produce a DOCTYPE that refers to the version 2 DTD. 19980917 Modified upgrade.c to print out slightly more verbose informational messages when adding elements to upgrade the metadata to CSDGM v2. 19980928 Created xml.c from sgml.c; modified mp.c to read and write XML using code in xml.c. 19980928 Modified sgml.c and xml.c to print newlines for blanks within text values by default, but only when the blank lines occur within the body of the text value, not when they occur at the end of the value, and not when a blank line occurs between elements. 19980929 Modified syntax.c to correctly flag errors in Multiple Dates/Times (was looking at number of Calendar_Date within this element rather than number of Single_Date/Time within this element). Thanks to Kerie Hitt for pointing this out. 19980930 Modified mp.c and cns.c to not assign variable "out" to stderr until runtime; in cygwin32 stderr is not a constant. 19981019 Modified syntax.c to fix bug in code that reports errors in composition of Digital_Transfer_Option. Thanks to Kerie Hitt and Curtis Price for finding this bug. 19981020 Modified find_key() in tree.c to search only the given node and its children, not its siblings. 19981027 Created a new module fixdoc.c designed specifically to fix some specific problems created by DOCUMENT. 
Used in conjunction with a set of local extensions, assuming also that cns has been run on the output of DOCUMENT FILE using the same local extensions (and some aliases). This is a fairly complicated procedure but one that I hope will save people from a lot of aggravation as they try to recover the information they entered using DOCUMENT. 19981027 Modified local.c to properly handle cases where more than 64 extensions are added (bug fix--thanks to Matthew Skala for finding this). This bug affected mp, cns, xtme, and tkme, so I have incremented the versions of all of these to 2.3. 19981119 Modified syntax.c to allow free text in Address_Type as per CSDGM v2. Thanks to Matthew Skala for pointing this problem out. 19981121 Modified upgrade() in upgrade.c to avoid following bad pointers when decoding G-Ring polygon data. This bug arose when users ran mp on the template, which has no data. 19981124 Modified config.c to include option "prune" within "input". This causes mp to prune the whole tree after fixdoc and upgrade. Not that anyone should do prune in combination with fixdoc, but that's where it is in the code. Default is not to prune. 19981130 Modified mp.c to strip common extensions off the input file name when composing output file names from templates specified in the config file. This means that if you specified in your config file: >output > errors %s.err > html > file %s.html and process the input file "stuff.txt", the errors will be put into a file called "stuff.err" and the html output will be put into a file called "stuff.html". The file option can be specified for html, sgml, xml, text, and dif output, and has the same effect in each case. The extensions that will be stripped from the input file name are currently only the following, upper or lower case: >.txt >.sgml >.sgm >.xml >.text >.met >.bin 19981207 Modified config.c to recognize the keyword "ext" under "input". 
That element, which may be repeated, is used to indicate the file extension used for the input metadata file. This allows the user to expand the set of file extensions beyond the default set mentioned in the previous process step. 19981208 Modified translated() in html.c to recognize a right parenthesis as the end of a URL. 19981210 Modified xml.c to output the entire XML declaration in lower case. Thanks to Joel Register for suggesting this. 19981214 Modified html.c to add a new function write_html_faq(), which like the function write_html(), writes an HTML output file. The "faq" version casts the metadata in the form of a FAQ list. FAQ here stands for Frequently Anticipated Questions rather than Frequently Asked Questions, since we have no way to know whether anybody will actually ask these questions, but at least we think we can answer them! ;-) At this writing (19990217) it works on all the examples but doesn't handle more than a few of the questions, and so leaves out quite a bit of the metadata. Expect changes soon. 19990217 Modified sgml.c to recognize the entity "&break;" while parsing sgml. When it finds this entity, it will add a Wblank element after the text in which the entity was found is converted into a Wunknown. So if the line contains nothing beyond the "&break;", the Wblank will be inserted where the "&break;" is. If any text follows the "&break;" on the same line, the Wblank will be inserted after that text. This allows people who are using sgml as input to specify where blank lines should occur within text values, since in sgml blank lines are normally ignored. 19990225 Completed function write_html_faq() in html.c to generate plain-language output in HTML format. This output format will likely be refined in the near future, but the basic idea is to write the metadata into an HTML file according to a series of plain-language questions. 19990303 Modified html.c to correct a brace nesting problem near line 1879. 
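The output file naming described in the 19981130 and 19981207 steps can be sketched as follows. This is an illustration only: compose_output_name and same_nocase are hypothetical names, not functions from mp.c, and the sketch handles only the default extension list (it does not read additional "ext" values from a config file).

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Default extensions listed in the 19981130 process step. */
static const char *default_ext[] = {
    ".txt", ".sgml", ".sgm", ".xml", ".text", ".met", ".bin"
};

/* Return 1 if a and b are equal, ignoring upper or lower case. */
static int same_nocase(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Hypothetical function: strip a recognized extension from the input
   file name and substitute the result for the first %s in the template.
   Returns 0 on success, -1 if a buffer would be too small. */
int compose_output_name(const char *input, const char *tmpl,
                        char *out, size_t size)
{
    char base[256];
    const char *slot;
    char *dot;
    size_t i;
    int written;

    if (strlen(input) >= sizeof base)
        return -1;
    strcpy(base, input);

    /* Remove the last extension only if it is in the recognized set. */
    dot = strrchr(base, '.');
    if (dot != NULL) {
        for (i = 0; i < sizeof default_ext / sizeof default_ext[0]; i++) {
            if (same_nocase(dot, default_ext[i])) {
                *dot = '\0';
                break;
            }
        }
    }

    /* Substitute by hand rather than using the template as a format. */
    slot = strstr(tmpl, "%s");
    if (slot == NULL)
        written = snprintf(out, size, "%s", tmpl);
    else
        written = snprintf(out, size, "%.*s%s%s",
                           (int)(slot - tmpl), tmpl, base, slot + 2);
    return (written < 0 || (size_t)written >= size) ? -1 : 0;
}
```

With the input file "stuff.txt" and the template "%s.err" this yields "stuff.err", matching the example in the 19981130 step. The template is searched for %s explicitly so that a malformed template from a config file is never handed to a printf-style function as its format string.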
19990308 Fixed minor bug in write_html_faq() in which the internal links to labels how.1 and how.2 were not coded with the #, making them external instead. 19990316 Modified write_html_faq() to output the Format_Information_Content using write_html_value() rather than simply munge(). Modified write_citation() to include the Online_Linkages, if present, in an unordered list. 19990316 Modified many of the C source files so that they do not explicitly declare the functions stricmp and memicmp, but instead include a new header file stricmp.h that declares these functions. The header file stricmp.h wraps the declarations in conditional code. If the system on which you want to compile already has the function stricmp, for example, then define the symbol HAVE_STRICMP in the CFLAGS of the makefile. Likewise if your system already has memicmp, define HAVE_MEMICMP. Also I modified the declaration of memicmp to match that given in the mingw32 headers; specifically I made its pointer arguments to be of type const void * and its integer argument to be size_t instead. This was all motivated by my discovery that I could compile using the cygwin tools on Windows NT but specify -mno_cygwin and thereby not need to pack the cygwin1.dll into the distribution. 19990329 Modified upgrade.c so that when it changes the value of Metadata_Standard_Version to FGDC-STD-001-1998, it also deletes any additional children of Metadata_Standard_Version. This occurred when people had multiline responses for Metadata_Standard_Version, which is caused by wrapping the element's text to the next line. Thanks to Susan Stitt for pointing this out. 19990401 Modified translate() in html.c to encode high-bit characters using the long names (for example &eacute; rather than the character 0xE9). Added a META tag at the beginning of the HEAD element to specify encoding using ISO-8859-1.
I know these two actions are probably contradictory, but maybe it won't hurt, and anyway we can always undo the first change and stick the character codes in as is. 19990401 Modified parse_name in html.c to cancel the parse if the first name is the string "U.S." Modified some of the questions output in FAQ-style HTML for clarity. 19990402 Modified write_faq_html() in html.c to handle the various ways in which an attribute's Enumerated_Domain elements may be nested within its Attribute_Domain_Values elements. The new method puts them all into a single table (one per attribute). Each Range_Domain, Codeset_Domain, or Unrepresentable_Domain gets its own table. 19990402 Modified write_html_faq() in html.c to print Non-digital_Form info when that is part of the Standard_Order_Process. Modified the question for which Logical_Consistency_Report is the answer to make it more compatible with topological information. Modified main() in mp.c to remember the names of the output files. Each output file is stored along with its type in a private structure. Modified html.c to output below the data set title a line indicating the other formats in which the metadata are available. Modified write_html_faq() in html.c to output dates in a friendlier format if it is possible to do so. 19990405 Modified config.c to accept keyword dif under output. This was a long standing oversight. Modified config.c to accept keyword base under output:html. The value given to this is a URL that will be put into a <base href="url"> tag near the top of the head element in html output. This causes relative links to work when the html document is retrieved through the clearinghouse (otherwise those links are relative to the clearinghouse gateway, usually not what you want). Modified the links to alternative formats given in the html files; these should now work as intended. 19990406 Modified html.c to include the output file name in the BASE HREF. 
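The encoding of high-bit characters by long name, introduced in the 19990401 step, might look roughly like the sketch below. The table holds only four ISO-8859-1 code points for illustration; the table actually used by mp is assumed to be far more complete. The name encode_latin1 and the numeric fallback for characters missing from the table are assumptions of this sketch, not a description of translate() in html.c.

```c
#include <stdio.h>
#include <string.h>

/* A few ISO-8859-1 code points and their HTML entity names. */
struct entity {
    unsigned char code;
    const char *name;
};

static const struct entity table[] = {
    { 0xA0, "nbsp" }, { 0xE9, "eacute" }, { 0xF1, "ntilde" }, { 0xFC, "uuml" }
};

/* Copy src (ISO-8859-1 bytes) to dst, replacing each byte above 0x7F
   with a named entity when the table has one, otherwise with a numeric
   character reference. The characters <, > and & are not escaped here.
   Returns 0 on success, -1 if dst is too small. */
int encode_latin1(const char *src, char *dst, size_t size)
{
    const unsigned char *p;
    size_t used = 0;

    if (size == 0)
        return -1;
    for (p = (const unsigned char *)src; *p != '\0'; p++) {
        char piece[16];
        const char *name = NULL;
        size_t len, i;

        if (*p < 0x80) {
            piece[0] = (char)*p;
            piece[1] = '\0';
        } else {
            for (i = 0; i < sizeof table / sizeof table[0]; i++)
                if (table[i].code == *p)
                    name = table[i].name;
            if (name != NULL)
                sprintf(piece, "&%s;", name);
            else
                sprintf(piece, "&#%u;", (unsigned)*p);
        }
        len = strlen(piece);
        if (used + len + 1 > size)   /* keep room for the terminator */
            return -1;
        memcpy(dst + used, piece, len);
        used += len;
    }
    dst[used] = '\0';
    return 0;
}
```

Because the output then contains only 7-bit characters, it reads the same whether or not a browser honors the ISO-8859-1 META tag.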
19990407 Fixed several missing-brace errors in write_html_faq() and write_dif(). These caused seg faults when running the template, because the template has no actual data. 19990414 Modified upgrade.c to account for the possibility that someone might have more than one SDTS_Terms_Description, and within each, more than one SDTS_Point_and_Vector_Information. 19990420 Modified text.c to make some static procedures public. This enables Doug Bakewell's MetaMerge program to be more easily compiled with the distributed source code of mp. 19990427 Modified write_date() in html.c to output date as is whenever the first character of the value is not a digit. Modified parse_name() in html.c to output the name as is when the first character is '<'. These changes made it possible to run a modified template through mp to show how the elements are used to compose the FAQ-style output. 19990427 Modified write_date() in html.c to correctly output dates where only the year is specified. 19990430 Modified a variety of routines to remove unused variables and avoid potential use of uninitialized variables pointed out by gcc -Wall. 19990430 Modified xml.c to include an EncodingDecl in the XML declaration. Modified entities.c to supply a numeric version of ISO-8859-1 character encodings, so that it will be possible to automatically output &#160; instead of &nbsp; in xml output. 19990506 Modified translate() in html.c to recognize "mailto:address" and process it the same way as it does "http://" and "ftp://", making the link a live hypertext link. (2.4.11) 19990524 Modified write_text() and write_text_item() in text.c to output an element prefix if one is specified in the config file. So if the config file contains output:text:prefix, then its contents will be output immediately preceding every element name in the text output. This would be helpful if a user wanted to allow an unsophisticated person to edit the text file before re-processing with mp.
The trick is that if you put a strange prefix like @@ before each element name, then you can run cns on the edited file, and this makes it easier for cns to distinguish an element name from a text value that starts with one of the element names. (2.4.12) 19990527 Modified xml.c to output the elements in the order given in actions.c. This is necessary because although mp does not require elements to appear in any order, the XML DTD does. Of course it does this only because XML makes it needlessly difficult to specify a content model in which the order of elements is not significant but the elements themselves are. (2.4.13) 19990528 Modified config.c so that find_option() looks only at and below the node given as the first argument. This also required adding an overarching root node to the config tree so that find_option(NULL, key) would find things in the output subtree if the input subtree were present. (2.4.13) 19990528 Modified write_html_faq() in html.c to distinguish non-digital form a little more clearly. Removed the <hr> between Standard_Order_Processes. Fixed bug in output of Range_of_Dates/Times within Time_Period_of_Content in which write_html_item(q) was being called for each child of q, where it should have been called as write_html_item(r) where r loops through the children of q. Fixed a nearby bug where Ending_Date was being written with the prefix "Beginning_Date". (2.4.14) 19990618 Modified parse_sgml() in sgml.c so that if a blank line is encountered in the middle of a text value, a blank element is added to the parse tree at that point. (2.4.15) 19990621 Modified parse_xml() in xml.c to do the same thing as the previous change did to parse_sgml(); when a blank line is encountered in the middle of a text value, insert a Wblank into the parse tree. (2.4.16) 19990622 Modified xml.c to read and store attributes of XML elements read as XML. The primitive XML parser here still doesn't allow an element, including its attributes, to span multiple lines.
This means that an element's start tag cannot begin on one line and end on another. This is not a severe limitation if attributes are not used. However, if attributes are allowed, this could easily become intolerable. (2.4.17) 19990623 Modified write_xml_item() in xml.c to fix bug in element-ordering code where one of the grandchildren of an element could be output before its children. There may have been only one place this could occur, in Time_Period_Information where a Multiple_Dates/Times was used. (2.4.18) 19990707 Modified write_citation() in html.c to output a colon following the title only if there is either a Series_Information or a Publication_Information. 19990708 Modified upgrade() in upgrade.c to fix bug in which the first Distributor was identified as being the first child of Distribution_Information. That isn't true if there's a blank line between them. So mp was creating an additional Distribution_Information for the first Distributor. Thanks to Scott Barnwell for pointing this out. (2.4.19) 19990709 Modified parse_name() and write_date() in html.c to better handle cases in which (a) an ancestral suffix follows the name and (b) the year is followed by non-numeric characters. (2.4.20) 19990712 Modified html.c to encapsulate the code that creates links to related files in a separate static procedure write_links(). This procedure looks at the configuration for a section "output:html:link". Within that section, if there are any of the elements link_faq, link_html, link_text, link_sgml, link_xml, or link_dif, then a link line is written to the output HTML file. For each link_type element, if a value is given it is taken as the URL for the link to the related file, with %s in the value being substituted for the name of the input file, not including its extension or its path. Modified check_extension() in syntax.c to provide the same slightly more informative error messages as allow() does when an element is unrecognized. 
(2.4.21) 19990714 Modified write_links() in html.c to output the link even if the related file isn't generated when a value is specified for the elements link_faq, link_html, etc. Modified upgrade() in upgrade.c to fix a bug introduced recently when I changed it to properly handle multiple Distributors within a Distribution_Information. (2.4.22) 19990715 Modified xml.c to parse the whole XML input file at once rather than one line at a time. This permits start-tags that have attributes to span multiple lines in the input file. Note that the parser code still assumes that the input data are ISO-8859-1; this is not good because XML uses UTF-8 by default. Modified config.c to parse the config file using copies of the functions that are used in text.c, illustrating the potential benefits of C++. (2.4.23) 19990723 Modified decode_xml() in xml.c to free space allocated for the text; that was temporarily used by the parser but wasn't returned to the system. 19990727 Modified write_links() in html.c to better handle the case in which the input metadata are in directories other than the one from which mp is run. It was implicitly duplicating part of the file path, because the HREF given in the BASE tag contained some of the path elements also present in the value returned from related_file(). The problem shows up only if you specify a BASE tag in output:html and you don't specify a value for link_faq, link_html, link_xml, etc. The current code still won't work well if you specify a BASE tag but you try to put the various output files in directories other than the one that contains the input file. If you want to put alternative output formats in different directories, you must use the config options link_faq, link_html, link_xml, etc. (2.4.24) 19990802 Modified html.c to write the lead text in the links line as "Metadata also available as" rather than "Available as" and to not output a link to the current file. 
(2.4.25) 19990819 Modified write_date() in html.c to correctly handle cases where a month is given but not a day in an ISO 8601 date. (2.4.26) 19990823 Modified write_links() in html.c to look for an element called "text" within output:html:link in the configuration. If there is one, the value given to text, if present, becomes the lead text in the line containing links to the other formats. The default value of this text is "Metadata also available as". (2.4.27) 19990826 Modified local.c to add a new function extension_is_compound() that returns 1 if the integer given as the argument is the key value of a compound element, 0 otherwise. The simplest way to do this was to add a member to the internal extension_list structure. That member is an integer 0 or 1 that gets set to 0 by default and then gets set to 1 when a child element is recognized. Modified check_extension() in syntax.c to properly handle element names at the beginning of a line of text in the text of an extension. Previously it marked these as errors and threw away the text following the element name. (2.4.28) 19990827 Modified parts.ext and parts2.ext so that a Data_Set_Part can have other Data_Set_Part elements within it; this allows data set structure to be described hierarchically. Modified write_html_faq() in html.c to handle this hierarchical structure. (2.4.28) 19990827 Modified check_scalar_children() of syntax.c to clarify (?) the famous "reclassified as text" message. (2.4.29) 19990830 Modified check_extension() in syntax.c to improve the error message generated when an element name is encountered at the beginning of a line of text in another element's value. (2.4.30) 19990909 Modified write_html_faq() in html.c to output additional information if some of the biological profile elements are present. (2.4.31) 19990913 Modified decode_tree() in text.c to generate an additional hint to check the indentation when extraneous text is discarded. 
(2.4.32) 19990915 Modified mp.c to remove the code that replaces whitespace in output file names with underscores when running under MS Windows (as determined by defined symbol _WIN32). Created Makefile.vc for compiling mp and cns with Visual C++. (2.4.33) 19991007 Modified html.c to insert </dt> and </dd> tags. This may help in the rare circumstance that someone wants to edit the HTML output using an editor that understands HTML tags. (2.4.34) 19991104 Implemented word-wrap in mp's text output. Added functions wrap_text() and wrap_subtree() to tree.c. Modified config.c to recognize config word "wrap". Modified write_text() in text.c to look for "wrap" within output:text, and if the value given is greater than zero, to wrap all text values to that width. For example, > output > text > wrap 72 in the config file causes all text in the text output file to be wrapped in paragraphs so that it fits in the first 72 characters of the page. This all assumes 2-space indentation in the text output, which is the default. I also had to modify mp.c to call write_text() last among the output formats so as not to muck up any of the other formats. (2.4.34) 19991104 Modified write_xml() in xml.c to not include a DOCTYPE declaration by default. Modified config.c to recognize the element "doctype". If specified under output:xml, the value associated with "doctype" will be output after the XML declaration. Typically that will be a DOCTYPE declaration, but you could really screw things up and use something else instead. If the config file has an empty element output:xml:doctype, then the default DOCTYPE is output. Modified xml.c to not encode characters on output. When XML is input, the default encoding is assumed to be UTF-8. That can be overridden on input by specifying the encoding in the XML declaration. When plain text or SGML is input, the default encoding is ISO-8859-1. 
Either way, characters above &127; are output AS IS in XML, and the only characters that are encoded as entities are <, >, and &. (2.4.35) 19991116 Modified config.c to recognize element "info". Within output:info, a file element is used to specify the name of the info file that cns generates. (cns 2.3.2) 19991118 Modified html.c to fix bug where, if map projection parameters were specified but no Map_Projection_Name was present, a null pointer was dereferenced. (2.4.36) 19991217 Fixed bug in xml.c in which UTF-8 characters were being decoded on input. I want to pass these through unmolested, keeping track of the input encoding, and output them with the same encoding. (2.4.37) 20000105 Modified config.c to recognize "stylesheet", "type", and "href". Modified xml.c to consult config for "output:xml:stylesheet"; if present, looks for "output:xml:stylesheet:type" and "output:xml:stylesheet:href". Puts the values of those strings into an xml-stylesheet processing instruction. (2.4.37) 20000105 Modified xml.c to fix bugs in parser's handling of comments and unrecognized elements. One side effect is that now both XML and SGML tags will be case sensitive. XML is case sensitive, but SGML is not; this may cause trouble if there are any applications out there that input metadata to mp with uppercase tag names. (2.4.38) 20000106 Modified decode_xml() in xml.c to increment line_number at the right time, and not at the wrong time. Error messages parsing XML should now be more logical. Thanks to Frank Roberts for pointing this out. (2.4.39) 20000121 Modified decode_text() in config.c and in local.c to explicitly look for and then disregard comments. Here a comment is any line whose first non-blank character is the pound sign (#). 
This change keeps comments from disrupting mp's understanding of the indentation of the config file and the extensions files, so that, for example, if you have >output > html ># this is a comment; note that it is indented less than the ># element immediately above it. > file %s.html > faq %.faq.html Then the "file" and "faq" elements will now be properly recognized as children of the "html" element. I think it used to consider them children of the comment, and thus were silently ignored. I bumped all minor version numbers by one for this fix. (2.4.40) 20000124 Significant modification of memory management in all programs. In tree.c, the allocate_item() and deallocate_item() routines have been replaced with routines that allocate items in large chunks and manage the chunks themselves. This produces fewer calls to malloc() and free(), which are costly on some systems. Also in the process I think I found and fixed a number of less obvious bugs in the code, particularly in text.c. (2.5) 20000128 Introduced a new program "mq", built from guts of mp and tkme. This program reads a Tcl script provided by the user and executes it. In addition to the standard Tcl language, the following procedures may be called: >read_config <config_file> >parse_text <input_file> >find_first <element_name> >find_in <address> <element_name> >find_next <address> <element_name> >value_of <address> >forget find_first returns the address of an element in hex. This should not be modified, but should only be used as input to find_in, find_next, and value_of. forget causes the entire metadata record to be removed from memory, so you can read another record. (mq 1.0.0) 200001 Major change to mq. The syntax is now more like that of Tk widgets: >read_config config_file This command reads a standard config file, which will apply to all metadata read in this Tcl session. You can read only one config file, and you have to do it before you read any metadata. 
>metadata m -parse input_file Here m is a Tcl variable name. On output it is given a unique value that allows mq to keep track of it. This command returns 1 if the parsing was successful, 0 otherwise. >$m find_first element_name Here m is a Tcl variable previously passed to the metadata command above, and element_name is a standard or extended element name. This command returns the address of a matching element in hex, or zero. Use the address in subsequent commands. >$m find_in address element_name Here address is the value returned from the find_first subcommand above or a similar subcommand below. If the given address is an element whose name matches the target element_name, the same address is returned. Otherwise it returns the address of the first child, grandchild, or more distant descendant node whose name matches the target element_name. >$m value_of ?-nonewline? address If the address given matches a data element, this returns its value as a string. If you specify -nonewline, then it comes as one line, with each line separated by a single blank space. Otherwise each line is separated by the newline character (ASCII 10). >$m contains address This returns 1 if the address corresponds to one of the elements in the tree, 0 otherwise. >$m name address This returns the name of the element at the given address. >$m next address >$m prev address >$m parent address >$m child address These subcommands return the address of the next, previous, parent, or child node in the tree, relative to the given address. Zero is returned if there is no corresponding element. These provide a way to walk through the tree manually. >$m forget This frees all of the memory used for a metadata record. I cannot recommend using the Tcl command unset; it seems to take a long time to complete. Note that this interface allows you to read and manage more than one metadata record at a time. 
If you're going to read a lot of records, you will probably want to use the unset command when you're done with each one. (mq 2.1.0) 20000202 Modified mq.c to express the config-file handling in a different and more useful way. Now we have the following commands >config read config_file reads the named config file, returns 1 if successful, 0 if not >config find_first option returns the address of the named option in the config tree. This is something you use in other config commands. >config find_next address ?option? returns the address of the next option after the one whose address is given as the address argument. If no option name is specified, it looks for an option with the same name as the one whose address is given. >config find_in address option returns the address of the first option of the given name that is within the subtree headed by the node at the address given. >config value_of address returns the value of the option at the address given. (mq 2.2.0) 20000211 Fixed bug in write_citation() in html.c that, when a Larger_Work_Citation was present, passed its address, rather than the address of its child Citation_Information, to write_citation() for formatting. The result was that no Larger_Work_Citation elements were being output. (2.5.2) 20000217 Modified mq.c to include some new functions: >value_set <address> <text> This assigns the given text to the element at the address given if that element can contain a value. An error is reported if the address is that of a compound element. >insert ?-before | -after | -child? <address> ?<element>? An element of the specified type is added to the tree before, after, or as a child of the element whose address is specified. If no placement is specified (that is, neither -before, nor -after, nor -child is given), the element is added as a child of the node whose address is specified. This function does NOT check to see whether the element so inserted is permitted to be there by the FGDC metadata standard. 
>delete <address> The element at the specified address is removed from the tree and cannot be recovered. >copy <address> A copy of the element at the specified address is made and its address is returned as the result of this operation. The copy has no links upward, forward, or back, and can thus be attached to any other metadata record using the paste function. >paste ?-before | after | -child? <address> <subtree> The subtree given as the final argument is attached to the current metadata record before, after, or as the last child of the element at the address given. This works only if the address argument is a compound element in the tree. >write ?-format text | sgml | xml? <filename> The metadata record is written out to the disk file whose name is specified as the final argument. If no format is specified using "-format <format>", then the format is text unless the output file ends with ".sgml" or ".sgm" or ".xml" in which cases the file will be written as SGML or XML. (mq 2.3.0) 20000222 Modified process_metadata() enabling a -list option to the function value_of. When value_of is used with -list, the result is returned as a list of lines rather than as a single block of text. It makes no sense to have both -list and -nonewline, so writing the value_of command with both options results in an error. > is_compound <address> This function returns 1 if the element at the address given is a compound element, and 0 if the element is not compound. (mq 2.3.1) 20000224 Modified syntax.c to accept the value "infinite" for the element Denominator_of_Flattening_Ratio. This allows people to use the sphere as a geodetic model. Thanks to Aleta Vienneau for pointing out this problem. (2.5.3) 20000329 Modified html.c to use write_html_value() instead of a simple munge() for the text associated with a Browse_Graphic_File_Description. In most cases, this text is short and unadorned and a simple output would do fine. 
However, if the metadata writer put any >'s in the value, they should be respected as preformat indicators. (mp 2.5.4) 20000503 Modified xml.c to write the apostrophe out in XML as &apos; and the quotation mark as &quot;. Modified entities.c to include these symbols in the ISO-8859-1 encoding table, so they are generated in the SGML output and recognized in SGML input as well. Archie Warnock indicates some XML or SGML parsers need the apostrophe and quotation marks to be encoded in this manner. (mp 2.5.5) 20000703 Modified xml.c to call strcpy() rather than decode_entities() because the entities are decoded inline at an earlier step in the parsing process; the subsequent call to decode_entities() affected only the ampersand character, which was then ignored. Thanks to Frank Roberts for having the perseverance to lead me to find this bug. (mp 2.5.6) 20000707 mp crashed if you had an empty Data_Set_G-Polygon element. The fix was to modify upgrade.c to look for Data_Set_G-Polygon_Exclusion_Ring only if a Data_Set_G-Polygon_Outer_G-Ring was found. Thanks to Jennifer Lenz for helping me find this bug. (mp 2.5.7) 20000803 Added limited support for foreign languages. This is implemented in mp, cns, xtme, and tkme through a command-line option -l <code> where the code is "es" for Spanish and "id" for Indonesian. Preferred language can also be specified in the config file using >input > language es Replace es with id for Indonesian; en would be for English, but if the value is unrecognized or missing the software will use English element names. Spanish-language element names were kindly provided by Dr. Ing. Carlos López of the Clearinghouse Nacional de Datos Geográficos, Uruguay.
<http://www.clearinghouse.gub.uy/> Indonesian-language element names were kindly provided by the Indonesian National Coordination Agency for Surveys and Mapping BAKOSURTANAL French-language element names were kindly provided by the Canadian Center for Remote Sensing, Natural Resources Canada Coincidentally added an extra element to the standard, at end of Metadata_Reference_Information I have added Metadata_Language. This element is not required, of course, but is permitted by mp. (mp 2.6) (mq 2.4) 20000804 Modified full_text() in html.c to check q->d before calling strlen() on it. Avoids a crash that I believe occurred when reading XML files and generating faq.html from them. This change made the same day as the change to 2.6.0 documented in the previous Process_Step, so I'm not bumping the version number for it. 20000804 Modified html.c to rephrase the question for which the answer is taken from the Process_Steps. Previously the question was "How were the data processed and modified?" The question is now written "How were the data generated, processed, and modified?" My hope is that this will help people to be comfortable describing data creation as process steps. (mp 2.6.1) 20000901 Modified add_related_file() in both mp.c and cns.c to add a slash to the end of the result of getcwd. Thanks to Robert Wilhite of NOAA-CSC for pointing this out. (2.6.2) 20000915 Modified local.c to output more informative messages when a problem is encountered in building the extensions list. Since this does not affect the program's normal operation, I'm not changing its version number. Applies to mp, cns, xtme, and tkme. 20000925 Added direct support for the Biological Data Profile (FGDC-STD-001.1-1999) to all programs. This involved modifications to keyword.h, keyword.c, actions.c, config.c, syntax.c, mp.c, tkme.c, cns.c, and xtme.c. Activate support for this profile by specifying "profile bio" under "input" in the config file. 
(mp 2.7.0) FGDC-STD-001.1-1999 20001006 When I included the bio profile elements, I made it impossible to use the same elements as extensions even when the bio profile was not being used. This is because the bio profile elements are kept by mp in the same bucket that it uses for standard elements, and you can't use an extension that has the same name as a standard element. Some of my geological data sets use the geologic age extensions, which are taken from the bio profile, but they don't use the rest of the bio profile. One solution would be to simply use the bio profile for these records. In this case the geologic age elements are recognized properly, but a spurious error message is generated because the bio profile includes one mandatory element, Description_of_Geographic_Extent within Spatial_Domain. The missing element is flagged as an error. The correct solution is for mp and friends not to know the bio profile elements unless the bio profile is used by choice. That way the same elements can be introduced as extensions in the usual way. So I modified keyword.c and made some changes to the main programs as well, to introduce a function use_element_names (language,profile). This function selects standard element names using the requested language, and adds to them the profile element names (any profile you want, as long as it's the bio profile). (mp 2.7.1) (mq 2.5.1) (cns 2.6.1) 20001010 Modified keyword.c to correct two of the Spanish-language element names, doubling the second 'r' in Prerequisitos_Técnicos (new spelling Prerrequisitos_Técnicos) and including an acute accent over the i in Cuadricula (in 3 element names, now spelled Cuadrícula). (mp 2.7.2) 20001027 Modified translated() in html.c to not include trailing punctuation in a URL except for '/' or '?'. Modified write_html() and write_faq_html() in html.c to give the internal hyperlinks unique names. (mp 2.7.3) 20001208 Modified mq.c to enable commands detach and attach.
$m detach $p detaches the element p from the metadata record m. $m attach $q $p attaches the detached element p to the metadata record m as a child of q. Full syntax for attach is like that for insert: $m attach ?-child | -before | -after? <address> <detached address> and it only works if the detached item has NULL for its parent, next, and prev links. (mq 2.5.3) 20001214 Modified syntax.c to allow free text in Map_Projection_Name. This reflects a change in the domain of that element that was introduced with CSDGM2 (FGDC-STD-001-1998). The 1994 version of the standard restricted the names to a specific set. Thanks to Leslie Bearden for pointing this out. (mp 2.7.4) 20010202 Modified check_Identification_Information() in syntax.c to permit Spatial_Domain to be missing if using the Biological Data Profile. In the BDP, Spatial_Domain is mandatory if applicable. Thanks to Diane Schneider and Terry Giles for pointing this out. (mp 2.7.5) 20010207 Modified check_Citation_Information() in syntax.c so that if the biological profile is being used, Geospatial_Data_Presentation_Form is mandatory. Thanks to Terry Giles for pointing this out. (mp 2.7.6) 20010209 Modified parse_sgml() in sgml.c to read the whole input file at once, rather than line-by-line. This avoids a limitation imposed on line length in the input file. Thanks to Margaret Lyszkiewicz for pointing this out. Also added a question to the FAQ-style HTML output, "What similar or related data should the user be aware of?" for which the answer is taken from Cross_Reference. (mp 2.7.7) 20010214 Modified parse_sgml() in sgml.c to fix a bug in handling multiline input files introduced by the previous revision. (mp 2.7.8) 20010215 Modified check_Taxonomy() in syntax.c to permit more than one Taxonomic_Classification in Taxonomy. This was done at the request of the authors of the BDP to reflect their intent that the standard permit organisms from more than one kingdom to be documented in the same record.
As it is formally written, the BDP makes it essentially impossible to include in the same record taxonomic classification info for both plants and animals, or plants and fungi, for example. (mp 2.7.9) 20010227 Modified syntax.c to check the text value of the BDP element Case_Sensitive and to not require Metadata_Review_Date if Metadata_Future_Review_Date is given. Thanks to Terry Giles (Johnson Controls working for USGS) for pointing this out. (mp 2.7.10) 20010326 Modified the paste command in mq so that it takes a string or XML string as its final argument rather than the address of a previously-detached subtree. Prior to this the attach command and the paste command did the same thing. (mq 2.5.5) 20010420 Modified write_xml_item() in xml.c to examine the static variable element_order when writing the children of an element. element_order is an enum that can be either STANDARD or ASIS (in future there could be others). If its value is STANDARD, then the elements are written in the order given in FGDC-STD-001-1998 and extensions are written after all standard elements. This is and has been the default behavior. If the value is ASIS, then elements are written in the order they appear in the parse tree (as input in mp, as modified by Tkme). Modified config.c to recognize the element "order" which will be used when found under output:xml. The effect of this change is to allow people to explicitly request that mp and Tkme NOT rearrange the elements in the order given in the FGDC standard. Since most profiles have not put extensions after all standard elements, using the standard order causes the extensions to be put out of the order expected in the profile. This change allows mp and Tkme to retain the input order. (mp 2.7.11) (tkme 2.8.10) (xtme 2.6.3) (mq 2.5.7) 20010525 Modified write_html() and write_faq_html() in html.c to look for a config option header_file if header is not specified under output:html in the config file. 
If header_file is found, its value is expected to be a readable text file whose contents will be incorporated into the HTML output as if they had been specified as the header in the config file itself. This allows you to maintain the HTML header separate from the config file. A corresponding change was made for footers; if the footer directive is not found under output:html, then if a footer_file directive is present, its contents will be used as the HTML footer. (mp 2.7.12) 20010529 Modified write_faq_html() in html.c so that a &nbsp; is written if the Enumerated_Domain_Value_Definition is empty. (mp 2.7.13) 20010621 Modified upgrade.c to remove blank lines within Enumerated_Domain elements. The code there was creating an extraneous container when a blank line immediately followed Enumerated_Domain and preceded the corresponding Enumerated_Domain_Value. Thanks to Peg Rawson (USGS and National Atlas) for helping to find this bug. (mp 2.7.14) 20010626 Modified keyword.c to include element names in Catalan provided by Dr. Ing. Carlos López of Uruguay. This includes updated Spanish element names as well. (mp 2.7.15) (mq 2.5.9) (cns 2.7.1) (xtme 2.6.4) (tkme 2.8.11) 20010719 Fixed buffer overflow in extension_of() and extension_of_sgml() in local.c. All programs affected. (mp 2.7.16) (mq 2.5.10) (cns 2.7.2) (xtme 2.6.5) (tkme 2.8.12) (err2html 2.1.1) 20010803 Modified html.c to strip carriage-returns from the text included via the header_file directive of the config file. (mp 2.7.18) 20010905 Modified upgrade.c to avoid seg fault when an empty Enumerated_Domain element is encountered in the input. Thanks to Tirumal R. Jallepalli for pointing this out. (mp 2.7.19) 20011004 Modified local.c to recognize the element "Z39.50_Tag" as the same as "z3950". This tag is not used by mp at present, and may never be used.
The change makes the extension file just a little less code-like and a little more readable, but for practical purposes has no effect on the behavior of mp or the other programs. (version not changed) 20011009 Modified upgrade.c so that it leaves alone any value of Metadata_Standard_Version beginning with "FGDC-STD-001" This causes it to leave alone the specific version numbers of approved profiles of FGDC-STD-001-1998. (mp 2.7.20) 20011020 Modified xml.c so that leading spaces in element values are skipped. This is equivalent to the behavior of mp when reading indented text files, and makes it easy to check for specific text values such as for Progress. Thanks to Travis Stevens (NOAA-NGDC) for pointing this out. (mp 2.7.21) 20011029 Modified config.c to recognize element "Label". Modified html.c to look for "label" within output:html:link instead of "text" in the same place, to preserve the compound nature of "text" as a config element within "output". (mp 2.7.22) 20011031 Modified mq so that elements or subtrees inserted or pasted without an explicit -before or -after directive will be emplaced in the order specified by the Standard. (mq 2.5.12) 20011109 Modified mp.c to remember the character encoding of the input file. This is kept as a string, and should normally be either UTF-8 or ISO-8859-1. New global functions set_character_encoding and get_character_encoding access this attribute. Modified html.c to get the character encoding and output UTF-8 characters correctly. Previous to this, UTF-8 characters were interpreted as ISO-8859-1, which they aren't, so Europeans who used UTF-8 got strange-looking stuff in their HTML. (mp 2.7.23) 20011126 Modified attach_subtree_in_order() in mq.c to fix bug in which the subtree was appended to the end of the root node even after it had been placed properly. This caused a loop to occur in the tree. 
(mq 2.5.13) 20020128 Modified syntax.c to note as an error the use of 00 in the month and day parts of a date, as in "19950000". I see no provision in the available documentation to suggest that 00 is a legitimate value for either the month or the day. (mp 2.7.24) 20020225 Modified html.c to use write_html_value() instead of just munge() for expressing the Entity_Type_Definition and Attribute_Definition in FAQ-style HTML. (mp 2.7.25) 20020416 Modified sgml.c and xml.c so that parsing an SGML document is actually done by the XML parser code. There's a problem with the old SGML parser code that I see no good reason to find and fix. The only negative effect of this change is that ASTM tags for SGML metadata are no longer supported. But I suspect that no one has ever used ASTM tags for metadata, and that no one ever will. (mp 2.5.26) (mq 2.5.14) 20020509 Modified write_html_faq() in html.c to fix bug that caused fault where Single_Date/Time has no Calendar_Date inside it. Thanks to Aleta Vienneau for pointing this out. (mp 2.7.27) 20020510 Modified config.c to recognize style_file and script_file. Modified html.c to look for these elements within output:html and include the contents of the files they name within the head element of the HTML output. (mp 2.7.28) 20020613 Modified check_date() in syntax.c so that zeros beyond the first four decimal digits are not considered an error IF the date has one of the prefixes bc, cc, or cd. Thanks to Travis Stevens for pointing this problem out. (mp 2.7.29) 20020702 Modified copy_item() in tree.c so that it properly handles cases where p->d is NULL. This affected copy operations in Tkme where the input file was XML. Thanks to Hugh Phillips for pointing this out. Also made extensions not case sensitive when encountered in indented text input (XML tags are always case sensitive). 
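The date rule introduced on 20020128 and relaxed on 20020613 can be illustrated with a small sketch. This is not the actual check_date() in syntax.c; the function name has_zero_month_or_day() is hypothetical, and the sketch assumes the date is given as YYYY, YYYYMM, or YYYYMMDD with any bc, cc, or cd prefix written in lower case.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch, not the actual check_date() in syntax.c.
   Returns 1 if a calendar date of the form YYYY, YYYYMM, or YYYYMMDD
   uses 00 for the month or the day, and 0 otherwise. Dates that begin
   with one of the prefixes bc, cc, or cd are never flagged
   (assumption: the prefixes are written in lower case). */
static int has_zero_month_or_day(const char *date)
{
    size_t len;

    assert(date != NULL);

    /* Prefixed dates are exempt from the zero check. */
    if (strncmp(date, "bc", 2) == 0 ||
        strncmp(date, "cc", 2) == 0 ||
        strncmp(date, "cd", 2) == 0)
        return 0;

    len = strlen(date);
    if (len >= 6 && date[4] == '0' && date[5] == '0')
        return 1;                       /* month is 00 */
    if (len >= 8 && date[6] == '0' && date[7] == '0')
        return 1;                       /* day is 00 */
    return 0;
}
```

Under these assumptions "19950000" would be flagged, while "1995" and "19950615" would not.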
(mp 2.7.30) (mq 2.5.15) (cns 2.7.3) (xtme 2.6.8) (tkme 2.8.17) (err2html 2.1.2) 20020710 Modified check_Standard_Order_Process() in syntax.c so that it generates an error (type "missing") message when the given Standard_Order_Process contains neither a Digital_Form nor a Non-digital_Form. Thanks to Terry Giles for pointing this out. (mp 2.7.31) 20020829 Modified decode_xml() in xml.c to handle properly a comment at the beginning of the XML input file. Thanks to Daniel Berhanu for bringing this problem to light. (mp 2.7.32) (xtme 2.6.9) (Tkme 2.8.18) (mq 2.5.16) 20020910 Modified write_xml_item() in xml.c so that when asked to write elements in standard order it uses a more straightforward method, avoiding find_key(). This allows it to discover child elements of the same type as the parent. Thanks to Helena Schaefer for alerting me to this problem. (mp 2.7.33) (mq 2.5.17) (xtme 2.6.10) (tkme 2.8.19) 20021001 Modified write_html() and write_faq_html() in html.c to remove carriage returns from included script files and included style files (it already did this to included header files). This avoids having extra end-of-line characters in the output on Windows systems. (mp 2.7.34) 20030204 Modified decode_xml() in xml.c so that when a comment is found, the buffer is emptied if it only contains whitespace to that point. Thanks to Archie Warnock for showing the problem. (mp 2.7.35) 20030225 Added direct support for the shoreline profile and the remote-sensing profile. This entails modifications in keyword.h, keyword.c, syntax.c, ps8.c, and actions.c. All programs got bumped a minor version number for this. One of the side effects is a simplification so that all elements defined in any of these profiles will be recognized and their structure checked. However, unless you put "profile rs" or "profile sh" into the input section of the config file, standard elements won't be judged according to the rules specified in the profile.
So if you use elements from one of the profiles but you don't tell mp you're using that profile, those elements may be flagged as errors because mp thinks they aren't supposed to be there. But that, in my opinion, is better than having them come up as unrecognized elements. (mp 2.8.0) (mq 2.6.0) 20030307 Modified write_html() and write_faq_html() to look for an option omit_url within output:html; if present, the URL is not written out at the end of the file. (mp 2.8.1) 20030520 Modified upgrade.c to better handle blank lines within the Keywords sections and multiple Theme, Place, Stratum, and Temporal elements. Thanks to Tobin Smith (Veridyne) for bringing the problem to my attention. (mp 2.8.2) 20030528 Modified do_meta_tags() in html.c so that the dc.lang element is not always "en", but could be the value of a Language or Metadata_Language element or the value assigned to the input:language option in the config file. It does not (yet) pick up a language specified on the command line with the -l switch. Thanks to Terry Giles for pointing this issue out. (mp 2.8.3) 20030723 Modified actions.c to add Description_of_Geographic_Extent within Spatial_Domain in shprofile_element array. Modified syntax.c to allow Description_of_Geographic_Extent in Spatial_Domain and Barometric_Pressure within Marine_Weather_Conditions when using shoreline profile. Thanks to Mike Moeller (NOAA) for pointing these problems out. (mp 2.8.4) 20030805 Modified parts of html.c to make the output html conform to "HTML 4.01 Transitional", giving both the outline-style and the FAQ-style HTML output the DOCTYPE declaration ><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" > "http://www.w3.org/TR/html4/loose.dtd"> Thanks to Florence Wong (USGS) for pointing out the need to make some fixes in the code to achieve better conformance. 
(mp 2.8.5) 20030806 Modified insert_item_after(), insert_item_before(), delete_children(), and delete_item() in tree.c to simplify their structure and, I believe, fix a bug that caused an inserted element to not appear in the tree if an adjacent element was deleted. These changes should improve Tkme but have their biggest effect on mq, so mq's version number is incremented. mp would not be affected. (mq 2.6.1) 20030813 Modified local.c, creating a function add_extension() that can be called while parsing an XML file to add an unrecognized tag as an extension (albeit one about which we know little). Modified decode_xml() in xml.c to call this function. The net effect of this change is that unrecognized elements in XML and SGML files will be tolerated and can be manipulated, rather than ignored and possibly discarded altogether. The practical effect, I hope, is that XML records produced by any version of ArcCatalog can be edited safely using Tkme and parsed with mp. Ideally, mp would be told exactly where the extended elements go and how they should appear, but with the emphasis on XML, it is possible to include new tags in metadata without documenting them beyond inclusion in a DTD, and the DTD step could conceivably be skipped also, since well-formed XML suits most applications. (mp 2.8.6) (mq 2.6.2) (tkme 2.9.3) (xtme 2.7.2) 20030923 Modified keyword.c to add a function is_standard() which, given an element's numerical key, tells whether the element is part of the FGDC standard (through FGDC-STD-001-1998) or not. Modified allow() in syntax.c so that only a warning is issued, not an error, when a non-standard element appears where it is not expected to be. This is a significant change from the regulatory perspective. It means that mp will not call an error the occurrence of an extension even if mp doesn't know what the extension is or where it's supposed to go. 
The chief benefit of this is that ESRI and other creators of metadata tools can now invent and add elements to the metadata they create without me having to play catch-up and try to get accurate descriptions of what these extensions are and where they should go. This doesn't mean that downstream software will understand the information in nonstandard elements, but if your metadata record was generated by some software, mp won't tell you it's wrong just because there's a nonstandard element in it that mp doesn't know about ahead of time. (mp 2.8.7) 20031218 Modified allow() in syntax.c to correct bug in the previous change. Modified decode_xml() in xml.c to generate a warning when an unknown extension is encountered. (mp 2.8.8) 20040107 Modified decode_xml() in xml.c to use expat, a freely-available XML parser, rather than the original parser code. This will make the input of XML documents more consistent, and fix the bug in which the occurrence of a comment within the text of an XML element caused the parser to discard the entire text (thanks to Hugh Phillips for noticing this problem). Backed off a prior change in which unrecognized extensions were recognized and not discarded. Tkme wasn't written to distinguish unknown compound extensions from scalar elements, so its behavior became inconsistent when unknown extensions were permitted. (mp 2.8.9) (mq 2.6.3) (tkme 2.9.5) (xtme 2.7.3) 20040123 (sigh) Fixed bug in xml input code, where a textual value larger than 8k bytes would cause a segmentation violation. No new behavior other than this. (mp 2.8.10) (tkme 2.9.6) (mq 2.6.4) (xtme 2.7.4) 20040126 Modified the element command in mq so that it distinguishes among the known profiles (biological data, shoreline, remote-sensing) that are built into mp. (mq 2.6.5) 20040129 Created a new extension file ESRI-ISO.ext within tools/ext. 
I built this from the document type definition file found inside http://www.esri.com/metadata/esri_iso01dtd.zip Since I don't understand the ISO documents fully I cannot judge how well the DTD corresponds to the ISO standard. I have used the new ext file to parse metadata output by ArcCatalog 8.3, and it appears to work, causing mp to understand and not discard the ISO elements that ESRI has used. In principle this should also have the effect of allowing people to edit ArcCatalog metadata in place using Tkme without having to export the record first. However caution is advised: make a backup copy beforehand, and keep it until you are certain that both mp and ArcCatalog can read the XML file successfully. 20040129 Modified config.c and local.c to send to an internally-managed buffer any error or warning messages generated while parsing the config file or the extension files. The contents of these buffers can be retrieved by calling config_errors() or ext_errors(). So the calling program (typically mp or Tkme) decides if, when, and how those messages are shown to the user. Modified mp.c to show those messages after the banner is displayed. (mp 2.8.11) 20040213 Modified array rule_table in syntax.c to properly direct mp to run check_Bounding_Altitudes when a Bounding_Altitudes is detected (previously this entry was not present, and so the Bounding_Altitudes element was checked as though it were a scalar element). Thanks to Terry Giles for pointing this problem out. (mp 2.8.12) 20040405 Modified element_cmd() in mq.c to return the section number from the standard document (1998 version) using the element command like [element $name -number]. 
Note that section numbers are highly problematic; they only refer to a specific version of the standard document, some of the numbers have changed from the 1994 version to the 1998 version, one of the elements now has two different section numbers (Online_Linkage occurs both within Citation_Information and within Metadata_Extensions), and section numbers do not occur at all in the biological or remote sensing profiles. (mq 2.6.7) 20040412 Modified parse_sgml() within xml.c so that parsing of SGML actually begins with <metadata> rather than at the beginning of the file. This is because expat chokes on the SGML DOCTYPE declaration, which doesn't have the same form as an XML DOCTYPE. Modified decode_xml() in xml.c to first parse a set of entity declarations if the record begins with <metadata>. These entity declarations are not included if the file begins with an XML declaration. The chief use of them is to help parse SGML records previously created using mp. Thanks to Sarah McGuire (NPS, Madison) for pointing this issue out. (mp 2.8.13) (xtme 2.7.6) (tkme 2.9.9) (mq 2.6.8) 20040603 Modified check_Range_of_Dates_Times() within syntax.c so that it generates a missing element error if you have a Beginning_Date but no Ending_Date. This problem only occurred when using the bio profile. Thanks to Terry Giles (USGS) for pointing this out. (mp 2.8.14) 20040811 Modified check_Digital_Form() in syntax.c so that Digital_Transfer_Option is not repeatable. Thanks to Eric Compas (NPS, UWisc) for pointing this problem out. (mp 2.8.15) 20040816 Modified character_data() in xml.c to allow text of more than 8k characters to be ingested at a time. This became a problem if you had an XML metadata record containing a textual value that was larger than 8192 bytes all on the same line. (mp 2.8.16) 20040913 Expat automatically converts ISO-8859-1 to UTF-8, so if you read an XML file, the internal representation of characters will be UTF-8. 
I still think it's good to store the plain text as ISO-8859-1, so I now need to convert UTF-8 to ISO-8859-1 when writing plain text. Modified write_text_item() in text.c to convert UTF-8 characters to ISO-8859-1. Created a separate module encoding.c to contain the function unicode_of_utf8(), now global in scope. Removed the copies of that function found in html.c and xml.c. (mp 2.8.16) (mq 2.6.9) (tkme 2.9.11) (xtme 2.7.7) 20040914 Modified write_html() and write_html_faq() in html.c to translate the title from UTF-8 to ISO-8859-1 if the metadata are stored in UTF-8. The code already carried out this translation for the text within the metadata but I had neglected to do the same for the title, specifically when writing the html <title> element within the document head and when writing the heading tag <h1> in outline style, <h3> in FAQ style. (mp 2.8.17) 20040915 Modified check_Map_Projection() in syntax.c to not allow Other_Projection's_Definition within Map_Projection. Modified check_Map_Projection_Parameters() to allow Other_Projection's_Definition to occur there. Modified numbers.c to assign the correct new section number for Other_Projection's_Definition. Modified actions.c to move Other_Projection's_Definition from Map_Projection to Map_Projection_Parameters. This allows tkme and xtme to know where to let you put it. Modified write_html_faq() in html.c to better handle the occurrence of Other_Projection's_Definition, and also of Map_Projection_Parameters in Map_Projection. Thanks to James W. Allor (US Census Bureau) for pointing this out. Modified upgrade(). If Other_Projection's_Definition appears within Map_Projection, move it into Map_Projection_Parameters. If no Map_Projection_Parameters exists, create one for this. (mp 2.8.17) (err2html 2.1.4) (tkme 2.9.12) (xtme 2.7.8) 20040917 Modified equalize_indented_scalars() in text.c so that parent links and indent values are set properly when correcting text containing variously-indented lines. 
Thanks to Jo Anne Stapleton for giving me enough data to find and correct this rare problem. (mp 2.8.18) (mq 2.6.10) (xtme 2.7.9) (tkme 2.9.13) 20041104 Modified config.c to recognize a new directive "schema" within output:xml. Modified xml.c to write the value given in output:xml:schema as one of the attributes of the root element (Metadata) when writing XML. If a schema is specified in the config file and the root element has no other attributes, then the root element will be assigned the attribute xmlns:xsi with the value http://www.w3.org/2001/XMLSchema-instance along with the attribute xsi:noNamespaceSchemaLocation, whose value will be the text given to output:xml:schema in the config file. (mp 2.8.19) (mq 2.6.11) (tkme 2.9.14) (xtme 2.7.10) 20050112 Modified keyword.c, replacing array fr_std[] with new French-language element names provided by John Cree. This completes support for French for both standard elements (those contained within the 1998 CSDGM) and the Biological Data Profile. At this writing, elements from the shoreline and remote sensing profiles remain untranslated (that is, the "French" versions of those extended elements are the English names). (mp 2.8.20) (cns 2.8.1) (xtme 2.7.11) (tkme 2.9.15) (mq 2.6.12) (err2html 2.1.6) 20050128 Modified write_html() and write_html_faq() in html.c to accept the stylesheet element in the config file in much the same way as it is handled for xml. In this case the stylesheet type can be omitted, in which case the type will be assumed text/css. The value href will be used as the link to an external CSS. This information is written into the head element as a link element: <link rel="stylesheet" type="text/css" href="..."> Furthermore it is possible to have more than one stylesheet element within output:html; each generates a separate <link> element in the HTML output. (mp 2.8.21) 20050210 Modified url fragments in links.c so that spaces and punctuation are replaced with the corresponding hex code, like '%20' for space.
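The replacement described in the 20050210 entry is ordinary percent-encoding. A minimal sketch follows; it is not the code in links.c. The function name encode_fragment() and the choice of which characters pass through unchanged (ASCII letters, digits, '-', '_', '.', and '~') are assumptions made for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper, not the actual code in links.c: copy src into dst,
   replacing every byte that is not an ASCII letter, digit, '-', '_', '.',
   or '~' with its %XX hex code. Returns 0 on success, or -1 if dst is
   too small to hold the encoded string and its terminating null. */
static int encode_fragment(const char *src, char *dst, size_t dstsize)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    if (dstsize == 0)
        return -1;

    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;
        int safe = (c < 128 && isalnum(c)) ||
                   c == '-' || c == '_' || c == '.' || c == '~';
        size_t need = safe ? 1 : 3;

        /* Leave room for this character and the terminating null. */
        if (n + need + 1 > dstsize)
            return -1;
        if (safe) {
            dst[n++] = (char)c;
        } else {
            dst[n++] = '%';
            dst[n++] = hex[c >> 4];
            dst[n++] = hex[c & 0x0F];
        }
    }
    dst[n] = '\0';
    return 0;
}
```

With this sketch, the fragment "Entity and Attribute" would be written as "Entity%20and%20Attribute".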
(err2html 2.1.7) 20050211 Modified write_html_faq() to pass the value of Unrepresentable_Domain through write_html_value() rather than simply writing it out. My opinion is that the text in Unrepresentable_Domain ought to be short and uncomplicated (that is, not containing >'s) but I see no reason not to accommodate those who would put lists or other semi-structured information there. (mp 2.8.22) 20050405 Modified the French-language name of the element Quantitative_Attribute_Accuracy_Assessment. Thanks to John Cree (Environment Canada) for pointing this problem out. (mp 2.8.23) (cns 2.8.2) (xtme 2.7.12) (tkme 2.9.17) (mq 2.6.13) 20050615 Modified mp.c to not upgrade metadata by default if the file format is XML. The upgrade process was making mischief when a keyword thesaurus element was out of order. (mp 2.8.24) 20050829 Modified subcommands of mq. Fixed "attach" so that it returns an error when called without the address of a snippet to attach. Changed "name", "line", "parent", "child", "prev", and "next" so that these work when the address given is a detached subtree (snippet). Previously these subcommands required the address to be a valid node in the current parse tree. (mq 2.6.14) 20050922 Modified mq.c to add a subcommand prune for a parsed metadata record. (mq 2.6.15) 20050926 Modified syntax.c to flag as an error the co-occurrence of more than one type of domain within Attribute_Domain_Values. Modified upgrade.c to try to patch affected metadata by inserting additional Attribute_Domain_Values elements where there are extra domain values sections. (mp 2.8.25) 20051121 Modified actions.c to eliminate duplication in the remote-sensing profile element Instrument_Information that was causing XML files to contain duplicate information. Also modified upgrade.c to replace the Metadata_Standard_Version value only if the existing value does not begin with "FGDC-STD-" rather than the more specific "FGDC-STD-001".
Thanks to Pete Keehn (NOAA) for pointing these problems out. (mp 2.8.26) (mq 2.6.16) (xtme 2.7.12) (tkme 2.9.18) 20060127 Integrated support for German element names and composed a German help file for the editors. This is based on the work of Peter Korduan (University of Rostock). The help file includes information about the precision agriculture extension that was the subject of his work. Created a help file for French as well, using the translation of the Standard provided by Environment Canada. Modified html.c to avoid a crash when reading an XML file whose Citation Title contains blank lines. Thanks to Hanna Habashy (Univ. South Carolina) for pointing to this problem. (mp 2.8.27) (cns 2.8.3) (xtme 2.7.14) (tkme 2.9.19) (mq 2.6.17) 20060228 Modified element_start() in xml.c to flag as an error any non-blank text found in the text buffer when an element start tag is found in XML input. Previously, mp was holding onto the text and would assign it to the next element for whom a close tag was encountered. So it was possible to have text in the wrong place but mp would put it somewhere else. This is surely a rare and strange condition, but it is one that ought to be flagged as a significant error if it occurs. (mp 2.8.28) (xtme 2.7.15) (tkme 2.9.20) (mq 2.6.18) 20060331 Modified config.c to recognize element "encoding" to be used under input (as a substitute for 'codeset') or under output:html. Modified html.c to allow the output to be written as UTF-8 if output:html:encoding is set to UTF-8 in the config file, otherwise HTML output defaults to ISO-8859-1. I believe this will make it possible in principle to support Russian text. (mp 2.8.29) 20060503 Fixed a few problems with the implementation of UTF-8 conversion, particularly in HTML output files. (mp 2.8.30) 20060504 Modified encoding.c to fix UTF-8 encoding. Thanks to Olga Vasik (TINRO, Vladivostok, Russia) for persistence in following up on this issue. 
Because this has the effect of making UTF-8 encoding really work for HTML output, I'm bumping up the 2nd-level version number. (mp 2.9.0) 20060511
Modified mp.c so that -fixdoc is detected before -f in the list of command-line arguments. This re-enables -fixdoc (which should be rarely needed, but I needed to use it today!) (mp 2.9.1) 20060630
Modified upgrade() in upgrade.c so that no change is made if there is exactly one *_Keyword_Thesaurus in a Theme, Place, Stratum, or Temporal element, regardless of its position within that element. (mp 2.9.2) 20061003
Modified syntax.c to allow Oblique_Line_Azimuth and Oblique_Line_Point within Map_Projection_Parameters instead of Oblique_Line_Latitude and Oblique_Line_Longitude. Thanks to Matthew McCready (Census Bureau) for pointing this mistake out. (mp 2.9.3) 20070305
Modified check_scalar_children() in syntax.c so that if an element that is supposed to contain some value contains instead another element (this can happen if someone misplaces an element in XML), it doesn't crash but instead generates a "misplaced element" error. This came to attention through examining web server error logs of the online metadata validation service. (mp 2.9.4) 20070920
Modified main() in mp.c to include the original input file name in the error output. Modified write_html() and write_html_faq() in html.c to include the original input file name as a <meta> tag in the HTML output, with name="generated-from". Modified upgrade() in upgrade.c so that Metadata_Standard_Version is upgraded only if it appears within Metadata_Reference_Information. The only way I can imagine it being anywhere else is in the metadata for mp itself, where it had occurred within the text of a couple of Process_Description elements. (mp 2.9.5) 20070927
Modified element_start() and element_end() in xml.c to not null-terminate the output data buffer if the buffer pointer is NULL.
This condition arose when an end tag was encountered without the parser ever seeing any character data. Thanks to Gennady Khokhorin for noticing the problem. (mp 2.9.6) (xtme 2.7.17) (tkme 2.9.22) (mq 2.6.21) 20080122
Modified keyword.c, adding support for Portuguese element names kindly supplied by Luis Cavalcanti Bahiana, Pesquisador em Informações Geográficas, IBGE - Coordenação de Geografia, Av. República do Chile 500-12 andar, Brazil. (mp 2.9.7) (cns 2.8.4) (xtme 2.7.18) (tkme 2.9.23) (mq 2.6.22) (err2html 2.1.9) 20080505
Modified mp.c to write the date and time into the error log file as information. (mp 2.9.8) 20080916
Modified write_sgml_text() in sgml.c to behave like write_xml_text() in xml.c: don't convert characters to entity references. This allows UTF-8 characters to be put into the SGML output as they are, and we just hope that software downstream can be told they are UTF-8. (mp 2.9.9) 20090130
Modified html.c to remove unnecessary hyphen from "Frequently-anticipated questions". Modified html.c to enlarge the fixed sizes allowed for personal names in the FAQ HTML format. Modified xml.c to cope properly with UTF-8 characters at the end of an element's text. Under Microsoft Windows XP these were truncated due to unexpected behavior of the isspace() function on that system. Thanks to Aleta Vienneau for information leading to this fix. Modified build process for Microsoft Windows to use the MinGW compiler, creating Makefile.mgw for this purpose. (mp 2.9.10) (mq 2.6.23) (xtme 2.7.19) (tkme 2.9.24) 20091102
Modified parse_name() in html.c to detect the situation where there are multiple commas or an "and" in the name given, and consider the value as a non-parseable name (that is, one that cannot be re-expressed as "last, first middle"). (mp 2.9.11) 20091211
Modified check_scalar_children() in syntax.c so that valid values of Clock_Time_Drift are permitted (needed a negative sign in the first clause of the if statement).
Thanks to Rudi Gens for pointing this problem out. Modified write_xml_text() in xml.c so that it does not translate single quote into the entity &apos; and double quote into the entity &quot;. I believe these translations are not necessary. Thanks to Dougl Dale-Johnson for pointing this problem out. (mp 2.9.12) 20100211
Modified write_contact() in html.c to replace "c/o" with "Attn:" when there is a Contact_Person within Contact_Organization_Primary. (mp 2.9.13) 20100426
Modified text.c and xml.c to disregard a UTF-8 byte order mark if one is present at the beginning of the text or XML input file. (mp 2.9.13) 20100819
Modified element_start() in xml.c, adjusting the line number to account for the entity declarations we may have parsed in decode_xml() before we began parsing the user's actual XML document. Without this adjustment, line numbers are 66 more than they should be. Thanks to Stuart Giles (USGS) for pointing this problem out. (mp 2.9.14) 20110923
Modified dif.c to properly account for the possibility that, with an empty Fees element, the corresponding text pointer might be NULL; such is the case when the input file is XML. This caused a seg fault in the online validator for some input files. (mp 2.9.15) 20120404
Significant changes to the error reporting system for mp, affecting numerous source files. Now if the error file name ends with .xml, the errors will be written as XML, including, where possible, an XPath description of the location of the element closest to the problem. In addition, within html.c, trailing space is trimmed from name components that might be rearranged in order to present the citation as a typical bibliographic reference entry. (mp 2.9.16) 20121003
Modified StartElement() in xml.c to correctly handle the case in which the first element encountered in the metadata record is an unrecognized extension.
(mp 2.9.17) 20121031
Modified check_Metadata() in syntax.c to generate a warning if there is no Data_Quality_Information, Spatial_Data_Organization_Information, Spatial_Reference_Information, Entity_and_Attribute_Information, or Distribution_Information. (mp 2.9.18) 20121108
Modified allow() and check_extension() in syntax.c to include in the variable message only the first 512 characters of the textual value on the given input line if that value would be longer than 512 characters; this prevents a buffer overflow when reporting errors. (mp 2.9.19) 20121207
Added config option "head" within output:html; if specified, the contents of that config file element will be included verbatim at the end of the head element of the HTML output. Could be used for any static <meta>, <style>, <link>, or <script> information. (mp 2.9.20) 20121211
Modified parse_text() in text.c so that, if character encoding has not been explicitly declared in a config file, the text is scanned for conformance with UTF-8, and if the text conforms to UTF-8, the encoding is considered to be UTF-8 in further processing. This allows people to process text files that are, in fact, UTF-8 without having to use a config file to say so. (mp 2.9.21) (xtme 2.7.20) (tkme 2.9.21) (mq 2.6.24) 20130325
Modified write_text() and write_text_item() in text.c so that they no longer translate from UTF-8 to ISO-8859-1. This change simply reflects my increasing comfort with UTF-8. (mp 2.9.22) (xtme 2.7.21) (tkme 2.9.22) (mq 2.6.25) 20130618
Modified keyword.h, keyword.c, and syntax.c to accommodate the metadata extensions for lidar data published in USGS TM11-B4. That document does not include long element names, so I made up long element names to match the tags and definitions given in appendix 5 of the report. (mp 2.9.23) 20130701
Modified element_start() and element_end() in xml.c to consistently calculate the current line number in XML input files. Thanks to Aleta Vienneau for pointing out this problem.
(mp 2.9.24) 20130912
Modified read_text() in text.c as well as utf8_of() in encoding.c and cns.c so that after the text is read, a test is carried out to determine whether the text is likely UTF-16, and if so, the UTF-16 is converted to UTF-8 for further processing. The test in this case is to determine whether the text contains any zero bytes; if it does, the text is assumed to be UTF-16. This is because I occasionally see problems in which people using Microsoft Windows have inadvertently converted their text files to UTF-16, which is really a binary format for Unicode characters. This will certainly scramble any input files that are binary but are not UTF-16, but those files aren't going to be readable metadata anyway. (mp 2.9.25) (cns 2.8.5) (xtme 2.7.22) (tkme 2.9.28) (mq 2.6.26) 20130924
Modified translated() in html.c so that, in the FAQ format, URLs are not enclosed in angle brackets unless that is how they appear in the source metadata. Thanks to Peg Shealy for pointing out this now-obsolete feature. (mp 2.9.26) Also made live links for https. 20131202
Modified write_xml_item() in xml.c to minimize the number of newlines printed when indentation of the XML output is requested using a config file. Thanks to Drew Ignizio for suggesting this revision. (mp 2.9.27) (xtme 2.7.23) (tkme 2.9.29) (mq 2.6.27) 20140124
Modified decode_text() in text.c to examine the text, checking the character encoding. It generates a warning if the encoding was not specified but the characters are not UTF-8, and an error if the encoding was supposed to be UTF-8 and there are non-UTF-8 characters. Modified write_contact() in html.c to remove a stray greater-than sign. (mp 2.9.28) 20141204
Modified check_scalar_children() in syntax.c to throw a bad-value error when Transfer_Size contains anything other than a single floating-point number. I had been writing units into this value, like "52 kilobytes", in order to avoid specifying very small fractional numbers of megabytes.
But the standard does say "size in megabytes" and "type: real", so I'm going to have to lose this argument. Also I cleaned up a few of the stray and unclosed <p> tags in the FAQ metadata format output using CSS. HTML output needs a thorough rewrite, but that will take more time. (mp 2.9.29) 20150309
Modified check_date() in syntax.c to be more strict about date values. The standard does specify that dates must be simple date values and cannot, for example, specify a range like 2001-2006. Previous versions of mp permitted this, but because some downstream software throws an error on this type of value, we are better off discouraging loose date values. (mp 2.9.30) 20150313
Modified keyword.h, keyword.c, ps8.c, and syntax.c to incorporate new elements for the Lidar specification version 1.1. (mp 2.9.31) 20150414
Modified keyword.h, keyword.c, ps8.c, and syntax.c to change the Lidar extension elements Lidar_Density to Lidar_Nominal_Pulse_Density and Lidar_Pulse_Spacing to Lidar_Nominal_Pulse_Spacing. Added Lidar_Aggregate_Nominal_Pulse_Spacing and Lidar_Aggregate_Nominal_Pulse_Density per USGS TM11-B4 version 1.2. Thanks to Leslie Lansbery (USGS) for pointing this out. (mp 2.9.32) 20150618
Modified actions.c to delete duplicate entries for WDetailed_Description and WOverview_Description in the remote sensing profile. Their duplication was causing the XML generator to duplicate these elements in the XML when the remote sensing profile was used. Thanks to Florence Wong for pointing this out. (mp 2.9.33) 20160408
Created html5.c from html.c and used it instead of the older file. This is a rewrite of the outline-style HTML so that it conforms to HTML version 5, and uses CSS more appropriately. Each element name is contained within a <span> tag with the class element-name and an additional class name matching the XML tag for the element.
Each element value is contained within a <div> (or <span> if the content is short) with a class element-value and an additional attribute matching the XML tag of the element. A <div> with class "child" controls the indentation. The default value for blank lines is an empty div with the class "blank". Conservative default values are included for these CSS classes. Modified element_end() in xml.c to strip leading space from blank lines; this was causing mp to think that a line containing spaces was not actually blank; now it considers that line to be blank. (mp 2.9.34) (tkme 3.0.2) (xtme 2.7.24) (mq 2.6.28) 20161007
Modified write_html_faq() in html5.c to set xml_tag_of(), avoiding a call to a null pointer. Thanks to Peg Shealy for pointing this problem out. (mp 2.9.35) 20161108
Modified xml.c to check for a config file entry output:xml:skip_attributes, and if it is present, omit from the output XML any attributes that may have been in the input XML. Modified config.c to recognize this directive in the config file. (mp 2.9.36) 20161202
Modified text.c to properly handle a non-existent input file. (mp 2.9.37) (tkme 3.0.3) (mq 2.6.29) 20161228
Modified actions.c to include compound Lidar extension elements. Modified check.c to include scalar Lidar elements. This change enabled mq to read a metadata record containing Lidar extensions. (mq 2.6.30) 20170104
Modified write_html_faq() in html5.c to add the default CSS to the FAQ style HTML output. This controls things like the indentation of child elements and emphasis of element names. (mp 2.9.39) 20170201
Added a function check_order() to syntax.c to report as warnings elements that are not in the order given in the Standard. Modified mp.c to call this function prior to checking the syntax. (mp 2.9.40) 20170210
In html5.c, a missing asterisk at what was supposed to be the end of a comment caused the <a> tag for ftp links to not be closed. Fixing this causes FTP URLs to be rendered correctly as hypertext links.
(mp 2.9.41) 20170912
Modified keyword.c to include Turkish element names provided by Zafer Defne (USGS, Woods Hole Coastal and Marine Science Center). (mp 2.9.43) 20171003
Modified syntax.c so that a warning is given when a compound element has no children and all of its children are either optional or mandatory if applicable. (mp 2.9.44) 20171005
Modified html5.c so that a right parenthesis is not considered the end of a URL. Space must be used to end a URL; a period does not signify the end of a URL. (mp 2.9.45) 20180110
Modified decode_xml() in xml.c so that a parse error indicating the XML is not well formed will be reported in the error file as well as sent to stderr before the program exits with an error code. (mp 2.9.46) 20180125
Modified write_html_faq() in html5.c so that the tables generated for Attribute_Domain_Values have bootstrap classes assigned to them as well as a CSDGM-specific class name, for custom styling. Modified output of Currentness_Reference to keep it from running into the value of Ending_Date. (mp 2.9.47) 20180309
Modified check_Lidar_Information() in syntax.c so that Lidar_Collection_Information can be repeated. This enables data producers to describe multi-instrument data collection arrangements. (mp 2.9.48) 20180607
Modified syntax.c to flag as an error any extraneous characters following real-number values such as bounding coordinates. This catches situations in which a bounding coordinate is followed by a letter like W, E, N, or S to indicate the direction. Modified xml.c to report as an unrecognized-element error any XML tags that were not recognized. Originally the filtering of unrecognized XML tags was helpful as a way to remove software-specific tags such as those used in ArcCatalog.
VeeAnn Cross helpfully pointed out that when a standard element's tag was misspelled, mp was ignoring the element and its contents and labeling the event an error but not counting it among the errors in the summary report, potentially misleading users as to the seriousness of the problem. (mp 2.9.49) 20180807
Modified check_order() in syntax.c so that blank elements are ignored when evaluating element order. Thanks to Ben Cole (Maryland DNR) for noticing this problem. (mp 2.9.50) 20190206
Peter N. Schweitzer U.S. Geological Survey mailing address

Mail Stop 954 National Center U.S. Geological Survey 12201 Sunrise Valley Drive

Reston VA 20192 USA (703) 648-6533 (703) 648-6252 pschweitzer@usgs.gov Although this program has been used by the USGS, no warranty, expressed or implied, is made by the USGS or the United States Government as to the accuracy and functioning of the program and related program material nor shall the fact of distribution constitute any such warranty, and no responsibility is assumed by the USGS in connection therewith. Executable and source code unzip 5 https://geology.usgs.gov/tools/metadata/mp-2.9.*.zip C ANSI (1987) Source and executable code with documentation in HTML gzip -d or tar xzvfo 1.8 https://geology.usgs.gov/tools/metadata/src.tar.gz none 20190206 Peter N. Schweitzer mailing address

Mail Stop 954 National Center U.S. Geological Survey 12201 Sunrise Valley Drive

Reston VA 20192 USA (703) 648-6533 (703) 648-6252 mailto:pschweitzer@usgs.gov Content Standard for Digital Geospatial Metadata FGDC-STD-001-1998