Formal metadata: information and software
Reply by Peter Schweitzer on 14 March 2001:
The answer is probably "not in the way you might hope". However, the real answer depends on what you really need to accomplish. What cns does is read a text file looking for metadata element names; when it finds one at the beginning of a line, it tries to incorporate that element into its understanding of the metadata. It builds a tree (= outline) in memory, and keeps track of what branch it's on. So when it finds an element name, it asks itself whether it is allowed to make that element a new branch from the current branch, a "sibling" of the current branch, or not. It's a little more complicated than that, but the point is that it does a limited amount of thinking, and cannot really anticipate what people have written. Its purpose is to clean up a variety of specific "mistakes" that people often make when they create metadata using word processors. It does a helpful job in those specific circumstances, but there are a lot of mistakes that might be obvious to people that it can't figure out--it can exercise only limited discretion.
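To make the branch-tracking idea concrete, here is a minimal sketch in Python. It is not the actual cns code: the `ALLOWED_CHILDREN` table is a hypothetical, heavily trimmed stand-in for the FGDC hierarchy, the names `Node` and `build_tree` are invented for illustration, and requiring a colon after the element name is an assumption made to keep the example short.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical, heavily trimmed stand-in for the FGDC hierarchy:
# parent element -> element names allowed directly beneath it.
ALLOWED_CHILDREN = {
    "Metadata": ["Identification_Information", "Metadata_Reference_Information"],
    "Identification_Information": ["Citation", "Description"],
    "Citation": ["Originator", "Title"],
    "Description": ["Abstract", "Purpose"],
    "Metadata_Reference_Information": ["Metadata_Date"],
}
KNOWN = set(ALLOWED_CHILDREN) | {c for cs in ALLOWED_CHILDREN.values() for c in cs}


@dataclass
class Node:
    name: str
    parent: Optional["Node"] = field(default=None, repr=False)
    children: list = field(default_factory=list)
    text: list = field(default_factory=list)


def build_tree(lines):
    """Build an outline from text lines; return (root, lines that could not be placed)."""
    root = Node("Metadata")
    current = root          # the branch we are currently on
    unplaced = []
    for line in lines:
        # An element name counts only at the beginning of a line (assumed: followed by a colon).
        m = re.match(r"\s*([A-Za-z_]+):\s*(.*)", line)
        name = m.group(1) if m else None
        if name in KNOWN:
            # Try the current branch first (new child), then its parent (sibling),
            # then further up, until some ancestor allows this element.
            anchor = current
            while anchor is not None and name not in ALLOWED_CHILDREN.get(anchor.name, []):
                anchor = anchor.parent
            if anchor is None:
                unplaced.append(line.strip())   # limited discretion: give up on this line
                continue
            node = Node(name, parent=anchor)
            anchor.children.append(node)
            if m.group(2):
                node.text.append(m.group(2))
            current = node
        elif line.strip():
            current.text.append(line.strip())   # ordinary text belongs to the current element
    return root, unplaced


sample = [
    "Identification_Information:",
    "  Citation:",
    "    Title: Example data set",
    "    Originator: Example agency",
    "  Description:",
    "    Abstract: First line",
    "    continues here",
    "Metadata_Date: 20010314",
]
root, unplaced = build_tree(sample)
print(unplaced)
```

In this sample the last line cannot be placed, because its parent section (Metadata_Reference_Information) never appears in the text. A person would see the fix at once; a procedure of this kind does not, which is the sort of limitation described above.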
The HTML files that are often amenable to processing with cns are those that were produced by mp or by something like mp. Occasionally someone will land in a job where they are asked to clean up some older metadata on the web. With a little investigation, they discover that the text files from which the HTML pages were generated (mp generates HTML) have been discarded. Consequently they are left with only the HTML. The procedure in this case is to save the HTML from a browser as text, then run cns and clean up manually what cns missed.
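If there are many pages, saving each one from a browser gets tedious. The sketch below shows one possible way to do the "save as text" step in Python using only the standard library. It is an assumption on my part, not part of mp or cns: the names `TextExtractor` and `html_to_text` are invented here, the set of tags treated as line breaks is a guess at what such pages contain, and the output will not necessarily match what a browser produces (indentation in particular is lost).

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the visible text of an HTML page, starting a new line at block-level tags."""

    # Assumed set of tags that should begin a new line in the text output.
    BLOCK_TAGS = {"p", "br", "dl", "dt", "dd", "li", "div", "tr", "h1", "h2", "h3", "h4"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skipping = False   # True while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skipping = True
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skipping = False

    def handle_data(self, data):
        if not self.skipping:
            self.parts.append(data)


def html_to_text(html):
    """Return the page text, one non-empty line per block, with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


page = ("<html><body><dl><dt><em>Title:</em> Example data set</dt>"
        "<dt><em>Abstract:</em><dd>First line &amp; more</dd></dl></body></html>")
print(html_to_text(page))
```

The resulting text would still need to go through cns and then be cleaned up by hand, as described above.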
This situation is not common, however. I fear that your task might be more difficult. If the information in your HTML pages doesn't use the FGDC element names and is arranged quite differently from FGDC metadata, you'll need to use another approach to convert it to FGDC structure and format. In that case, here's the procedure I recommend: