DBFmeta: A tool to help document DBF files

DBFmeta is a software tool designed to facilitate the documentation of data contained in DBF files. These data files are increasingly common because they are used to store attributes of geographic features in shapefiles used and produced by ESRI products. DBFmeta comes with mp, a parser for formal metadata, and is most effectively used with Tkme, an editor for formal metadata.

Note that the program you run is called dbfmeta (all lower-case letters), but, as with Tkme, I refer to it using the mixed-case form DBFmeta.

Download

The best way to get DBFmeta is by downloading an appropriate package from <https://geology.usgs.gov/tools/metadata/>.
Users of Microsoft Windows can also download the executable directly: dbfmeta.exe.

Usage

Basic usage
Run DBFmeta like this:
dbfmeta myfile.dbf -o temp.met
That command will cause dbfmeta to read the existing file myfile.dbf and describe its contents as a Detailed_Description in terms compatible with the Content Standard for Digital Geospatial Metadata of the US FGDC.
Interactive mode
DBFmeta can be run in "interactive mode" in which the program asks the user for information that is not carried by the DBF format. To engage this behavior, include the command-line switch -i.
Use of non-standard elements
DBFmeta includes in its output three elements that are not part of the FGDC standard itself but are included in the ESRI profile that is used by Arc8. These elements are
  • Attribute_Type
  • Attribute_Width
  • Attribute_Precision
If you want DBFmeta to not include these elements in its output, add the command-line switch -strict.
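To show where values of this kind can come from, here is a minimal Python sketch that reads the field descriptors stored in a DBF file header. It is not DBFmeta's own code. The function name read_dbf_fields is hypothetical, and the idea that the descriptor's type code, width, and decimal count correspond to Attribute_Type, Attribute_Width, and Attribute_Precision is an assumption made for illustration.

```python
import struct


def read_dbf_fields(path):
    """Return a list of (name, type, width, decimals) tuples from a DBF header.

    Hypothetical helper for illustration only; not part of DBFmeta.
    """
    fields = []
    with open(path, "rb") as f:
        header = f.read(32)
        if len(header) < 32:
            raise ValueError("file too short to be a DBF file")
        # Bytes 8-9 of the header hold its total length (little-endian).
        (header_len,) = struct.unpack_from("<H", header, 8)
        # Field descriptors are 32 bytes each; a 0x0D byte ends the list.
        while f.tell() < header_len:
            desc = f.read(32)
            if len(desc) < 32 or desc[0] == 0x0D:
                break
            name = desc[:11].split(b"\x00", 1)[0].decode("ascii", "replace")
            ftype = chr(desc[11])   # e.g. C (character), N (numeric)
            width = desc[16]        # field width in bytes
            decimals = desc[17]     # digits after the decimal point
            fields.append((name, ftype, width, decimals))
    return fields
```

Calling read_dbf_fields("myfile.dbf") would list one tuple per field, which may help when checking the values DBFmeta reports.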
Specifying an output file
DBFmeta will try to write its output into a disk file named dbfmeta.out. If that file already exists or if it cannot be created, you will be prompted to enter another file name. You may specify the name of the output file on the command line using the -o switch.
If the file name you specify ends with .sgml or .xml, the output file will be written using XML tags. No SGML or XML declaration or processing instructions will be included, however. This output form will work as either SGML or XML.
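The naming rule can be expressed as a small check. The Python sketch below is only an illustration of the rule as described here; the function name output_format is hypothetical, and ignoring letter case is an assumption, since the behavior for names such as TEMP.XML is not documented.

```python
def output_format(output_name="dbfmeta.out"):
    """Return "xml" if the name ends with .sgml or .xml, otherwise "text".

    Assumption: the comparison ignores case; DBFmeta itself may differ.
    """
    if output_name.lower().endswith((".sgml", ".xml")):
        return "xml"
    return "text"
```

With this rule, temp.met and the default dbfmeta.out give plain output, while temp.xml and temp.sgml give tagged output.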
Syntax
The full command-line syntax for DBFmeta is therefore
dbfmeta [-i] [-strict] input_file [-o output_file]
where square brackets indicate an option that may be omitted.
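If you call DBFmeta from a script, the syntax above can be assembled programmatically. The following Python sketch is one way to do that; the function names build_dbfmeta_command and run_dbfmeta are hypothetical, and the sketch assumes that the dbfmeta executable is on your PATH. Interactive mode is left to the terminal, since the program then prompts for input.

```python
import subprocess


def build_dbfmeta_command(input_file, output_file=None,
                          interactive=False, strict=False):
    """Build the argument list: dbfmeta [-i] [-strict] input_file [-o output_file]."""
    cmd = ["dbfmeta"]
    if interactive:
        cmd.append("-i")
    if strict:
        cmd.append("-strict")
    cmd.append(input_file)
    if output_file is not None:
        cmd += ["-o", output_file]
    return cmd


def run_dbfmeta(input_file, **options):
    """Run dbfmeta and raise CalledProcessError if it exits with an error.

    Assumes dbfmeta is installed and reachable through PATH.
    """
    return subprocess.run(build_dbfmeta_command(input_file, **options), check=True)
```

For example, build_dbfmeta_command("myfile.dbf", output_file="temp.met") produces the same command shown under Basic usage.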

Output

What it is and what to do with it
DBFmeta creates a snippet of metadata that can be pasted into the left-side window of Tkme if Entity_and_Attribute_Information is selected there. After including the output of DBFmeta into a record in this manner, the metadata can be completed using Tkme's normal editing capability.
What you need to write in yourself
DBFmeta has no way of knowing some information, so it includes some empty elements in the output that the user must fill in:
Entity_Type_Definition
What are the things that each row of the dbf file describes?
Attribute_Definition
What do the attribute labels really mean?
Enumerated_Domain_Value_Definition
What do textual or numeric field values mean? See "Metadata in Plain Language" for a discussion that may help.
Attribute_Units_of_Measure
What are the units of the numerical values?
Attribute_Measurement_Resolution
What is the resolution of the numerical values?
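Before editing, it can help to list which of these elements are still empty. The Python sketch below does that for plain-text output. It assumes the output uses the indented "Element_Name: value" layout that mp reads, with an empty element appearing as a name followed by a colon and nothing else; the function name find_empty_elements is hypothetical.

```python
import re

# The elements that the user is expected to fill in by hand.
ELEMENTS_TO_FILL = (
    "Entity_Type_Definition",
    "Attribute_Definition",
    "Enumerated_Domain_Value_Definition",
    "Attribute_Units_of_Measure",
    "Attribute_Measurement_Resolution",
)

# Matches a line holding only one of those names, a colon, and no value.
_EMPTY = re.compile(r"^\s*(%s):\s*$" % "|".join(ELEMENTS_TO_FILL))


def find_empty_elements(text):
    """Return (line_number, element_name) for each listed element with no value."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = _EMPTY.match(line)
        if match:
            hits.append((number, match.group(1)))
    return hits
```

Reading temp.met into a string and passing it to find_empty_elements gives a checklist of the lines that still need attention.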
What it doesn't include
DBFmeta omits those elements that are used to indicate the published source of a feature class, attribute definition, or category label:
  • Entity_Type_Definition_Source
  • Attribute_Definition_Source
  • Enumerated_Domain_Value_Definition_Source

Revision history

Version  When        Who  What
1.14     2013-12-03  PNS  Added htmlspecialchars() function to properly translate <, >, and & in XML output of values listed in Enumerated_Domain as text. Added a command-line option -udom label to enable the user to avoid listing distinct values of a field and instead use Unrepresentable_Domain for them. Thanks to Rob Norheim for suggesting these.
1.13     2004-04-14  PNS  Fixed bug in which XML tag was not properly closed. Thanks to John Graves for letting me know about this problem.
1.12     2004-02-13  PNS  Fixed code calculating the maximum number of nonblank characters in a string field. Added code to read integer fields as long long (64-bit) signed integers. This is more complex but might be more reliable for long numerical code attributes the DBF file thinks are integer. Thanks to Hugh Phillips for pointing out the issue with long integers.
1.11     2003-03-12  PNS  Modified to allow the user to omit descriptions of specific fields by name.
1.10     2002-07-29  PNS  For character fields, find the length of the longest value, treating trailing blanks as if they didn't exist. Report this to the user.
1.9      2002-06-14  PNS  Fixed bug introduced by last fix; enlarge the space allocated for each string value by one to hold the terminating nul byte (D'oh!).
1.8      2002-06-11  PNS  Fixed code so that the single real number can occur more than once and still be considered a special value. Modified dbfopen.c to not strip space from string values by default.
1.7      2002-05-28  PNS  Open the input file before worrying about naming the output file. Fixed bug that caused the verbose statistical report to not show integer values. Describe in an Enumerated_Domain a singleton real number. Thanks to Hugh Phillips for catching these problems.
1.6      2002-04-11  PNS  Modified to output SGML or XML if the output file name looks like it has that extension.
1.5      2002-02-06  PNS  Modified to open dbf file read-only and to report duplicate names to stdout rather than stderr.
1.4      2001-09-26  PNS  Added code to report the number of blank records in a string field.
1.3      2001-02-27  PNS  Revised handling of attribute information by reading all fields at once. This enables me to check for duplicate field names.
1.2      2001-01-26  PNS  Added code to skip attributes whose names match those that are intrinsic to Arc/Info. Fixed bug in counting unique string values.
1.1      2001-01-25  PNS  Modified handling of enumerated domains to give some statistical information about the values.
1.0      2001-01-24  PNS  First release.

To-do list: known bugs and limitations

  1. When a field contains real numbers, the current code does not correctly discover the presence of one or two negative values and call them out as special values using an Enumerated_Domain element.

Technical contact:

    Peter N. Schweitzer
    Mail Stop 954, National Center
    U.S. Geological Survey
    Reston, VA 20192

    Tel: (703) 648-6533
    FAX: (703) 648-6252
    Email: pschweitzer@usgs.gov