Editorial: The scale problem

A user outside USGS asked:
The only place where the word "scale" is mentioned in the standard that I could find is in the data quality section under the list of sources of the data. For a source you can specify the source's scale denominator. However, there doesn't seem to be any place that covers the scale of the dataset itself. The scale of the source data may not be appropriate because the original data was generalized, for example. Also, the scale of the data isn't really absolute.

Reply by Peter Schweitzer on 28 July 2000:

Scale is one of the bugaboos of GI science. From a mathematical point of view one can argue that scale doesn't exist, and shouldn't be used. The problem is that people do use scale, and they find it helpful to speak of scale as a numerical quantity (although arguably they don't do arithmetic with the numbers). So people say "this is a 1:100k data set" or something like that. I think that when the FGDC standard was devised, the people involved recognized that scale isn't well defined for the digital products but that it is a recognizable characteristic of printed maps that were digitized. So scale appears in Source_Information but not in Identification_Information or Spatial_Reference_Information.

I believe the fundamental problem is actually entwined in the relationship between accuracy and precision. Without some statistical estimate of variability, accuracy and precision are hard to separate. Thinking of a geologic map written on a 1:24k, 7.5' topo quad, there are several different sources of error. The map itself is an abstraction of real topography, and the geologic features I write onto the map are positioned relative to the map's version of the area. Furthermore, I mark the map with a pencil or pen of a particular size, which usually matches the thickness of the printed lines on the map. How can the inevitable errors in position be calculated as a single number, since they have multiple causes, and those causes vary across the extent of the map?
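To give a rough sense of just one of those error sources, the width of the drawn line, here is a minimal sketch in Python. The function name and the pen widths are assumptions chosen for illustration, not values from the standard. It converts a line width on paper to the ground distance it covers at 1:24,000, and says nothing about the other sources of error.

```python
def ground_width_m(line_width_mm: float, scale_denominator: int) -> float:
    """Ground distance, in meters, covered by a drawn line of the given width."""
    # 1 mm on paper represents scale_denominator mm on the ground
    return line_width_mm * scale_denominator / 1000.0


# Assumed pen widths in millimeters, for illustration only
for width_mm in (0.25, 0.5, 1.0):
    meters = ground_width_m(width_mm, 24_000)
    print(f"{width_mm} mm at 1:24,000 covers about {meters:.0f} m on the ground")
```

So a half-millimeter line on a 1:24k quad hides about 12 meters of ground, before any of the other errors are counted.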

I conclude that scale in this sense is better seen as a classification of the map data according to the general degree of detail shown, and that it is assigned by the map's author. This more general notion of scale is best written into the text of the abstract or even the title. Another place to put it is Citation:Other_Citation_Details. You could create an extension for it, but as your analysis shows, there are several different ways to think about it: as a single number indicating the overall level of detail, or as a range indicating that data "at other scales" might be combined with these data. Either way the quantity serves as part of an admonition by the producer that the user not mix data of different degrees of detail without being aware of those differences. I see some people putting admonitions of this sort into Use_Constraints. I prefer to see Use_Constraints used for licensing and other legal restrictions on, for example, redistribution. And I find such admonitions typically a little preachy and condescending.

The quantitative elements that specify spatial resolution aren't much help, since no two people fill them out the same way (thinking of Latitude and Longitude Resolution for geographic projections, and Abscissa and Ordinate Resolution for planar map projections). I now tell people that they should omit these elements if they have trouble filling them out.

Precision by itself doesn't answer the questions that one might pose regarding the detail of the data, and accuracy is hard to describe. The problem with real accuracy estimates is that if we knew the data were off, we'd fix them. So accuracy statements tend to say only what the producer did to try to check the data, not how far off the data are from reality. Also, measures of both accuracy and precision tend to vary across the map extent, so they might be better seen as feature-level metadata (i.e. part of the data model and varying point-by-point, line-by-line, etc.).
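As one way to picture feature-level metadata, here is a minimal sketch in Python of a data model in which each feature carries its own uncertainty estimate. The class, the field names, and the numbers are all hypothetical, invented for illustration. Nothing here is defined by the FGDC standard.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    """A map feature carrying its own positional uncertainty (hypothetical model)."""
    name: str
    coordinates: list[tuple[float, float]]
    horizontal_uncertainty_m: float  # producer's estimate for this feature only


def uncertainty_range(features: list[Feature]) -> tuple[float, float]:
    """Smallest and largest per-feature uncertainty in the data set."""
    values = [f.horizontal_uncertainty_m for f in features]
    return min(values), max(values)


# Made-up features and uncertainties, for illustration only
features = [
    Feature("contact, well exposed", [(0.0, 0.0), (100.0, 50.0)], 12.0),
    Feature("contact, inferred under cover", [(100.0, 50.0), (300.0, 80.0)], 100.0),
]
print(uncertainty_range(features))
```

With a model like this, a single number for the whole data set is replaced by a range, and a user can see which features are the poorly located ones.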