Editorial: Role of the Clearinghouse

A user within USGS Geologic Division asked:
But I have a big question: what is the Clearinghouse? Who puts stuff there? Is the Clearinghouse the reason for metadata? Is there a rule that states that all digital datasets must be published on the Clearinghouse?

Reply by Peter Schweitzer on 6 Feb 2001:

What is the clearinghouse?

The Clearinghouse is a distributed catalog of metadata.

distributed means the information is kept on many different machines instead of gathered into one big database.

catalog means you can search it for things you're interested in, and when you find something that looks interesting you can read more about it, and will learn what it's about, where it came from, who made it (and why), and where to get it.

metadata are documentation of geospatial data written in a consistent way.

Because geographic information can be used in a variety of ways by many different people, it's important to know more details of the data's origin, history, and characteristics than you would need for a library book. So metadata tend to be long descriptions of the data. But it's hard to read long descriptions, especially if the descriptions made by one organization look radically different from those produced by another. Most geospatial data have a lot in common; using standard structures and formats for documenting them makes it easier for people to find the characteristics of the data that will help them understand it quickly. The same structures and formats help us write software to search for key characteristics of the data and to present the documentation in a consistent manner.
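
As a rough illustration of that last point, here is a minimal sketch of software pulling the same key characteristics out of any record that follows the standard structure. It assumes the record is encoded as XML using the FGDC standard's short element names (idinfo, citeinfo, bounding, and so on); the sample record, the KEY_FIELDS table, and the summarize function are invented for this example and do not describe any actual Clearinghouse software.

```python
import xml.etree.ElementTree as ET

# Invented sample record, for illustration only (not a real dataset).
SAMPLE_RECORD = """<metadata>
  <idinfo>
    <citation><citeinfo>
      <origin>Example Author</origin>
      <pubdate>2001</pubdate>
      <title>Hypothetical geologic map of an example quadrangle</title>
    </citeinfo></citation>
    <descript><abstract>Invented abstract used only for illustration.</abstract></descript>
    <spdom><bounding>
      <westbc>-105.5</westbc><eastbc>-105.0</eastbc>
      <northbc>40.0</northbc><southbc>39.5</southbc>
    </bounding></spdom>
  </idinfo>
</metadata>"""

# Where each key characteristic lives, relative to the <metadata> root.
# Because every record shares this layout, one table serves all records.
KEY_FIELDS = {
    "title": "idinfo/citation/citeinfo/title",
    "originator": "idinfo/citation/citeinfo/origin",
    "published": "idinfo/citation/citeinfo/pubdate",
    "abstract": "idinfo/descript/abstract",
    "west": "idinfo/spdom/bounding/westbc",
    "east": "idinfo/spdom/bounding/eastbc",
    "north": "idinfo/spdom/bounding/northbc",
    "south": "idinfo/spdom/bounding/southbc",
}


def summarize(xml_text):
    """Return the key characteristics of one record as a dict.

    Fields that are missing or empty in the record come back as None.
    """
    root = ET.fromstring(xml_text)
    summary = {}
    for name, path in KEY_FIELDS.items():
        element = root.find(path)
        has_text = element is not None and element.text
        summary[name] = element.text.strip() if has_text else None
    return summary


if __name__ == "__main__":
    for name, value in summarize(SAMPLE_RECORD).items():
        print(f"{name}: {value}")
```

The same table of paths works for a record from any organization, which is exactly what a consistent structure buys you: nobody has to write a new reader for each data producer.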

The Clearinghouse is a way for USGS and other organizations to make it easier for people to find, get, and use our information effectively.

The biggest change the scientific community has undergone in the last twenty years is the dramatic improvement in the public's ability to use technology to see what we're doing and why. Only a few decades ago, ordinary people could not easily get geospatial scientific data in a form that allowed them to display, analyze, or combine those data with data from their own experience or from another organization. Now even inexpensive personal computers can carry out complicated mathematical operations and display the results quickly and clearly.

But that technological capability does not bring understanding by magic. People still need to figure out what they have, what they need, and how any data that are offered to them might help meet those needs. The most dramatic reversal of recent times is that in order to work well, regular people (including regular scientists) have to become data managers. They need systems for keeping track of the data they have and the data they need, in the context of the problems they want to solve. USGS can help with that, but not if our solution is to always be different. We cannot expect people to use our information just because it's available, or because it's free, or because we work for the government.

In the early days of the web, an organization could craft a web page using any graphic design, without considering a systematic method for arranging its information. Each web site was like a jewel, cut and polished, and each jewel sparkled when a bright light shone on it. Now those web sites are like grains of sand on a vast beach. If you pick up any one and look at it under a microscope, you can see its beauty, but your understanding will never be able to encompass all of the sand grains on the beach. To be effective now and in the future, web sites must be part of a larger knowledge-organizing system, more systematic than type-in-the-box search engines, returning information that is easier to scan and evaluate than artistic web designs.

Who puts stuff on the clearinghouse?

Within USGS this varies from place to place. For the past four years I have striven to help people from the Geologic Division to create metadata, and I have built and maintained a clearinghouse node to serve that metadata to the public, <https://geo-nsdi.er.usgs.gov/>. Recently, working with the other divisions on Gateway to the Earth, I have been able to explore additional ways by which people can find records on this node, including pick-by-place and pick-by-pubs-series as well as type-in-the-box search. This clearinghouse node is also searchable through the national interface, but regrettably the FGDC has not focused its attention on making that interface work well.

Is the clearinghouse the reason for metadata?

No, the users are the reason for metadata. We make metadata so that their work is easier to do, and we hope that our scientific research will inform them and will cause their decisions to be more thoughtful, the resulting actions more effective, and their work more valuable.

That said, the exercise of creating thorough metadata will often uncover problems in the data that you will want to fix. Attribute values are misspelled or out of range, longitude values sometimes have the wrong sign, and bibliographic references for sources are sometimes incorrect in published texts. These errors are often caught when you make good metadata.
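
To make the kinds of errors mentioned above concrete, here is a small sketch of checks you might run while writing metadata. The function names, the western-hemisphere assumption, and the example values are all hypothetical; this is not how any USGS tool actually validates records, only an illustration of the idea.

```python
def check_bounding(west, east, south, north, expect_western_hemisphere=True):
    """Return a list of problems found in bounding coordinates (decimal degrees).

    expect_western_hemisphere is an assumption made for this sketch: data for
    most of the United States should have negative longitudes, so a positive
    value usually means the sign was dropped.
    """
    problems = []
    if not -180 <= west <= 180 or not -180 <= east <= 180:
        problems.append("longitude out of range")
    if not -90 <= south <= 90 or not -90 <= north <= 90:
        problems.append("latitude out of range")
    if south > north:
        problems.append("south bounding latitude exceeds north")
    if expect_western_hemisphere and (west > 0 or east > 0):
        problems.append("positive longitude in western-hemisphere data")
    return problems


def check_attribute_values(values, allowed):
    """Return the sorted set of values that are not in the documented domain."""
    return sorted({value for value in values if value not in allowed})


if __name__ == "__main__":
    # Longitudes entered without the minus sign.
    print(check_bounding(105.5, 105.0, 39.5, 40.0))
    # A misspelled rock type among otherwise valid attribute values.
    print(check_attribute_values(["granite", "granit", "basalt"],
                                 {"granite", "basalt"}))
```

Checks like these are only possible once you have written down what the valid values are, which is part of what good metadata forces you to do.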

Is there a rule that states that all digital datasets must be published on the Clearinghouse?

Yes, Executive Order 12906 mandates that all geospatial data be documented with FGDC metadata and that the metadata be available through the Clearinghouse. Within the USGS, Survey Manual chapter 504.1 reiterates this requirement. Within the Geologic Division, Policy 6 also reiterates this requirement.

It doesn't always work. Our publications people are given conflicting instructions (do it cheap, get it out the door vs. do it right), and I fear that despite my entreaties the managers of those groups and of their putative clients, the Team Chief Scientists, do not value consistency in structure and form for geospatial data. The best work, as you might expect, is done by people who care. I care too, so I often find myself making significant revisions to metadata that people send me or that appear on our web sites. For that reason the metadata on geo-nsdi will often be different from the metadata that are provided on other web sites.