Using the USGS Unit Values RDB Class


A brief introduction

Looking for an easy way to acquire real-time streamflow data collected by the U.S. Geological Survey? The USGS Unit Values RDB PHP Class, which I wrote, may be the way to go. The USGS serves near real-time streamflow and other data collected from the nation's streams, lakes and waterways in old-fashioned tab-delimited files. The class converts those files into more user-friendly forms like XML and Javascript Object Notation (JSON). A programmer of modest skills can then put this data to myriad uses. Check out some of the examples below. If it interests you, download the USGS Unit Values RDB class and get started!

Disclaimer: This class is not a product of the U.S. Geological Survey. This class is in the public domain, so it can be used for any purpose whatsoever. However, it comes with no assurance of reliability or correctness, and no warranty is expressed or implied.

What are unit values?

Unit values are time-series measurements or time-series calculations. The U.S. Geological Survey has many automated gages deployed in the field that regularly measure the nation's water (such as streamflow conditions) and in some cases meteorological data too. Each measurement is a unit value, i.e. a regular and periodic measurement of some type associated with a time and location (site). USGS unit values are popular because they report recent conditions, such as water levels in lakes or flow rates of streams and rivers. Unit values are also referred to as "instantaneous values". Thousands of regularly updated RDB files are hosted on the USGS Water Data for the Nation (NWISWeb) site. These files are of great interest and are frequently downloaded by the public.

What is an RDB file?

The U.S. Geological Survey publishes many datasets in RDB format. An RDB file is a tab-delimited data file with structured comments embedded in it which, when parsed, provide valuable metadata. It also contains metadata in some non-comment lines. Here is an example of a simple RDB file.

# -------------------------------------------
# Documentation lines. These describe and
# identify the rdb file contents.
# -------------------------------------------
NAME   COUNT  TYP  AMT   OTHER   RIGHT
6s     5n     3s   5n    8s      8s
Bill   44     A    133   Another This
John   44          23    One     Is
Gary   77          77    Here    On
Mar    77     B    244   And     The
Greg   77     D    1111  So      Right

As you can see, comment lines begin with a # followed by a space. Comments are followed by a header line where the columns of data are named. The header line is followed by column type/length information, typically indicating the maximum size in characters that the column will contain followed by "s" or "n" indicating the field contains either a string or a number. With this basic information it is fairly straightforward for a programmer to write a program to read the data then use it for another purpose.
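To make that concrete, here is a minimal sketch of such a reader. It is independent of the class and is not how rdb.php itself works. The function name parse_simple_rdb is hypothetical, and the sketch assumes the fields are separated by real tab characters.

```php
<?php
// Hypothetical helper, not part of rdb.php: split RDB text into comments,
// column names, column formats, and data rows keyed by column name.
function parse_simple_rdb(string $text): array
{
    $result = ['comments' => [], 'columns' => [], 'formats' => [], 'rows' => []];
    foreach (preg_split('/\r\n|\r|\n/', $text) as $line) {
        if ($line === '') {
            continue;                                  // skip blank lines
        }
        if ($line[0] === '#') {
            $result['comments'][] = ltrim(substr($line, 1));
            continue;
        }
        $fields = explode("\t", $line);
        if (!$result['columns']) {
            $result['columns'] = $fields;              // first non-comment line: column names
        } elseif (!$result['formats']) {
            $result['formats'] = $fields;              // second non-comment line: type/length codes
        } else {
            // Pad or trim so every row has exactly one value per column
            $count  = count($result['columns']);
            $fields = array_slice(array_pad($fields, $count, ''), 0, $count);
            $result['rows'][] = array_combine($result['columns'], $fields);
        }
    }
    return $result;
}

$sample = "# Documentation line\nNAME\tCOUNT\tTYP\n6s\t5n\t3s\nBill\t44\tA\nJohn\t44\t\n";
$rdb = parse_simple_rdb($sample);
echo count($rdb['rows']), ' rows, first name: ', $rdb['rows'][0]['NAME'], "\n";
```

Empty fields, such as John's missing TYP value, come through as empty strings.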

What is the PHP RDB Unit Values Class?

The PHP RDB Unit Values Class is a PHP class library designed to make it easier to consume unit value RDB files provided by the USGS Water Data for the Nation (NWISWeb) site. On its most basic level it transforms the data and metadata in these files into more useful, 21st century formats. Specifically it can read a unit values RDB file and provide its output in either Extensible Markup Language (XML) or in Javascript Object Notation (JSON). The class also provides filtering and sorting options to make it easier to grab data of interest. For example, you may be only interested in the most recent streamflow measurement. Through sorting and filtering, you can retrieve only the most recent unit value instead of all the unit values for a site for a particular day.

Essentially, the PHP Unit Values RDB class acts as a proxy consuming relatively unfriendly unit value RDB files and transforming them into friendly XML or JSON.

Examples

To get your feet wet, try these examples. I will explain how they work in a bit.

How the class works

As you can infer, the class requires PHP. PHP is a popular and robust scripting language typically found on web servers. If you have a web server, it likely already has PHP on it. Most people who use the class will want to re-serve the USGS water data on their own web site, perhaps mixing it with data of their own or creating graphics from the data. However, PHP can also be installed on most desktop computers. If you are developing an application, you may prefer to install PHP on your desktop computer and "move it to production" on a real web server when it is ready to deploy.

PHP is an easy language to learn and fortunately you do not need to be a PHP expert to write programs with the class. If you are comfortable with Java, C, Python or many other block structured languages, PHP will seem easy. Unfortunately to use the class you will have to write a little PHP. This is because the PHP class is just a class, not an object. A class is like a blueprint to a car, rather than the actual car itself. So you will need a PHP program that creates a PHP RDB Unit Values object. It's pretty simple, but of course to be useful your program will probably need to read some input and write some output so you get a desired set of information. Here is a short program that demonstrates a simple use of the class. (This is the actual server side script called rdbajax.php used in the second example above.)

<?php

	include 'rdb.php';					// This indicates where to find the class to the program.
	
	// Get the site number from the client and encode it for safe use in a URL
	$my_site = isset($_GET['site']) ? urlencode($_GET['site']) : '';
	
	// Construct the URL
	$my_url = 'http://waterdata.usgs.gov/nwis/uv?cb_00060=on&format=rdb&period=1&site_no=' . $my_site;

	$my_rdb = new rdb($my_url);			// Fetch this RDB file and load it into an object
	
	$my_rdb->show_columns = array(2,3);	// Only interested in columns 3 and 4 of the output. Arrays start with 0 in PHP
	$my_rdb->show_order = array(SORT_NUMERIC, SORT_NUMERIC);	// Sort column 3 as a number, then sort by column 4 as a number
	
	// Finally, output as JSON to the client
	echo $my_rdb->outputJSON(TRUE, TRUE, FALSE, TRUE); 			// Output the data as JSON

?>

Download and Installation

  1. If you don't have PHP on your development machine, download it and install it from php.net. You might also want to read a PHP tutorial.
  2. Download rdb.zip.
  3. Expand the archive using tools like WinZip (Windows) or unzip (Unix and Linux). It is faster if you do this on your web server, otherwise after expanding the files locally you will have to move them individually to your web server. You will need full directory permissions wherever the files are stored.
  4. When you unzip the rdb.zip file, it should create a cache directory where it was unzipped. Create this directory on your web server if it does not exist and give it public write permissions (777). It is assumed that the cache directory will exist in the same directory where rdb.php exists. This can be changed if needed by editing rdb.php. The cache directory, as you might expect, holds copies of recently retrieved RDB files. Redundantly fetching the same RDB files from USGS servers is not a good idea, and can get your site blacklisted. Since most sites are updated no more than hourly, by default the RDB file will not be refetched unless the cache copy is more than an hour old. Retrieving data from a cached RDB file is also blazingly fast.
  5. The only file you absolutely need is rdb.php. The other files contain the examples shown on this site. To try it out, run the rdb2xml.php program or the flot-test.html program. (You will want to edit line 397 of flot-test.html to make it work on your server.) Assuming you have a recent version of PHP and a current edition of a popular browser (which supports AJAX), these programs should run just as they do on this site. You will need to use a URL pointing to your web server, for example: http://example.com/rdb2xml.php.

Using the class

Setting the rdb.php class defaults

You may want to edit lines 52-54 in rdb.php as you may not like the defaults.

	private $cache_dir = './cache';	// Location of the cache directory. If it does not exist the class will attempt to create it if caching is desired
	private $cache_expir_days = 7;	// Number of days to leave a cache file in the cache directory before deleting
	private $cache_time = 3600;	// Number of seconds to use cached RDB file before refetching
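To illustrate what a setting like $cache_time controls, here is a minimal sketch of a freshness test based on a file's modification time. This is not the class's actual caching code. The function cache_is_fresh and its parameters are hypothetical.

```php
<?php
// Hypothetical helper: report whether a cached file is young enough to reuse.
// $now is injectable so the check can be exercised without waiting.
function cache_is_fresh(string $path, int $max_age_seconds, ?int $now = null): bool
{
    if (!is_file($path)) {
        return false;                          // nothing cached yet, so a fetch is needed
    }
    $now = $now ?? time();
    return ($now - filemtime($path)) < $max_age_seconds;
}

$cache_file = tempnam(sys_get_temp_dir(), 'rdb');  // stand-in for a cached RDB file
$written    = filemtime($cache_file);

var_dump(cache_is_fresh($cache_file, 3600, $written + 60));    // one minute old: reuse it
var_dump(cache_is_fresh($cache_file, 3600, $written + 7200));  // two hours old: refetch
unlink($cache_file);
```

With the default of 3600 seconds, a copy written a minute ago is reused and a copy written two hours ago is refetched.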

Instantiating the rdb.php class

The example above shows how to instantiate (create an object from) the rdb class. The mechanics are simple:

  1. Use an include statement that tells your program where to find the rdb.php class.

    include 'rdb.php'; // This indicates where to find the class to the program

  2. Instantiate the object holding the data with the new keyword, passing as an argument to the class a URL that returns a USGS Water Data for the Nation Unit Values RDB file. Note that it is possible to have one RDB file return data for more than one site or parameter. See this document.

    // Construct the URL
    $my_url = 'http://waterdata.usgs.gov/nwis/uv?cb_00060=on&format=rdb&period=1&site_no=' . $my_site;
    $my_rdb = new rdb($my_url);    // Fetch this RDB file and load it into an object

When you instantiate, the RDB file is fetched from the USGS web server and parsed into a series of PHP arrays and variables. It may take a couple seconds to acquire and parse the file. In most instances this is quite fast.

  3. Immediately after instantiating the object, make sure it loaded correctly. This is done by checking the valid property of the object. It should be true. If not, an error has occurred and your program should take appropriate action.

    // Immediately after fetching, always check your object's valid property.
    // If FALSE, the RDB file is not valid, is empty, or some error occurred.
    if (!$my_rdb->valid)
    {
        echo 'Error: RDB file ' . $my_url . ' does not exist or is formatted incorrectly';
        exit;
    }
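If you build URLs for several sites, a small helper keeps the query string tidy and properly encoded. The sketch below is an illustration only. The function build_uv_url is hypothetical, and it assumes other parameter codes follow the same cb_ naming pattern seen in the URL above.

```php
<?php
// Hypothetical helper: assemble a unit values RDB URL from its query parameters.
// Assumes parameter codes are requested with a "cb_<code>=on" pair, as in the
// example URL in this section.
function build_uv_url(string $site_no, string $parameter_cd = '00060', int $period = 1): string
{
    $query = http_build_query([
        'cb_' . $parameter_cd => 'on',
        'format'              => 'rdb',
        'period'              => $period,
        'site_no'             => $site_no,
    ], '', '&');
    return 'http://waterdata.usgs.gov/nwis/uv?' . $query;
}

echo build_uv_url('01646500'), "\n";
```

Because http_build_query encodes each value, unexpected characters in a site number cannot alter the rest of the query string.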

Whether or not an error occurred, the object contains an error structure that provides more detail on the nature of any error. You can access it programmatically if you choose. For example, when output as XML you will see this structure:

<error_info>
<error/>
<error_explanation/>
</error_info>

In this case empty tags indicate no error occurred.

With JSON you will see a similar structure, which can be easily queried with Javascript.

	"error_info" :  	
		{
			"error" : 0,
			"error_explanation" : null
		},
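The same structure can be checked from PHP if another server-side program consumes the JSON. The sketch below works on a hand-written stand-in for the output, not on real output from the class. The function rdb_json_error is hypothetical, and treating any non-zero error value as a failure is an assumption.

```php
<?php
// Hand-written stand-in for JSON produced by outputJSON(); only the
// error_info structure shown above is assumed.
$json = '{"error_info": {"error": 0, "error_explanation": null}, "records": []}';

// Hypothetical helper: return NULL when no error was reported, otherwise a message.
function rdb_json_error(string $json): ?string
{
    $data = json_decode($json, true);
    if (!is_array($data) || !isset($data['error_info'])) {
        return 'Response is not valid JSON or has no error_info structure';
    }
    $info = $data['error_info'];
    if (empty($info['error'])) {
        return null;                           // 0 or missing: no error reported
    }
    return (string) ($info['error_explanation'] ?? 'Unknown error');
}

var_dump(rdb_json_error($json));
```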

Getting XML output

Once the class is instantiated, to get XML, call the outputXML method of the object you created. XML can be captured to a variable if desired but is typically just echoed.

 	echo $my_rdb->outputXML(); 			// Output the data as XML

If you look at the class you will see there are a variety of switches you can use with the function. The function also indicates the defaults used for each parameter.

 	public function outputXML ($suppress_headers=FALSE, $show_sites=TRUE, $iso_dates=TRUE, $show_dds=TRUE, $column_formats=TRUE, $compact=TRUE)

Comments in the class indicate what these parameters do:

	// Parameters:
	// $suppress_headers == TRUE, then XML headers will be suppressed. Use if you intend to capture output to a variable
	// $show_sites == TRUE, site description information will appear in the XML tree if it is provided in the RDB File. This can be helpful
	//    if site description is desired.
	// $iso_dates == TRUE then any USGS NWIS date/time strings in a YYYY-MM-DD HH:MM format are converted into ISO-8601 format YYYY-MM-DDTHH:MM
	// $show_dds == TRUE, data descriptors and parameters information will appear in the XML tree if they were provided in the RDB file.
	// $column_formats == TRUE, column format information will appear in the XML tree.
	// $compact == TRUE, newlines and tabs are removed from output except from comment lines (needed for visibility of legal information)
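The $iso_dates conversion is simple to picture. Here is a minimal sketch of that kind of transformation, written independently of the class. The function nwis_to_iso8601 is hypothetical.

```php
<?php
// Hypothetical helper: turn "YYYY-MM-DD HH:MM" into "YYYY-MM-DDTHH:MM".
// Values that do not look like an NWIS date/time are returned unchanged.
function nwis_to_iso8601(string $value): string
{
    if (preg_match('/^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2})$/', $value, $m) === 1) {
        return $m[1] . 'T' . $m[2];
    }
    return $value;
}

echo nwis_to_iso8601('2009-03-16 00:00'), "\n";
echo nwis_to_iso8601('3510'), "\n";            // not a date, so left alone
```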

Getting JSON output

Once the class is instantiated, the outputJSON method will render validated JSON. JSON is normally simply echoed, but it can be captured to a variable. If you capture it to a variable but do not plan to echo the data out via HTTP, you should set the $suppress_headers parameter to TRUE.

Javascript works best with timeseries data when time is provided as a Javascript timestamp. By default the class will attempt to provide times as Javascript timestamps, which greatly eases otherwise onerous programming chores like plotting time series values on a graph.

	echo $my_rdb->outputJSON();			// Output the data as JSON

If you look at the class you will see there are a variety of switches you can use with the function. The function also indicates the defaults used for each parameter.

	public function outputJSON ($js_timestamps=TRUE, $use_arrays=TRUE, $suppress_headers=FALSE, $pretty=FALSE, $show_sites=TRUE, $show_dds=TRUE, $column_formats=TRUE)

Comments in the class indicate what these parameters do:

	// Parameters:
	// $js_timestamps == TRUE, if the value appears to be a NWIS date in the format YYYY-MM-DD HH:MM, the date/time
	//    will be converted into a Javascript timestamp, which is a UNIX timestamp expressed in milliseconds. This eases plotting
	//    timeseries values in packages like flot.
	// $use_arrays == TRUE, if this is true, records are written as arrays instead of objects. This facilitates plotting data.
	// $suppress_headers == TRUE, if true headers will not be output. It is assumed instead the JSON will be read by the calling
	//    program into a variable for further processing.
	// $pretty == TRUE, then tabs and newlines will be inserted so that it shows on the screen in a logically indented manner.
	// $show_sites == TRUE, site description information will appear in the structure if it exists in the RDB file. This can be helpful if
	//    site description is desired.
	// $show_dds == TRUE, data descriptors and parameters information will appear in the structure if they were provided in the RDB.
	// $column_formats == TRUE, column format information will appear in the structure.
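For readers curious about the $js_timestamps conversion, here is a minimal sketch written independently of the class. The function nwis_to_js_timestamp is hypothetical. Reading the clock time as UTC is an assumption, chosen because it reproduces the datetime value in the sample records on this page (2009-03-16 00:00 becomes 1237161600000). The sketch also assumes 64-bit PHP integers.

```php
<?php
// Hypothetical helper: convert "YYYY-MM-DD HH:MM" to a Javascript timestamp
// (milliseconds since the UNIX epoch). Anything else is returned unchanged.
function nwis_to_js_timestamp(string $value)
{
    // The leading "!" resets unspecified fields (such as seconds) to zero
    $date = DateTime::createFromFormat('!Y-m-d H:i', $value, new DateTimeZone('UTC'));
    if ($date === false) {
        return $value;                         // not a date/time string
    }
    return $date->getTimestamp() * 1000;       // seconds to milliseconds
}

var_dump(nwis_to_js_timestamp('2009-03-16 00:00'));
var_dump(nwis_to_js_timestamp('3510'));
```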

Sorting

You can use the rdb2xml.php program to get a raw view of the data you want as either XML or JSON. With no sorting option, data appears as it does in the RDB file, which is typically from the oldest to latest values.

When XML is output, the data itself can be found inside of the <records> tag. For XML, if you examine the structure you will see a <record> tag and inside each <record> tag will be a set of fields describing the data for that record. The field names will vary based on what is fetched. (If the RDB column name begins with a number, it is prepended with "field_" when XML output is desired. This is because XML tags cannot start with a number.) For example:

<record>
   <agency_cd>USGS</agency_cd>
   <site_no>01646500</site_no>
   <datetime>2009-03-16T00:00</datetime>
   <field_01_00060>3510</field_01_00060>
   <field_01_00060_cd/>
</record>
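The renaming rule for column names that begin with a digit can be sketched in a few lines. This is an illustration of the rule described above, not code taken from the class, and the function xml_tag_name is hypothetical.

```php
<?php
// Hypothetical helper: make an RDB column name usable as an XML tag name
// by prefixing "field_" when the name starts with a digit.
function xml_tag_name(string $column): string
{
    return preg_match('/^[0-9]/', $column) === 1 ? 'field_' . $column : $column;
}

echo xml_tag_name('01_00060'), "\n";   // renamed, since it starts with a digit
echo xml_tag_name('site_no'), "\n";    // unchanged
```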

When JSON is output, look for an array called "records". For JSON, generally each element of the records array corresponds to a data record. For example:

 {
 	"agency_cd" : "USGS", 
 	"site_no" : "01646500", 
 	"datetime" : 1237161600000, 
 	"01_00060" : 3510, 
 	"01_00060_cd" : null 
 },

Whether XML or JSON is output, the number of fields within the record is the same for a given RDB file.

To have the RDB class sort the data, you must first inspect a typical record and note the positions in the structure of the field of interest. In the case of the example above, "datetime" is an obvious candidate for sorting. Rather than see date times ordered from earliest to latest, you may want to see them from latest to earliest. To sort properly you have to understand a few things about PHP and the RDB class:

To sort the data, the RDB class has three properties that must be set consistently. If you are sorting on two columns, you must specify an array of two elements for the sort_columns property, and arrays of two elements for the sort_order and sort_type properties, as explained in the RDB class.

	public $sort_columns = NULL;	// An array with the column sort sequence, ex: array(2,3) means sort by column 3 first, then by column 4 (0 is first column)
	public $sort_order = NULL;	// An array with the column sort order sequence, ex: array(SORT_ASC, SORT_DESC) means sort by column 3 in ascending sequence,
					// then sort column 4 in descending sequence. (0 is first column). Number of elements in array must be consistent with
					// $sort_columns. For more information, see http://www.php.net/sort. Please use the constants in the array, ex:
					// $my_rdb->sort_order = array(SORT_ASC, SORT_DESC);
	public $sort_type = NULL;	// An array with the column sort type sequence, ex: array(SORT_STRING, SORT_NUMERIC) means sort by column 3 as a string,
					// then sort column 4 as a number. (0 is first column). Number of elements in array must be consistent with $sort_columns.
					// For a full list of allowed values, see http://www.php.net/sort. Please use the constants in the array, ex:
					// $my_rdb->sort_type = array(SORT_STRING, SORT_NUMERIC);

Using the example above, after instantiation, to sort the data so that the most recent measurements appear first, you would need code similar to this:

	$my_rdb->sort_columns = array(2);	// Actually the third field, where 0 is the first field
	$my_rdb->sort_order = array(SORT_DESC);	// SORT_DESC is the PHP constant used with sort
	$my_rdb->sort_type = array(SORT_STRING);	// SORT_STRING is used because the column was identified as a string in the RDB file, and times generally appear as strings like 2009-03-16 00:00

Once the sort criteria is set up, simply call the sort method of your object to sort the data. The method will return FALSE if your sort parameters were incorrect. Use logic like:

	if (!$my_rdb->sort())
	{
		echo 'Error: Bad sorting parameters were specified.';
		exit;
	}
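The SORT_DESC and SORT_STRING constants come from PHP's own sorting functions. The sketch below shows, on plain arrays and without the class, how a descending string sort on the third column behaves. It uses PHP's array_multisort, which may or may not be what the class uses internally. The 00:15 and 00:30 rows and their flow values are invented for illustration.

```php
<?php
// Plain arrays standing in for parsed records: agency, site, datetime, flow.
$records = [
    ['USGS', '01646500', '2009-03-16 00:00', 3510],
    ['USGS', '01646500', '2009-03-16 00:30', 3500],
    ['USGS', '01646500', '2009-03-16 00:15', 3520],
];

// Pull out column 2 (the third field) and sort on it, newest first.
// array_multisort reorders $records to match the sorted key column.
$datetimes = array_column($records, 2);
array_multisort($datetimes, SORT_DESC, SORT_STRING, $records);

foreach ($records as $record) {
    echo $record[2], ' => ', $record[3], "\n";
}
```

Sorting the datetime as a string works because the YYYY-MM-DD HH:MM layout puts the most significant parts first.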

Filtering

As a general practice you want to sort before you filter, unless you like the default order in the RDB file. If you do not filter, all unit values in the RDB file will be provided. Filtering is primitive: if a limit is specified, only the first X values of the result set are output when the outputXML or outputJSON functions are called.

To filter, set your object's limit property to a positive whole number. Example:

 	$my_rdb->limit = 5; // Limit to first five records
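The effect of a limit is easy to picture on plain arrays. The sketch below is an illustration of the described behavior, not the class's code. The function limit_records is hypothetical and the flow values are invented.

```php
<?php
// Hypothetical helper mirroring the described behavior: keep only the first
// $limit records. A NULL or non-positive limit leaves the set untouched.
function limit_records(array $records, ?int $limit): array
{
    if ($limit === null || $limit < 1) {
        return $records;
    }
    return array_slice($records, 0, $limit);
}

// Records already sorted newest first
$sorted = [
    ['2009-03-16 00:30', 3500],
    ['2009-03-16 00:15', 3520],
    ['2009-03-16 00:00', 3510],
];

$latest = limit_records($sorted, 1);
echo $latest[0][0], ' => ', $latest[0][1], "\n";
```

Sorting in descending order and then limiting to one record is how you would retrieve only the most recent unit value.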

Writing AJAX Applications

The flot-test.html program that is included in the archive demonstrates one possible way of writing a dynamic application. There are many approaches and Javascript libraries that facilitate AJAX so choose one that works for you.

In general, if your Javascript makes a request to the server then you need to write a companion PHP program that provides a response that your Javascript can easily interpret. Typically, JSON is the most Javascript programmer-friendly format.
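A companion program should also check what the client sends before using it. Here is a minimal sketch of that idea. The function clean_site_number is hypothetical, the rule that a site number is 8 to 15 digits is an assumption, and the error reply simply mirrors the error_info structure shown earlier.

```php
<?php
// Hypothetical helper: accept only a plausible site number from the request.
// The 8 to 15 digit rule is an assumption about USGS site numbers.
function clean_site_number($raw): ?string
{
    if (!is_string($raw)) {
        return null;
    }
    $raw = trim($raw);
    return preg_match('/^[0-9]{8,15}$/', $raw) === 1 ? $raw : null;
}

$site = clean_site_number($_GET['site'] ?? null);
if ($site === null) {
    // Reply in a shape the Javascript can test before plotting anything
    echo json_encode(['error_info' => ['error' => 1, 'error_explanation' => 'Invalid site number']]);
} else {
    echo json_encode(['site_no' => $site]);    // a real script would call the class here
}
```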

If you examine the source code for flot-test.html, you will see this key bit of logic that actually sends the AJAX request:

	url = encodeURI('http://' + document.domain + '/WebServices/rdbajax.php?&site=' + document.getElementById("site").value);
	xmlHttp.open("GET",url,true);
	xmlHttp.send(null);

To use in your environment, you would probably need to tweak the code a bit to remove or change the /WebServices/ path.

In this case, the program rdbajax.php responds to requests and sends back data in a JSON format. Javascript waits for the response in the background and, when it arrives, converts the raw JSON, which is just text, into a set of structures it can use. In the example, the json_parse library is used, which is a safer way to parse raw JSON than the Javascript eval() function.

 	var response = json_parse(xmlHttp.responseText);

In the example, the Flot library is used to render a hydrograph. If you are interested in creating your own graphics, there are numerous graphics libraries out there so choose one that suits your needs. By examining the Javascript, you can see how the JSON structure is parsed, in this case to plot points on a graph using the Flot library:

	// Copy each record from the response into the d1 array used for plotting
	for (var x in response.records)
	{
		d1.push(response.records[x]);
	}

Warning on the provisional status of Unit Value Data

Virtually all unit value data on the USGS Water Data for the Nation (NWISWeb) site is considered provisional, which means the data may not be accurate. You are strongly encouraged to display a notice to that effect, similar to the one that appears on the USGS NWISWeb site for provisional data and in the RDB file comments:

The data you have obtained from this automated U.S. Geological Survey database have not received Director's approval and as such are provisional and subject to revision. The data are released on the condition that neither the USGS nor the United States Government may be held liable for any damages resulting from its use. Additional info: http://waterdata.usgs.gov/nwis/help/?provisional.

Crediting the USGS

As described here, the data provided by the USGS is in the public domain. Crediting the USGS is encouraged, provided it meets appropriate guidelines. Beware: the USGS logo is a trademarked symbol and can only be used if it complies with USGS Visual Identity guidelines.

Licensing

This software is in the public domain, so feel free to use it in any way you want. However, no warranty is expressed or implied on its correctness or suitability for use. Use at your own risk.

Support and Suggestions

Use of this class does not imply that support will be provided. However, feel free to leave a post in my support forum with questions or suggestions.