Rensselaer team leads in making flood of government data intelligible

The Obama administration has begun to release raw government data in an unprecedented transparency initiative, but without context, the public cannot make sense of the data. Now a team from Rensselaer Polytechnic Institute has stepped up to make the data intelligible through sorting, combining and presenting the data in visual form. -db

The New York Times
November 15, 2010
By Steve Lohr

In May 2009, the Obama administration started putting raw government data on the Web. It started with 47 data sets. Today, there are more than 270,000 government data sets, spanning every imaginable category from public health to foreign aid.

This was all done in the name of openness, transparency and allowing citizens to tinker with the government’s data assets for themselves in the pursuit of knowledge, insights and even profit.

In a recent interview, Vivek Kundra, chief information officer of the United States, explained, “We want people to benefit from the data we’re democratizing.” Over the next few years, he predicted, a “new industry in data curation will be born.”

The White House official pointed to James Hendler, a computer scientist, and his team at Rensselaer Polytechnic Institute as a leading group of data curators.

At the International Open Government Data Conference, beginning on Monday in Washington, Mr. Hendler and his R.P.I. colleagues will be showing off some of their work. The three-day gathering is being sponsored by the United States General Services Administration.

All the data the American government is pouring onto the Web will be useful only if people can make sense of it. What’s needed are tools for sorting, combining and presenting raw data sets in visual form. That’s the challenge being tackled by Mr. Hendler and his team.

“There’s an unfathomable amount of data out there,” Mr. Hendler said in an interview on Friday afternoon. “We’re trying to help people use it and understand it.”

Mr. Hendler, joined by two faculty colleagues, Deborah McGuinness and Peter Fox, and a handful of undergraduate and graduate students, has made more than 50 demonstration data projects in recent months. The data demos, tools and tutorials are on R.P.I.’s Data-Gov Web site.

The goal, Mr. Hendler said, is to make visually appealing interactive data sites as easy to build as Web sites have become — something that ordinary people with some computing smarts can do instead of a task reserved for the programming elite.

“We’re not there yet, but that is only a year or two away,” Mr. Hendler said.

The R.P.I. data demonstrations are typically mash-ups of data sets, often presented as maps of the United States. One on clean air levels combines ozone readings from sensors with other data. Local areas with the same reading appear as circles. Click on a circle and the name of the location and parts-per-billion measurements appear. The American ozone map can also be sorted by categories like terrain and land use.

Another demonstration project maps smoking by state, combining several data sets, looking at prices and anti-smoking policies. The map can be sorted by the cost of a pack of cigarettes, state taxes per pack, and percentage of smoke-free bars, restaurants and workplaces.

Sorting the map a few times and moving the cursor over it quickly turns up these facts: The state with the highest price per pack (in 2007, the most recent data) was New Jersey, $6.34. The cheapest was South Carolina, $3.31. And New Jersey had the highest per-pack tax, $2.71, and South Carolina the lowest, 7 cents.

Smoking rates are somewhat correlated with price, but more closely correlated with high percentages of smoke-free environments, which are set by local government regulation. Kentucky had the highest smoking rate, at 28.3 percent, while Utah had the lowest, at 11.7 percent.

Yet cigarettes are relatively inexpensive in Utah, at $3.98 a pack. Presumably, the low smoking rate in Utah is explained less by price and policy than by religion — all those non-smoking Mormons.
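
For readers curious what such a mash-up involves, the combine-and-sort step described above can be sketched in a few lines of Python. This is only an illustration, not the R.P.I. team's code: the function names `mash_up` and `sort_by` are hypothetical, and the tables hold only the figures quoted in this article, so states the article does not mention are simply missing.

```python
# Figures quoted in the article; a state is absent from a table
# when the article gives no value for it.
price_per_pack = {"New Jersey": 6.34, "South Carolina": 3.31, "Utah": 3.98}
tax_per_pack = {"New Jersey": 2.71, "South Carolina": 0.07}
smoking_rate_pct = {"Kentucky": 28.3, "Utah": 11.7}


def mash_up(**datasets):
    """Merge several state -> value tables into one row per state.

    A state missing from a table gets None in that column.
    """
    states = sorted(set().union(*datasets.values()))
    return [
        {"state": s, **{name: table.get(s) for name, table in datasets.items()}}
        for s in states
    ]


def sort_by(rows, column, descending=True):
    """Sort rows by one column, leaving out rows with no value for it."""
    known = [r for r in rows if r[column] is not None]
    return sorted(known, key=lambda r: r[column], reverse=descending)


rows = mash_up(
    price=price_per_pack,
    tax=tax_per_pack,
    smoking_rate=smoking_rate_pct,
)

# Rank the states by price per pack, most expensive first.
by_price = sort_by(rows, "price")
for r in by_price:
    print(f'{r["state"]}: ${r["price"]:.2f} a pack')
```

Sorting the same merged rows by `"tax"` or `"smoking_rate"` instead of `"price"` mirrors the way the demonstration map can be re-sorted by a different column.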

Copyright 2010 The New York Times Company