New tools needed to sort the flood of government data

The federal government is releasing mountains of electronic data — everything from flight schedules to toxic chemical inventories. But it’s all just noise unless the public has ways to wrestle the information into shape. The tools are coming, slowly but surely. – dr

New York Times

In a blog post on Wednesday, Clay Johnson, director of Sunlight Labs, discussed the “data flood” coming out of Washington and the need for more applications to deal with the new era of government information.

Sunlight Labs is part of the Sunlight Foundation, a nonprofit organization with a goal of digitizing government data and building Web sites to help make the current data deluge more manageable. The foundation hopes to help solve some of these data overload problems with new tools, including a Web site it is currently testing: nationaldatacatalog.com. The site will organize government data sets and try to give more context to this information.

As the government continues to open more information, tools like these will become invaluable.

As of February 2009, the federal government offers 1,087 different data feeds. These feeds range from an Environmental Protection Agency inventory of toxic chemicals by location to Federal Aviation Administration flight schedule information. The federal government has also released 106,000 “shape files” on data.gov, which can be used to create maps using granular data.

Mr. Johnson explained that all of this data is a great sign for open government, but “a new problem is starting to arise — classifying and organizing this information.”

“Recently, we’ve seen some amazing tools spring up around data.gov, and even some companies,” he said. “One company uses the F.A.A. data and weather information to try to predict flight delays.”

More of the data coming out each week could be used to help tell any number of stories. For example, the State Department recently released a copious amount of fascinating information. One data set of the political conflicts on the African continent could be used to map the boundaries of those conflicts rather than the geographical borders.

Another interesting data set comes from the Department of the Interior, with information on fires across the United States started by humans from 1960 to 2008. Mr. Johnson explained, “You could start to see if fires are happening less frequently now than 40 years ago, and if they are being created by humans.”

But it will still take more tools and Web sites to sift through this flood of information and find the pieces that matter.

“As we start pushing these data sets out they add a crack in a huge dam, and as more students, journalists and programmers have the opportunity to look through this information, we can learn more about our government,” Mr. Johnson said. “Eventually it’s going to explode, but it’s going to take a lot of work to organize it all.”