How An MIT Data Viz Guru Is Exposing Cryptic Government Data

Data USA proves that census data can be beautiful, while simultaneously pulling it out of Google’s blind spot: the deep web.

The U.S. government has access to the kind of data that makes guys like Cesar Hidalgo drool. Hidalgo, the director of MIT Media Lab’s MacroConnections Group, has a passion for turning data into stories. Having previously helped visualize the biggest industries in Brazil and the world’s international trade data, Hidalgo knows there are enough stories buried in the U.S. census data alone to keep him busy for the rest of his life.


But the data are hard to access, and even harder to Google. At best, they are only discoverable if you already know where they live; at worst, you have to download them through massive zip files. So Hidalgo created Data USA, a new resource that not only beautifully visualizes seven different data sets from the U.S. Census Bureau, Chamber of Commerce, and more, but also explicitly makes that data indexable by Google.

From a user perspective, Data USA is an easy-to-search resource that aims to surface the stories in public data. Search for your hometown, and Data USA will present you with everything it knows about where you were born in a series of dynamically generated charts and graphs. That doesn’t just include obvious metrics like population, wage distribution, or racial breakdown, but also more obscure ones such as your hometown’s most specialized jobs, its most common non-English languages, and which industries are thriving (or declining) there. In addition to searching by city, you can also view data visualizations across whole industries, occupations, and educational backgrounds. Hovering over any Data USA chart or graph breaks down what is being visualized, while readable modern fonts and colorful Creative Commons images make every page beautiful.

Data USA already features a number of great stories visualized this way, from an exploration of the gender wage gap in Connecticut to a look at who works in the arts and entertainment industry. But making U.S. census data beautiful was only part of what Hidalgo set out to do with Data USA. The other part was pulling government data out of the deep web, that dimly lit basement of the Internet that isn’t searchable by Google’s web crawlers.

Public data often end up in the deep web because the sites that host them don’t encourage linking. “Many government sites require you to navigate through an excessive amount of UI elements, such as drop-downs and search entry fields, before you can access the data you want,” explains Hidalgo. “The analogy we’re using is they’re like shopping in a supermarket where all the boxes are the same, and you need to open every box to see what’s inside.” Search engines like Google depend on web crawlers to index the Internet, and crawlers find pages by following links; they generally don’t fill in search fields or click through drop-downs, so whatever sits behind those controls goes unindexed. Continuing Hidalgo’s supermarket analogy, there’s no way on these sites for Google to just look at a data “box” and know what’s inside.

That’s why Data USA has been designed from the ground up to be Google-friendly. From individual charts to hometown profiles, everything is fully linkable. In fact, you can embed Data USA charts and graphs in blog posts, in news stories, or on Facebook, functionality that Hidalgo hopes more journalists (both citizen and professional) will take advantage of. All of the charts and graphs are also backed up by text, which allows Google to index the data in a visualization, not just an image file. Hidalgo says he wants Data USA to be so open that you never even have to go to the website to access the data it contains, because Google already knows everything that’s there and can answer a question about census data as easily as it does one about local movie showtimes.

Hidalgo says he hopes that the example of Data USA will ultimately inspire the U.S. government and other agencies to rethink the way they are presenting their data on the Internet. It’s time, he says, for government agencies like the U.S. Census Bureau to learn the same lessons about the Internet that publishers have: You can’t just “throw up” something on the web and expect people to go to your site. Traffic needs to come to it organically, through search or social media. “It’s time for these groups to take responsibility to making their data open,” Hidalgo says. It’s time for them to label the boxes in their supermarket.


Cover Photo: Biddiboo/Getty Images