Public Data Is More Important Than Ever–And Now It’s Easier To Find

The data mining start-up Enigma is launching a redesigned site that makes vast troves of open data easier to search and analyze.

Public data, in theory, is meant to be accessible to everyone. But in practice, even finding it can be near impossible, to say nothing of figuring out what to do with it once you do. Government data websites are often clunky and outdated, and some data is still trapped on physical media–like CDs or individual hard drives.


Tens of thousands of these CDs and hard drives, full of data on topics from Arkansas amusement parks to fire incident reporting, have arrived at the doorstep of the New York-based start-up Enigma over the past four years. The company has obtained thousands upon thousands more datasets by way of Freedom of Information Act (FOIA) requests. Enigma specializes in open data: gathering it, curating it, and analyzing it for insights into a client’s industry, for example, or for public service initiatives.

Enigma also shares its 100,000 datasets with the world through an online platform called Public—the broadest collection of public data that is open and searchable by everyone. Public has been around since Enigma launched in 2013, but today the company is introducing a redesigned version of the site that’s fresher and more user-friendly, with easier navigation and additional features that allow users to drill further down into the data.

[Image: courtesy Enigma]

But while the first iteration of Public was mostly concerned with making Enigma’s enormous trove of data, which it was already gathering and reformatting for client work, accessible to the public, the new site focuses more on linking that data in new ways. For journalists, researchers, and data scientists, the tool will offer more sophisticated ways of making sense of the data that they have access to through Enigma.

When Hicham Oudghiri and Marc DaCosta founded Enigma four years ago, their focus was on gathering raw data in one place in order to mine it for insights into things like industry, policy, culture, and social behavior. They found that open data held commercial appeal for large corporations that needed help analyzing business risks or industry shifts, as well as public service value. For example, using its trove of open data, Enigma can help a major bank find solutions for investigating financial crimes based on recent criminal data. On the pro bono side, the company got considerable attention in 2015 for an effort in conjunction with city officials in New Orleans that targeted the homes most at risk of fire and helped get smoke detectors installed there. Another recent public service project involves visualizing data that tracks U.S. sanctions.

[Screenshot: Enigma]

As it worked on both commercial and public projects, the company found its mission starting to shift. “Our main thesis when we launched was that there is a tremendous disconnect between the data that is published and the people who can connect the dots,” says Oudghiri. “We thought it was mostly a problem of access. Now we’ve learned that it’s important to take a stab at interpreting, analyzing, and linking these datasets together so that they can solve problems.”


The shift in focus from merely gathering data to building tools that help people interpret and make use of that data also helped inform the redesign of Public. Before the redesign, Public was designed for users to search keywords, names, or entities and then access the raw data in various contexts. A search for Google, for instance, might pull up lobbying records or private investments. Searching Google’s name in Public would also bring up every instance in which Google was mentioned in other datasets, so that users could find, for example, exactly where Google is buying its electricity in Department of Energy datasets. Enigma’s tools allowed users to examine the data at such a granular level as to find out how many megawatts are delivered to which Google facilities, and even the details of Google’s contracts and future terms with these companies, allowing journalists or data scientists to see, through data, if Google is, in fact, following through on its promise to be more sustainable.

[Screenshot: Enigma]

The new Public can do all of that, too, but the redesign focuses much more heavily on giving users tools to make those types of connections and interpretations of the data quickly and easily. It does so mainly by baking in metadata more directly. If you click on a sidebar in a set, it will show all of the metadata and what other datasets it’s connected to. For example, bringing up a dataset on Google’s lobbying records will also bring up a sidebar with other collections of which that data is a part—i.e., U.S. Federal Government and Lobbyist Disclosures. Clicking on those collections could bring up what other companies were lobbying for in the same year, illuminating connections in company interests across industries and possible connections to policies enacted in the same year. The new site also allows users to curate collections of datasets and keep building upon them and revisiting them for new projects. And Public offers access to datasets robust enough to be used to train models in machine learning.

Importantly, the new homepage also curates featured datasets and collections to reinforce a sense of discoverability. For example, an Enigma-curated collection of U.S. sanctions data from the U.S. Treasury Department’s Office of Foreign Assets Control (OFAC) shows data on the entities and individuals that American companies can and can’t do business with, restrictions imposed in an effort to achieve specific national security or foreign policy objectives. A new round of sanctions against Russia has been in the news lately, as an effort by President Trump to loosen restrictions on blacklisted businesses and individuals in Russia was overruled by the Senate last week. Enigma’s curated data selection on U.S. sanctions could help journalists contextualize recent events with data that shows changes in sanctions lists over time by presidential administration, for instance, or they could compare the U.S. sanctions list to the European Union’s.

Enigma’s collections range far and wide, with other datasets pulled from cultural institutions, museums, and more. Users can explore the collection of datasets from the Tate Museum in London, one of a handful of museums worldwide to provide online access to its entire collection. These datasets could, for example, help users investigate the representation of minority artists in the Tate’s collection. Opening up the Tate data collection also points users toward datasets from New York’s Cooper Hewitt design museum or Museum of Modern Art (MoMA), leading to comparisons between museums–or just introducing users to other collections.

Eve Ahearn, a product manager for Enigma, says that one of the biggest pieces of feedback from Public’s current users was that they wanted the site to introduce them to new datasets that they might not think to search for. “If users find the right dataset, it can help people spark ideas and begin the process of coming up with a project,” she says. “We are constantly coming across new sources of data and people need to be made aware of them.”

