Wikipedia is a compendium of information, assembled by thousands of volunteers worldwide. But who is actually writing these articles? Are people from the United States writing about the United States? Or are those articles being published in another language from halfway across the world?
Mapping Wikipedia is a project by TraceMedia in conjunction with the Oxford Internet Institute. It’s an interactive map that lets you select a language and see which parts of the world are being written about in it.
To build the map, the system examines all of the geo-located Wikipedia articles written in particular languages: English, French, Arabic, Hebrew, Swahili, and Farsi. The map never actually shows us who is writing from where, but it can show us that articles around the world are being written in certain languages, which has a number of rich and confounding implications.
"Why, for example, so many Swahili articles in Turkey, and why the distribution of Arabic articles in the U.S.?" asks TraceMedia’s Gavin Baily. "To be honest we’ve only just finished this version, and are in the process of understanding what we have learned!"
The HTML5-based map (built on OpenLayers and Google Maps) is a technical feat. It condenses roughly a million points of data for us to enjoy on screen. But there’s so much more that it could do. Right now, Mapping Wikipedia offers a few other search criteria, like filtering for the regions with the most heavily edited articles or the highest share of anonymous edits. But it could show us a lot by pulling in information from a source other than Wikipedia itself. Consider, for instance, layering population density on top of all of this information. You could even normalize the results by population to judge whether hot spots simply reflect how many people live there (especially in big cities) or whether the region is of particular importance.
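To make the normalization idea concrete, here is a minimal sketch of how it might work. It is not drawn from TraceMedia’s code. The one-degree grid, the function names, and the population lookup are all assumptions made for illustration: articles are counted per grid cell, then each count is divided by the number of people living in that cell.

```python
import math
from collections import Counter


def grid_cell(lat, lon, cell_deg=1.0):
    """Map a latitude/longitude pair to the integer index of its grid cell."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def articles_per_million(points, population_by_cell, cell_deg=1.0):
    """Count articles per grid cell and express each count per million residents.

    points: iterable of (lat, lon) pairs, one per geo-located article.
    population_by_cell: dict mapping a grid cell to its population
        (hypothetical data; a real map would need a population dataset).
    Cells with missing or zero population are skipped rather than guessed.
    """
    counts = Counter(grid_cell(lat, lon, cell_deg) for lat, lon in points)
    rates = {}
    for cell, n in counts.items():
        pop = population_by_cell.get(cell, 0)
        if pop > 0:
            rates[cell] = n * 1_000_000 / pop
    return rates
```

Under this scheme, a big city with many articles and many residents could end up with a lower score than a sparsely populated region with only a handful of articles, which is exactly the distinction between "crowded" and "important" that a raw count hides.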
"We’ve considered adding a population density basemap so that you can see the differences more easily," Baily tells Co.Design. "In English, there is quite a strong correlation. But China, for example, is far less dense and the U.S. far more. In French, there are disproportionately few articles for India, and the other languages have very particular focuses that don’t correlate that well with population."
And we could pitch other ideas to Baily’s team all day. It’s a magnificently complicated project that only grows more fun at every turn. Well, fun and extraordinarily difficult to make sense of.
[Hat tip: FlowingData]