Determining local influence in OpenStreetMap through automated language detection

OpenStreetMap (OSM) is a crowdsourced digital map of the world. The project began in Europe and still has its largest contributor base there. So when we look at OSM somewhere in the Global South, how much of the map is built by people with local knowledge, and how much by “armchair mappers” from far away? Do local and nonlocal contributors even affect the map differently?

Dominant language of OSM contributor comments

One possible source of answers is the comments that contributors leave as metadata when they save their OSM edits. These comments can be written in many languages and are tied to the geographic bounding box of the edits; thus, we can map some of the patterns of language usage by OSM users.

Using the open source langid.py Python module, we detected the languages and locations of over 100,000 OSM user comments across South America. We then consulted publicly available OSM user profile pages to estimate, for each language, what percentage of the people using it are from inside and outside South America. About two-thirds of people using English are from outside the continent, whereas over 85% of those using Spanish and Portuguese are from inside the continent. Knowing these distributions, one can more confidently identify regions where nonlocal influence is likely high, such as the belt of English usage running southwest to northeast across the interior of South America on the map.
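To make the idea concrete, here is a minimal sketch of how comment languages could be detected and aggregated into a map of dominant language. It is not the code used in the study. The `langid.classify` call is the real langid.py API, but the one-degree grid, the use of each bounding box's center point, and the function names are assumptions made for illustration only.

```python
from collections import Counter, defaultdict
import math


def detect_with_langid(text):
    """Return a language code for `text` using langid.py (pip install langid)."""
    import langid  # imported lazily so the rest of the sketch runs without it
    lang, _score = langid.classify(text)
    return lang


def cell_for_bbox(min_lon, min_lat, max_lon, max_lat, cell_size=1.0):
    """Assign an edit to the grid cell containing its bounding-box center.

    Assumption: the center point stands in for the whole bounding box.
    """
    center_lon = (min_lon + max_lon) / 2
    center_lat = (min_lat + max_lat) / 2
    return (math.floor(center_lon / cell_size), math.floor(center_lat / cell_size))


def dominant_language_by_cell(changesets, classify=detect_with_langid, cell_size=1.0):
    """Map each grid cell to the most frequent comment language within it.

    `changesets` is an iterable of
    (comment, (min_lon, min_lat, max_lon, max_lat)) pairs.
    """
    counts = defaultdict(Counter)
    for comment, bbox in changesets:
        if not comment or not comment.strip():
            continue  # skip empty comments; there is nothing to classify
        cell = cell_for_bbox(*bbox, cell_size=cell_size)
        counts[cell][classify(comment)] += 1
    return {cell: langs.most_common(1)[0][0] for cell, langs in counts.items()}
```

Passing a different `classify` function makes it easy to swap detectors or to test the aggregation without langid.py installed. Short comments are hard for any language detector, so a real pipeline would probably also want to filter on comment length or on the detector's score.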

We calculated that these regions of heavy English usage are slightly poorer and more rural than other areas in South America, suggesting that residents of these places have less influence over the content of the map. When the types of features added by users of each language are tallied, English-speaking users tend to add items related to travel and mass consumption (such as shopping malls), as well as things easily traced from aerial imagery (such as stadiums and “unclassified” roads). Spanish and Portuguese users, on the other hand, tend to add things related to the routines of everyday life, such as bus stops, corner stores, playgrounds, and health clinics. It seems that a map predominantly made by local contributors is more useful for people on the ground.
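The tally described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the procedure from the article: the input format, the function names, and the `min_ratio` threshold are hypothetical. The idea is to compare each feature type's share of one language group's edits against its share of another group's.

```python
from collections import Counter


def feature_shares(edits):
    """Return {language: {feature_type: share of that language's edits}}.

    `edits` is an iterable of (language, feature_type) pairs.
    """
    counts = {}
    for lang, feature in edits:
        counts.setdefault(lang, Counter())[feature] += 1
    return {
        lang: {f: n / sum(c.values()) for f, n in c.items()}
        for lang, c in counts.items()
    }


def overrepresented(shares, lang, other, min_ratio=1.5):
    """List feature types whose share under `lang` is at least `min_ratio`
    times their share under `other` (or that `other` never adds at all)."""
    result = []
    for feature, share in shares[lang].items():
        other_share = shares[other].get(feature, 0.0)
        if other_share == 0.0 or share / other_share >= min_ratio:
            result.append(feature)
    return sorted(result)
```

Comparing shares rather than raw counts matters because the language groups differ in size; a group with more edits overall would otherwise appear to dominate every feature type.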

View an early talk about the project

This talk was given at State of the Map 2014, when the research was still in progress. It is helpful for getting a general feel for the methods used; however, some of the research was corrected and expanded after the talk was given, so precise statistics should not be quoted from this video. The blog text above and the article below contain the corrected, up-to-date figures.

Read more

Quinn, S. (2016). A Geolinguistic Approach for Comprehending Local Influence in OpenStreetMap. Cartographica, 51(2), 67–83. [PDF (Author’s accepted version)]