This graphic illustrates the number of pages indexed by Google about each country.
Data
The data were collected through the Google Custom Search API. We searched for each country name in English and in up to 23 other languages, that is, all languages with more than 50 million native speakers according to Ethnologue (excluding Yue Chinese, which is not coded in GeoNames). This resulted in 6046 queries. We supplemented the official names (such as ‘United States of America’) with commonly used names (such as ‘United States’) and common acronyms (such as ‘USA’). About 3% of the alternative country names in GeoNames do not have an associated language and were excluded from the queries.
Preliminary experiments showed little if any variation in the number of retrieved pages between the different Google domains (e.g., google.com and google.co.uk) for the same search query. We therefore directed all queries to google.com. We didn’t use any further filters on the language or country of origin of pages.
The information is presented in an interactive tree map, where the area of each rectangle reflects one of four options: (1) the number of retrieved pages; (2) the number of Internet users; (3) the total population; (4) the area of each country. Each country is assigned a colour corresponding to its world region. A country is given a darker shade if it has a relatively high number of Web pages per Internet user (and a lighter shade if it has a relatively low number of Web pages per Internet user).
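The shading rule can be sketched in a few lines. This is only an illustration: the post does not say how "relatively high" is defined, so the sketch assumes a comparison against the median ratio across all countries. The function name `shade_by_ratio` and the placeholder figures are hypothetical and are not taken from the data.

```python
from statistics import median


def shade_by_ratio(pages, users):
    """Label each country 'darker' or 'lighter' by pages per Internet user.

    Assumption: 'relatively high' means above the median ratio across
    countries; the original post does not state the actual threshold.
    """
    # Skip countries with missing or zero Internet-user counts.
    ratios = {c: pages[c] / users[c] for c in pages if users.get(c)}
    mid = median(ratios.values())
    return {c: ("darker" if r > mid else "lighter") for c, r in ratios.items()}


# Placeholder figures for three fictional countries (not real data).
pages = {"A": 1000, "B": 200, "C": 900}
users = {"A": 10, "B": 100, "C": 30}
shades = shade_by_ratio(pages, users)
```

With these placeholder values the ratios are 100, 2 and 30 pages per user, so only country "A" falls above the median and is shaded darker.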
Because this analysis is based on the names listed in the GeoNames gazetteer, it is biased by the uneven geographies of the gazetteer, as described in a previous post. Most countries have associated names in over 20 languages, whereas Kosovo has associated names in only 13 languages; Bonaire, Sint Eustatius and Saba, Curaçao, South Sudan, and Sint Maarten in seven languages; and Zimbabwe in only four languages. A country name may also have the same spelling in different languages. Because no filter on the language or country of the pages was used, each such name was searched only once.
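As an illustration of searching each distinct spelling only once, the sketch below collapses a per-language list of names into a list of unique query strings. It assumes exact string matching decides whether two names are "the same spelling"; the post does not say how this was done, and `build_queries` is a hypothetical helper, not part of GeoNames or the Google Custom Search API.

```python
def build_queries(names_by_language):
    """Return one query string per distinct spelling, in first-seen order.

    Assumption: two names count as the same query only if the strings
    are identical (no case folding or accent stripping).
    """
    seen = set()
    queries = []
    for names in names_by_language.values():
        for name in names:
            if name not in seen:
                seen.add(name)
                queries.append(name)
    return queries


# Names for France in five languages; two pairs share a spelling.
example = {
    "en": ["France"],
    "fr": ["France"],
    "es": ["Francia"],
    "it": ["Francia"],
    "de": ["Frankreich"],
}
queries = build_queries(example)
```

Here five language entries reduce to three queries, which is why the total number of queries is lower than the number of country-language pairs.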
Findings
The most interesting finding is that Google appears to contain a relatively large number of pages about even the smallest and most sparsely populated territories. Even the Pitcairn Islands (a group of islands in the Pacific with a population of 56 people) are mentioned in over 10 million Web pages. It is important to point out that Google states that the number of results it returns is an estimate, so in some cases it may be greatly overestimating the amount of content about a place.
Nonetheless, there is a strong correlation between the number of Web pages mentioning a country and the number of Internet users in that country, with the number of Internet users accounting for more than half of the variation in the amount of content about a place. In contrast, the total population and the area of a country account for only about 33% and 16%, respectively, of that variation.
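A share of variation "accounted for" by a single variable is conventionally the squared Pearson correlation coefficient (R²). The sketch below shows that calculation under the assumption that this is the statistic meant here; the post does not say whether the values were transformed (for example, log-scaled) first. The function name `r_squared` and the sample numbers are hypothetical.

```python
from math import sqrt


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    r = cov / sqrt(var_x * var_y)
    return r * r


# Illustrative values only, not the data behind the post.
internet_users = [1, 2, 3, 4]
web_pages = [1, 3, 2, 4]
share_explained = r_squared(internet_users, web_pages)
```

For these illustrative values the correlation is 0.8, so the first variable accounts for 64% of the variation in the second.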
The United States has the largest total number of mentions in Google’s index, followed by Japan, China, and the United Kingdom. The Marshall Islands is the country with the most pages per Internet user, with over 12,000 mentions per Internet user. Several other Pacific islands and European city-states (such as Vanuatu and San Marino) have similar ratios. Among the larger countries, those with a higher number of mentions per Internet user are the Central African Republic, Eritrea, and Chad, with over 400 mentions in Google’s index per Internet user.
Whilst Google does not seem to be characterised by the massive geographic inequalities that characterise many other types of digital information (e.g., Wikipedia or photo-sharing platforms), we do still see a very selective representation of our planet. The strong correlation between mentions of countries and numbers of Internet users suggests, unsurprisingly, that Google is simply reflecting the broader information inequalities that make up the Web.