The World as Seen by a Search Algorithm

[Map: “Why is [x] so rich / poor?”]

These maps show what properties Google Autocomplete associates with countries when one asks the question “why is (country x) so …”. These results offer a window into how Google, and the preferences of millions of Internet users, can actively shape the knowledge we obtain about different parts of the world. We compiled various maps across a range of topics: physical, economic, socio-demographic and whimsical; scroll down to explore these.

Data

Many big technology companies have developed algorithms that provide query suggestions based on input to search fields and/or immediate feedback to users (see this, this, or this patent). These techniques are commonly referred to as autosuggest, incremental search or autocomplete. Google uses the last of these names for the implementation in its Web search interface.

Google praises Autocomplete for “find[ing] information quickly by displaying searches that might be similar to the one you’re typing”. Google Autocomplete also relies on Google’s index of the Web, not just on what users search for. In addition, the Autocomplete results are based on the user’s past searches if the user is signed in and the Web history feature is activated (we made sure this was not the case for this analysis), and on user profile data from Google+ (if the user is searching for a person).

The visualizations use geographic data and country names from Natural Earth. The official dataset lists United Kingdom and United States of America as country names; we added Great Britain and Britain, as well as USA and US, as alternatives (the former case reflects a common point of confusion). A Python script then queried Google.com for every country with the phrase “why [is/are] [the] (country x) so”. Queries thus encompassed, for example: “why is Kenya so”, “why are the Philippines so”, “why is the United Kingdom so”, etc. The phrasing of this crude and somewhat naïve query is meant to reflect what people may find remarkable, great, sad, annoying, surprising, or unknown about a country.
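To illustrate this step, here is a minimal sketch of such a script. It relies on the unofficial suggest endpoint at suggestqueries.google.com; that endpoint, its client parameter, and the response format are undocumented assumptions that may change or be rate-limited at any time, and this sketch is not the script we actually ran.

```python
# Minimal sketch of the query step, not the script used for this analysis.
# The suggest endpoint is unofficial and undocumented: treat the URL,
# the "client" parameter, and the response shape as assumptions.
import json
import urllib.parse
import urllib.request

SUGGEST_URL = "https://suggestqueries.google.com/complete/search"

def autocomplete(prefix):
    """Return the suggestion list for a query prefix."""
    params = urllib.parse.urlencode({"client": "firefox", "q": prefix})
    request = urllib.request.Request(
        f"{SUGGEST_URL}?{params}",
        headers={"User-Agent": "Mozilla/5.0"},  # some endpoints reject bare clients
    )
    with urllib.request.urlopen(request) as response:
        # Expected shape: [prefix, [suggestion, suggestion, ...]]
        body = response.read().decode("utf-8", errors="replace")
    return json.loads(body)[1]

def country_prefix(name, plural=False, article=False):
    """Build the crude 'why [is/are] [the] (country x) so' phrase."""
    verb = "are" if plural else "is"
    the = "the " if article else ""
    return f"why {verb} {the}{name} so"

if __name__ == "__main__":
    for name, plural, article in [
        ("Kenya", False, False),
        ("Philippines", True, True),
        ("United Kingdom", False, True),
    ]:
        prefix = country_prefix(name, plural, article)
        print(prefix, "->", autocomplete(prefix))
```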

The suggested questions appear to be driven by everything from curiosity, compassion, and travel experience to naivety, politics, badmouthing and propaganda – in other words, the results reflect the whole range of human motivations and intentions: the best and the worst in us. The results we get from Google.com further indicate that the data powering Autocomplete suggestions are probably largely taken from Western users, and thus likely fail to reflect the inputs of information searchers in other parts of the world.

Before mapping, the retrieved data needed to be cleaned up due to semantic issues. Results that seemed to represent a category mistake arising from the semantic ambiguity of a country name were removed entirely: a prime example being Turkey, which yielded suggestions referring to poultry, such as “high in sodium”, “good for you”, and “expensive this year”. Similar semantic ambiguities arose, for example, for:

  • “Why is Chad on So Random”, “why is Chad Ochocinco so bad”: the latter referring to an American football wide receiver,
  • “Why is chili so good/hot/addictive”: referring to the dish rather than Chile,
  • “Why is guinea pig so called”: referring to the animal rather than Guinea,
  • “Why is Jersey so tacky/trashy”: likely referring to the TV show Jersey Shore set in New Jersey, rather than Jersey, the British Crown Dependency,
  • “Why is Kuwait Airways so cheap”, “why is Sri Lankan so expensive”: referring to airlines (Kuwait Airways and SriLankan Airlines).

These semantic mismatches were removed from the data.
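For illustration, a hypothetical version of this cleanup step might look like the sketch below. The blocklist is populated only with the category mistakes listed above; the structure and all names are made up for this sketch, not taken from our actual pipeline.

```python
# Hypothetical cleanup step: drop suggestions that reflect a category
# mistake caused by an ambiguous country name. The marker strings come
# from the examples listed above; everything else is a sketch.
AMBIGUOUS_MARKERS = {
    "Turkey": ("sodium", "good for you", "expensive this year"),  # poultry
    "Chad": ("ochocinco", "on so random"),   # a person / a TV show
    "Chile": ("chili",),                     # the dish
    "Guinea": ("guinea pig",),               # the animal
    "Jersey": ("tacky", "trashy"),           # Jersey Shore, not the island
    "Kuwait": ("airways",),                  # the airline
    "Sri Lanka": ("sri lankan so",),         # SriLankan Airlines
}

def clean(results):
    """Filter a {country: [suggestions]} dict down to on-topic entries."""
    return {
        country: [
            s for s in suggestions
            if not any(m in s.lower() for m in AMBIGUOUS_MARKERS.get(country, ()))
        ]
        for country, suggestions in results.items()
    }
```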

Findings

Google stresses that “Autocomplete predictions are algorithmically determined […] without any human intervention”. Furthermore, the user base (and thus the training dataset) of Google Autocomplete is massive, since the feature is built into Google Web search and cannot be turned off by users.

Google may not give Autocomplete suggestions for search terms that are (for example) not popular enough or that became popular too recently, or when the query matches a “narrow class of search queries” relating to “pornography, violence, hate speech, and copyright infringement” (see also here). Despite these disclaimers and attempts at regulation, Google Autocomplete has spurred legal proceedings and complaints in various countries (for instance by the owners of a hotel in Ireland, the German former First Lady, two individuals whose names were associated with crimes, and supporters of the US politician Pat Buchanan). Most recently, Google Autocomplete has been used as a campaigning tool by UN Women: those adverts pair Google Autocomplete with queries such as “women cannot” or “women should” to demonstrate the sexism and discrimination against women that is built both into our societies and into our information environments.

Although we have no way of opening the black box of Google’s Autocomplete algorithm, by mapping its results we can peer into some of the variable representations of different parts of our world that Google chooses to suggest to its users. In what follows, we highlight Google Autocomplete mappings of several properties – from the physical, through the socio-economic, to the whimsical.

Physical Properties

Google Autocomplete associates many countries with either “hot” or “cold”. Interestingly, Thailand is the only country associated with both categories. The semantic ambiguity of the word “hot” (the temperature sense versus the meaning of “receiving much interest or attention”) potentially affects parts of the result.
