Using Big Data for Epidemiological Surveillance

Google Flu Trends and the Methodological Shift from ‘Supply’ to ‘Demand’

In June this year (2014), transcript published an edited volume on “Big Data”. I contributed a chapter on a topic that fascinates me: data obtained through search engine queries, and hence based on the digital traces that users leave behind. While I have looked into this topic more generally with regard to Google Trends before, this paper analyses Google Flu Trends and the connection between Big Data and epidemiological surveillance more specifically. The paper is in German, but I recently discussed the topic with Max Haiven and Anna Sauerbrey at the SLOW Politics conference in Berlin. Below you can find a summary of the paper and a video of our discussion.

My paper sets out to provide a critique of methodological developments in the epidemiological surveillance of influenza since the 1980s. It focuses on recent studies that are characterized by fundamental changes in their data sources. While traditional approaches in epidemiological surveillance rely on data from clinical and virological diagnosis or on mortality statistics, studies in “Infodemiology” or “Infoveillance” (Eysenbach 2002, 2006, 2009) are based on Big Data retrieved from Internet sources. Projects such as the Global Public Health Intelligence Network were able to show correlations between disease outbreaks and the frequency of disease-related keywords in online sources (e.g. news wires, websites). Subsequently, several studies have shown that users’ search engine queries can also indicate influenza intensity (Eysenbach 2006; Polgreen et al. 2008; Ginsberg et al. 2009). One can thus observe a methodological shift in epidemiological surveillance that stems from technological developments in the field of digital media. In particular, Google Flu Trends (Ginsberg et al. 2009) is a remarkable case of an influenza-monitoring application based on web search logs.

After NSA-Gate | Max Haiven & Annika Richterich | Mod.: Anna Sauerbrey from Berliner Gazette on Vimeo.

However, these developments should be assessed critically, since such Big Data are available exclusively to the media companies that collect them, their advertising partners and selected scientists. Moreover, recent misinterpretations in Google Flu Trends have shown that the algorithms translating users’ queries into predictions need to be recalibrated regularly (Butler 2013). Google Flu Trends is staged as a philanthropic investment, but it is only one of many data mining possibilities that rest on the fact that users automatically pay for their search engine queries with the data they leave behind. This paper therefore addresses the implications of such entanglements between public health services and corporate objectives: how should the social sciences evaluate developments that are relevant to the dynamics of power/knowledge in public health surveillance? The paper also asks to what extent currently promoted frameworks for online privacy need to be redefined. It discusses how services such as Google Flu Trends act, on the one hand, as promotional flagships of Big Data, while on the other hand they partially disclose corporate data mining strategies.

