How to find data on infectious disease outbreaks

By Rachel Reel

Over the past couple of weeks I have been investigating the current landscape of publicly available data on infectious disease outbreaks. These data are reported through several channels. Below I discuss a number of datasets: what data each collects, how it is presented, how it is financed, and where it can be found.

1.   WHO/UN databases

The World Health Organization (WHO) makes a broad range of datasets publicly available on its webpage. Some datasets cover specific infectious diseases or groups of diseases, including HIV/AIDS, tuberculosis, malaria, neglected tropical diseases, cholera, influenza, meningitis, and sexually transmitted infections. For a more comprehensive dataset covering a number of infectious diseases, the ‘Global Health Estimates 2015’ includes disease-specific mortality rates by sex and age. The data cover 2000 to 2015 and are available as .csv and .xls files, which makes them easy to extract. This work is funded by the WHO, and can be accessed here.

2.   IHME: Global Burden of Disease (GBD) series

The Lancet has published the largest observational epidemiology study on the burden of disease, conducted by the Institute for Health Metrics and Evaluation (IHME) at the University of Washington. The most recent iteration of the study was conducted in 2016. In these databases the burden of disease is commonly measured by mortality, morbidity, incidence, and prevalence. The reported data span 1970 to 2016 and cover 333 diseases and injuries, both communicable and non-communicable. The databases are presented across multiple articles, so supplementary material, figures, and images can be accessed individually. These supplementary materials are not interactive and are only available as PDFs, which makes it difficult to extract large amounts of data. Although a vast number of diseases are covered, not all are present in the datasets; cholera, for example, is absent. Each study article was funded individually by several funders. Key organizations include the Gates Foundation, the National Institutes of Health, the World Bank, the National Science Foundation, and the Indian Council of Medical Research. IHME has made a results tool available online that can filter the 2016 data by location, year, age, sex, and burden measurement across the identified causes and diseases. This tool allows the data to be exported to a .csv file. In addition, all the data sources are available here.
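Once a .csv file has been exported from the results tool, the same kind of filtering can be repeated offline. The snippet below is a minimal sketch in Python using only the standard library. The column names (`measure_name`, `location_name`, `cause_name`, `year`, `val`) are assumptions about the export layout, and the sample rows are made-up placeholder values rather than real estimates, so check the header row of your own download before relying on it.

```python
import csv
import io


def filter_burden_rows(csv_file, location, year, measure, cause=None):
    """Return the rows that match a location, year, and burden measure.

    The column names are assumed, not taken from IHME documentation.
    """
    matches = []
    for row in csv.DictReader(csv_file):
        if row["location_name"] != location:
            continue
        if int(row["year"]) != year:
            continue
        if row["measure_name"] != measure:
            continue
        if cause is not None and row["cause_name"] != cause:
            continue
        matches.append(row)
    return matches


# Placeholder rows with made-up values, used only to exercise the function.
SAMPLE_EXPORT = """measure_name,location_name,sex_name,age_name,cause_name,year,val
Deaths,Exampleland,Both,All Ages,Malaria,2016,10.0
Deaths,Exampleland,Both,All Ages,Tuberculosis,2016,20.0
Deaths,Exampleland,Both,All Ages,Malaria,2015,30.0
Incidence,Exampleland,Both,All Ages,Malaria,2016,40.0
"""

rows = filter_burden_rows(io.StringIO(SAMPLE_EXPORT), "Exampleland", 2016, "Deaths")
total = sum(float(r["val"]) for r in rows)
print(len(rows), total)
```

To run this against a real download, replace the `io.StringIO(...)` argument with a file opened via `open(path, newline="")`, where `path` is wherever you saved the export.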

3.   Nature Study

The original study published by Jones et al. (2008), ‘Global Trends in Emerging Infectious Diseases’, has since been updated by Allen et al. (2017), ‘Global Hotspots and Correlates of Emerging Zoonotic Diseases’. The data in this study were compiled through an extensive literature review covering 1940 onwards. The study identifies the spatial, temporal, and biological characteristics of a disease during its initial emergence in the human population. It seeks to identify why diseases emerge in the human population, rather than providing metrics of mortality and morbidity for each outbreak. The dataset is included in the supplementary information, which is available online via the Nature journal. The initial study was funded by the NSF, the NIH, the New York Community Trust, the V. Kann Rasmussen Foundation, and a Columbia University Earth Institute fellowship. The updated study was funded by the United States Agency for International Development (USAID) and the Department of Defense, Defense Threat Reduction Agency.


4.   GIDEON

The Global Infectious Diseases and Epidemiology Online Network (GIDEON) supplies infectious disease outbreak data to its subscribers, and can be accessed here. GIDEON was founded in 1992 and is available as a web-based application and an ebook series. The data is collected from peer-reviewed publications, national health ministry reports, and other key global health players (e.g. WHO and CDC). The system is updated frequently to keep the data as accurate and relevant as possible. There are two main categories within GIDEON: infectious disease and microbiology. The database can be accessed through a 15-day free trial, after which a subscription costs $99.90 per month (1-year contract) or $199.90 per month (monthly rolling basis). GIDEON is a private organization and is funded through these subscription fees. Although GIDEON requires a subscription, it has been used in a number of published studies that made their databases available. Most notably, Smith et al. compiled a comprehensive dataset from GIDEON spanning 33 years (1980-2013).

5.   HealthMap

HealthMap was established in 2006 at Boston Children’s Hospital to provide real-time surveillance of infectious disease outbreaks. The software uses freely available, informal online data sources, including but not limited to ProMED, WHO, OIE, FAO, Google News, and EuroSurveillance. The data is displayed on a map, with each point indicating an outbreak, and can be filtered by disease, location, source, species, and date. Alternatively, the data can be viewed as a list or as a time-series graph. The data can be accessed online or through the mobile app “Outbreaks Near Me”. HealthMap is funded primarily by Google, the Gates Foundation, Unilever, USAID, Amazon, Merck, Twitter, CIHR, CDC, the Defense Threat Reduction Agency (DTRA), IARPA, and the U.S. National Library of Medicine. HealthMap can be accessed here.

6.   ProMED

The Program for Monitoring Emerging Diseases (ProMED) is a program of the International Society for Infectious Diseases that tracks infectious disease outbreaks and acute exposures to toxins. The data is collected from media reports, official reports, online summaries, local observers, and others. Information submitted by individuals must be accompanied by affiliation identification, and is screened by the ProMED team prior to posting. ProMED is an archive of infectious disease reports, which makes it difficult to extract large amounts of data efficiently. The program was created to increase communication among the international infectious diseases community, and it encourages discussion. ProMED is available through a website and allows individuals to subscribe to one or more of its “lists” in order to receive updated outbreak reports via email. Each list corresponds to a topic area of interest within the ProMED database. ProMED collaborates with HealthMap at Boston Children’s Hospital. Funding for ProMED comes primarily from the Wellcome Trust, the Skoll Global Threats Fund, Google, the Gates Foundation, the Rockefeller Foundation, the Oracle Corporation, and the Nuclear Threat Initiative. ProMED can be accessed here.